Intro to Python Programming (Spicy Delve)
Downloading Python
Download Python 3.1 from python.org. It's python.org, not python.com.
On the left sidebar, there will be a "Quick Links (3.1.2)" section; if
you're using Windows, you can just grab the "Windows Installer" right
there. If you're on Mac OS X, you're going to want to hit the "Quick
Links (3.1.2)" section header and find the installer there. If you're
on Linux, you probably already have Python, or know how to install it
(if not, talk to me). However you install it, check off all optional
packages: they're useful, so make sure you install them. If you're on
Linux, install not only the base python-3.1 package (or whatever your
distro calls it) but also the python3-dev and idle3 packages, if
available. Make sure you have version 3.something, not 2.something
(they're different in superficial ways only, but enough that we have
to choose one or the other; the same programs will not work for both).
After you do that, you should have a program called "IDLE". Run it. Before you is the Python prompt.
How Python Programming Works
To write programs in Python, you have to give Python commands that tell it what to do. Generally, this is done by typing the necessary commands into what is called the Python "prompt". This is a different mode of interaction with your computer than many of you are used to: there are no buttons to press or menus to pull down. That's alright; the text-based interface is that way because that is the most convenient and powerful way to use Python, and you'll learn to love it. For now, keep in mind that minor misspellings, misplaced parentheses, or even spurious spaces at the beginning of a line will all cause Python to spew out an error message whose last line names the problem, such as "SyntaxError" or "NameError". That's alright. Examine the line and the error message, figure out what you did wrong, and try again. Error messages are your friends; don't be afraid of them. (Trust me: when your program doesn't throw out error messages, but still functions incorrectly, it's much, much worse.)
Not only can you type commands directly into the Python prompt, but you can also write them in a file. This has the same effect as typing the same lines one by one into a prompt.
Let's start by typing directly into a prompt; but keep in mind that later, for larger programs, we'll want to transition into writing our programs into a file.
Very Basics
The simplest program you can write in Python is one line, and looks like this; type it directly into the Python prompt:
>>> print("Hello, World")
Hello, World
What you did here is call the function print on the argument
"Hello, World". The parentheses always go around the arguments of a
function (which are comma-separated if there is more than one); the
double quotes denote a string, a literal block of text. You can also
use single quotes for strings.
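To see comma-separated arguments and both kinds of quotes in action, here is a small sketch. The variable names are made up for illustration; by default, print puts a single space between its arguments.

```python
# Both kinds of quotes make exactly the same string
double_quoted = "Hello, World"
single_quoted = 'Hello, World'
print(double_quoted == single_quoted)

# print accepts several comma-separated arguments
print("Hello,", "World")
```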
Python can do more than print fixed strings. It can do math. Try:
>>> 2 + 2
4
>>> 4 / (5 + 6**2)
0.0975609756097561
Here the double asterisk represents exponentiation (for historical
reasons, 6^2 does something else; try to figure it out if you want).
Variables
So far, we have only been able to do large operations in one go. But
sometimes, we want to break things down into intermediate steps. For
example, here is an algorithm to find the square root of a number. To
find the square root of \(N\), start with a guess \(g\) (1 works; it
doesn't really matter that much how good your guess is, you'll just
get the answer faster if it's better). Then iteratively refine your
guess by replacing it over and over again with
\(\frac12(\frac{N}{g} + g)\). At some point the current guess is close
enough and you can stop.
So let's try it. Let's find the square root of 17. I'm going to guess 4 as my square root, because 17 is close to 16, whose square root is 4. We can "assign" this guess to a "variable", which basically gives a name to some quantity:
>>> N = 17
>>> guess = 4
>>> print(guess)
4
Now, we want to refine that guess. We can "reassign" the variable, making it point to something else:
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.125
I can just repeat those last two lines over and over again until I get close enough to the correct answer (as in, until the answer stops changing much).
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.125
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310606061
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
So we have a pretty good reason to claim that the square root of 17 is
approximately 4.12310562562; we can check that by "importing" the math
module. Modules are collections of functions. For example, if you
wrote a bunch of useful functions for interacting with, say, Facebook,
you'd probably publish a module online for other people to use that
collected all of those functions. The math module contains a function,
sqrt, that we can use to get the square root of numbers:
>>> import math
>>> math.sqrt(17)
4.123105625617661
So we were approximately correct. Yay!
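To put a number on "approximately", one option is to subtract the two values and look at the size of the difference. This is only a sketch: the variable name error is made up, and abs is Python's built-in absolute-value function.

```python
import math

N = 17
guess = 4

# Three refinement steps, written out by hand
guess = .5 * (N / guess + guess)
guess = .5 * (N / guess + guess)
guess = .5 * (N / guess + guess)

# How far is our guess from what the math module says?
error = abs(guess - math.sqrt(N))
print(error)
```

After only three steps the difference should already be far smaller than one billionth.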
Tests and Loops
Often, we want to do the same thing over and over again. For example, we just did the same two lines over and over again by copying and pasting them. There's got to be a better way.
>>> guess = 4
>>> for i in range(10):
...     guess = .5 * (N / guess + guess)
...     print(guess)
...
4.125
4.12310606061
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
That for i in range(10) bit makes the variable i take on the values in
range(10) one by one. Here, we don't actually use the variable i at
all; we just want the fact that there are ten items in range(10). But
we could use it:
>>> for i in range(10):
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
So range(10) contains all of the whole numbers from zero through
nine. Why not one through ten? Actually, this choice is very often
convenient. I'm not going to give an explanation, I'm just going to
say, "Deal with it".
We can verify that these are the elements of range(10) by turning it
into a list and printing that (in Python 3, printing range(10)
directly just shows range(0, 10)):
>>> print(list(range(10)))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
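If you really do want one through ten, range also accepts a starting value. A minimal sketch (the variable name is made up for illustration):

```python
# range(start, stop) begins at start and ends just before stop
one_to_ten = list(range(1, 11))
print(one_to_ten)
```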
That is Python's syntax for lists: brackets on the outside and commas between elements. You can, in fact, do this yourself:
>>> tasks = ["18.701 pset", "18.100B pset", "21H.001 Essay"]
>>> print(tasks[0])
18.701 pset
>>> print(tasks[1])
18.100B pset
>>> del tasks[2]
>>> print(tasks)
['18.701 pset', '18.100B pset']
>>> tasks.append("Prepare Spicy Delve class")
>>> print(tasks)
['18.701 pset', '18.100B pset', 'Prepare Spicy Delve class']
So a list maintains the order of its elements, and you can get the nth
element with l[n-1]. As with range(), we start counting from zero; and
again, this turns out to be a good decision, but for now just deal
with it.
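Here is the counting-from-zero rule on a tiny made-up list. The built-in len function gives the number of elements, so the last element sits at index len(...) - 1.

```python
letters = ["a", "b", "c", "d"]

n = 3
third = letters[n - 1]             # the 3rd element lives at index 2
last = letters[len(letters) - 1]   # 4 elements, so the last index is 3
print(third, last)
```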
There's another way we could do something multiple times. Suppose
that instead of improving our guess just ten times, we want to improve
it until our error is less than some threshold. In other words, we
want the current guess to be no more than, say, .000001 away from the
previous guess. We could use, instead of a for loop, a while loop:
>>> N = 9001
>>> current_guess = 4
>>> previous_guess = 0
>>> while abs(current_guess - previous_guess) > .000001:
...     previous_guess = current_guess
...     current_guess = .5 * (N / current_guess + current_guess)
...     print(current_guess)
...
1127.125
567.555402296
291.707323493
161.281796858
108.54547345
95.7346223013
94.8774720707
94.8736002004
94.8736001214
Note how the loop stops on its own as soon as our condition (that the
difference between guesses is greater than one one-millionth) becomes
false. Also, note how we have to constantly maintain the previous
guess by updating it with old values of current_guess.
These are the two most important uses of looping.
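With a while loop we don't know in advance how many times the body will run. One way to find out, sketched here, is to keep a counter alongside the guesses; the variable name steps is made up for this example.

```python
N = 9001
current_guess = 4
previous_guess = 0
steps = 0   # how many refinements we have done so far

while abs(current_guess - previous_guess) > .000001:
    previous_guess = current_guess
    current_guess = .5 * (N / current_guess + current_guess)
    steps = steps + 1

print(steps, current_guess)
```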
User Input
You can ask the user questions with the input function. You pass the
input function the question to prompt the user with, and it returns a
string with the user's answer.
>>> age = input("What is your age? ")
What is your age? 18
>>> print(age)
18
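Because input always hands back a string, you need to convert the answer before doing arithmetic with it. The sketch below uses the built-in int function; to keep it runnable without anyone typing, the answer is a fixed string standing in for what input would return.

```python
# Stand-in for: answer = input("What is your age? ")
answer = "18"

age = int(answer)        # turn the string "18" into the number 18
print(age + 1)           # arithmetic now works on the number
print(answer + "1")      # with strings, + just glues text together
```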
A Few Useful Modules
Python comes with many, many useful modules that do all sorts of wonderful things. Generally, a Google search for "python download file" or "python pause" will yield good results. Two modules we're going to need very soon are:
- The time module provides a useful function called sleep which pauses the execution of your program
- The urllib.request module provides a function called urlopen that can be used to read from a URL.
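As a quick taste of the first of these, here is a sketch that pauses for half a second and measures how long the pause took. It also uses time.time, another function from the same module, which reports the current time in seconds; the variable names are made up.

```python
import time

before = time.time()       # current time, in seconds
time.sleep(0.5)            # pause for half a second
elapsed = time.time() - before
print(elapsed)             # a little over 0.5
```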
Everyone uses Twitter nowadays. Well, I don't, but at least most of you understand the concept. Let's write a script to fetch tweets in real time from a certain user.
The first thing we'll need is a Twitter user to follow. I set up the
thisisfortestin user for this class. The password is mYpassword;
don't put anything stupid on it. Goodness, people these days.
Now, to access a certain user's Twitter page, it turns out that you
can go to the URL http://www.twitter.com/[username]. So, let's try
that.
>>> USERNAME = "thisisfortestin"
>>> import urllib.request
>>> page = urllib.request.urlopen("http://www.twitter.com/" + USERNAME)
>>> print(page)
<http.client.HTTPResponse object at 0x8bdd42c>
Huh? Well, the urlopen function returned some "object". Objects
represent data, in this case a web page. You could get the HTML source
code of the page by using the read method:
>>> src = str(page.read())
>>> print(src)
'<!DOCTYPE html [...] </body>\n\n</html>\n'
What's that \n stuff? That's how we represent a newline character. For
example, you could do:
>>> print("a\nb")
a
b
I should note that you can only call read once, so make sure to store
the result.
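One detail worth knowing: in Python 3, read hands back raw bytes rather than a string. Calling str on bytes gives their printed representation (with a leading b and with newlines spelled out as \n), while the decode method turns them into ordinary text. The sketch below uses a made-up bytes value as a stand-in for page.read(), and assumes the page is encoded as UTF-8.

```python
# Stand-in for page.read(); a real page would be much longer
raw = b"<html>\nhello\n</html>\n"

as_repr = str(raw)             # the representation: b'<html>\nhello...'
as_text = raw.decode("utf-8")  # real text, with real line breaks

print(as_repr)
print(as_text)
```

Either form works for searching with find, since the tags we look for contain only plain characters; decode is the usual choice when you want to show the text to a person.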
Now, let's open up that source code in a browser. If you look, you'll see that each post is surrounded by a <span class="entry-content"> and a </span>. So we can get each of the entries that way. We can find the start and end of the first entry by using the find method that strings have: you give it a substring and it tells you where in the string that substring occurs:
>>> abcs = "abcdefghijklmnopqrstuvwxyz"
>>> print(abcs.find("a"))
0
>>> print(abcs.find("m"))
12
>>> print(abcs.find("4"))
-1
Note that we get -1 as the index if we can't find the substring. So,
we can use this to find our tweets:
>>> start = src.find('<span class="entry-content">') + len('<span class="entry-content">')
>>> end = src.find('</span>', start)
We add the length of the <span> tag because we don't actually want
that part of the source, just what's between the start and end. Also,
see how we pass a second argument to src.find? That tells find where
to start looking. We know the end must be after the start, so we tell
src.find to start looking after the start.
Now we can print this tweet (the first on the page):
>>> print(src[start:end])
This is more testing of real-time updates before I actually teach this shindig, to make sure everything works...
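The src[start:end] notation is called a slice: it gives the part of the string from index start up to, but not including, index end. Here is the same find-then-slice idea on a short made-up string, so you can try it without a network connection.

```python
sample = 'junk <span class="entry-content">first tweet</span> more junk'
tag = '<span class="entry-content">'

start = sample.find(tag) + len(tag)    # index just past the opening tag
end = sample.find('</span>', start)    # index where the closing tag begins

first = sample[start:end]              # everything in between
print(first)
```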
Now, what we really want is to get all of the tweets. We can store them in a list, and then keep looping as long as we keep finding new tweets. Since find returns -1 when there is nothing left, we check its result before adding the length of the tag (afterwards it would no longer be -1):
>>> tag = '<span class="entry-content">'
>>> tweets = []
>>> pos = src.find(tag)
>>> while pos != -1:
...     start = pos + len(tag)
...     end = src.find('</span>', start)
...     tweets.append(src[start:end])
...     pos = src.find(tag, end)
...
>>> print(len(tweets))
4
That 4 is correct, because my account only has four tweets. Now, our
overall goal is to have the program print new tweets as they come in,
so we should really store the tweets in chronological order (not
newest-first, as they are on the page). Thus, we have to reverse the
list of tweets.
>>> tweets = list(reversed(tweets))
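To experiment with the extraction and the reversal without touching the network, here is a sketch that runs on a made-up page source. It tests the result of find before adding the tag length, so the loop ends as soon as no further entry is found.

```python
# Made-up page source, newest entry first, as on the real page
src = ('<span class="entry-content">newest</span>'
       '<span class="entry-content">older</span>'
       '<span class="entry-content">oldest</span>')
tag = '<span class="entry-content">'

tweets = []
pos = src.find(tag)
while pos != -1:                      # -1 means no more entries
    start = pos + len(tag)
    end = src.find('</span>', start)
    tweets.append(src[start:end])
    pos = src.find(tag, end)          # look for the next entry

tweets = list(reversed(tweets))       # oldest first
print(tweets)
```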
Functions
We have this mechanism to get the tweets of any user. Before we go
further, let's "encapsulate" this series of steps into one function,
so that instead of writing out all of those steps, we can just write
get_tweets('[USERNAME]'). We can do this by defining a function.
Defining a function is easy: you give a name, a list of arguments, and
then what you do, and you're done:
>>> def f(x, y):
...     return x + y
...
>>> print(f(3, 4))
7
That return part makes the function f that we're defining actually
return some value; if we don't include one, our function returns only
the placeholder value None. So, we could take the code we wrote before
to get tweets and put it into a function. Let's write this code into
its own file. I'm going to call that file "twitter.py":
import urllib.request

def get_tweets(user):
    # Fetch the user's page and keep its source as one string
    src = str(urllib.request.urlopen("http://www.twitter.com/" + user).read())
    tag = '<span class="entry-content">'
    tweets = []
    pos = src.find(tag)
    while pos != -1:  # find returns -1 once no entries remain
        start = pos + len(tag)
        end = src.find('</span>', start)
        tweets.append(src[start:end])
        pos = src.find(tag, end)
    # The page lists newest first; we want oldest first
    return list(reversed(tweets))
Cool! Now we can get the tweets from any user easily: we just import our newly created file as a module:
>>> import twitter
>>> tweets = twitter.get_tweets('thisisfortestin')
>>> print(tweets[2])
Stuff's happening... now I can give real-time updates!
Again, since we reversed the list, that is the third-oldest entry (start counting from 0!) and thus the second newest, so it's the second on the Twitter page.
Now, how do we print all of the tweets? Well, we can just loop over the list:
>>> for tweet in tweets:
...     print(" * " + tweet)
...     print()
...
 * First post is for testing that I can properly read from Twitter

 * Adding more posts so that I can test my twitter thing better.

 * Stuff's happening... now I can give real-time updates!

 * This is more testing of real-time updates before I actually teach this shindig, to make sure everything works...

In this case, I'm adding bullets and spaces for extra prettiness.
Now, I'm going to put this into our "twitter.py" file, but I'm going
to put it inside an if statement; in fact, inside a strange one. Don't
ask what it does, it's a bit complicated, but basically, it makes sure
that something only runs when I run the program, not when I (say)
import the file as a module:
if __name__ == "__main__":
    USERNAME = input("Which twitter user? ")
    print()
    tweets = get_tweets(USERNAME)
    for tweet in tweets:
        print(" * " + tweet)
        print()
Finally, how do we check for new tweets? Well, we can just set up a
loop that goes forever (while True; it loops forever because the
condition, True, is always true). That loop will simply get the new
list of tweets, and if any of those aren't in the old list, print them
and add them to the list of tweets we've seen. We don't want to
inundate Twitter with requests, or use too much of our computer's
power, so let's do this check every ten seconds. Add the following
code inside the if statement in the twitter.py file:
if __name__ == "__main__":
    ...
    import time
    while True:
        newtweets = get_tweets(USERNAME)
        for tweet in newtweets:
            if tweet not in tweets:  # Remember, this is the list of tweets we've already seen
                tweets.append(tweet)
                print(" * " + tweet)
                print()
        time.sleep(10)
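The heart of that loop is the "have we seen this one before?" check. As a sketch, here is that check pulled out into a function you can try on made-up lists, with no network and no waiting. The name find_new is hypothetical; it is not part of twitter.py.

```python
def find_new(seen, fetched):
    """Return the items in fetched that are not yet in seen,
    and add them to seen so they are not reported twice."""
    new = []
    for tweet in fetched:
        if tweet not in seen:
            seen.append(tweet)
            new.append(tweet)
    return new

seen = ["first", "second"]
fresh = find_new(seen, ["first", "second", "third"])
print(fresh)
print(seen)
```

Calling find_new again with the same fetched list returns an empty list, because everything has already been added to seen.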
If you run this file, you will have a working Twitter client! Try it!
How do you run a file? Well, in IDLE, you just press F5; or, on Mac
and Windows, you can double click the file. On Linux, the command
python3 [file] works.
Cleaning Tweets
Sometimes, tweets will have HTML in them: links and such, for example. We want to filter that out. Basically, we want to remove anything between two angle brackets. This is left as an exercise for the reader.