By Pavel Panchekha

Share under CC-BY-SA.

Intro to Python Programming (Spicy Delve)

Downloading Python

Download Python 3.1 from python.org (python.org, not python.com). On the left sidebar, there will be a "Quick Links (3.1.2)" section; if you're using Windows, you can just grab the "Windows Installer" right there. If you're on Mac OS X, click the "Quick Links (3.1.2)" section header and find the installer there. If you're on Linux, you probably already have Python or know how to install it (if not, talk to me). However you install it, select all the optional packages; they're useful. On Linux, that means installing not only the base python-3.1 package (or whatever your distribution calls it) but also the python3-dev and idle3 packages, if available. Make sure you have version 3.something, not 2.something: the two differ only in superficial ways, but enough that we have to pick one, because the same programs won't run on both.

After you do that, you should have a program called "IDLE". Run it. In front of you is the Python prompt.

How Python Programming Works

To write programs in Python, you give Python commands that tell it what to do. Usually you do this by typing the commands into what is called the Python "prompt". This is a different way of interacting with your computer than many of you are used to: there are no buttons to press or menus to pull down. That's alright. The interface is text-based because that is the most convenient and powerful way to use Python; you'll learn to love it. For now, keep in mind that Python is picky: a minor misspelling, a misplaced parenthesis, or even spurious spaces at the beginning of the line will make Python print an error message whose last line names the kind of error, such as SyntaxError, IndentationError, or NameError. That's alright. Examine the line and the error message, figure out what you did wrong, and try again. Error messages are your friends; don't be afraid of them. (Trust me: when your program doesn't print error messages but still behaves incorrectly, it's much, much worse.)

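Not only can you type commands directly into the Python prompt, but you can also write them in a file. This has mostly the same effect as typing the lines one by one into the prompt. One difference: the prompt automatically displays the value of an expression such as 2 + 2, while in a file you need print to see it.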

Let's start by typing directly into the prompt; but keep in mind that later, for larger programs, we'll switch to writing our programs in a file.

The Very Basics

The simplest program you can write in Python is one line, and looks like this; type it directly into the Python prompt:

>>> print("Hello, World")
Hello, World

What you did here is call the function print on the argument "Hello, World". Parentheses always go around a function's arguments (which are separated by commas if there is more than one); the double quotes denote a string, a literal piece of text. You can also use single quotes for strings.
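Here is a small sketch of those rules: a string built from pieces written with both kinds of quotes, and print called with two comma-separated arguments. print puts a space between its arguments, so both lines display the same text. (The name greeting is just a label we chose; variables are explained below.)

```python
greeting = "Hello, " + 'World'   # + joins two strings; either kind of quote works
print(greeting)                  # Hello, World
print('Hello,', "World")         # two arguments, printed with a space between them
```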

Python can do more than print fixed strings. It can do math. Try:

>>> 2 + 2
4
>>> 4 / (5 + 6**2)
0.0975609756097561

Here the double asterisk represents exponentiation (for historical reasons, 6^2 does something else; try to figure out what, if you want).
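The usual order of operations applies, and parentheses override it, which is why 4 / (5 + 6**2) needs its parentheses. The sketch below also shows that in Python 3, / always produces a decimal (floating-point) number, even when the division comes out even:

```python
a = 2 + 3 * 4        # 14: multiplication happens before addition
b = (2 + 3) * 4      # 20: parentheses are evaluated first
c = 2 * 3 ** 2       # 18: exponentiation happens before multiplication
d = 4 / (5 + 6**2)   # 4 / 41, the example above
e = 6 / 3            # 2.0, not 2: / always gives a float in Python 3
print(a, b, c, d, e)
```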

Variables

So far, we have done each computation in one go. But sometimes we want to break things down into intermediate steps. For example, here is an algorithm to find the square root of a number. To find the square root of \(N\), start with a guess \(g\) (1 works; it doesn't matter much how good your guess is, though a better guess gets you to the answer faster). Then refine your guess by replacing it, over and over again, with \(\frac12(\frac{N}{g} + g)\). Once the guess stops changing much, it is close enough and you can stop.
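Why does averaging help? Here is a short side calculation (not part of the recipe itself) for any positive guess \(g\). The amount by which the new guess exceeds \(\sqrt{N}\) is

\[
\frac12\left(\frac{N}{g} + g\right) - \sqrt{N} = \frac{N - 2g\sqrt{N} + g^2}{2g} = \frac{(g - \sqrt{N})^2}{2g} \ge 0.
\]

So after one step the guess is never below \(\sqrt{N}\), and its new error is the square of the old error divided by \(2g\). Once the guess is close, each step roughly doubles the number of correct digits, which is why the guesses below settle down so quickly.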

So let's try it. Let's find the square root of 17. I'm going to guess 4 as my square root, because 17 is close to 16, whose square root is 4. We can `assign` this guess to a `variable`, which basically gives a name to some quantity:

>>> N = 17
>>> guess = 4
>>> print(guess)
4

Now, we want to refine that guess. We can "reassign" the variable: make it point to something else:

>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.125

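I can just repeat those last two lines over and over again until I get close enough to the correct answer (that is, until the answer stops changing much). If your Python prints more decimal places than shown here, don't worry; newer versions of Python display more digits.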

>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310606061
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562
>>> guess = .5 * (N / guess + guess)
>>> print(guess)
4.12310562562

So we have a pretty good reason to claim that the square root of 17 is approximately 4.12310562562. We can check that by "importing" the math module. Modules are collections of functions: if you wrote a bunch of useful functions for interacting with, say, Facebook, you'd probably publish them online as a module for other people to use. The math module contains a function, sqrt, that computes square roots:

>>> import math
>>> math.sqrt(17)
4.123105625617661

So we were approximately correct. Yay!
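You can also check the answer without the math module: square the guess and see how close you get to 17. A small sketch comparing both checks:

```python
import math

N = 17
guess = 4.12310562562          # the value our refinements settled on
exact = math.sqrt(N)

print(abs(guess - exact))      # a tiny difference, around 2e-12
print(guess * guess)           # very close to 17
```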

Tests and Loops

Often, we want to do the same thing over and over again. For example, we just ran the same two lines again and again by copying and pasting them. There's got to be a better way, and there is: a for loop.

>>> guess = 4
>>> for i in range(10):
...     guess = .5 * (N / guess + guess)
...     print(guess)
...
4.125
4.12310606061
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562
4.12310562562

That for i in range(10) bit makes the variable i take on the values in range(10) one by one. Here, we don't actually use the variable i at all; we only care that range(10) has ten items, so the loop body runs ten times. But we could use it:

>>> for i in range(10):
...     print(i)
...
0
1
2
3
4
5
6
7
8
9

So range(10) contains the numbers from zero through nine. Why not one through ten? Actually, this choice is very often convenient. I'm not going to give an explanation; I'm just going to say, "Deal with it."
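If you do want the numbers one through ten, range also accepts a starting value, and optionally a step; either way, the stopping value itself is left out. One reason the zero-based version is handy: range(n) always contains exactly n numbers. In Python 3, range hands out its numbers one at a time instead of building a list, so this sketch wraps it in list() to see them all at once:

```python
one_to_ten = list(range(1, 11))    # start at 1, stop before 11
evens = list(range(0, 10, 2))      # start at 0, stop before 10, count by twos
count = len(range(10))             # range(n) has exactly n numbers

print(one_to_ten)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(evens)        # [0, 2, 4, 6, 8]
print(count)        # 10
```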

We can see the elements of range(10) by turning it into a list with list() and printing that (in Python 3, printing range(10) directly just shows range(0, 10)):

>>> print(list(range(10)))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

That is Python's syntax for lists: brackets on the outside and commas between elements. You can, in fact, do this yourself:

>>> tasks = ["18.701 pset", "18.100B pset", "21H.001 Essay"]
>>> print(tasks[0])
18.701 pset
>>> print(tasks[1])
18.100B pset
>>> del tasks[2]
>>> print(tasks)
['18.701 pset', '18.100B pset']
>>> tasks.append("Prepare Spicy Delve class")
>>> print(tasks)
['18.701 pset', '18.100B pset', 'Prepare Spicy Delve class']

So a list keeps its elements in order, and you can get the nth element of a list l with l[n-1]. As with range(), we start counting from zero; again, this turns out to be a good decision, but for now just deal with it.
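Here is that n-1 rule as a short sketch, reusing the task list from above. Asking for an index past the end (tasks[3] on a three-element list) would raise an IndexError:

```python
tasks = ["18.701 pset", "18.100B pset", "21H.001 Essay"]

n = 3
third = tasks[n - 1]           # the 3rd element lives at index 2
print(third)                   # 21H.001 Essay
print(len(tasks))              # 3, so the valid indices are 0, 1 and 2

tasks.append("Prepare Spicy Delve class")
last = tasks[len(tasks) - 1]   # the last element is always at index len(tasks) - 1
print(last)                    # Prepare Spicy Delve class
```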

There's another way to do something multiple times. Suppose that instead of improving our guess exactly ten times, we want to keep improving it until it stops changing much: say, until the current guess is no more than .000001 away from the previous guess. Instead of a for loop, we can use a while loop:

>>> N = 9001
>>> current_guess = 4
>>> previous_guess = 0
>>> while abs(current_guess - previous_guess) > .000001:
...     previous_guess = current_guess
...     current_guess = .5 * (N / current_guess + current_guess)
...     print(current_guess)
...
1127.125
567.555402296
291.707323493
161.281796858
108.54547345
95.7346223013
94.8774720707
94.8736002004
94.8736001214

Note how the loop stops on its own as soon as its condition (that the difference between consecutive guesses is greater than one one-millionth) becomes false. Also note that we have to keep previous_guess up to date ourselves, by copying the old value of current_guess into it before each refinement.
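The description of the algorithm claimed that the starting guess only affects how fast you reach the answer. A while loop makes that easy to check. This sketch uses a slightly different stopping rule from the one above (it stops once guess * guess is within one billionth of N), and that threshold is an arbitrary choice:

```python
N = 17
step_counts = []
for start in [1, 4, 100]:
    guess = start
    steps = 0
    # Keep refining until guess * guess is within one billionth of N.
    while abs(guess * guess - N) > .000000001:
        guess = .5 * (N / guess + guess)
        steps = steps + 1
    step_counts.append(steps)
    print("starting from", start, "took", steps, "steps:", guess)
```

Every start reaches the same answer; the good guess of 4 just gets there in fewer steps than 1 or 100.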

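These are the two most important kinds of loops: a for loop runs once for each item in a list (or range), and a while loop keeps running until its condition becomes false.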

User Input

You can ask the user questions with the input function. You pass the input function the question to prompt the user with, and it returns a string with the user's answer.

>>> age = input("What is your age? ")
What is your age? 18
>>> print(age)
18
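Since input always returns a string, even when the user types digits, you need to convert the answer with int (for whole numbers) before doing arithmetic with it. So that the sketch runs without anyone typing, it uses a fixed string in place of the call to input:

```python
age = "18"                   # stands in for: age = input("What is your age? ")
next_year = int(age) + 1     # int("18") is the number 18; "18" + 1 would be a TypeError
print("Next year you will be", next_year)   # Next year you will be 19
```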

A Few Useful Modules

Python comes with many, many useful modules that do all sorts of wonderful things; a Google search for what you want to do (say, "python download file" or "python pause") will usually turn up the right one. Two modules we're going to need very soon are:

  • The time module provides a useful function called sleep which pauses the execution of your program
  • The urllib.request module provides a function called urlopen that can be used to read from a URL.
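A minimal sketch of time.sleep: this loop pauses half a second after each line it prints, and measures the total with time.time, which returns the current time in seconds. The exact elapsed time varies slightly from run to run:

```python
import time

start = time.time()              # current time, in seconds
for i in range(3):
    print("tick", i)
    time.sleep(0.5)              # pause for half a second
elapsed = time.time() - start
print("elapsed seconds:", elapsed)   # about 1.5
```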

Twitter

Everyone uses Twitter nowadays. Well, I don't, but at least most of you understand the concept. Let's write a script to fetch tweets in real time from a certain user.

The first thing we'll need is a Twitter user to follow. I set up the thisisfortestin user for this class. The password is mYpassword; don't post anything stupid with it. Goodness, people these days.

Now, to access a certain user's Twitter page, it turns out that you can go to the URL http://www.twitter.com/[username]. So, let's try that.

>>> USERNAME = "thisisfortestin"
>>> import urllib.request
>>> page = urllib.request.urlopen("http://www.twitter.com/" + USERNAME)
>>> print(page)
<http.client.HTTPResponse object at 0x8bdd42c>

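Huh? Well, the urlopen function returned some "object". Objects represent data, in this case a web page. You can get the page's HTML source code with the object's read method. read gives back raw bytes rather than a string, so we also call decode to turn those bytes into text (assuming, as is typical, that the page is encoded as UTF-8). Typing src by itself at the prompt then shows the string the way Python writes it:

>>> src = page.read().decode("utf-8")
>>> src
'<!DOCTYPE html [...] </body>\n\n</html>\n'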

What's that \n stuff? That's how we represent a newline character. For example, you could do:

>>> print("a\nb")
a
b

I should note that you can only call read once on a page: a second call gives back nothing, because everything has already been read. So make sure to store the result.
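You can try read without Twitter (or any network connection) by using a data: URL, which carries its content inside the URL itself; urlopen understands these in Python 3.4 and later. The sketch below shows that read returns bytes, that decode turns them into a string (assuming UTF-8 text), and that a second read comes back empty:

```python
import urllib.request

# A data: URL holds its content in the URL itself; %0A is an encoded newline.
page = urllib.request.urlopen("data:text/plain,hello%0Aworld")

first = page.read()            # b'hello\nworld': the raw bytes
second = page.read()           # b'': everything has already been read
text = first.decode("utf-8")   # now an ordinary string
print(text)
```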

Now, open that page in a browser and look at its source code. You'll see that each post is surrounded by a

<span class="entry-content">

and a </span>. So we can pick out each entry that way. We can find the start and end of the first entry using the find method that strings have: you give it a substring, and it tells you the index where that substring first occurs:

>>> abcs = "abcdefghijklmnopqrstuvwxyz"
>>> print(abcs.find("a"))
0
>>> print(abcs.find("m"))
12
>>> print(abcs.find("4"))
-1

Note that we get -1 as the index if we can't find the substring. So, we can use this to find our tweets:

>>> start = src.find('<span class="entry-content">') + len('<span class="entry-content">')
>>> end = src.find('</span>', start)

We add the length of the <span> tag because we don't want the tag itself, just what's between it and the closing </span>. Also, see how we pass a second argument to src.find? That tells find where to start looking. We know the end must come after the start, so we tell src.find to start looking there.
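Here is the same find-then-slice idea on a small made-up string, including a second search that starts from where the first match ended:

```python
text = "<b>one</b> and <b>two</b>"

start = text.find("<b>") + len("<b>")   # 0 + 3 = 3, just past the first <b>
end = text.find("</b>", start)          # the first </b> at or after index 3
first_word = text[start:end]
print(first_word)                       # one

# Searching again from `end` finds the second pair of tags.
start = text.find("<b>", end) + len("<b>")
end = text.find("</b>", start)
second_word = text[start:end]
print(second_word)                      # two
```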

Now we can print this tweet (the first on the page):

>>> print(src[start:end])
This is more testing of real-time updates before I actually teach this shindig, to make sure everything works... 

Now, what we really want is to get all of the tweets. We can store them in a list, and keep looping as long as we keep finding new tweets. Notice that we check whether find returned -1 before adding the tag's length. If we added the length first, a failed search would give -1 plus the length of the tag, which is never -1, so the loop would never end:

>>> tweets = []
>>> start = src.find('<span class="entry-content">')
>>> while start != -1:
...     start = start + len('<span class="entry-content">')
...     end = src.find('</span>', start)
...     tweets.append(src[start:end])
...     start = src.find('<span class="entry-content">', end)
...
>>> print(len(tweets))
4

That 4 is correct, because my account only has four tweets. Now, our overall goal is to have the program print new tweets as they come in, so we should really store the tweets in chronological order (not newest-first, as they are on the page). Thus, we have to reverse the list of tweets.

>>> tweets = list(reversed(tweets))
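Since Twitter's page layout can change, it can be useful to try the extraction loop on a small made-up page first. The HTML below is invented for illustration (newest tweet first, like the real page), and the loop checks find's result for -1 before skipping past the tag, so it stops once there are no more tweets. reversed hands back the items one at a time, which is why it's wrapped in list():

```python
# A made-up page standing in for the real HTML, newest tweet first.
src = ('<li><span class="entry-content">third</span></li>'
       '<li><span class="entry-content">second</span></li>'
       '<li><span class="entry-content">first</span></li>')

TAG = '<span class="entry-content">'
tweets = []
start = src.find(TAG)
while start != -1:
    start = start + len(TAG)            # skip past the opening tag
    end = src.find('</span>', start)    # the tweet ends at the next </span>
    tweets.append(src[start:end])
    start = src.find(TAG, end)          # -1 once there are no more tweets

tweets = list(reversed(tweets))         # oldest first
print(tweets)                           # ['first', 'second', 'third']
```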

Functions

We now have a way to get the tweets of any user. Before we go further, let's "encapsulate" this series of steps into one function, so that instead of writing out all of those steps, we can just write get_tweets('[USERNAME]'). Defining a function is easy: you give it a name, a list of arguments, and a body saying what it does, and you're done:

>>> def f(x, y):
...     return x + y
...
>>> print(f(3, 4))
7
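To see what the return line is for, here is f next to a hypothetical function g that computes the same sum but never returns it:

```python
def f(x, y):
    return x + y

def g(x, y):
    x + y            # computed, then thrown away: there is no return

print(f(3, 4))       # 7
print(g(3, 4))       # None
```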

That return line makes the function f send a value back to whoever called it; if we leave it out, the function returns the special value None instead. So we can take the code we wrote before to get tweets and put it into a function. Let's write this code into its own file, which I'm going to call "twitter.py":

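import urllib.request

def get_tweets(user):
    # Download the user's page and turn its raw bytes into a string
    # (assuming the page is UTF-8 encoded).
    page = urllib.request.urlopen("http://www.twitter.com/" + user)
    src = page.read().decode("utf-8")

    tweets = []
    start = src.find('<span class="entry-content">')
    while start != -1:
        # The tweet's text begins right after the opening tag...
        start = start + len('<span class="entry-content">')
        # ...and ends at the next closing tag.
        end = src.find('</span>', start)
        tweets.append(src[start:end])
        # Look for the next tweet after this one; -1 means there are no more.
        start = src.find('<span class="entry-content">', end)

    # The page lists tweets newest first; return them oldest first.
    return list(reversed(tweets))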

Cool! Now we can get the tweets from any user easily: we just import our newly created file as a module:

>>> import twitter
>>> tweets = twitter.get_tweets('thisisfortestin')
>>> print(tweets[2])
Stuff's happening... now I can give real-time updates!

Again, since we reversed the list, that is the third-oldest entry (start counting from 0!) and thus the second-newest, so it's the second one on the Twitter page.

Now, how do we print all of the tweets? Well, we can just loop over the list:

>>> for tweet in tweets:
...     print(" * " + tweet)
...     print()
...
 * First post is for testing that I can properly read from Twitter

 * Adding more posts so that I can test my twitter thing better.

 * Stuff's happening... now I can give real-time updates!

 * This is more testing of real-time updates before I actually teach this shindig, to make sure everything works...

In this case, I'm adding bullets and blank lines for extra prettiness. Now I'm going to put this into our "twitter.py" file, but inside a rather strange-looking if statement. Briefly: when you run a file as a program, Python sets the special variable __name__ to "__main__", but when you import the file as a module, __name__ is the module's name ("twitter") instead. So the code inside this if statement runs only when I run the program, not when I (say) import the file as a module:

if __name__ == "__main__":
    USERNAME = input("Which twitter user? ")
    print()
    tweets = get_tweets(USERNAME)
    for tweet in tweets:
        print(" * " + tweet)
        print()

Finally, how do we check for new tweets? We can set up a loop that goes forever (while True: the condition True is always true, so the loop never stops). Each time around, the loop gets the current list of tweets, and any tweet that isn't in the list we've already seen gets printed and added to that list. We don't want to inundate Twitter with requests or use too much of our computer's power, so let's do this check every ten seconds. Add the following code inside the if statement in the twitter.py file:

if __name__ == "__main__":
    ...

    import time
    while True:
        newtweets = get_tweets(USERNAME)
        for tweet in newtweets:
            if tweet not in tweets: # Remember, this is the list of tweets we've already seen
                tweets.append(tweet)
                print(" * " + tweet)
                print()
        time.sleep(10)

If you run this file, you will have a working Twitter client! Try it! How do you run a file? In IDLE, open the file and press F5; on Mac and Windows, you can also double-click the file. On Linux, the command python3 [file] works.
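The heart of that loop is the "have we already seen this tweet?" check. Here is a sketch of just that logic, with a hypothetical list of snapshots standing in for successive calls to get_tweets, so it runs without a network connection or any waiting. Each tweet is printed exactly once:

```python
# Hypothetical results of calling get_tweets four times (oldest tweet first).
snapshots = [
    ["hello"],
    ["hello", "second tweet"],
    ["hello", "second tweet"],
    ["hello", "second tweet", "third tweet"],
]

tweets = []          # the tweets we've already seen and printed
for newtweets in snapshots:
    for tweet in newtweets:
        if tweet not in tweets:
            tweets.append(tweet)
            print(" * " + tweet)
```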

Cleaning Tweets

Sometimes tweets will have HTML in them: links, for example. We want to filter that out; basically, we want to remove anything between two angle brackets, along with the brackets themselves. Left as an exercise for the reader.
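If you'd like something to compare your answer against, here is one possible sketch that uses only find and slicing, like the tweet-extraction loop. It assumes every < that starts a tag has a matching >, and it leaves HTML entities such as &amp; untouched; the name strip_tags is our own choice:

```python
def strip_tags(text):
    """Remove everything from each '<' through the next '>' (a rough HTML stripper)."""
    result = ""
    start = text.find("<")
    while start != -1:
        end = text.find(">", start)
        if end == -1:                    # a '<' with no closing '>': keep the rest as-is
            break
        result = result + text[:start]   # keep the text before the tag
        text = text[end + 1:]            # drop the tag and continue after it
        start = text.find("<")
    return result + text

print(strip_tags('Check <a href="http://example.com">this link</a> out'))
# Check this link out
```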