Pavel Panchekha

By

Share under CC-BY-SA.

Programming with Types

Static types have a bit of a bad name among many programmers, mostly due to the high (conceptual) overhead of Java and C++. These days, we have better, and type systems let us program without fearing that we're going to accidentally break something.

A lot of the ideas of this page come from "What to Know Before Debating Type Systems", but I've tried to repackage it with examples and explanation more easily accessible to the average programmer.

Static Typing

Static typing just means that your compiler knows something about what will happen in your program before it happens. For example, in the snippet below, the compiler knows that it's going to print an integer.

def x <- read/int
def y <- read/int
print <- + x y

It does this by attaching tags to your program. For historical reasons, the tags are called "types"; here, the compiler knew that read/int produces something tagged <int>, and that given two things tagged <int>, + produces something tagged <int>. I'm going to write types with < > around them, to make them easy to distinguish; we'll say that they type of something is <x> if it is tagged <x>.

The goal of typing is to attach enough tags to figure out useful things about the program.

Typing a Function

Functions should have types, because there's data we might want to record about them. And as we know, functions are values. What would a function's type be? Well, let's take the function add.

def/fn add [x y] <-
    + x y

It takes two numbers and produces a number. So how do we record that? We might say that it takes two numbers and returns a number. We usually write that as <fn> [<num> <num>] [<num>] — the first pair of parentheses is what the function takes, the second is what the function returns. I put a big <fn> in front to remind you that it's a function.

Polymorphism

What about the following function?

def/fn just [x] <-
    return x

It certainly takes something and returns something, so you might be tempted to type it <fn> [<*>] [<*>] (I'll call the generic type <*>). But that's not quite right, because it doesn't capture all of our knowledge. Our function doesn't just take something and return an arbitrary something else; it returns the same thing. So I'm going to write that type as <fn> [,a] [,a]:; it takes some type, which I'll call a, and return the same type (the comma denotes "placeholder"). This way we can talk about

def/fn swap [x y] <-
    return y x

as a function <fn> [,a ,b] [,b ,a] — we don't lose the information about a and b.

What about the function reverse, which reverses a list? This function clearly takes a list, and returns a list, so you might be tempted to call it <fn> [<list>] [<list>]. But, as above, this loses some information: what is in the list? So, in fact, we call this type <fn> [(<list> ,a)] [(<list> ,a)]. This way <list> is a sort of "type factory" — given a type, it produces the type of lists of that type of object. We write it like that, too — like the application of the function <list> to the value of a.

Why Types?

Now that we have the basics out of the way, let's see how we can program with types. The most basic idea of using types is that you can ask the compiler to test aspects of your program for you. For example, suppose you run a large website that allows user-submitted content. Like any self-respecting web developer, you're worried about cross-site scripting; how can you make sure you never insert user input into your web page unescaped?

You could, of course, just carefully escape the user input in all situations. But even really smart people have gotten this wrong in the past. Time to ask the compiler to do our work for us.

What we're going to do is create a new type — call it SafeString. We can then modify our template system to ensure that it always receives SafeStrings. For example, we might define the type <safe-string> to contain a single string, and have a single function, unsafe-extract, to "unsafify" the string. Hopefully, it's obvious from the name that using the method is not recommended (and at least it's easy to grep for).

def/adt <safe-string> [(safe <str>)]

def/fn unsafe-extract [str] <-
    match str <-
      (safe ,s) <- s

Our type checker can derive the type of unsafe-extract to be <fn> (<safe-string>) (<str>).

Now, our template system can ensure that it requires <safe-string> inputs. Suppose we write the following. (In case you're not following the syntax, the <= just means that it extracts all of the variables one at a time)

def/fn render/template [tmpl variables] <-
    let [unsafe-variables <- unsafe-extract <= variables] <-
    ; do something with unsafe-variables

The type system sees that we call unsafe-extract across variables, and thus variables must contain <safe-string> objects.

So now, our template will only allow us to use strings that we've explicitly marked safe. So we can write some escaping functions, such as this one:

def/fn escape/html [s] <-
    let [escaped <- escape-char/html <= s] <-
    safe escaped

escape-char/html presumably escapes characters so that they're safe for inclusion in HTML text (it's an annoyingly hard problem, so I'll save you the implementation).

escape/html has type <fn> [<str>] [<safe-string>], so we know that after passing through it, the string is "safe" for inclusion in a template. Now, the call

let [user-data [
    'username ~ "<script src='http://example.com/evil'></script>"
]] <-

render/template my-site user-data

will produce a type error, as the username is not a <safe-string>. This clues us in that we have to escape/html that string, saving us from an XSS vulnerability!

Return Types

Some clever Python programmers are reading the above and thinking "eh, we can do the same (granted at runtime)". But we've not yet discussed one of the most powerful aspects of types: that we can change what we do based on what the caller expects as output, not just what it gives us as input. A striking example is the following example of list manipulation1 [1 I got this example from http://perl.plover.com/yak/typing/notes.html.].

Suppose we want to write a merge sort implementation. We start off by writing the splitting function; I'm using pattern matching here to make things shorter.

def/fns split <-
    [[]]  <- return [] []
    [[x]] <- return [x] []
    [[fst snd & rest]] <-
        let (s1 s2 <- split rest) <-
        return [fst & s1] [snd & s2]

Alright, this is a perfectly standard even-odd split function. What is its type? It must be <fn> [(<list> ,a)] [(<list> ,a) (<list> ,a)], since it takes and produces lists, and since it does not care about what is in that list.

Now, we write the merge function. If either list is empty, merging is easy; otherwise, we test which element should be at the front and recurse as needed. Again, with pattern matching:

def/fns merge <-
    [[] x] <- return x
    [x []] <- return x
    [[fst1 & rest1] [fst2 & rest2]] <-
        if (< fst1 fst2) <-
            return [fst1 & merge rest1 [fst2 & rest2]]
            return [fst2 & merge [fst1 & rest1] rest2]

Finally, a recursive sort function.

def/fns sort <-
   [[]] <- return []
   [x] <-
       let (even odd <- split x) <-
       merge (sort even) (sort odd)

What should the type be? We expect it to be <fn> [(<list> <num>)] [(<list> <num>)], since it merges two lists into one, and since the < function works only on numbers. But when we pass it through the type checker, we instead get the type <fn> [(<list> ,a)] [(<list> <num>)]. Huh?! How can this function possibly work on any lists of anything, if we explicitly call <?

The only solution must be that we do not call <. And if so, that must mean that we never reach the merge function (since it calls < immediately), and that must mean that we hit an infinite loop inside split and sort. Huh… why would that be? Well, if we try to split the list [x], we get [x] []. Sorting the first list require us to again split it, giving us mutual recursion between split and sort! The fix is to special-case one-element lists as well, giving us the following sort routine:

def/fns sort <-
   [[]] <- return []
   [[x]] <- return [x]
   [x] <-
       let (even odd <- split x) <-
       merge (sort even) (sort odd)

So in this case, the type system caught our infinite loop, something a dynamic language could never do, since it can never "see" the infinity of the loop looming ahead — typing, by examining return types, can. In this way, powerful use of the type system can save you headaches by finding rather non-trivial bugs in programs.

Type Systems and Invariants

Another great example of using type systems to catch bugs is in the implementation of red-black trees. In a red-black tree, each tree node is either red or black, and the rule is that a red node can only have black nodes as children. Combined with the invariant that all paths to leaf nodes have the same number of black nodes, we ensure that the tree is relatively balanced.

How can a type system make programming easier for us? We can have the type system tell us if we failed to enforce the red node, black children constraint. We'll create two types, <red-node> and <black-node> and a general <node> type that both are special cases of.

def/adt (<red-node> ,a) [(red-node (<black-node> ,a) (<black-node> ,a))]
def/adt (<black-node> ,a) [
    (black-node (<node> ,a) (<node> ,a))
    (leaf ,a)
]

def/adt <node> [<red-node> <black-node>]

It is now impossible to construct a red-black tree in which a red node has a red child. This means that an invalid implementation of red-black tree methods will fail to type-check, saving hours of debugging (if you've ever tried to write a red-black tree implementation, you know what I mean).

Using the Type System

Type systems are there to enforce rules. In the same way as alarm clocks, really — waking up on time is hard, sometimes we're not fully conscious when that duty comes around, and sometimes we just forget. An alarm clock does our work for us by "remembering" to wake us. Type systems, too, can remind us to do things we'd otherwise forget: escape user input, add special cases to your input, and so on.

People have taught type systems about database normal forms, user authorization, and array bounds. People have also taught type systems to recognize broken links, invalid CSS properties, and more. All of these things make it easier to write bug-free programs by preventing subtle bugs from wasting hours of debugging. The best way of thinking about types isn't that they make a program safer — it's that they save you from remembering to keep a program safe.

Footnotes: