Programming with Types
Static types have a bit of a bad name among many programmers, mostly due to the high (conceptual) overhead of Java and C++. These days, we have better, and type systems let us program without fearing that we're going to accidentally break something.
A lot of the ideas of this page come from "What to Know Before Debating Type Systems", but I've tried to repackage it with examples and explanation more easily accessible to the average programmer.
Static Typing
Static typing just means that your compiler knows something about what will happen in your program before it happens. For example, in the snippet below, the compiler knows that it's going to print an integer.
def x <- read/int def y <- read/int print <- + x y
It does this by attaching tags to your program. For historical
reasons, the tags are called "types"; here, the compiler knew that
read/int
produces something tagged <int>
, and that given two
things tagged <int>
, +
produces something tagged <int>
. I'm
going to write types with <
>
around them, to make them easy to
distinguish; we'll say that they type of something is <x>
if it is
tagged <x>
.
The goal of typing is to attach enough tags to figure out useful things about the program.
Typing a Function
Functions should have types, because there's data we might want to
record about them. And as we know, functions are values. What would
a function's type be? Well, let's take the function add
.
def/fn add [x y] <- + x y
It takes two numbers and produces a number. So how do we record that?
We might say that it takes two numbers and returns a number. We
usually write that as <fn> [<num> <num>] [<num>]
— the first pair
of parentheses is what the function takes, the second is what the
function returns. I put a big <fn>
in front to remind you that it's
a function.
Polymorphism
What about the following function?
def/fn just [x] <- return x
It certainly takes something and returns something, so you might be
tempted to type it <fn> [<*>] [<*>]
(I'll call the generic type
<*>
). But that's not quite right, because it doesn't capture all of
our knowledge. Our function doesn't just take something and return an
arbitrary something else; it returns the same thing. So I'm going to
write that type as <fn> [,a] [,a]
:; it takes some type, which I'll
call a
, and return the same type (the comma denotes "placeholder").
This way we can talk about
def/fn swap [x y] <- return y x
as a function <fn> [,a ,b] [,b ,a]
— we don't lose the information
about a
and b
.
What about the function reverse
, which reverses a list? This
function clearly takes a list, and returns a list, so you might be
tempted to call it <fn> [<list>] [<list>]
. But, as above, this
loses some information: what is in the list? So, in fact, we call
this type <fn> [(<list> ,a)] [(<list> ,a)]
. This way <list>
is a
sort of "type factory" — given a type, it produces the type of lists
of that type of object. We write it like that, too — like the
application of the function <list>
to the value of a
.
Why Types?
Now that we have the basics out of the way, let's see how we can program with types. The most basic idea of using types is that you can ask the compiler to test aspects of your program for you. For example, suppose you run a large website that allows user-submitted content. Like any self-respecting web developer, you're worried about cross-site scripting; how can you make sure you never insert user input into your web page unescaped?
You could, of course, just carefully escape the user input in all situations. But even really smart people have gotten this wrong in the past. Time to ask the compiler to do our work for us.
What we're going to do is create a new type — call it SafeString
.
We can then modify our template system to ensure that it always
receives SafeStrings
. For example, we might define the type
<safe-string>
to contain a single string, and have a single
function, unsafe-extract
, to "unsafify" the string. Hopefully, it's
obvious from the name that using the method is not recommended (and at
least it's easy to grep
for).
def/adt <safe-string> [(safe <str>)] def/fn unsafe-extract [str] <- match str <- (safe ,s) <- s
Our type checker can derive the type of unsafe-extract
to be <fn>
(<safe-string>) (<str>)
.
Now, our template system can ensure that it requires <safe-string>
inputs. Suppose we write the following. (In case you're not following
the syntax, the <=
just means that it extracts all of the variables
one at a time)
def/fn render/template [tmpl variables] <- let [unsafe-variables <- unsafe-extract <= variables] <- ; do something with unsafe-variables
The type system sees that we call unsafe-extract
across variables
,
and thus variables
must contain <safe-string>
objects.
So now, our template will only allow us to use strings that we've explicitly marked safe. So we can write some escaping functions, such as this one:
def/fn escape/html [s] <- let [escaped <- escape-char/html <= s] <- safe escaped
escape-char/html
presumably escapes characters so that they're safe
for inclusion in HTML text (it's an annoyingly hard problem, so I'll
save you the implementation).
escape/html
has type <fn> [<str>] [<safe-string>]
, so we know that
after passing through it, the string is "safe" for inclusion in a
template. Now, the call
let [user-data [ 'username ~ "<script src='http://example.com/evil'></script>" ]] <- render/template my-site user-data
will produce a type error, as the username
is not a <safe-string>
.
This clues us in that we have to escape/html
that string, saving us
from an XSS vulnerability!
Return Types
Some clever Python programmers are reading the above and thinking "eh, we can do the same (granted at runtime)". But we've not yet discussed one of the most powerful aspects of types: that we can change what we do based on what the caller expects as output, not just what it gives us as input. A striking example is the following example of list manipulation1 [1 I got this example from http://perl.plover.com/yak/typing/notes.html.].
Suppose we want to write a merge sort implementation. We start off by writing the splitting function; I'm using pattern matching here to make things shorter.
def/fns split <- [[]] <- return [] [] [[x]] <- return [x] [] [[fst snd & rest]] <- let (s1 s2 <- split rest) <- return [fst & s1] [snd & s2]
Alright, this is a perfectly standard even-odd split function. What
is its type? It must be <fn> [(<list> ,a)] [(<list> ,a) (<list>
,a)]
, since it takes and produces lists, and since it does not care
about what is in that list.
Now, we write the merge
function. If either list is empty, merging
is easy; otherwise, we test which element should be at the front and
recurse as needed. Again, with pattern matching:
def/fns merge <- [[] x] <- return x [x []] <- return x [[fst1 & rest1] [fst2 & rest2]] <- if (< fst1 fst2) <- return [fst1 & merge rest1 [fst2 & rest2]] return [fst2 & merge [fst1 & rest1] rest2]
Finally, a recursive sort
function.
def/fns sort <- [[]] <- return [] [x] <- let (even odd <- split x) <- merge (sort even) (sort odd)
What should the type be? We expect it to be <fn> [(<list> <num>)]
[(<list> <num>)]
, since it merges two lists into one, and since the
<
function works only on numbers. But when we pass it through the
type checker, we instead get the type <fn> [(<list> ,a)] [(<list>
<num>)]
. Huh?! How can this function possibly work on any lists of
anything, if we explicitly call <
?
The only solution must be that we do not call <
. And if so, that
must mean that we never reach the merge
function (since it calls <
immediately), and that must mean that we hit an infinite loop inside
split
and sort
. Huh… why would that be? Well, if we try to split
the list [x]
, we get [x] []
. Sorting the first list require us to
again split it, giving us mutual recursion between split
and sort
!
The fix is to special-case one-element lists as well, giving us the
following sort
routine:
def/fns sort <- [[]] <- return [] [[x]] <- return [x] [x] <- let (even odd <- split x) <- merge (sort even) (sort odd)
So in this case, the type system caught our infinite loop, something a dynamic language could never do, since it can never "see" the infinity of the loop looming ahead — typing, by examining return types, can. In this way, powerful use of the type system can save you headaches by finding rather non-trivial bugs in programs.
Type Systems and Invariants
Another great example of using type systems to catch bugs is in the implementation of red-black trees. In a red-black tree, each tree node is either red or black, and the rule is that a red node can only have black nodes as children. Combined with the invariant that all paths to leaf nodes have the same number of black nodes, we ensure that the tree is relatively balanced.
How can a type system make programming easier for us? We can have the
type system tell us if we failed to enforce the red node, black
children constraint. We'll create two types, <red-node>
and
<black-node>
and a general <node>
type that both are special cases
of.
def/adt (<red-node> ,a) [(red-node (<black-node> ,a) (<black-node> ,a))] def/adt (<black-node> ,a) [ (black-node (<node> ,a) (<node> ,a)) (leaf ,a) ] def/adt <node> [<red-node> <black-node>]
It is now impossible to construct a red-black tree in which a red node has a red child. This means that an invalid implementation of red-black tree methods will fail to type-check, saving hours of debugging (if you've ever tried to write a red-black tree implementation, you know what I mean).
Using the Type System
Type systems are there to enforce rules. In the same way as alarm clocks, really — waking up on time is hard, sometimes we're not fully conscious when that duty comes around, and sometimes we just forget. An alarm clock does our work for us by "remembering" to wake us. Type systems, too, can remind us to do things we'd otherwise forget: escape user input, add special cases to your input, and so on.
People have taught type systems about database normal forms, user authorization, and array bounds. People have also taught type systems to recognize broken links, invalid CSS properties, and more. All of these things make it easier to write bug-free programs by preventing subtle bugs from wasting hours of debugging. The best way of thinking about types isn't that they make a program safer — it's that they save you from remembering to keep a program safe.
Footnotes:
I got this example from http://perl.plover.com/yak/typing/notes.html.