Pavel Panchekha


Share under CC-BY-SA.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

A Wretched Hive of Dots and Arrows


This is just a quick sketch of the class, and will be filled in likely only if the class it taught again.

Category theory is a mathematical framework for understanding the basics of functions and algebraic structure. It allows us to generalize similar notions from disparate fields: set theory, group theory, and ring theory, as well as programming language theory. At its core, category theory is about structure and structure-preserving maps.

What do we mean? Well, there are many different types of structure which we deal with in mathematics. For example, we might work with natural numbers, or with the real numbers, or perhaps some geometric surface; we might be working with an algebraic structure like groups or rings or vector spaces; and so on. The fundamental insight we need to unify these is that in all cases, we care about some sort of way of talking about structure and functions that maintain it. Our method of doing so is category theory.


A category is defined pretty simply. We have a collection[0] of objects and a collection of arrows between them. The rules we must follow are that objects and arrows are distinguishable (we can discuss whether two objects or two arrows are equal or not, with the usual rules for that), and that the arrows must be somewhat similar to functions. In particular, there has to be an "identity" or "unit" arrow for every object such that composing it with anything else yields that other thing, that points from that object to itself, and that we can compose arrows to yield an arrow.

[0] This is a technicality. We don't like to talk about categories having a set of objects because it is often useful to talk about categories that are "too big" to be contained in a set – for example, the category of all sets with all functions as the arrows. If the objects were in a set, that set would contain itself, which is illegal in set theory. There is a way rigorize the "collection of" idea with what is called a class, which avoids the paradoxes of naive set theory while still giving us the power we need. Often enough the objects do fit in a set, in which case we call the category a "small" category.

So, let's look at some examples of categories. We can consider the category of sets, which we'll call \(((Set))\); in this case, the objects are sets, and the arrows are functions in the usual set-theoretic sense. To check that the axioms hold, we need an identity and composition. An identity function on a set \(X\) is given by \(f : X \to X = x \mapsto x\); and to compose two functions \(f : X \to Y\) and \(g : Y \to Z\) we consider the function \(f \circ g : X \to Z = x \to g(f(x))\). note that we're being explicit about what the starting and ending object is for each arrow; this is because we want to emphasize the fact that arrows are between two objects, and that the identity function on one set and the identity function on a different set are different functions (even though they both have the same high-level description).

Another example of a category is the category of compact shapes in \(n\) dimensions – where compact means that the shape includes its boundary and isn't infinite in size (is bounded). Now we can't allow just any sort of arrow, because we can image many "poorly-behaved" functions, which take a compact shape to one that is not compact. For example, imagine the function \((x, y) \to (1 / x, 0)\) on the unit disk in 2 dimensions. The starting shape (the unit disk) is bounded and includes its boundary, but the output shape is two lines that go off to infinity. This output shape isn't compact since it is infinite in size (points very close to zero in the unit disk map to very far away points). So we have to restrict our arrows somehow – in this case, it is enough to stipulate that the function be continuous and well-defined everywhere in its input shape. With this restriction, all of our arrows have a well-defined start and end object, and at it turns out (this isn't hard to show) the identity arrow satisfies these restrictions, and the composition of two so restricted functions is still so restricted. This yields a category we might call something like \(((Compact^n))\).

Note how above, we had to add specific constraints on our arrows to make the category well-defined – otherwise arrows that start at a valid object might output something that isn't a valid object.

Categories aren't always about actual functions on objects. For example, consider the "category of a total order". We'll have some number of objects on which we have a total order – a relation \(\le\) on them that defines an ordering of objects. Then we represent this as a category by taking the objects of our category the things we are ordering, and for the arrows, we just draw an arrow from an object to all objects that are less than or equal to it. This is a valid category, because we have an arrow from an object to itself (since anything is less than or equal to itself), and because if we have \(a \le b\) and \(b \le c\) (arrows from \(a\) to \(b\) and \(b\) to \(c\)), then \(a \le c\). So even though the arrows we've drawn don't represent physical functions, we don't care! The arrows in this example are completely abstract, they have no meaning internally – but in category theory we care only about where the arrows point to/from, not what they do internally. We imagine objects to be black boxes, and this gives us a lot of power to abstract away details.

Also note that we have a lot of arrows here: between any two objects there is an arrow in one or the other direction, which means that drawing all of the arrows is actually rather hard. Instead, we sometimes talk about a generator set of arrows; that is, some minimal set of arrows, where we can get any arrow by composing two arrows in this set. For the above example, the arrows between any two "adjacent" elements are a generating set of arrows.

For an example of how abstract we can get, we can consider the category of a logic, where every object is some mathematical statement, and each arrow is a proof that the starting statement implies the ending statement. This is a valid category, because as our identity arrows we have the obvious proof that \(A \Rightarrow A\): \(A \therefore A\); and for composition, we can just concatenate the proofs. Suppose \(A\). Then use the proof of \(A \Rightarrow B\) to show \(B\). Then use the proof of \(B \Rightarrow C\) to show \(C\). Thus \(A \Rightarrow C\). This category is also totally abstract, but it also turns out to be hugely important: in programming language theory, a famous theorem called the Curry-Howard isomorphism establishes that a type theory for a programming language (that is, a way of assigning types to values and rules for deducing how these types interact) also form a category, so we can talk about what logic a type theory is isomorphic to (see my class on Modern Programming Language Theory for more).

Anyways, the rules that define a category are such because they permit us to talk about structure without getting bogged down in the details of the underlying arrows and objects and what they do.

Universal Properties

Why do we go through this effort to talk about such an abstract thing? What benefit does it bring us? For one, the language of category theory makes it easy and intuitive to talk about many concepts, especially to talk about concepts that stretch across different mathematical structures — such as products, quotients, and so on. In category theory, we talk about these by way of universal properties.

As an example, let's talk about products. What do I mean? Well, we often talk about the direct product of two groups, or the product of two rings, or the product of two vector spaces, and so on. In all of these cases, when we talk about the product of objects \(A\) and \(B\), we want their product \(A \times B\) to be some object that combines both \(A\) and \(B\) while keeping the two independent. For example, in group theory, the product \(G \times H\) of two groups is the group of pairs of elements \((g, h)\), where all operations are pairwise. We can formalize this "combine independently" ideal by way of a universal property: \(A \times B\) the universal object for which there exist projections \(\pi_1 : A \times B \to A\) and \(\pi_2 : A \times B \to B\) such that for any functions \(f_1 : C \to A\) and \(f_2 : C \to B\) there exists a function \(f : C \to A \times B\) such that \(f \circ \pi_1 = f_1\), \(f \circ \pi_2 = f_2\).

Let's spend some time digesting this. The way we're defining the product is by stating a property that it satisfies, and furthermore by adding that it is the universal object doing so. "Universal" here just means that there is exactly one of these objects, and that if two objects satisfy the property given above, these two objects must be equal. Now, what is the property in this case? It is that if we have functions from something to \(A\) and to \(B\), we must be able to combine them into a function to \(A \times B\). In other words, if some object \(C\) contains "pieces" of \(A\) and \(B\), it contains "pieces" of \(A \times B\). Does this capture what we wanted, that the product must contain both \(A\) and \(B\)? Well, it means that functions to \(A\) and to \(B\) can be combined into functions to \(A \times B\), so this is indeed a pretty natural translation of that idea.

Of course, for any universal property we have to prove that the object involved is indeed universal. But this is hopefully relatively easy.

In category theory we often use universal properties to define the concepts we'd like to talk about. While universal properties usually make it hard to think about what an object looks like internally, they have the benefits of being particularly easy to use in proofs — if you need to show that some object is, say, the product of two other objects, all we have to do is show that it satisfies the universal property.

Commutative Diagrams

Often we represent universal properties by way of diagrams, in particular "commutative diagrams". These are hugely important in category theory, so let's see an example. Here's a commutative diagram for the "product" definition above.


Figure 1: The commutative diagram for \(A \times B\). Image from Wikipedia.

It takes a bit of know-how to read this, but it isn't hard. Each arrow in the picture represents an arrow, and each of the things at the end of an arrow is an object. So this diagram has five arrows: \(f_1\) and \(f_2\), \(\pi_1\) and \(\pi_2\), and \(f\). The arrow for \(f\) is dashed, implying that it isn't given; instead, it's a condition we have to satisfy[1]. Finally, the diagram is a commutative diagram; this means that any path between two points (that is, any way to compose arrows to get between two arrows) must be the same. There are two ways, in the diagram above, to go from \(Y\) to \(X_1\): you can either travel along \(f_1\) or along \(f\) and then \(\pi_1\). Since the diagram commutes, these two ways must be the same, so \(f_1\) must equal \(f ^ \pi_1\).

Note how the diagram compactly expresses our universal property. It implies (since those arrows are solid) that \(\pi_1\) and \(\pi_2\) are given. Since they are between \(X_1 \times X_2\) and \(X_1\) and \(X_2\), it is implied that these are the same for any choice of \(Y\). The diagram also makes \(f_1 : Y \to X_1\) and \(f_2 : Y \to X_2\) given, and requires the existence of \(f : Y \to X_1 \times X_2\). This is the same as we did in our definition above. Finally the commutativity forces \(f_1 = f ^ \pi_1\), \(f_2 = f ^ \pi_2\), the same conditions we gave above.

The diagram above is a compact and elegant way of capturing the definition of product. But there's another benefit to this form. The thing is, we often find ourselves drawing out commutative diagrams to express problems in category theory. If we display our definitions as commutative diagrams, we can effectively turn proofs into pattern matching: you look at the problem at hand, and you can just see the fact that some object is the product of two others, since it's the right shape. This makes category theory a lot easier to work with and also a lot less abstract, since it is transformed into an intensely visual field.

[1] This is a common convention, but it isn't universal.


Natural Transformations



Remember our definition for a product? Here's a different diagram.


Figure 2: The commutative diagram for \(X_1 \oplus X_2\). Image from Wikipedia.

Here, we're trying to define the object \(X\) in terms of the required function \(f\). We are given a set of "inclusions" \(i_j\) and functions from each \(X_j\) to \(Y\); the object \(X\) (called the "coproduct" of the \(X_j\)) is one which combines the \(X_j\) such that all the maps \(f_j : X_j \to Y\) can be put together into one big map \(f : X \to Y\).

This is called the category-theory dual of the product. It's the dual because the diagrams are the same, except that the arrows are reversed, and as a result it is called the "coproduct" construction, not the "product" construction. In the same way, one can have "colimits" and "limits", and so on.

As you might expect, products and coproducts are different, but share similar traits. Both have something to do with combining two objects, but they differ in that for products, we combine two target objects into one large target, and for coproducts, we combine two source objects into one large source.

For sets, the coproduct is the "disjoint union" — like the union, but if the two objects have the same element, we keep both copies, and track which is which. You can imagine coloring the elements of one set red, those of the other set blue, and then taking the union of the sets.