Batches as Virtual Mu Types
This post is the third in a series of me trying to work out batches, a new data structure I am testing for Herbie and, more generally, for programming with recursive data structures. In my first post I laid out a type theory for batches, and in my second post I discussed how we can check that programs that use batches are safe. But I'm still after a "holy grail": how to make programming with batches "normal". Specifically, most programming with batches involves lots of indices, which need to be used correctly. I'd like instead to write normal recursive programs and have them automatically be batchified.
This third step has been far more difficult than the type theory and temporal verification discussed in the last two posts; I've been especially thankful for Milind Kulkarni and his students. His group has done an enormous amount of really interesting work on tree traversals, which really shaped my thinking here. I particularly liked Grafter, which introduced the idea of using regular automata for expressing access sets and recursive calls.
The goal
Let me be explicit about the goal. I would ideally like to be able to write 1) a normal recursive program over expressions, and then 2) separately, write a "schedule" for how it should execute on batches. Ideally "execute recursively" would be a valid schedule so we could transition step by step. Ideally ideally, it would also have a nice Racket syntax so I can deploy it in Herbie.
The way I'm approaching this problem is to try to rigorously describe the relationship between recursive and batch code. Batches definitely have some quirks that recursive functions don't have: they involve mutation, the traversal order matters more, sometimes the deduplication has weird effects, and so on. If we could connect batch code to recursive code, then hopefully we could see all the degrees of freedom, and building a language around those degrees of freedom would provide the desired "schedule language".
So I've been trying to do this, specifically by starting with a lambda calculus and trying to see how to translate generic lambda terms to batches.
Mu Types (a quick introduction)
Normally in functional programming we define recursive data structures, like this:
data IntList = Cons Int IntList | Nil
But an alternative is to first define the recursive functor:
data IntList' r = Cons' Int r | Nil'
It's the same, except recursive IntLists are replaced by the r parameter. Then you can recover IntList with a special type, Mu:
data Mu F = Fold { unfold :: F (Mu F) }
Then IntList = Mu IntList' gives you the recursive data structure again. A sort of core idea here is that IntList' represents the "actual data" while Mu represents the recursive aspect of the data.
The Mu type is entirely defined by two functions:1 [1 Plus various identities linking these two functions with the fmap on F.]
fold :: F (Mu F) -> Mu F
unfold :: Mu F -> F (Mu F)
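To make this concrete, here's the whole setup as a small runnable Haskell sketch (with a lowercase type variable f, as Haskell requires):

data IntList' r = Cons' Int r | Nil'

newtype Mu f = Fold { unfold :: f (Mu f) }

type IntList = Mu IntList'

fold :: f (Mu f) -> Mu f
fold = Fold

-- the list [1, 2], built one layer at a time
example :: IntList
example = fold (Cons' 1 (fold (Cons' 2 (fold Nil'))))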
There's a similar trick for functions. Suppose you have some recursive function:
length l = case l of
  | Cons a l' => 1 + length l'
  | Nil => 0
You can replace the recursive instance of length with a parameter r, giving a new function:
length' r l = case l of
  | Cons a l' => 1 + r l'
  | Nil => 0
Now length' is not recursive, but we can recover the recursion with an extra function usually called fix:
fix f e = f (fix f) e
Then you can define length = fix length' and get back the recursive function. Again, length' defines the "actual computation" while fix adds the actual recursion.
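Here's that trick as a runnable Haskell sketch, assuming the IntList' and Mu definitions above (named len to avoid clashing with the Prelude's length, and phrased over Mu IntList', so the pattern match goes through unfold):

fix :: ((a -> b) -> a -> b) -> a -> b
fix f e = f (fix f) e

len' :: (IntList -> Int) -> IntList -> Int
len' r l = case unfold l of
  Cons' _ l' -> 1 + r l'
  Nil'       -> 0

len :: IntList -> Int
len = fix len'
-- len example == 2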
Mu types as data representations
Let's switch to a lower level. Suppose I want to define a linked list in Rust. I can't write:
pub enum IntList {
    Cons(i32, IntList),
    Nil,
}
The reason this doesn't make sense is that in Rust, when you say something is a struct member, that means it is laid out inline, and it doesn't make sense to have an IntList store another IntList inline; it would require IntList to be an infinitely large structure. So instead you have to add a Box:
pub enum IntList {
    Cons(i32, Box<IntList>),
    Nil,
}
I think this is straightforward, but here's the cool thing: Box<IntList> now looks a lot like Mu<IntList'>. You have to unfold (deref) a Box<IntList> to get an IntList, which contains other Box<IntList>s at the leaves. Or you need to Box::new an IntList, with Box<IntList>s already at the leaves, to get a new Box<IntList>.
In other words, the concept of a Mu type separates out not only the "recursive" aspect but also the "allocation" aspect. If you don't have a Mu type, that is, if you don't have any recursive data structures, then you could stack-allocate everything.2 [2 It might involve returning really large structures across the stack, of course.]
A batch represents a recursive data structure (possibly deduplicated), but it is also a place to store recursive data structures, kind of like a region in region-based allocation. So let's extend Mu types with an annotation showing which batch the data is allocated in. For example, the length function might start out as:
length :: Mu IntList -> Integer
But if we want to store the input in a batch3 [3 Which, fun fact, would basically transform the linked list into a vector.4 [4 A cool follow-on to that: if you apply this same trick to nat, the canonical inductive data structure, what you get is a vector of elements of basically unit type, which take zero space. So the vector is just an integer, its length.]] called A, its type would become:
length :: Mu A IntList -> Integer
This also explains the role of indices. An index is basically a pointer, restricted to a specific set of allocations, which makes it the same as the runtime representation of Box<T>; in other words, with batches we just want Mu<A, F> to be represented by indices into A. The unfold function just indexes into the batch, while the fold function adds terms to a (mutable) batch.
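Here's a sketch of that runtime story in Haskell. The names here (Idx, Batch, unfoldB, foldB) are mine, purely for illustration; the point is just that unfold is a vector read and fold is a vector append:

import Data.Vector (Vector)
import qualified Data.Vector as V

newtype Idx = Idx Int                     -- the runtime form of Mu A F
newtype Batch f = Batch (Vector (f Idx))  -- one functor layer per slot

-- unfold just reads a layer out of the batch
unfoldB :: Batch f -> Idx -> f Idx
unfoldB (Batch v) (Idx i) = v V.! i

-- fold appends a layer; written purely here, though a real batch mutates
foldB :: Batch f -> f Idx -> (Batch f, Idx)
foldB (Batch v) layer = (Batch (V.snoc v layer), Idx (V.length v))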
By the way, we could also have a special "batch" called Heap to represent heap-allocated terms, so that Mu<Heap, F> would basically be the normal representation of inductive data types with real pointers.
Batch polymorphism
Naturally, you would never actually write a function that could only take inputs from a specific batch; instead, you'd want it to be generic over input batches:
length :: forall A. Mu A IntList -> Integer
But if you think through what this means at the level of execution, it means that length now receives as input both a batch and an index into that batch. This is the same as the earlier idea of batchref objects, which are a pair of a batch and an index, but now I see batchref as just one common pattern for batch functions.
Now let's consider a function that rewrites an expression in some way. Normally it would have type:
rewrite :: Mu Expr -> Mu Expr
But with the batch annotations, we now have a number of different rewrite functions. First, there's this one:
rewrite :: forall A B, Mu A Expr -> Mu B Expr
This one takes input from one batch and writes output to another batch, where the output batch is passed in by the user. A slightly different option is:
rewrite :: forall A, Mu A Expr -> exists B, Mu B Expr
Here the output batch is allocated and returned by rewrite. Finally, there's the weird option,
rewrite :: forall A, Mu A Expr -> Mu A Expr
which puts rewritten terms back into the same batch as it read from. This is kind of what, say, egraphs do, in that they apply rewrite rules to expressions and put them back into the same "pool" as the original expressions, ready for the next iteration.
Functions as vectors
The other big idea with batches is that an (immutable) batch describes a closed, finite universe of objects, so a function over that universe can be represented (materialized) by a simple vector. So for example, the length function above, with type:
length :: forall A, Mu A IntList -> Integer
Instead of applying this to a specific index in the batch, we could apply it to all the indices in the batch, store the results in an output vector, and then index into the vector instead of calling the function.
I think of this as a replacement for fix. When we fix a function from Mu<A, IntList>, we could allocate a vector of length equal to A, evaluate it on all of the nodes in A, and just have the vector. Note the order of operations here: you first pass in the batch A, then that returns a fix'd lambda which is materialized to a vector. So you end up with length represented by:
mat_length :: forall A, Column A Integer
Where Column A X is defined as Vector X, but the type parameter keeps track of how long it is.
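As a minimal sketch (the phantom-parameter trick, not code from any implementation), Column could look like:

import Data.Vector (Vector)

-- the batch parameter a is a phantom: it only ties the column's length
-- to A's index space at the type level
newtype Column a x = Column (Vector x)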
If you recall that length can be defined in terms of a fix, you can imagine the fix operation itself constructing this column and returning a function that references directly into it.
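Here's a sketch of that materialization, reusing the hypothetical Batch and Idx representation from above and assuming children always appear in the batch before their parents, so one bottom-up pass fills the whole column. The step argument stands for the body of the fix'd function, with the recursive call replaced by column lookups; querying the materialized function is then just indexing into the returned column:

import Data.Vector (Vector)
import qualified Data.Vector as V

-- (same Idx and Batch sketch as above)
newtype Idx = Idx Int
newtype Batch f = Batch (Vector (f Idx))

materializeFix :: Functor f => Batch f -> (f b -> b) -> Vector b
materializeFix (Batch v) step =
  V.constructN (V.length v) $ \col ->
    -- col holds the rows computed so far; the next row reads only those
    let layer = v V.! V.length col
    in step (fmap (\(Idx i) -> col V.! i) layer)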
Desugaring to batches
With all of that background out of the way, here's how I conceive of the batch problem now.
Imagine first writing a normal program over recursive data structures.
This program is written in a normal lambda calculus with explicit Mu types and fix combinators. It's convenient if we give regular lambdas and fix'd functions different types. So we have regular lambdas a -> b, eliminated by function calls f(x), and "queries" a ->> b, eliminated by special function calls query f x.
To get the equivalent program over batches, we have to annotate every Mu type with a batch. To do so, we'll need to:
- Add terms to allocate new batches
- Identify which Mu types go into which batches
- Determine how those Mu types are quantified
Also, since fix terms now have effects, we will need to describe the order in which fix'd functions are materialized. On a type level that means type abstraction (forall) over batches and then type application to eliminate those abstractions.
Here's a toy but somewhat complex example. Suppose you want to convert a binary tree into a height-annotated one:
data Tree r = Tree r r | Nil
data ATree r = ATree Int r r | ANil

height :: Mu Tree ->> Int
height = fix \r t. case (unfold t) of
  | Tree a b => 1 + max (r a) (r b)
  | Nil => 0

add_height :: Mu Tree ->> Mu ATree
add_height = fix \r t. case (unfold t) of
  | Tree a b => fold (ATree (query height t) (r a) (r b))
  | Nil => fold ANil
This is the definition you'd normally write, except:
- All recursive data types are written as functors, requiring explicit Mu constructors;
- All uses of Mu types require manual fold and unfold calls;
- Every recursive function requires an explicit fix call and thereby has type a ->> b;
- Every call to a recursive function (the call to height inside add_height) needs to use query to eliminate the ->> type.
To turn it into batches, we need to make a bunch of changes:
- height needs to have type forall A, Mu A Tree ->> Int; this means height needs to be specialized to a specific batch A first, at which point you can compute the height of any or all of the trees in the batch A.
- add_height probably needs to have type forall A B, Mu A Tree ->> Mu B ATree. In principle it could allocate the batch itself, but I don't see the point; might as well pass in the output batch from the caller.
- add_height queries height with a term of type Mu A Tree, so we need to specialize height's new forall term with A. We should do that specialization once (because each specialization is expensive).
More formally, if you take a normal program, with explicit Mu, fix, fold, unfold, and query terms, I think you can turn it into a batch program by:
- For each Mu type, pick a fresh variable for the batch it is from.
- Run unification in case arguments have to come from the same batch.
- For each function, universally quantify over the batches.
- Type check the body of each function. You need to figure out the type specializations of each fold, unfold, and query term.
- Most of these will be type directed; in some cases maybe not (like the term unfold (fold Nil), which constructs a temporary batch and reads back out of it), in which case I think it's fine to use Heap as the batch.
- Now each fold, unfold, and query has explicit batch annotations, so everything uses a batch.
- Lift out all arguments to query.
That gets you to here:
data Tree r = Tree r r | Nil
data ATree r = ATree Int r r | ANil

height :: forall A, Mu A Tree ->> Int
height = \A. fix \r t. case (unfold A t) of
  | Tree a b => 1 + max (r a) (r b)
  | Nil => 0

add_height :: forall A B, Mu A Tree ->> Mu B ATree
add_height = \A B.
  let height_A = height A in
  fix \r t. case (unfold A t) of
    | Tree a b => fold B (ATree (query height_A t) (r a) (r b))
    | Nil => fold B ANil
Note that when add_height calls height A, that returns a fix term, so add_height actually has this body:
add_height = let height_A = fix ... in fix ...
Since I said fix actually constructs a vector and evaluates the function on every element of the input batch, this means that specializing add_height computes columns of all the heights, then all the annotated trees, and a query of that specialized add_height just indexes into those columns.
Now suppose you want to run this on a specific tree:
main = let t = ... in print (query add_height t)
Then you need to specialize that to batches too, but this time you can actually go ahead and just allocate the batches:
main =
  let A = Batch.new in
  let t = Batch.add A (...) in
  let B = Batch.new in
  let add_height_AB = add_height A B in
  print (query add_height_AB t)
In summary: this is a sketch, but I think there is a systematic way to turn non-batch code into batch code, and it can be done in a fairly type-directed way. And there is some flexibility in that conversion, in terms of what order you specialize things in, and that flexibility is what the schedule comes down to.
Scheduling as semi-naive evaluation
Another separate line of work that I found useful is the idea of semi-naive evaluation in Datalog. In Datalog we are interested in computing "facts", and facts are only ever discovered (the set of facts grows monotonically). This means that, in the middle of a computation, we have "new" facts and "old" facts, where "old" means that we've already processed their consequences.
I think the same idea is applicable to batches. In some cases you are materializing a recursive query of an immutable batch. In this case, you can just apply the function to every element of the batch. But in some of the more complicated, mutually-recursive options I've looked at, you need to compute data on elements of a mutable batch; for example, you might need to compute some semantic details of terms in a batch to construct more terms in that batch.
Here the idea behind "semi-naive" is that, as discussed last time, every column or batch is valid up until a certain index. You start by considering zero indices of the input columns, and have no valid indices in the output columns. Everything is "quiescent": you've computed every fact that is possible. Next, you "reveal" one more row of the input columns, and that makes a bunch of new rows of all the output columns possibly computable, and you compute them in some order. And often you have some freedom to choose the order, what you allocate when, and so on. Then you repeat with the next row.
So the idea is that you'd write a schedule that looks something like this:
bottom-up A
  materialize (height A)
  materialize (add_height A B)
This schedule means that, for every row of A, we first compute height for that row, and then compute add_height for that row. So this fills in both height and add_height in parallel. Specifically, we imagine that A starts as entirely invalid, and then we repeatedly "reveal" a row of A and run the body, which updates both height A and add_height A B to take advantage of the new rows.
Alternatively, consider this schedule:
bottom-up A
  materialize (height A)
bottom-up A
  materialize (add_height A B)
Here we first fill in height; then we repeatedly "reveal" one row at a time of both height and A (since both have the same index space) and compute add_height. So in this case we compute all the heights first, then all the annotated trees.
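As a sketch of how these play out, here's the first schedule written as a plain Haskell loop (all names here are made up for illustration): it reveals one row of A at a time and extends the height column and the output rows in lockstep, with children assumed to appear before their parents. The second schedule would instead be two separate loops, the first filling in all the heights and the second reading them:

import Data.Vector (Vector)
import qualified Data.Vector as V

data Tree r = TNode r r | TNil   -- one functor layer; children are row indices

-- fused pass: reveal row i, then extend both the height column hs and
-- the annotated output out before moving on
fused :: Vector (Tree Int) -> (Vector Int, Vector (Int, Tree Int))
fused a = go 0 V.empty V.empty
  where
    go i hs out
      | i == V.length a = (hs, out)
      | otherwise =
          let layer = a V.! i
              h = case layer of
                    TNode l r -> 1 + max (hs V.! l) (hs V.! r)
                    TNil      -> 0
          in go (i + 1) (V.snoc hs h) (V.snoc out (h, layer))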
You can write incorrect schedules, like this:
bottom-up A
  materialize (add_height A B)
  materialize (height A)
What's incorrect about this is that add_height reads from both A and height, but those have different lengths: when add_height runs for a row, height hasn't been computed for that row yet.
You can also write complex schedules, like this:
bottom-up A
  materialize (reduce A B)
bottom-up B
  materialize (gather-additive-terms B)
This is the schedule from the previous post, where reduce does some simplification, adds to the batch B, and uses gather-additive-terms to perform normalization. So we first do the rewriting, and then we need to call gather-additive-terms for each added term. Note that the B column is filled in all at once by reduce, but the gather-additive-terms column is filled in a bit at a time, as stuff is added to B.
Conclusion
I'm three days into working on this post, so it's time to post it even though it's a bit of a mess. But here's what I think are the most valuable ideas:
- I think batches are all about specializing Mu and fix to refer to finite universes of terms, which allows them to be materialized into vectors. Mu corresponds to batches and fix to columns, including columns that map between index spaces.
- You can convert a regular lambda calculus program into a batch one in a type-directed way, using type annotations and type specialization. There's probably a canonical / free way to do this.
- Then the order in which fix terms actually fill in columns is the schedule.
- That schedule is described by "semi-naive" evaluation, where each index space is revealed step by step, and that then updates output columns.
I'm not 100% sure on the scheduling language here but I think something can be worked out. Maybe another post.