# Batches as Virtual Mu Types

This post is the third in a series of me trying to work out batches, a new data structure I am testing for Herbie and more generally for programming with recursive data structures. In my first post I laid out a type theory for batches, and in my second post I discussed how we can check that programs that use batches are safe. But I'm still after a "holy grail": how to making programming with batches "normal". Specifically, most programming with batches involves lots of indices, which need to be used correctly. I'd like instead to write normal recursive programs and have them automatically be batchified.

This third step has been far more difficult than the type theory and temporal verification discussed in the last two posts; I've been especially thankful for Milind Kulkarni and his students. There's an enormous amount of really interesting work on tree traversals done by his group, which really formed my thinking here. I particularly liked Grafter, which introduced the idea of using regular automata for expressing access sets and recursive calls.

## The goal

Let me be explicit about the goal. I would ideally like to be able to write 1) a normal recursive program over expressions, and then 2) separately, write a "schedule" for how it should execute on batches. Ideally "execute recursively" would be a valid schedule so we could transition step by step. Ideally ideally, it would also have a nice Racket syntax so I can deploy it in Herbie.

The way I'm approaching this problem is to try to rigorously describe the relationship between recursive and batch code. Batches definitely have some quirks that recursive functions don't have: they involve mutation, the traversal order matters more, sometimes the deduplication has weird effects, and so on. If we could connect batch code to recursive code, then hopefully we could see all the degrees of freedom and then building a language around these degrees of freedom would provide the desired "schedule language".

So I've been trying to do this, specifically by starting with a lambda calculus and trying to see how to translate generic lambda terms to batches.

## Mu Types (a quick introduction)

Normally in functional programming we define recursive data structures, like this:

data IntList = Cons Int IntList | Nil

But an alternative is to first define the recursive *functor*:

data IntList' r = Cons' Int r | Nil

It's the same, except recursive `IntList`

's are replaced by the `r`

parameter. Then you can recover `IntList`

with a special type, `Mu`

:

data Mu F = Fold { unfold :: F (Mu F) }

Then `IntList = Mu (IntList')`

gives you the recursive data structure again. A
sort of core idea here is that `IntList'`

represents the "actual data"
while `Mu`

represents the recursive aspect to the data.

The `Mu`

type is entirely defined by two functions:^{1} [^{1} Plus various
identities linking these two functions with the `fmap`

on `F`

.]

fold :: F (Mu F) -> Mu F unfold :: Mu F -> F (Mu F)

There's a similar trick to functions. Suppose you have some recursive function:

length l = case l of | Cons a l' => 1 + length l' | Nil => 0

You can replace the recursive instance of `length`

with a new function:

length' r l = case l of | Cons a l' => 1 + f l' | Nil => 0

Now `length`

is not recursive, but we can recover the recursion with an
extra function usually called `fix`

:

fix f e = f (fix f) e

Then you can define `length = fix length`

and get back the recursive
function. Again, `length`

defines the "actual computation" while `fix`

adds the actual recursion.

## Mu types as data representations

Let's switch to a lower level. Suppose I want to define a linked list in Rust. I can't write:

pub enum IntList { Cons(i32, IntList), Nil, }

The reason this doesn't make sense is that in Rust when you say
something is a struct member that means it is laid out *in line*, and it
doesn't make sense to have an `IntList`

store another `IntList`

inline, it
would require `IntList`

to be an infinitely large structure. So instead
you have to add a `Box`

:

pub enum IntList { Cons(i32, Box<IntList>), Nil, }

I think this is straightforward, but here's the cool thing:
`Box<IntList>`

now looks a lot like `Mu<IntList'>`

. You have to unfold
(deref) to a `Box<IntList>`

to get an `IntList`

, which contains other
`Box<IntList>`

for the leaves. Or you need to `Box::new`

an `IntList`

, with
`Box<IntList>`

already at the leaves, to get a new `Box<IntList>`

.

In other words, the concept of a `Mu`

type not only separates out the
"recursive" aspect but also the "allocation" aspect. If you don't have
a `Mu`

type, that is, if you don't have any recursive data structures,
then you could stack-allocate everything.^{2} [^{2} It might involve
returning really large structures across the stack, of course.]

A batch represents a recursive data structure (possibly deduplicated)
but it is also a place to store recursive data structures, kind of
like a region in region-based allocation. So let's extend `Mu`

types
with an annotation showing which batch the data is allocated in. For
example, the length function might start out as:

length :: Mu IntList -> Integer

But if we want to store the input in a batch^{3} [^{3} Which, fun fact,
would basically transform the linked list into a vector.^{4} [^{4} A cool
follow-on to that is that if you apply this same trick to `nat`

, the
canonical inductive data structure, what you get is a vector of
basically unit type, which has zero length. So the vector is just an
integer.]] called `A`

, its type would become:

length :: Mu A IntList -> Integer

This also explains the role of indices. An index is basically a
pointer, restricted to a specific set of allocations, so it is the
same as the runtime representation of `Box<T>`

, so in other words with
batches we just want `Mu<A, F>`

to be represented by indices into `A`

. The
`unfold`

function just indexes into the batch, while the `fold`

function
adds terms to a (mutable) batch.

By the way, we could also have a special "batch" called `Heap`

to
represent heap-allocated terms, so that `Mu<Heap, F>`

was basically the
normal representation of inductive data types with real pointers.

## Batches polymorphism

Naturally, you would never actually write a function that could only take inputs from a specific batch; instead, you'd want it to be generic over input batches:

length :: forall A. Mu A IntList -> Integer

But if you think through what this means at the level of execution, it
means that `length`

now receives as input both a batch and an index into
that batch. Which is the same as the earlier idea of `batchref`

objects,
which are a pair of batch and index, but now I see `batchref`

as just
one common pattern for batch functions.

Now let's consider a function that rewrites an expression in some way. Normally it would have type:

rewrite :: Mu Expr -> Mu Expr

But with the batch annotations, we now have a number of different
`rewrite`

functions. First, there's this one:

rewrite :: forall A B, Mu A Expr -> Mu B Expr

This one takes input from one batch and writes output to another batch, where the output batch is passed in by the user. A slightly different option is:

rewrite :: forall A, Mu A Expr -> exists B, Mu B Expr

Here the output batch is allocated and returned by `rewrite`

. Finally,
there's the weird option,

rewrite :: forall A, Mu A Expr -> Mu A Expr

which puts rewritten terms back into the same batch as it read from. This is kind of what, say, egraphs do, in that they apply rewrite rules to expressions and put them back into the same "pool" as the original expressions, ready for the next iteration.

## Functions as vectors

The other big idea with batches is that an (immutable) batch describes
a closed, finite universe of objects, so a function over that tree can
be represented (materialized) by a simple vector. So for example, the
`length`

function above, with type:

length :: forall A, Mu A IntList -> Integer

Instead of applying this to a *specific* index in the batch, we could
apply it to *all* the indices in the batch, store the results in an
output vector, and then instead of calling the function we index into
the vector instead.

I think of this as a replacement to `fix`

. When we `fix`

a function from
`Mu<A, IntList>`

, we could allocate a vector of length equal to `A`

,
evaluate it on all of the nodes in `A`

, and just have the vector. Note
the order of operations here: you *first* pass in the batch `A`

, then that
returns a `fix`

'd `lambda`

which is materialized to a vector. So you end
up with `length`

represented by:

mat_length :: forall A, Column A Integer

Where `Column A X`

is defined as `Vector Integer`

, but the type parameter
keeps track of how long it is.

If you recall that `length`

can be defined in terms of a `fix`

, you can
imagine the `fix`

operation itself constructing this column and
returning a function that references directly into it.

## Desugaring to batches

With all of that background out of the way, here's how I conceive of the batch problem now.

Imagine first writing a normal program over recursive data structures.
This program is written in a normal lambda calculus with explicit `Mu`

types and `fix`

combinators. It's convenient if we give regular lambdas
and `fix`

functions different types. So we have regular lambdas `a -> b`

eliminated by function calls `f(x)`

and "queries" `a ->> b`

eliminated by
special function calls `query f x`

.

To get the equivalent program over batches, we have annotate every `Mu`

type with a batch. To do so, we'll need to:

- Add terms to allocate new batches
- Identify which
`Mu`

types go into which batches - Determine how those
`Mu`

types are quantified

Also, since `fix`

terms now have effects, we will need to describe the
order in which `fix`

'd functions are materialized. On a type level that
means type abstraction (`forall`

) over batches and then type application
to eliminate those abstractions.

Here's a toy but somewhat complex example. Suppose you want to convert a binary tree into a height-annotated one:

data Tree r = Tree r r | Nil data ATree r = ATree Int r r | Nil height :: Mu Tree ->> Int height = fix \r t. case (unfold t) of | Tree a b => max (r a) (r b) | Nil => 0 add_height :: Mu Tree ->> Mu ATree add_height = fix \r t. case (unfold t) of | Tree a b => fold (ATree (query height t) (f a) (f b)) | Nil => Nil

This is the definition you'd normally write, except:

- All recursive data types are written as functors, requiring explicit
`Mu`

constructors; - All uses of
`Mu`

types require manual`unfold`

and`unfold`

calls; - Every recursive function requires an explicit
`fix`

call and thereby has type`a ->> b`

. - Every call to a recursive function (the call to
`height`

inside`add_height`

) needs to use`query`

to eliminate query functions.

To turn it into batches, we need to make a bunch of changes:

`height`

needs to have type`forall A, Mu A Tree ->> Int`

; this means`height`

needs to be specialized to a specific batch`A`

first, at which point you can compute the height of any or all of the trees in the batch`A`

.`add_height`

probably needs to have type`forall A B, Mu A Tree ->> Mu B ATree`

. In principle it could allocate the batch itself but I don't see the point, might as well pass in the output batch from the caller.`add_height`

queries`height`

with a term of type`Mu A Tree`

, so we need to specialize`height`

's new`forall`

term with`A`

. We should do that specialization once (because each specialization is expensive).

More formally, if you take a normal program, with explicit `Mu`

, `fix`

,
`fold`

, `unfold`

, and `query`

terms, I think you can turn it into a batch
program by:

- For each
`Mu`

type, pick a fresh variable for the batch it is from. - Run unification in case arguments have to come from the same batch.
- For each function, universally quantify over the batches.
- Type check the body of each function. You need to figure out the
type specializations of each
`fold`

,`unfold`

, and`query`

term. - Most of these will be type directed; in some cases maybe not (like
the term
`unfold (fold Nil))`

, which constructs a temporary batch, and reads back out of it) in which case I think it's fine to use`Heap`

as the batch. - Now each
`fold`

,`unfold`

, and`query`

have explicit batch annotations, so everything uses a batch. - Lift out all arguments to
`query`

.

That gets you to here:

data Tree r = Tree r r | Nil data ATree r = ATree Int r r | Nil height :: forall A, Mu A Tree ->> Int height = \A. fix \r t. case (unfold A t) of | Tree a b => max (r a) (r b) | Nil => 0 add_height :: forall A B, Mu A Tree ->> Mu B ATree add_height = \A B. let height_A = height A in fix \r t. case (unfold A t) of | Tree a b => fold B (ATree (query height_A t) (f a) (f b)) | Nil => Nil

Note that when `add_height`

calls `height A`

, that returns a `fix`

term, so
`add_height`

actually has this body:

add_height = let height_A = fix ... in fix ...

Since I said `fix`

actually constructs a vector and evaluates the
function on every element of the input batch, this means that
specializing `add_height`

computes columns of all the heights, then all
the annotated trees, and a `query`

of that specialized `add_height`

just
indexes into those columns.

Now suppose you want to run this on a specific tree:

main = let t = ... in print (query add_height t)

Then you need to specialize that to batches too, but this time you can actually go ahead and just allocate the batches:

main = let A = Batch.new in let t = Batch.add A (...) in let B = Batch.new in let add_height_AB = add_height A B print (query B add_height t)

In summary—this is a sketch, but I think there is a systematic way
to turn non-batch code into batch code, which can be done in a fairly
type-directed way. And there is *some* flexibility in that conversion,
in terms of what order you specialize things in, and that flexibility
is what the specialization comes down to.

## Scheduling as semi-naive evaluation

Another separate line of work that I found useful is the idea of semi-naive evaluation in Datalog. In Datalog we are interested in computing "facts" and facts are only ever discovered (they are monotonic). This means that, in the middle of a computation, we have "new" facts and "old" facts, where "old" means that we've already processed their consequences.

I think the same idea is applicable to batches. In some cases you are materializing a recursive query of an immutable batch. In this case, you can just apply the function to every element of the batch. But in some of the more complicated, mutually-recursive options I've looked at, you need to compute data on elements of a mutable batch; for example, you might need to compute some semantic details of terms in a batch to construct more terms in that batch.

Here the idea behind "semi-naive" is that, as discussed last time, every column or batch is valid up until a certain index. You start by considering zero indices of the input columns, and have no valid indices in the output columns. Everything is "quiescent", you've computed every fact that is possible. Next, you "reveal" one more row of the input columns, and that makes a bunch of new rows of all the output columns possibly computable, and you compute them in some order. And often you have some freedom to choose the order and what you allocate when and so on. Then you repeat with the next row.

So the idea is that you'd write a schedule that looks something like this:

bottom-up A materialize (height A) materialize (add_height A B)

This schedule means that, for every row of `A`

, first compute `height`

for
that row, and then compute `add_height`

for that row. So this fills in
both `height`

and `add_height`

in parallel. Specifically, we imagine that
`A`

starts as entirely invalid, and then we repeatedly "reveal" a row of
`A`

and run the body, which updates both `height A`

and `add_height A B`

to
take advantage of the new rows.

Alternatively, consider this schedule:

bottom-up A materialize (height A) bottom-up A materialize (add_height A B)

Here we first fill in `height`

, then we repeatedly "reveal" one row at a
time of both `height`

and `A`

(since both have the same index space) and
compute `add_height`

. So in this case we compute all the heights first,
then all the annotated trees.

You can write incorrect schedules, like this:

bottom-up A materialize (add_height A B) materialize (height A)

What's incorrect about this is that `add_height`

reads from both `A`

and
`height`

but those have different lengths.

You can also write complex schedules, like this:

bottom-up A materialize reduce A B bottom-up B materialize gather-additive-terms B

This is the schedule from the previous post, where `reduce`

does some
simplification, adds to the batch `B`

, and uses `gather-additive-terms`

to
perform normalization. So we first do the rewriting and then we need
to call `gather-additive-terms`

for *each* added term. Note that the `B`

column is filled in in one go by `reduce`

but the `gather-additive-terms`

column is filled in a bit at a time, as stuff is added to `B`

.

## Conclusion

I'm three days in to working on this post, so it's time to post it even though it's a bit of a mess. But here's what I think are the most valuable ideas:

- I think batches are all about specializing
`Mu`

and`fix`

to refer to finite universes of terms, which allow them to be materialized into a vector.`Mu`

corresponds to batches and`fix`

to columns, including columns that map between index spaces. - You can convert a regular lambda calculus program into a batch one in a type-directed way, using type annotations and type specialization. There's probably a canonical / free way to do this.
- Then the order in which
`fix`

terms actually fill in columns is the schedule. - That schedule is described by "semi-naive" evaluation where each index space is revealed step by step, and that then updates output columns.

I'm not 100% sure on the scheduling language here but I think something can be worked out. Maybe another post.

## Footnotes:

^{1}

Plus various
identities linking these two functions with the `fmap`

on `F`

.

^{2}

It might involve returning really large structures across the stack, of course.

^{3}

Which, fun fact,
would basically transform the linked list into a vector.^{4} [^{4} A cool
follow-on to that is that if you apply this same trick to `nat`

, the
canonical inductive data structure, what you get is a vector of
basically unit type, which has zero length. So the vector is just an
integer.]

^{4}

A cool
follow-on to that is that if you apply this same trick to `nat`

, the
canonical inductive data structure, what you get is a vector of
basically unit type, which has zero length. So the vector is just an
integer.