# Binding in E-graphs

E-graphs are good. They are a powerful, general-purpose mechanism for representing sets of programs and reasoning about those sets of programs via equational rules. My colleagues at the University of Washington who work on the Egg library have been developing a "cookbook" of techniques for common subexpression elimination, synthesis, constant folding, and binding in e-graphs. This post is a new take on representing binding in an e-graph.

## The problem of binding

The basic problem is this: suppose you have an e-graph, and in that
e-graph you want to represent higher-order terms like `(lambda (x) x)`.
Ideally, you'd be able to apply both first-order rewrite rules
(perhaps there are first-order operators like `+` with identities you
want to represent) and also function application, both inside the
e-graph. What's the best way to do so?

Well, the obvious thing to do is just encode the `lambda` terms
directly. So perhaps you define the language like this:

```rust
define_language! {
    pub enum Lang {
        "+" = Add([Id; 2]),
        // ...
        "apply" = Apply([Id; 2]),
        "lambda" = Lambda(egg::Symbol, [Id; 1]),
        Symbol(egg::Symbol),
    }
}
```

I define some first-order operators (here, `+`), and then the basics of
a binding system: `(lambda v e)`, `(apply e1 e2)`, and variable
references `v`.

This representation is simple and direct, and if you don't need to do anything to the lambdas it's good enough. But it has some flaws.

## de Bruijn indices

The first flaw we run into is that lambda terms are not identical when
alpha-renamed; that is, `(lambda x x)` is not the same exact term as
`(lambda y y)`. Because e-graphs are all about congruence, this means
that `(apply (lambda x x) 2)` is not the same term as
`(apply (lambda y y) 2)`, and any reasoning you do about the first has
to be repeated for the second.

Now, one option is to manually add these equalities to the
e-graph.^{1} But finding alpha-equivalent terms in an e-graph is not
easy.^{2}

Since this problem comes up in a lot of other rewrite systems and
proof assistants, there is a standard solution: de Bruijn indices.
This means that instead of thinking of variables as *named*, we think of
them as *numbered*, with the `0` variable being the
most-recently-introduced variable and higher numbers moving outward in
scope.^{3}

```rust
define_language! {
    pub enum Lang {
        "+" = Add([Id; 2]),
        // ...
        "apply" = Apply([Id; 2]),
        "lambda" = Lambda([Id; 1]),
        Variable(usize),
    }
}
```

Note that variables are now represented by integers and lambdas do not
have a symbol they bind. So now `(lambda x x)` and `(lambda y y)` are both
written `(lambda @0)`, and facts about the first are automatically facts
about the second. (I will write variable references with an `@` sign to
avoid confusing them with integer constants.)
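As a concrete illustration of why this helps, here is a sketch (my own, separate from the egg encoding above, with hypothetical `Named` and `DeBruijn` types) of converting named terms to de Bruijn indices; alpha-equivalent terms become structurally identical:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Named {
    Var(String),
    Lambda(String, Box<Named>),
    Apply(Box<Named>, Box<Named>),
}

#[derive(Debug, PartialEq, Clone)]
enum DeBruijn {
    Var(usize), // Var(0) is `@0`, the most recently bound variable
    Lambda(Box<DeBruijn>),
    Apply(Box<DeBruijn>, Box<DeBruijn>),
}

// Convert a named term to de Bruijn indices, tracking the binder stack.
fn to_debruijn(t: &Named, env: &mut Vec<String>) -> DeBruijn {
    match t {
        Named::Var(x) => {
            // The index is the distance from the top of the binder stack.
            let i = env
                .iter()
                .rev()
                .position(|y| y == x)
                .expect("unbound variable");
            DeBruijn::Var(i)
        }
        Named::Lambda(x, e) => {
            env.push(x.clone());
            let body = to_debruijn(e, env);
            env.pop();
            DeBruijn::Lambda(Box::new(body))
        }
        Named::Apply(f, a) => DeBruijn::Apply(
            Box::new(to_debruijn(f, env)),
            Box::new(to_debruijn(a, env)),
        ),
    }
}
```

Converting `(lambda x x)` and `(lambda y y)` this way yields the same `(lambda @0)` term, so congruence applies for free.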

## Explicit substitution

This fixes alpha renaming. Now let's think about beta substitution.

Basically, we want a way to rewrite `(apply (lambda e1) e2)` to some
other term. The term we want to get is `e1`:

- Except with all `@0` references not inside another `lambda` replaced by `e2`
- And also all references to the bound variable inside nested `lambda` terms (which carry higher indices, like `@1`) replaced, with `e2`'s own indices adjusted to the right number
- And also all of the non-replaced variable references decremented

This is a basic operation inside all proof assistants, and it is famously annoying to implement, but basically it is doable. However, for an e-graph this procedure has a lot of negatives:

- It involves creating a lot of new terms, which is slow and blows up the e-graph
- Modifying all other references means every reference to a bound variable ends up duplicated with every possible different index.
- It involves traversing the term, and in an e-graph it is not totally obvious what the best way to do that is

Numbers 1 and 2 are not ideal but 3 is the real killer. Traversing a term is not trivial in an e-graph because the whole point of an e-graph is to have a lot of sharing. E-graphs can contain loops, so simple recursions become complex traversal algorithms. Also it's just not very "e-graphy"—e-graphs prefer local rewrites, not global traversals—which is both an aesthetic judgement and a summary of the other points.
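For reference, the procedure described above is the standard shift/substitute implementation on de Bruijn terms. Here is a sketch (all names are my own) of what that global traversal looks like outside an e-graph:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Term {
    Var(usize), // de Bruijn index; Var(0) is `@0`
    Lambda(Box<Term>),
    Apply(Box<Term>, Box<Term>),
    Num(i64),
}

// Increment every free variable at index >= cutoff by d.
fn shift(t: &Term, d: i64, cutoff: usize) -> Term {
    match t {
        Term::Var(i) if *i >= cutoff => Term::Var((*i as i64 + d) as usize),
        Term::Var(i) => Term::Var(*i),
        Term::Lambda(b) => Term::Lambda(Box::new(shift(b, d, cutoff + 1))),
        Term::Apply(f, a) => Term::Apply(
            Box::new(shift(f, d, cutoff)),
            Box::new(shift(a, d, cutoff)),
        ),
        Term::Num(n) => Term::Num(*n),
    }
}

// Replace variable j with s, shifting s each time we go under a binder.
fn subst(t: &Term, j: usize, s: &Term) -> Term {
    match t {
        Term::Var(i) if *i == j => s.clone(),
        Term::Var(i) => Term::Var(*i),
        Term::Lambda(b) => Term::Lambda(Box::new(subst(b, j + 1, &shift(s, 1, 0)))),
        Term::Apply(f, a) => Term::Apply(
            Box::new(subst(f, j, s)),
            Box::new(subst(a, j, s)),
        ),
        Term::Num(n) => Term::Num(*n),
    }
}

// The whole beta step: (apply (lambda body) arg) rewrites to this.
fn beta(body: &Term, arg: &Term) -> Term {
    shift(&subst(body, 0, &shift(arg, 1, 0)), -1, 0)
}
```

Note that `beta` recurses over the entire body; it is exactly this whole-term traversal that does not map well onto an e-graph.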

Explicit substitution calculi address this problem. Normally when we define a language semantics we define application via a step like:

`(apply (lambda v e1) e2) -> e1[v -> e2]`

This kind of assumes you know what substitution is. So sometimes people go out of their way to define substitution, with rules like these:

`x[x -> e] = e`

`x[y -> e] = x`

`f(e1)[x -> e2] = f(e1[x -> e2])`

Explicit substitution calculi go one step further and consider the
square brackets a syntactic expression in the language, whose
semantics is defined by the equations above, instead of just an
operator on expressions used when running the program. Imagine, in
other words, that the programming language had `subst` as a built-in
operator, and you could write a program like `(subst x 3 (+ x 5))`,
which of course would evaluate to `8`.
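To make this concrete, here is a sketch (my own, with a hypothetical `Expr` type) of an evaluator that treats `subst` as a syntactic operator and pushes it inward using exactly the three equations above:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Var(&'static str),
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    // (subst x v body): substitute v for x in body, as syntax.
    Subst(&'static str, Box<Expr>, Box<Expr>),
}

// Push each explicit substitution one step inward, folding constant
// additions as we go. Each match arm is one of the three equations.
fn eval(e: &Expr) -> Expr {
    match e {
        Expr::Subst(x, v, body) => match &**body {
            Expr::Var(y) if y == x => eval(v), // x[x -> e] = e
            Expr::Var(y) => Expr::Var(*y),     // x[y -> e] = x
            Expr::Num(n) => Expr::Num(*n),
            Expr::Add(a, b) => eval(&Expr::Add( // f(e1)[x -> e2] = f(e1[x -> e2])
                Box::new(Expr::Subst(*x, v.clone(), a.clone())),
                Box::new(Expr::Subst(*x, v.clone(), b.clone())),
            )),
            // Leave nested substitutions alone in this sketch.
            other => Expr::Subst(*x, v.clone(), Box::new(other.clone())),
        },
        Expr::Add(a, b) => match (eval(a), eval(b)) {
            (Expr::Num(m), Expr::Num(n)) => Expr::Num(m + n),
            (a2, b2) => Expr::Add(Box::new(a2), Box::new(b2)),
        },
        other => other.clone(),
    }
}
```

Each step here is a local rewrite on one node, which is the shape of computation e-graphs are good at.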

The benefit of explicit substitutions in an e-graph is that we've broken down our big complicated traversal of a term into small local rewrites.

## Modal type theories

This is a bit of a digression but let me briefly discuss the semantics of explicit substitution lambda calculi. If you don't like category theory, skip this section.

Imagine a first-order language with no contexts, just expression trees. Those expressions have types like lists, products, sums, and so on—but no lambdas. All its operators construct and operate on values. For example, here's product:

```
Γ ⊢ a : A    Γ ⊢ b : B
----------------------
Γ ⊢ (pair a b) : A × B
```

When we add lambdas, though, we do something weird. Lambdas don't operate on values—they operate on contexts:

```
Γ, x : A ⊢ e : T
------------------------
Γ ⊢ (lambda x e) : A → T
```

The semantics of a term is a map in a certain category \(C\), where we think of points in the category as contexts and the map is pointing from the input context to the output type. Standard type formation rules are functors on this category.

In this setup, lambdas are a *modal operator*. Generically, this is a term
former `s` whose semantics is given by adjoint functors `F` and `G` from
\(C\) to \(C\). Adjointness means `Γ → F T` is equivalent to `G Γ → T`, which
we can write as the following typing rule:

```
G Γ ⊢ e : T
----------------
Γ ⊢ (s e) : F T
```

In other words, `s` is a special form that is a type former for `F T`, and
its semantics is defined by changing the context to `G Γ`. In this
framework normal lambdas correspond to the standard cartesian
adjunction `? × A ⊣ A → ?`. In other words, the type `A → T` is defined as
derivations of `T` where the context is producted with `A`.
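In hom-set form (a standard fact, spelled out here for concreteness), that cartesian adjunction reads:

```latex
\mathrm{Hom}_C(\Gamma \times A,\; T) \;\cong\; \mathrm{Hom}_C(\Gamma,\; A \to T)
```

so `G` is the functor \(- \times A\) acting on contexts and `F` is \(A \to -\) acting on types.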

Modal operators are nice. Every pair of adjoint functors forms a
comonad. Since comonads are functors, this means that all non-modal
operators behave the same in any context, or in other words that all
first-order terms behave identically no matter what higher-order
operators you add. Also every pair of adjoint functors forms a monad,
and that monad gives you an operator like `apply`, which destructs the
new terms you are introducing. In other words, the modal operator is a
complete package of semantics, introduction form, and elimination
operator.

And modal operators give you a recipe for adding other modalities.
For example, do you want a proof assistant for temporal logic? Find an
appropriate functor `G` that describes what you want to happen to the
context, then find an adjoint `F` to it, done! You can also reason
equationally about modalities. Why are weakening and reordering true for
lambdas? Because the categorical product is commutative and has maps
like `A × B → A`.

## Modal de Bruijn indices

Modal operators suggest an improvement to explicit substitution calculi: think of substitution as a modal operator.

Imagine an operator `S` which conceptually increments all of the
variable references that appear inside it. You can think of `S` as
meaning "successor", as if the variable references have indices
written in unary. To refer to a variable N levels up, we can write
`(S (S (S ... (S @0))))`, where there are N `S` terms to increment the
variable reference to the right count. Naturally we write `O` instead of
`@0`, and no longer need any other variable references.

Here's an example. Consider the function \(\lambda x. \lambda y. x\). With de Bruijn indices, we don't write variable names in the \(\lambda\) binding, and in the variable reference of \(x\) we use the index "1" (because "y" was bound most recently, giving it index "0", while "x" was bound next, giving it index "1").

So we write this expression as `(lambda (lambda (S O)))`.

But we can apply `S` to expressions as well as variable references. For
example, the function \(\lambda x. \lambda y. x * x\) can be written
`(lambda (lambda (* (S O) (S O))))` but it can also be written
`(lambda (lambda (S (* O O))))`. This is useful for copy-and-pasting. If you
want a constant function that returns the value of some expression `E`,
you can write that as `(lambda (S E))`, and any variable references in `E`
are automatically incremented to refer to the right thing.

Instead of thinking of `S` as incrementing all variable references below
it, you can also think of it as deleting the most recently introduced
variable binding—a modal operator.

## Reasoning equationally

This one trick lets you reason about binding equationally. The
important operators here are `let`, `lambda`, `S`, `O`, and `apply`. I'll write `f`
for any functions not otherwise listed; I write it in single-variable
form but you can generalize to multiple variables.

First, let's do the rules for generic functions. They commute with the operators:

`(S (f x)) = (f (S x))`

`(let v (f x)) = (f (let v x))`

This is because as a function, not an operator, `f` does not modify or
reference the stack.

Next, let's show how `let` interacts with variable binding. Recall that
`(let v body)` binds an anonymous variable to the value `v` before
executing `body`:

`(let v O) = v`

`(let v (S body)) = body`

The second rule shows how `let` and `S` are inverses of a sort. Note that
if `f` has more than one argument, this second rule lets you generalize
the commutativity of `let` and `f`:

`(f x (let v y)) = (f (let v (S x)) (let v y)) = (let v (f (S x) y))`

That said, if you have multi-argument first-order operators, you will
probably want to encode all variations of the above as basic rules, so
your rewrite system does not need to manufacture the `v` in the first
step.

Finally, on the basis of these stack-manipulation operators we can
implement `lambda` and `apply`:

`(apply (lambda body) arg) = (let arg body)`

`(S (apply f x)) = (apply (S f) (S x))`

`(let v (apply f x)) = (apply (let v f) (let v x))`

Note that this last rule is just saying that `apply` is a normal
function that does not manipulate the stack.

These rules are enough to execute simple code. For example, to evaluate \((\lambda x. \lambda y. x) 1 2\), we have:

- Start with `(apply (apply (lambda (lambda (S O))) 1) 2)`
- Apply rule 6 to get `(apply (let 1 (lambda (S O))) 2)`
- Apply rule 4 to get `(apply (let 1 (lambda (S O))) (let 1 (S 2)))`
- Apply rule 8 to get `(let 1 (apply (lambda (S O)) (S 2)))`
- Thinking of "`2`" as a zero-ary operator, apply rule 1 to get `(let 1 (apply (lambda (S O)) 2))`
- Apply rule 6 to get `(let 1 (let 2 (S O)))`
- Apply rule 4 to get `(let 1 O)`
- Apply rule 3 to get `1`
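As a sanity check, here is a sketch (my own recursive normalizer, not an e-graph implementation) that orients these equations as left-to-right reductions and reproduces the evaluation above. The one wrinkle is an application whose function is a stuck `let`, which I resolve by fusing rules 4 and 8 just as the worked example does:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Term {
    Num(i64),
    O,                         // the variable reference @0
    S(Box<Term>),              // delete the most recent binding
    Let(Box<Term>, Box<Term>), // (let v body)
    Lambda(Box<Term>),
    Apply(Box<Term>, Box<Term>),
}
use Term::*;

fn eval(t: &Term) -> Term {
    match t {
        Num(n) => Num(*n),
        O => O,
        Lambda(b) => Lambda(b.clone()),
        S(e) => match eval(e) {
            Num(n) => Num(n), // rule 1 with a zero-ary operator: (S 2) = 2
            e2 => S(Box::new(e2)),
        },
        Apply(f, x) => match eval(f) {
            // rule 6: (apply (lambda body) arg) = (let arg body)
            Lambda(body) => eval(&Let(x.clone(), body)),
            // rules 4 + 8: (apply (let v g) x) = (let v (apply g (S x)))
            Let(v, g) => eval(&Let(v, Box::new(Apply(g, Box::new(S(x.clone())))))),
            f2 => Apply(Box::new(f2), Box::new(eval(x))),
        },
        Let(v, body) => match eval(body) {
            O => eval(v),       // rule 3: (let v O) = v
            S(inner) => *inner, // rule 4: (let v (S body)) = body
            Num(n) => Num(n),   // rule 2 with a zero-ary operator
            b2 => Let(v.clone(), Box::new(b2)),
        },
    }
}
```

Running this on the example term really does produce `1`, with each recursive call corresponding to one of the local rewrites above.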

## Conclusions and future work

I think this is one promising mechanism for representing higher-order languages in an e-graph, with definitional equality for alpha renaming and local rewrite rules for beta substitution. I hope to evaluate this against an implementation of implicit substitution lambda calculus that the Egg folks already wrote and see if it is better in some way.

I also suspect that there are more optimizations to make. For example,
if you have the term `(let v e)`, and the e-class for `e` contains the
e-node `(S e2)` along with other e-nodes like `(f e3)`, we should only
consider the `(let v (S e2)) -> e2` rewrite instead of applying
`(let v (f e3)) -> (f (let v e3))`, even though both rewrites are
valid. This should cut down on unnecessary copies.

It would be cool to use Egg to build a proof assistant. It would be kind of like Zombie, in that recursion would not be native but congruence would be. It's also interesting to consider adding other modal operators, though I don't have great examples to consider. Considering both, it would be awesome to have a proof assistant for more general modal type theories.

## Footnotes:

^{1}

If you're familiar with the meta-theory of CiC or, say, HoTT, you can think of this as a definitional / propositional equality thing. E-graphs are great at representing propositional equality, but definitional equalities are even better because they are "free".

^{2}

I suspect it could be done with e-graph intersection, a powerful and under-explored technique, but that would still be slower than you'd like.

^{3}

This numbering scheme is essential for variable names to be context-invariant.