By Pavel Panchekha. Share under CC-BY-SA.

Binding in E-graphs

E-graphs are good. They are a powerful, general-purpose mechanism for representing sets of programs and reasoning about those sets of programs via equational rules. My colleagues at the University of Washington who work on the Egg library have been developing a "cookbook" of techniques for common subexpression elimination, synthesis, constant folding, and binding in e-graphs. This post is a new take on representing binding in an e-graph.

The problem of binding

The basic problem is this. Suppose you have an e-graph, and in that e-graph you want to represent higher-order terms like (lambda (x) x). Ideally, you'd be able to apply both first-order rewrite rules (perhaps there are first-order operators like + with identities you want to represent) and function application, both inside the e-graph. What's the best way to do so?

Well, the obvious thing to do is just encode the lambda terms directly. So perhaps you define the language like this:

use egg::{define_language, Id, Symbol};

define_language! {
    pub enum Lang {
        "+" = Add([Id; 2]),
        // ...
        "apply" = Apply([Id; 2]),
        // Two children: the bound variable's symbol, then the body.
        "lambda" = Lambda([Id; 2]),
        Symbol(Symbol),
    }
}

I define some first-order operators (here, +), and then the basics of a binding system: (lambda v e), (apply e1 e2), and variable references v.

This representation is simple and direct, and if you don't need to do anything to the lambdas it's good enough. But it has some flaws.

de Bruijn indices

The first flaw we run into is that lambda terms are not identical when alpha renamed; that is, (lambda x x) is not the same exact term as (lambda y y). Because e-graphs are all about congruence, this means that (apply (lambda x x) 2) is not the same term as (apply (lambda y y) 2), and any reasoning you do about the first has to be repeated for the second.
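To see this concretely, here's a quick sketch using the Lang above (note that "2" parses as a symbol here, since this Lang has no number variant): the two terms land in different e-classes, and congruence alone will never merge them.

use egg::{EGraph, RecExpr};

fn main() {
    let mut egraph: EGraph<Lang, ()> = EGraph::default();
    let a: RecExpr<Lang> = "(apply (lambda x x) 2)".parse().unwrap();
    let b: RecExpr<Lang> = "(apply (lambda y y) 2)".parse().unwrap();
    let ia = egraph.add_expr(&a);
    let ib = egraph.add_expr(&b);
    egraph.rebuild();
    // Alpha-equivalent, but the e-graph doesn't know that:
    assert_ne!(egraph.find(ia), egraph.find(ib));
}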

Now, one option is to manually add these equalities to the e-graph.1 But finding alpha-equivalent terms in an e-graph is not easy.2

Since this problem comes up in a lot of other rewrite systems and proof assistants, there is a standard solution: de Bruijn indices. This means that instead of thinking of variables as named, we think of them as numbered, with the 0 variable being the most-recently-introduced variable and higher numbers moving outward in scope.3

use egg::{define_language, Id};

define_language! {
    pub enum Lang {
        "+" = Add([Id; 2]),
        // ...
        "apply" = Apply([Id; 2]),
        "lambda" = Lambda([Id; 1]),
        // A de Bruijn index: 0 is the innermost binder.
        Variable(i32),
    }
}

Note that variables are now represented by integers and lambdas do not have a symbol they bind. So now (lambda x x) and (lambda y y) are both written (lambda @0), and facts about the first are automatically facts about the second. (I will write variable references with an @ sign to avoid confusing them with integer constants.)
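To make the renumbering concrete, here's a sketch of the usual conversion from named terms to de Bruijn terms, on plain ASTs rather than e-graphs (the Named and DB types and to_db are my names, for illustration):

#[derive(Clone)]
enum Named { Var(String), Lambda(String, Box<Named>), Apply(Box<Named>, Box<Named>) }

#[derive(Clone)]
enum DB { Var(u32), Lambda(Box<DB>), Apply(Box<DB>, Box<DB>) }

// `env` lists the variables in scope, innermost binder first, so a
// variable's index is just its position in `env`.
fn to_db(t: &Named, env: &mut Vec<String>) -> DB {
    match t {
        Named::Var(x) => {
            let i = env.iter().position(|y| y == x).expect("unbound variable");
            DB::Var(i as u32)
        }
        Named::Lambda(x, body) => {
            env.insert(0, x.clone());
            let body = to_db(body, env);
            env.remove(0);
            DB::Lambda(Box::new(body))
        }
        Named::Apply(f, a) =>
            DB::Apply(Box::new(to_db(f, env)), Box::new(to_db(a, env))),
    }
}

Run on (lambda x x) and (lambda y y), this produces the exact same DB term, Lambda(Var(0)).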

Explicit substitution

This fixes alpha renaming. Now let's think about beta substitution.

Basically, we want a way to rewrite (apply (lambda e1) e2) to some other term. The term we want to get is (see the sketch after this list):

  1. e1
  2. Except with every @0 reference not under another lambda replaced by e2
  3. And likewise with each @k reference under k nested lambdas replaced by e2, with e2's own references shifted up by k
  4. And with the remaining references to variables bound outside the lambda decremented

This is a basic operation inside every proof assistant, and it is famously annoying to implement, but it is doable. However, for an e-graph this procedure has several drawbacks:

  1. It involves creating a lot of new terms, which is slow and blows up the e-graph
  2. Modifying all the other references means every reference to a bound variable ends up duplicated with every possible index
  3. It involves traversing the term, and in an e-graph it is not at all obvious what the best way to do that is

Numbers 1 and 2 are not ideal but 3 is the real killer. Traversing a term is not trivial in an e-graph because the whole point of an e-graph is to have a lot of sharing. E-graphs can contain loops, so simple recursions become complex traversal algorithms. Also it's just not very "e-graphy"—e-graphs prefer local rewrites, not global traversals—which is both an aesthetic judgement and a summary of the other points.

Explicit substitution calculi address this problem. Normally when we define a language semantics we define application via a step like:

(apply (lambda v e1) e2) -> e1[v -> e2]

This kind of assumes you know what substitution is. So sometimes people go out of their way to define substitution, with rules like these:

  • x[x -> e] = e
  • x[y -> e] = x (when x ≠ y)
  • f(e1)[x -> e2] = f(e1[x -> e2])

Explicit substitution calculi go one step further and consider the square brackets a syntactic expression in the language, whose semantics is defined by the equations above, instead of just an operator on expressions used when running the program. Imagine, in other words, that the programming language had subst as a built-in operator, and you could write a program like (subst x 3 (+ x 5)), which of course would evaluate to 8.

The benefit of explicit substitutions in an e-graph is that we've broken down our big complicated traversal of a term into small local rewrites.
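For instance, if we extend the named-variable Lang from earlier with a hypothetical "subst" = Subst([Id; 3]) operator, the third equation above becomes an ordinary local rewrite in egg (a sketch; subst_rules is my name):

use egg::{rewrite, Rewrite};

fn subst_rules() -> Vec<Rewrite<Lang, ()>> {
    vec![
        // f(e1)[x -> e2] = f(e1[x -> e2]), instantiated for f = +:
        rewrite!("subst-add";
            "(subst ?x ?e (+ ?a ?b))" => "(+ (subst ?x ?e ?a) (subst ?x ?e ?b))"),
    ]
}

No traversal code anywhere: the e-graph's rewrite engine pushes the substitution down one layer at a time.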

Modal type theories

This is a bit of a digression, but let me briefly discuss the semantics of explicit substitution lambda calculi. If you don't like category theory, skip this section.

Imagine a first-order language with no contexts, just expression trees. Those expressions have types like lists, products, sums, and so on—but no lambdas. All its operators construct and operate on values. For example, here's product:

Γ ⊢ a : A    Γ ⊢ b : B
----------------------
Γ ⊢ (pair a b) : A × B

When we add lambdas, though, we do something weird. Lambdas don't operate on values—they operate on contexts:

    Γ, x : A ⊢ e : T
------------------------
Γ ⊢ (lambda x e) : A → T

The semantics of a term is a map in a certain category \(C\), where we think of objects in the category as contexts and the map as going from the input context to the output type. Standard type formation rules are functors on this category.

In this setup, lambda is a modal operator. Generically, a modal operator is a term former s whose semantics is given by a pair of adjoint functors \(F, G : C \to C\). Adjointness means that maps Γ → F T correspond to maps G Γ → T, which we can write as the following typing rule:

  G Γ ⊢ e : T
----------------
Γ ⊢ (s e) : F T

In other words, s is a special form that is a type former for F T, and its semantics is defined by changing the context to G Γ. In this framework normal lambdas correspond to the standard cartesian adjunction ? × A ⊣ A → ?. In other words, the type A → T is defined as derivations of T where the context is producted with A.
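Concretely, for lambdas this adjointness is just currying, the isomorphism

\[ \mathrm{Hom}(\Gamma \times A,\ T) \;\cong\; \mathrm{Hom}(\Gamma,\ A \to T). \]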

Modal operators are nice. Every pair of adjoint functors forms a comonad. Since comonads are functors, this means that all non-modal operators behave the same in any context, or in other words that all first-order terms behave identically no matter what higher-order operators you add. Also every pair of adjoint functors forms a monad, and that monad gives you an operator like apply, that destructs the new terms you are introducing. In other words, the modal operator is a complete package of semantics, introduction form, and elimination operator.

And modal operators give you a recipe for adding other modalities. For example, do you want a proof assistant for temporal logic? Find an appropriate functor G that describes what you want to happen to the context, then find an adjoint F to it, done! You can also reason equationally about modalities. Why do weakening and exchange hold for lambdas? Because the categorical product is commutative and has maps like A × B → A.

Modal de Bruijn indices

Modal operators suggest an improvement to explicit substitution calculi: think of substitution as a modal operator.

Imagine an operator S which conceptually increments all of the variable references that appear inside it. You can think of S as meaning "successor", as if the variable references have indices written in unary. To refer to a variable N levels up, we can write (S (S (S ... (S @0)))), where there are N S terms to increment the variable reference to the right count. Naturally we write O instead of @0, and no longer need any other variable references.

Here's an example. Consider the function \(\lambda x. \lambda y. x\). With de Bruijn indices, we don't write variable names in the \(\lambda\) binding, and in the variable reference of \(x\) we use the index "1" (because "y" was bound most recently, giving it index "0", while "x" was bound just before it, giving it index "1").

So we write this expression (lambda (lambda (S O))).

But we can apply S to expressions as well as variable references. For example, the function \(\lambda x. \lambda y. x * x\) can be written (lambda (lambda (* (S O) (S O)))) but it can also be written (lambda (lambda (S (* O O)))). This is useful for copy-and-pasting. If you want a constant function that returns the value of some expression E, you can write that as (lambda (S E)), and any variable references in E are automatically incremented to refer to the right thing.

Instead of thinking of S as incrementing all variable references below it, you can also think of it as deleting the most recently introduced variable binding—a modal operator.

Reasoning equationally

This one trick lets you reason about binding equationally. The important operators here are let, lambda, S, O, and apply. I'll write f for any functions not otherwise listed; I write it in single-variable form but you can generalize to multiple variables. I'll also number the rules, since the worked example below refers to them by number.
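Before the rules, here's one way to write this language down in egg. This is my sketch, not a canonical encoding; in particular, O is left to the Symbol fallback so it can appear in patterns as a nullary leaf:

use egg::{define_language, Id, Symbol};

define_language! {
    pub enum Lang {
        "+" = Add([Id; 2]),
        "apply" = Apply([Id; 2]),
        "lambda" = Lambda([Id; 1]),
        "let" = Let([Id; 2]),   // (let v body)
        "S" = Shift([Id; 1]),   // delete the most recent binding
        Num(i32),
        Symbol(Symbol),         // `O` parses as a plain symbol leaf
    }
}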

First, let's do the rules for generic functions. They commute with the operators:

  1. (S (f x)) = (f (S x))
  2. (let v (f x)) = (f (let v x))

This is because as a function, not an operator, f does not modify or reference the stack.

Next, let's show how let interacts with variable binding. Recall that (let v body) binds an anonymous variable to the value v before executing body:

  3. (let v O) = v
  4. (let v (S body)) = body

Rule 4 shows how let and S are inverses of a sort. Note that if f has more than one argument, rule 4 also lets you generalize the commutativity of let and f:

  5. (f x (let v y)) = (f (let v (S x)) (let v y)) = (let v (f (S x) y))

That said, if you have multi-argument first-order operators, you will probably want to encode all variations of the above as basic rules, so your rewrite system does not need to manufacture the v in the first step.

Finally, on the basis of these stack-manipulation operators, we can implement lambda and apply:

  6. (apply (lambda body) arg) = (let arg body)
  7. (S (apply f x)) = (apply (S f) (S x))
  8. (let v (apply f x)) = (apply (let v f) (let v x))

Note that these last two rules are just saying that apply is a normal function that does not manipulate the stack.
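In egg, the left-to-right directions of these equations are one-line rewrites (my sketch, using the Lang above). Rules 1 and 2 can't be stated once for a generic f, so they're instantiated per operator, here for +; I also encode the rule-5 "hoisting" variants for apply directly, as suggested above:

use egg::{rewrite, Rewrite};

fn rules() -> Vec<Rewrite<Lang, ()>> {
    vec![
        // Rules 1 and 2, instantiated for +:
        rewrite!("S-add"; "(S (+ ?a ?b))" => "(+ (S ?a) (S ?b))"),
        rewrite!("let-add"; "(let ?v (+ ?a ?b))" => "(+ (let ?v ?a) (let ?v ?b))"),
        // Rules 3 and 4:
        rewrite!("let-O"; "(let ?v O)" => "?v"),
        rewrite!("let-S"; "(let ?v (S ?body))" => "?body"),
        // Rule 5, encoded directly for each argument position of apply:
        rewrite!("hoist-fun"; "(apply (let ?v ?f) ?x)" => "(let ?v (apply ?f (S ?x)))"),
        rewrite!("hoist-arg"; "(apply ?f (let ?v ?x))" => "(let ?v (apply (S ?f) ?x))"),
        // Rules 6, 7, and 8:
        rewrite!("beta"; "(apply (lambda ?body) ?arg)" => "(let ?arg ?body)"),
        rewrite!("S-apply"; "(S (apply ?f ?x))" => "(apply (S ?f) (S ?x))"),
        rewrite!("let-apply"; "(let ?v (apply ?f ?x))" => "(apply (let ?v ?f) (let ?v ?x))"),
    ]
}

Note these are directed versions of the equations. The derivation below uses rules 4 and 8 right to left; the hoisting rules cover the case it needs, but in general the right-to-left direction of rule 4 has to invent v out of thin air, so it can't be an ordinary rewrite.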

These rules are enough to execute simple code. For example, to evaluate \((\lambda x. \lambda y. x) 1 2\), we have (a mechanized version follows the derivation):

  • Start with (apply (apply (lambda (lambda (S O))) 1) 2)
  • Apply rule 6 to get (apply (let 1 (lambda (S O))) 2)
  • Apply rule 4, right to left, to get (apply (let 1 (lambda (S O))) (let 1 (S 2)))
  • Apply rule 8, right to left, to get (let 1 (apply (lambda (S O)) (S 2)))
  • Thinking of "2" as a zero-ary operator, apply rule 1 to get (let 1 (apply (lambda (S O)) 2))
  • Apply rule 6 to get (let 1 (let 2 (S O)))
  • Apply rule 4 to get (let 1 O)
  • Apply rule 3 to get 1
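With the sketches above, egg can carry out this derivation mechanically; the hoisting rules stand in for the right-to-left steps, and AstSize extraction then picks out the constant:

use egg::{AstSize, Extractor, RecExpr, Runner};

fn main() {
    let expr: RecExpr<Lang> = "(apply (apply (lambda (lambda (S O))) 1) 2)"
        .parse().unwrap();
    let runner = Runner::<Lang, ()>::default().with_expr(&expr).run(&rules());
    let (_cost, best) = Extractor::new(&runner.egraph, AstSize).find_best(runner.roots[0]);
    println!("{}", best); // prints: 1
}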

Conclusions and future work

I think this is one promising mechanism for representing higher-order languages in an e-graph, with definitional equality for alpha renaming and local rewrite rules for beta substitution. I hope to evaluate this against an implementation of implicit substitution lambda calculus that the Egg folks already wrote and see if it is better in some way.

I also suspect that there are more optimizations to make. For example, if you have the term (let v e), and the e-class for e contains the e-node (S e2) along with other e-nodes like (f e3), you should only consider the (let v (S e2)) -> e2 rewrite instead of applying (let v (f e3)) -> (f (let v e3)), even though both rewrites are valid. This should cut down on unnecessary copies.

It would be cool to use Egg to build a proof assistant. It would be kind of like Zombie, in that recursion would not be native but congruence would be. It's also interesting to consider adding other modal operators, though I don't have great examples to consider. Considering both, it would be awesome to have a proof assistant for more general modal type theories.

Footnotes:

1

If you're familiar with the meta-theory of CiC or, say, HoTT, you can think of this as a definitional / propositional equality thing. E-graphs are great at representing propositional equality, but definitional equalities are even better because they are "free".

2

I suspect it could be done with e-graph intersection, a powerful and under-explored technique, but that would still be slower than you'd like.

3

This numbering scheme is essential for variable names to be context-invariant.