The Promise of P-Graphs
In Herbie we use e-graphs to do algebraic rewriting of mathematical expressions. For example, in the Rust standard library's acosh function, e-graphs rewrite log(sqrt(x^2 + 1))

- into log(1 + sqrt(x^2 + 1) - 1)
- into log1p(sqrt(x^2 + 1) - 1)
- into log1p(x^2 / (sqrt(x^2 + 1) + 1))
- and finally into log1p(x / (sqrt(1 + (1/x)^2) + (1/x))).
Still, they're bad at some things, and one that really stands out is basic sums and products. Canceling like terms, distributing products over sums, and factoring are hard for E-graphs because they can require many, many rewrite steps, during which time the E-graph will grow exponentially. At its core, this is because the E-graph does a brute-force, breadth-first search for a sequence of rewrites that cancels, distributes, or factors. That's just not an efficient way to do things!
In fact, powerful decision procedures for sums and products (the "ring axioms") already exist. We should use them.
This post is shaped by conversations with Jackson Brough and by his work prototyping basic P-graphs using SymPy and Python. Also, thank you to ChatGPT o3 for repeatedly jogging my memory about Grobner bases and various constructions using them.
P-graph Intuition
A P-graph is a hypothetical extension of E-graphs with basic polynomial reasoning built in—polynomials being what you get by taking sums and products. Specifically, a P-graph is an E-graph, but with special operations like +, *, and -, and special syntax for numeric constants.
For example, consider the example above with log(sqrt(x^2 + 1)). This expression has some ring operations (the square, the addition, and the constant 1) and some non-ring operations (log, sqrt, and x). As a P-graph, it has three p-nodes:
- P-node \(e_1\) contains an x term
- P-node \(e_2\) contains a sqrt term with argument \(e_1^2 + 1\)
- P-node \(e_3\) contains a log term with argument \(e_2\)
Note that the special ring operations don't create p-nodes. This is critical, because P-graphs are "unbounded" in a certain way: they always consider all polynomial terms, unlike an E-graph, which only considers terms that have somehow been explicitly added to it. So, for example, the first step of our derivation rewrites log(sqrt(x^2 + 1)) into log(1 + (sqrt(x^2 + 1) - 1)); in a P-graph, this does not require an explicit rewrite step, because the right hand side is already in the P-graph.
E-graphs can contain equalities over e-nodes; likewise, P-graphs can contain equalities over p-nodes. For example, in an E-graph I can merge log(1 + (sqrt(x^2 + 1) - 1)) with log1p(sqrt(x^2 + 1) - 1); in the P-graph I can do the same, by adding one p-node (\(e_4\), containing a log1p term with argument \(e_2 - 1\)) and unioning two p-nodes (\(e_4\) and \(e_3\)).
However, P-graphs can also contain equalities based on algebraic identities. For example, sometime later in the derivation I need the identity sqrt(x^2 + 1)^2 = x^2 + 1, which is the polynomial identity \(e_2^2 = e_1^2 + 1\). These algebraic identities are not stored in a traditional union-find structure; instead, they are stored on the side in a polynomial-specific data structure: a Grobner basis.1 [1 In principle you could drop the union-find entirely and rely only on the Grobner basis for all equalities. In practice the union-find is very fast while Grobner bases are slow.]
Grobner Bases
Look: I took an undergrad, semester-long course on Grobner bases, and I remember bits of it at best. (Sorry, Prof. Kleiman, you did a great job.) What I do remember still requires a lot of background, which I'm not going to cover here, so here are just the very basics.
Suppose you have a set of polynomial equalities \(P_i = Q_i\). Mathematicians like to think of these as just a set of polynomials, by rewriting them to \(P_i - Q_i = 0\). Either way, given this set of equalities, we might want to test whether some other pair of polynomials, \(A\) and \(B\), are equal. Or, more generally, we might want to put a polynomial \(A\) into normal form modulo these equalities. (More generally because \(A = B\) iff \(A - B\)'s normal form is 0.)
A Grobner basis is a data structure we can build from the polynomial equalities that makes these steps fast. The details actually don't matter for this post, but just to be complete, a Grobner basis G for a set of equalities I is a set of polynomials with a leading-term property. Finding a Grobner basis for a given set of equalities is slow, but once you have it, putting polynomials in normal form is fast. Except, no, all of these operations are very slow in practical terms; but theoretically, finding a Grobner basis is super-polynomial (doubly exponential in the worst case) while reducing to normal form is O(n^3) or something else polynomial.
Even more specifically, a Grobner basis depends on the polynomial equalities and also on a variable order.2 [2 More technically, reduced Grobner bases are unique up to a monomial order, which in this post I'm assuming is always lexicographic.]
In a P-graph, we'll use Grobner bases to reason about polynomial identities and the union-find to reason about non-polynomial equalities. The polynomial identities will be over p-classes, so there will be one variable per p-class.
Formalizing P-Graphs
To define a P-graph, let's start by assuming we have a term functor L<T> defining the non-polynomial terms we are interested in. Write Ring<T> for polynomial expressions over variables of type T; the overall language of terms we'll be working with is μ T. Ring<L<T>>, terms containing both non-polynomial terms in L and polynomial terms over them. We call this large term language L+.
Define a type C of p-class names. It's going to be important to distinguish between p-class names and p-classes.
Then a p-node is an L<Ring<C>>: a non-polynomial term whose arguments are polynomials over p-classes. Each p-node is stored in a hash-cons. Note that the analog in an E-graph is L<C>, where arguments are e-class references.
A p-class is an index C corresponding to a set of p-nodes. The p-classes are stored in a union-find.

A P-graph also has a set of Ring<C> polynomials representing polynomial identities, stored as an associated Grobner basis. The Grobner basis orders the p-class references in C chronologically, with newer p-classes being later (bigger) in the order.
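Here's a minimal Python sketch of this data layout, under the assumption that p-class names in C are SymPy symbols and Ring<C> is just a SymPy polynomial over them; all of the class and method names (PNode, PGraph, add_node, and so on) are invented for illustration.

```python
from dataclasses import dataclass, field
from sympy import Symbol, groebner

@dataclass(frozen=True)
class PNode:
    op: str        # a non-polynomial operator from L, e.g. "log" or "sqrt"
    args: tuple    # each argument is a SymPy polynomial over p-class symbols

@dataclass
class PGraph:
    hashcons: dict = field(default_factory=dict)    # PNode -> p-class symbol
    parent: dict = field(default_factory=dict)      # union-find over p-classes
    classes: list = field(default_factory=list)     # p-classes in creation order
    identities: list = field(default_factory=list)  # Ring<C> polynomials (the ideal)

    def fresh_class(self):
        c = Symbol(f"e{len(self.classes) + 1}")
        self.parent[c] = c
        self.classes.append(c)
        return c

    def add_node(self, op, *args):
        node = PNode(op, tuple(args))
        if node not in self.hashcons:
            self.hashcons[node] = self.fresh_class()
        return self.hashcons[node]

    def grobner_basis(self):
        # Chronological variable order: newest p-classes come first in the
        # generator list, which makes them biggest under the lex order.
        if not self.identities:
            return None
        gens = list(reversed(self.classes))
        return groebner(self.identities, *gens, order='lex')

# The running example: log(sqrt(x^2 + 1)) flattens to three p-nodes.
g = PGraph()
e1 = g.add_node("x")
e2 = g.add_node("sqrt", e1**2 + 1)
e3 = g.add_node("log", e2)
print(e1, e2, e3)   # e1 e2 e3
```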
Like in an E-graph, there is a notion of canonicity. The main difference is that there's now a distinction between C and Ring<C>, both of which need to be canonicalized separately. A p-class index in C is canonical if it is the root of its union-find. A polynomial in Ring<C> is canonical if all its C indices are canonical, and also the polynomial itself is in reduced form according to the Grobner basis.
Finally, there is one extra bit of canonicity that ties together the union-find and the Grobner basis: if the Grobner basis proves that two p-classes are equal, they must be equal in the union-find as well. (The other direction, that equal in the union-find means equal in the Grobner basis, is implicit in the fact that the Grobner basis is over p-classes.)
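As a sketch of what canonicalization might look like, again assuming p-classes are SymPy symbols, a dict-based union-find, and a SymPy Grobner basis for the identities (the helper names here are made up):

```python
from sympy import symbols, groebner, reduced

e1, e2, e3 = symbols('e1 e2 e3')
parent = {e1: e1, e2: e2, e3: e3}                  # union-find over p-classes
G = groebner([e2**2 - (e1**2 + 1)], e3, e2, e1, order='lex')

def find(c):
    # A p-class index is canonical if it is the root of its union-find.
    while parent[c] != c:
        c = parent[c]
    return c

def canonicalize(poly):
    # A polynomial is canonical when its p-class indices are canonical and it
    # is in reduced form with respect to the Grobner basis.
    poly = poly.subs({c: find(c) for c in parent})
    _, rem = reduced(poly, G.exprs, e3, e2, e1, order='lex')
    return rem

print(canonicalize(e2**2 - e1**2))   # 1, because e2^2 = e1^2 + 1
```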
One thing I haven't quite figured out is what to do with p-nodes that are proven equal to constants, like exp(0). I think we need to keep them in the P-graph. The easy option is keeping them in the union-find, or dropping the union-find entirely and just using the Grobner basis for equality. But that's probably too slow.
P-graph operations
Like an E-graph, the main operations the P-graph supports are add and union.
Let's start with add, which takes an L+ term as its argument. Much like in an E-graph, the first step to both of these operations is flattening their L+ arguments into p-nodes that can be canonicalized. Note that, when flattening, the ring operations disappear—they don't produce p-nodes. (See the first section, above, which shows a flattening of log(sqrt(x^2 + 1)).) To add the L+ term, one adds all of the flattened p-nodes to the p-graph. That introduces new p-classes, which have to be added to the Grobner basis.
Formally, flattening converts a term in L+ into a "RecExpr", which is an array with indices in R. The array contains p-nodes of type L<Ring<R>>. Since the root of the L+ term might be a ring operation, flattening also produces an extra polynomial root : Ring<R>. Each p-node in the RecExpr is canonicalized, producing a map R -> C, which add then substitutes into root, returning the resulting Ring<C>.
The union operation takes two Ring<C> arguments (returned, for example, from add). If those arguments are both solo variables, the corresponding p-classes are unioned in the union-find. Otherwise, the corresponding polynomial \(P - Q\) is added to the ideal and the Grobner basis is extended.
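A sketch of union under the same assumptions (SymPy polynomials over p-class symbols, a dict union-find, a list of ideal generators); recomputing the whole Grobner basis is the naive choice here, and the incremental F5-style update mentioned just below is what you would actually want.

```python
from sympy import symbols, groebner

e1, e2, e3, e4 = symbols('e1 e2 e3 e4')
parent = {c: c for c in (e1, e2, e3, e4)}   # union-find over p-classes
ideal = []                                  # Ring<C> identities

def find(c):
    while parent[c] != c:
        c = parent[c]
    return c

def union(p, q):
    if p.is_Symbol and q.is_Symbol:
        parent[find(p)] = find(q)           # both solo variables: union-find
    else:
        ideal.append(p - q)                 # otherwise, extend the ideal

union(e4, e3)                               # e.g. the log1p/log merge from earlier
union(e2**2, e1**2 + 1)                     # the identity sqrt(x^2 + 1)^2 = x^2 + 1
G = groebner(ideal, e4, e3, e2, e1, order='lex')   # naive: recompute from scratch
print(find(e4) == find(e3))                 # True
print(G.exprs)                              # the (one-element) Grobner basis
```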
The modern F5 algorithm for Grobner bases makes adding both variables and polynomials "fast".
"Rebuilding" the e-graph just means repeatedly canonicalizing every
single p-node in the e-graph. If two p-classes end up containing the
same p-node, the p-classes are merged; this means both connecting
their union-find structures and also adding a c = d
equation to the
Grobner basis. (This equation can be "eliminated" so that the extra
variable is no longer tracked once rebuilding is done. This is similar
to how you do rebuilding in E-graphs with e-node identity.) With
E-graphs there is the egg worklist-parent-pointer algorithm for
rebuilding quickly; I don't know what the fast way is for P-graphs but
there probably is one. I think it would involve sorting parent
pointers by monomial order.
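Here's a toy rebuild loop, with p-nodes as (op, p-class...) tuples and a dict union-find; the Grobner-basis side of merging (adding the c = d equation and later eliminating it) is only indicated by a comment, and all the names are made up.

```python
parent = {"c1": "c1", "c2": "c2", "c3": "c3"}

def find(c):
    while parent[c] != c:
        c = parent[c]
    return c

def union(a, b):
    # In a real P-graph we would also add the equation a = b to the Grobner
    # basis (and later eliminate the redundant variable).
    parent[find(a)] = find(b)

# p-class -> set of p-nodes; arguments here are just p-class names.
classes = {
    "c1": {("f", "c2")},
    "c2": {("x",)},
    "c3": {("f", "c2")},   # same p-node as in c1, so rebuilding should merge them
}

def rebuild():
    changed = True
    while changed:
        changed = False
        seen = {}   # canonical p-node -> the p-class that owns it
        for cls, nodes in classes.items():
            for (op, *args) in nodes:
                canon = (op, *map(find, args))
                owner = seen.setdefault(canon, find(cls))
                if owner != find(cls):
                    union(owner, cls)
                    changed = True

rebuild()
print(find("c1") == find("c3"))   # True
```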
Even with just these operations, P-graphs offer interesting use cases. For example, you can add identities like sin(x)^2 + cos(x)^2 = 1, for a specific x, and have trigonometric functions reduced to normal form.
Still, the true power of E-graphs comes from rewrite rules and
E-matching, so let's now talk about P-matching.
Relational E-matching
Actually, wait, before moving on to P-matching let's talk about E-matching, with a lot of inspiration from the "relational e-matching" perspective.
Recall that a standard E-graph rewrite rule has a left hand side and a right hand side. You first match the left hand side against the E-graph, yielding a set of e-class bindings. Then you substitute those bindings into the right hand side, add that into the e-graph, and union the two. We've already discussed adding and unioning, so let's talk about matching.
The relational e-matching perspective is most useful here. This perspective flattens an L+ term into a list of p-nodes (with pointers P to pattern variables). For example, if you want to rewrite log(1 + ?x) to log1p(?x), and you were using a plain E-graph with plain e-matching, you'd flatten the pattern log(1 + ?x) into:
?c = 1
?y = ?c + ?x
?z = log(?y)
Then we eliminate these variables one by one using trie join. So, for example, maybe we decide to eliminate the c variable first; we use the constant relation for this. There's only one c that matches ?c = 1, so now we have the new pattern:
?y = 1 + ?x
?z = log(?y)
Now let's say we eliminate the y variable; we look up all e-classes that contain 1 + something in the trie, and intersect them with all e-classes that are arguments to log; perhaps there are two options, e1 and e2; this then yields the pattern:
e1 = 1 + ?x
?z = log(e1)
and also the pattern
e2 = 1 + ?x
?z = log(e2)
From there you go on to eliminate ?z and ?x and you're done.
Note a few things about this process that will be relevant to P-matching.
First, we flatten the initial pattern into what are basically p-nodes, except p-nodes over pattern variables.
Second, we order the pattern variables; this order is actually global and has to match the trie order used to index each operator type. (I believe egglog does this with on-the-fly construction plus caching.)
Third, we intersect matches from each individual p-node.
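To make the relational reading concrete, here is a toy version in Python: each operator is a relation of (output e-class, child e-classes), and the flattened pattern ?c = 1, ?y = ?c + ?x, ?z = log(?y) becomes a conjunctive query. A real implementation would use trie join instead of the nested loops shown here, and the e-class names are made up.

```python
# Toy relations: one row per e-node, (e-class, children...).
num = {("c1", 1)}                 # c1 contains the constant 1
add = {("c4", "c1", "c2")}        # c4 contains 1 + x   (c2 contains x)
log = {("c5", "c4")}              # c5 contains log(1 + x)

# Match log(1 + ?x), i.e. the query Num(c, 1), Add(y, c, x), Log(z, y).
matches = []
for (cc, v) in num:
    if v != 1:
        continue
    for (y, a, xx) in add:
        if a != cc:
            continue
        for (z, yy) in log:
            if yy == y:
                matches.append({"?x": xx, "?z": z})
print(matches)                    # [{'?x': 'c2', '?z': 'c5'}]
```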
P-matching
Let's move on. In the P-graph world rewrite rules themselves look the same; but how do you match them?
Well, the first step is to flatten the left hand side into a set of p-nodes. This gives us a set of pattern variables and a set of constraints over them. However, the constraints look like ?y = f(P₁(?x), P₂(?x)), with arguments that are polynomials over the other pattern variables.
Pick a variable to eliminate.
When the variable is on the left hand side of an equality (like ?y above), use a normal trie-join trie to find all possible e-class matches.
When the variable is on the right hand side of an equality (like ?x), things are a little harder. The trie-join trie lets us look up all p-nodes matching the f operator (and all other already-eliminated variables), but for (say) the first argument it has a bunch of polynomials Q, which we now need to match against P₁(?x).
So for each match, we need to form the equality Q = P₁(?x) and eliminate ?x. We can do that by:
- Write ?x as an explicit sum of monomials with unknown coefficients \(x_i\). Use "enough" monomials.
- Extend the Grobner basis with the variables \(x_i\). Use a block grevlex order so that we can eliminate everything except the \(x_i\).
- Eliminate.
- Now we're left with a set of polynomials over the \(x_i\), which we can solve with the standard Grobner basis technique, to get all assignments to the \(x_i\) that make the equality true. Each assignment yields a polynomial ?x over monomials in the p-classes (see the sketch below).
Ideals support "and" and "or" operations, so intersecting ?x across multiple p-nodes (or multiple arguments within a p-node) and unioning them across matches should both be doable.
In other words, P-matching is possible and actually uses the same algorithm as relational e-matching. That's because trie join is the same as Grobner bases, or more precisely, both are just quantifier elimination, where a pattern P is seen as ∃ ?x, ∃ ?y, ..., P and eliminating the quantifiers produces a large disjunction of all possible matches.
Practical Uses and Future Work
Jackson implemented a rudimentary version of this, and I'm hoping he'll write it up. Suffice to say: maybe it works?
If it could be made to work well—I'm especially concerned about performance but in general practicality is suspect—it could potentially allow much better reasoning about ring operations inside e-graphs. That could apply not only to real arithmetic, like in Herbie, but possibly also to CAD operations like in Szalinski or linear algebra like in SPORES.
One thing I do want to note is that, even though p-matching presents the same "API" as e-matching, it's not really the same. For example, consider the rewrite rule:
exp(?a + ?b) = exp(?a) * exp(?b)
Here exp is a non-polynomial term, while addition and multiplication are ring operations. The issue is that every single e-class matches the pattern ?a + ?b. Even worse, every single e-class has an infinite number of matches. For example, the e-class containing x matches with ?a = x, ?b = 0; with ?a = x + 1, ?b = -1; with ?a = x + 2, ?b = -2; and so on. I think you can detect that when rewriting (elimination doesn't work at some point), but the point is that this makes writing rules a lot harder. Maybe there's something smart you can do? Like, you store a reduced form of the last non-eliminable polynomial and use it as a "smart rewrite" as new terms pop into the p-graph? I'm not sure.