Using Egg in Herbie
Over the last semester of 2019, Oliver Flatt rewrote Herbie's simplifier—its most important and most expensive component—to use Max Willsey's Egg library. There were quite some difficulties involved, but this change speeds up Herbie by 2.8× and opens up opportunities for further improvements in the future.
E-graphs
Herbie's simplification component takes an algebraic expression and simplifies it. The main intended benefit is to remove algebraic cancellations, which would otherwise introduce a lot of error, and to clean up the results of other rewrites, which can't always be counted on to clean up after themselves.
Early on, Herbie used a traditional canonicalization-based algebraic simplifier, but we found that we had to keep adding more and more hacks to it in order to fix more and more kinds of junk introduced by rewrites. For example: how much of the distributive property do you want to apply? (Enough to find extra cancellations, but no more!) When do you want to cancel inverse functions?
Eventually, we found that ever-more-hacks was not a sustainable path and switched to an interesting data structure from the compilers world: equivalence graphs. In short, an equivalence graph stores equivalence classes of expressions, where each expression stores pointers to equivalence classes of its subexpressions. This structure makes it possible to compactly store exponentially-many expressions, which differ only via local rewrites, in a linear amount of space.
Herbie's equivalence graph implementation was painstakingly written by Alex Sanchez-Stern in Racket. It uses a traditional algorithmic approach, and involved years of debugging to get all the bugs out. As a consequence, it is not the fastest implementation you could imagine, and indeed roughly 40% of Herbie's runtime was spent inside simplification. And that was after a sustained effort to improve our use of the e-graph, including a 2× speed-up in Herbie 1.3 and further bug-fixes since then.
Transitioning to Egg
Egg, on the other hand, uses a novel algorithmic approach that seems to be 30× faster than the traditional one (the paper is being written!) while also being written in Rust and with at least some effort made toward cache-friendly tuning. That makes it orders of magnitude faster than Herbie's Racket version, while also able to scale to much larger e-graphs (in case we need them). Naturally, Egg is an attractive option for Herbie, so Oliver got to work exploring it and seeing it through.
Using Egg was not immediate. The Racket e-graph implementation integrated directly with Herbie, sharing its list of rules, its data structures, and some internal functions. Egg is in Rust, and is a separate project, so all that had to happen some other way. Thus, to use Egg, we needed to write a way to serialize both rewrite rules (which Herbie uses to define the mathematical operators) and expressions.
An essential aspect of simplification is evaluating constant
expressions (at least when the evaluation can be done exacts: 2/2
evaluates to 1
but sqrt(2)
is left alone). Herbie used to use the
exactness rules of Racket eval
, and those needed to be replicated in
Rust.
And finally, a lot of plumbing had to be written to compile the Rust version to a binary, distribute it on all platforms, install it automatically when installing Herbie, fall back to the Racket version if necessary, and integrate the same logging and metrics that the Racket version had.
All that has now been done! The hope is that Herbie 1.4, whenever it is released, will transparently use Egg when available.
Results
So what were the benefits of using Egg? Well, the first set of benefits is that, expression for expression, Egg is much faster. To quantify this, I looked at Herbie's runtime with the old e-graph and with Egg, for the same seed, on the same benchmarks (the full Herbie benchmark suite). The result is that Herbie overall is 2.81× faster. Simplification is now 4.1%, not 45%, of Herbie's runtime.
Herbie's benchmark suite is divided into 9 suites, run with two different sets of flags, which exercise different aspects of Herbie (different kinds of expressions). The speedup varies from 32% faster to 7× faster, with an IQR of 1.82–3.43×. That suggests to me that the speed-up is broad-based and will be seen by users as well.
That speed-up should be mostly due to the simplification phase being faster, but in principle there may also be follow-on effects on other phases, since Egg and the existing e-graph will produce slightly different expressions. To look into this, I measured the time saved in simplification and subtracted that from the total speed-up to measure the secondary effects. Surprisingly, the secondary effects are further speed-ups: 24.2% on average. That suggests that Egg is not only producing expressions faster, but it is also producing simpler expressions.
Breaking down by suite, I find that the secondary effect is positive
everywhere except the libraries
suite. That suggests that the
secondary effects have a broad base, and also suggests that I look
into libraries
to find out what's up. I did and found that we simply
need more exact computation rules, like for square root (sqrt(1)
is
exactly 1
). So I hope we'll be able to do that soon.
Finally, using Egg also allows us to try some things that weren't possible before. I've already tried raising the node limit, the key parameter to simplification, but there are more experiments with rule scheduling and back-off on the horizon. I am also playing with the possibility of making other components of Herbie more aggressive, increasing the amount of simplification we must do (given that simplification is now cheap).