By Pavel Panchekha

Shared under CC-BY-SA.

User Trust in Herbie

For NSV 2020, Zach and I were invited to give a keynote talk about numerical accuracy, and we set out to write down some lessons learned from our five years of work on Herbie. One theme of our talk, Toward Numerical Assistants, was user trust: how we can help users trust Herbie. A recent interaction on the Herbie mailing list underscored this issue.

The Problem

About a month ago, Steve Hoelzer wrote in with a fun Herbie bug. You can see the bug for yourself, but in short, Herbie proposes a strange rewrite:

\[ \sqrt{x^2 + y^2 + z^2} \leadsto \mathsf{hypot}(x, z) \]

Why does Herbie drop the \(y\) variable?

Steve, very reasonably, pointed out that this was a terrible rewrite, terrible enough that it must be a bug. And he wasn't the only reporter: a variant came up in a GitHub issue, and a third was privately reported by Ganesh. Since this came up several times, we'll definitely be fixing things and better communicating what happened. But the reason it happened is interesting.

Preprocessing

First, note the "Preprocess" block at the top of Herbie's output. That block is actually supposed to be part of the output; that is, Herbie actually rewrote the original \(\sqrt{x^2 + y^2 + z^2}\) into the following program:

[x, y, z] = sort([x, y, z])
hypot(z, x)

So, for example, the input (0, 1000, 0) actually yields the correct answer with Herbie's intended solution: (0, 1000, 0) sorts to (0, 0, 1000), at which point dropping \(y\) is correct. And this fact makes Herbie's rewritten version not totally crazy: after sorting \(x\), \(y\), and \(z\), the \(y\) variable is guaranteed not to have the largest magnitude, so dropping it never throws away the dominant term.
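To make this concrete, here is a minimal Python sketch of the rewritten program (my own transcription, with math.hypot standing in for Herbie's hypot), checked on that input:

import math

def original(x, y, z):
    # The input expression, evaluated directly.
    return math.sqrt(x * x + y * y + z * z)

def rewritten(x, y, z):
    # Herbie's suggestion: sort the arguments, then drop the middle one.
    x, y, z = sorted([x, y, z])
    return math.hypot(x, z)

print(original(0, 1000, 0))   # 1000.0
print(rewritten(0, 1000, 0))  # also 1000.0: the dropped y is one of the zeros after sorting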

Sorting variables like this is a recent Herbie feature, and it's good that it fired on this expression. But because the way we presented its output wasn't intuitive, it ended up being worse than nothing, actively misleading users! This is one way that user trust is difficult. In any case, the plan here is to overhaul how we present "preprocessing" steps like sorting to users. I have ongoing work to strengthen and improve these preprocessing steps, so improving their presentation will continue to pay dividends.

Input Ranges

Moving on, Herbie's output version does poorly on an input like (1, 1, 1), where the correct output is \(\sqrt{3}\) but Herbie's rewritten version produces \(\sqrt{2}\) instead. This has to do with the input range Herbie assumes. When you call Herbie without specifying a precondition, it assumes that each variable ranges from \(-10^{308}\) to \(10^{308}\), with values as small as \(10^{-308}\). In other words, Herbie assumes any double-precision value is an equally likely input.

This means that, at the vast majority of \((x, y, z)\) points, the \(x\), \(y\), and \(z\) values differ dramatically in magnitude. So there's little harm in dropping a variable, as long as you don't drop the largest one, and after sorting, \(y\) is never the largest one. That's why Herbie thinks the result is very good (0.5 bits of error on average) when in fact it is pretty bad. If you give Herbie input ranges for the variables, like perhaps \(0.001 < x < 1000\) and similarly for \(y\) and \(z\), it avoids the bad rewrite.
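Here is a rough Python sketch of that argument; the sampler below is my crude approximation of "any double is equally likely", not Herbie's actual sampling, and the three-argument math.hypot (Python 3.8+) serves as the accurate reference:

import math, random

def rewritten(x, y, z):
    # Herbie's suggestion: sort, then drop the middle value.
    x, y, z = sorted([x, y, z])
    return math.hypot(x, z)

def random_double():
    # Crude stand-in for Herbie's default input distribution:
    # random sign, random mantissa, and a decimal exponent drawn
    # uniformly from (roughly) the whole double-precision range.
    return random.choice([-1, 1]) * random.uniform(1, 2) * 10.0 ** random.randint(-300, 300)

bad = 0
for _ in range(10000):
    x, y, z = random_double(), random_double(), random_double()
    reference = math.hypot(x, y, z)  # three-argument hypot: an accurate sqrt(x^2 + y^2 + z^2)
    if not math.isclose(rewritten(x, y, z), reference, rel_tol=1e-9):
        bad += 1
print(bad / 10000)  # a small fraction: at most sampled points, dropping y changes nothing measurable

Restricting the variables to a bounded range like \(0.001 < x < 1000\) should flag far more points, which is closer to the behavior Steve actually saw.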

This error metric is a long-standing feature of Herbie, and there's a lot to recommend it. But this default isn't obvious to users, so again Herbie's output metrics mislead, and in any case the "average error" metric Herbie uses is often unintuitive. So one thing we're planning for the next Herbie version is more metrics, to help users understand which points got more accurate, and which got less accurate, with Herbie's rewrite. And come to think of it, perhaps we should nudge users more strongly to give us preconditions. This is another thing to think about.
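For reference, Herbie's per-point error is, roughly, the base-2 logarithm of how many double-precision values lie between the computed answer and the true one; the sketch below is my approximation of that idea, not Herbie's implementation:

import math, struct

def ordinal(x):
    # Map a finite double to an integer so that adjacent doubles map to adjacent integers.
    u = struct.unpack("<Q", struct.pack("<d", x))[0]
    return u if u < 2**63 else 2**63 - u

def bits_of_error(approx, exact):
    # Roughly: log2 of the number of floating-point values between the two results.
    return math.log2(abs(ordinal(approx) - ordinal(exact)) + 1)

print(bits_of_error(math.sqrt(2), math.sqrt(3)))  # about 50 bits: most of the answer is wrong
print(bits_of_error(1000.0, 1000.0))              # 0.0 bits: exactly right

An average like 0.5 bits can therefore mean that most sampled points are essentially exact while a few are badly wrong, which is exactly the kind of summary that is easy to misread.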

Why now?

Now, average error is nothing new, but we got several bug reports about this hypot issue in the span of a few weeks. That's kind of surprising on its own, and it shows how innocuous improvements can reveal long-standing flaws.

For example, Taylor expansion was recently rewritten, and one consequence is that it can now generate zero-order approximations. Here it generates a zero-order approximation in \(y\), which is how it gets \(\mathsf{hypot}(x, z)\). Herbie's other passes are unlikely to drop variables, and in previous versions Herbie would only generate order-3 Taylor series, so dropping variables didn't use to happen, even though nothing in Herbie's internals really guaranteed that. As a result, the presentation and trust issues above never came up!
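For illustration, here is the standard series expansion of the original expression in \(y\) around \(y = 0\) (my own derivation, not Herbie's internal one); truncating it after the first term gives exactly \(\mathsf{hypot}(x, z)\):

\[ \sqrt{x^2 + y^2 + z^2} = \sqrt{x^2 + z^2} + \frac{y^2}{2\sqrt{x^2 + z^2}} - \frac{y^4}{8\,(x^2 + z^2)^{3/2}} + \cdots \]

That zero-order truncation is a fine approximation when \(y^2 \ll x^2 + z^2\), which is the regime Herbie's default input distribution emphasizes, and a poor one at points like (1, 1, 1).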

Likewise, while the preprocessing steps are new, the front-end styling they use is borrowed from preconditions. Users probably ignored the precondition blocks too, but that might not have been a problem, since the precondition came from the user to begin with. Once we gave the same styling to generated code, though, problems appeared.

Anyway, now that we know how old and creaky some of this code is, like the frontend layout and styling, we can rewrite and improve it for the next release.

Conclusion

Herbie is now about seven years old, and while it has spawned some new research papers, most of my work on Herbie is improving the design, simplifying the internals, and generalizing internal components. That's because my interactions with users always reveal that Herbie is already far from living up to its own implemented potential: many of Herbie's outputs and ideas are ignored because they are poorly presented, misunderstood, or hard for users to implement. That means the best way to make Herbie better is to work on those things, not on new core algorithms.