Floating Point Error in Polynomials
I recently had an interesting conversation with Xin Yi, of NUDT, about accurately evaluating the polynomial \(P(x) = 5x^2 - 3\) near its root \(x = \sqrt{3/5}\). He wanted to know: what's the most accurate way to evaluate this polynomial, and why doesn't Herbie find it?
Error in evaluating polynomials
Evaluating a polynomial P(x) at one of its roots is bound to be inaccurate. If you pick a floating-point input x, think of it as representing a range (x - ε, x + ε). Very close to a zero, P(x - ε) and P(x + ε) are bound to be pretty far apart. For example, let x be as close to the root as possible. Then likely one of P(x - ε) and P(x + ε) will be positive and one will be negative.
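Here's a quick way to see that concretely in Python (a sketch, assuming IEEE double precision; math.nextafter needs Python 3.9+), evaluating P at the float closest to the root and at its immediate neighbors:

```python
import math

def P(x):
    return 5 * x * x - 3

# The double closest to the root sqrt(3/5) (math.sqrt rounds to nearest,
# so this is within a float or two of the true root).
x = math.sqrt(3 / 5)

# Its immediate floating-point neighbors on either side.
below = math.nextafter(x, 0.0)
above = math.nextafter(x, 1.0)

print(P(below), P(x), P(above))
# The values straddle zero: typically one neighbor gives a negative result
# and the other a positive one.
```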
Let's see how much relative error that causes. Here, x is between ½ and 1, so ε is on the order of \(10^{-16}\). Since \(P\) is roughly linear near its root, P(x+ε) and P(x-ε) will also have magnitude around \(10^{-16}\), but they'll have different signs,1 [1 Or, one will be zero and one will not—similar reasoning applies.] and there are a lot of floating point values between \(10^{-16}\) and \(-10^{-16}\) (roughly half of all floating point values, or 63 bits of error).2 [2 If one is zero, it'll be 62 bits of error.]
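The "roughly half of all floating point values" claim is easy to check by reinterpreting doubles as ordered integers (a sketch; assumes the usual IEEE 754 binary64 layout):

```python
import math
import struct

def ordinal(x):
    """Map a double to an integer so that adjacent doubles map to adjacent integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats sort backwards as two's-complement integers; flip them.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

count = ordinal(1e-16) - ordinal(-1e-16)
print(count, math.log2(count))
# About 2^63 doubles lie between -1e-16 and 1e-16 -- roughly half of all
# doubles, which is where the 63-bits-of-error figure comes from.
```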
Of course, that reasoning only applies to the particular input closest to the root of P. But for nearby inputs, P(x+ε) and P(x-ε) will still differ by something like \(10^{-15}\), and the value itself will be the same order of magnitude, so you'd still be looking at 52 or so bits of error. If you pick a range of inputs about \(10^{-5}\) wide and roughly centered on the root, the distance from the root runs from about \(10^{-16}\) up to about \(10^{-6}\), and the relative spread \([ P(x + \varepsilon) - P(x - \varepsilon) ] / P(x)\) ranges from 1 (52 bits of error) down to about \(10^{-10}\) (19 bits of error). Average those together and you get roughly 20 bits of error, which is about what Herbie measures.3 [3 I'm off by one in powers of 2 in a bunch of places because the root is between ½ and 1, not 1 and 2, and I should be using powers of 2, not powers of 10, to estimate the size of the error. That might improve the match between theory and Herbie.]
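Here's a rough version of that measurement in Python. It's a sketch, not Herbie's actual metric: I'm approximating bits of error as \(53 + \log_2\) of the relative error, using exact rational arithmetic for the true value, and the sampling range is the one described above:

```python
import math
import random
from fractions import Fraction

def P_float(x):
    return 5 * x * x - 3

def P_exact(x):
    # A float is an exact rational, and the coefficients are integers,
    # so the true value of P at this float is an exact Fraction.
    return 5 * Fraction(x) ** 2 - 3

def bits_of_error(x):
    exact = P_exact(x)
    approx = Fraction(P_float(x))
    if exact == 0 or approx == exact:
        return 0.0
    rel = abs(approx - exact) / abs(exact)
    # A relative error of about 2^-53 is ~0 bits; a relative error of 1 is ~53 bits.
    return max(0.0, 53 + math.log2(float(rel)))

root = math.sqrt(3 / 5)
samples = [root + random.uniform(-5e-6, 5e-6) for _ in range(10_000)]
print(sum(map(bits_of_error, samples)) / len(samples))
# Averages out to something in the high teens of bits for this range,
# in the ballpark of the estimate above.
```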
Evaluating polynomials more accurately
Sometimes you can improve polynomial accuracy by rewriting, such as by factoring. This can help, because if you are evaluating \(P(x) = (x - r) Q(x)\) near \(x = r\), then the evaluation of \(x - r\) in floating point is exact thanks to Sterbenz's theorem, and \(Q(x)\) will not have a root at \(r\). This guarantees that small relative changes in the input \(x\) only cause small relative changes in the output \(Q(x)\), so its condition number will be low and the overall result should be accurate.
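To see that the factoring trick does work when the root is exactly representable, here's a toy example of my own (not from the conversation): \(Q(x) = (x - 1/2)(x + 2) = x^2 + 1.5x - 1\), evaluated both ways just above its root \(x = 1/2\):

```python
import math
from fractions import Fraction

def expanded(x):
    return x * x + 1.5 * x - 1        # x^2 + 1.5x - 1

def factored(x):
    return (x - 0.5) * (x + 2)        # same polynomial, factored at its root 1/2

def rel_error(approx, x):
    true = (Fraction(x) - Fraction(1, 2)) * (Fraction(x) + 2)
    return float(abs(Fraction(approx) - true) / abs(true))

x = math.nextafter(0.5, 1.0)          # the float just above the root
print(rel_error(expanded(x), x))      # large: cancellation destroys most of the bits
print(rel_error(factored(x), x))      # tiny: x - 0.5 is exact, and x + 2 is well-conditioned
```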
However, in this case, Herbie finds that the factored version of this polynomial, \(5 (x - \sqrt{3/5}) (x + \sqrt{3/5})\), and its variants don't seem to change the accuracy much. I think that's because \(\sqrt{3/5}\) is not exactly representable in floating point, so while the subtraction happens exactly, you are subtracting the wrong thing! You can improve this a bit with something like \(5 ((x - r_1) - r_2) (x + \sqrt{3/5})\), where \(r_1\) is the floating-point value of \(\sqrt{3/5}\) and \(r_2\) is the approximation error there, which I calculate to be \(-2.7242061734927363 \cdot 10^{-17}\). Note that I only applied this trick to the one root we're evaluating near, because \(x\) is not going to be near any root of the other factor, so that factor will be evaluated more or less accurately.
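Here's what that looks like in code, a sketch where I compute \(r_2\) with the decimal module at extra precision rather than hard-coding it (the last factor uses \(r_1\) too, since \(\sqrt{3/5}\) has to be rounded to something anyway):

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50                    # plenty of extra precision

root = (Decimal(3) / Decimal(5)).sqrt()   # sqrt(3/5) to ~50 digits
r1 = float(root)                          # nearest double to the root
r2 = float(root - Decimal(r1))            # the part the double can't represent

def P_naive(x):
    return 5 * x * x - 3

def P_compensated(x):
    # 5 * ((x - r1) - r2) * (x + sqrt(3/5)), with the root split as r1 + r2
    return 5 * ((x - r1) - r2) * (x + r1)

print(r1, r2)   # r2 should come out near the -2.72e-17 quoted above
x = math.nextafter(r1, 1.0)
print(P_naive(x), P_compensated(x))
```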
However, this is still, I think, a bad approach. If you use the input \(\sqrt{3/5} \approx 0.7745966692414834\), the approach where you subtract more and more approximation errors will get you closer and closer to the value of \(P(0.7745966692414834)\), but if you actually wanted to get \(P(\sqrt{3/5})\), it will not help and may well hurt.
This is the logic of the condition number: small relative changes in the input lead to large relative changes in the output. The condition number is independent of the implementation, and for \(P\) near its root, it is large. Any implementation of \(P\) will thus have the problem that floating-point inputs are inaccurately specified, and for the range of values you are looking at, even small inaccuracies in the inputs produce large inaccuracies in the output.
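To make that concrete, the standard relative condition number of evaluating \(P\) at \(x\) works out to \[ \kappa(x) = \left| \frac{x\,P'(x)}{P(x)} \right| = \left| \frac{10x^2}{5x^2 - 3} \right|, \] which blows up as \(x\) approaches \(\pm\sqrt{3/5}\), no matter how the expression is rearranged.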
Why Herbie does not help
Herbie in particular can't usually do much to improve the polynomial computation.4 [4 On my machine, it does something weird, which seems to help. I don't understand it.] That's because we never taught Herbie any polynomial smarts, besides basic ones like factoring \(x^2 - y^2\). Herbie thus works poorly on this example.
Nonetheless, Herbie works well on the NMSE examples and a variety of other real-world problems. I think the kinds of inaccuracy in the NMSE examples, like cancellation and overflow, are different than the problems here. Herbie is not good at the sort of finicky analysis described above (it is a dumb search algorithm, after all).
Furthermore, since Herbie uses simplistic random sampling, it would not even find the inaccuracy in this polynomial unless given a narrow range of inputs near the root of \(P\).
And as I described above, the ill-conditioned nature of the problem also suggests that there will not be useful rewrites of this program anyway.