Local Error and Blame Analysis
Cindy Rubio-González recently visited UW PLSE and gave a fantastic lecture on her work—both on Precimonious and on her ICSE'16 paper on blame analysis. I spent some time this weekend reading that paper carefully and thinking about how it relates to my floating-point work. I think blame analysis is related to a key technique that appears in both Herbie and Herbgrind: local error.
Local error
To identify the operations in a floating-point computation most responsible for error, local error looks for operations that produce inaccurate output even when given accurate inputs. The idea is that an operation isn't responsible for garbage-in garbage-out behavior, but it is responsible for the accuracy of its outputs given accurate inputs. Local error finds these operations by running each computation in high precision.1 [1 Using MPFR, so something like 1000-bit floating-point.] That gives, for each operation, high-precision inputs and a high-precision output. It then computes two possible low-precision outputs:2 [2 Low-precision meaning some sort of hardware floating-point.]
- You can get a best-case output by rounding the high-precision output to low precision.
- You can get a realistic output by rounding the high-precision inputs to low precision and then applying the operation in low precision.
The difference between these two possible low-precision outputs is the local error of that operation. A cryptic formula that describes this process is:
\[ \mathcal{E}(\mathsf{rnd}(f(x)), f(\mathsf{rnd}(x))) \]
Here, \(f\) is some operation, perhaps the square root (since I've written it with one argument), and \(x\) is the argument, in high precision. The function \(\mathsf{rnd}\) rounds high-precision values to low-precision values. I've written \(f\) the same way whether it operates on high-precision or low-precision values, but it is of course a different function in each case (you can tell just from the types). I've also left \(\mathcal{E}\), the error measurement function, unspecified. My projects tend to use bits of error, but it's not essential to the idea of local error.
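To make this concrete, here is a minimal sketch in Python, using mpmath in place of MPFR for the high-precision side. The names (rnd, err, local_error) and the absolute-difference error measure are placeholders of my own, not anything from Herbie or Herbgrind:

```python
# A minimal sketch of local error for one operation: E(rnd(f(x)), f(rnd(x))).
from mpmath import mp, mpf
import math

mp.prec = 1000  # roughly the "1000-bit floating-point" from the footnote


def rnd(x):
    """Round a high-precision value to low precision (here, a hardware double)."""
    try:
        return float(x)
    except OverflowError:
        return math.inf if x > 0 else -math.inf


def err(best, real):
    """A placeholder error measure: zero if the two doubles agree, otherwise
    their absolute difference. Herbie uses bits of error, but any measure works."""
    return 0.0 if best == real else abs(best - real)


def local_error(f_low, f_high, *args):
    """Compare the rounded high-precision output against the low-precision
    operation run on rounded inputs."""
    exact = f_high(*args)                   # high-precision output
    best = rnd(exact)                       # best-case low-precision output
    real = f_low(*(rnd(a) for a in args))   # realistic low-precision output
    return err(best, real)
```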
Let me show an example: consider the program sqrt(x*x) on the input 1.0e+300. The accurate values are \(10^{300}\) for x, \(10^{600}\) for x*x, and \(10^{300}\) for sqrt(x*x). Now let's look at the local error of both the multiplication and the square root.
For the multiplication: if we round the accurate input \(10^{300}\) to 1.0e+300 and then multiply that by itself, we get +inf (in floating point). The accurate output is \(10^{600}\), which also rounds to +inf in floating point. So there's no local error at all (despite the overflow).
Now for the square root: if we round the accurate input \(10^{600}\) to +inf and then take the square root, we get +inf. The accurate output is \(10^{300}\), which rounds to 1.0e+300. The gap between 1.0e+300 and +inf, however you measure it, is the local error of that square root operation.
You can see that local error can occur despite the fact that the primitive floating-point operations are as accurate as can be—the error really sneaks in during the rounding of inputs.
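Here is how the example plays out with the sketch from earlier, using ordinary Python doubles as the low precision (mp_sqrt below is just mpmath's square root, standing in for the high-precision operation):

```python
# Reproducing sqrt(x*x) at x = 1.0e+300 with the local_error sketch above.
from mpmath import sqrt as mp_sqrt

x = mpf('1e300')  # the accurate value of x

# The multiplication: multiplying the rounded inputs overflows to +inf, but the
# accurate product 1e600 also rounds to +inf, so the local error is zero. The
# same lambda serves as the low- and high-precision multiply, since Python
# dispatches on the argument types.
print(local_error(lambda a, b: a * b, lambda a, b: a * b, x, x))  # 0.0

# The square root: its accurate input 1e600 rounds to +inf, and sqrt(+inf) is
# +inf, while the accurate output 1e300 rounds to 1.0e+300. Huge local error.
print(local_error(math.sqrt, mp_sqrt, x * x))                     # inf
```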
The nice part of local error is that if a portion of a computation has zero local error, then you can replace low-precision operators with high-precision operators in that portion of the computation. High-precision operators satisfy algebraic and analytic laws, so identifying portions without local error is useful for reasoning, especially if you plan to use algebraic and analytic laws to rewrite expressions, which of course is what my work focuses on.
Now that I've described local error, let me try to do the more difficult task of summarizing someone else's idea: blame analysis.
Blame analysis
Blame analysis was introduced by Cindy Rubio-González at ICSE’16, and in that paper she focuses on the task of determining which variables in a program can be stored in single precision without impacting accuracy. This fits well with her work on precision tuning: finding mixed-precision implementations of programs to make those programs fast yet accurate. (In the ICSE’16 paper, she finds speed-ups of up to 40% on large, well-known scientific programs!)
Blame analysis identifies variables that can be stored in single precision using a kind of backwards-flowing analysis. It starts with a precision requirement on the program’s output. Then it simulates the last operation in the program under various input precisions and determines which combinations of input precisions lead to acceptable output precision. In other words, it attempts to invert each operation’s error behavior. In the ICSE’16 paper, it does this inversion by brute force, trying each possible input accuracy from a set of six: exact single precision, exact double precision, and double precision accurate only up to \(10^{-4}\), \(10^{-6}\), \(10^{-8}\), or \(10^{-10}\).3 [3 The discrete set of precisions makes brute force search possible. If a better search algorithm were invented, maybe the precisions could be continuous.]
For example, suppose you are adding \(10^6\) and \(\pi\) and you need the answer accurate to \(10^{-10}\). (I'm reading these accuracies as relative, so \(10^{-10}\) means roughly ten significant digits.) Well, then you need to determine accuracies for \(10^6\) and \(\pi\) that give you an answer of \(1\thinspace000\thinspace003.1415\). Clearly \(10^6\) must be accurate to a level of \(10^{-10}\); otherwise some of those digits would be changed. But \(\pi\) need only be accurate to a level of \(10^{-6}\), because any further digits of \(\pi\) are not important. So the output accuracy requirement of \(10^{-10}\) becomes an input accuracy requirement of \(10^{-10}\) for one input and \(10^{-6}\) for the other. (And single-precision floating-point can satisfy an accuracy requirement of \(10^{-6}\)!)
A quick note: I've elided a few details here. First, the results can depend on the details of how rounding works. For example, \(10^6\), rounded to an accuracy of \(10^{-4}\), might be exactly \(10^6\), or it might be \(10^6\) with some junk bits. It depends a lot on how you simulate lower precision (do you truncate the decimal representation? the binary? do you add a small random value?). Second, there may be multiple incomparable accuracy assignments for an operation's inputs that all meet its output accuracy requirement. This seems theoretically possible, but I couldn't come up with any examples, so it's probably practically irrelevant.4 [4 It is probably provably impossible for sufficiently nice operations and sufficiently well-behaved rounding operators, maybe for almost all instead of all inputs.]
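To make the brute-force inversion concrete, here is a sketch of how it might look for the addition above. It is not the paper's implementation: I drop the exact single- and double-precision categories, treat accuracies as relative, and simulate a value "stored at accuracy p" pessimistically by injecting junk of relative size p/2, which is one of the simulation choices just mentioned. The names (store_at, blame_add) are mine. Under these choices the search recovers the \(10^{-10}\)/\(10^{-6}\) split from the example:

```python
# A sketch of inverting one operation's error behavior by brute force search.
import itertools

CANDIDATES = [1e-4, 1e-6, 1e-8, 1e-10]  # coarsest accuracy first


def store_at(x, p):
    """Simulate storing x accurate only to relative accuracy p
    (the "junk bits" reading of reduced precision)."""
    return x * (1 + p / 2)


def blame_add(x, y, out_req):
    """Return the first acceptable pair of input accuracies, trying coarser
    accuracies first, such that x + y still meets the (relative) output
    accuracy requirement out_req."""
    exact = x + y
    for px, py in itertools.product(CANDIDATES, repeat=2):
        approx = store_at(x, px) + store_at(y, py)
        if abs(approx - exact) <= out_req * abs(exact):
            return px, py
    return None  # no candidate assignment is accurate enough


# Adding 1e6 and pi with the output required accurate to 1e-10.
print(blame_add(1e6, 3.141592653589793, 1e-10))  # prints (1e-10, 1e-06)
```

Swap in a different simulation of reduced precision, say rounding to significant digits so that \(10^6\) happens to be stored exactly, and the search settles for a much coarser accuracy on the first input; that is exactly the sensitivity to rounding details described above.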
By inverting the error behavior of each operation in the program, blame analysis produces a precision for each variable. Then it's easy to check if a variable's precision requirement is satisfied by single-precision floating-point.
I hope I understood blame analysis correctly—let me know if I didn't—but now I want to try to connect blame analysis and local error.
Connecting the two
Though both blame analysis and local error are about localizing floating-point error to particular points in a computation, I initially had trouble making the relationship clear. Blame analysis involves multiple variable precision assignments, while local error involves a single fixed precision assignment (for low precision).
To bridge this gap, I need to make changes to my description of local error. Local error has a single fixed notion of low precision. First, instead of a fixed notion, think of this as a parameter. So local error is not a single measure of error anymore, but one parameterized by a precision setting. Second, instead of rounding the high-precision output to low precision and comparing the two low-precision outputs, upgrade the low-precision output to high precision and do the error comparison there. This way we think of the error comparison as being done on two high-precision numbers using an error measure that does not depend on the chosen low precision.
Now I must make some changes to the description of blame analysis. As described, blame analysis requires the answer to be exactly correct up to some precision. Let's rephrase that as a bound on error, computed over high-precision inputs. Blame analysis as described uses double precision as its high-precision ground truth, but imagine using true high precision instead. The goal of blame analysis is now to find a precision setting such that rounding high-precision inputs to that precision meets an error bound on the output.
In other words, after these changes, which preserve the spirit though not the details of local error and blame analysis, we can say that blame analysis is the inverse of local error: if local error is a function from precision setting to error measure, blame analysis sets a fixed bound on the error measure and searches for a precision parameter to meet that bound.
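Here is a sketch of that inverse relationship, with names and a candidate precision list of my own choosing; neither Herbie nor Precimonious actually works this way:

```python
# Local error as a function of a precision parameter (a number of significand
# bits), and blame analysis as a search for the smallest precision that meets
# an error bound.
from mpmath import mp, mpf, workprec

mp.prec = 1000  # high precision, standing in for MPFR


def rnd(x, prec):
    """Round a high-precision value to prec bits."""
    with workprec(prec):
        return +x  # unary plus re-rounds to the working precision


def local_error(f, x, prec):
    """Parameterized local error: run f at prec bits on the rounded input,
    compare against the rounded exact output, and measure the difference
    back in high precision."""
    exact = f(x)                 # high-precision output
    best = rnd(exact, prec)      # best-case output at prec bits
    with workprec(prec):
        real = f(rnd(x, prec))   # realistic output at prec bits
    return abs(best - real)      # evaluated at mp.prec, i.e. in high precision


def blame(f, x, error_bound, precisions=(11, 24, 53, 113)):
    """The inverse: the smallest precision whose local error meets the bound
    (half, single, double, and quad significands as the candidate set)."""
    for prec in precisions:
        if local_error(f, x, prec) <= error_bound:
            return prec
    return None


# A toy use: subtracting 1 from a value just above 1 amplifies input rounding,
# so only double precision (53 bits) and up meet a 1e-10 error bound here.
def f(v):
    return v - 1  # a cancellation-prone toy operation

x = mpf(1) + mpf(2) ** -30
print(blame(f, x, 1e-10))  # prints 53
```

The error comparison happens at mp.prec rather than at the candidate precision, which is the "upgrade the low-precision output to high precision" change described above.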
I think this exercise in relating blame analysis and local error is valuable. It connects two different concepts in floating-point research, which makes the field smaller and more comprehensible. It generalizes both local error and blame analysis, which may help with future work on both Herbie and Precimonious. And it also suggests ways Herbie could be modified to understand mixed-precision programs, or even to change the precision of operators in a program; and in the same way, suggests ways Precimonious could move to integrate higher-precision ground truth and algebraic rewrites.