Affine Arithmetic and Error Taylor Series
I've been thinking recently about the differences between affine arithmetic and error Taylor series. In both, an expression is represented as 1) the answer, plus 2) a number of error terms. But in my experience, tools based on affine arithmetic are less precise than those based on error Taylor series. What gives? And how are the two analyses related?
In affine arithmetic, the true answer is modeled as the computed answer plus a series of error terms, each of which is a symbolic epsilon times a real coefficient. In error Taylor series, the computed answer is modeled as the true answer plus a series of error terms, each of which is a symbolic epsilon times an expression coefficient.
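To make the two pictures concrete, here is roughly how I'd write each one down (my own notation, not any particular tool's):

```latex
% Affine arithmetic: true value = computed value x_0 plus noise terms
% with *real* coefficients x_i and noise symbols eps_i in [-1, 1].
x = x_0 + \sum_i x_i \, \varepsilon_i

% Error Taylor series: computed value = true value plus error terms whose
% coefficients are *expressions* in the inputs (partial derivatives at eps = 0),
% with each rounding error bounded by the machine epsilon \bar\varepsilon.
\hat{f} = f + \sum_i \left.\frac{\partial \hat{f}}{\partial \varepsilon_i}\right|_{\varepsilon = 0} \varepsilon_i
        + O(\varepsilon^2), \qquad |\varepsilon_i| \le \bar{\varepsilon}
```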
Besides the different perspectives on modeling the true or computed answer (which really just switches the sign of the error terms), the main difference so far looks to be the use of real versus symbolic coefficients. However, I think this difference is pretty superficial. The most prominent uses of error Taylor series use global nonlinear optimization to search for the highest-error points, but in principle error Taylor series are perfectly usable for a specific input value, the way affine arithmetic is typically used. And since the error Taylor series coefficients are just derivatives, there's nothing stopping you from computing them with automatic differentiation, so that you don't need to carry around symbolic expressions.
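Here's a minimal sketch of that idea, evaluating the error Taylor series for sin at a single input point with forward-mode automatic differentiation; the Dual class and helper names are my own toy code, not any existing tool's API:

```python
# Error Taylor series coefficients at a specific point via forward-mode
# automatic differentiation (a toy sketch, not any existing tool's API).
import math

class Dual:
    """A value plus its partial derivatives with respect to each rounding error eps_i."""
    def __init__(self, val, partials):
        self.val = val
        self.partials = partials   # partials[i] = d(value)/d(eps_i) at eps = 0

def rounded_input(x, i, n):
    """Model storing x as x * (1 + eps_i): value x, derivative x w.r.t. eps_i."""
    partials = [0.0] * n
    partials[i] = x
    return Dual(x, partials)

def rounded_sin(d, i):
    """Model the computed sin as sin(d) * (1 + eps_i), differentiated through."""
    partials = [math.cos(d.val) * p for p in d.partials]  # chain rule through sin
    partials[i] += math.sin(d.val)                        # the new rounding error
    return Dual(math.sin(d.val), partials)

# sin(x) at x = 1, with eps_0 on representing x and eps_1 on the sin operation.
x = rounded_input(1.0, 0, 2)
y = rounded_sin(x, 1)
eps = 2.0 ** -53                                          # double-precision bound
print(y.val, y.partials, sum(abs(c) for c in y.partials) * eps)
```

The partials that come out (about .54 and .84 at x = 1) are exactly the error-term coefficients, and the final bound is their weighted sum, with no symbolic expressions carried around.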
In affine arithmetic, linear operations are easy but non-linear operations like multiplication or sin are complicated. For example, if you know that x = 1 ± .01₁, to compute sin(x) you need to make a series of steps. First, you approximate sin(x) with a linear function around the constant term in x; for sin(x) this is roughly .84 + .54 (x - 1). Applying this approximation gets us sin(x) = .84 ± .0054₁. But then, because our approximation wasn't quite correct, we also need to account for that error by considering the maximum value of sin(x) - (.84 + .54 (x - 1)) over the interval [.99, 1.01], which is about .00147. So we add that as another source of error, and our final answer is sin(x) = .84 ± .0054₁ ± .00147₂.
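Here is that sin step as a small code sketch (my own toy affine form, not any particular library); it bounds the linearization error by brute-force sampling, where a real implementation would bound it analytically:

```python
# One non-linear affine-arithmetic step: linearize sin around the center,
# then attach a fresh noise term for the linearization error (a toy sketch).
import math

def affine_sin(center, coeffs, samples=10001):
    """coeffs maps noise-symbol index -> real coefficient, with eps_i in [-1, 1]."""
    radius = sum(abs(c) for c in coeffs.values())
    lo, hi = center - radius, center + radius

    # Linear (tangent) approximation: sin(x) ~ sin(c) + cos(c) * (x - c).
    new_center, slope = math.sin(center), math.cos(center)
    new_coeffs = {i: slope * c for i, c in coeffs.items()}

    # Bound the approximation error over [lo, hi] by sampling, and add it
    # as a brand-new noise symbol.
    err = max(abs(math.sin(x) - (new_center + slope * (x - center)))
              for x in (lo + (hi - lo) * k / (samples - 1) for k in range(samples)))
    new_coeffs[max(coeffs, default=0) + 1] = err
    return new_center, new_coeffs

# x = 1 +/- .01 * eps_1, as in the example above.
print(affine_sin(1.0, {1: 0.01}))   # roughly (0.8415, {1: 0.0054, 2: 4.2e-05})
```

The new error term here comes out around 4.2e-05 rather than .00147 because the code keeps the full-precision slope and constant, while the version above rounds them to .54 and .84; the structure of the step is the same either way.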
Something similar happens in error Taylor series. As before, the constant term is the computation on the central point, and the error term coefficients are computed via derivatives, which is another way of saying they are based on linear approximations. Plus, error Taylor series also account for second-order error, so there's something similar to that last "new error term" step.
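To connect this back to the example: if ε₁ is a relative rounding error on x and ε₂ is the rounding error introduced by sin itself, the computed value expands around ε = 0 as (my rendering of the usual setup):

```latex
\widehat{\sin}(x) = \sin\!\big(x(1+\varepsilon_1)\big)\,(1+\varepsilon_2)
                  = \sin(x) + x\cos(x)\,\varepsilon_1 + \sin(x)\,\varepsilon_2 + O(\varepsilon^2)
```

At x = 1 the first-order coefficients are about .54 and .84, the same linearization the affine-arithmetic example used, and the O(ε²) remainder plays the role of that last "new error term" step.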
Now, one difference between error Taylor series and affine arithmetic is precisely in this handling of the second-order error. In short, an error Taylor series models the second-order error via the second derivative, which can over-approximate compared to exactly computing the maximum as affine arithmetic does. On the other hand, error Taylor series compute the second-order error at the end of the computation, which allows some cancellation of errors in opposite directions, while affine arithmetic does it per-operation, which is more pessimistic. So there are differences in both directions, but I don't think they're super important, because generally speaking second-order error is just not very important, so much so that many applications of error Taylor series and affine arithmetic just ignore it.
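Concretely, the error Taylor series route bounds the whole remainder at once with second derivatives, something like this (my sketch of the usual bound, where the max ranges over the inputs and all |εᵢ| ≤ ε̄):

```latex
\Big|\, \widehat{f} - f - \sum_i \left.\frac{\partial \widehat{f}}{\partial \varepsilon_i}\right|_{\varepsilon = 0} \varepsilon_i \,\Big|
\;\le\; \frac{1}{2} \sum_{i,j} \max \left| \frac{\partial^2 \widehat{f}}{\partial \varepsilon_i \, \partial \varepsilon_j} \right| \bar{\varepsilon}^{\,2}
```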
So if error Taylor series and affine arithmetic are so similar, why do they get different results? I think the key question is how you generalize either technique to multiple inputs. Typically each input to a computation will have a range of possible values. Affine arithmetic tools typically model this as an interval and propagate it through the computation via interval arithmetic. On the other hand, error Taylor series treat this symbolically and eventually use a stronger technique like global non-linear optimization. I suspect that this is the major difference: global non-linear optimization is just a much more powerful technique. In many cases, affine arithmetic tools use interval subdivision (breaking the input interval into many subintervals and computing the worst error across them). If this subdivision were done via a branch-and-bound search, I expect they would get basically the same results as global optimization tools like Gelpia.
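As a toy illustration of the subdivision-as-search idea, here's a minimal branch-and-bound maximizer of my own (not Gelpia's actual algorithm); it assumes you can supply a Lipschitz bound for the error expression, which real tools replace with interval arithmetic:

```python
# Branch-and-bound maximization of an error expression over an input interval
# (a toy sketch; real tools bound each box with interval arithmetic instead
# of a user-supplied Lipschitz constant).
import heapq, math

def maximize(g, lo, hi, lipschitz, tol=1e-18):
    """Maximize |g| over [lo, hi], given |g(x) - g(y)| <= lipschitz * |x - y|."""
    def upper(a, b):
        return abs(g((a + b) / 2)) + lipschitz * (b - a) / 2

    best = abs(g((lo + hi) / 2))            # best value actually observed
    boxes = [(-upper(lo, hi), lo, hi)]      # max-heap of boxes via negated bounds
    while boxes:
        neg_ub, a, b = heapq.heappop(boxes)
        if -neg_ub <= best + tol:
            break                           # no remaining box can beat `best`
        mid = (a + b) / 2
        best = max(best, abs(g(mid)))
        heapq.heappush(boxes, (-upper(a, mid), a, mid))
        heapq.heappush(boxes, (-upper(mid, b), mid, b))
    return best

# First-order error expression for sin from the sketch above, over x in [0, 2].
eps = 2.0 ** -53
err = lambda x: (abs(x * math.cos(x)) + abs(math.sin(x))) * eps
print(maximize(err, 0.0, 2.0, lipschitz=4 * eps))   # 4*eps bounds |err'| on [0, 2]
```

The subdivision itself is the same thing interval-based tools do; the priority queue plus the best + tol cutoff is what turns it into a branch-and-bound search.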