Quadrature in Bash
A long time ago, I was taking numerical analysis class where the instructor allowed us to use any language. Mine were all over the map1 [1 Looks like I used C, Go, JavaScript, OCaml, Python, Perl, Factor, Julia, J, SmallTalk, Fortran 2003, and Bash; the selection was limited to languages that supported floating point and had a collection of mathematical functions implemented.], but my favorite was the quadrature algorithm I wrote in Bash.
Quadrature
Quadrature is what numerical analysis call integrating a function.2 [2 Integration is their word for solving a differential equation.] There are many algorithms for quadrature,3 [3 I would implement Clenshaw-Curtis quadrature if I had to pick.] but the point of this assignment was to implement Newton-Cotes quadrature. This involve approximating the integral by:
- Splitting the domain of integration into \(N\) pieces,
- Fitting an order \(n\) polynomial to each piece, and
- Summing the analytic integral of each piece.
In practice, to evaluate \(\int f\), these steps evaluate \(f\) at evenly-spaced points and sum the results multiplied by weights. For example, Simpson's rule quadrature samples, for each interval \([0, 1]\) that the domain of integration is split into, evaluates \(\frac16 f(0) + \frac23 f(\frac12) + \frac16 f(1)\),4 [4 Note that each interval shares endpoints with the next and previous intervals, so in practice the weights are ⅙, ⅔, ⅓, ⅔, ⅓, …, ⅔, ⅓, ⅔, ⅙.] and the points evaluated and the weights differ for different quadrature algorithms. So an integration algorithm naturally decomposes into a few steps:
- Generate the points to evaluate \(f\) on, and evaluate \(f\)
- Generate the weights and multiply
- Sum the results
Multiple rules in Bash
One of my goals was to support multiple Newton-Cotes rules in one program.5 [5 Because the assignment asked for results from several rules.] That is, I want to source my Bash file and run:
integrate "sin(X*X) / sqrt(X*X + 1)" -n 10001 -r trap -s recursive --rhomberg
This integrates \(\sin x^2 / \sqrt{x^2 + 1}\) using 10001 points with the trapezoid rule, using pairwise summation and rhomberg integration to improve accuracy.
The first step here is to support multiple Newton-Cotes rules. My implementation had three: rectangular (order 1), trapezoidal (order 2), and Simpson's rule (order 4). I went with an "object-oriented" approach for defining these in Bash. For example, for Simpson's rule I defined simp_points
, simp_weights
, and simp_order
, where simp_points
picked out points to evaluate \(f\) on, simp_weights
chose the weights for each point, and simp_order
was a constant for the order of the rule. If the bash variable $RULE
stored the name of the rule being used, I could use ${RULE}_points
and similar to call the appropriate function.
For the rectangular rule, which evaluates on each interval \(f(0)\), these definitions are:
rect_order=1 rect_points() { seq 0 `calc "1/$1"` 1 | head -n-1 } rect_weights() { WEIGHT=`calc 1/$1` for i in `seq 1 $1`; do echo $WEIGHT done }
The rectangular rule is first-order method, hence rect_order
. The domain of integration is fixed to \([0, 1]\) in my program, so rect_points
takes \(N\) as an argument and outputs that many points between 0 and 1. Note that, for example, seq 0 0.25 1
prints out five numbers, so I threw in a head
call to strip off the last one; a call to head
with -n-$X
prints all but the last $X
lines.
Finally, for outputting the weights, I want to output \(1/N\) for each point, corresponding to each of \(N\) intervals having a weight of \(1\) on their point. I first compute the weight \(1/N\) and store it in a variable, then print it \(N\) times. The calc
command I use to compute \(1/N\) is
calc() { echo "$@; exit" | bc -lq | head -n1 }
This little thing passes its arguments to bc
to evaluate, then reads out the answer.
The code for the trapezoidal and Simpson's rules is similar, but gets a little tricker as there are multiple weights. For example, the program that outputs weights for Simpson's rule is
N=$1 W1=`calc 1/6/$N` W2=`calc 2/6/$N` W4=`calc 4/6/$N` echo $W1 for i in `seq 2 2 $[N-2]`; do echo $W4 echo $W2 done echo $W4 echo $W1
Basic Quadrature
Now that I have various rules implemented, I can do a first cut of integration. I have to generate points and weights, evaluate the program on the points and multiply by weights, and sum.
To generate points and weights,
paste <(${RULE}_weights $N) <(${RULE}_points $N)
I'm using the "object-orientation" trick from above, the paste
command (which takes two files and concatenates the first line from each, then the second lines from each, and so on), and Bash's pipe redirection syntax: <(cmd)
as an argument creates a named pipe, pipes cmd
into it, and gives the name of that pipe as an argument. You can think of the above as
${RULE}_weights $N > /tmp/a ${RULE}_points $N > /tmp/b paste a b
But the pipe redirection trick doesn't involve generating your own names, or remembering to clean them up, and lets you use the streaming features of pipes.
Once the points and weights are generated, I have to evaluate \(f\) on the points and multiply by the weights. I'll do that by running awk
to process each line by itself. My integrator takes an expression in X
as an argument; I want to turn that into awk
code.
awk -e "{ printf \"%0.20f\\n\", (\$1 * ${EXPR//X/\$2}) }"
${EXPR//X/\$2}
replaces each instance of X
in the program by $2
, which holds the value of X
. Running this awk
program on the output of the paste
command evaluates the expression on $2
, the point, and multiplies by $1
, the weight, turning each "weight, point" line into a line with the single float answer.
Finally, these values need to be summed. The easiest way to do this is a simple awk
script:
awk -e 'BEGIN {sum = 0} { sum += $1 } END { printf "%.20f\n", sum }'
Gluing these steps together yields a very slow but pretty accurate quadrature algorithm.
Recursive summation
However, there's a pernicious source of error that I needed to eliminate. Adding floating point numbers one after another, like I did above, can be inaccurate when summing a lot of numbers. That blog post advocates Kahan summation, which is really good for being very accurate, but my assignment suggested pairwise summation, which is where to sum \(N\) numbers you add every adjacent pair (reducing the original list to \(N/2\) numbers), and then recursively do the same until you have one number left.
Doing this recursively in Bash is actually quite a pain! To start, I had to do the body of the loop: pair numbers and sum them.
pairer() { sed '$!N;s/\n/ /' } sum_pairs() { pairer | awk -e '{ printf "%0.20f\n", ($1 + $2) }' }
The sed
script pairs numbers by replacing every other newline with a space. At least, I think that's what it does. To be honest, I can no longer read and understand that line of code. The sum_pairs
function just uses an awk
script to sum each pair.
Proper pairwise summation is a recursive algorithm: if you have one number, it is your sum; otherwise, sum pairs and recurse. Doing this test in Bash is a little hard: you need to read the pipe, and recurse only if there are at least two lines. I'd wanted to get away without using any temporary files, but I didn't find a way to do that, so the code looks like
TMP=`mktemp` cat >$TMP if [ `wc -l < $TMP` -eq 1 ]; then cat $TMP rm $TMP else cat $TMP | sum_pairs | recursive_summer rm $TMP fi
I write the input lines to a temporary file and count the lines. If there's one line, I just print it; otherwise I print the file, sum pairs, and recurse.
Romberg Integration
Romberg integration improves the accuracy of a quadrature routine by evaluating the same rule with \(N\) and \(2N\) points, and extrapolating the results. If an integration rule produces answer \(s(N) = a + c N^{-k} + \dotsb\), where \(a\) is the integral's true value, then \((s(N) - 2^k s(2N)) / (1 - 2^k)\) cancels the two \(c N^{-k}\) terms leaving just terms of lower order. So, it gives a more accurate integration rule. I implement this with
VAR=${RULE}_order O=${!VAR} r1=`integrate $[N / 3] $problem $rule $summer` r2=`integrate $[2 * N / 3] $problem $rule $summer` calc "($r1 - 2^$O * $r2) / (1 - 2^$O)"
First, I dereference the variable ${RULE}_order
, then integrate with \(N/3\) and \(2N/3\) points (so that in total I evaluate the function \(N\) times), and finally do the extrapolation.
Conclusion
Bash is a terrible language to implement numerical algorithms, from its lack of floating-point arithmetic to the extreme overhead spawning processes imposes on sums or divisions. Nonetheless, where there's a will, there's a way.
Footnotes:
Looks like I used C, Go, JavaScript, OCaml, Python, Perl, Factor, Julia, J, SmallTalk, Fortran 2003, and Bash; the selection was limited to languages that supported floating point and had a collection of mathematical functions implemented.
Integration is their word for solving a differential equation.
I would implement Clenshaw-Curtis quadrature if I had to pick.
Note that each interval shares endpoints with the next and previous intervals, so in practice the weights are ⅙, ⅔, ⅓, ⅔, ⅓, …, ⅔, ⅓, ⅔, ⅙.
Because the assignment asked for results from several rules.