Why are Shell Scripts Bad?
Shell scripts are bug-prone, unmaintainable, and inscrutable. But what in particular makes them bad? This is a good question without a settled answer. I have four independent theories.1 [1 Naturally all probably contribute, but assigning blame, or resolving different kinds of errors to different causes, would be valuable. "All of the above" is usually true but without more details also usually useless.]
- Shell is a bad programming language
- Data structures beyond text are necessary
- Processes are a baroque, complex replacement for function calls
- All shell scripts use the global, mutable file-system
Reading the list, you likely nodded your head to each of them. But
these theories have very different implications! To fix the first
requires only switching shells: should we all use Fish?2 [2 I used to,
and don't any more, and did not see any benefits.] To fix the second,
it's enough to use a real programming language like Python.3 [3 When I
replace shell scripts with Python I usually decrease the bug rate, but
it remains substantially higher than "normal" Python code.] The third
suggests that Python programs that use real modules instead of
subprocess should be enough. The last suggests that private file
system namespaces, in the Plan 9 sense, could make shell scripting
sane again.
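
To make the third theory concrete, here is a minimal Python sketch (mine, not taken from any real script) that lists a directory two ways: by parsing the text output of an ls process, and by calling the standard library directly. The function names are made up for illustration.

```python
import os
import subprocess


def list_names_via_process(directory):
    """Ask an external `ls` process, then parse its text output."""
    result = subprocess.run(
        ["ls", "-A", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    # Splitting on newlines cannot tell "two files" apart from
    # "one file whose name contains a newline".
    return result.stdout.splitlines()


def list_names_via_module(directory):
    """Call the library function and get a real list of names back."""
    return sorted(os.listdir(directory))
```

With a file name that contains a newline, the first function can report two entries where the second reports one. The process boundary forced the data through text, which is also where theory 2 comes in.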
Another place to look for answers is analogs where one of the above
theories might apply. JavaScript is a (comparatively) good programming
language, with real data structures, with functions instead of
processes, but where everything has access to a global, mutable tree
structure. Bugs and unexpected interactions are common, and in fact
modern JS design involves virtualizing the entire DOM to enforce
isolation. In Tcl the main data structure is a string but it doesn't
have Shell's reputation. Shell uses processes instead of function
calls because the system API is exposed to C, a difficult language;
but in Emacs those APIs are exposed directly to the usual scripting
language. Even so, Emacs Lisp feels pretty bug-prone to me, and the
modern style involves a lot of private buffers and save-excursion
macros.
So examining this evidence… I guess I lean toward theory 4? In which case, did iOS get it right by not exposing a common file system? Should shell scripts make a lot more use of Linux's private namespaces?
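
A much weaker, user-space approximation of theory 4 is to give each script step a directory that no other code knows about, instead of the shared working directory. The sketch below assumes that idea. It is not a real namespace in the Plan 9 or Linux sense, and run_in_private_dir and count_words_step are hypothetical names.

```python
import tempfile
from pathlib import Path


def run_in_private_dir(step):
    """Run step(workdir) against a fresh directory, removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="script-") as workdir:
        return step(Path(workdir))


def count_words_step(workdir):
    # Hypothetical script step: write an intermediate file, then read it back.
    # Nothing outside this call can see or clobber words.txt by accident.
    scratch = workdir / "words.txt"
    scratch.write_text("global mutable state\n")
    return len(scratch.read_text().split())
```

Calling run_in_private_dir(count_words_step) leaves nothing behind on disk, so two runs cannot interfere through leftover intermediate files. The rest of the file system is still globally visible, which is the part real namespaces would address.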
Bonus question: Is this why build systems are bad? Or is that something else?