Description
Add set -e to all of our test cases for better error handling.
EDIT @marc-hb : shellcheck can catch a good number of set -e incompatibilities: do not try to add set -e to a test before it's shellcheck-clean (#729)
@marc-hb 's very long description added here:
Rationale
The family of Unix shell language(s) is one of the few that don't support exceptions and silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. There is unfortunately no more convenient language than the shell to deal with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.
There is fortunately one underrated and standard feature of the shell that mitigates this error handling design issue: set -e which is a shortcut for set -o errexit. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that was run.
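A minimal sketch of that behaviour, run in child shells so that the demonstration itself keeps going. The two-command script is made up for illustration.

```shell
# Hypothetical two-step script: the first command fails, the second prints.
script='false; echo "still running"'

# Without errexit the failure is silently ignored and the script carries on.
status_default=0
out_default=$(sh -c "$script") || status_default=$?

# With errexit (-e) the script stops at the failing command and exits
# with that command's status.
status_errexit=0
out_errexit=$(sh -e -c "$script") || status_errexit=$?

printf 'default: output=[%s] status=%s\n' "$out_default" "$status_default"
printf 'errexit: output=[%s] status=%s\n' "$out_errexit" "$status_errexit"
```

The first line reports the output "still running" with status 0; the second reports an empty output with status 1.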
errexit is far from perfect; for instance, it does not catch all errors. However, it catches many errors, so it should be used in all new scripts. It should be added to existing scripts too, but only after extensive testing, because using set -e requires some minor coding changes described below.
Why stop immediately when something fails?
https://en.wikipedia.org/wiki/Fail-fast
1. This is simply what users expect. When a test fails, people don't look at every single line of a test log. They go straight to the end of the log because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error message(s).
2. Unexpected errors often cause a cascade of more unexpected errors. After an unexpected error all bets are off: no one has any idea what the following code will do. It will typically add more and more errors at random points until some error finally gets caught manually. This means:
a) The last, only error message users look at is not relevant and very confusing, see 1. above.
b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example.
some_directory=$(failing_command)
# without set -e the entire $basedir gets deleted!
rm -rf "$basedir/$some_directory"
Fun fact: unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.
An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time understanding test logs and debugging test code instead of debugging the actual product code adds a lot more tension and delay. Unlike some product code, test code should never try to recover from unexpected failures and carry on anyway, because this hides bugs, generates complicated and time-consuming cascades of failures that are even harder to debug, or both.
Why not just check the exit code of every command like in C?
Because it makes the code extremely verbose.
Because life is always too short, so everyone who promised to check every exit code of every command never delivered on their promise (this failed promise tends to happen in C too). Instead they only check the exit code of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.
Should I still manually check the exit code of some commands (which ones) even when using set -e?
You should simply check commands that return an error but don't provide a good error message. No need to do anything for commands that print good error messages. For instance some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code exactly like when not using errexit. Example:
cmd1_with_bad_error_messages "$somefile" ||
    die 'Failed to run cmd1 on file %s\n' "$somefile"
This will print:
File not found # from cmd1
Failed to run cmd1 on file some_file # from you
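The example above relies on a die helper that is not shown in this description. Here is a minimal sketch, assuming die takes a printf-style format followed by its arguments, writes to stderr and exits with status 1. The project's actual helper may well differ.

```shell
# Hypothetical helper: print a printf-style message to stderr, then abort.
die()
{
    # shellcheck disable=SC2059  # the first argument is the format, on purpose
    printf "$@" >&2
    exit 1
}

# Usage, wrapped in a subshell so that the demonstration does not exit the
# current shell. "false" stands in for a command with poor error messages.
somefile=some_file
status=0
message=$( ( false || die 'Failed to run cmd1 on file %s\n' "$somefile" ) 2>&1 ) || status=$?

printf 'message=[%s] status=%s\n' "$message" "$status"
```

Because die exits on its own, this pattern stops the script whether or not set -e is active.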
Why do I need "some_command || true" and other tricks?
Some commands return a non-zero code even though it's not an error, or not always an error. This is why adding set -e to existing scripts requires a fair amount of testing, almost as much as when they were first written.
grep is a typical example. This does NOT work; the script stops silently:
set -e
found_maybe=$( some_command | grep something) # aborts when not found
if [ -n "$found_maybe" ] ; then ... # does not run when nothing found
OK:
set -e
found_maybe=$( some_command | grep something || true)
if [ -n "$found_maybe" ] ; then ...
|| true works because set -e does not apply to a command whose exit code is being checked manually.
Also OK:
set -e
if [ -n "$( some_command | grep something)" ] ; then ...
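The three variants above can be compared side by side. This is only a sketch: printf stands in for some_command, "zzz" is a pattern that never matches, and each variant runs in its own child shell with errexit enabled.

```shell
# 1. Plain assignment: grep finds nothing, returns 1, and errexit aborts
#    before the final echo.
status_broken=0
out_broken=$(sh -e -c 'found=$(printf "a\nb\n" | grep zzz); echo "reached the end"') || status_broken=$?

# 2. "|| true" turns "not found" into success, so the script continues.
status_fixed=0
out_fixed=$(sh -e -c 'found=$(printf "a\nb\n" | grep zzz || true); echo "reached the end"') || status_fixed=$?

# 3. Inside an "if" condition the exit code is checked, so errexit does
#    not apply either.
status_inline=0
out_inline=$(sh -e -c 'if [ -n "$(printf "a\nb\n" | grep zzz)" ]; then echo found; fi; echo "reached the end"') || status_inline=$?

printf 'broken: status=%s output=[%s]\n' "$status_broken" "$out_broken"
printf 'fixed:  status=%s output=[%s]\n' "$status_fixed" "$out_fixed"
printf 'inline: status=%s output=[%s]\n' "$status_inline" "$out_inline"
```

Only the first variant stops early, and it does so without printing anything. That silence is what makes this failure hard to spot in a test log.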
The return value of expr can sometimes be non-zero for surprising reasons. Simple fix: don't use expr at all, it's antiquated. Use $(( )) instead. See https://github.com/koalaman/shellcheck/wiki/SC2003
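One of those surprising reasons is that expr exits with status 1 whenever its result is zero or empty, even though nothing went wrong. A short sketch with made-up numbers:

```shell
# expr prints 0 here, but its exit status is 1, so under set -e an
# unguarded "zero=$(expr 5 - 5)" would abort the script.
expr_status=0
expr_result=$(expr 5 - 5) || expr_status=$?

# Arithmetic expansion gives the same result, and the assignment succeeds
# whatever the computed value is.
arith_result=$((5 - 5))

printf 'expr: result=%s status=%s\n' "$expr_result" "$expr_status"
printf '$(( )): result=%s\n' "$arith_result"
```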
How do I use $?
Most of the time you shouldn't use $?, see why https://github.com/koalaman/shellcheck/wiki/SC2181
But sometimes you really need $?, for instance because you want to treat different error codes differently. Solution:
ret=0
some_command || ret=$?
case $ret in ...
Prefer || over && when possible
https://www.shellcheck.net/wiki/SC2015
set -e is compatible with && and it is possible to use both in the same script. However, the interaction between && and set -e can be quite tricky; see the errexit_demo() demo below. Prefer || and if then else over && when possible.
Especially avoid negations like: test not good_thing && handle_bad, always replace this with the simpler, "assert-style": test good_thing || handle_bad.
You especially want to avoid && on the last line of the script for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting
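A small sketch of that last-line problem, using two hypothetical one-line scripts run in child shells. In both, the variable value is not empty, so nothing is actually wrong.

```shell
# Negated test with &&: the test fails, && short-circuits, errexit does
# not trigger, and yet the script's exit status is that of the failed test.
status_and=0
sh -e -c 'value=ok; [ -z "$value" ] && echo "value is empty"' || status_and=$?

# Assert-style rewrite with ||: the test succeeds and the script exits 0.
status_or=0
sh -e -c 'value=ok; [ -n "$value" ] || echo "value is empty"' || status_or=$?

printf 'negated test && handler: exit=%s\n' "$status_and"
printf 'test || handler:         exit=%s\n' "$status_or"
```

The first script reports failure to its caller although no error occurred. The same applies to the last line of a function called from a script that uses set -e.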
errexit_demo ()
{
    for x in '||' '&&'
    do
        for i in true false
        do
            for j in true false
            do
                tst="$i $x $j"
                printf '%s : ' "$tst"
                # First run with errexit: prints only if the script survives.
                bash -e -c "$tst; printf 'no_errexit, '"
                # Second run without errexit: shows the value of $?.
                bash -c "$tst; "'printf "\texit=$?\n"'
            done
        done
    done
}
errexit_demo
true || true : no_errexit, exit=0
true || false : no_errexit, exit=0
false || true : no_errexit, exit=0
false || false : exit=1
true && true : no_errexit, exit=0
true && false : exit=1
false && true : no_errexit, exit=1 # no match
false && false : no_errexit, exit=1 # no match
|| and && trigger set -e (errexit) when and only when the second command runs and fails. In other words, set -e ignores a failure of the first command only. This is usually what you want, but with && this behavior does NOT match $?: in the two "no match" rows above the script carries on even though $? is 1. With ||, on the other hand, set -e does match $?.
Another common misconception: A && B || C is a (barely) shorter if; then; else. It is not. It is usually equivalent to if; then; else, but not always! Compare:
if true; then grep notfound /etc/passwd ; else echo 'ELSE is not run'; fi
true && grep notfound /etc/passwd || echo 'ELSE is also run!!'
Even if you understand the subtle difference between A && B || C and if; then; else, someone else will not and they will re-use the approach. Avoid it.
Must separate local/export and $( )
# set -e does NOT catch this failure!
local foo=$(failed_command)
# FIXED: set -e does catch this failure
local foo; foo=$(failed_command)
https://github.com/koalaman/shellcheck/wiki/SC2155
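The reason is that local (like export) is itself a command, and its own successful exit status replaces the status of the command substitution. A sketch of both forms, run in child bash shells with errexit enabled. The function f is made up, and false stands in for failed_command.

```shell
# Combined declaration and assignment: "local" succeeds, the failure of
# $(false) is masked, and the function keeps going.
status_masked=0
out_masked=$(bash -e -c 'f() { local foo=$(false); echo "kept going"; }; f') || status_masked=$?

# Separate declaration and assignment: the plain assignment reports the
# failure, so errexit stops the script before the echo.
status_split=0
out_split=$(bash -e -c 'f() { local foo; foo=$(false); echo "kept going"; }; f') || status_split=$?

printf 'combined: status=%s output=[%s]\n' "$status_masked" "$out_masked"
printf 'separate: status=%s output=[%s]\n' "$status_split" "$out_split"
```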
More
http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions
Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to set -e.