[sof-test] add set -e to all test cases (Unless You Love Debugging) #312

@aiChaoSONG

Add set -e to all of our test cases for better error handling.

EDIT @marc-hb : shellcheck can catch a good number of set -e incompatibilities: do not try to add set -e to a test before it's shellcheck-clean (#729)

@marc-hb 's very long description added here:

Rationale

The Unix shell language family is one of the few that do not support exceptions and that silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. There is unfortunately no language more convenient than the shell for dealing with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.

There is fortunately one underrated and standard feature of the shell that mitigates this error handling design issue: set -e, which is a shortcut for set -o errexit. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that was run.

errexit is far from perfect, for instance it does not catch all errors. However it catches many errors so it should be used in all new scripts. It should be added to existing scripts but only after extensive testing because using set -e requires some minor coding changes described below.
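As a minimal sketch of the behaviour described above, the snippet below runs a tiny script in a child shell so that the abort does not end the calling script. The variable names `output` and `status` are only illustrative.

```shell
# Run the demo in a child shell so that the abort does not end this script.
status=0
output=$(sh -c 'set -e; echo before; false; echo after') || status=$?

# "after" is never printed, and the child exits with the status of `false`.
printf 'output=%s status=%s\n' "$output" "$status"
```

This should print `output=before status=1`.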

Why stop immediately when something fails?

https://en.wikipedia.org/wiki/Fail-fast

  1. This is simply what users expect. When a test fails, people don't look at every single line of a test log. They go straight to the end of the log because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error message(s).

  2. Unexpected errors often cause a cascade of more unexpected errors. After an unexpected error all bets are off: no one has any idea what the following code will do. It will typically add more and more errors at random points in the future until some error finally gets caught manually. This means:
    a) The last, only error message users look at is not relevant and very confusing, see 1. above.
    b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example.

some_directory=$(failing_command)  
# without set -e the entire $basedir gets deleted!
rm -rf $basedir/$some_directory 
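A harmless way to try this out is to print the rm command instead of running it. This is only a sketch: `/hypothetical/base` is a made-up path, `false` stands in for failing_command, and the same script text is run once without and once with errexit.

```shell
# The script under test only echoes the dangerous command, it never runs it.
demo='basedir=/hypothetical/base
some_directory=$(false)
echo "rm -rf $basedir/$some_directory"'

without_e=$(sh -c "$demo")             # keeps going with an empty variable
with_e=$(sh -e -c "$demo") || true     # aborts at the failed assignment

printf 'without set -e: %s\n' "$without_e"
printf 'with set -e:    %s\n' "${with_e:-<nothing printed, script aborted>}"
```

Without set -e the echoed command targets the whole base directory; with set -e the script stops before reaching that line.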

Fun fact, unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.

An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time understanding test logs and debugging test code instead of debugging the actual product code adds a lot more tension and delays. Unlike some product code, test code should never try to recover from unexpected failures and perform some work anyway, because this hides bugs and/or generates complicated and time-consuming cascades of failures that are even harder to debug.

Why not just check the exit code of every command like in C?

Because it makes the code extremely verbose.

Because life is always too short, so everyone who promised to check the exit code of every command never delivered on that promise (this failed promise tends to happen in C too). Instead they only check the exit code of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.

Should I still manually check the exit code of some commands (which ones) even when using set -e?

You should simply check commands that return an error but don't provide a good error message. No need to do anything for commands that print good error messages. For instance some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code exactly like when not using errexit. Example:

cmd1_with_bad_error_messages "$somefile" ||
    die 'Failed to run cmd1 on file %s\n' "$somefile"

This will print:

File not found               # from cmd1
Failed to run cmd1 on file some_file  # from you
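The example above relies on a die helper. A minimal sketch of such a helper could look like the following; this is an assumption for illustration, not necessarily how the project's own die is implemented. `false` stands in for cmd1_with_bad_error_messages, and the subshell exists only so the demo can capture the message and status.

```shell
# Hypothetical helper: print a printf-style message to stderr, then exit.
die() {
    # shellcheck disable=SC2059  # the first argument is the format string
    printf "$@" >&2
    exit 1
}

somefile=some_file
status=0
# `false` is a stand-in for a command that fails with a poor error message.
message=$( { false || die 'Failed to run cmd1 on file %s\n' "$somefile"; } 2>&1 ) || status=$?

printf 'message=%s status=%s\n' "$message" "$status"
```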

Why do I need "some_command || true" and other tricks?

There are some commands that return a non-zero code even though it is not an error, or not always an error. This is why adding set -e to existing scripts requires a fair amount of testing, almost as much testing as when they were written the first time.
grep is a typical example. This does NOT work, the script stops silently:

set -e

found_maybe=$( some_command | grep something) # aborts when not found
if [ -n "$found_maybe" ] ; then ...  # does not run when nothing found

OK:

set -e

found_maybe=$( some_command | grep something || true)
if [ -n "$found_maybe" ] ; then ...

|| true works because set -e ignores the failure of any command on the left-hand side of ||: in that position the exit code is being checked manually.
Also OK:

set -e

if [ -n "$( some_command | grep something)" ] ; then ...
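The two grep variants can be compared with a self-contained sketch like the one below, where `printf` stands in for some_command and each variant runs in its own child shell. All variable names are illustrative.

```shell
# Variant without "|| true": the child shell aborts at the assignment.
bad_status=0
bad_out=$(sh -c 'set -e
found_maybe=$(printf "alpha\nbeta\n" | grep gamma)
echo "reached the end"') || bad_status=$?

# Variant with "|| true": an empty result is handled by the script itself.
good_out=$(sh -c 'set -e
found_maybe=$(printf "alpha\nbeta\n" | grep gamma || true)
if [ -n "$found_maybe" ]; then
    echo "found: $found_maybe"
else
    echo "nothing found, still running"
fi')

printf 'without || true: output="%s" status=%s\n' "$bad_out" "$bad_status"
printf 'with || true:    output="%s"\n' "$good_out"
```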

The return value of expr can sometimes be non-zero for surprising reasons. Simple fix: don't use expr at all, it's antiquated. Use $(( )) instead. See https://github.com/koalaman/shellcheck/wiki/SC2003
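One such surprising case, as a small sketch: expr exits with status 1 whenever the result of the expression is zero (or empty), even though the computation itself succeeded. Arithmetic expansion does not have this problem.

```shell
# expr prints 0 but also returns exit status 1, which would trip set -e.
status=0
result=$(expr 1 - 1) || status=$?
printf 'expr printed %s and returned status %s\n' "$result" "$status"

# Arithmetic expansion produces the same value and cannot "fail" this way.
arith=$((1 - 1))
printf 'arithmetic expansion gives %s\n' "$arith"
```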

How do I use $?

Most of the time you shouldn't use $?, see why at https://github.com/koalaman/shellcheck/wiki/SC2181

But sometimes you really need $?, for instance because you want to treat different error codes differently. Solution:

ret=0
some_command || ret=$?
case $ret in ...
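A fuller sketch of this pattern is shown below. The `classify` function and its messages are hypothetical; the point is only that `|| ret=$?` captures the status without triggering errexit, after which a case statement can branch on it.

```shell
# Hypothetical wrapper: run a command and report on its exit status.
classify() {
    ret=0
    "$@" || ret=$?    # the failure is "checked", so set -e would not abort
    case $ret in
        0) echo "success" ;;
        1) echo "no match" ;;
        *) echo "real error (status $ret)" ;;
    esac
}

classify true
classify grep -q x /dev/null     # grep returns 1 when nothing matches
classify sh -c 'exit 2'
```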

Prefer || over && when possible

https://www.shellcheck.net/wiki/SC2015

set -e is compatible with && and it is possible to use both in the same script. However, the interaction between && and set -e can be quite tricky, see the errexit_demo() function below. Prefer || and if then else over && when possible.

Especially avoid negations like: test not good_thing && handle_bad, always replace this with the simpler, "assert-style": test good_thing || handle_bad.

You especially want to avoid && on the last line of the script for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting
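The last-line problem can be reproduced with a sketch like this one. The `verbose` variable is made up for the example; both one-line scripts do nothing wrong, yet only the assert-style version exits with status 0.

```shell
# "test && action" as the last line: the script's exit status is that of
# the failed test, although nothing actually went wrong.
and_status=0
sh -c 'set -e; verbose=""; [ -n "$verbose" ] && echo "verbose on"' || and_status=$?

# Assert-style "test || action": the test succeeds, so the status is 0.
or_status=0
sh -c 'set -e; verbose=""; [ -z "$verbose" ] || echo "verbose on"' || or_status=$?

printf '&& on the last line: exit status %s\n' "$and_status"
printf '|| on the last line: exit status %s\n' "$or_status"
```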

errexit_demo () 
{ 
    for x in '||' '&&';
    do
        for i in true false;
        do
            for j in true false;
            do
                tst="$i $x $j";
                printf "${tst} : ";
                bash -e -c "$tst; printf 'no_errexit, '";
                bash -c "$tst; "'printf "\texit=$?\n"'
            done;
        done;
    done;
}

errexit_demo

true || true : no_errexit, 	exit=0
true || false : no_errexit, 	exit=0
false || true : no_errexit, 	exit=0
false || false : 	exit=1
true && true : no_errexit, 	exit=0
true && false : 	exit=1
false && true : no_errexit, 	exit=1  # no match
false && false : no_errexit, 	exit=1  # no match

|| and && trigger set -e (errexit) when and only when the second command runs and fails. In other words, set -e is ignored for the first command only. This is usually what you want but this behavior does NOT match $? when using &&! On the other hand, set -e does match $? when using ||.

Another common misconception: that A && B || C is a (barely) shorter if; then; else. It is not. It is usually equivalent to if; then; else but not always! Compare:

if true; then grep notfound /etc/passwd ; else echo 'ELSE is not run'; fi
true && grep notfound /etc/passwd || echo 'ELSE is also run!!'

Even if you understand the subtle difference between A && B || C and if; then; else, someone else will not and they will re-use the approach. Avoid it.
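The same difference can be shown without depending on /etc/passwd. In this sketch `true` plays the role of A and `false` the role of B; the variable names are illustrative.

```shell
# A && B || C: C runs because B failed, even though A succeeded.
chained=$(true && false || echo 'else branch ran')

# if A; then B; else C; fi: C does not run, B's failure is simply returned.
structured=$(if true; then false; else echo 'else branch ran'; fi) || true

printf 'A && B || C printed:   "%s"\n' "$chained"
printf 'if/then/else printed:  "%s"\n' "$structured"
```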

Must separate local/export and $( )

# set -e does NOT catch this failure!
local foo=$(failed_command) 
# FIXED: set -e does catch this failure
local foo; foo=$(failed_command)

https://github.com/koalaman/shellcheck/wiki/SC2155
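A runnable sketch of both forms follows. It calls bash explicitly because local is not part of POSIX sh; `f` is a throwaway function name and `false` stands in for failed_command.

```shell
# Combined declaration: the status of $(false) is replaced by the status
# of `local` itself (0), so errexit never sees the failure.
masked_out=$(bash -c 'set -e
f() { local foo=$(false); echo "still running"; }
f')

# Separate declaration and assignment: the failure aborts the script.
separated_status=0
separated_out=$(bash -c 'set -e
f() { local foo; foo=$(false); echo "not reached"; }
f') || separated_status=$?

printf 'local foo=$(...):      output="%s"\n' "$masked_out"
printf 'local foo; foo=$(...): output="%s" status=%s\n' "$separated_out" "$separated_status"
```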

More

http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions

Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to set -e.
