[sof-test] add set -e to all test cases (Unless You Love Debugging)

Add `set -e` to all of our test cases for better error handling.

EDIT @marc-hb : shellcheck can catch a good number of set -e incompatibilities: do not try to add `set -e` to a test before it's shellcheck-clean (#729)

@marc-hb 's very long description added here:

Rationale
---------
The family of Unix shell languages is one of the few that don't support exceptions and silently ignore errors by default. Both C and the shell were designed at a time when RAM was measured in kilobytes. This is a huge drawback for test code, which is of course meant to find and report errors. Unfortunately, there is no more convenient language than the shell to deal with processes, files and pipes. It is also very productive to use the exact same language as for interactive use.

There is fortunately one underrated and standard feature of the shell that mitigates this error handling design issue: `set -e` which is a shortcut for `set -o errexit`. errexit stops a script immediately when a command fails. The aborted script exits (as usual) with the exit status of the last command that was run.
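A minimal sketch of this behaviour, assuming `bash` is available. The one-line script is made up for illustration and runs in a child shell so that the abort can be observed from the outside:

```shell
# Run a tiny throwaway script in a child shell with errexit enabled (-e).
# "(exit 3)" stands in for any command that fails with status 3.
status=0
output=$(bash -e -c 'echo before; (exit 3); echo after') || status=$?

# "after" is never printed and the child exits with the failing status.
printf 'output=%s status=%s\n' "$output" "$status"
# prints: output=before status=3
```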

errexit is far from perfect, for instance it does not catch all errors. However it catches many errors so it should be used in all new scripts. It should be added to existing scripts but only after extensive testing because using `set -e` requires some minor coding changes described below.


Why stop immediately when something fails?
------------------------------------------

https://en.wikipedia.org/wiki/Fail-fast

1. This is simply what users expect. When a test fails, people don't look at every single line of the test log. They go straight to the end of the log because that is where they expect the relevant and useful error message(s) to be. Some test logs can be very long and it would be ridiculous to expect users to scan every line for error message(s).

2. Unexpected errors often cause a cascade of more unexpected errors. After an unexpected error all bets are off: no one has any idea what the following code will do. It will typically add more and more errors at random points in the future until some error finally gets caught manually. This means:
a) The last, only error message users look at is not relevant and very confusing, see 1. above.
b) The behaviour of the rest of the code is now undefined and could lead to much more serious issues. Here's a simplified example. 
```shell
some_directory=$(failing_command)  
# without set -e the entire $basedir gets deleted!
rm -rf $basedir/$some_directory 
```
Fun fact, unexpected errors in userspace C code are more likely to "fail faster" than in shell scripts because memory corruption often gets caught by the operating system.
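The danger described in b) can be reproduced safely with `echo` in place of `rm`. This is only a sketch: `false` stands in for `failing_command` and `/tmp/demo_base` is a made-up path.

```shell
# Same shape as the rm example, but harmless: it only prints the path.
demo='basedir=/tmp/demo_base
some_directory=$(false)
echo "would delete: $basedir/$some_directory"'

# Without errexit the failure is ignored and the path collapses to $basedir/.
without_errexit=$(bash -c "$demo")

# With errexit the script stops at the failed assignment; nothing is printed.
status=0
with_errexit=$(bash -e -c "$demo") || status=$?

printf 'without: [%s]\nwith:    [%s] status=%s\n' \
    "$without_errexit" "$with_errexit" "$status"
```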

An unexpected test failure is a stressful situation because it's an unexpected delay. Spending even more time understanding test logs and debugging test code instead of debugging the actual product code adds a lot more tension and delays. As opposed to some product code, test code should never try to recover from unexpected failures and perform some work anyway, because this hides bugs and/or generates complicated and time-consuming cascades of failures that are even harder to debug.

Why not just check the exit code of every command like in C?
-------------------------------------------------------

Because it makes the code extremely verbose.

Because life is always too short, so everyone who promised to check the exit code of every command never delivered on their promise (this failed promise tends to happen in C too). Instead they only check the exit code of the commands they "expect" to fail. But the line between expected and unexpected failures is very subjective; it depends on the developer.

Should I still manually check the exit code of some commands (which ones) even when using set -e? 
--------------------

You should simply check commands that return an error but don't provide a good error message. No need to do anything for commands that print good error messages. For instance some commands will just print "file not found" without printing the filename. In that case you must manually check the exit code exactly like when not using errexit. Example:
```shell
cmd1_with_bad_error_messages "$somefile" ||
    die 'Failed to run cmd1 on file %s\n' "$somefile"
```
This will print:
```
File not found               # from cmd1
Failed to run cmd1 on file some_file  # from you
```
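The pattern above can be tried end to end with the sketch below. Both helpers are hypothetical stand-ins written for this example: this `die` is a minimal guess at a "print to stderr and exit" helper, not necessarily the one used by the test suite, and `cmd1_with_bad_error_messages` merely simulates a command with an unhelpful message.

```shell
# Hypothetical helper: print a printf-style message on stderr, then exit.
die() {
    printf "$@" >&2
    exit 1
}

# Hypothetical stand-in for a command whose error message omits the file name.
cmd1_with_bad_error_messages() {
    echo 'File not found' >&2
    return 1
}

check_file() {
    cmd1_with_bad_error_messages "$1" ||
        die 'Failed to run cmd1 on file %s\n' "$1"
}

# Run inside a command substitution so that die only exits the subshell.
status=0
output=$(check_file some_file 2>&1) || status=$?
printf '%s\nstatus=%s\n' "$output" "$status"
```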

Why do I need "some_command || true" and other tricks?
------------------------------------------------------

Some commands return a non-zero code that is not an error, or not always an error. This is why adding `set -e` to existing scripts requires a fair amount of testing, almost as much testing as when they were written the first time.
grep is a typical example. This does NOT work; the script stops silently:

```shell
set -e

found_maybe=$( some_command | grep something) # aborts when not found
if [ -n "$found_maybe" ] ; then ...  # does not run when nothing found
```

OK:
```shell
set -e

found_maybe=$( some_command | grep something || true)
if [ -n "$found_maybe" ] ; then ...
```

`|| true` works because `set -e` is obviously disabled when checking the exit code manually.
 Also OK:
 
```shell
set -e

if [ -n "$( some_command | grep something)" ] ; then ...

```
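One caveat with `|| true` is that it also hides real grep errors: grep exits with 1 when nothing matches, and with a status greater than 1 when something actually went wrong (an unreadable file, for instance). A possible refinement is sketched below; `grep_maybe` is a hypothetical helper invented for this example.

```shell
set -e

# Hypothetical helper, not part of any library: behaves like grep but
# treats "no match" (exit status 1) as success. Real grep errors
# (status 2 or more) are still reported to the caller.
grep_maybe() {
    local ret=0
    grep "$@" || ret=$?
    [ "$ret" -le 1 ] || return "$ret"
}

# No match: found_maybe is empty and the script keeps running.
found_maybe=$(printf 'alpha\nbeta\n' | grep_maybe gamma)
if [ -n "$found_maybe" ]; then
    echo "found: $found_maybe"
else
    echo 'nothing found'
fi

# A real error is not swallowed; capture it the errexit-safe way.
grep_status=0
grep_maybe something /nonexistent/file 2>/dev/null || grep_status=$?
echo "grep_status=$grep_status"
```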

The return value of `expr` can sometimes be non-zero for surprising reasons. Simple fix: don't use `expr` at all, it's antiquated. Use `$(( ))` instead. See https://github.com/koalaman/shellcheck/wiki/SC2003
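One of those surprising reasons is that `expr` exits with status 1 whenever the result is zero or empty. A small illustration with made-up variable names:

```shell
# expr prints the result AND derives its exit status from it:
# a result of 0 gives exit status 1, which errexit treats as a failure.
expr_status=0
expr_result=$(expr 1 - 1) || expr_status=$?

# Arithmetic expansion has no such side effect.
arith_result=$((1 - 1))

printf 'expr:   result=%s status=%s\n' "$expr_result" "$expr_status"
printf '$(( )): result=%s\n' "$arith_result"
```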


How do I use $?
---------------

Most of the time you shouldn't use $?, see why https://github.com/koalaman/shellcheck/wiki/SC2181

But sometimes you really need `$?`, for instance because you want to treat different error codes differently. Solution:

```shell
ret=0
some_command || ret=$?
case $ret in ...
```
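A complete version of the same idea, as a sketch only: `some_command` is a hypothetical function that always fails with status 2, and the meaning attached to each status is invented for the example.

```shell
# Hypothetical command that fails with status 2.
some_command() { return 2; }

ret=0
some_command || ret=$?   # errexit does not trigger on the left side of ||

case $ret in
    0) outcome='success' ;;
    2) outcome='failed with status 2, handled specially' ;;
    *) outcome="unexpected status $ret" ;;
esac
echo "$outcome"
```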

Prefer || over && when possible
-------------------------------

https://www.shellcheck.net/wiki/SC2015

`set -e` is compatible with `&&` and it is possible to use both in the same script, however the interaction between `&&` and `set -e` can be quite tricky, see `errexit_demo()` demo below. Prefer `||` and `if then else` over `&&` when possible.

Especially avoid negations like: `test not good_thing && handle_bad`, always replace this with the simpler, "assert-style": `test good_thing || handle_bad`. 
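The sketch below shows one way the negated `&&` form can bite under `set -e`, assuming `bash` is available. Both throwaway scripts check that `/` exists, which is always true, so neither prints the "missing" message. The function name `check` is made up.

```shell
script_negated='check() { [ ! -e / ] && echo "root is missing"; }
check
echo "reached end"'

script_assert='check() { [ -e / ] || echo "root is missing"; }
check
echo "reached end"'

# Negated && form: nothing is wrong, yet the function returns 1,
# so errexit aborts the caller before "reached end".
status_negated=0
out_negated=$(bash -e -c "$script_negated") || status_negated=$?

# Assert-style || form: the function returns 0 and the script continues.
status_assert=0
out_assert=$(bash -e -c "$script_assert") || status_assert=$?

printf 'negated: [%s] status=%s\n' "$out_negated" "$status_negated"
printf 'assert:  [%s] status=%s\n' "$out_assert" "$status_assert"
```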



You especially want to avoid `&&` _on the last line of the script_ for a subtle reason explained at http://redsymbol.net/articles/unofficial-bash-strict-mode/#short-circuiting
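In short, the exit status of a script is the status of its last command, and a short-circuited `&&` leaves a non-zero status behind. A minimal sketch with two throwaway one-line scripts and a made-up directory name:

```shell
# Last line uses &&: the test is false, so the script exits with status 1.
status_and=0
bash -e -c '[ -d /nonexistent_dir ] && echo "directory exists"' || status_and=$?

# Last line uses if: a false condition without else leaves status 0.
status_if=0
bash -e -c 'if [ -d /nonexistent_dir ]; then echo "directory exists"; fi' || status_if=$?

echo "status_and=$status_and status_if=$status_if"
# prints: status_and=1 status_if=0
```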


```shell
errexit_demo () 
{ 
    for x in '||' '&&';
    do
        for i in true false;
        do
            for j in true false;
            do
                tst="$i $x $j";
                printf "${tst} : ";
                bash -e -c "$tst; printf 'no_errexit, '";
                bash -c "$tst; "'printf "\texit=$?\n"'
            done;
        done;
    done;
}

errexit_demo

true || true : no_errexit, 	exit=0
true || false : no_errexit, 	exit=0
false || true : no_errexit, 	exit=0
false || false : 	exit=1
true && true : no_errexit, 	exit=0
true && false : 	exit=1
false && true : no_errexit, 	exit=1  # no match
false && false : no_errexit, 	exit=1  # no match

```

`||` and `&&` trigger `set -e` (errexit) when and only when the second command runs and fails. In other words, `set -e` ignores a failure of the first command only. This is usually what you want but this behavior does NOT match `$?` when using `&&`! On the other hand, `set -e` does match `$?` when using `||`.

Another common misconception: `A && B || C` is a (barely) shorter `if; then; else`. It is not. It is _usually_ equivalent to `if;then; else` but not always! Compare:

```shell
if true; then grep notfound /etc/passwd ; else echo 'ELSE is not run'; fi
true && grep notfound /etc/passwd || echo 'ELSE is also run!!'
```

Even if you understand the subtle difference between `A && B || C`  and `if; then; else`, someone else will not and they will re-use the approach. Avoid it.

Must separate local/export and $( )
-----------------------------------

```shell
# set -e does NOT catch this failure!
local foo=$(failed_command) 
# FIXED: set -e does catch this failure
local foo; foo=$(failed_command)
```
https://github.com/koalaman/shellcheck/wiki/SC2155
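The difference can be observed with two throwaway scripts run under `bash -e`. This is a sketch: `false` stands in for `failed_command` and the function name `f` is made up.

```shell
script_masked='f() { local foo=$(false); echo "still running"; }; f'
script_fixed='f() { local foo; foo=$(false); echo "still running"; }; f'

# "local" itself succeeds, so the failure of $(false) is masked.
status_masked=0
out_masked=$(bash -e -c "$script_masked") || status_masked=$?

# A plain assignment takes the status of $(false), so errexit triggers.
status_fixed=0
out_fixed=$(bash -e -c "$script_fixed") || status_fixed=$?

printf 'masked: [%s] status=%s\n' "$out_masked" "$status_masked"
printf 'fixed:  [%s] status=%s\n' "$out_fixed" "$status_fixed"
```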

More
----

http://redsymbol.net/articles/unofficial-bash-strict-mode/#issues-and-solutions

Warning: this page documents an even stricter way to use bash, so some "solutions" are unrelated to `set -e`.

