Unofficial Bash Strict Mode
Bash (and shell scripting in general) is NOT straightforward. It’s easy to mess up if you don’t know what you are doing. If you come from a traditional programming background and just want to plumb a few lines of code, there are a few behaviours that will confuse the hell out of you.
To help with that, the unofficial bash strict mode was created. In this post, we will go over some misleading behaviours and how the strict mode can help in each case (quirks included).
errexit
errexit: The problem
In the following bash script:
|
|
/posts/bash-strict-mode/errexit.sh
Given the file does not exist, should it run all the way through or should it fail?
Turns out the script continues running just fine!
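A minimal sketch of that situation, assuming a hypothetical path (`/nonexistent/strict-mode-demo.txt`) that does not exist. The script body runs in a child bash so its output and exit status can be inspected:

```shell
# Hypothetical script body: the file passed to cat is assumed not to exist.
script='
cat /nonexistent/strict-mode-demo.txt
echo "still running"
'

# Run it in a child bash and capture stdout and the exit status.
status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: $output"   # the echo after the failed cat still ran
echo "status: $status"   # 0, the status of the final echo
```

The failure of `cat` is lost: the script reports success because its last command succeeded.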
To tackle the problem, I tend to resort to the “Fail Fast” approach. From the Fail Fast - C2 wiki:
This is done upon encountering an error so serious that it is possible that the process state is corrupt or inconsistent, and immediate exit is the best way to ensure that no (more) damage is done.
Sounds reasonable. So why is “fail silently” the normal behaviour in a shell script? Well, in an interactive shell you do NOT want to exit when there’s an error (imagine your shell crashing when you cat a file that does not exist). It looks like that behaviour was simply carried over to non-interactive shells.
errexit: The solution
So how can we improve this behaviour? By setting the errexit flag.
    set -o errexit
or the shorthand version (more commonly used):
    set -e
What does this do? As per the docs:
Exit immediately if a pipeline (…) returns a non-zero status.
– https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html#The-Set-Builtin
Going back to our example, we would write this instead:
|
|
/posts/bash-strict-mode/errexit2.sh
Which would then fail. Since the file does not exist, cat returns a non-zero exit code. This behaviour is described in the following bats unit test:
|
|
/posts/bash-strict-mode/errexit.bats
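The same check can be sketched without bats, again assuming a hypothetical missing file. With errexit on, the child bash stops at the failing `cat` and the later `echo` never runs:

```shell
# Hypothetical script body with errexit enabled.
script='
set -o errexit   # same as: set -e
cat /nonexistent/strict-mode-demo.txt
echo "never printed"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: [$output]"   # empty: the script stopped at cat
echo "status: $status"     # non-zero, the status of the failed cat
```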
That works and, IMO, will definitely help you, but be aware of a few quirks:
errexit: Quirks
Quirk 1: Programs that return non-zero status
Not all commands return 0 on successful runs. The most prominent example is grep. From the docs:
Normally the exit status is
- 0 if a line is selected,
- 1 if no lines were selected,
- and 2 if an error occurred.
However, if the -q or --quiet or --silent is used and a line is selected, the exit status is 0 even if an error occurred.
So in the example below, echo will never be run.
|
|
/posts/bash-strict-mode/grep_fail.sh
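A small sketch of this quirk, using made-up input instead of a real file. `grep` finds no match, returns 1, and errexit treats that as a failure:

```shell
# Hypothetical script body: grep has nothing to match.
script='
set -e
printf "hello\n" | grep "goodbye"   # no line selected: exit status 1
echo "never printed"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: [$output]"   # empty
echo "status: $status"     # 1, the "no lines were selected" status of grep
```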
What can we do in that situation? Thankfully, the errexit section of the bash manual has a passage that can help us (reformatted for clarity):
The shell does not exit if the command that fails is
- part of the command list immediately following a while or until keyword,
- part of the test in an if statement,
- part of any command executed in a && or || list except the command following the final && or ||
- any command in a pipeline but the last, or if the command’s return status is being inverted with !.
In our case, we can simply rewrite the script to comply with the second item (the test in an if statement):
|
|
/posts/bash-strict-mode/grep_correct.sh
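One way this rewrite might look, sketched with made-up input. Because `grep` is now the test of an `if`, its non-zero status selects the `else` branch instead of stopping the script:

```shell
# Hypothetical script body: grep is used as the condition of an if statement.
script='
set -e
if printf "hello\n" | grep -q "goodbye"; then
  echo "found"
else
  echo "not found"
fi
echo "done"
'

status=0
output=$(bash -c "$script") || status=$?

echo "$output"
echo "status: $status"   # 0: the script ran to the end
```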
This behaviour can be verified by the following bats test:
|
|
/posts/bash-strict-mode/grep.bats
Quirk 2: What if you are ok with a command failing/returning non-zero?
In that case, simply OR it with true:
|
|
Since we do not want to fail if there are no log files.
Let’s think a little about why this works. From the docs (which we already read in a previous point):
The shell does not exit if the command that fails is (…)
- part of any command executed in a && or || list except the command following the final && or ||
As the command following the final || is true, there’s no way for the whole line to fail.
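A sketch of the idea, assuming a hypothetical directory with no log files in it. The `ls` call fails, but the `|| true` keeps errexit from stopping the script:

```shell
# Hypothetical script body: listing log files that do not exist.
script='
set -e
ls /nonexistent/strict-mode-demo/*.log 2>/dev/null || true   # ls fails, true rescues the list
echo "continuing"
'

status=0
output=$(bash -c "$script") || status=$?

echo "output: $output"   # continuing
echo "status: $status"   # 0
```

Note that `|| true` swallows every failure of that command, not only the one you expected, so it is best kept for commands whose failure really does not matter.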
Another option would be to turn errexit off temporarily:
|
|
The + syntax means “remove” and - means “add” (go figure). Therefore, we are simply disabling errexit while command_allowed_to_fail is called!
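A sketch of this toggle, where `command_allowed_to_fail` is a stand-in function that always returns 3. An advantage over `|| true` is that the exit status stays available for inspection:

```shell
# Hypothetical script body: errexit is switched off around one command.
script='
set -e
command_allowed_to_fail() { return 3; }   # hypothetical stand-in

set +e                      # disable errexit
command_allowed_to_fail
status=$?                   # keep the status before re-enabling errexit
set -e                      # enable errexit again

echo "command returned $status"
'

output=$(bash -c "$script")
echo "$output"   # command returned 3
```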
Bonus point: How do I know which command failed?
Not really a quirk, and not specific to errexit, but often you need to know where it failed.
|
|
/posts/bash-strict-mode/which_command_failed.sh
How can you tell which command failed? (Apart from looking at the very obvious mistake)
- echo everything you are doing
  Pros: straightforward
  Cons: quite boring to do so
- set -x, which will print every instruction
  Pros: simple to add
  Cons: you may end up exposing more than you want (imagine printing a variable with secrets, now imagine that running in a CI environment)
- put a trap to print the line number when a command fails
  Pros: can be added globally
  Cons: bit verbose
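The trap option from the list above can be sketched as follows. This is only one possible shape: it uses bash's `ERR` trap and the `LINENO` variable, and the demo script is written to a temporary file so that line numbers are meaningful:

```shell
# Write a small hypothetical script to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
set -e
trap 'echo "command failed on line $LINENO" >&2' ERR
true
false
echo "never printed"
EOF

# Run it, capturing stderr (where the trap writes) and the exit status.
status=0
message=$(bash "$demo" 2>&1) || status=$?
rm -f "$demo"

echo "$message"
echo "status: $status"   # 1, from the failing false
```

The trap runs just before errexit stops the script, so the message appears once, right at the point of failure.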
Is there anything more?
Yup. Once you get the gist of it, read the entry on BashFAQ and its linked resources.
pipefail
pipefail: The problem
|
|
/posts/bash-strict-mode/pipefail_first.sh
This would run just fine!
Unfortunately, errexit is not enough to save us here. From the docs, again:
The shell does not exit if the command that fails is: (…)
- any command in a pipeline but the last, or if the command’s return status is being inverted with !.
The exit status of a pipeline is the exit status of the last command in the pipeline
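A sketch of the problem, assuming a hypothetical missing file piped into `sort`. Even with errexit on, the failure of `cat` goes unnoticed because `sort`, the last command, succeeds:

```shell
# Hypothetical script body: the first command of the pipeline fails.
script='
set -e
cat /nonexistent/strict-mode-demo.txt | sort
echo "still running"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: $output"   # still running
echo "status: $status"   # 0
```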
pipefail: The solution
Let’s set pipefail:
If pipefail is enabled, the pipeline’s return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
In other words, it will only return 0 if all parts of the pipeline return 0.
Unlike errexit, pipefail has no single-letter shorthand and can only be set by its full form:
    set -o pipefail
Let’s fix the example shown before:
|
|
/posts/bash-strict-mode/pipefail_first_correct.sh
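A sketch of the corrected behaviour, using the same hypothetical missing file. With pipefail added, the pipeline's status becomes that of the failed `cat`, and errexit stops the script:

```shell
# Hypothetical script body: errexit and pipefail together.
script='
set -e
set -o pipefail
cat /nonexistent/strict-mode-demo.txt | sort
echo "never printed"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: [$output]"   # empty
echo "status: $status"     # non-zero, taken from the failed cat
```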
Both behaviours are verified by the following Bats test:
|
|
/posts/bash-strict-mode/pipefail_first.bats
pipefail: Quirks
Quirk 1:
the pipeline’s return status is the value of the last (rightmost) command to exit with a non-zero status
|
|
/posts/bash-strict-mode/pipefail_quirk_1.sh
cat’s exit code is 1 when the file does not exist, and xargs’s exit code is 123 “if any invocation of the command exited with status 1-125”.
Obviously both calls are broken, but what exit code do we get here?
The answer is 123, which is not ideal: it masks the cat failure that happened first.
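The rule can be reproduced with stand-ins: two subshells that exit with 1 and 123 take the place of the failing `cat` and `xargs`. Bash's `PIPESTATUS` array shows the individual statuses next to the one the pipeline reports:

```shell
# Hypothetical script body: both sides of the pipeline fail.
script='
set -o pipefail
(exit 1) | (exit 123)                      # stand-ins for the failing cat and xargs
echo "status=$? each=${PIPESTATUS[*]}"
'

output=$(bash -c "$script")
echo "$output"   # status=123 each=1 123
```

Both `$?` and `PIPESTATUS` are read on the line right after the pipeline, since any later command would overwrite them.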
My recommendation for this case is to simply break it down into different instructions:
|
|
/posts/bash-strict-mode/pipefail_quirk_1_correct.sh
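One way the split might look, assuming the output of the first step is small enough to hold in a variable (the file path is hypothetical). Each step now fails on its own, so the exit status points at the step that actually broke:

```shell
# Hypothetical script body: the pipeline is split into two steps.
script='
set -e
contents=$(cat /nonexistent/strict-mode-demo.txt)   # fails here with the status of cat
printf "%s\n" "$contents" | xargs echo              # never reached
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: [$output]"   # empty
echo "status: $status"     # 1, from cat, not 123 from xargs
```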
This behaviour can be confirmed by the following bats test:
|
|
/posts/bash-strict-mode/pipefail_quirk_1.bats
Quirk 2:
Be careful with what you pipe:
|
|
/posts/bash-strict-mode/pipefail_quirk_2.sh
In this example, we are loading a whitelist file and feeding it to another command (here implemented as a function) that passes it to yet another service (e.g. a CLI tool). Even though the file does not exist, the pipeline does not fail. This ends up passing an empty string to remove_hosts, which could have catastrophic effects (deleting more than you expect)!
Ideally, you want to fail as soon as possible. The best way to do so is to break it down into more instructions and just be more careful ¯_(ツ)_/¯
|
|
/posts/bash-strict-mode/pipefail_quirk_2_correct.sh
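A sketch of a more careful version. Everything here is an assumption made for illustration: the whitelist path is hypothetical, and `remove_hosts` is a stand-in that only prints what it would do. The script checks that the file exists and is non-empty before anything destructive can happen:

```shell
# Hypothetical script body: validate the input before calling remove_hosts.
script='
set -eo pipefail
remove_hosts() { echo "would remove everything except: $*"; }   # hypothetical, only prints

whitelist_file=/nonexistent/whitelist.txt                        # hypothetical path
[ -s "$whitelist_file" ] || { echo "whitelist missing or empty: $whitelist_file" >&2; exit 1; }
whitelist=$(cat "$whitelist_file")
remove_hosts "$whitelist"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "output: [$output]"   # empty: remove_hosts was never called
echo "status: $status"     # 1
```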
As always, this behaviour is described by the following bats file:
|
|
/posts/bash-strict-mode/pipefail_quirk_2.bats
For more examples, check “Examples of why pipefail is really important to use”.
nounset
Last but not least, this one is very straightforward.
nounset: The problem
|
|
/posts/bash-strict-mode/nounset.sh
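A sketch of the kind of surprise nounset guards against, using a hypothetical variable name that is never set. Without nounset, the variable silently expands to an empty string:

```shell
# Hypothetical script body: strict_mode_demo_dir is never assigned.
script='
echo "deleting ${strict_mode_demo_dir}/cache"
'

status=0
output=$(bash -c "$script") || status=$?

echo "$output"           # deleting /cache
echo "status: $status"   # 0
```

Here the command is only an `echo`, but the same expansion in front of a real delete would point at `/cache` instead of the intended directory.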
nounset: The solution
Treat unset variables and parameters other than the special parameters ‘@’ or ‘*’ as an error when performing parameter expansion. An error message will be written to the standard error, and a non-interactive shell will exit.
|
|
/posts/bash-strict-mode/nounset.bats
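A sketch of the behaviour with nounset on, using the same hypothetical variable name. An expansion with an explicit default still works, while a plain expansion of the unset variable stops the script:

```shell
# Hypothetical script body with nounset enabled.
script='
set -o nounset    # same as: set -u
echo "default: ${strict_mode_demo_dir:-none}"     # an explicit default is still allowed
echo "deleting ${strict_mode_demo_dir}/cache"     # unbound variable: the shell exits here
echo "never printed"
'

status=0
output=$(bash -c "$script" 2>/dev/null) || status=$?

echo "$output"           # default: none
echo "status: $status"   # non-zero
```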
nounset: Quirks
Quirk 1:
As mentioned in the docs, @ and * are treated differently:
|
|
/posts/bash-strict-mode/nounset_quirk_1.sh
So always verify that the arguments you are getting are actually correct:
|
|
/posts/bash-strict-mode/nounset_quirk_1_correct.sh
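A sketch of one possible check, using the standard `${1:?message}` expansion. The script name `demo` and the usage text are made up for the example. The script body is run twice, once without and once with an argument:

```shell
# Hypothetical script that expects one argument.
script='
set -u
echo "args: [$*] count: $#"      # $* and $@ never trigger nounset
: "${1:?usage: demo <name>}"     # explicit check: abort when $1 is missing
echo "hello $1"
'

# Without arguments: the first echo works, then the explicit check aborts the script.
status_missing=0
output_missing=$(bash -c "$script" demo 2>/dev/null) || status_missing=$?

# With one argument: everything runs.
output_ok=$(bash -c "$script" demo world)

echo "$output_missing (status $status_missing)"
echo "$output_ok"
```

The word `demo` after the script body becomes `$0` in the child bash, so `world` is the first real argument.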
|
|
/posts/bash-strict-mode/nounset_quirk_1.bats
Conclusion
I hope this is enough to illustrate:
- how the expectations we often have are not true;
- how the ‘unofficial strict mode’ can help;
- and how the strict mode is not a panacea!