Unofficial Bash Strict Mode

Bash (and shell scripting in general) is NOT straightforward. It’s easy to mess up if you don’t know what you are doing. If you come from a traditional programming background and just want to plumb a few lines of code, there are a few behaviours that will confuse the hell out of you.

To help with that, the unofficial bash strict mode was created. In this post, we will go over misleading behaviours and how the strict mode can help in each case (quirks included).

errexit

errexit: The problem

In the following bash script:

#!/usr/bin/env bash

cat /tmp/i_do_not_exist
echo "Hey"
Source: /posts/bash-strict-mode/errexit.sh

Given the file does not exist, should it run all the way through or should it fail?

Turns out the script continues running just fine!

To tackle the problem, I tend to resort to the “Fail Fast” approach. From the Fail Fast - C2 wiki:

This is done upon encountering an error so serious that it is possible that the process state is corrupt or inconsistent, and immediate exit is the best way to ensure that no (more) damage is done.

Sounds reasonable. So why is “Fail silently” the normal behaviour in a shell script? Well, consider that in an interactive shell you DO NOT want to exit when there’s an error (imagine your shell crashing when you cat a file that does not exist). It looks like that behaviour was simply carried over to non-interactive shells.

errexit: The solution

So how can we improve this behaviour? By setting the flag errexit.

set -o errexit

or the shorthand version (more commonly used):

set -e

What does this do? As per the docs:

Exit immediately if a pipeline (…) returns a non-zero status.

https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html#The-Set-Builtin

Going back to our example, we would write this instead:

#!/usr/bin/env bash

set -e

cat /tmp/i_do_not_exist
echo "Hey"
Source: /posts/bash-strict-mode/errexit2.sh

This version fails: since the file does not exist, cat returns a non-zero exit code and errexit stops the script. This behaviour is described in the following bats unit test:

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "runs fine even though file does not exist" { 
	run "$BATS_TEST_DIRNAME/errexit.sh"
	[ "$status" -eq 0 ]
}

@test "fails since file does not exist AND errexit is turned on" {
	run "$BATS_TEST_DIRNAME/errexit2.sh"
	[ "$status" -ne 0 ]
	[ "$output" == "cat: /tmp/i_do_not_exist: No such file or directory" ]
}
Source: /posts/bash-strict-mode/errexit.bats

That works and, IMO, will definitely help you, but be aware:

Errexit: Quirks

Quirk #1: Programs that return non-zero status

Not all commands return 0 on successful runs. The most prominent example is grep. From the docs:

Normally the exit status is

  1. 0 if a line is selected,
  2. 1 if no lines were selected,
  3. and 2 if an error occurred.

However, if the -q or --quiet or --silent option is used and a line is selected, the exit status is 0 even if an error occurred.

So in the example below, echo will never be run.

#!/usr/bin/env bash

set -e

status_code=$(grep non_existant_word /dev/null)
echo "Hello world"
Source: /posts/bash-strict-mode/grep_fail.sh

What can we do in that situation? Thankfully there’s a passage in the errexit section of the bash manual that can help us (reformatted for clarity):

The shell does not exit if the command that fails is

  1. part of the command list immediately following a while or until keyword,
  2. part of the test in an if statement,
  3. part of any command executed in a && or || list except the command following the final && or ||
  4. any command in a pipeline but the last, or if the command’s return status is being inverted with !.

In our case, we can simply rewrite the script to satisfy item 2:

#!/usr/bin/env bash

set -e

if grep non_existant_word /dev/null; then
	echo "Hello world"
else
	echo "Does not exist"
fi
Source: /posts/bash-strict-mode/grep_correct.sh

This behaviour can be verified by the following bats test:

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "fails since grep returns non 0" {
	run "$BATS_TEST_DIRNAME/grep_fail.sh"
	[ "$status" -ne 0 ]
}

@test "runs fine since grep is in a if statement" {
	run "$BATS_TEST_DIRNAME/grep_correct.sh"
	[ "$status" -eq 0 ]
	[ "$output" == "Does not exist" ]
}
Source: /posts/bash-strict-mode/grep.bats

Quirk 2: What if you are ok with a command failing/returning non-zero?

In that case, simply OR with true:

rm *.log || true

Since we do not want to fail if there are no log files.

Let’s think a little bit about why this works. From the docs (which we already read in a previous point):

The shell does not exit if the command that fails is (…)

  1. part of any command executed in a && or || list except the command following the final && or ||

As the command following the final || is true, there’s no way for the whole line to fail.
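
If the status is needed rather than discarded, the same || rule can be used to record it. A minimal sketch, in which the variable names status and result are made up for illustration and the meaning of each code follows the grep docs quoted earlier:

```bash
#!/usr/bin/env bash

set -e

# Record grep's exit status without tripping errexit: the assignment
# on the right of || only runs when grep returns non-zero.
status=0
grep -q non_existent_word /dev/null || status=$?

# Interpret the status according to grep's documented exit codes.
case "$status" in
	0) result="match found" ;;
	1) result="no match" ;;
	*) result="grep failed with status $status" ;;
esac

echo "$result"
```

Here grep finds nothing in /dev/null, so the script prints "no match" and keeps running.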

Another option would be to turn it off temporarily:

set +e
command_allowed_to_fail
set -e

The + syntax means “remove” and - means “add” (go figure). Therefore, we are simply disabling errexit while command_allowed_to_fail is called!
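
The window between set +e and set -e is also a convenient place to save the exit status. A small sketch, where command_allowed_to_fail is defined as a hypothetical stand-in that always returns 3:

```bash
#!/usr/bin/env bash

set -e

# Hypothetical stand-in for a command that is allowed to fail.
command_allowed_to_fail() {
	return 3
}

set +e                     # errexit off: a failure no longer ends the script
command_allowed_to_fail
status=$?                  # save the status before another command overwrites it
set -e                     # errexit back on for the rest of the script

echo "command_allowed_to_fail exited with status $status"
```

The status has to be read on the very next line, since every command that runs afterwards replaces $?.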

Bonus point: How do I know which command failed?

Not really a quirk, and not specific to errexit, but often you need to know where the script failed.

#!/usr/bin/env bash

set -e

function random_bytes {
	echo $(head -c "$1" /dev/random | base64)
}

random_bytes 10
random_bytes 50
random_bytes 
random_bytes 5
Source: /posts/bash-strict-mode/which_command_failed.sh

How can you tell which command failed? (Apart from looking at the very obvious mistake)

    1. echo everything you are doing
      Pros: straightforward
      Cons: quite boring to do so
    2. set -x, which will print every instruction.
      Pros: simple to add
      Cons: you may end up exposing more than you want (imagine printing a variable with secrets, now imagine that running in a CI environment)
    3. put a trap to print the line number when a command fails
      Pros: can be added globally
      Cons: bit verbose
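
The trap option can be sketched as follows. This is only an illustration: the failing script runs in a child bash (fed through a here-document) so that its output and exit status can be inspected, and the wording of the error message is made up. In a real script, only the set -e and trap lines would go at the top.

```bash
#!/usr/bin/env bash

# Run a small failing script in a child bash so that its output and exit
# status can be inspected here. The quoted 'EOF' keeps $LINENO and $?
# unexpanded until the child runs them.
status=0
output=$(bash -s 2>&1 <<'EOF'
set -e
trap 'echo "Error on line $LINENO (exit status $?)" >&2' ERR
echo "step 1"
false
echo "never reached"
EOF
) || status=$?

echo "$output"
echo "child exit status: $status"
```

The ERR trap runs just before errexit ends the child script, so the message names the failing line and "never reached" is not printed.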

Is there anything more?

Yup. Once you get the gist of it, read the entry on BashFAQ and its linked resources.

Pipefail

Pipefail: The problem

#!/usr/bin/env bash

set -e

non_existent_cmd | another_non_existent_cmd | cat
echo "Hello"
Source: /posts/bash-strict-mode/pipefail_first.sh

This would run just fine!

Unfortunately errexit is not enough to save us here. From the docs, again:

The shell does not exit if the command that fails is: (…)

  1. any command in a pipeline but the last, or if the command’s return status is being inverted with !.

The exit status of a pipeline is the exit status of the last command in the pipeline

Pipefail: The solution

Let’s set pipefail:

If pipefail is enabled, the pipeline’s return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.

In other words, it will only return 0 if all parts of the pipeline return 0.
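
A quick way to see the difference is to run the same pipeline with and without the option. The sketch below uses false, true and subshells with fixed exit codes as stand-ins for real commands; the variable names are made up for illustration.

```bash
#!/usr/bin/env bash

# Without pipefail: only the last command's status counts.
false | true
without_pipefail=$?

set -o pipefail

# With pipefail: any failing stage makes the whole pipeline fail...
false | true
with_pipefail=$?

# ...and when several stages fail, the rightmost non-zero status is reported.
(exit 3) | (exit 5) | true
rightmost=$?

set +o pipefail

echo "without pipefail: $without_pipefail"   # 0
echo "with pipefail:    $with_pipefail"      # 1
echo "rightmost status: $rightmost"          # 5
```

errexit is deliberately left off here so that the script survives long enough to print all three results.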

Unlike errexit, pipefail has no short flag; it can only be set by its full name:

set -o pipefail

Let’s fix the example shown before:

#!/usr/bin/env bash

set -eo pipefail

non_existent_cmd | another_non_existent_cmd | cat
echo "Hello"
Source: /posts/bash-strict-mode/pipefail_first_correct.sh

Both behaviours are verified by the following Bats test:

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "runs fine since 'pipefail' is not set" { 
	run "$BATS_TEST_DIRNAME/pipefail_first.sh"
	[ "$status" -eq 0 ]
	[ "${lines[2]}" == 'Hello' ]
}

@test "fails since 'pipefail' is set" {
	run "$BATS_TEST_DIRNAME/pipefail_first_correct.sh"
	[ "$status" -ne 0 ]
	[ "${lines[2]}" != 'Hello' ]
}
Source: /posts/bash-strict-mode/pipefail_first.bats

Pipefail: Quirks

Quirk 1:

the pipeline’s return status is the value of the last (rightmost) command to exit with a non-zero status

#!/usr/bin/env bash

set -eo pipefail

cat non_existing_file | xargs curl -qs
Source: /posts/bash-strict-mode/pipefail_quirk_1.sh

cat exits with 1 when the file does not exist. And xargs exits with 123 “if any invocation of the command exited with status 1-125”. Obviously both calls are broken, but what exit code do we get here?

The answer is 123, the status of the rightmost failing command, which is not ideal because it hides the original failure in cat.
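
When the individual statuses matter, bash also records them in its PIPESTATUS array. A small sketch, using subshells with fixed exit codes as stand-ins for cat and xargs (the variable name stage_statuses is made up for illustration):

```bash
#!/usr/bin/env bash

set -o pipefail

# Stand-ins for the cat (status 1) and xargs (status 123) stages.
(exit 1) | (exit 123)

# Copy the array immediately: the next command overwrites PIPESTATUS.
stage_statuses=("${PIPESTATUS[@]}")

echo "first stage:  ${stage_statuses[0]}"   # 1
echo "second stage: ${stage_statuses[1]}"   # 123
```

Note that this only works without errexit (or inside a context where errexit is ignored), since otherwise the script exits before PIPESTATUS can be read.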

My recommendation for this case is to simply break it down into different instructions:

#!/usr/bin/env bash

set -eo pipefail

contents=$(cat non_existing_file)
curl -qs "$contents"
Source: /posts/bash-strict-mode/pipefail_quirk_1_correct.sh

This behaviour can be confirmed by the following bats test:

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "returns exit code of xargs" {
	run "$BATS_TEST_DIRNAME/pipefail_quirk_1.sh"
	[ "$status" -eq 123 ]
}

@test "returns exit code of cat" {
	run "$BATS_TEST_DIRNAME/pipefail_quirk_1_correct.sh"
	[ "$status" -eq 1 ]
}
Source: /posts/bash-strict-mode/pipefail_quirk_1.bats

Quirk 2:

Be careful with what you pipe:

#!/usr/bin/env bash

set -eo pipefail

function all_hosts() {
	echo 'host-1
host-2
host-a
host-b'
}


function remove_hosts() {
	hosts=$(all_hosts | tr '\n' ' ')
	whitelist="$1"
	echo "
Removing hosts: $hosts

Whitelist: '$whitelist'
	"

	# Imagine we are passing those two parameters
	# To another command
}

cat non_existent_whitelist_file | remove_hosts
Source: /posts/bash-strict-mode/pipefail_quirk_2.sh

In this example, we are loading a whitelist file and feeding it to another command (here implemented as a function) that passes it on to yet another service (e.g. a CLI tool). Even though the file does not exist, remove_hosts still runs: every command in a pipeline is started regardless of whether an earlier one fails, and pipefail only affects the exit status reported once the whole pipeline has finished. remove_hosts therefore ends up with an empty whitelist, which could have catastrophic effects (deleting more than you expect)!

Ideally, you want to fail as soon as possible. The best way to do so is to break it down into more instructions and just be more careful ¯_(ツ)_/¯

#!/usr/bin/env bash

set -eo pipefail

function all_hosts() {
	echo 'host-1
host-2
host-a
host-b'
}


function remove_hosts() {
	hosts=$(all_hosts | tr '\n' ' ')
	whitelist="$1"
	echo "
	Removing hosts:
	$hosts

	Whitelist:
	'$whitelist'
	"

	# Imagine we are passing those two parameters
	# To another command
}

readonly whitelist_file="non_existent_whitelist_file"

if [ ! -f "$whitelist_file" ]; then
	echo "Whitelist file does not exist"
	exit 1
fi

cat "$whitelist_file" | remove_hosts
Source: /posts/bash-strict-mode/pipefail_quirk_2_correct.sh
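
Another way to break the pipeline apart is to read the file into a variable first and only call the consumer once the read has succeeded. The sketch below is an assumption-laden illustration: remove_hosts is reduced to a stub, and remove_hosts_from_file and outcome are hypothetical names.

```bash
#!/usr/bin/env bash

set -eo pipefail

# Stub standing in for the real consumer.
remove_hosts() {
	echo "Whitelist: '$1'"
}

# Hypothetical wrapper: read the file first, call the consumer second.
remove_hosts_from_file() {
	local whitelist_file="$1"
	local whitelist
	# Declaring and assigning separately keeps cat's exit status visible;
	# `local whitelist=$(...)` would report the status of `local` instead.
	whitelist=$(cat "$whitelist_file") || return 1
	remove_hosts "$whitelist"
}

if remove_hosts_from_file non_existent_whitelist_file 2>/dev/null; then
	outcome="hosts removed"
else
	outcome="aborted: whitelist could not be read"
fi

echo "$outcome"
```

The explicit || return 1 is needed because errexit is ignored inside the condition of an if statement, so the function has to report the failure itself.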

As always, this behaviour is described by the following bats file:

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "remove_hosts still runs even though file does not exist" {
	run "$BATS_TEST_DIRNAME/pipefail_quirk_2.sh"
	[ "$status" -ne 0 ]
	[ "${lines[2]}" == "Whitelist: ''" ]
}

@test "fails since we verify file presence" {
	run "$BATS_TEST_DIRNAME/pipefail_quirk_2_correct.sh"
	[ "$status" -eq 1 ]
	[ "$output" == "Whitelist file does not exist" ]
}
Source: /posts/bash-strict-mode/pipefail_quirk_2.bats

For more examples, check Examples of why pipefail is really important to use.

nounset

Last but not least, this one is very straightforward.

nounset: The problem

#!/usr/bin/env bash

set -eo pipefail

echo "MY_VAR value: $MY_VAR"
Source: /posts/bash-strict-mode/nounset.sh

nounset: The solution

Treat unset variables and parameters other than the special parameters ‘@’ or ‘*’ as an error when performing parameter expansion. An error message will be written to the standard error, and a non-interactive shell will exit.
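
A minimal sketch of what enabling the option looks like, assuming the fixed script simply adds -u (the short form of set -o nounset) to the existing set line. The script body runs in a child bash here so that the error message and exit status can be shown; the "never reached" line is added for illustration.

```bash
#!/usr/bin/env bash

unset MY_VAR   # make sure the variable is not inherited from the environment

# Run the script body in a child bash and capture what it prints.
status=0
output=$(bash -s 2>&1 <<'EOF'
set -euo pipefail
echo "MY_VAR value: $MY_VAR"
echo "never reached"
EOF
) || status=$?

echo "child exit status: $status"
echo "$output"
```

The child stops at the expansion of $MY_VAR with an "unbound variable" message on standard error, so neither echo produces any output.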

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test "runs fine, even though MY_VAR is not set" { 
	run "$BATS_TEST_DIRNAME/nounset.sh"
	[ "$status" -eq 0 ]
	[ "$output" == 'MY_VAR value: ' ]
}

@test "fails since 'nounset' is set" {
	run "$BATS_TEST_DIRNAME/nounset_correct.sh"
	[ "$status" -ne 0 ]
	[[ "$output" =~ 'MY_VAR: unbound variable' ]]

}
Source: /posts/bash-strict-mode/nounset.bats

nounset: Quirks

Quirk 1:

As mentioned in the docs, @ and * are treated differently:

#!/usr/bin/env bash

set -euo pipefail

my_fn() {
	echo "Received args: '$@'"
}

my_fn "$@"
Source: /posts/bash-strict-mode/nounset_quirk_1.sh

So always verify that the arguments you are getting are actually correct:

#!/usr/bin/env bash

set -euo pipefail

my_fn() {
	echo "Received args: '$@'"
}

if [ $# -eq 0 ]; then
	echo "No arguments supplied"
	exit 1
else
	my_fn "$@"
fi
Source: /posts/bash-strict-mode/nounset_quirk_1_correct.sh

#!/usr/bin/env bats

load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'

@test 'runs fine, since nounset does not affect "$@"' {
	run "$BATS_TEST_DIRNAME/nounset_quirk_1.sh"
	[ "$status" -eq 0 ]
	[ "$output" == "Received args: ''" ]
}

@test "validates input manually" {
	run "$BATS_TEST_DIRNAME/nounset_quirk_1_correct.sh"
	[ "$status" -eq 1 ]
	[ "$output" == "No arguments supplied" ]
}
Source: /posts/bash-strict-mode/nounset_quirk_1.bats
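
For values that are genuinely optional, manual validation can be combined with bash's default-value expansion, which does not trip nounset. A short sketch, reusing MY_VAR from the earlier example; the other variable names and the "fallback" value are made up for illustration.

```bash
#!/usr/bin/env bash

set -euo pipefail

unset MY_VAR   # start from a known state for the demo

# ${VAR:-default} expands to the default when VAR is unset or empty,
# so it is not treated as an error under nounset.
my_var_or_default="${MY_VAR:-fallback}"

# ${VAR:-} expands to an empty string instead of failing,
# which allows an explicit check.
if [ -z "${MY_VAR:-}" ]; then
	my_var_state="unset or empty"
else
	my_var_state="set"
fi

echo "MY_VAR or default: $my_var_or_default"
echo "MY_VAR is $my_var_state"
```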

Conclusion

I hope this is enough to illustrate:

  • how the expectations we often have are not true;
  • how the ‘unofficial strict mode’ can help;
  • and how the strict mode is not a panacea!

References/Recommended Readings