Bash script essentials

Inside the script

Code Explanation Example
#!/bin/bash “Shebang” line: points the computer to the Bash interpreter at /bin/bash.
set -u -e -o pipefail Bash strict settings: exit the script with an error if an unset variable is referenced (-u), if a command fails (with some exceptions; -e), or if any command in a pipeline fails (-o pipefail).
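$0 Name (path) of the script as it was invoked. (Note: in scripts submitted as SLURM batch jobs, $0 refers to SLURM's internal copy of the script, not to the original script name.)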
$1, $2, etc. “Positional parameters”: the first, second, etc. arguments passed to the script from the shell. ./<script> my_arg1 my_arg2
$1 will be “my_arg1” and $2 will be “my_arg2”.
$# Number of arguments passed to the script. ./<script> my_arg1 my_arg2
$# will be 2.
>&2 Redirect standard output to standard error, e.g. to “manually” send the message of an echo statement to standard error. echo "Error: Invalid line nr." >&2
exit 1 Exit the script with exit code 1 (= failure).
echo "Error: need 3+ args" && exit 1
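The sketch below ties the rows above together. It assumes Bash at /bin/bash, and the script name count_lines.sh and the input file are hypothetical. A heredoc writes a tiny script to a temporary directory, and the script is then run once with an argument and once without:

```shell
#!/bin/bash
# Write a small demo script (hypothetical name: count_lines.sh) to a temp dir
workdir=$(mktemp -d)
cat > "$workdir/count_lines.sh" <<'EOF'
#!/bin/bash
set -u -e -o pipefail

# Require exactly 1 argument; send the error message to standard error
if [ "$#" -ne 1 ]; then
    echo "Error: need 1 argument, got $#" >&2
    exit 1
fi

infile=$1
echo "Script $0 was given: $infile"
echo "Lines: $(wc -l < "$infile")"
EOF

# Hypothetical 3-line input file
printf 'a\nb\nc\n' > "$workdir/in.txt"

# With one argument: prints the script path, the argument, and the line count
bash "$workdir/count_lines.sh" "$workdir/in.txt"

# Without arguments: the error goes to stderr and the exit code is 1
bash "$workdir/count_lines.sh" || echo "Failed with exit code $?"
```

Because the error message is sent with >&2, it still appears in the terminal when standard output is redirected to a file.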

Executing scripts

Command Explanation
bash <script> Scripts that are not executable (i.e., have no execute permission) and/or have no shebang line should be run by explicitly calling bash.
./<script> Scripts that are executable and have a shebang line can be called directly by name. Note: if they are in the current working dir, preface the name with ./ (otherwise the shell will only look for the script in the directories listed in $PATH).
./<script> in.txt out.txt Call a script with two arguments, which will be available inside the script as $1 and $2.
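The sketch below shows both ways of running a script. The script name greet.sh is hypothetical, and the example assumes the temporary directory allows executing files. The script is first run through bash, then given execute permission and called directly:

```shell
#!/bin/bash
# Create a tiny demo script (hypothetical name: greet.sh) in a temporary directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > greet.sh <<'EOF'
#!/bin/bash
echo "Got $# args: $1 and $2"
EOF

# 1) No execute permission yet: call bash explicitly
bash greet.sh in.txt out.txt

# 2) Add execute permission; the shebang line selects the interpreter
chmod u+x greet.sh
# The script is in the working dir (not in $PATH), so preface it with ./
./greet.sh in.txt out.txt
```

Both calls print the same line, because in.txt and out.txt arrive in the script as $1 and $2.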


Code Explanation Example
= Assign a variable. Note: no spaces around the = ! nlines=200
nlines=$(wc -l < my.csv) (With <, wc prints only the count, not the file name)

Recall/reference a variable with $.

Preferably quote variables too, especially in scripts, to prevent unwanted word splitting and globbing when variable values contain spaces or other special characters.

Optionally, put variable names in curly braces {}.

echo $nlines (After assignment e.g. nlines=200)
echo "$nlines" (Safer: quoted)
echo "${nlines}" (Optionally: “embraced”)
[ ] Test statement. Spaces are required after [ and before ] (see examples)! [ 9 -gt 5 ] (Returns true: 9 is greater than 5)
[ "$var1" -lt "$var2" ] (Returns true if $var1 is less than $var2)
[ -d my_dir ] (Returns true if my_dir exists and is a directory)
() Use to assign an array: a collection of items that can e.g. be looped over. sample_names=(zmaysA zmaysB zmaysC)
sample_files=($(cut -f 3 samples.txt))
sample_files=($(cat fastq_files.txt))
${array[@]} Expands to all values in an array, e.g. to print them or loop over them. echo "${sample_names[@]}"
&& Chain commands: execute the second command only if the first succeeds. cd data && ls
git add --all && git commit -m "Add README" && git push
|| Chain commands: execute second command only if the first fails. cd "$outdir" || echo "Cannot change directory!"
[ -d "$outdir" ] || mkdir "$outdir"
basename Strip any directory names from a path, and optionally a suffix too. basename data/A.fq (Returns A.fq)
basename data/A.fq .fq (Returns A)
expr Simple arithmetic in the shell (but: no decimals, only integers!)
nseqs=$(expr $nlines / 4) (Divide a value by 4)
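The sketch below exercises the assignment and quoting rows of the table above. It works in a temporary directory, and the file name my data.csv is hypothetical, chosen to contain a space. The quoted and unquoted forms of the same variable then behave differently:

```shell
#!/bin/bash
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file whose name contains a space
file="my data.csv"
printf 'x,y\n1,2\n3,4\n' > "$file"

# Assignment (no spaces around =) with command substitution;
# "<" makes wc print only the count, without the file name
nlines=$(wc -l < "$file")
echo "Quoted: ${nlines} lines"

# Unquoted, the value is split into two arguments ("my" and "data.csv"),
# neither of which exists, so ls fails
ls $file 2>/dev/null || echo "Unquoted: ls failed"
```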
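The sketch below shows the [ ], && and || rows of the table above in action. The directory name results is hypothetical. It prints the exit status ($?) that a test leaves behind (0 for true, non-zero for false) and then uses that status to chain commands:

```shell
#!/bin/bash
workdir=$(mktemp -d)
outdir="$workdir/results"      # hypothetical output dir
var1=3
var2=7

# [ ] sets an exit status ($?): 0 = true, non-zero = false
[ "$var1" -lt "$var2" ]
echo "Is $var1 < $var2? exit status: $?"      # 0 (true)

[ -d "$outdir" ]
echo "Does results/ exist? exit status: $?"   # 1 (false)

# ||: run mkdir only if the test fails (the dir does not exist yet)
[ -d "$outdir" ] || mkdir "$outdir"

# &&: run the second command only if cd succeeds
cd "$outdir" && echo "Now in: $PWD"

# ||: report a failed cd on standard error
cd "$workdir/no_such_dir" 2>/dev/null || echo "Cannot change directory!" >&2
```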
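The sketch below combines the array, basename and expr rows of the table above. It creates two tiny hypothetical FASTQ files (zmaysA.fq, zmaysB.fq) and assumes the usual four lines per FASTQ record. An array is filled by globbing, each sample ID is derived with basename, and the number of records is counted with integer division:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical FASTQ files: 8 lines (2 records) and 4 lines (1 record)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$workdir/zmaysA.fq"
printf '@r1\nGGCA\n+\nIIII\n' > "$workdir/zmaysB.fq"

# Assign an array with (), here filled by globbing
fastq_files=("$workdir"/*.fq)
echo "All files: ${fastq_files[@]}"

for fastq_file in "${fastq_files[@]}"; do
    sample=$(basename "$fastq_file" .fq)     # strip the dirs and the .fq suffix
    nlines=$(wc -l < "$fastq_file")
    nseqs=$(expr $nlines / 4)                # integer division only
    echo "$sample: $nseqs sequence(s)"
done

# expr drops any fractional part
expr 7 / 2                                   # prints 3
```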

for loops

Basic example showing the syntax:

for i in 1 2 3; do
    echo "Now the variable 'i' is: $i"
done
#> Now the variable 'i' is: 1
#> Now the variable 'i' is: 2
#> Now the variable 'i' is: 3

In each iteration, the next item listed after “in” is assigned to the variable named after “for” (here: i), which can then be used inside the loop.
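Two more minimal loops follow, with hypothetical sample names and numbers. The first prints each item in turn. The second uses the loop variable in a command, keeping a running total with expr:

```shell
#!/bin/bash
# Each item after "in" is assigned to $sample in turn
for sample in zmaysA zmaysB zmaysC; do
    echo "Processing sample: $sample"
done

# The loop variable can be used in any command, e.g. a running total with expr
total=0
for n in 4 8 12; do
    total=$(expr $total + $n)
done
echo "Total: $total"     # Total: 24
```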

Practical examples:

# Loop over files using globbing - better than using `ls`:
for fastq_file in data/raw/*fastq.gz; do
    # The files are gzip-compressed: decompress with zcat before counting lines
    echo "File $fastq_file has $(zcat "$fastq_file" | wc -l) lines."
    # More processing...
done

# Loop to rename files:
for oldname in *.fastq; do
    newname=$(basename "$oldname" _001.fastq).fq
    echo "Old/new name: $oldname $newname"
    mv "$oldname" "$newname"
done

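# Loop using an array to submit a script for each sample:
my_samples=($(cut -f1 my_metadata.txt))
for my_sample in "${my_samples[@]}"; do
    # Assumes SLURM's sbatch; <script> is a placeholder for the script to submit
    sbatch <script> "$my_sample"
done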
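The sketch below lets you try the rename loop safely. It runs in a throwaway directory on empty dummy files with hypothetical names:

```shell
#!/bin/bash
# Try the rename loop in a throwaway directory with empty dummy files
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch A_001.fastq B_001.fastq     # hypothetical file names

for oldname in *.fastq; do
    newname=$(basename "$oldname" _001.fastq).fq
    echo "Old/new name: $oldname $newname"
    mv "$oldname" "$newname"
done

ls    # Now lists A.fq and B.fq
```

If no file matches *.fastq, Bash by default keeps the pattern as a literal string. The loop body then runs once with oldname set to *.fastq, so a guard such as [ -e "$oldname" ] can help when that might happen.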

if statements

Basic syntax:

if <some_test>; then
    # Commands to run if test evaluated to true
fi

if <some_test>; then
    # Commands to run if test evaluated to true
else
    # Commands to run if test evaluated to false
fi

The test is usually done with the [ ] syntax, e.g. [ -d my_dir ], which evaluates to true if my_dir is an existing directory.
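In the sketch below, the same if/else test runs twice on a hypothetical directory, so both branches are visible. On the first pass the directory does not exist yet and gets created; on the second pass it exists:

```shell
#!/bin/bash
workdir=$(mktemp -d)
my_dir="$workdir/my_dir"      # hypothetical directory, does not exist yet

for attempt in 1 2; do
    if [ -d "$my_dir" ]; then
        echo "Attempt $attempt: $my_dir exists"
    else
        echo "Attempt $attempt: $my_dir does not exist, creating it"
        mkdir "$my_dir"
    fi
done
```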

Practical examples:

# Differential processing based on (e.g.) the number of samples:
n_samples=$(wc -l < samples.txt)
if [ "$n_samples" -gt 9 ]; then  # If the nr of samples is >9
    echo ">9 samples: processing files with algorithm A..."
else
    echo "<= 9 samples: processing files with algorithm B..."
fi

# Test whether the correct number of arguments (here: 2)
# were provided to the script:
if [ ! "$#" -eq 2 ]; then
    echo "Error: wrong number of arguments" >&2
    echo "You provided $# arguments, while 2 are required." >&2
    echo "Usage: <line-number> <file>" >&2
    exit 1
fi

# Test whether the input file is a regular file (-f) and can be read (-r):
file="$2"  # The file is the second argument (see the usage message above)
if [ ! -f "$file" ] || [ ! -r "$file" ]; then
    echo "Error: can't open file" >&2
    echo "Second argument should be a readable file" >&2
    echo "You provided: $file" >&2
    exit 1
fi

# Use a command's exit status - for grep, finding a match counts as success (true):
if grep "AGATCGG" contaminated.fasta > /dev/null; then
    echo "OH NO! File is contaminated!"
    exit 1
fi

# Remove all empty files from a directory:
for file in *; do
    if [ -f "$file" ] && [ ! -s "$file" ]; then
        rm "$file"
    fi
done
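The sketch below reproduces two of the examples above on hypothetical dummy files in a temporary directory. It uses grep's exit status directly in an if statement (grep -q suppresses the output, so only the status is used) and removes only empty regular files:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical dummy files: one FASTA file with content, one empty file
printf '>seq1\nAGATCGGAAG\n' > reads.fasta
touch empty.txt

# grep -q prints nothing; the if statement only uses grep's exit status
if grep -q "AGATCGG" reads.fasta; then
    echo "Match found in reads.fasta"
fi

# Remove empty regular files (directories are skipped by the -f test)
for file in *; do
    if [ -f "$file" ] && [ ! -s "$file" ]; then
        echo "Removing empty file: $file"
        rm "$file"
    fi
done
```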

File test operators

Operator Returns true if:
-f File is a regular file (e.g. not a directory)
-d File is a directory
-e File exists
-s File exists and is not empty (size greater than zero)
-h File is a symbolic link
-r / -w / -x File has read/write/execute permission (for the current user)
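The sketch below creates one hypothetical object per case in a temporary directory and applies the operators from the table above to them:

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test objects
echo "data" > full.txt
touch empty.txt
mkdir subdir
ln -s full.txt link.txt
chmod u+x full.txt

[ -f full.txt ]    && echo "full.txt is a regular file"
[ -d subdir ]      && echo "subdir is a directory"
[ -e missing.txt ] || echo "missing.txt does not exist"
[ -s empty.txt ]   || echo "empty.txt has zero size"
[ -h link.txt ]    && echo "link.txt is a symbolic link"
[ -x full.txt ]    && echo "full.txt is executable"
```

Apart from -h, these tests follow symbolic links, so [ -f link.txt ] is also true here: the link points to a regular file.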

Comparison operators

String Description
-z str String str is null (empty)
str1 = str2 Strings str1 and str2 are identical
str1 != str2 Strings str1 and str2 are different

Integer Description
int1 -eq int2 Integers int1 and int2 are equal
int1 -ne int2 Integers int1 and int2 are not equal
int1 -lt int2 Integer int1 is less than int2
int1 -gt int2 Integer int1 is greater than int2
int1 -le int2 Integer int1 is less than or equal to int2
int1 -ge int2 Integer int1 is greater than or equal to int2
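The sketch below tries the string and integer operators from the tables above on hypothetical values. The last two lines show why the operator type matters: "05" and "5" are equal as integers but different as strings:

```shell
#!/bin/bash
name=""
[ -z "$name" ] && echo "name is empty"

name="zmaysA"     # hypothetical sample name
[ "$name" = "zmaysA" ]  && echo "name is zmaysA"
[ "$name" != "zmaysB" ] && echo "name is not zmaysB"

n_samples=12
[ "$n_samples" -ge 10 ] && echo "$n_samples samples: at least 10"

# Numeric vs. string comparison of the same values
[ "05" -eq 5 ]  && echo "05 -eq 5: true (compared as integers)"
[ "05" = "5" ]  || echo "05 = 5: false (compared as strings)"
```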


