Shell scripting basics

Week 6 – Lecture A

Author: Jelmer Poelstra

Published: September 26, 2025



1 Introduction

1.1 Learning Goals

This week and the next two weeks

This week, you will learn how to write shell scripts to run programs with command-line interfaces (CLIs), like various bioinformatics tools.

The end goal is to submit shell scripts as “batch jobs” at OSC, which among other things lets you run the same script many times simultaneously. This is extremely useful because in omics analysis, you commonly have to run the same step for many samples in parallel. To get there, you still need to learn about the following topics:

  1. The basics of shell scripts (this session)
  2. Running command-line programs using shell scripts (next session)
  3. Submitting batch jobs with Slurm (next week)

This session

In this session, we will talk about:

  • What shell scripts are
  • Boilerplate shell script header lines: shebang and safe settings
  • Shell variables
  • Command-line arguments to scripts

1.2 Getting ready

  1. At https://ondemand.osc.edu, start a VS Code session in /fs/ess/PAS2880/users/$USER
  2. Open a terminal and create and then navigate to a week06 dir
  3. Within the week06 dir, create a dir scripts (but don’t navigate there)
  4. Optional: Create a new Markdown file for class notes and save it in your week06 dir
Some useful VS Code keyboard shortcuts

Working with keyboard shortcuts for common operations can be a lot faster than using your mouse. Below are some particularly useful ones in VS Code:

  • Open a terminal: Ctrl+` (backtick)

  • Toggle between the terminal and the editor pane: Ctrl+` and Ctrl+1.

  • “Line actions” in the editor:

    • Move a line up or down: Alt/Option+Up/Down arrow

    • Delete a line: Ctrl/⌘+Shift+K

    • The standard copy and cut shortcuts (Ctrl/⌘+X / C) will cut/copy the entire line that the cursor is on!

  • For a single-page PDF overview of keyboard shortcuts for your operating system, go to Help => Keyboard Shortcut Reference in the VS Code menu.

2 Shell scripts

Many bioinformatics tools/programs/software that are used to analyze omics data are run from the command line. In other words, they have a command-line interface (CLI). We can run them using command line expressions that are structurally very similar to how we’ve been using basic Unix shell commands. You saw an example of this when we ran FastQC last week.

However, we’ve been running shell commands in an “interactive” manner: typing or pasting them into the shell and then pressing Enter. But when you run bioinformatics tools, it is in most cases a much better idea to run them via shell scripts, which are plain-text files that contain shell code.

Why use shell scripts

Some general reasons why it can be beneficial to use shell scripts instead of running code interactively line-by-line:

  • It is a good way to save and organize your code.
  • You can easily rerun scripts and re-use them in similar contexts.

And very importantly for our purposes at OSC, we can submit scripts as “batch jobs” to the compute job scheduling program (which is called Slurm), and this allows us to:

  • Run scripts remotely without needing to stay connected to the running process, or even to OSC at all: we can submit a script, log out from OSC, shut down our computer, and the script will still run.
  • Easily run analyses that take many hours or even multiple days.
  • Run a script many times simultaneously, such as for different files/samples.
Bash vs. shell

So far, we’ve mostly talked about the Unix shell and shell scripts. From today onwards, we’ll also see the term “Bash”. Recall the difference we talked about in the Unix shell intro session: there are multiple Unix shell language variants and the specific one we’ve been using is the Bash shell (which is by far the most common). Our shell scripts are therefore in the Bash language, can be specifically called Bash scripts, and can be run with the bash command.

3 A basic shell script

3.1 A one-line script to start

Create your first script, printname.sh (shell scripts usually have the extension .sh) as follows:

# Create an empty file
touch scripts/printname.sh

A nice VS Code trick mentioned before is that if you hold Ctrl (Windows) / ⌘ (Mac) while hovering over a file path in the terminal, the path should become underlined and you can click on it to open the file. Try that with the printname.sh script [1]. Once the file is open in your editor pane, type or paste the following inside the script:

echo "This script will print a first and a last name"

Shell scripts mostly contain the same Unix shell code you have become familiar with. As such, the printname.sh file with a single echo command constitutes a functional shell script!

One way of running the script is by typing bash followed by the path to the script:

bash scripts/printname.sh
This script will print a first and a last name

That worked! The script doesn’t yet print any names like it “promises” to do, but we will add that functionality in a little bit. First, you’ll learn about two header lines that are good practice to add to every shell script.

Any changes you make to this and other files in the editor pane should be immediately, automatically saved by VS Code. If that’s not happening for some reason, you should see an indication of unsaved changes like a large black dot next to the script’s file name in the editor pane tab header.

If the file is not auto-saving, you can always save it manually (including with Ctrl/Cmd+S) like you would do in other programs. However, it may be convenient to turn Auto Save on: press Ctrl/Cmd+Shift+P to open the Command Palette and type “Auto Save”. You should see an option “Toggle Auto Save”: click on that.

3.2 Shebang line

A so-called “shebang” line is commonly used as the first line of a script to indicate which computer language the script uses. More specifically, this line tells the computer where to find the binary (executable) that runs your script – and since this is a Bash shell script, that will be the Bash program.

#!/bin/bash

Such a line starts with #! (hash-bang), basically marking it as a special type of comment. Those two characters are followed by the path to the relevant program: in our case Bash, which itself is just a program with an executable file that is located at /bin/bash on Linux and Mac computers.

While not always strictly necessary, adding a shebang line to every shell script is good practice, especially when you submit your script to OSC’s Slurm queue, as we’ll do later.
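
As a quick check of what the shebang line refers to, the sketch below looks up where Bash is installed and prints the first line of a throwaway script. The variable names bash_path, demo_script, and first_line are made up for this illustration, and the location of Bash can differ between computers.

```shell
# Where is the Bash executable on this computer?
bash_path=$(command -v bash)
echo "Bash is located at: $bash_path"

# Create a throwaway script that starts with a shebang line
demo_script=$(mktemp)
printf '%s\n' '#!/bin/bash' 'echo "Hello from a script"' > "$demo_script"

# The shebang only works if it is the very first line of the file
first_line=$(head -n 1 "$demo_script")
echo "First line of the script: $first_line"

# Clean up the throwaway script
rm "$demo_script"
```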

3.3 Shell script settings

Another best-practice line you should add to your shell scripts will change some default settings to safer alternatives.

Bad default shell settings

The following two default settings of the Bash shell are bad ideas inside scripts:

  • When you reference a non-existent (“unset”) variable, the shell replaces that with nothing without complaint:

    echo "Hello, my name is $myname. What is yours?"
    Hello, my name is . What is yours?

    In scripts, this can lead to all sorts of downstream problems, because you very likely tried and failed to do something with an existing variable (e.g. you misspelled its name, or forgot to assign it altogether). Even more problematically, this can lead to potentially very destructive file removal, as the box below illustrates.

  • A Bash script keeps running after encountering errors. That is, if an error is encountered when running, say, line 2 of a script, any remaining lines in the script will nevertheless be executed.

    At best, this is a waste of computer resources, but it can also lead to all kinds of unintended consequences. Additionally, if your script prints a lot of output, you might not notice an error somewhere in the middle if it doesn’t produce more errors downstream. But the downstream results from what we at that point might call a “zombie script” can still be completely wrong.

The shell’s default behavior of ignoring the referencing of unset variables can lead to accidental file removal as follows:

  • Using a variable, you try to remove temporary files whose names start with temp_:

    # NOTE: DO NOT run this!
    temp_prefix="temp_"
    rm "$tmp_prefix"*
  • Using a variable, you try to remove a temporary directory:

    # NOTE: DO NOT run this!
    tempdir=output/tmp
    rm -r $tmpdir/*
Above, the text specified the intent of the commands. What would have actually happened?

In both examples, there is a similar typo: temp vs. tmp, which means that we are referencing a (likely) non-existent variable.

  • In the first example, rm "$tmp_prefix"* would have been interpreted as rm *, because the non-existent variable is simply ignored. Therefore, we would have removed all files in the current working directory.

  • In the second example, along similar lines, rm -r $tmpdir/* would have been interpreted as rm -r /*. Horrifyingly, this would attempt to remove the entire filesystem: recall that a leading / in a path refers to the computer’s root directory (and -r makes the removal recursive).

These kinds of accidents are especially likely to happen inside scripts, where it is common to use variables and to work non-interactively.

But before you get too scared of doing terrible damage, note that at OSC, you would not be able to remove any essential files since you don’t have the permissions to do so. On your own computer, this could be more genuinely dangerous, though even there, you would not be able to remove operating system files without requesting “admin” rights.
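
The first accident can be reproduced safely inside a throwaway directory. The sketch below is only an illustration: it assumes the set -u setting that is introduced under “Safer settings”, and the names demo_dir, status, and files_survived are invented for the example. With that setting, Bash should stop at the misspelled variable before rm ever runs.

```shell
# Work in a throwaway directory so that nothing important is at risk
demo_dir=$(mktemp -d)
touch "$demo_dir/temp_1.txt" "$demo_dir/results.txt"

# Same typo as in the first example (tmp_prefix instead of temp_prefix).
# Because of 'set -u', Bash stops at the unset variable and never runs 'rm'.
status=0
(
    cd "$demo_dir" || exit 1
    bash -c 'set -u
             unset tmp_prefix          # Make sure the misspelled name is unset
             temp_prefix="temp_"
             rm "$tmp_prefix"*'
) 2> /dev/null || status=$?

# Check that both files are still there
if [ -e "$demo_dir/temp_1.txt" ] && [ -e "$demo_dir/results.txt" ]; then
    files_survived=yes
else
    files_survived=no
fi
echo "Exit status: $status; files survived: $files_survived"

# Clean up the throwaway directory
rm -r "$demo_dir"
```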

Safer settings

The following three settings will make your shell scripts more robust and safer. With these settings, the script terminates with an appropriate error message if:

  • set -u — an “unset” (non-existent) variable is referenced.
  • set -e — almost any error occurs.
  • set -o pipefail — an error occurs in a shell “pipeline” (e.g., sort | uniq).

You can change all of these settings in one line in a script:

set -e -u -o pipefail

Or even more concisely:

set -euo pipefail
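
To see what each setting does, the following sketch runs a few tiny scripts with bash -c and records their exit status (0 means success, anything else means failure). The variable names such as status_u are made up for this illustration. Running each case in its own Bash process keeps a failure from ending your current terminal session.

```shell
# '|| status=$?' stores the exit status of a failing command

# set -u: referencing an unset variable ends the script with an error
status_u=0
bash -c 'set -u; unset myname; echo "Hello, $myname"; echo "not reached"' 2> /dev/null || status_u=$?

# set -e: a failing command ('false' always fails) ends the script
status_e=0
output_e=$(bash -c 'set -e; false; echo "not reached"') || status_e=$?

# Without pipefail, only the last command in a pipeline determines success...
status_default=0
bash -c 'false | true' || status_default=$?

# ...with pipefail, a failure anywhere in the pipeline counts
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

echo "set -u, unset variable:        exit status $status_u"
echo "set -e, failing command:       exit status $status_e"
echo "default, failing pipeline:     exit status $status_default"
echo "pipefail, failing pipeline:    exit status $status_pipefail"
```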

3.4 Adding the header lines to your script

Add the discussed header lines to your printname.sh script, so it will now contain the following:

#!/bin/bash
set -euo pipefail

echo "This script will print a first and a last name"

And run the script again:

bash scripts/printname.sh
This script will print a first and a last name

That didn’t change the output, but at least we confirmed that the script still works.

It’s possible to execute scripts by only typing their path – for example:

# If the script is in a different dir:
scripts/printname.sh

# If the script is in your working dir:
./printname.sh

To be able to do this, the script needs to:

  1. Have a shebang line, so the computer knows which language to execute the script with.
  2. Be “executable”, which you can arrange by changing the file permissions, e.g.:
# Add 'execute permissions' for printname.sh:
chmod +x printname.sh

Why do you need ./ in the example above when the script is in your working dir? The ./ is necessary to make explicit that you’re referring to a file name. Without it (running just printname.sh), the shell would look for a command or program of that name, and wouldn’t find it. With yet another step, it is possible to add your script to the computer’s registry of commands/programs, but we won’t cover that here (Google $PATH if you’re curious).
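
The two requirements can be tried out together in a throwaway directory. This is a minimal sketch, assuming that Bash is located at /bin/bash and that files in your temporary directory are allowed to be executed. The names demo_dir and hello.sh are made up for the example.

```shell
# Create a throwaway directory with a small script that has a shebang line
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello.sh" << 'EOF'
#!/bin/bash
echo "Hello from an executable script"
EOF

# Add 'execute permissions' for the script
chmod +x "$demo_dir/hello.sh"

# Now the script can be run by typing only its path (no 'bash' in front)
output=$("$demo_dir/hello.sh")
echo "$output"

# Clean up the throwaway directory
rm -r "$demo_dir"
```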

4 Shell variables

4.1 Variables

Variables are truly ubiquitous in programming. They are typically used for items that:

  • Are referred to repeatedly and/or
  • Are subject to change.

These tend to be settings like the paths to input and output files, and parameter values for programs. Using variables makes it easier to change such settings and makes it possible to write scripts and programs that are flexible depending on user input. We have already seen some handy applications of variables, like the environment variable $USER, which contains your user name.

Assigning and referencing variables

To assign a value to a variable in the shell, use the syntax variable_name=value:

# Assign the value "beach" to a variable with the name "location":
location=beach

# Assign the value "200" to a variable with the name "nr_samples":
nr_samples=200
Note: there can’t be spaces around the equals sign (=)!

To reference a variable (i.e., to access its value):

  • You need to put a dollar sign $ in front of its name.
  • It is good practice to double-quote ("...") variable names [2].

As before with the environment variables $USER and $HOME, we’ll use the echo command to see what values our variables contain:

echo "$location"
beach
echo "$nr_samples"
200

Conveniently, you can use variables in lots of contexts, as if you had directly typed their values:

input_file=../garrigos-data/fastq/ERR10802863_R1.fastq.gz

ls -lh "$input_file"
-rw-rw----+ 1 jelmer PAS0471 0 Mar  7 13:17 ../garrigos-data/fastq/ERR10802863_R1.fastq.gz
  • Assigning and printing the value of a variable in R:

    # (Don't run this)
    x <- 5
    x
    [1] 5
  • Assigning and printing the value of a variable in the Unix shell:

    x=5
    echo $x
    5

The differences are that in the Unix shell:

  • There cannot be any spaces around the = in x=5.
  • You need a $ prefix to reference (but not to assign) variables [3].
  • You need the echo command, a general command to print text, to print the value of $x (whereas in R, typing the variable name is enough).
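
As an illustration of why variables are handy for settings like file paths, here is a minimal sketch in which one path is built from two variables. The directory and sample names are hypothetical and the file does not need to exist.

```shell
# Hypothetical settings, used only for illustration
input_dir=data/fastq
sample_id=sampleA

# Build a file path from the two variables
# (the period cannot be part of a variable name, so it ends "$sample_id")
input_file="$input_dir/$sample_id.fastq.gz"
echo "Input file: $input_file"

# Changing one variable changes every path that is built from it
sample_id=sampleB
input_file="$input_dir/$sample_id.fastq.gz"
echo "Input file: $input_file"
```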

4.2 Variable names

In the shell, variable names:

  • Can contain letters, numbers, and underscores
  • Cannot contain spaces, periods (.), dashes (-), or other special symbols [4].
  • Cannot start with a number

Try to make your variable names descriptive, like $input_file above, as opposed to say $x and $myvar.

There are multiple ways of distinguishing words in the absence of spaces, such as $inputFile and $input_file: I prefer the latter, which is called “snake case”.

All-uppercase variable names are pretty commonly used — and recall that so-called environment variables such as $USER and $HOME are always in uppercase.

My preferred approach is to use lowercase for variables and uppercase for what we may call “constants”, like when you “hard-code” certain file paths or settings. That means you include them e.g. in a script without allowing them to be set from outside – more on this later.
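
The naming rules and conventions above could look as follows in practice. This is only a sketch: all names and values are invented, and the invalid assignment is run in a separate Bash process so that its error is captured rather than shown.

```shell
# Valid names: letters, numbers, and underscores, not starting with a number
input_file=data/sampleA.fastq.gz      # Lowercase "snake case" for a regular variable
OUTPUT_DIR=results                    # Uppercase for a hard-coded "constant"
sample2_id=sampleB                    # Numbers are fine, just not at the start

echo "$input_file $OUTPUT_DIR $sample2_id"

# Invalid name: with a dash, Bash does not see an assignment at all
# and instead looks for a command named 'my-var=5', which fails
status=0
bash -c 'my-var=5' 2> /dev/null || status=$?
echo "Exit status for 'my-var=5': $status (non-zero means it failed)"
```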

4.3 Quoting variables

I have mentioned that it is good practice to quote variables (i.e. to use "$myvar" instead of $myvar). So what can happen if you don’t do this?

# Start by making and moving into a dir to create some messy files
mkdir sandbox
cd sandbox

If a variable’s value contains spaces:

# Assign a string with spaces to variable 'today', and print its value:
today="Tue, Mar 26"
echo $today
Tue, Mar 26
# Try to create a file with a name that includes this variable: 
touch README_$today.txt

# (Using the -1 option to ls will print each entry on its own line)
ls -1
26.txt
Mar
README_Tue,

Oops! The shell performed “field splitting” to split the value into three separate units — as a result, three files were created. This can be avoided by quoting the variable:

touch README_"$today".txt
ls -1
README_Tue, Mar 26.txt

Additionally, without quoting, you can’t explicitly indicate where a variable name ends:

# Start by cleaning the directory
rm *

# We intend to create a file named 'README_Tue, Mar 26_final.txt'
touch README_$today_final.txt
ls -1
README_.txt

Do you understand what happened here?

Solution: We have assigned a variable called $today, but the shell will instead look for a variable called $today_final. This is because we have not explicitly indicated where the variable name ends, so the shell will include all characters until it hits a character that cannot be part of a shell variable name: in this case, a period (.). Since $today_final does not exist, it is replaced with nothing, which is why the file ends up being named README_.txt.

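Quoting solves this, too:

# Remove the misnamed file, then try again with the variable quoted
rm README_.txt
touch README_"$today"_final.txt
ls -1
README_Tue, Mar 26_final.txt
# Move out of the 'sandbox' dir (back to /fs/ess/PAS2880/users/$USER/week06)
cd ..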

The $var notation to refer to a variable in the shell is actually an abbreviation of the full notation, which includes curly braces:

echo ${today}
Tue, Mar 26

Putting variable names between curly braces will also make it clear where the variable name begins and ends, although it does not prevent field splitting:

touch README_${today}_final.txt

ls
26_final.txt  Mar  README_Tue,

But you can combine curly braces and quoting:

touch README_"${today}"_final.txt

ls
'README_Tue, Mar 26_final.txt'

By double-quoting a variable, you are essentially escaping (or “turning off”) the default special meaning of the space as a field separator, and are asking the shell to interpret it as a literal space.

Similarly, double quotes will escape other “special characters”, such as shell wildcards. Compare:

# Due to shell expansion, this will echo/list all files in the current working dir
echo *
26.txt Mar README_Tue, README_Tue, Mar 26.txt
# This will simply print the literal "*" character 
echo "*"
*

However, double quotes do not turn off the special meaning of $ (which is to denote a string as a variable):

echo "$today"
Tue, Mar 26

…but single quotes will:

echo '$today'
$today
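
The effect of the different quoting styles can also be checked by counting how many separate arguments a command receives. The sketch below uses a small helper function, count_args, which is invented for this illustration. Functions and the $# variable go slightly beyond what has been covered so far.

```shell
# Helper function (hypothetical name) that prints how many arguments it received
count_args() {
    echo "$#"
}

today="Tue, Mar 26"

# Unquoted: field splitting turns the value into separate words
unquoted=$(count_args $today)

# Double-quoted: the value stays together as a single argument
double_quoted=$(count_args "$today")

# Single-quoted: a single argument too, but it is the literal text $today
single_quoted_text=$(echo '$today')

echo "Unquoted: $unquoted arguments"
echo "Double-quoted: $double_quoted argument"
echo "Single-quoted text: $single_quoted_text"
```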

5 Command-line arguments for scripts

When you run a script, you can pass arguments to it, such as a file to operate on. This allows you to make scripts that are flexible when it comes to inputs, outputs, and possibly other settings. That way, you don’t have to “hard-code” such variable things inside the script, and can for example run a script many times in parallel to dramatically speed up your analysis. All shell scripts that we will write in this course will accept arguments.

5.1 Executing a script with arguments

Executing a script with arguments is much like when you provide a command like ls with arguments:

# Running a command like `ls` with or without arguments
# [Don't run any of this, these are just syntax examples]

# Run ls without arguments:
ls

# Pass 1 filename as an argument to ls:
ls data/sampleA.fastq.gz

# Pass 2 filenames as arguments to ls, separated by spaces:
ls data/sampleA.fastq.gz data/sampleB.fastq.gz

# Running a script with or without arguments
# [Don't run any of this, these are just syntax examples]

# Run scripts without arguments:
bash scripts/fastqc.sh
bash scripts/printname.sh

# Run scripts with 1 or 2 arguments:
bash scripts/fastqc.sh data/sampleA.fastq.gz  # 1 argument: a filename
bash scripts/printname.sh John Doe            # 2 arguments: strings representing names

In the next section, you’ll see how you can use these arguments inside your script.

5.2 Positional parameters

Inside the script, any command-line arguments that you pass to it are automatically available in variables, the so-called “positional parameters”. Specifically:

  • Any first argument will be assigned to the variable $1
  • Any second argument will be assigned to $2
  • Any third argument will be assigned to $3, and so on.

In the calls to fastqc.sh and printname.sh above, what are these variables and their values?

Solution:
  • In bash scripts/fastqc.sh data/sampleA.fastq.gz, a single argument, data/sampleA.fastq.gz, is passed to the script, and will be assigned to $1.

  • In bash scripts/printname.sh John Doe, two arguments are passed to the script: the first one (John) will be stored in $1, and the second one (Doe) in $2.

However, even though they are made available, these variables are not automatically used. So, unless you include code in the script to do something with these variables, nothing happens with them.

Therefore, let’s add some code to your printname.sh script to “process” any first and last name that are passed to the script. For now, your script will simply echo the placeholder variables, so that we can see what happens:

#!/bin/bash
set -euo pipefail

echo "This script will print a first and a last name"
echo "First name: $1"
echo "Last name: $2"

# [Paste this into your script - don't enter this directly in your terminal.]

Next, run the script, passing the arguments John and Doe to it:

bash scripts/printname.sh John Doe
This script will print a first and a last name
First name: John
Last name: Doe

Exercise: Command-line arguments

In each scenario that is described below, think about what might happen. Then, run the script as instructed in the scenario to test your prediction.

  1. Running the script printname.sh without passing arguments to it.

    Solution:

    The script will error out because we are referencing variables that don’t exist: since we didn’t pass command-line arguments to the script, $1 and $2 have not been set.

    bash scripts/printname.sh
    This script will print a first and a last name
    scripts/printname.sh: line 5: $1: unbound variable
  2. After commenting out the line with set settings, running the script again without passing arguments to it.

    What “commenting out” means:

    You can deactivate a line of code without removing it by inserting a # as the first character of that line. This is often referred to as “commenting out” code. For example, below I’ve commented out the ls command, and nothing will happen if I run this line:

    #ls

    Solution:

    The script will run in its entirety and not throw any errors, because we are now using default Bash settings such that referencing non-existent variables does not throw an error. Of course, no names are printed either, since we didn’t specify any:

    bash scripts/printname.sh
    This script will print a first and a last name
    First name:
    Last name:

    Being “commented out”, the set line should read:

    #set -euo pipefail
  3. Double-quoting the entire name when you run the script, e.g.: bash scripts/printname.sh "John Doe".

    Solution:

    Because we are quoting "John Doe", both names are passed as a single argument and both names end up in $1, the “first name”. Since the set line is still commented out, the unset $2 is simply replaced with nothing:

    bash scripts/printname.sh "John Doe"
    This script will print a first and a last name
    First name: John Doe
    Last name:
  • To get back to where you were, remove the # you inserted in the script in step 2 above to reactivate the set line.

5.3 Copying to variables with descriptive names

While you could use the $1-style variables throughout your script, I highly recommend always copying them to more descriptively named variables — for example:

#!/bin/bash
set -euo pipefail

first_name="$1"
last_name="$2"

echo "This script will print a first and a last name"
echo "First name: $first_name"
echo "Last name: $last_name"

Using descriptively named variables in your scripts has several advantages, such as:

  • It will make your script easier to understand for others and for your future self.
  • It will make it less likely that you accidentally mix up the variables.

Copy the above code into printname.sh, replacing its previous contents.

Other variables that are automatically available inside scripts:
  • $# contains the number of command-line arguments passed to the script
  • $0 contains the script’s file name [5]
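
One common use of $# is to check the number of arguments before doing anything else. The sketch below is only one possible approach and relies on an if statement, which this session has not covered. It writes a variant of printname.sh to a temporary file (demo_script, a made-up name) and runs it with the right and the wrong number of arguments.

```shell
# Write a variant of printname.sh to a throwaway file
demo_script=$(mktemp)
cat > "$demo_script" << 'EOF'
#!/bin/bash
set -euo pipefail

# Stop early with a usage message unless exactly 2 arguments were passed
if [[ "$#" -ne 2 ]]; then
    echo "Usage: bash $0 <first_name> <last_name>" >&2
    exit 1
fi

first_name="$1"
last_name="$2"
echo "Name: $first_name $last_name"
EOF

# Correct call with 2 arguments
good_output=$(bash "$demo_script" John Doe)
echo "$good_output"

# Call with too few arguments: store the exit status instead of stopping here
bad_status=0
bash "$demo_script" John 2> /dev/null || bad_status=$?
echo "Exit status with 1 argument: $bad_status"

# Clean up the throwaway script
rm "$demo_script"
```

Compared with relying on set -u alone, such a check gives the user a clearer message than “unbound variable” and also catches the case where too many arguments are passed.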

6 Recap and next steps

In this session, you’ve learned the basics of shell scripts. As you’ve seen, shell scripts mostly consist of “regular” Unix shell code, but with some added bells and whistles like:

  • Boilerplate shell script header lines: shebang and safe settings
  • Command-line arguments to scripts which are available in variables in the script

In the next lecture, you will learn how to run command-line programs with shell scripts, and how to write loops to easily run a script many times.


Footnotes

  1. Alternatively, find the script in the file explorer in the side bar and click on it there.

  2. We’ll talk more about quoting later.

  3. Because of this, anytime you see a word/string that starts with a $ in the shell, you can safely assume that it is a variable.

  4. Compare this with the situation for file names, which ideally do not contain spaces and special characters either, but in which - and . are recommended.

  5. Though this does not work when you submit scripts as Slurm batch jobs, and we will therefore not use this feature.