Week 6 – Lecture A
September 26, 2025
This week, you will learn how to write shell scripts to run programs with command-line interfaces (CLIs), like various bioinformatics tools.
The end goal is to be able to submit shell scripts as “batch jobs” at OSC, which e.g. allows you to run them simultaneously many times! This is extremely useful because with omics analysis, it’s common to have to run the same step for many samples in parallel. To get there, you still need to learn about the following topics:
In this session, we will talk about:
/fs/ess/PAS2880/users/$USER
week06 dir
week06 dir, create a dir scripts (but don’t navigate there)
week06 dir
Working with keyboard shortcuts for common operations can be a lot faster than using your mouse. Below are some particularly useful ones in VS Code:
Open a terminal: Ctrl+` (backtick)
Toggle between the terminal and the editor pane: Ctrl+` and Ctrl+1.
“Line actions” in the editor:
Move a line up or down: Alt/Option+⬆/⬇
Delete a line: Ctrl/⌘+Shift+K
The standard copy and cut shortcuts (Ctrl/⌘+X / C) will cut/copy the entire line that the cursor is on!
For a single-page PDF overview of keyboard shortcuts for your operating system: => Help => Keyboard Shortcut Reference. (Or for direct links to these PDFs: Windows / Mac / Linux.)
Many bioinformatics tools/programs/software that are used to analyze omics data are run from the command line. In other words, they have a command-line interface (CLI). We can run them using command line expressions that are structurally very similar to how we’ve been using basic Unix shell commands. You saw an example of this when we ran FastQC last week.
However, we’ve been running shell commands in an “interactive” manner: typing or pasting them into the shell and then pressing Enter. But when you run bioinformatics tools, it is in most cases a much better idea to run them via shell scripts, which are plain-text files that contain shell code.
Some general reasons why it can be beneficial to use shell scripts instead of running code interactively line-by-line:
And very importantly for our purposes at OSC, we can submit scripts as “batch jobs” to the compute job scheduling program (which is called Slurm), and this allows us to:
So far, we’ve mostly talked about the Unix shell and shell scripts. From today onwards, we’ll also see the term “Bash”. Recall the difference we talked about in the Unix shell intro session: there are multiple Unix shell language variants and the specific one we’ve been using is the Bash shell (which is by far the most common). Our shell scripts are therefore in the Bash language, can be specifically called Bash scripts, and can be run with the bash command.
Create your first script, printname.sh (shell scripts usually have the extension .sh) as follows:
A nice VS Code trick mentioned before is that if you hold Ctrl (Windows) / ⌘ (Mac) while hovering over a file path in the terminal, the path should become underlined and you can click on it to open the file. Try that with the printname.sh script[1]. Once the file is open in your editor pane, type or paste the following inside the script:
Shell scripts mostly contain the same Unix shell code you have become familiar with. As such, the printname.sh file with a single echo command constitutes a functional shell script!
One way of running the script is by typing bash followed by the path to the script:
This script will print a first and a last name
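The steps so far (creating the script, giving it a single echo command, and running it with bash) can be sketched as follows. This is only an illustration: it works in a throwaway directory made with mktemp, and it writes the script's contents from the command line, whereas in this session you type them into the file in the VS Code editor.

```shell
# Work in a throwaway directory so no existing files are touched (assumption for this sketch)
workdir=$(mktemp -d)
cd "$workdir"
mkdir scripts

# Create the script with a single echo command
# (in the lecture, this line is typed into the file in the editor pane instead)
echo 'echo "This script will print a first and a last name"' > scripts/printname.sh

# Run the script by passing its path to the bash command
output=$(bash scripts/printname.sh)
echo "$output"
```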
That worked! The script doesn’t yet print any names like it “promises” to do, but we will add that functionality in a little bit. First, you’ll learn about two header lines that are good practice to add to every shell script.
Any changes you make to this and other files in the editor pane should be immediately, automatically saved by VS Code. If that’s not happening for some reason, you should see an indication of unsaved changes like a large black dot next to the script’s file name in the editor pane tab header.
If the file is not auto-saving, you can always save it manually (including with Ctrl/Cmd+S) like you would do in other programs. However, it may be convenient to turn Auto Save on: press Ctrl/Cmd+Shift+P to open the Command Palette and type “Auto Save”. You should see an option “Toggle Auto Save”: click on that.
A so-called “shebang” line is commonly used as the first line of a script to indicate which computer language the script uses. More specifically, this line tells the computer where to find the binary (executable) that runs your script – and since this is a Bash shell script, that will be the Bash program.
Such a line starts with #! (hash-bang), basically marking it as a special type of comment. Those two characters are followed by the path to the relevant program: in our case Bash, which itself is just a program with an executable file that is located at /bin/bash on Linux and Mac computers.
While not always strictly necessary, adding a shebang line to every shell script is good practice, especially when you submit your script to OSC’s Slurm queue, as we’ll do later.
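As a small sketch of what a shebang line looks like in practice, the code below writes a two-line script whose first line is the shebang for Bash, and then runs it. The file name shebang_demo.sh and the temporary directory are made up for this illustration. The second line of the script prints the built-in $BASH_VERSION variable, simply to show that Bash is indeed the program running the code.

```shell
# Throwaway directory and file name are assumptions for this sketch
workdir=$(mktemp -d)

# Write a script whose first line is the shebang pointing to Bash
printf '%s\n' \
    '#!/bin/bash' \
    'echo "Running in Bash version: $BASH_VERSION"' > "$workdir/shebang_demo.sh"

# The shebang is the very first line of the file
first_line=$(head -n 1 "$workdir/shebang_demo.sh")
echo "$first_line"

# When the script is run with the bash command, the shebang is treated as a comment
output=$(bash "$workdir/shebang_demo.sh")
echo "$output"
```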
Another best-practice line you should add to your shell scripts will change some default settings to safer alternatives.
The following two default settings of the Bash shell are bad ideas inside scripts:
When you reference a non-existent (“unset”) variable, the shell replaces that with nothing without complaint:
Hello, my name is . What is yours?
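Output like the line above can be reproduced with a sketch along these lines. The variable name myname is an assumption, chosen only for illustration; the point is that it has never been assigned.

```shell
set +u         # Make sure the shell's default (lenient) setting is active
unset myname   # Make sure the variable really does not exist

# The unset variable is silently replaced by nothing
greeting="Hello, my name is $myname. What is yours?"
echo "$greeting"
```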
In scripts, this can lead to all sorts of downstream problems, because you very likely tried and failed to do something with an existing variable (e.g. you misspelled its name, or forgot to assign it altogether). Even more problematically, this can lead to potentially very destructive file removal, as the box below illustrates.
A Bash script keeps running after encountering errors. That is, if an error is encountered when running, say, line 2 of a script, any remaining lines in the script will nevertheless be executed.
At best, this is a waste of computer resources, but it can also lead to all kinds of unintended consequences. Additionally, if your script prints a lot of output, you might not notice an error somewhere in the middle if it doesn’t produce more errors downstream. But the downstream results from what we at that point might call a “zombie script” can still be completely wrong.
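To see this default behavior in action, here is a minimal sketch with a hypothetical three-line script (zombie.sh, a name made up for this example) whose second line fails because it tries to list a directory that does not exist. With default settings, the third line still runs, and the script even reports success through its exit status.

```shell
workdir=$(mktemp -d)

# A three-line script whose second line fails (the directory does not exist)
printf '%s\n' \
    'echo "Line 1: starting"' \
    'ls /this/dir/does/not/exist' \
    'echo "Line 3: still running"' > "$workdir/zombie.sh"

# Capture standard output only; the error message from ls still goes to the screen
output=$(bash "$workdir/zombie.sh") && status=0 || status=$?
echo "$output"
echo "Exit status: $status"
```

The exit status is that of the last command in the script, so the failure in the middle goes unreported unless you happen to spot the error message.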
The shell’s default behavior of ignoring the referencing of unset variables can lead to accidental file removal as follows:
Using a variable, you try to remove temporary files whose names start with tmp_:
Using a variable, you try to remove a temporary directory:
In both examples, there is a similar typo: temp vs. tmp, which means that we are referencing a (likely) non-existent variable.
In the first example, rm "$tmp_prefix"* would have been interpreted as rm *, because the non-existent variable is simply ignored. Therefore, we would have removed all files in the current working directory.
In the second example, along similar lines, rm -rf $tmpdir/* would have been interpreted as rm -rf /*. Horrifyingly, this would attempt to remove the entire filesystem: recall that a leading / in a path is a computer’s root directory. (-r makes the removal recursive and -f forces removal).
These kinds of accidents are especially likely to happen inside scripts, where it is common to use variables and to work non-interactively.
But before you get too scared of doing terrible damage, note that at OSC, you would not be able to remove any essential files since you don’t have the permissions to do so. On your own computer, this could be more genuinely dangerous, though even there, you would not be able to remove operating system files without requesting “admin” rights.
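If you want to see the effect of such a typo without any risk, one option is to put echo in front of the command, so that the shell only prints what it would have run. The sketch below does this in a throwaway directory with three made-up files. The variable names temp_prefix and tmp_prefix follow the typo described above, and the value "tmp_" is an assumption.

```shell
set +u                          # Use the shell's default handling of unset variables
workdir=$(mktemp -d)
cd "$workdir"
touch tmp_a.txt tmp_b.txt results.txt

unset tmp_prefix tmpdir         # Make sure the misspelled variables do not exist
temp_prefix="tmp_"              # The variable we meant to use

# Intended command: only the temporary files match
intended=$(echo rm "$temp_prefix"*)
# Command with the typo: the unset variable vanishes, leaving a bare *
with_typo=$(echo rm "$tmp_prefix"*)
# Second example, kept inside quotes so the wildcard is shown but not expanded
dir_typo="rm -rf $tmpdir/*"

echo "$intended"
echo "$with_typo"
echo "$dir_typo"
```

Because of the echo and the quotes, nothing is removed here: the commands are only printed.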
The following three settings will make your shell scripts more robust and safer. With these settings, the script terminates with an appropriate error message if:
set -u: an “unset” (non-existent) variable is referenced.
set -e: almost any error occurs.
set -o pipefail: an error occurs in a shell “pipeline” (e.g., sort | uniq).
You can change all of these settings in one line in a script:
Or even more concisely:
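Written out, the one-line form is set -u -e -o pipefail and the concise form is set -euo pipefail. The sketch below puts each form in a small hypothetical script (long_form.sh and short_form.sh, names made up here) that references an unset variable, to check that both forms stop the script with a non-zero exit status before its last line is reached.

```shell
workdir=$(mktemp -d)
unset myname   # Hypothetical variable that is deliberately left unset

# Long form: each setting given separately
printf '%s\n' '#!/bin/bash' 'set -u -e -o pipefail' \
    'echo "Hello, $myname"' \
    'echo "This line is never reached"' > "$workdir/long_form.sh"

# Short form: the same three settings combined
printf '%s\n' '#!/bin/bash' 'set -euo pipefail' \
    'echo "Hello, $myname"' \
    'echo "This line is never reached"' > "$workdir/short_form.sh"

# Run both scripts, capturing their output (including errors) and exit status
long_out=$(bash "$workdir/long_form.sh" 2>&1) && long_status=0 || long_status=$?
short_out=$(bash "$workdir/short_form.sh" 2>&1) && short_status=0 || short_status=$?

echo "Long form exit status: $long_status"
echo "Short form exit status: $short_status"
echo "$short_out"
```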
Add the discussed header lines to your printname.sh script, so it will now contain the following:
And run the script again:
This script will print a first and a last name
That didn’t change the output, but at least we confirmed that the script still works.
Running a script without the bash command?
It’s possible to execute scripts by only typing their path – for example:
# If the script is in a different dir:
scripts/printname.sh
# If the script is in your working dir:
./printname.sh
To be able to do this, the script:
Why do you need ./ in the example above when the script is in your working dir? The ./ is necessary to make explicit that you’re referring to a file name. Without it (running just printname.sh), the shell would look for a command or program of that name, and wouldn’t find it. With yet another step, it is possible to add your script to the computer’s registry of commands/programs, but we won’t cover that here (Google $PATH if you’re curious).
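As a rough sketch of running a script by its path alone: in general, this requires that the script has a shebang line and that the file is executable, which you can arrange with chmod. The example below assumes a throwaway directory in which you are allowed to execute files.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A script with a shebang line, so the computer knows which program should run it
printf '%s\n' \
    '#!/bin/bash' \
    'echo "This script will print a first and a last name"' > printname.sh

# Make the script executable for its owner
chmod u+x printname.sh

# Run it by path alone: the ./ marks it as a file in the working dir
output=$(./printname.sh)
echo "$output"
```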
Variables are truly ubiquitous in programming. They are typically used for items that:
These tend to be settings like the paths to input and output files, and parameter values for programs. Using variables makes it easier to change such settings and makes it possible to write scripts and programs that are flexible depending on user input. We have already seen some handy applications of variables, like the environment variable $USER, which contains your user name.
To assign a value to a variable in the shell, use the syntax variable_name=value:
# Assign the value "beach" to a variable with the name "location":
location=beach
# Assign the value "200" to a variable with the name "nr_samples":
nr_samples=200
Be aware: do not put spaces around the equals sign (=)!
To reference a variable (i.e., to access its value):
Put a $ in front of its name.
It is good practice to double-quote ("...") variable names[2].
As before with the environment variables $USER and $HOME, we’ll use the echo command to see what values our variables contain:
beach
200
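Putting assignment and referencing together, a minimal sketch that should print the two values shown above:

```shell
# Assign values (no spaces around the = sign)
location=beach
nr_samples=200

# Reference each variable with a $ prefix, inside double quotes
echo "$location"
echo "$nr_samples"
```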
Conveniently, you can use variables in lots of contexts, as if you had directly typed their values:
-rw-rw----+ 1 jelmer PAS0471 0 Mar 7 13:17 data/fastq/sample1_R1.fastq
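A listing like the one above can be produced by storing a file path in a variable and passing that variable to ls. In this sketch, the file is an empty stand-in created in a throwaway directory, so the owner, size and date in your own output will differ.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/fastq
touch data/fastq/sample1_R1.fastq   # Empty stand-in for a real FASTQ file

# Store the path in a variable
input_file=data/fastq/sample1_R1.fastq

# Use the variable just as if the path had been typed out
ls -lh "$input_file"
listing=$(ls "$input_file")
```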
Assigning and printing the value of a variable in R:
x <- 5
x
[1] 5
Assigning and printing the value of a variable in the Unix shell:
x=5
echo $x
5
Differences are that in the Unix shell:
There cannot be any spaces around the = in x=5.
You need a $ prefix to reference (but not to assign) variables in the shell[3].
You need the echo command, a general command to print text, to print the value of $x (cf. in R, where typing x suffices).
In the shell, variable names:
Cannot contain periods (.), dashes (-), or other special symbols[4].
Try to make your variable names descriptive, like $input_file above, as opposed to say $x and $myvar.
There are multiple ways of distinguishing words in the absence of spaces, such as $inputFile and $input_file: I prefer the latter, which is called “snake case”.
All-uppercase variable names are pretty commonly used — and recall that so-called environment variables such as $USER and $HOME are always in uppercase.
My preferred approach is to use lowercase for variables and uppercase for what we may call “constants”, like when you “hard-code” certain file paths or settings. That means you include them e.g. in a script without allowing them to be set from outside – more on this later.
I have mentioned that it is good practice to quote variables (i.e. to use "$myvar" instead of $myvar). So what can happen if you don’t do this?
If a variable’s value contains spaces:
# Assign a string with spaces to variable 'today', and print its value:
today="Tue, Mar 26"
echo $today
Tue, Mar 26
# Try to create a file with a name that includes this variable:
touch README_$today.txt
# (Using the -1 option to ls will print each entry on its own line)
ls -1
26.txt
Mar
README_Tue,
Oops! The shell performed “field splitting” to split the value into three separate units — as a result, three files were created. This can be avoided by quoting the variable:
README_Tue, Mar 26.txt
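To compare the unquoted and quoted versions side by side, here is a sketch that runs each touch command in its own throwaway subdirectory (the directory names unquoted and quoted are made up for this example) and then counts the files that were created.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir unquoted quoted
today="Tue, Mar 26"

# Unquoted: the value is split at its spaces into three separate arguments
( cd unquoted && touch README_$today.txt )

# Quoted: the value stays together as part of a single file name
( cd quoted && touch "README_$today.txt" )

ls -1 unquoted
ls -1 quoted

# Count the files in each dir (tr strips padding that some versions of wc add)
n_unquoted=$(ls -1 unquoted | wc -l | tr -d ' ')
n_quoted=$(ls -1 quoted | wc -l | tr -d ' ')
echo "Files created without quotes: $n_unquoted; with quotes: $n_quoted"
```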
Additionally, without quoting, you can’t explicitly indicate where a variable name ends:
# Start by cleaning the directory
rm *
# We intend to create a file named 'README_Tue, Mar 26_final.txt'
touch README_$today_final.txt
ls -1
README_.txt
Do you understand what happened here?
We meant to reference the variable $today, but the shell will instead look for a variable called $today_final. This is because we have not explicitly indicated where the variable name ends, so the shell will include all characters until it hits a character that cannot be part of a shell variable name: in this case a period (.).
Quoting solves this, too:
README_Tue, Mar 26_final.txt
Curly-brace notation: ${myvar}
The $var notation to refer to a variable in the shell is actually an abbreviation of the full notation, which includes curly braces:
Tue, Mar 26
Putting variable names between curly braces will also make it clear where the variable name begins and ends, although it does not prevent field splitting:
26_final.txt Mar README_Tue,
But you can combine curly braces and quoting:
'README_Tue, Mar 26_final.txt'
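The different ways of marking where a variable name ends can be compared in a short sketch. It builds the file names as strings first, so you can inspect them before creating any file; the variable names without_braces, quoted_only and with_braces are made up for this illustration.

```shell
set +u              # Use the shell's default handling of unset variables
workdir=$(mktemp -d)
cd "$workdir"
today="Tue, Mar 26"
unset today_final   # Make sure this similarly named variable does not exist

# Without braces, the shell looks up $today_final, which is unset
without_braces="README_$today_final.txt"
# Quoting only the variable marks where its name ends
quoted_only=README_"$today"_final.txt
# Braces mark where the name ends, and the outer quotes prevent field splitting
with_braces="README_${today}_final.txt"

echo "$without_braces"
echo "$quoted_only"
echo "$with_braces"

# Create the file using the combined braces-and-quotes form
touch "$with_braces"
ls -1
```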
By double-quoting a variable, you are essentially escaping (or “turning off”) the default special meaning of the space as a field separator, and are asking the shell to interpret it as a literal space.
Similarly, double quotes will escape other “special characters”, such as shell wildcards. Compare:
18.txt Aug README_Thu, README_Thu, Aug 18.txt
*
However, double quotes do not turn off the special meaning of $ (which is to denote a string as a variable):
Thu, Aug 18
…but single quotes will:
$today
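The four cases discussed here (an unquoted wildcard, a double-quoted wildcard, a double-quoted variable, and a single-quoted variable) can be tried out with a sketch like the following. The two file names are hypothetical and only exist so that the wildcard has something to match.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch sampleA.txt sampleB.txt   # Hypothetical files for the wildcard to match
today="Thu, Aug 18"

unquoted_star=$(echo *)         # Wildcard is expanded to the matching file names
quoted_star=$(echo "*")         # Double quotes: a literal * is printed
double_quoted=$(echo "$today")  # Double quotes: the variable is still expanded
single_quoted=$(echo '$today')  # Single quotes: the text $today is printed as-is

echo "$unquoted_star"
echo "$quoted_star"
echo "$double_quoted"
echo "$single_quoted"
```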
When you run a script, you can pass arguments to it, such as a file to operate on. This allows you to make scripts that are flexible when it comes to inputs, outputs, and possibly other settings. That way, you don’t have to “hard-code” such variable things inside the script, and can for example run a script many times in parallel to dramatically speed up your analysis. All shell scripts that we will write in this course will accept arguments.
Executing a script with arguments is much like when you provide a command like ls with arguments:
# Running a command like `ls` with or without arguments
# [Don't run any of this, these are just syntax examples]
# Run ls without arguments:
ls
# Pass 1 filename as an argument to ls:
ls data/sampleA.fastq.gz
# Pass 2 filenames as arguments to ls, separated by spaces:
ls data/sampleA.fastq.gz data/sampleB.fastq.gz
# Running a script with or without arguments
# [Don't run any of this, these are just syntax examples]
# Run scripts without arguments:
bash scripts/fastqc.sh
bash scripts/printname.sh
# Run scripts with 1 or 2 arguments:
bash scripts/fastqc.sh data/sampleA.fastq.gz # 1 argument: a filename
bash scripts/printname.sh John Doe # 2 arguments: strings representing names
In the next section, you’ll see how you can use these arguments inside your script.
Inside the script, any command-line arguments that you pass to it are automatically available in variables, the so-called “positional parameters”. Specifically:
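The first argument is assigned to $1
The second argument is assigned to $2
The third argument is assigned to $3, and so on.
In the calls to fastqc.sh and printname.sh above, what are these variables and their values?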
In bash scripts/fastqc.sh data/sampleA.fastq.gz, a single argument, data/sampleA.fastq.gz, is passed to the script, and will be assigned to $1.
In bash scripts/printname.sh John Doe, two arguments are passed to the script: the first one (John) will be stored in $1, and the second one (Doe) in $2.
However, even though they are made available, these variables are not automatically used. So, unless you include code in the script to do something with these variables, nothing happens with them.
Therefore, let’s add some code to your printname.sh script to “process” any first and last name that are passed to the script. For now, your script will simply echo the placeholder variables, so that we can see what happens:
#!/bin/bash
set -euo pipefail
echo "This script will print a first and a last name"
echo "First name: $1"
echo "Last name: $2"
# [Paste this into your script - don't enter this directly in your terminal.]
Next, run the script, passing the arguments John and Doe to it:
This script will print a first and a last name
First name: John
Last name: Doe
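For reference, the whole round trip (the script with its positional parameters, plus a call that passes two arguments) can be sketched in one go. As before, this version writes the script from the command line into a throwaway directory, which is an assumption of this sketch rather than how you are asked to do it in this session.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir scripts

# Write the script, including the two header lines and the positional parameters
printf '%s\n' \
    '#!/bin/bash' \
    'set -euo pipefail' \
    'echo "This script will print a first and a last name"' \
    'echo "First name: $1"' \
    'echo "Last name: $2"' > scripts/printname.sh

# Pass two arguments: John ends up in $1 and Doe in $2
output=$(bash scripts/printname.sh John Doe)
echo "$output"
```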
In each scenario that is described below, think about what might happen. Then, run the script as instructed in the scenario to test your prediction.
Running the script printname.sh without passing arguments to it.
After commenting out the line with set settings, running the script again without passing arguments to it.
What “commenting out” means
You can deactivate a line of code without removing it by inserting a # as the first character of that line. This is often referred to as “commenting out” code. For example, below I’ve commented out the ls command, and nothing will happen if I run this line:
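A tiny sketch of this idea, using a hypothetical two-line script (comment_demo.sh, a name made up here): the ls line is commented out and therefore skipped, while the echo line is still active.

```shell
workdir=$(mktemp -d)

# The first line of the script is commented out, the second one is not
printf '%s\n' \
    '#ls' \
    'echo "Only the echo line ran"' > "$workdir/comment_demo.sh"

# Only the output of the echo command appears; no files are listed
output=$(bash "$workdir/comment_demo.sh")
echo "$output"
```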
Solution
The script will run in its entirety and not throw any errors, because we are now using default Bash settings such that referencing non-existent variables does not throw an error. Of course, no names are printed either, since we didn’t specify any:
echo "First name:"
echo "Last name:"
Being “commented out”, the set line should read:
Double-quoting the entire name when you run the script, e.g.: bash scripts/printname.sh "John Doe".
Remove the # you inserted in the script in step 2 above to reactivate the set line.
While you could use the $1-style variables throughout your script, I highly recommend always copying them to more descriptively named variables, for example:
#!/bin/bash
set -euo pipefail
first_name="$1"
last_name="$2"
echo "This script will print a first and a last name"
echo "First name: $first_name"
echo "Last name: $last_name"
Using descriptively named variables in your scripts has several advantages, such as:
Copy the above code into printname.sh, replacing its previous contents.
$# contains the number of command-line arguments passed to the script
$0 contains the script’s file name[5]
In this session, you’ve learned the basics of shell scripts. As you’ve seen, shell scripts mostly consist of “regular” Unix shell code, but with some added bells and whistles like:
In the next lecture, you will learn how to run command-line programs with shell scripts, and how to write loops to easily run a script many times.
[1] Alternatively, find the script in the file explorer in the side bar and click on it there.
[2] We’ll talk more about quoting later.
[3] Because of this, anytime you see a word/string that starts with a $ in the shell, you can safely assume that it is a variable.
[4] Compare this with the situation for file names, which ideally do not contain spaces and special characters either, but in which - and . are recommended.
[5] Though this does not work when you submit scripts as Slurm batch jobs, and we will therefore not use this feature.