Week 6: Shell scripting & CLI tools
1 Overview
This week, you will learn how to write shell scripts and use them to run programs with command-line interfaces (CLIs), like various bioinformatics tools.
The end goal is to be able to submit shell scripts as “batch jobs” at OSC, which allows you, for example, to run the same script up to hundreds of times simultaneously! To get there, you need to learn about the following topics:
- The basics of shell scripts (this week)
- Running command-line programs using shell scripts (this week)
- Submitting batch jobs with Slurm (next week)
2 Learning goals
Shell scripting basics
- Why it is useful to collect your commands into shell scripts that can be rerun easily
- The basics of shell scripts, including shell script header lines
- Why and how to adorn scripts with tests and `echo` statements
- More on shell variables and how to use them
- Using command-line arguments with your own scripts
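As a minimal sketch of how these pieces fit together, the example below writes a small script to a temporary directory and then runs it. The script name `greet.sh`, the variable names, and the argument values are hypothetical, and the header lines shown (`#!/bin/bash` plus `set -euo pipefail`) are one common choice rather than necessarily the exact ones used in class.

```shell
# Write a small example script to a temporary directory
# (the name greet.sh is made up for this illustration)
script_dir=$(mktemp -d)
cat > "$script_dir/greet.sh" <<'EOF'
#!/bin/bash
set -euo pipefail   # Stop on errors, unset variables, and failed pipes

# Test: make sure exactly two arguments were passed
if [[ "$#" -ne 2 ]]; then
    echo "Usage: greet.sh <first_name> <last_name>" >&2
    exit 1
fi

# Copy the command-line arguments into named variables
first_name=$1
last_name=$2

# Report what the script is doing
echo "# Starting script greet.sh"
echo "First name: $first_name"
echo "Last name: $last_name"
echo "# Done with script greet.sh"
EOF

# Run the script with two arguments and save its output
greet_output=$(bash "$script_dir/greet.sh" Jane Doe)
echo "$greet_output"

# Run it with too few arguments: the test makes it exit with status 1
greet_status=0
bash "$script_dir/greet.sh" Jane 2> /dev/null || greet_status=$?
echo "Exit status with one argument: $greet_status"
```

The test at the top of the script makes it fail early with a usage message instead of continuing with missing input, and the `echo` statements make the script's output easier to interpret afterwards.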
Running command-line software with shell scripts
- Running command-line programs (we focus on bioinformatics tools), and running them using shell scripts
- How `for` loops work
- How to use a `for` loop to loop over files and run a shell script many times
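The loop pattern described above can be sketched as follows. This example creates its own empty files in a temporary directory so that it can be run anywhere; the file names and the stand-in script `process.sh` are hypothetical, and in practice the script would run an actual command-line program on each file.

```shell
# Create a temporary directory with a few empty example files
# (the file names are hypothetical)
data_dir=$(mktemp -d)
touch "$data_dir/sampleA.fastq.gz" "$data_dir/sampleB.fastq.gz" "$data_dir/sampleC.fastq.gz"

# A tiny stand-in script that only reports the file it was given
cat > "$data_dir/process.sh" <<'EOF'
#!/bin/bash
set -euo pipefail
fastq_file=$1
echo "Processing file: $fastq_file"
EOF

# Loop over the files: in each iteration, fastq_file holds one file path,
# which is passed to the script as its first command-line argument
n_runs=0
for fastq_file in "$data_dir"/*.fastq.gz; do
    bash "$data_dir/process.sh" "$fastq_file"
    n_runs=$((n_runs + 1))
done
echo "Ran the script $n_runs times"
```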
Unix shell tips & tricks
- Parameter expansion – and using this in the context of looping over samples (pairs of FASTQ files) rather than individual files
- Command substitution to save the output of commands
- `basename` and `dirname` to extract parts of file paths
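A short sketch of these three techniques applied to one sample with a pair of FASTQ files is shown below. The path and the `_R1`/`_R2` naming scheme are assumptions made for illustration, and the suffix-removal form of parameter expansion used here is only one of several forms.

```shell
# A hypothetical path to the forward-read (R1) file of a paired-end sample
R1=data/fastq/sampleA_R1.fastq.gz

# Parameter expansion: strip the "_R1.fastq.gz" suffix from the path,
# then append "_R2.fastq.gz" to get the path of the paired file
R2=${R1%_R1.fastq.gz}_R2.fastq.gz

# Command substitution saves the output of dirname and basename in variables
fastq_dir=$(dirname "$R1")                  # Directory part of the path
file_name=$(basename "$R1")                 # File name without the directory
sample_id=$(basename "$R1" _R1.fastq.gz)    # File name with the suffix removed

echo "R1 file:   $R1"
echo "R2 file:   $R2"
echo "Directory: $fastq_dir"
echo "File name: $file_name"
echo "Sample ID: $sample_id"
```

Deriving the R2 path from the R1 path means a loop only needs to iterate over the R1 files in order to process each sample once.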
3 Exercises & assignments
- Graded assignment: Software and shell scripts (Due 10/05)
4 Further resources
Buffalo (2015) (OSU library link) – Chapter 12: “Bioinformatics Shell Scripting, Writing Pipelines, and Parallelizing Tasks”
The latter part of this chapter is about using `find`, `xargs`, and Makefiles. These are somewhat tangential to this week’s topic of scripts, and we will not talk about them in class. As for Makefiles specifically, you will learn an alternative approach to workflow management later in this course, so I recommend only skimming that part.