Compute Jobs with Slurm
With a focus on submitting shell scripts as “batch jobs”
We have so far been working on login nodes at OSC, but in order to run some actual analyses, you will need access to compute nodes.
Automated scheduling software allows hundreds of people with different requirements to access compute nodes effectively and fairly. For this purpose, OSC uses the Slurm scheduler (Simple Linux Utility for Resource Management).
A temporary reservation of resources on compute nodes is called a compute job. What are the options to start a compute job at OSC?
- “Interactive Apps” — We can start programs with GUIs, such as RStudio or Jupyter Notebook on OnDemand, and they will run in a browser window.
- Interactive shell jobs — Start a Bash shell on a compute node.
- Batch (non-interactive) jobs — Run a script on a compute node.
When running command-line programs for genomics analyses, batch jobs are the most useful and will be the focus of this module. We’ll also touch on interactive shell jobs, which can occasionally be handy and are requested and managed in a very similar way to batch jobs.
1 Setup
2 Interactive shell jobs
Interactive shell jobs will grant you interactive shell access on a compute node. Working in an interactive shell job is operationally identical to working on a login node as we’ve been doing so far, but the difference is that it’s now okay to use significant computing resources. (How much and for how long depends on what you reserve.)
2.1 Using srun
A couple of different commands can be used to start an interactive shell job. I prefer the general srun command¹, which we can use with --pty /bin/bash added to get an interactive Bash shell.
However, if we run that command without additional options, we get an error:
srun --pty /bin/bash
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As the error message Must specify account for job tries to tell us, we need to indicate which OSC project (or, as Slurm puts it, "account") we want to use for this compute job. This is because an OSC project always has to be charged for the computing resources used during a compute job.
To specify the project/account, we can use the --account= option followed by the project number:
srun --account=PAS2250 --pty /bin/bash
srun: job 12431932 queued and waiting for resources
srun: job 12431932 has been allocated resources
[…regular login info, such as quota, not shown…]
[jelmer@p0133 PAS2250]$
There we go! First some Slurm scheduling info was printed to screen:
- Initially, the job is “queued”: that is, waiting to start.
- Very soon (usually!), the job is “allocated resources”: that is, computing resources such as a compute node are reserved for the job.
Then:
- The job starts, and because we requested an interactive shell job, a new Bash shell is initiated: for that reason, we get to see our regular login info once again.
- Most importantly, we are no longer on a login node but on a compute node, as our prompt hints at: we switched from something like [jelmer@pitzer-login04 PAS2250]$ to the [jelmer@p0133 PAS2250]$ shown above.
- Note also that the job has a number (above: job 12431932): every compute job has such a unique identifier among all jobs by all users at OSC, and we can use this number to monitor and manage the job. Each of us will therefore see a different job number pop up.
2.2 Compute job options
The --account= option is just one out of many options we can use when reserving a compute job, but it is the only one that always has to be specified (including for batch jobs and for Interactive Apps).
Defaults exist for all other options, such as the amount of time (1 hour) and the number of cores (1). These options are all specified in the same way for interactive and batch jobs, and we’ll dive into them below.
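For example, here is what an srun command might look like if we also override a couple of those defaults (the specific values below are just illustrations, not recommendations):

```bash
# Interactive shell job with an explicit time limit and core count
# (example values only; adjust to your needs)
srun --account=PAS2250 --time=2:00:00 --cpus-per-task=2 --pty /bin/bash
```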
3 Intro to batch jobs
When requesting batch jobs, we are asking the Slurm scheduler to run a script on a compute node.
In contrast to interactive shell jobs, we stay in our current shell when submitting a script, and the script will run on a compute node "out of sight". Also, as we'll discuss in more detail below:
- Output from the script that would normally be printed to screen ends up in a file (!).
- Despite not being on the same node as our job, we can do things like monitoring whether the job is already/still running, and cancelling the job.
3.1 The sbatch command
Whereas we used Slurm's srun command to start an interactive shell job, we use its sbatch command to submit a batch job. Recall from the Bash scripting module that we can run a Bash script as follows:
bash scripts/printname.sh Jane Doe
First name: Jane
Last name: Doe
The above command ran the script on our current node, a login node. To instead submit the script to the Slurm queue, we would start by simply replacing bash by sbatch:
sbatch scripts/printname.sh Jane Doe
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As we've learned, we always have to specify the OSC account when submitting a compute job. Conveniently, we can also specify Slurm/sbatch options inside our script, but first, let's add the --account option on the command line:
sbatch --account=PAS2250 scripts/printname.sh Jane Doe
Submitted batch job 12431935
3.2 Adding sbatch options in scripts
Instead of specifying Slurm/sbatch options on the command line when we submit the script, we can also add these options inside the script.
This is handy because even though we have so far only seen the --account= option, you often want to specify several options, which would lead to very long sbatch commands. Additionally, it can be practical to store a script's typical Slurm options along with the script itself.
We add the options in the script using another type of special comment line akin to the shebang line, marked by #SBATCH. The equivalent of adding --account=PAS2250 after sbatch on the command line is a line in a script that reads #SBATCH --account=PAS2250.
Just like the shebang line, the #SBATCH line(s) should be at the top of the script. Let's add one such line to the printname.sh script, such that the first few lines read:
#!/bin/bash
#SBATCH --account=PAS2250
set -ueo pipefail
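For reference, the full printname.sh script might now look something like the sketch below (the echo lines are inferred from the output we saw earlier; your version may differ slightly):

```bash
#!/bin/bash
#SBATCH --account=PAS2250
set -ueo pipefail

# Print the first and last name that were passed as arguments
echo "First name: $1"
echo "Last name: $2"
```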
After having added this to the script, we can run our earlier sbatch command without any options:
sbatch scripts/printname.sh Jane Doe
Submitted batch job 12431942
After we submit the batch job, we immediately get our prompt back. Everything else (job queuing and running) will happen out of our immediate view. This allows us to submit many jobs at the same time — we don’t have to wait for other jobs to finish (or even to start).
3.3 Where does the output go?
Above, we saw that when we ran the printname.sh script directly, its output was printed to the screen, whereas when we submitted it as a batch job, all that was printed to screen was Submitted batch job 12431942. So where did our output go?
Our output ended up in a file called slurm-12431942.out: that is, slurm-<job-number>.out. Since each job number is unique to a given job, your file would have a different number in its name. We might call this type of file a Slurm log file.
Let's take a look at the contents of the Slurm log file with the cat command:
cat slurm-12431942.out
First name: Jane
Last name: Doe
This file simply contains the output that we saw printed to screen before — nothing more and nothing less.
It’s important to conceptually distinguish two broad types of output that a script may have:
- Output that is printed to screen when we directly run a script, such as what was produced by our echo statements, by any errors that may occur, and possibly by a program that we run in the script.² As we saw, this output ends up in the Slurm log file when we submit the script as a batch job.
- Output that we redirect to a file (> myfile.txt) or output that a program we run in the script writes to file(s). This type of output will always end up in those very same files, regardless of whether we run the script directly or as a batch job.
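As a small illustration of that second type of output, consider a hypothetical line like the one below inside a script: the text always ends up in results.txt, never in the Slurm log file, no matter how the script is run.

```bash
# Redirected output: goes to results.txt whether the script is run
# with bash or submitted with sbatch (the file name is just an example)
echo "Some result" > results.txt
```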
4 Monitoring batch (and other compute) jobs
4.1 A sleepy script for practice
Let’s use the following short script to practice monitoring and managing batch and other compute jobs.
Open a new file in the VS Code editor (File => New File) and save it as scripts/sleep.sh, then copy the following into it:
#!/bin/bash
#SBATCH --account=PAS2250
echo "I will sleep for 30 seconds" > sleep.txt
sleep 30s
echo "I'm awake!"
On Your Own: Batch job output recap
If you submit the script as a batch job using sbatch scripts/sleep.sh:
- How many output files will this batch job produce?
- What will be in it/them?
- In which directory will the file(s) appear?
- In terms of output, what would have been different if we had run the script directly, i.e. using the command bash scripts/sleep.sh?
You can test your predictions by running the script, if you want.
4.2 Checking the status of our batch job
After we submit a job, it may initially be queued (or pending) before the Slurm scheduler finds a "slot" for it. Then, the job will start running, and at some point it will stop running, either because the script ran into an error or because it ran to completion.
How can we check the status of our batch job? We can do so using the Slurm command squeue:
squeue -u $USER -l
In the command above:
- Our user name is specified with the -u option (otherwise we would see everyone's jobs).
- We use the environment variable $USER, which is a variable that's always available and contains your user name, so that the very same code will work for everyone (you can also simply type your user name if that's shorter or easier).
- We've added the -l option to get more verbose output.
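To see what $USER contains for you, you can simply print it (the example output below assumes the user name shown in this page's prompts):

```bash
echo $USER
# jelmer
```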
Let’s try that — first we submit the script:
sbatch scripts/sleep.sh
Submitted batch job 12431945
We may be able to catch the STATE being PENDING before the job starts:
squeue -u $USER -l
# Fri Aug 19 07:23:19 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 serial-40 sleep.sh jelmer PENDING 0:00 1:00:00 1 (None)
But soon enough it should say RUNNING in the STATE column:
squeue -u $USER -l
# Fri Aug 19 07:23:45 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 condo-osu sleep.sh jelmer RUNNING 0:12 1:00:00 1 p0133
The script should finish after 30 seconds (sleep 30s…), and after that, the squeue output will only show the header line with column names:
squeue -u $USER -l
# Fri Aug 19 07:24:18 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
Once a job has finished running, it disappears from the squeue listing. So, the output above means that we have no running (or pending) jobs.
But we need to check our output file(s) to see if our script ran successfully!
cat sleep.txt
I will sleep for 30 seconds
cat slurm-12520046.out
I'm awake!
4.3 Cancelling jobs (and other monitoring/managing commands)
Sometimes, you want to cancel one or more jobs, because you realize you made a mistake in the script or you used the wrong input files. You can do so using scancel:
scancel 2979968 # Cancel job number 2979968
scancel -u $USER # Cancel all your jobs
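Two other standard Slurm commands can also be handy for monitoring jobs; the sketch below uses the job number from our earlier example, and you should check OSC's documentation for the details of their output.

```bash
# Show detailed information about a specific queued or running job
scontrol show job 12431945

# Show accounting information about a job, including after it has finished
sacct -j 12431945
```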
5 Common sbatch options
5.1 --account: The OSC project
As seen above. Always specify the project when submitting a batch job.
5.2 --time: Time limit ("wall time")
Specify the maximum amount of time your job will run for. Wall time is a term meant to distinguish it from, say, "core hours": if a job runs for 2 hours and uses 8 cores, the wall time is 2 hours and the number of core hours is 2 x 8 = 16.
- Your job gets killed as soon as it hits the specified time limit!
- You will only be charged for the time your job actually used.
- The default time limit is 1 hour. Acceptable time formats include:
  - minutes
  - hours:minutes:seconds
  - days-hours
- For single-node jobs, up to 168 hours (7 days) can be requested. If that's not enough, you can request access to the longserial queue for jobs of up to 336 hours (14 days).
#!/bin/bash
#SBATCH --time=1:00:00
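As an illustration of those time formats, the hypothetical lines below all request the same one-hour limit in different notations (use only one of them in a real script):

```bash
#SBATCH --time=60          # 60 minutes
#SBATCH --time=1:00:00     # 1 hour, in hours:minutes:seconds
#SBATCH --time=0-1         # 1 hour, in days-hours
```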
5.3 --mem: RAM memory
Specify a maximum amount of RAM (Random Access Memory) that your job can use.
- The default unit is MB (megabytes); append G for GB.
- The default amount is 4 GB per core that you reserve.
- Like with the time limit, your job gets killed when it hits the memory limit.
#!/bin/bash
#SBATCH --mem=20G
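Because the default unit is MB, the two hypothetical lines below request very different amounts of memory:

```bash
#SBATCH --mem=20      # 20 MB (default unit is MB)
#SBATCH --mem=20G     # 20 GB
```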
5.4 Cores (& nodes and tasks)
Specify the number of nodes (≈ computers), cores, or “tasks” (processes). These are separate but related options, and this is where things can get confusing!
- Slurm for the most part uses "core" and "CPU" interchangeably³. More generally, "thread" is also commonly used interchangeably with core/CPU⁴.
- Running a program that uses multiple threads/cores/CPUs ("multi-threading") is common. In such cases, specify the number of threads/cores/CPUs n with --cpus-per-task=n (and keep --nodes and --ntasks at their defaults of 1).
- The program you're running may have an argument like --cores or --threads, which you should then set to n as well (see the sketch below).
| Resource/use | short | long | default |
|---|---|---|---|
| Nr. of cores/CPUs/threads (per task) | -c 1 | --cpus-per-task=1 | 1 |
| Nr. of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr. of tasks per node | - | --ntasks-per-node=1 | 1 |
| Nr. of nodes | -N 1 | --nodes=1 | 1 |
#!/bin/bash
#SBATCH --cpus-per-task=2
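As a sketch of how this comes together, the batch script below requests multiple cores and passes the matching number to the program it runs. The program name and its --threads argument are hypothetical; $SLURM_CPUS_PER_TASK is an environment variable that Slurm sets inside the job to the number of cores you requested.

```bash
#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --cpus-per-task=8

# Pass the number of reserved cores on to the (hypothetical) program
some_program --threads "$SLURM_CPUS_PER_TASK" input.fastq.gz
```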
5.5 --output: Slurm log files
As we saw above, by default, all output from a script that would normally⁵ be printed to screen will end up in a Slurm log file when we submit the script as a batch job. This file will be created in the directory from which you submitted the script, and will be called slurm-<job-number>.out, e.g. slurm-12431942.out.
But it is possible to change the name of this file. For instance, it can be useful to include the name of the program that the script runs, so that it’s easier to recognize this file later.
We can do this with the --output option, e.g. --output=slurm-fastqc.out if we were running FastQC.
However, you'll generally want to keep the batch job number in the file name too⁶. Since we won't know the batch job number in advance, we need a trick here, and that is to use %j, which represents the batch job number:
#!/bin/bash
#SBATCH --output=slurm-fastqc-%j.out
6 Addendum: Table with sbatch options
This includes all the discussed options, and a couple more useful ones:
| Resource/use | short | long | default |
|---|---|---|---|
| Project to be billed | -A PAS0471 | --account=PAS0471 | N/A |
| Time limit | -t 4:00:00 | --time=4:00:00 | 1:00:00 |
| Nr of nodes | -N 1 | --nodes=1 | 1 |
| Nr of cores | -c 1 | --cpus-per-task=1 | 1 |
| Nr of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr of tasks per node | - | --ntasks-per-node | 1 |
| Memory limit per node | - | --mem=4G | (4G) |
| Log output file (%j = job number) | -o | --output=slurm-fastqc-%j.out | |
| Error output (stderr) | -e | --error=slurm-fastqc-%j.err | |
| Job name (displayed in the queue) | - | --job-name=fastqc | |
| Partition (= queue type) | - | --partition=longserial or --partition=hugemem | |
| Get email when job starts, ends, fails, or all of the above | - | --mail-type=START, --mail-type=END, --mail-type=FAIL, or --mail-type=ALL | |
| Let job begin at/after specific time | - | --begin=2021-02-01T12:00:00 | |
| Let job begin after other job is done | - | --dependency=afterany:123456 | |
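Pulling several of these options together, the top of a batch script might look something like the sketch below (all values are just examples):

```bash
#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --time=4:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --job-name=fastqc
#SBATCH --output=slurm-fastqc-%j.out
```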
Footnotes
1. Other options: salloc works almost identically to srun, whereas sinteractive is an OSC convenience wrapper but with very limited options.
2. Technically, these are two different types of output, as we briefly touch on below: "standard output" and "standard error".
3. Even though technically, one CPU often contains multiple cores.
4. Even though technically, one core often contains multiple threads.
5. That is, when we run the script directly, e.g. bash myscript.sh.
6. For instance, we might be running the FastQC script multiple times, and otherwise those log files would all have the same name and be overwritten.