Compute Jobs with Slurm
With a focus on submitting shell scripts as “batch jobs”
We have so far been working on login nodes at OSC, but in order to run some actual analyses, you will need access to compute nodes.
Automated scheduling software allows hundreds of people with different requirements to access compute nodes effectively and fairly. For this purpose, OSC uses the Slurm scheduler (Simple Linux Utility for Resource Management).
A temporary reservation of resources on compute nodes is called a compute job. What are the options to start a compute job at OSC?
- “Interactive Apps” — We can start programs with GUIs, such as RStudio or Jupyter Notebook on OnDemand, and they will run in a browser window.
- Interactive shell jobs — Start a Bash shell on a compute node.
- Batch (non-interactive) jobs — Run a script on a compute node.
When running command-line programs for genomics analyses, batch jobs are the most useful and will be the focus of this module. We’ll also touch on interactive shell jobs, which can occasionally be handy and are requested and managed in a very similar way to batch jobs.
1 Setup
Log in to OSC at https://ondemand.osc.edu.
In the blue top bar, select Interactive Apps and then Code Server.
In the form that appears:
- Enter 4 or more in the box Number of hours.
- To avoid having to switch folders within VS Code, enter /fs/ess/scratch/PAS2250/participants/<your-folder> in the box Working Directory (replace <your-folder> with the actual name of your folder).
- Click Launch.
On the next page, once the top bar of the box is green and says Running, click Connect to VS Code.
Open a terminal: Terminal => New Terminal.
In the terminal, type bash and press Enter.
Type pwd in the terminal to check that you are in /fs/ess/scratch/PAS2250. If not, click File => Open Folder and enter /fs/ess/scratch/PAS2250/<your-folder>.
2 Interactive shell jobs
Interactive shell jobs will grant you interactive shell access on a compute node. Working in an interactive shell job is operationally identical to working on a login node as we’ve been doing so far, but the difference is that it’s now okay to use significant computing resources. (How much and for how long depends on what you reserve.)
2.1 Using srun
A couple of different commands can be used to start an interactive shell job. I prefer the general srun command1, which we can use with --pty /bin/bash added to get an interactive Bash shell.
However, if we run that command without additional options, we get an error:
srun --pty /bin/bash
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As the error message Must specify account for job tries to tell us, we need to indicate which OSC project (or, as Slurm puts it, "account") we want to use for this compute job. This is because an OSC project always has to be charged for the computing resources used during a compute job.
To specify the project/account, we can use the --account= option followed by the project number:
srun --account=PAS2250 --pty /bin/bash
srun: job 12431932 queued and waiting for resources
srun: job 12431932 has been allocated resources
[…regular login info, such as quota, not shown…]
[jelmer@p0133 PAS2250]$
There we go! First some Slurm scheduling info was printed to screen:
- Initially, the job is “queued”: that is, waiting to start.
- Very soon (usually!), the job is “allocated resources”: that is, computing resources such as a compute node are reserved for the job.
Then:
- The job starts, and because we've requested an interactive shell job, a new Bash shell is initiated: for that reason, we get to see our regular login info once again.
- Most importantly, we are no longer on a login node but on a compute node, as our prompt hints at: we switched from something like [jelmer@pitzer-login04 PAS2250]$ to the [jelmer@p0133 PAS2250]$ shown above.
- Note also that the job has a number (above: job 12431932): every compute job has such a unique identifier among all jobs by all users at OSC, and we can use this number to monitor and manage it. All of us will therefore see a different job number pop up.
Compute jobs start in the directory that they were submitted from: that is, your working directory remains the same.
2.2 Compute job options
The --account= option is just one out of many options we can use when reserving a compute job, but it is the only one that always has to be specified (including for batch jobs and for Interactive Apps).
Defaults exist for all other options, such as the amount of time (1 hour) and the number of cores (1). These options are all specified in the same way for interactive and batch jobs, and we'll dive into them below.
The “bigger” (more time, more cores, more memory) our job is, the more likely it is that our job will be pending for an appreciable amount of time.
Smaller jobs (requesting up to a few hours and cores) will almost always start running nearly instantly. Even big jobs (requesting a day or more, 10 or more cores) will often do so, but during busy times, you might have to wait for a while. That said, the only times I’ve had to wait for more than an hour or so was when I was requesting jobs with very large memory requirements (100s of GBs), which have to be submitted to a separate queue/“partition”.
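For instance, a minimal sketch of an interactive shell job request that asks for more than the defaults (the specific values are only illustrations; the individual options are covered in the batch-job sections below):

srun --account=PAS2250 --time=2:00:00 --cpus-per-task=2 --pty /bin/bash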
3 Intro to batch jobs
When requesting batch jobs, we are asking the Slurm scheduler to run a script on a compute node.
In contrast to interactive shell jobs, we stay in our current shell when submitting a script, and the script will run on a compute node “out of sight”. Also, as we’ll discuss in more detail below:
- Output from the script that would normally be printed to screen ends up in a file (!).
- Despite not being on the same node as our job, we can do things like monitoring whether the job is already/still running, and cancelling the job.
The script that we submit can be in different languages, but typically, and in all examples in this workshop, it is a shell (Bash) script.
3.1 The sbatch command
Whereas we used Slurm's srun command to start an interactive shell job, we use its sbatch command to submit a batch job. Recall from the Bash scripting module that we can run a Bash script as follows:
bash scripts/printname.sh Jane Doe
First name: Jane
Last name: Doe
Don't have the printname.sh script?
- Open a new file in the VS Code editor ( => File => New File) and save it as scripts/printname.sh.
- Copy the code below into the script:
#!/bin/bash
set -ueo pipefail
first_name=$1
last_name=$2
echo "First name: $first_name"
echo "Last name: $last_name"
The above command ran the script on our current node, a login node. To instead submit the script to the Slurm queue, we would start by simply replacing bash with sbatch:
sbatch scripts/printname.sh Jane Doe
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As we've learned, we always have to specify the OSC account when submitting a compute job. Conveniently, we can also specify Slurm/sbatch options inside our script, but first, let's add the --account option on the command line:
sbatch --account=PAS2250 scripts/printname.sh Jane Doe
Submitted batch job 12431935
sbatch options and script arguments
Note that we can use sbatch options and script arguments in one command, in the following order:
sbatch [sbatch-options] myscript.sh [script-arguments]
But both of these are optional:
sbatch printname.sh # No options/arguments for either
sbatch printname.sh Jane Doe # Script arguments but no sbatch option
sbatch --account=PAS2250 printname.sh # sbatch option but no script arguments
sbatch --account=PAS2250 printname.sh Jane Doe # Both sbatch option and script arguments
3.2 Adding sbatch options in scripts
Instead of specifying Slurm/sbatch options on the command line when we submit the script, we can also add these options inside the script.
This is handy because even though we have so far only seen the --account= option, you will often want to specify several options, which would lead to very long sbatch commands. Additionally, it can be practical to store a script's typical Slurm options along with the script itself.
We add the options in the script using another type of special comment line akin to the shebang line, marked by #SBATCH. The equivalent of adding --account=PAS2250 after sbatch on the command line is a line in a script that reads #SBATCH --account=PAS2250.
Just like the shebang line, the #SBATCH line(s) should be at the top of the script. Let's add one such line to the printname.sh script, such that the first few lines read:
#!/bin/bash
#SBATCH --account=PAS2250
set -ueo pipefail
After having added this to the script, we can run our earlier sbatch command without options:
sbatch printname.sh Jane Doe
Submitted batch job 12431942
After we submit the batch job, we immediately get our prompt back. Everything else (job queuing and running) will happen out of our immediate view. This allows us to submit many jobs at the same time — we don’t have to wait for other jobs to finish (or even to start).
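For instance, a minimal sketch of submitting one job per sample, assuming a hypothetical scripts/fastqc.sh script that takes a single FASTQ file as its argument and a hypothetical data/fastq directory:

for fastq_file in data/fastq/*.fastq.gz; do
    sbatch scripts/fastqc.sh "$fastq_file"    # Each iteration submits a separate batch job
done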
sbatch option precedence
Any sbatch option provided on the command line will override the equivalent option provided inside the script. This is sensible: we can provide "defaults" inside the script and change one or more of those on the command line when needed.
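For example, assuming printname.sh contains the #SBATCH --account=PAS2250 line from above, the following hypothetical command would bill the job to a different project without editing the script (PAS0471 is only an example; you would need access to whichever project you specify):

sbatch --account=PAS0471 printname.sh Jane Doe    # The command-line option overrides the #SBATCH line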
#SBATCH in other contexts
Because #SBATCH lines are special comment lines, they will simply be ignored (and will not throw any errors) when you run a script that contains them in other contexts: when not running it as a batch job at OSC, or even when running it on a computer without Slurm installed.
3.3 Where does the output go?
Above, we saw that when we ran the printname.sh script directly, its output was printed to the screen, whereas when we submitted it as a batch job, all that was printed to screen was Submitted batch job 12431942. So where did our output go?
Our output ended up in a file called slurm-12431942.out: that is, slurm-<job-number>.out. Since each job number is unique to a given job, your file would have a different number in its name. We might call this type of file a Slurm log file.
The power of submitting batch jobs is that you can submit many at once — e.g. one per sample, running the same script. If the output from all those scripts ends up on your screen, things become a big mess, and you have no lasting record of what happened.
Let's take a look at the contents of the Slurm log file with the cat command:
cat slurm-12431942.out
First name: Jane
Last name: Doe
This file simply contains the output that we saw printed to screen before — nothing more and nothing less.
It's important to conceptually distinguish two broad types of output that a script may have:
- Output that is printed to screen when we directly run a script, such as what was produced by our echo statements, by any errors that may occur, and possibly by a program that we run in the script.2 As we saw, this output ends up in the Slurm log file when we submit the script as a batch job.
- Output that we redirect to a file (> myfile.txt) or output that a program we run in the script writes to file(s). This type of output will always end up in those very same files, regardless of whether we run the script directly or as a batch job.
4 Monitoring batch (and other compute) jobs
4.1 A sleepy script for practice
Let’s use the following short script to practice monitoring and managing batch and other compute jobs.
Open a new file in the VS Code editor ( => File => New File) and save it as scripts/sleep.sh, then copy the following into it:
#!/bin/bash
#SBATCH --account=PAS2250
echo "I will sleep for 30 seconds" > sleep.txt
sleep 30s
echo "I'm awake!"
On Your Own: Batch job output recap
If you submit the script as a batch job using sbatch scripts/sleep.sh:
- How many output files will this batch job produce?
- What will be in it/them?
- In which directory will the file(s) appear?
- In terms of output, what would have been different if we had run the script directly, i.e. using the command bash scripts/sleep.sh?
You can test your predictions by running the script, if you want.
The script will produce 2 files.
They will contain:
- sleep.txt: I will sleep for 30 seconds
- slurm-<job-number>.out: I'm awake!
Both files will end up in your current working directory.
If we had run the script directly, sleep.txt would have been the same, but I'm awake! would have been printed to screen (and no Slurm log file would have been created).
4.2 Checking the status of our batch job
After we submit a job, it may initially be queued (or pending) before the Slurm scheduler finds a "slot" for it. Then, the job will start running, and at some point it will stop running, either because the script ran into an error or because it ran to completion.
How can we check the status of our batch job? We can do so using the Slurm command squeue:
squeue -u $USER -l
In the command above:
- Our user name is specified with the -u option (otherwise we would see everyone's jobs).
- We use the environment variable $USER, which is always available and contains your user name, so that the very same code will work for everyone (you can also simply type your user name if that's shorter or easier).
- We've added the -l option to get more verbose output.
Let’s try that — first we submit the script:
sbatch scripts/sleep.sh
Submitted batch job 12431945
We may be able to catch the STATE being PENDING before the job starts:
squeue -u $USER -l
# Fri Aug 19 07:23:19 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 serial-40 sleep.sh jelmer PENDING 0:00 1:00:00 1 (None)
But soon enough it should say RUNNING in the STATE column:
squeue -u $USER -l
# Fri Aug 19 07:23:45 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 condo-osu sleep.sh jelmer RUNNING 0:12 1:00:00 1 p0133
The script should finish after 30 seconds (sleep 30s…), and after that, the squeue output will only show the header line with column names:
squeue -u $USER -l
# Fri Aug 19 07:24:18 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
Once a job has finished running, it disappears from the squeue listing. So, the output above means that we have no running (or pending) jobs.
But we need to check our output file(s) to see if our script ran successfully!
cat sleep.txt
I will sleep for 30 seconds
cat slurm-12520046.out
I'm awake!
4.3 Cancelling jobs (and other monitoring/managing commands)
Sometimes, you want to cancel one or more jobs, for example because you realize you made a mistake in the script or used the wrong input files. You can do so using scancel:
scancel 2979968 # Cancel job number 2979968
scancel -u $USER # Cancel all your jobs
A few other useful commands for monitoring and managing jobs:
- Check only a specific job by specifying the job ID, e.g. 2979968: squeue -j 2979968
- Only show running (not pending) jobs: squeue -u $USER -t RUNNING
- Update Slurm directives for a job that has already been submitted: scontrol update job=<jobID> timeLimit=5:00:00
- Hold and release a pending (queued) job, e.g. when needing to update an input file before it starts running:
scontrol hold <jobID>      # Job won't start running until released
scontrol release <jobID>   # Job is free to start
You can see more details about any running or finished job, including the amount of time it ran for:
scontrol show job 2526085   # For job 2526085
# UserId=jelmer(33227) GroupId=PAS0471(3773) MCS_label=N/A
# Priority=200005206 Nice=0 Account=pas0471 QOS=pitzer-default
# JobState=RUNNING Reason=None Dependency=(null)
# Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
# RunTime=00:02:00 TimeLimit=01:00:00 TimeMin=N/A
# SubmitTime=2020-12-14T14:32:44 EligibleTime=2020-12-14T14:32:44
# AccrueTime=2020-12-14T14:32:44
# StartTime=2020-12-14T14:32:47 EndTime=2020-12-14T15:32:47 Deadline=N/A
# SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-14T14:32:47
# Partition=serial-40core AllocNode:Sid=pitzer-login01:57954
# [...]
5 Common sbatch options
Many Slurm options have a long format (--account=PAS2250) and a short format (-A PAS2250), which can generally be used interchangeably. For clarity, we'll stick to long format options during this workshop.
5.1 --account: The OSC project
As seen above. Always specify the project when submitting a batch job.
5.2 --time: Time limit ("wall time")
Specify the maximum amount of time your job will run for. "Wall time" is a term meant to distinguish it from, say, "core hours": if a job runs for 2 hours and uses 8 cores, the wall time is 2 hours and the number of core hours is 2 x 8 = 16.
- Your job gets killed as soon as it hits the specified time limit!
- You will only be charged for the time your job actually used.
- The default time limit is 1 hour. Acceptable time formats include:
  - minutes
  - hours:minutes:seconds
  - days-hours
- For single-node jobs, up to 168 hours (7 days) can be requested. If that's not enough, you can request access to the longserial queue for jobs of up to 336 hours (14 days).
#!/bin/bash
#SBATCH --time=1:00:00
If you are uncertain about the time your job will take, ask for (much) more time than you think you will need. This is because queuing times are generally good at OSC and you won’t be charged for reserved-but-not-used time.
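For instance, here is a sketch of the --time option written in each of the accepted formats listed above (the values are arbitrary examples):

#SBATCH --time=90          # 90 minutes
#SBATCH --time=12:00:00    # 12 hours
#SBATCH --time=2-0         # 2 days (the "days-hours" format)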
5.3 --mem: RAM memory
Specify a maximum amount of RAM (Random Access Memory) that your job can use.
- The default unit is MB (megabytes); append G for GB.
- The default amount is 4 GB per core that you reserve.
- As with the time limit, your job gets killed when it hits the memory limit.
#!/bin/bash
#SBATCH --mem=20G
It is not that common to hit the memory limit, so I usually don’t specify it — unless the program reports needing lots of memory, or I got “out-of-memory” errors when trying to run the script before.
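If you do need more memory, a sketch of a one-off command-line override for a hypothetical memory-hungry script (the script name and the 64G value are only illustrations):

sbatch --mem=64G scripts/assembly.sh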
5.4 Cores (& nodes and tasks)
Specify the number of nodes (≈ computers), cores, or “tasks” (processes). These are separate but related options, and this is where things can get confusing!
- Slurm for the most part uses “core” and “CPU” interchangeably3. More generally, “thread” is also commonly used interchangeably with core/CPU4.
- Running a program that uses multiple threads/cores/CPUs ("multi-threading") is common. In such cases, specify the number of threads/cores/CPUs n with --cpus-per-task=n (and keep --nodes and --ntasks at their defaults of 1). The program you're running may have an argument like --cores or --threads, which you should then set to n as well.
- Only ask for >1 node when a program is parallelized with e.g. "MPI", which is uncommon in bioinformatics.
- For jobs with multiple processes (tasks), use --ntasks=n or --ntasks-per-node=n.
| Resource/use | short | long | default |
|---|---|---|---|
| Nr. of cores/CPUs/threads (per task) | -c 1 | --cpus-per-task=1 | 1 |
| Nr. of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr. of tasks per node | - | --ntasks-per-node=1 | 1 |
| Nr. of nodes | -N 1 | --nodes=1 | 1 |
#!/bin/bash
#SBATCH --cpus-per-task=2
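As a slightly fuller sketch: inside the job, the Slurm environment variable SLURM_CPUS_PER_TASK holds the number of cores you requested, so you can pass it on to the program rather than hard-coding the same number twice. The fastqc call and file path below are only illustrations; check the threads/cores argument of whatever program you run:

#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --cpus-per-task=8
set -ueo pipefail

# Pass the reserved core count on to the program
fastqc --threads "$SLURM_CPUS_PER_TASK" data/fastq/sample_A.fastq.gz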
5.5 --output: Slurm log files
As we saw above, by default, all output from a script that would normally5 be printed to screen will end up in a Slurm log file when we submit the script as a batch job. This file will be created in the directory from which you submitted the script, and will be called slurm-<job-number>.out, e.g. slurm-12431942.out.
But it is possible to change the name of this file. For instance, it can be useful to include the name of the program that the script runs, so that it’s easier to recognize this file later.
We can do this with the --output option, e.g. --output=slurm-fastqc.out if we were running FastQC.
However, you'll generally want to keep the batch job number in the file name too6. Since we won't know the batch job number in advance, we need a trick here, and that is to use %j, which represents the batch job number:
#!/bin/bash
#SBATCH --output=slurm-fastqc-%j.out
stdout and stderr
By default, the two output streams "standard output" (stdout) and "standard error" (stderr) are both printed to screen and therefore also both end up in the same Slurm log file, but it is possible to separate them into different files.
Because stderr, as you might have guessed, often contains error messages, it could be useful to have those in a separate file. You can make that happen with the --error option, e.g. --error=slurm-fastqc-%j.err.
However, reality is messier: some programs print their main output not to a file but to standard output, and their logging output, errors and regular messages alike, to standard error. Yet other programs use stdout or stderr for all messages.
I therefore usually only specify --output, such that both streams end up in that file.
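If you do want the two streams in separate files, a minimal sketch combining the two options discussed above:

#!/bin/bash
#SBATCH --output=slurm-fastqc-%j.out
#SBATCH --error=slurm-fastqc-%j.err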
6 Addendum: Table with sbatch options
This includes all the discussed options, and a couple more useful ones:
| Resource/use | short | long | default |
|---|---|---|---|
| Project to be billed | -A PAS0471 | --account=PAS0471 | N/A |
| Time limit | -t 4:00:00 | --time=4:00:00 | 1:00:00 |
| Nr of nodes | -N 1 | --nodes=1 | 1 |
| Nr of cores | -c 1 | --cpus-per-task=1 | 1 |
| Nr of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr of tasks per node | - | --ntasks-per-node=1 | 1 |
| Memory limit per node | - | --mem=4G | (4G) |
| Log output file (%j = job number) | -o | --output=slurm-fastqc-%j.out | |
| Error output (stderr) | -e | --error=slurm-fastqc-%j.err | |
| Job name (displayed in the queue) | - | --job-name=fastqc | |
| Partition (= queue type) | - | --partition=longserial or --partition=hugemem | |
| Get email when job starts, ends, fails, or all of the above | - | --mail-type=START / --mail-type=END / --mail-type=FAIL / --mail-type=ALL | |
| Let job begin at/after specific time | - | --begin=2021-02-01T12:00:00 | |
| Let job begin after other job is done | - | --dependency=afterany:123456 | |
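Putting several of these options together, the top of a batch script might look like the following sketch (the resource values and job name are placeholders to adjust for your own analysis):

#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --time=4:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=20G
#SBATCH --job-name=fastqc
#SBATCH --output=slurm-fastqc-%j.out
set -ueo pipefail

# ...the commands to run go here...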
Footnotes
1. Other options: salloc works almost identically to srun, whereas sinteractive is an OSC convenience wrapper with very limited options.
2. Technically, these are two different types of output, as we briefly touch on below: "standard output" and "standard error".
3. Even though, technically, one CPU often contains multiple cores.
4. Even though, technically, one core often contains multiple threads.
5. That is, when we run the script directly, e.g. bash myscript.sh.
6. For instance, we might be running the FastQC script multiple times; otherwise, those log files would all have the same name and be overwritten.