Compute Jobs with Slurm
With a focus on submitting shell scripts as “batch jobs”
We have so far been working on login nodes at OSC, but in order to run some actual analyses, you will need access to compute nodes.
Automated scheduling software allows hundreds of people with different requirements to access compute nodes effectively and fairly. For this purpose, OSC uses the Slurm scheduler (Simple Linux Utility for Resource Management).
A temporary reservation of resources on compute nodes is called a compute job. What are the options to start a compute job at OSC?
- “Interactive Apps” — We can start programs with GUIs, such as RStudio or Jupyter Notebook on OnDemand, and they will run in a browser window.
- Interactive shell jobs — Start a Bash shell on a compute node.
- Batch (non-interactive) jobs — Run a script on a compute node.
When running command-line programs for genomics analyses, batch jobs are the most useful and will be the focus of this module. We’ll also touch on interactive shell jobs, which can occasionally be handy and are requested and managed in a very similar way to batch jobs.
1 Setup
Log in to OSC at https://ondemand.osc.edu.
In the blue top bar, select Interactive Apps and then Code Server.
In the form that appears:
- Enter 4 or more in the box Number of hours.
- To avoid having to switch folders within VS Code, enter /fs/ess/scratch/PAS2250/participants/<your-folder> in the box Working Directory (replace <your-folder> with the actual name of your folder).
- Click Launch.
On the next page, once the top bar of the box is green and says Running, click Connect to VS Code.
Open a terminal: Terminal => New Terminal.
In the terminal, type bash and press Enter.
Type pwd in the terminal to check that you are in /fs/ess/scratch/PAS2250. If not, click File => Open Folder and enter /fs/ess/scratch/PAS2250/<your-folder>.
2 Interactive shell jobs
Interactive shell jobs will grant you interactive shell access on a compute node. Working in an interactive shell job is operationally identical to working on a login node as we’ve been doing so far, but the difference is that it’s now okay to use significant computing resources. (How much and for how long depends on what you reserve.)
2.1 Using srun
A couple of different commands can be used to start an interactive shell job. I prefer the general srun command1, which we can use with --pty /bin/bash added to get an interactive Bash shell.
However, if we run that command without additional options, we get an error:
srun --pty /bin/bash
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As the error message Must specify account for job tries to tell us, we need to indicate which OSC project (or, as Slurm puts it, "account") we want to use for this compute job. This is because an OSC project always has to be charged for the computing resources used during a compute job.
To specify the project/account, we can use the --account= option followed by the project number:
srun --account=PAS2250 --pty /bin/bash
srun: job 12431932 queued and waiting for resources
srun: job 12431932 has been allocated resources
[…regular login info, such as quota, not shown…]
[jelmer@p0133 PAS2250]$
There we go! First some Slurm scheduling info was printed to screen:
- Initially, the job is “queued”: that is, waiting to start.
- Very soon (usually!), the job is “allocated resources”: that is, computing resources such as a compute node are reserved for the job.
Then:
- The job starts, and because we've requested an interactive shell job, a new Bash shell is initiated: for that reason, we get to see our regular login info once again.
- Most importantly, we are no longer on a login node but on a compute node, as our prompt hints at: we switched from something like [jelmer@pitzer-login04 PAS2250]$ to the [jelmer@p0133 PAS2250]$ shown above.
- Note also that the job has a number (above: job 12431932): every compute job has such a unique identifier among all jobs by all users at OSC, and we can use this number to monitor and manage it. All of us will therefore see a different job number pop up.
Compute jobs start in the directory that they were submitted from: that is, your working directory remains the same.
2.2 Compute job options
The --account= option is just one out of many options we can use when reserving a compute job, but it is the only one that always has to be specified (including for batch jobs and for Interactive Apps).
Defaults exist for all other options, such as the amount of time (1 hour) and the number of cores (1). These options are all specified in the same way for interactive and batch jobs, and we'll dive into them below.
The “bigger” (more time, more cores, more memory) our job is, the more likely it is that our job will be pending for an appreciable amount of time.
Smaller jobs (requesting up to a few hours and cores) will almost always start running nearly instantly. Even big jobs (requesting a day or more, 10 or more cores) will often do so, but during busy times, you might have to wait for a while. That said, the only times I’ve had to wait for more than an hour or so was when I was requesting jobs with very large memory requirements (100s of GBs), which have to be submitted to a separate queue/“partition”.
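For instance, a minimal sketch of an interactive shell job request that asks for more than the defaults (the specific values are only illustrations; the individual options are covered in the batch-job sections below):

srun --account=PAS2250 --time=2:00:00 --cpus-per-task=2 --pty /bin/bash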
3 Intro to batch jobs
When requesting batch jobs, we are asking the Slurm scheduler to run a script on a compute node.
In contrast to interactive shell jobs, we stay in our current shell when submitting a script, and the script will run on a compute node “out of sight”. Also, as we’ll discuss in more detail below:
- Output from the script that would normally be printed to screen ends up in a file (!).
- Despite not being on the same node as our job, we can do things like monitoring whether the job is already/still running, and cancelling the job.
The script that we submit can be in different languages, but typically, and in all examples in this workshop, it is a shell (Bash) script.
3.1 The sbatch command
Whereas we used Slurm's srun command to start an interactive shell job, we use its sbatch command to submit a batch job. Recall from the Bash scripting module that we can run a Bash script as follows:
bash scripts/printname.sh Jane Doe
First name: Jane
Last name: Doe
Don't have the printname.sh script?
- Open a new file in the VS Code editor ( => File => New File) and save it as scripts/printname.sh.
- Copy the code below into the script:
#!/bin/bash
set -ueo pipefail
first_name=$1
last_name=$2
echo "First name: $first_name"
echo "Last name: $last_name"
The above command ran the script on our current node, a login node. To instead submit the script to the Slurm queue, we would start by simply replacing bash with sbatch:
sbatch scripts/printname.sh Jane Doe
srun: error: ERROR: Job invalid: Must specify account for job
srun: error: Unable to allocate resources: Unspecified error
As we've learned, we always have to specify the OSC account when submitting a compute job. Conveniently, we can also specify Slurm/sbatch options inside our script, but first, let's add the --account option on the command line:
sbatch --account=PAS2250 scripts/printname.sh Jane Doe
Submitted batch job 12431935
sbatch options and script arguments
Note that we can use sbatch options and script arguments in one command, in the following order:
sbatch [sbatch-options] myscript.sh [script-arguments]
But both of these are optional:
sbatch printname.sh # No options/arguments for either
sbatch printname.sh Jane Doe # Script arguments but no sbatch option
sbatch --account=PAS2250 printname.sh # sbatch option but no script arguments
sbatch --account=PAS2250 printname.sh Jane Doe # Both sbatch option and script arguments
3.2 Adding sbatch options in scripts
Instead of specifying Slurm/sbatch options on the command line when we submit the script, we can also add these options inside the script.
This is handy because even though we have so far only seen the --account= option, you will often want to specify several options, which would lead to very long sbatch commands. Additionally, it can be practical to store a script's typical Slurm options along with the script itself.
We add the options in the script using another type of special comment line akin to the shebang line, marked by #SBATCH. The equivalent of adding --account=PAS2250 after sbatch on the command line is a line in a script that reads #SBATCH --account=PAS2250.
Just like the shebang line, the #SBATCH line(s) should be at the top of the script. Let's add one such line to the printname.sh script, such that the first few lines read:
#!/bin/bash
#SBATCH --account=PAS2250
set -ueo pipefail
After having added this to the script, we can run our earlier sbatch command without options:
sbatch printname.sh Jane Doe
Submitted batch job 12431942
After we submit the batch job, we immediately get our prompt back. Everything else (job queuing and running) will happen out of our immediate view. This allows us to submit many jobs at the same time — we don’t have to wait for other jobs to finish (or even to start).
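For instance, a minimal sketch of submitting one job per sample, assuming a hypothetical scripts/fastqc.sh script that takes a single FASTQ file as its argument and a hypothetical data/fastq directory:

for fastq_file in data/fastq/*.fastq.gz; do
    sbatch scripts/fastqc.sh "$fastq_file"    # Each iteration submits a separate batch job
done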
sbatch option precedence
Any sbatch option provided on the command line will override the equivalent option provided inside the script. This is sensible: we can provide "defaults" inside the script and change one or more of those on the command line when needed.
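For example, assuming printname.sh contains the #SBATCH --account=PAS2250 line from above, the following hypothetical command would bill the job to a different project without editing the script (PAS0471 is only an example; you would need access to whichever project you specify):

sbatch --account=PAS0471 printname.sh Jane Doe    # The command-line option overrides the #SBATCH line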
#SBATCH in other contexts
Because #SBATCH lines are special comment lines, they will simply be ignored (and will not throw any errors) when you run a script that contains them in other contexts: when not running it as a batch job at OSC, or even when running it on a computer without Slurm installed.
3.3 Where does the output go?
Above, we saw that when we ran the printname.sh script directly, its output was printed to the screen, whereas when we submitted it as a batch job, all that was printed to screen was Submitted batch job 12431942. So where did our output go?
Our output ended up in a file called slurm-12431942.out: that is, slurm-<job-number>.out. Since each job number is unique to a given job, your file would have a different number in its name. We might call this type of file a Slurm log file.
The power of submitting batch jobs is that you can submit many at once — e.g. one per sample, running the same script. If the output from all those scripts ends up on your screen, things become a big mess, and you have no lasting record of what happened.
Let's take a look at the contents of the Slurm log file with the cat command:
cat slurm-12431942.out
First name: Jane
Last name: Doe
This file simply contains the output that we saw printed to screen before — nothing more and nothing less.
It's important to conceptually distinguish two broad types of output that a script may have:
- Output that is printed to screen when we directly run a script, such as what was produced by our echo statements, by any errors that may occur, and possibly by a program that we run in the script.2 As we saw, this output ends up in the Slurm log file when we submit the script as a batch job.
- Output that we redirect to a file (> myfile.txt) or output that a program we run in the script writes to file(s). This type of output will always end up in those very same files, regardless of whether we run the script directly or as a batch job.
4 Monitoring batch (and other compute) jobs
4.1 A sleepy script for practice
Let’s use the following short script to practice monitoring and managing batch and other compute jobs.
Open a new file in the VS Code editor ( => File => New File) and save it as scripts/sleep.sh, then copy the following into it:
#!/bin/bash
#SBATCH --account=PAS2250
echo "I will sleep for 30 seconds" > sleep.txt
sleep 30s
echo "I'm awake!"
On Your Own: Batch job output recap
If you submit the script as a batch job using sbatch scripts/sleep.sh:
- How many output files will this batch job produce?
- What will be in it/them?
- In which directory will the file(s) appear?
- In terms of output, what would have been different if we had run the script directly, i.e. using the command bash scripts/sleep.sh?
You can test your predictions by running the script, if you want.
The script will produce 2 files.
They will contain:
- sleep.txt: I will sleep for 30 seconds
- slurm-<job-number>.out: I'm awake!
Both files will end up in your current working directory.
If we had run the script directly, sleep.txt would have been the same, but I'm awake! would have been printed to screen (and no Slurm log file would have been created).
4.2 Checking the status of our batch job
After we submit a job, it may initially be queued (or pending) before the Slurm scheduler finds a "slot" for it. Then, the job will start running, and at some point it will stop running, either because the script ran into an error or because it ran to completion.
How can we check the status of our batch job? We can do so using the Slurm command squeue:
squeue -u $USER -l
In the command above:
- Our user name is specified with the -u option (otherwise we would see everyone's jobs).
- We use the environment variable $USER, which is always available and contains your user name, so that the very same code will work for everyone (you can also simply type your user name if that's shorter or easier).
- We've added the -l option to get more verbose output.
Let’s try that — first we submit the script:
sbatch scripts/sleep.sh
Submitted batch job 12431945
We may be able to catch the STATE being PENDING before the job starts:
squeue -u $USER -l
# Fri Aug 19 07:23:19 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 serial-40 sleep.sh jelmer PENDING 0:00 1:00:00 1 (None)
But soon enough it should say RUNNING in the STATE column:
squeue -u $USER -l
# Fri Aug 19 07:23:45 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
# 12520046 condo-osu sleep.sh jelmer RUNNING 0:12 1:00:00 1 p0133
The script should finish after 30 seconds (sleep 30s…), and after that, the squeue output will only show the header line with column names:
squeue -u $USER -l
# Fri Aug 19 07:24:18 2022
# JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
Once a job has finished running, it disappears from the squeue listing. So, the output above means that we have no running (or pending) jobs.
But we need to check our output file(s) to see if our script ran successfully!
cat sleep.txt
I will sleep for 30 seconds
cat slurm-12520046.out
I'm awake!
4.3 Cancelling jobs (and other monitoring/managing commands)
Sometimes, you want to cancel one or more jobs, for example because you realize you made a mistake in the script or used the wrong input files. You can do so using scancel:
scancel 2979968 # Cancel job number 2979968
scancel -u $USER # Cancel all your jobs
A few other useful commands for monitoring and managing jobs:
- Check only a specific job by specifying the job ID, e.g. 2979968: squeue -j 2979968
- Only show running (not pending) jobs: squeue -u $USER -t RUNNING
- Update Slurm directives for a job that has already been submitted: scontrol update job=<jobID> timeLimit=5:00:00
- Hold and release a pending (queued) job, e.g. when needing to update an input file before it starts running:
scontrol hold <jobID>      # Job won't start running until released
scontrol release <jobID>   # Job is free to start
You can see more details about any running or finished job, including the amount of time it ran for:
scontrol show job 2526085   # For job 2526085
# UserId=jelmer(33227) GroupId=PAS0471(3773) MCS_label=N/A
# Priority=200005206 Nice=0 Account=pas0471 QOS=pitzer-default
# JobState=RUNNING Reason=None Dependency=(null)
# Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
# RunTime=00:02:00 TimeLimit=01:00:00 TimeMin=N/A
# SubmitTime=2020-12-14T14:32:44 EligibleTime=2020-12-14T14:32:44
# AccrueTime=2020-12-14T14:32:44
# StartTime=2020-12-14T14:32:47 EndTime=2020-12-14T15:32:47 Deadline=N/A
# SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-14T14:32:47
# Partition=serial-40core AllocNode:Sid=pitzer-login01:57954
# [...]
5 Common sbatch options
Many Slurm options have a long format (--account=PAS2250) and a short format (-A PAS2250), which can generally be used interchangeably. For clarity, we'll stick to long format options during this workshop.
5.1 --account: The OSC project
As seen above. Always specify the project when submitting a batch job.
5.2 --time: Time limit ("wall time")
Specify the maximum amount of time your job will run for. "Wall time" is a term meant to distinguish it from, say, "core hours": if a job runs for 2 hours and uses 8 cores, the wall time is 2 hours and the number of core hours is 2 x 8 = 16.
- Your job gets killed as soon as it hits the specified time limit!
- You will only be charged for the time your job actually used.
- The default time limit is 1 hour. Acceptable time formats include:
  - minutes
  - hours:minutes:seconds
  - days-hours
- For single-node jobs, up to 168 hours (7 days) can be requested. If that's not enough, you can request access to the longserial queue for jobs of up to 336 hours (14 days).
#!/bin/bash
#SBATCH --time=1:00:00
If you are uncertain about the time your job will take, ask for (much) more time than you think you will need. This is because queuing times are generally good at OSC and you won’t be charged for reserved-but-not-used time.
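For instance, here is a sketch of the --time option written in each of the accepted formats listed above (the values are arbitrary examples):

#SBATCH --time=90          # 90 minutes
#SBATCH --time=12:00:00    # 12 hours
#SBATCH --time=2-0         # 2 days (the "days-hours" format)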
5.3 --mem: RAM memory
Specify a maximum amount of RAM (Random Access Memory) that your job can use.
- The default unit is MB (megabytes); append G for GB.
- The default amount is 4 GB per core that you reserve.
- As with the time limit, your job gets killed when it hits the memory limit.
#!/bin/bash
#SBATCH --mem=20G
It is not that common to hit the memory limit, so I usually don’t specify it — unless the program reports needing lots of memory, or I got “out-of-memory” errors when trying to run the script before.
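If you do need more memory, a sketch of a one-off command-line override for a hypothetical memory-hungry script (the script name and the 64G value are only illustrations):

sbatch --mem=64G scripts/assembly.sh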
5.4 Cores (& nodes and tasks)
Specify the number of nodes (≈ computers), cores, or “tasks” (processes). These are separate but related options, and this is where things can get confusing!
- Slurm for the most part uses “core” and “CPU” interchangeably3. More generally, “thread” is also commonly used interchangeably with core/CPU4.
- Running a program that uses multiple threads/cores/CPUs ("multi-threading") is common. In such cases, specify the number of threads/cores/CPUs n with --cpus-per-task=n (and keep --nodes and --ntasks at their defaults of 1). The program you're running may have an argument like --cores or --threads, which you should then set to n as well.
- Only ask for >1 node when a program is parallelized with e.g. "MPI", which is uncommon in bioinformatics.
- For jobs with multiple processes (tasks), use --ntasks=n or --ntasks-per-node=n.
| Resource/use | short | long | default |
|---|---|---|---|
| Nr. of cores/CPUs/threads (per task) | -c 1 | --cpus-per-task=1 | 1 |
| Nr. of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr. of tasks per node | - | --ntasks-per-node=1 | 1 |
| Nr. of nodes | -N 1 | --nodes=1 | 1 |
#!/bin/bash
#SBATCH --cpus-per-task=2
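As a slightly fuller sketch: inside the job, the Slurm environment variable SLURM_CPUS_PER_TASK holds the number of cores you requested, so you can pass it on to the program rather than hard-coding the same number twice. The fastqc call and file path below are only illustrations; check the threads/cores argument of whatever program you run:

#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --cpus-per-task=8
set -ueo pipefail

# Pass the reserved core count on to the program
fastqc --threads "$SLURM_CPUS_PER_TASK" data/fastq/sample_A.fastq.gz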
5.5 --output: Slurm log files
As we saw above, by default, all output from a script that would normally5 be printed to screen will end up in a Slurm log file when we submit the script as a batch job. This file will be created in the directory from which you submitted the script, and will be called slurm-<job-number>.out, e.g. slurm-12431942.out.
But it is possible to change the name of this file. For instance, it can be useful to include the name of the program that the script runs, so that it’s easier to recognize this file later.
We can do this with the --output option, e.g. --output=slurm-fastqc.out if we were running FastQC.
However, you'll generally want to keep the batch job number in the file name too6. Since we won't know the batch job number in advance, we need a trick here, and that is to use %j, which represents the batch job number:
#!/bin/bash
#SBATCH --output=slurm-fastqc-%j.out
stdout and stderr
By default, the two output streams "standard output" (stdout) and "standard error" (stderr) are both printed to screen and therefore also both end up in the same Slurm log file, but it is possible to separate them into different files.
Because stderr, as you might have guessed, often contains error messages, it could be useful to have those in a separate file. You can make that happen with the --error option, e.g. --error=slurm-fastqc-%j.err.
However, reality is messier: some programs print their main output not to a file but to standard output, and their logging output, errors and regular messages alike, to standard error. Yet other programs use stdout or stderr for all messages.
I therefore usually only specify --output, such that both streams end up in that file.
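If you do want the two streams in separate files, a minimal sketch combining the two options discussed above:

#!/bin/bash
#SBATCH --output=slurm-fastqc-%j.out
#SBATCH --error=slurm-fastqc-%j.err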
6 Addendum: Table with sbatch options
This includes all the discussed options, and a couple more useful ones:
| Resource/use | short | long | default |
|---|---|---|---|
| Project to be billed | -A PAS0471 | --account=PAS0471 | N/A |
| Time limit | -t 4:00:00 | --time=4:00:00 | 1:00:00 |
| Nr of nodes | -N 1 | --nodes=1 | 1 |
| Nr of cores | -c 1 | --cpus-per-task=1 | 1 |
| Nr of "tasks" (processes) | -n 1 | --ntasks=1 | 1 |
| Nr of tasks per node | - | --ntasks-per-node=1 | 1 |
| Memory limit per node | - | --mem=4G | (4G) |
| Log output file (%j = job number) | -o | --output=slurm-fastqc-%j.out | |
| Error output (stderr) | -e | --error=slurm-fastqc-%j.err | |
| Job name (displayed in the queue) | - | --job-name=fastqc | |
| Partition (= queue type) | - | --partition=longserial or --partition=hugemem | |
| Get email when job starts, ends, fails, or all of the above | - | --mail-type=START / --mail-type=END / --mail-type=FAIL / --mail-type=ALL | |
| Let job begin at/after specific time | - | --begin=2021-02-01T12:00:00 | |
| Let job begin after other job is done | - | --dependency=afterany:123456 | |
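Putting several of these options together, the top of a batch script might look like the following sketch (the resource values and job name are placeholders to adjust for your own analysis):

#!/bin/bash
#SBATCH --account=PAS2250
#SBATCH --time=4:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=20G
#SBATCH --job-name=fastqc
#SBATCH --output=slurm-fastqc-%j.out
set -ueo pipefail

# ...the commands to run go here...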
Footnotes
1. Other options: salloc works almost identically to srun, whereas sinteractive is an OSC convenience wrapper with very limited options.
2. Technically, these are two different types of output, as we briefly touch on below: "standard output" and "standard error".
3. Even though, technically, one CPU often contains multiple cores.
4. Even though, technically, one core often contains multiple threads.
5. That is, when we run the script directly, e.g. bash myscript.sh.
6. For instance, we might be running the FastQC script multiple times; otherwise, those log files would all have the same name and be overwritten.