Slurm batch jobs – part I

Week 7 – Lecture A

Author

Jelmer Poelstra

Published

October 3, 2025



1 Introduction

1.1 Learning goals

This week

As mentioned last week, when you learned about shell scripts:

The end goal is to be able to submit shell scripts as “batch jobs” at OSC, which, for example, allows you to run the same script many times simultaneously! This is extremely useful because in omics analyses, it’s common to have to run the same step for many samples in parallel.

This week, we’ll cover the remaining piece in being able to run bioinformatics tools efficiently at OSC and beyond: how to submit your shell scripts as batch jobs.

This session

  • Different ways to start “compute jobs”: via OnDemand, with interactive jobs, and with Slurm batch jobs
  • Strategies around requesting appropriate resources for your compute jobs
  • Slurm commands like sbatch to submit, monitor and manage batch jobs
  • How to use Slurm options to request specific resources for your compute jobs

1.2 Getting ready

  1. At https://ondemand.osc.edu, start a VS Code session in /fs/ess/PAS2880/users/$USER

  2. Create a week07 dir and navigate there in the terminal

  3. Create a scripts dir within week07

  4. You’ll need two scripts you made last week. Copy those (or if you somehow don’t have these files, create new files and copy the code from the boxes below):

    cp ../../week06/scripts/printname.sh scripts/
    cp ../../week06/scripts/fastqc.sh scripts/

The code below should be saved in scripts/printname.sh:

#!/bin/bash
set -euo pipefail

first_name=$1
last_name=$2

echo "This script will print a first and a last name"
echo "First name: $first_name"
echo "Last name: $last_name"

The code below should be saved in scripts/fastqc.sh:

#!/bin/bash
set -euo pipefail

# Load the OSC module for FastQC
module load fastqc/0.12.1

# Copy the placeholder variables
fastq="$1"
outdir="$2"

# Initial logging
echo "# Starting script fastqc.sh"
date
echo "# Input FASTQ file:   $fastq"
echo "# Output dir:         $outdir"
echo

# Create the output dir if needed
mkdir -p "$outdir"

# Run FastQC
fastqc --outdir "$outdir" "$fastq"

# Final logging
echo
echo "# Used FastQC version:"
fastqc --version
echo
echo "# Successfully finished script fastqc.sh"
date

2 Compute jobs overview

Automated scheduling software allows hundreds of people with different requirements to effectively and fairly access compute nodes at supercomputers. OSC uses Slurm (Simple Linux Utility for Resource Management) for this.

As you’ve learned, a reservation of resources on compute nodes is called a compute job. Here are the main ways to start a compute job at OSC:

  1. “Interactive Apps” — Run programs with GUIs (e.g. VS Code or RStudio) in your browser through the OnDemand website.
  2. Interactive shell jobs — Start an interactive shell on a compute node.
  3. Batch (non-interactive) jobs — Run a script on a compute node “remotely”: without going to that node yourself.

We’ve already worked a lot with the VS Code Interactive App, and the self-study material at the bottom of this page will cover interactive shell jobs. What we’ll focus on in this session are batch jobs.

3 Basics of Slurm batch jobs

When you submit a batch job, you ask the Slurm scheduler to run a script “out of sight” on a compute node. While that script runs on a compute node, you will stay in your current shell at your current node. After you submit a batch job, it will continue to run even if you log off from OSC and shut down your computer.

3.1 The sbatch command

You can use Slurm’s sbatch command to submit a batch job. But first, let’s recall how we’ve run shell scripts so far:

bash scripts/printname.sh Jane Doe
This script will print a first and a last name
First name: Jane
Last name: Doe

The above command ran the script on whatever node you are on, and printed output to the screen. To instead submit the script to the Slurm queue, start by simply replacing bash with sbatch:

sbatch scripts/printname.sh Jane Doe
srun: error: ERROR: Job invalid: Must specify account for job  
srun: error: Unable to allocate resources: Unspecified error

However, as the above error message –“Must specify account for job”– informs us, you need to indicate which OSC Project (or as Slurm puts it, “account”) you want to use for this compute job. Use the --account= option to do this:

sbatch --account=PAS2880 scripts/printname.sh Jane Doe
Submitted batch job 12431935

This output line means your job was successfully submitted (no further job output will be printed to your screen — more about that below). The number is the job’s unique identifier among all compute jobs by all users at OSC, and you can use it to monitor and manage the job. Each of us will therefore see a different job number pop up.

After submitting a batch job, you immediately get your prompt back. The job will run outside of your immediate view, and you can continue doing other things in the shell while it does, or even log off. This behavior allows you to submit many jobs at the same time: you don’t have to wait for other jobs to finish, or even to start!
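
For example, here is a minimal sketch of what submitting one job per sample could look like with the fastqc.sh script from above (the data/fastq dir and its file names are hypothetical):

# [Don't run this - hypothetical example: assumes FASTQ files in a dir called data/fastq]
for fastq in data/fastq/*.fastq.gz; do
    sbatch --account=PAS2880 scripts/fastqc.sh "$fastq" results/fastqc
done

Each iteration of the loop submits a separate batch job, so all samples can be processed in parallel rather than one after the other.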

sbatch options and script arguments

As was implicit in the command above, we can use sbatch options and script arguments in one command like so:

sbatch [sbatch-options] myscript.sh [script-arguments]

Depending on the details of the script itself, any combination of sbatch options and script arguments is possible:

# [Don't run this - hypothetical examples]
sbatch scripts/printname.sh                             # No options/arguments for either
sbatch scripts/printname.sh Jane Doe                    # Script arguments but no sbatch option
sbatch --account=PAS2880 scripts/printname.sh           # sbatch option but no script arguments
sbatch --account=PAS2880 scripts/printname.sh Jane Doe  # Both sbatch option and script arguments

Just make sure you use the correct order, e.g. don’t type sbatch options after the name of the script. (Also, it is possible to omit the --account option, as shown above, when you specify this option inside the script. We’ll see this later.)

3.2 Where does the script’s output go?

Above, we saw that when you ran printname.sh “directly” with bash, the script’s output was printed to the screen, whereas when you submitted it as a batch job, only Submitted batch job <job-number> was printed to screen. In the latter case, where did this output go?

It ended up in a file called slurm-<job-number>.out (e.g., slurm-12431935.out; since each job number is unique to a given job, each file has a different number). We’ll call this type of file a Slurm log file.

Getting this output in log files instead of printed to screen may seem inconvenient. Can you think of any reasons why we may not want batch job output printed to screen, even if it were possible? (Click for the answer)

There are several reasons, such as:

  • If you log off after submitting a batch job, any output printed to screen would be lost.
  • The power of submitting batch jobs is that you can submit many at once — e.g. one per sample, running the same script. If the output from all those scripts ends up on your screen, things become a big mess, and you have no lasting record of what happened.

If you run ls, you should see a Slurm log file for the job you just submitted:

ls
scripts slurm-12431935.out

Let’s take a look at its contents:

cat slurm*
This script will print a first and a last name
First name: Jane  
Last name: Doe

This file contains the script’s output that was printed to screen when we ran it with bash – nothing more or less¹.

The working directory stays the same

When a script runs as a batch job, it starts in the directory from which the job was submitted: that is, the working directory stays the same, and you shouldn’t have to make special adjustments to paths.

Additionally, as you’ve seen, Slurm log files will (by default) be created in the dir you submitted the job from.
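
To convince yourself of this, you could submit a tiny test script that only prints its working directory (a hypothetical scripts/pwd-test.sh, sketched below): its Slurm log file should contain the path of the dir you submitted the job from.

# [Hypothetical example: create and submit a one-command test script]
printf '#!/bin/bash\npwd\n' > scripts/pwd-test.sh
sbatch --account=PAS2880 scripts/pwd-test.sh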

Two types of output files

There is an important distinction between two general types of output that scripts, commands, and programs may have:

  1. Output that is printed to screen.
    The technical terms for such output are “standard output” for non-error output and “standard error” for error output.
    • We’ve seen that most Unix commands by default print their output to the screen.
    • Other programs, like bioinformatics tools, will commonly print “logging”-type output to the screen, like you’ve seen with FastQC. But some programs will by default also print their main results to the screen. (This can sometimes be changed with a program’s options, and otherwise, you can always redirect (>) output to a file.)
  2. Output that is written to files.

To summarize what happens to these when you submit a script as a batch job instead of running it directly:

  • A script’s standard output and standard error will be written to a Slurm log file when you submit the script with sbatch
  • Output of commands written to output files inside the script (either via redirection or otherwise) will end up in the exact same files regardless of how you run the script.

Your printname.sh script only had the first type of output, but scripts typically have both, and we’ll see examples of that below.

Cleaning up the Slurm logs

When submitting batch jobs, your working dir can easily become a confusing mess of anonymous-looking Slurm log files. These two strategies help to prevent this:

  • Changing the default Slurm log file name to include a one- or two-word description of the job/script (see below).
  • Cleaning up your Slurm log files, by:
    • Removing them when no longer needed — as is e.g. appropriate for our current Slurm log file.
    • Moving them to the same location as other outputs by that script. This is often appropriate after you’ve run a bioinformatics tool, since the Slurm log file may contain some info you’d like to keep. For example, you can move Slurm log files for jobs that ran FastQC, and produced outputs in results/fastqc, to a dir results/fastqc/logs.
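
For instance, after a set of FastQC jobs has finished, the tidying up could look like this (a sketch; the exact paths depend on your project):

# [Hypothetical example: assumes FastQC wrote its outputs to results/fastqc]
mkdir -p results/fastqc/logs
mv slurm-*.out results/fastqc/logs/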

In this case, we’ll simply remove the Slurm log file, as it has no information that we need to keep:

rm -v slurm*
removed slurm-12431935.out

3.3 Adding sbatch options in scripts

The --account= option is just one of many options you can use when submitting a compute job, but is the only required one. This is because defaults exist for all other options, such as the amount of time (1 hour) and the number of cores (1 core).

Instead of adding these options after the sbatch command when submitting the script, you can also add them inside the script. This is a useful alternative because:

  • You’ll often want to specify several options, which could otherwise lead to very long sbatch commands.
  • It allows you to store a script’s typical Slurm options as part of the script, so you don’t have to remember them.

These options are added in the script using another type of special comment line (akin to the shebang #!/bin/bash line) that is marked by #SBATCH. Just like the shebang line, #SBATCH line(s) should be located at the top of the script.

Let’s add one such line to the printname.sh script, such that the first few lines read:

#!/bin/bash
#SBATCH --account=PAS2880

set -euo pipefail

So, the equivalent of adding --account=PAS2880 after sbatch on the command line is a line in your script that reads #SBATCH --account=PAS2880.

Now, you are able to run the sbatch command without options (which failed earlier):

sbatch scripts/printname.sh Jane Doe
Submitted batch job 12431942

sbatch option precedence!

Any sbatch option provided on the command line will override the equivalent option provided inside the script. This is sensible because it allows you to provide “defaults” inside the script, and change one or more of those when needed “on the go” on the command line.
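
For example, now that printname.sh contains an #SBATCH --account=PAS2880 line, the account given on the command line would take precedence (PAS1234 is a made-up project ID):

# [Don't run this - hypothetical example: PAS1234 is a made-up OSC project]
sbatch --account=PAS1234 scripts/printname.sh Jane Doe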

Running a script with #SBATCH lines elsewhere

Because #SBATCH lines are special comment lines, they will simply be ignored (and not throw any errors) when you run a script with such lines in other contexts: for example, when not running it as a batch job at OSC, or even when running it on a computer without Slurm installed.
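
For example, running the updated printname.sh directly with bash should work exactly as before, with the #SBATCH line simply treated as a comment:

bash scripts/printname.sh Jane Doe
This script will print a first and a last name
First name: Jane
Last name: Doe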

4 Monitoring batch jobs

Real batch jobs for your research projects may run for a while, and you may be submitting many jobs at once. Finally, longer-running jobs and those that ask for many cores sometimes remain queued for a while before they start. For these reasons, it’s important to know how you can monitor your batch jobs.

4.1 A sleepy script for practice

We’ll use another short shell script to practice monitoring and managing batch jobs. First create a new file:

touch scripts/sleep.sh

Open the file in the VS Code editor and copy the following into it:

#!/bin/bash
#SBATCH --account=PAS2880

echo "I will sleep for 30 seconds" > sleep.txt
sleep 30s
echo "I'm awake! Successfully finished script sleep.sh"

Exercise: Batch job output recap

Predict what would happen if you submit the sleep.sh script as a batch job using sbatch scripts/sleep.sh:

  1. How many output files will this batch job produce?
  2. What will be in each of those files?
  3. In which directory will the file(s) appear?
  4. In terms of output files, what would be different if we instead ran the script using bash scripts/sleep.sh?

Then, test your predictions by running the script.

Click for the solutions
  1. The job will produce 2 files:

    • slurm-<job-number>.out: The Slurm log file, containing output normally printed to screen.
    • sleep.txt: Containing output that was redirected to this file in the script.
  2. Those files will contain the following:

    • slurm-<job-number>.out: I’m awake! Successfully finished script sleep.sh
    • sleep.txt: “I will sleep for 30 seconds”
  3. Both files will end up in your current working directory. Slurm log files go (by default) to the directory from which you submitted the job. Slurm jobs also run from the directory from which you submitted them, and since we redirected the output simply to sleep.txt, that file is created in our working directory.

  4. If we had run the script directly with bash, sleep.txt would also have been created with the same content, but “I’m awake! Successfully finished script sleep.sh” would have been printed to the screen instead of ending up in a Slurm log file.

Run the script and check the outputs:

sbatch scripts/sleep.sh
Submitted batch job 27935840
cat sleep.txt
I will sleep for 30 seconds
cat slurm-27935840.out
I'm awake! Successfully finished script sleep.sh

4.2 Checking a job’s status

Batch job behavior

After you submit a job, it may initially be waiting to get resources allocated to it: in other words, the job may be pending. Eventually, and often very quickly, the job will start running. You’ve seen this process with the VS Code Interactive App job as well.

Whereas Interactive App jobs will keep running until they’ve reached the end of the allocated time², batch jobs will stop as soon as the script has finished. And if the script is still running when the job runs out of its allocated time, it will be killed (stopped) right away.

The squeue command

You can check the status of your batch jobs using the squeue Slurm command – try the following:

squeue -u $USER -l
Thu Apr 4 15:47:51 2025
        JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
     23640814 condo-osu ondemand   jelmer  RUNNING       6:34   2:00:00      1 p0133

In the command above:

  • You specify your username with the -u option (without this, you’d see everyone’s jobs!). The example uses the environment variable $USER to fill in your username, but you can also simply type it out.
  • The option -l (lowercase L, “long”) will produce the more verbose output shown. Without it, the output will be a bit more cryptic.

In the squeue output, following a line with the date & time and a header line, you should see information about a single compute job, as shown above: this is the Interactive App job that runs VS Code – that’s not a batch job, but it is a compute job, and all compute jobs are listed.

The following pieces of information about each job are listed:

Column             Explanation
JOBID              The job ID number
PARTITION          The type of queue (usually auto-assigned and not of interest)
NAME               The name of the job (by default, the name of the submitted script)
USER               The username of the person who submitted the job
STATE              The job’s state, usually either PENDING or RUNNING (finished jobs do not appear)
TIME               For how long the job has been running (here in “minutes:seconds” format)
TIME_LIMIT         The amount of time you reserved for the job (here in “hours:minutes:seconds” format)
NODES              The number of nodes reserved for the job
NODELIST(REASON)   When running: the ID of the node on which the job runs; when pending: the reason the job is pending
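
If you want to check on one specific job rather than all of your jobs, you can also pass a job ID to squeue with the -j option (a small sketch):

squeue -j <jobID> -l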

squeue example

Now, let’s see a batch job in the squeue listing. Start by submitting the sleep.sh script as a batch job:

sbatch scripts/sleep.sh
Submitted batch job 12520046

If you’re quick enough, you may be able to catch the job’s STATE as PENDING before it starts:

squeue -u $USER -l
Thu Apr 4 15:48:26 2025
         JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
      12520046 serial-40 sleep.sh   jelmer  PENDING       0:00   1:00:00      1 (None)
      23640814 condo-osu ondemand   jelmer  RUNNING       7:12   2:00:00      1 p0133

But soon enough it should read RUNNING:

squeue -u $USER -l
Thu Apr 4 15:48:39 2025
         JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
      12520046 condo-osu sleep.sh   jelmer  RUNNING       0:12   1:00:00      1 p0133
      23640814 condo-osu ondemand   jelmer  RUNNING       8:15   2:00:00      1 p0133

The script should finish after 30 seconds (because your command was sleep 30s), after which the job will immediately disappear from the squeue listing, because only pending and running jobs are shown:

squeue -u $USER -l
Thu Apr 4 15:49:26 2025
         JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
      23640814 condo-osu ondemand   jelmer  RUNNING       9:02   2:00:00      1 p0133

Checking a job’s output files

Whenever you’ve run a script as a batch job, even if you’ve been monitoring it with squeue, you should also make sure it ran successfully. You can do this by checking the output file(s) – as mentioned above, you’ll usually have two types of output from a batch job:

  • A Slurm log file with the script’s standard output and standard error, which would have been printed to screen if the job hadn’t been submitted with sbatch (typically: logging-type output and errors)
  • Output file(s) created inside the script (typically: the main results)

And as you saw in the exercise above, this is also the case for the output of our sleepy script:

  • The output file that the code in the script directly produced:

    cat sleep.txt
    I will sleep for 30 seconds
  • The Slurm log file:

    cat slurm-12520046.out
    I'm awake! Successfully finished script sleep.sh

Let’s keep things tidy and remove the script’s outputs:

rm slurm* sleep.txt

Be careful with deleting Slurm files

If you delete a Slurm log file for a job that is still running, the file will not be recreated when the job produces more logging output later on. That means that if you accidentally do this and the logging output is of key importance to interpreting other outputs, or making sure it ran successfully, you are better off canceling the job entirely and trying again. 😕

4.3 Canceling jobs

Sometimes, you want to cancel one or more jobs. For example, you may realize you made a mistake in the script or used the wrong input files as arguments. You can cancel jobs that are either pending or running using scancel:

# [Examples - DON'T run this: the second line would cancel your VS Code job]
# Cancel a specific job:
scancel 2979968

# Cancel all your running and queued jobs (careful with this!):
scancel -u $USER

Here are some other useful commands and options for monitoring and managing jobs:

  • Use squeue’s -t option to restrict the type of jobs you want to show. For example, to only show running and not pending jobs:

    squeue -u $USER -t RUNNING
  • You can see more details about any running or finished job, including the amount of time it ran for:

    scontrol show job <jobID>
    UserId=jelmer(33227) GroupId=PAS0471(3773) MCS_label=N/A
    Priority=200005206 Nice=0 Account=PAS2880 QOS=pitzer-default
    JobState=RUNNING Reason=None Dependency=(null)
    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:02:00 TimeLimit=01:00:00 TimeMin=N/A
    SubmitTime=2020-12-14T14:32:44 EligibleTime=2020-12-14T14:32:44
    AccrueTime=2020-12-14T14:32:44
    StartTime=2020-12-14T14:32:47 EndTime=2020-12-14T15:32:47 Deadline=N/A
    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-14T14:32:47
    Partition=serial-40core AllocNode:Sid=pitzer-login01:57954
    [...]
  • Update directives for a job that has already been submitted (this can only be done before the job has started running):

    scontrol update job=<jobID> timeLimit=5:00:00
  • Hold and release a pending (queued) job – this could e.g. be useful when you need to update an input file before the job starts running:

    scontrol hold <jobID>       # Job won't start running until released
    scontrol release <jobID>    # Job is free to start

5 Recap and what’s next

In this lecture, you’ve learned the basics of submitting scripts as Slurm batch jobs with the sbatch command, including:

  • How to specify sbatch options on the command-line or inside the script
  • The basic behavior and outputs associated with batch jobs
  • How to check the job queue and monitor your jobs

In the next lecture, you will learn other commonly-used sbatch options so you can reserve more time, cores, etc. for your job. You will also see some more practical examples of running batch jobs.


Footnotes

  1. Unless you explicitly instruct Slurm to print regular output (“standard output”) and error messages (“standard error”) to separate files — see the box in the section on Slurm log files for more.

  2. Unless you actively “Delete” the job on the OnDemand website.