pracs-sp21: Week 13 content overview and readings

Content overview for this week

Since Snakemake is a Python program, its syntax is very similar to regular Python and we can even include plain Python code! Our previous experience with Python therefore puts us in a really good position to learn Snakemake.

While its learning curve is not trivial even with all of our pre-existing knowledge, there are many advantages to using Snakemake when you have analysis pipelines that include at least a few scripts. For example, it makes our analysis pipelines more reproducible, portable, scalable, and transparent than regular Bash or Python scripts that glue together a pipeline. In addition, Snakemake takes care of a lot of boilerplate code for you, such as SLURM directives in shell scripts; and you can even forego using shell scripts altogether and specify all shell commands in a “Snakefile” (workflow script) directly. Finally, Snakemake can be an incredible timesaver when errors occur in your pipeline or you need to repeat parts of it – which is common!

Some of the things you will learn this week:

What challenges you can (will) run into when trying to glue analysis pipelines together with regular Bash (or Python) scripts.
What Workflow Management Systems are, and why they are useful.
The basics of using Snakemake to describe and run workflows.
How to get Snakemake to submit SLURM jobs for you.

Readings

Since neither of the books cover Snakemake, you’ll be reading two articles: a journalistic feature article in Nature as a light introduction to workflow systems, and a manuscript by some of the authors of Snakemake.

In the latter article, I recommend you read until you reach the heading “2.2.1 Modularization” on page 7, and beyond there, you can skim/read/skip as you see fit.

You may also want to (re)visit the section “Make and Makefiles: Another Option for Pipelines” in the Buffalo book, Chapter 12, p. 421-423. Snakemake was heavily inspired by Make (hence its name) but is much more user-friendly and has many more options.

Required readings

Perkel 2019, Nature “Toolbox” feature: “Workflow systems turn raw data into scientific knowledge”
Mölder et al. 2020, Zenodo: “Sustainable data analysis with Snakemake”

Further resources

Buffalo
The official Snakemake tutorial which you can run in your browser!
A short introduction to (Make and then) Snakemake by Vince Buffalo, the author of the secondary book we have been reading.
A Carpentries lesson on working with a compute cluster, with a large section on Snakemake starting on this page.
Another useful Snakemake tutorial.

Week 13 content overview and readings

Content overview for this week

Readings

Required readings

Further resources

Reuse