Week 13 content overview and readings

Content overview for this week

Since Snakemake is written in Python and its workflow language is built on top of Python syntax, it looks very similar to regular Python, and we can even include plain Python code in our workflows! Our previous experience with Python therefore puts us in a really good position to learn Snakemake.

While its learning curve is not trivial even with all of our pre-existing knowledge, there are many advantages to using Snakemake when your analysis pipelines include at least a few scripts. For example, it makes our pipelines more reproducible, portable, scalable, and transparent than the Bash or Python scripts we would otherwise write to glue the steps together. In addition, Snakemake takes care of a lot of boilerplate code for you, such as SLURM directives in shell scripts, and you can even forgo shell scripts altogether and specify all shell commands directly in a “Snakefile” (workflow script). Finally, Snakemake can be an incredible timesaver when errors occur in your pipeline or when you need to repeat parts of it, both of which are common!
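Much of that time saving comes from an idea Snakemake inherited from Make: a step only has to be rerun if its output is missing, is older than one of its inputs, or depends on an output that is itself about to be rebuilt. The toy Python sketch below illustrates that idea under simplifying assumptions. It is not Snakemake's code, whose actual rerun logic is more elaborate. The `Rule` class, the `plan()` and `demo()` helpers, and the file names are all hypothetical, and the rules are assumed to be listed upstream-first.

```python
from __future__ import annotations

import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Rule:
    """Hypothetical, stripped-down stand-in for a workflow rule."""
    name: str
    inputs: list[Path]
    output: Path


def needs_rerun(rule: Rule, will_change: set[Path]) -> bool:
    """Make-style check for a single rule."""
    # An input that an upstream rule is about to rebuild forces a rerun.
    if any(inp in will_change for inp in rule.inputs):
        return True
    # A missing output (e.g. the step crashed last time) forces a rerun.
    if not rule.output.exists():
        return True
    # Otherwise rerun only if some input is newer than the output.
    # (A missing input with no producing rule raises FileNotFoundError here.)
    out_mtime = rule.output.stat().st_mtime
    return any(inp.stat().st_mtime > out_mtime for inp in rule.inputs)


def plan(rules: list[Rule]) -> list[str]:
    """Return the names of the rules that would run, in order."""
    will_change: set[Path] = set()
    to_run = []
    for rule in rules:  # assumed to be ordered upstream-first
        if needs_rerun(rule, will_change):
            to_run.append(rule.name)
            will_change.add(rule.output)
    return to_run


def demo() -> tuple[list[str], list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        raw, trimmed, counts = d / "raw.fastq", d / "trimmed.fastq", d / "counts.txt"
        # Create the files with increasing modification times (raw oldest).
        for i, path in enumerate([raw, trimmed, counts]):
            path.write_text("placeholder\n")
            os.utime(path, (1000 + i, 1000 + i))
        rules = [Rule("trim", [raw], trimmed), Rule("count", [trimmed], counts)]

        up_to_date = plan(rules)      # nothing needs to run
        counts.unlink()               # pretend the last step failed
        after_failure = plan(rules)   # only the failed step reruns
        os.utime(raw, (2000, 2000))   # pretend the raw data changed
        after_edit = plan(rules)      # the change propagates downstream
        return up_to_date, after_failure, after_edit


if __name__ == "__main__":
    print(demo())  # ([], ['count'], ['trim', 'count'])
```

The second result is the situation from the paragraph above: after a failure, only the step whose output is missing is scheduled again, instead of the whole pipeline.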

Some of the things you will learn this week:

Readings

Since neither of the books covers Snakemake, you’ll be reading two articles: a journalistic feature article in Nature that serves as a light introduction to workflow systems, and a manuscript by some of the authors of Snakemake.

In the latter article, I recommend reading up to the heading “2.2.1 Modularization” on page 7. Beyond that point, you can skim, read, or skip as you see fit.

You may also want to (re)visit the section “Make and Makefiles: Another Option for Pipelines” in the Buffalo book, Chapter 12, pp. 421–423. Snakemake was heavily inspired by Make (hence its name) but is much more user-friendly and has many more options.

Required readings

Further resources

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".