Week 9: Workflow and data management

Author: Jelmer Poelstra

Published: October 19, 2025



1 Overview

In the past few weeks, we have focused on writing code to complete individual steps of an omics (RNA-Seq) data processing workflow. This week, we’ll start by zooming out to look at the bigger picture: how do you organize and run your workflows as a whole? Next, we’ll switch gears and talk about how you should manage and share your data and other files.

2 Learning goals

Lecture A: Workflow management

  • What a Markdown protocol of your workflow can look like
  • How you can automate such workflows with Bash and Slurm, and what the associated challenges are
  • What “workflow management systems” are, and what advantages formal pipelines/workflows written with them offer
  • That you may be able to use publicly available pipelines such as those produced by the nf-core initiative

Lecture B: Data management and transfer

  • How you can manage your data and share it after publication
  • How to transfer files between OSC and other computers, such as your own
  • How to download files at the command line
  • How to manage file permissions

3 Readings

  • Perkel (2019): “Workflow systems turn raw data into scientific knowledge”
  • Consider reading the Grünwald et al. (2024) paper mentioned below, especially if you are working in plant pathology or adjacent fields

4 Assignments & exercises

5 Further resources

  • Grünwald et al. (2024): “Open Access and Reproducibility in Plant Pathology Research: Guidelines and Best Practices.”

  • Buffalo (2015) (OSU library link) – Chapter 4: “Working with Remote Machines”

References

Buffalo, Vince. 2015. Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools. First edition. Beijing: O’Reilly.
Grünwald, Niklaus J., Clive H. Bock, Jeff H. Chang, Alessandra Alves De Souza, Emerson M. Del Ponte, Lindsey J. du Toit, Anne E. Dorrance, et al. 2024. “Open Access and Reproducibility in Plant Pathology Research: Guidelines and Best Practices.” Phytopathology® 114 (5): 910–16. https://doi.org/10.1094/PHYTO-12-23-0483-IA.
Perkel, Jeffrey M. 2019. “Workflow Systems Turn Raw Data into Scientific Knowledge.” Nature 573 (7772): 149–50. https://doi.org/10.1038/d41586-019-02619-z.