Week 9: Workflow and data management
1 Overview
In the past few weeks, we have focused on writing code to complete individual steps of an omics (RNA-Seq) data processing workflow. This week, we’ll start by zooming out to look at the bigger picture: how do you organize and run your workflows as a whole? Next, we’ll switch gears and talk about how you should manage and share your data and other files.
2 Learning goals
Lecture A: Workflow management
- What a Markdown protocol of your workflow can look like
- How you can automate such workflows with Bash and Slurm, and what the associated challenges are
- What “workflow management systems” are, and what advantages formal pipelines/workflows written with them offer
- That you may be able to use publicly available pipelines, such as those produced by the nf-core initiative
Lecture B: Data management and transfer
- How you can manage your data and share it after publication
- How to transfer files between OSC and other computers like your own
- How to download files at the command line
- How to manage file permissions
3 Readings
4 Assignments & exercises
- Exercises for this week
- Ungraded assignment: Local VS Code installation (deadline: Monday Oct 27 at noon)
5 Further resources
- Grünwald et al. (2024): “Open Access and Reproducibility in Plant Pathology Research: Guidelines and Best Practices.”
- Buffalo (2015) (OSU library link) – Chapter 4: “Working with Remote Machines”
References
Buffalo, Vince. 2015. Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools. First edition. Beijing: O’Reilly.
Grünwald, Niklaus J., Clive H. Bock, Jeff H. Chang, Alessandra Alves De Souza, Emerson M. Del Ponte, Lindsey J. du Toit, Anne E. Dorrance, et al. 2024. “Open Access and Reproducibility in Plant Pathology Research: Guidelines and Best Practices.” Phytopathology® 114 (5): 910–16. https://doi.org/10.1094/PHYTO-12-23-0483-IA.
Perkel, Jeffrey M. 2019. “Workflow Systems Turn Raw Data into Scientific Knowledge.” Nature 573 (7772): 149–50. https://doi.org/10.1038/d41586-019-02619-z.