Brief Course Recap

Practical Computing Skills for Omics Data (PLNTPTH 5006)

Jelmer Poelstra

MCIC Wooster, Ohio State University

2025-12-04

The core goals of this course

This course has tried to enable you to:

  • Do your research more reproducibly and efficiently

  • Work with large-scale “omics” datasets


It has focused primarily on general, foundational “computing skills” rather than on specific applications.


Your next course?!

To learn more about specific omics analyses, I highly recommend the follow-up course Genome Analytics (HCS 7004) by Dr. Jonathan Fresnedo-Ramirez, which will be taught in SP26.

Reproducibility

In general terms, your research is reproducible when you:

  • Share your data
  • Share your methods — in sufficient detail for anyone to redo what you did
  • Ensure that all reported data, methods, and results are congruent

“The most basic principle for reproducible research is: Do everything via code.”
- Karl Broman, University of Wisconsin-Madison

Reproducibility (cont.)

We have covered the following practices that benefit reproducibility:

  • Using code (Unix shell/Bash and R), and following best practices when doing so
  • Detailed project documentation (with Markdown)
  • Data and code management & sharing (e.g. with Git and GitHub)
  • Good project file organization
  • Reporting a clear protocol or using a pipeline to (re)run your analyses
  • Using open-source software with “containers”

Efficiency and automation

We have covered the following topics to improve efficiency and automation:

  • Writing shell scripts – and writing them in a flexible, reusable way
  • Using the Ohio Supercomputer Center (OSC) and submitting scripts as batch jobs
  • Using Nextflow/nf-core pipelines
  • Using generative AI to help with coding
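
As a reminder of what "flexible, reusable" can mean for a shell script, here is a minimal sketch (not a script from the course materials). The helper name count_reads is hypothetical, and the sketch assumes uncompressed FASTQ files with the standard 4 lines per read. The key idea is that input files are passed as arguments rather than hard-coded.

```bash
#!/usr/bin/env bash
# Sketch of a reusable script: inputs come from arguments, not hard-coded paths
set -eu

# count_reads FILE: print the number of reads in an uncompressed FASTQ file,
# assuming 4 lines per read (hypothetical helper, for illustration only)
count_reads() {
    local fastq="$1"
    local n_lines
    n_lines=$(wc -l < "$fastq")
    echo $(( n_lines / 4 ))
}

# Report the read count for every file given on the command line
for fastq in "$@"; do
    printf '%s\t%s\n' "$fastq" "$(count_reads "$fastq")"
done
```

Because nothing is hard-coded, the same script could be run on one file, on many files, or inside a batch job without editing it.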

Omics data

Along the way, you have learned about:

  • What “omics” data types are and what technologies produce this data
  • Common file formats for sequence data (FASTQ, FASTA, GFF/GTF, etc.)
  • Common tools for sequence data processing (FastQC, TrimGalore, etc.)
  • How to interpret basic quality control metrics for sequence data
  • How to perform an RNA-Seq analysis from start to finish

Presentation schedule for next Tuesday

Time Presenter
12:50-13:00 Alison
13:00-13:10 Anna
13:10-13:20 Elisabeth
13:20-13:30 Freddy
13:30-13:35 break
13:35-13:45 Kavya
13:45-13:55 Kelsey
13:55-14:05 Mia
14:05-14:15 Yaxin

Questions?