pracs-sp21: Week 4 content overview and readings

Content overview for this week

This week, we will continue learning about working in the Unix shell, using Buffalo chapter 7, with a focus on data tools. New tools will include the less pager and the advanced commands sed and awk (and bioawk).

We’ll also revisit and expand our knowledge of several tools we have already seen in week 1, such as grep, sort, and the use of Unix “pipelines”. The examples will focus on working with sequence data in several formats.

Some of the things you will learn this week:

How clever usage of Unix pipelines can be a large time-saver when needing to extract, convert and summarize data.
Using less to view files and quickly find and view text matches.
Using grep to find, count, and extract occurrences of strings and patterns.
Using sort together with cut and uniq / uniq -c to summarize columns of tabular data.
Using awk for many tasks on tabular data, including filtering rows and adding columns with arithmetic, and calculating means.
Use sed to substitute text and reformat data.
Optionally, understand subshells and process substitution.
Optionally, use join to merge files by shared columns.

Readings

This week, you will read chapter 7 of Buffalo. It’s a pretty dense chapter with lots of examples. The examples all involve sequence data but if you don’t work with sequence data, it should be fairly easy to see how these tools could be applied any kind of text and tabular data.

Required readings

Buffalo Chapter 7: “Unix Data Tools”

The following sections can be skimmed or skipped, depending on your bandwidth and interest in these topics:
- “Decoding Plain-Text Data: hexdump”
- “Bioawk: an Awk for Biological Formats” (skim or skip if you’re not working with sequence data)
- “Advanced Shell Tricks”

Week 4 content overview and readings

Content overview for this week

Readings

Required readings

Reuse