class: inverse middle center

## *Week 1*

----

# An Introduction to the Unix shell

## *CSB Ch. 1: Unix*

<br> <br> <br> <br> <br>

### Jelmer Poelstra
### 2021/01/14

---
class: inverse middle center

# Overview

----

.pull-left[
### [Ch. 1.1-1.2 – Intro](#shell-intro)
### [Ch. 1.3 – Getting started: Unix](#unix-intro)
### [Ch. 1.4 – Getting started: Shell](#shell-intro)
### [Ch. 1.5 – Basic commands](#basic)
]

.pull-right[
### [Ch. 1.6 – Advanced commands](#advanced)
### [Wrap-up & The Unix philosophy](#wrap-up)
### [*Bonus material*](#bonus)
]

<br> <br> <br> <br>

---
class: inverse middle center
name: shell-intro

# Ch. 1.1-1.2 – Intro

----

<br><br><br><br>

---

## Why use the Unix shell?

- **Using software**
  The best or only way to use many programs, especially in bioinformatics.

- **Automation and fewer errors**
  The shell allows you to repeat and automate tasks easily and without introducing errors.

- **Reproducibility**
  The shell keeps a record of what you have done.

- **Remote computing – especially HPCs**
  It is often not possible to do *anything* outside of the terminal.

--

### Versus scripting languages like Python or R

- For many simpler tasks, built-in shell tools are faster, both to write and to run, and they handle very large files with ease.

- *Direct* interface to other programs.

---

## Accessing a shell at OSC

1. Go to <https://ondemand.osc.edu> in your browser, and log in.

2. Click on `Clusters` and then `Pitzer Shell Access`.

--

<br>

.content-box-info[
You can't right-click in this shell-in-the-browser, but:

- To **paste**, use <kbd>Ctrl</kbd>+<kbd>V</kbd>.

- To **copy**, simply select text and you should see a `<i class="fas fa-cut "></i>`{=html} icon.
]

---
class: inverse middle center
name: unix-intro

# Ch. 1.3 – Getting started with Unix

----

<br> <br> <br> <br>

---

## Ch. 1.3 – Unix directory structure

- The Unix directory structure is hierarchical, with a single starting point: the **root**, depicted as **`/`**.
- `/bin` and `/usr/bin` contain *bin*aries: executable programs.

--

<br>

- At OSC, our home dir is not at `/home/<username>`, but it can, as always, be accessed using **`$HOME`** or **`~`**.
```sh
$ echo $HOME    # The "echo" command prints text to screen
$ echo ~
```

.content-box-info[
**`$HOME`** is an *environment variable*, like **`$USER`**, which contains your username (we'll be using that one a lot).
]

---
class: inverse middle center
name: shell-intro

# Ch. 1.4 – Getting started with the shell

----

<br> <br> <br> <br>

---

## 1.4.1 – The `cal` command as an example

- Invoke `cal` without options or arguments:
```sh
$ cal
```

- Provide the *argument* `2020`:
```sh
$ cal 2020
```

- Use the *option* `-j` to show Julian dates (days numbered from January 1):
```sh
$ cal -j
```

- Also use the `-3` option, to show 3 months:
```sh
$ cal -j -3
$ cal -j3     # You can also combine options like this
```

---

## 1.4.1 – Command structure

<p align = "center">
<img src=img/command-structure-1.svg width="90%">
</p>

---

## 1.4.1 – Command structure

<p align = "center">
<img src=img/command-structure-2.svg width="90%">
</p>

---

## 1.4.1 – Command structure

<p align = "center">
<img src=img/command-structure-3.svg width="90%">
</p>

---

## 1.4.2 – Getting help

- Most commands have a **`man`**ual entry:
```sh
$ man cal
```

.content-box-info[
Type `q` to exit the pager that `man` launches.
]

<br>

- Many commands – *and other programs!* – have a **`-h`**/**`--help`** option, which tends to give a more concise summary of available options:
```sh
$ cal -h
$ cal --help
```

---

## 1.4 – Keyboard shortcuts

### Most keyboard shortcuts are for efficiency, but some are invaluable:

- `Up` / `Down` arrow keys to cycle through command history.

- <kbd>Ctrl</kbd>+<kbd>C</kbd> to kill the currently running command. Try this:
```sh
# For each, see what happens, then press Ctrl + C
$ sleep 60s
$ echo "Missing quote"    # Omit closing quote!
```

- <kbd>Ctrl</kbd>+<kbd>D</kbd> or typing `exit` will exit a shell.
<br>

--

.content-box-warning[
Another shell shortcut you may be aware of is **<kbd>Ctrl</kbd>+<kbd>W</kbd>**, which cuts the *word* before the cursor – but in a browser, it closes your tab!
]

---

## 1.4 – Practicing keyboard shortcuts

- Tab completion:

  - Type **`/f`** and press <kbd>Tab</kbd> (will autocomplete to */fs/*).

  - Then add **`e`** (*/fs/e*) and press <kbd>Tab</kbd> (will autocomplete to */fs/ess/*).

  - Add **`PAS`** (*/fs/ess/PAS*) and press <kbd>Tab</kbd> <kbd>Tab</kbd>.

  - Add **`185`** (*/fs/ess/PAS185*) and press <kbd>Tab</kbd> <kbd>Tab</kbd>.

  - Complete it to */fs/ess/PAS1855* but don't press <kbd>Enter</kbd>.

---

## 1.4 – Practicing keyboard shortcuts (cont.)

- Press <kbd>Ctrl</kbd>+<kbd>A</kbd> to move to the beginning of the line, and add **`ls`** to the beginning: `ls /fs/ess/PAS1855`. Press <kbd>Enter</kbd> (the cursor can be anywhere on the line!).

--

<br>

- Press <kbd>⇧</kbd> to get the previous command back on the prompt, and then press <kbd>Ctrl</kbd>+<kbd>U</kbd> to delete until the beginning of the line.

- <kbd>Ctrl</kbd>+<kbd>U</kbd> actually *cut* the text: "Yank" it back with <kbd>Ctrl</kbd>+<kbd>Y</kbd>.
---

## 1.4 – Getting set up

.content-box-info[
**Our base OSC directory for the course:** `/fs/ess/PAS1855/`
]

- Create a directory for yourself using `mkdir`:
```sh
$ mkdir /fs/ess/PAS1855/users/$USER
```

- Move into this dir using `cd` – after typing `cd` and a space, press <kbd>Alt</kbd>+<kbd>.</kbd> to paste the last argument of the previous command:
```sh
$ cd /fs/ess/PAS1855/users/$USER
```

--

- Get the files associated with the CSB book by "*cloning*" the Git repo:
```sh
$ git clone https://github.com/CSB-book/CSB.git
```

- Move into the *sandbox* dir for the Unix chapter:
```sh
$ cd CSB/unix/sandbox    # Use tab completion
```

---

## 1.4.3 – Navigation

- To see where you are, **`pwd`** will **`p`**rint **`w`**orking **`d`**ir:
```sh
$ pwd
```

- Use **`cd`** to **`c`**hange **`d`**irectory, here with an *absolute path* (starting from **`/`**):
```sh
$ cd /fs/ess/PAS1855/users/$USER/CSB/python/data
```

- We'll try a *relative path* to move back:
```sh
$ cd ../../unix/sandbox
```

- Move back and forth using `cd -`:
```sh
$ cd -    # Back to python/data
$ cd -    # Back to unix/sandbox
```

---

### 1.4.3 – Navigation shortcuts

```sh
$ cd ~        # Move to your home dir
$ cd ..       # Move one dir up
$ cd ../..    # Move two dirs up (sequence can be expanded)
$ cd -        # Go back to last visited dir (like "Back" in a browser)
$ cd .        # This doesn't move you anywhere, as `.` is a shortcut
              # for the current working directory (useful elsewhere)
```

<br>

.content-box-info[
All except the **`-`** are *general shell shortcuts* that work with other commands, too (e.g., `ls ..`).
]

---
class: inverse middle center
name: basic

# Ch. 1.5 – Basic Unix commands

----

<br> <br> <br> <br>

---

## 1.5.1 – Handling directories and files

- Use **`cp`** to **`c`**o**`p`**y files and optionally directories:
```sh
$ cp ../data/Buzzard2015_about.txt \
     /fs/ess/PAS1855/users/$USER/CSB/unix/sandbox/
$ cp ../data/Buzzard2015_about.txt .    # . for current dir
```

.content-box-info[
**`\`** can be used to continue a command on another line: it "escapes" the newline character.
]

---

## 1.5.1 – Handling directories and files

- Use **`cp`** to **`c`**o**`p`**y files and optionally directories:
```sh
$ cp ../data/Buzzard2015_about.txt \
     /fs/ess/PAS1855/users/$USER/CSB/unix/sandbox/
$ cp ../data/Buzzard2015_about.txt .    # . for current dir
```

.content-box-warning[
We overwrote the first copy of the Buzzard file with the second, and the shell did not warn us about it (!). (To be asked for confirmation before a file is overwritten, use `cp -i`.)
]

---

## 1.5.1 – Handling directories and files

- Use **`cp`** to **`c`**o**`p`**y files and optionally directories:
```sh
$ cp ../data/Buzzard2015_about.txt \
     /fs/ess/PAS1855/users/$USER/CSB/unix/sandbox/
$ cp ../data/Buzzard2015_about.txt .    # . for current dir
```

- Copy using a new name for the copy:
```sh
$ cp ../data/Buzzard2015_about.txt buzz2.txt
```

- The `-r` option is needed for recursive copying, i.e. to copy directories:
```sh
$ cp -r ../data/ .
```

---

## 1.5.1 – Handling directories and files (cont.)

- Use **`mv`** both to **`m`**o**`v`**e and to rename files (this is really the same operation):
```sh
$ mv buzz2.txt buzz_copy.txt
$ mv Buzzard2015_about.txt ../data/
```

.content-box-info[
`mv` works for directories without the need to specify an option.
]

<br>

.content-box-answer[
For both `cp` and `mv`:
- If the destination is an existing dir, the file will keep its name.
- If it is not an existing dir, the path specifies the new name.
]

---

## 1.5.1 – Handling directories and files (cont.)

- **`rm`** **`r`**e**`m`**oves files:
```sh
$ rm buzz_copy.txt
```

- Like with `cp`, the **`-r`** option is needed to work on dirs:
```sh
$ mkdir -p d1/d2/d3    # mkdir -p to create multiple levels at once
$ rm d1                # NOPE
$ rm -r d1
```

--

<br>

.content-box-warning[
There is no trash bin when deleting files in the shell, so use `rm` with caution!
]

```sh
$ cd ../data    # Move to the data dir for the next commands
```

---

## 1.5.2 – Viewing and processing text files

- **`cat`** will print the contents of one or more files to screen:
```sh
$ cat Marra2014_about.txt Gesquiere2011_about.txt
```

- **`wc`** will count lines, words, and bytes by default – but is used mostly to just count lines with the `-l` option:
```sh
$ wc -l Marra2014_about.txt
```

--

- **`head`** and **`tail`** will print the first or last lines of a file:
```sh
$ head Gesquiere2011_data.csv    # Default: first 10 lines
$ tail -n 2 Gesquiere2011_data.csv
```

- A neat trick to *start at* a specific line, here to skip the header:
```sh
$ tail -n +2 Gesquiere2011_data.csv
```

---
class: inverse middle center
name: advanced

# Ch. 1.6 – Advanced Unix commands

----

<br> <br> <br> <br>

---

## 1.6.1 – Redirection and pipes

### Standard output and redirection

- The regular output of a command is called **"standard out"** ("*stdout*"). By default, this is printed to screen, but it can be "*redirected*" to a file.

- With "**`>`**", we **redirect** output to a file:
  - If the file doesn't exist, it will be *created*.
  - If the file does exist, any contents will be *overwritten*.

```sh
$ cd ../sandbox
$ echo "My first line"
$ echo "My first line" > test.txt
$ cat test.txt
```

--

- With "**`>>`**", we **append** the output to a file:
```sh
$ echo "My second line" >> test.txt
$ cat test.txt
```

---

## 1.6.1 – Redirection and pipes

### Standard input and pipes

- A file name can be given as an argument to most commands:
```sh
$ ls ../data/Saavedra2013/ > filelist.txt
$ cat filelist.txt
$ wc -l filelist.txt
```

--

- Most commands *also* accept input from "standard input" (*stdin*). The **pipe**, "**`|`**", sends the output of one command to the stdin of the next:
```sh
$ ls ../data/Saavedra2013/ | wc -l
```

<br>

.content-box-info[
Pipes avoid the need to write and then read intermediate files: this saves typing, and also makes the operation much quicker.
]

---

## 1.6.2 – Selecting columns using `cut`

- Let's have a look at the data first:
```sh
$ cd ../data
$ head -n 2 Pacifici2013_data.csv
```

- To select a column, specify the column *delimiter* with **`-d`**, and the numbers of the desired columns (or *fields*) with **`-f`**:
```sh
$ cut -d ";" -f 1 Pacifici2013_data.csv | head -n 3
```

--

- To select multiple columns, use a range or comma-delimited list:
```sh
$ cut -d ";" -f 1-4 Pacifici2013_data.csv | head -n 3
$ cut -d ";" -f 2,8 Pacifici2013_data.csv | head -n 3
```

--

.content-box-info[
Unfortunately, we can't *reorder* columns using `cut`.
]

---

## 1.6.2 – Combining `cut`, `sort`, and `uniq` <br> to create a list

Let's say we want an alphabetically sorted list of animal *orders* from the `Pacifici2013_data.csv` file...

--

- First, we select our column of interest, and get rid of the header line:
```sh
$ cut -d ";" -f 2 Pacifici2013_data.csv | \
    tail -n +2
```

- Next, we pipe to `sort` to sort the result, and then to `uniq` to only keep unique rows (values):
```sh
$ cut -d ";" -f 2 Pacifici2013_data.csv | \
    tail -n +2 | \
    sort | uniq
```

.content-box-info[
`uniq` only removes *adjacent* duplicates, so it removes all duplicates only when the data is *sorted*.
]

---
name: tr

## 1.6.3 – Substituting characters using `tr`

- `tr` for **`tr`**anslate will substitute characters – here, replacing any `a` with a `b`:
```sh
$ echo "aaaabbb" | tr a b
```

- `tr` also works well with multiple values and replacements:
```sh
$ echo "123456789" | tr 1-5 0    # Replace any of 1-5 with 0

# One-to-one mapping of input and replacement!
$ echo "ACtGGcAaTT" | tr actg ACTG
# ACTGGCAATT
$ echo "aabbccddee" | tr a-c 1-3
# 112233ddee
```

--

- Deletion and "squeezing":
```sh
$ echo "aabbccddee" | tr -d a    # Delete all a's
# bbccddee
$ echo "aabbccddee" | tr -s a    # Squeeze consecutive a's into a single a
# abbccddee
```

---

## 1.6.3 – Substituting characters using `tr` (cont.)

- `tr` does not take file names as an argument, so how can we provide it with input from a file?
--

- Using a pipe:
```sh
$ cat inputfile.csv | tr " " "\t" > outputfile.csv
```

- By *redirecting* the input (**`<`**):
```sh
$ tr " " "\t" < inputfile.csv > outputfile.csv
```

<br>

.content-box-info[
In **`\t`**, the **`\`** gives the **`t`** a special meaning: a **Tab**. We'll talk more about regular expressions later.
]

---

## 1.6.5 – `grep` to print lines matching a pattern

- Print all lines containing "Vombatidae":
```sh
$ grep "Vombatidae" Pacifici2013_data.csv
```

- Print all lines *not* containing "Vombatidae":
```sh
$ grep -v "Vombatidae" Pacifici2013_data.csv    # v for inVert
```

--

.content-box-info[
- It is best to always use quotes around the pattern.
- Partial matches work: "`Vombat`" matches `Vombatidae`.
]

--

- **Many other useful options – more in Week 4:**
  - `-c` to count matches
  - `-i` to ignore case
  - `-r` to search files recursively – **very useful**
  - `-B` and `-A` to print lines surrounding matches
  - `-w` to match "words"

---
class: inverse middle center
name: wrap-up

# Wrap-up & the Unix philosophy

----

<br> <br> <br> <br>

---

## Covered in the chapter but not in the slides

- The `less` pager to view files
- The `find` command to find files
- Showing and changing file permissions
- Basic shell scripting
- `for` loops
- `$PATH` and bash profile settings

<br>

**We'll cover all of these in class over the next few weeks.**

---

## The Unix philosophy

> *This is the Unix philosophy: Write programs that do one thing and do it well.*
> *Write programs to work together. Write programs to handle text **streams**,*
> *because that is a universal interface.*<br>
> – Doug McIlroy

<br>

--

### Advantages of a modular approach

- Easier to spot errors
- Easy to swap out components, including in other languages
- Easier to learn (?)

---

## The Unix philosophy

> *This is the Unix philosophy: Write programs that do one thing and do it well.*
> *Write programs to work together.
> Write programs to handle text **streams**,*
> *because that is a universal interface.*<br>
> – Doug McIlroy

<br>

### Text "streams"?

Rather than loading entire files into memory, process them one line at a time. Very useful with large files!

```sh
$ cat *.fa > combined.fa
```

---

## Looking ahead

- In the next two weeks, the primary focus will be on project organization and then Git, with more shell practice and tools along the way.

- In weeks 4-6, we'll fully focus on the shell and shell scripting.

<br>

--

.content-box-info[
If you're feeling a bit overwhelmed, note that we will repeat lots of today's content in week 4 (Buffalo chapter 7).
]

---
class: inverse middle center

# Questions?

----

<br> <br> <br> <br>

---
class: inverse middle center
name: bonus

# Bonus material

----

<br>

### [Keyboard shortcuts & tricks](#keyboard-shortcuts)
### [Other CSB-Ch.1 sections, and "uniq -c" bonus](#uniqc)

<br> <br> <br>

---
background-color: #f2f5eb
name: keyboard-shortcuts

## Keyboard shortcuts and tricks - I

| Shortcut | Command | Function |
|--|--|--|
| <kbd>Tab</kbd> | | Tab completion! Files, commands, etc. <br> Double <kbd>Tab</kbd> to show options when <br> multiple are still available. |
| <kbd>⇧</kbd> / <kbd>⇩</kbd> | | Cycle through command history |
| <kbd>Ctrl</kbd>+<kbd>R</kbd> | | Enter characters to search for in the history <br> (repeat <kbd>Ctrl</kbd>+<kbd>R</kbd> to keep going back, <br> <kbd>Enter</kbd> to put command in prompt) |
| <kbd>Ctrl</kbd>+<kbd>C</kbd> | | Abort (kill) current process |
| <kbd>Ctrl</kbd>+<kbd>D</kbd> | `exit` | Exit the current shell (/ interactive job) |
| <kbd>Ctrl</kbd>+<kbd>Z</kbd> | (`bg` / `fg`) | Suspend (pause) a process, <br> then use `bg` to move it to the background <br> or `fg` to bring it back to the foreground |

---
background-color: #f2f5eb

## Keyboard shortcuts and tricks - II

| Shortcut | Function |
|--|--|
| <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>C</kbd> | Copy |
| <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>V</kbd> | Paste |
| <kbd>Ctrl</kbd>+<kbd>A</kbd> | Go to beginning of line |
| <kbd>Ctrl</kbd>+<kbd>E</kbd> | Go to end of line |
| <kbd>Ctrl</kbd>+<kbd>U</kbd> | Cut to beginning of line |
| <kbd>Ctrl</kbd>+<kbd>K</kbd> | Cut to end of line |
| <kbd>Ctrl</kbd>+<kbd>W</kbd> | Cut previous word |
| <kbd>Ctrl</kbd>+<kbd>Y</kbd> | Paste previously cut element |
| **<kbd>Alt</kbd>+<kbd>.</kbd>** | Paste last argument of last command |

- For the last one, you can also use `!$`:
```sh
mkdir -p ~/long/path/I/dont/want/to/retype
cd !$
```

---
background-color: #f2f5eb

## 1.4.3 – The trouble with spaces

- Because spaces are special characters used to separate commands from options and arguments, etc., using them in file names is inconvenient at best:
```sh
# You should be in /fs/ess/PAS1855/users/$USER/CSB/unix/sandbox
ls
cd Papers and reviews       # NOPE!
cd Papers\ and\ reviews     # \ to escape each individual space
cd "Papers and reviews"     # Quotes to escape special characters
```

.content-box-info[
We'll talk about spaces and file names in week 2.
]

---
background-color: #f2f5eb
name: uniqc

## 1.6.2 – Selecting columns using `cut`: Bonus

- Use `uniq -c` to count occurrences of each unique value!
```sh
$ cut -d ";" -f 2 Pacifici2013_data.csv | tail -n +2 | sort \
    | uniq -c
# 54 Afrosoricida
# 280 Carnivora
# 325 Cetartiodactyla
# 1144 Chiroptera
# 21 Cingulata
```

```sh
$ cut -d ";" -f 2 Pacifici2013_data.csv | tail -n +2 | sort \
    | uniq -c | sort -nr
# 2220 Rodentia
# 1144 Chiroptera
# 442 Eulipotyphla
# 418 Primates
```

.content-box-info[
We'll see `uniq -c` again in week 4.
]

---
background-color: #f2f5eb

## 1.6.3 – Building a Unix pipeline with `tr` and others

Let's say we want a list of animals sorted by body weight...

```sh
$ cd ../sandbox/
$ tail -n +2 ../data/Pacifici2013_data.csv
```

- Select the columns of interest with `cut`:
```sh
$ tail -n +2 ../data/Pacifici2013_data.csv | \
    cut -d ";" -f 5-6 | head    # Using head just to check
```

- Change the *delimiter* with `tr`:
```sh
$ tail -n +2 ../data/Pacifici2013_data.csv | \
    cut -d ";" -f 5-6 | tr ";" " " | head
```

- Sort in reverse numerical order with `sort` (using `-k 3`, because the body mass is now the third space-delimited field, after the two-word species name), and redirect output to a file:
```sh
$ tail -n +2 ../data/Pacifici2013_data.csv | \
    cut -d ";" -f 5-6 | tr ";" " " | sort -r -n -k 3 > BodyM.csv
$ head BodyM.csv
```

---
background-color: #f2f5eb

## 1.6.4 – Wildcards

### Use wildcards with any command to match file names of interest:

- **`*`** will match any number (including zero) of any character, and **`?`** will match any *single* character.

```sh
cd /fs/ess/PAS1855/users/$USER/CSB/unix/data/miRNA

# Match any file ending in ".fasta":
wc -l *.fasta

# Match any file starting with "pp":
head -n 2 pp*

# Match any file ending in a period and then three characters:
file *.???
```

.content-box-info[
We'll talk more about wildcards in week 2.
]
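---
background-color: #f2f5eb

## 1.6.4 – Wildcards: a self-contained sketch

To try the wildcard patterns from the previous slide without depending on the course files, here is a minimal sketch that first creates a few empty files in a temporary directory. The file names are hypothetical stand-ins made up for this example, not the actual contents of the CSB repo.

```sh
# Work in a throwaway directory, so no real files are touched
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Hypothetical file names, made up for this example
touch ggo_miR.fasta hsa_miR.fasta ppa_miR.fasta ppy_miR.fasta
touch notes.txt readme.md

ls *.fasta          # * matches any number of characters: the four .fasta files
ls pp*              # Names starting with "pp": the ppa and ppy files
ls pp?_miR.fasta    # ? matches exactly one character: again ppa and ppy
ls *.???            # A period, then exactly three characters: only notes.txt

# Wildcards combine with pipes, for instance to count the matching files
n_fasta=$(ls *.fasta | wc -l)
n_pp=$(ls pp?_miR.fasta | wc -l)
three_char_ext=$(ls *.???)
echo "$n_fasta fasta files, $n_pp of which start with pp"

# Clean up: move back and remove the throwaway directory
cd - > /dev/null
rm -r "$tmpdir"
```

The shell expands each pattern into the list of matching file names *before* the command runs, so `ls` (or `wc`, `head`, and so on) never sees the wildcard itself.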