Topic overview
| Command | Description | Examples / options | 
|---|---|---|
| pwd | Print current working directory (dir). | pwd | 
| ls | List files in working dir (default) or elsewhere. | ls data/; -l: long format; -h: human-readable file sizes; -a: show hidden files |
| cd | Change working dir. As with all commands, you can use an absolute path (starting from the root dir /) or a relative path (starting from the current working dir). | cd /fs/ess/PAS1855 (With absolute path); cd ../.. (Two levels up); cd - (To previous dir) |
| cp | Copy files or, with -r, dirs and their contents (i.e., recursively). If target is a dir, file will keep same name; otherwise, a new name can be provided. | cp *.fq data/ (All .fq files into dir data); cp my.fq data/new.fq (With new name); cp -r data/ ~ (Copy dir and contents to home dir) |
| mv | Move/rename files or dirs (-r not needed). If target is a dir, file will keep same name; otherwise, a new name can be provided. | mv my.fq data/ (Keep same name); mv my.fq my.fastq (Simple rename); mv file1 file2 mydir/ (Last arg is destination) |
| rm | Remove files or, with -r, dirs and their contents (recursively). With -f (force), any write-protections that you have set will be overridden. | rm *fq (Remove all matching files); rm -r mydir/ (Remove dir & contents); -i: prompt for confirmation; -f: force remove |
| mkdir | Create a new dir. Use -p to create multiple levels at once and to avoid an error if the dir exists. | mkdir my_new_dir; mkdir -p new1/new2/new3 |
| touch | If file does not exist: create empty file. If file exists: change last-modified date. | touch newfile.txt | 
| cat | Print file contents to standard out (screen). | cat my.txt; cat *.fa > concat.fa (Concatenate files) |
| head | Print the first 10 lines of a file, or specify the number with -n <n> or the shorthand -<n>. | head -n 40 my.fq (Print 40 lines); head -40 my.fq (Equivalent) |
| tail | Like head, but print the last lines. | tail -n +2 my.csv (“Trick” to skip first line); tail -f slurm.out (“Follow” file) |
| less | View a file in a file pager; type q to exit. See below for more details. | less myfile; -S: disable line-wrapping |
| column -t | View a tabular file with columns nicely lined up in the shell. | Nice viewing of a CSV file: column -s "," -t my.csv | 
| history | Print previously issued commands. | history \| grep "cut" (Find previous cut usage) |
| chmod | Change file permissions for file owner (user, u), “group” (g), others (o) or everyone (all; a). Permissions can be set for reading (r), writing (w), and executing (x). | chmod u+x script.sh (Make script executable); chmod a=r data/raw/* (Make data read-only); -R: recursive |
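
A minimal sketch of how several of the commands above fit together. It runs in a throwaway directory created with mktemp, and all file and dir names (sampleA.fq, data/raw, results) are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so none of your own files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Create nested dirs in one go; -p also avoids an error if they exist
mkdir -p data/raw results

# Create two empty files, copy them, then rename one of the originals
touch sampleA.fq sampleB.fq
cp *.fq data/raw/                # Copies keep their names
mv sampleA.fq sampleA.fastq      # Simple rename

# Make the copied data read-only for everyone
chmod a=r data/raw/*

# Count the entries in data/raw
n_raw=$(ls data/raw | wc -l)
echo "Files in data/raw: $n_raw"

# -r removes the dir and its contents; -f skips prompts for protected files
rm -rf data/raw
```

Because rm is permanent, it can help to run ls with the same wildcard first to check what would be removed.
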

| Command | Description | Examples / options |
|---|---|---|
| wc -l | Count the number of lines in a file. | wc -l my.fq | 
| cut | Select one or more columns from a file. | Select columns 1-4: cut -f 1-4 my.tsv; -d ",": comma as delimiter |
| sort | Sort lines. | Sort column 1 alphabetically, column 2 reverse numerically: sort -k1,1 -k2,2nr my.bed; -k 1,1: by column 1 only; -n: numerical sorting; -r: reverse order; -V: recognize numbers within strings (e.g., chr2 before chr10) |
| uniq | Remove consecutive duplicate lines (often from single-column selection): i.e., removes all duplicates if input is sorted. | Unique values for column 2: cut -f2 my.tsv \| sort \| uniq |
| uniq -c | If input is sorted, create a count table for occurrences of each line (often from single-column selection). | Count table for column 3: cut -f3 my.tsv \| sort \| uniq -c |
| tr | Substitute (translate) characters or character classes. Use -s to “squeeze” repeated characters and -d to delete characters. | TSV to CSV: cat my.tsv \| tr "\t" ","; uppercase to lowercase: tr A-Z a-z < in.txt > out.txt; -d: delete; -s: squeeze |
| grep | Search files for a pattern and print matching lines (or only the matching string with -o). Default regex is basic (GNU BRE): use -E for extended regex (ERE). To print lines surrounding a match, use -A <n> (after), -B <n> (before), or -C <n> (both). | Match AAC or AGC: grep "A[AG]C" my.fa; omit comment lines: grep -v "^#" my.gff; -c: count; -i: ignore case; -r: recursive; -v: invert; -o: print match only |
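
The tools in this table are mostly used in pipelines. Below is a small sketch on a made-up tab-separated file (my.tsv and its contents are hypothetical); it assumes a sort that supports -V, such as GNU sort.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical TSV with columns: chromosome, start, feature type
printf 'chr1\t100\texon\nchr1\t250\tintron\nchr2\t40\texon\nchr10\t5\texon\n' > my.tsv

# Count table for column 3 (sort first: uniq only collapses adjacent duplicates)
cut -f3 my.tsv | sort | uniq -c

# Number of distinct values in column 1
n_chrom=$(cut -f1 my.tsv | sort | uniq | wc -l)

# -V sorts numbers within strings "naturally", so chr10 comes after chr2
last_chrom=$(cut -f1 my.tsv | sort -V | uniq | tail -n 1)

# TSV to CSV, then count the lines that contain "exon"
tr "\t" "," < my.tsv > my.csv
n_exon=$(grep -c "exon" my.csv)

echo "chromosomes: $n_chrom; last in natural order: $last_chrom; exon lines: $n_exon"
```
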

| Symbol | Meaning | Example |
|---|---|---|
| / | Root directory. | cd / | 
| . | Current working directory. | cp data/file.txt . (Copy to working dir); use ./ to execute a script if not in $PATH: ./myscript.sh |
| .. | One directory level up. | cd ../.. (Move 2 levels up) |
| ~ or $HOME | Home directory. | cp myfile.txt ~ (Copy to home) |
| $USER | User name. | mkdir $USER | 
| > | Redirect standard out to a file. | echo "My 1st line" > myfile.txt | 
| >> | Append standard out to a file. | echo "My 2nd line" >> myfile.txt | 
| 2> | Redirect standard error to a file. | Send standard out and standard error for a script to separate files: myscript.sh >log.txt 2> err.txt | 
| &> | Redirect standard out and standard error to a file. | myscript.sh &> log.txt | 
| \| | Pipe standard out (output) of one command into standard in (input) of a second command. | The output of the sort command will be piped into head to show the first lines: sort myfile.txt \| head |
| {} | Brace expansion. Use .. to indicate numeric or character ranges ({1..4} => 1, 2, 3, 4) and , to separate items. | mkdir Jan{01..31} (Jan01, Jan02, …, Jan31); touch fig1{A..F} (fig1A, fig1B, …, fig1F); mkdir fig1{A,D,H} (fig1A, fig1D, fig1H) |
| $() | Command substitution. Allows for flexible usage of the output of any command: e.g., use command output in an echo statement or assign it to a variable. | Report number of FASTQ files: echo "I see $(ls *fastq \| wc -l) files"; substitute with date in YYYY-MM-DD format: mkdir results_$(date +%F); assign to a variable: nlines=$(wc -l < $infile) |
| $PATH | Contains colon-separated list of directories with executables: these will be searched when trying to execute a program by name. | Add dir to path: PATH=$PATH:/new/dir (But for lasting changes, edit the Bash configuration file ~/.bashrc.) |
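
A short sketch combining redirection, brace expansion, and command substitution from the table above. File names are hypothetical, and zero-padded ranges such as {01..03} assume Bash 4 or later.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# > creates or overwrites a file; >> appends to it
echo "My 1st line" > myfile.txt
echo "My 2nd line" >> myfile.txt

# Brace expansion: fig1A, fig1B, fig1C and Jan01, Jan02, Jan03
touch fig1{A..C}
mkdir Jan{01..03}

# Command substitution: store command output in a variable
nlines=$(wc -l < myfile.txt)
nfigs=$(ls fig1* | wc -l)
echo "myfile.txt has $nlines lines; I see $nfigs figure files"

# Send standard out and standard error to separate files
# (>&2 writes the second message to standard error)
{ echo "to stdout"; echo "to stderr" >&2; } > log.txt 2> err.txt
```
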

| Wildcard | Matches | Example |
|---|---|---|
| * | Any number of any character, including nothing. | ls data/*fastq.gz(Matches any file ending in “fastq.gz”)ls *R1*(Matches any file containing “R1” somewhere in the name.) | 
| ? | Any single character. | ls sample1_?.fastq.gz (Matches sample1_A.fastq.gz but not sample1_AA.fastq.gz) |
| [] and [^] | A single character that is one of ([]) or none of ([^]) the characters in the “character set” within the brackets. | ls fig1[A-C] (Matches fig1A, fig1B, fig1C); ls fig[0-3] (Matches fig0, fig1, fig2, fig3); ls fig[^4]* (Does not match files with a “4” after “fig”) |
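
To see which files each wildcard picks up, here is a sketch with a handful of empty, hypothetical files in a throwaway directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
touch sample1_A.fastq.gz sample1_AA.fastq.gz sample2_A.fastq.gz fig1 fig4 fig4b

# * matches any number of characters: all three FASTQ files
n_fastq=$(ls *fastq.gz | wc -l)

# ? matches exactly one character: sample1_AA.fastq.gz is excluded
single=$(ls sample1_?.fastq.gz)

# [0-3] matches one character from the set: fig4 and fig4b are excluded
in_set=$(ls fig[0-3])

# [^4] matches one character NOT in the set; * then allows anything after it
not_4=$(ls fig[^4]*)

echo "$n_fastq | $single | $in_set | $not_4"
```

Wildcards are expanded by the shell before the command runs, so the same patterns work with any command, not just ls.
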

Note: ERE = GNU “Extended Regular Expressions”. If “yes” in ERE column, then the symbol needs ERE to work: use the -E flag for grep and sed to turn on ERE (note that awk uses ERE by default).

| Symbol | ERE | Matches | Example |
|---|---|---|---|
| . | | Any single character | Match Olfr with none or any characters after it: grep -o "Olfr.*" |
| * | | Quantifier: matches preceding character any number of times | See previous example. |
| + | yes | Quantifier: matches preceding character at least once | One or more consecutive digits: grep -E "[0-9]+" |
| ? | yes | Quantifier: matches preceding character at most once | Zero or one digit: grep -E "[0-9]?" |
| {m} / {m,} / {m,n} | yes | Quantifier: match preceding character m times / at least m times / m to n times | Between 50 and 100 consecutive Gs: grep -E "G{50,100}" |
| ^ / $ | | Anchors: match beginning / end of line | Exclude empty lines: grep -v "^$"; exclude lines beginning with a “#”: grep -v "^#" |
| \t | | Tab (To match in grep, needs -P flag for Perl-like regex) | echo -e "column1 \t column2" |
| \n | | Newline (Not straightforward to match since Unix tools are line-based.) | echo -e "Line1 \n Line2" |
| \w | (yes) | “Word” character: any alphanumeric character or “_”. Needs -E (ERE) in grep but not in sed. | Match gene_id followed by a space and a quoted “word”: grep -E -o 'gene_id "\w+"'; change every word character to X: sed 's/\w/X/g' |
| \| | yes | Alternation / logical or: match either the string before or after the \| | Find lines with either intron or exon: grep -E "intron\|exon" |
| () | yes | Grouping | Find “AAG” repeated 10 times: grep -E "(AAG){10}" |
| \1, \2, etc. | yes | Backreferences to groups captured with (): first group is \1, second group is \2, etc. | Invert order of two words: sed -E 's/(\w+) (\w+)/\2 \1/' |
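
The sketch below tries several of these symbols on a tiny made-up file (my.txt and its contents are hypothetical). It assumes GNU grep and GNU sed, since \w is a GNU extension.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Five lines: two annotation lines, an empty line, a comment, and a sequence
printf 'gene_id "Olfr12"; type exon\ngene_id "Sox2"; type intron\n\n# comment\nAAGAAGAAG\n' > my.txt

# . and *: print from "Olfr" to the end of the line
olfr=$(grep -o "Olfr.*" my.txt)

# Alternation (needs -E): count lines with "intron" or "exon"
n_either=$(grep -E -c "intron|exon" my.txt)

# Anchors: drop empty lines, then count lines not starting with "#"
n_kept=$(grep -v "^$" my.txt | grep -v -c "^#")

# \w and +: count the quoted gene IDs
n_ids=$(grep -E -o 'gene_id "\w+"' my.txt | wc -l)

# Grouping plus a quantifier: count lines with "AAG" repeated 3 times
n_repeat=$(grep -E -c "(AAG){3}" my.txt)

# Backreferences: invert the order of two words
swapped=$(echo "inverted words" | sed -E 's/(\w+) (\w+)/\2 \1/')

echo "$olfr | $n_either | $n_kept | $n_ids | $n_repeat | $swapped"
```
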

less

| Key | Function |
|---|---|
| q | Exit less | 
| space / b | Go down / up a page. (pgdn / pgup usually also work.) |
| d / u | Go down / up half a page. |
| g / G | Go to the first / last line (home / end also work). |
| /<pattern> or ?<pattern> | Search for <pattern> forwards / backwards: type your search after / or ?. |
| n / N | When searching, go to next / previous search match. |

sed

sed flags:

| Flag | Meaning |
|---|---|
| -E | Use extended regular expressions | 
| -e | When using multiple expressions, precede each with -e | 
| -i | Edit a file in place | 
| -n | Don’t print lines unless specified with p modifier |
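
A quick sketch of the four flags in action on a hypothetical file. It assumes GNU sed: the \t escape, the case-insensitive i modifier, and -i without an argument are GNU behavior (BSD/macOS sed differs).

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'chrom1:431-874\nChrom2:10-20\nscaffold7:1-5\n' > my.txt

# -n with the p modifier: print only lines matching "chrom" (case-sensitive)
only_match=$(sed -n '/chrom/p' my.txt)

# -e: apply two expressions in order; keep only the first output line
tabbed=$(sed -e 's/:/\t/' -e 's/-/\t/' my.txt | sed -n '1p')

# -i with -E: edit the file in place, here with a case-insensitive match
sed -i -E 's/^chrom/chr/i' my.txt
line2=$(sed -n '2p' my.txt)

echo "$only_match | $tabbed | $line2"
```

Since -i overwrites the file, it is safer to check the output without -i first.
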

sed examples
# Replace "chrom" by "chr" in every line,
# with "i": case insensitive, and "g": global (>1 replacements per line)
sed 's/chrom/chr/ig' chroms.txt
# Only print lines matching "abc":
sed -n '/abc/p' my.txt
# Print lines 20-50:
sed -n '20,50p' my.txt
# Change the genomic coordinates format chr1:431-874 ("chrom:start-end")
# ...to one that has a tab ("\t") between each field:
echo "chr1:431-874" | sed -e 's/:/\t/' -e 's/-/\t/'
#> chr1    431     874
# Invert the order of two words:
echo "inverted words" | sed -E 's/(\w+) (\w+)/\2 \1/'
#> words inverted
# Capture transcript IDs from a GTF file (format 'transcript_id "ID_I_WANT"'):
# (Needs "-n" and "p" so lines with no transcript_id are not printed.) 
grep -v "^#" my.gtf | sed -E -n 's/.*transcript_id "([^"]+)".*/\1/p'
# When a pattern contains a `/`, use a different expression delimiter:
echo "data/fastq/sampleA.fastq" | sed 's#data/fastq/##'
#> sampleA.fastq

awk

Records and fields: by default, each line is a record (assigned to $0). Each column is a field (assigned to $1, $2, etc.).
Patterns and actions: A pattern is a condition to be tested, and an action is something to do when the pattern evaluates to true.
Omit the pattern: action applies to every record.
awk '{ print $0 }' my.txt     # Print entire file
awk '{ print $3,$2 }' my.txt  # Print columns 3 and 2 for each line

Omit the action: print full records that match the pattern.
# Print all lines for which:
awk '$3 < 10' my.bed          # Column 3 is less than 10
awk '$1 == "chr1"' my.bed     # Column 1 is "chr1"
awk '/chr1/' my.bed           # Regex pattern "chr1" matches
awk '$1 ~ /chr1/' my.bed      # Column 1 _matches_ "chr1"

awk examples
# Count columns in a GTF file after excluding the header
# (lines starting with "#"):
awk -F "\t" '!/^#/ {print NF; exit}' my.gtf
# Print all lines for which column 1 matches "chr1" and the difference
# ...between columns 3 and 2 (feature length) is greater than 10:
awk '$1 ~ /chr1/ && $3 - $2 > 10' my.bed
# Select lines with "chr2" or "chr3", print all columns and add a column 
# ...with the difference between column 3 and 2 (feature length):
awk '$1 ~ /chr2|chr3/ { print $0 "\t" $3 - $2 }' my.bed
# Calculate the mean feature length (column 3 minus column 2):
awk 'BEGIN{ sum = 0 };
     { sum += ($3 - $2) };
     END{ print "mean: " sum/NR };' my.bed

awk comparison and logical operators

| Comparison | Description |
|---|---|
| a == b | a is equal to b |
| a != b | a is not equal to b |
| a < b | a is less than b |
| a > b | a is greater than b |
| a <= b | a is less than or equal to b |
| a >= b | a is greater than or equal to b |
| a ~ /b/ | a matches regular expression pattern b |
| a !~ /b/ | a does not match regular expression pattern b |
| a && b | logical and: a and b |
| a \|\| b | logical or: a or b |
| !a | not a (logical negation) | 
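
The operators above can be tried on a small made-up BED file (my.bed and its contents are hypothetical). Note the difference between an exact comparison (==) and a regex match (~): the regex /chr1/ also matches chr10.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'chr1\t100\t105\nchr1\t200\t260\nchr2\t50\t300\nchr10\t5\t30\n' > my.bed

# == is an exact comparison: only the two chr1 lines
n_exact=$(awk '$1 == "chr1"' my.bed | wc -l)

# ~ is a regex match: chr10 matches /chr1/ too
n_regex=$(awk '$1 ~ /chr1/' my.bed | wc -l)

# &&: on chr1 AND feature length greater than 10
long_chr1=$(awk '$1 == "chr1" && $3 - $2 > 10' my.bed)

# ||: column 1 does not match /chr1/ OR the start is below 10
n_either=$(awk '$1 !~ /chr1/ || $2 < 10' my.bed | wc -l)

# !: negate a condition (features that are NOT longer than 10)
n_short=$(awk '!($3 - $2 > 10)' my.bed | wc -l)

echo "$n_exact | $n_regex | $long_chr1 | $n_either | $n_short"
```
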

awk special variables and keywords

| Keyword / variable | Meaning |
|---|---|
| BEGIN | Used as a pattern that matches the start of the file | 
| END | Used as a pattern that matches the end of the file | 
| NR | Number of Records (running count; in END: total nr. of lines) | 
| NF | Number of Fields (for each record) | 
| $0 | Contains entire record (usually a line) | 
| $1-$n | Contains one column each | 
| FS | Input Field Separator (default: any whitespace) | 
| OFS | Output Field Separator (default: single space) | 
| RS | Input Record Separator (default: newline) | 
| ORS | Output Record Separator (default: newline) | 
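
A sketch of the special variables on a hypothetical BED file. One point that is easy to miss: OFS is used between comma-separated print arguments, but $0 only picks it up after a field has been (re)assigned, for example with $1 = $1.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'chr1\t100\t105\nchr1\t200\t260\nchr2\t50\t300\nchr10\t5\t30\n' > my.bed

# NR is the running record count, NF the number of fields in each record
last=$(awk '{ print NR, NF }' my.bed | tail -n 1)

# FS and OFS set in BEGIN: read tab-separated, write comma-separated
csv_first=$(awk 'BEGIN { FS = "\t"; OFS = "," } NR == 1 { print $1, $3 - $2 }' my.bed)

# Assigning a field rebuilds $0 with OFS between all fields
rebuilt=$(awk 'BEGIN { FS = "\t"; OFS = "," } NR == 1 { $1 = $1; print }' my.bed)

# In END, NR holds the total number of records
mean=$(awk '{ sum += $3 - $2 } END { print "mean: " sum / NR }' my.bed)

echo "$last | $csv_first | $rebuilt | $mean"
```
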

awk functions

| Function | Meaning |
|---|---|
| length(<string>) | Return number of characters | 
| tolower(<string>) | Convert to lowercase | 
| toupper(<string>) | Convert to uppercase | 
| substr(<string>, <start>, <length>) | Return substring of <length> characters starting at position <start> |
| split(<string>, <array>, <delimiter>) | Split into chunks in an array | 
| sub(<from>, <to>, <string>) | Substitute (replace) regex | 
| gsub(<from>, <to>, <string>) | Like sub, but replaces all matches (>1 substitution per line) |
| print | Print, e.g. column: print $1 | 
| exit | Break out of record-processing loop; e.g. to stop when match is found | 
| next | Skip the remaining rules for the current record and move on to the next record |
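
The functions above, sketched on a made-up coordinate string and a hypothetical BED file with a header line. The string functions run inside a BEGIN block, so no input file is needed for that part.

```bash
#!/usr/bin/env bash
set -euo pipefail

# String functions on a hypothetical coordinate string
out=$(awk 'BEGIN {
    coords = "chr1:431-874"
    print length(coords)                 # Number of characters
    print toupper("acgt")
    print substr(coords, 1, 4)           # 4 characters, starting at position 1
    n = split(coords, parts, /[:-]/)     # Split on ":" or "-"
    print n, parts[2]
    first = coords; sub(/[:-]/, "_", first)   # Replace the first match only
    all = coords;   gsub(/[:-]/, "_", all)    # Replace every match
    print first
    print all
}')
echo "$out"

# next and exit on a small file with a header line
workdir=$(mktemp -d)
cd "$workdir"
printf '# header\nchr1\t1\t9\nchr2\t4\t8\n' > my.bed

# next: skip header lines, count the remaining records
n_data=$(awk '/^#/ { next } { n++ } END { print n }' my.bed)

# exit: stop at the first match and report its line number
first_chr2=$(awk '$1 == "chr2" { print NR; exit }' my.bed)

echo "data lines: $n_data; first chr2 line: $first_chr2"
```
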

| Shortcut | Function |
|---|---|
| Tab | Tab completion | 
| ⇧ / ⇩ | Cycle through previously issued commands | 
| Ctrl+Shift+C | Copy selected text | 
| Ctrl+Shift+V | Paste text from clipboard | 
| Ctrl+A / Ctrl+E | Go to beginning/end of line | 
| Ctrl+U / Ctrl+K | Cut from cursor to beginning / end of line |
| Ctrl+W | Cut word before cursor |
| Ctrl+Y | Paste (“yank”) | 
| Alt+. | Last argument of previous command (very useful!) | 
| Ctrl+R | Search history: press Ctrl+R again to cycle through matches, Enter to put command in prompt. | 
| Ctrl+C | Kill (stop) currently active command | 
| Ctrl+D | Exit (a program or the shell depending on the context) | 
| Ctrl+Z | Suspend (pause) a process: then use bg to move to background. |

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".