Topic overview
Command | Description | Examples / options |
---|---|---|
pwd | Print current working directory (dir). | pwd |
ls | List files in working dir (default) or elsewhere. | ls data/ ; -l long format ; -h human-readable file sizes ; -a show hidden files |
cd | Change working dir. As with all commands, you can use an absolute path (starting from the root dir /) or a relative path (starting from the current working dir). | cd /fs/ess/PAS1855 (with absolute path) ; cd ../.. (two levels up) ; cd - (to previous dir) |
cp | Copy files or, with -r, dirs and their contents (i.e., recursively). If the target is a dir, the file keeps its name; otherwise, a new name can be provided. | cp *.fq data/ (all .fq files into dir data) ; cp my.fq data/new.fq (with new name) ; cp -r data/ ~ (copy dir and contents to home dir) |
mv | Move/rename files or dirs (-r not needed). If the target is a dir, the file keeps its name; otherwise, a new name can be provided. | mv my.fq data/ (keep same name) ; mv my.fq my.fastq (simple rename) ; mv file1 file2 mydir/ (last arg is the destination) |
rm | Remove files, or dirs and their contents with -r (recursively). With -f (force), any write protections you have set are overridden. | rm *fq (remove all matching files) ; rm -r mydir/ (remove dir & contents) ; -i prompt for confirmation ; -f force removal |
mkdir | Create a new dir. Use -p to create multiple levels at once and to avoid an error if the dir exists. | mkdir my_new_dir ; mkdir -p new1/new2/new3 |
touch | If the file does not exist: create an empty file. If the file exists: update its last-modified date. | touch newfile.txt |
cat | Print file contents to standard out (screen). | cat my.txt ; cat *.fa > concat.fa (concatenate files) |
head | Print the first 10 lines of a file, or specify the number with -n <n> or the shorthand -<n>. | head -n 40 my.fq (print 40 lines) ; head -40 my.fq (equivalent) |
tail | Like head, but print the last lines. | tail -n +2 my.csv ("trick" to skip the first line) ; tail -f slurm.out ("follow" a growing file) |
less | View a file in a pager; type q to exit. See below for more details. | less myfile ; -S disable line-wrapping |
column -t | View a tabular file with columns nicely lined up in the shell. | Nice viewing of a CSV file: column -s "," -t my.csv |
history | Print previously issued commands. | history \| grep "cut" (find previous cut usage) |
chmod | Change file permissions for the file owner (user, u), "group" (g), others (o), or everyone (all, a). Permissions can be set for reading (r), writing (w), and executing (x). | chmod u+x script.sh (make script executable) ; chmod a=r data/raw/* (make data read-only) ; -R recursive |
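To see these file-management commands work together, here is a disposable example session; all names (proj/, sampleA.fq) are invented for the demo:

```shell
# Create a scratch project, exercise cp/mv/rm, then clean up.
mkdir -p proj/data/raw                          # -p: create nested levels at once
touch proj/data/raw/sampleA.fq                  # create an empty file
cp proj/data/raw/sampleA.fq proj/data/copyA.fq  # copy under a new name
mv proj/data/copyA.fq proj/data/sampleA.keep.fq # rename (no -r needed)
ls proj/data                                    # shows raw/ and sampleA.keep.fq
rm -r proj                                      # remove the dir and all contents
```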
Command | Description | Examples and options |
---|---|---|
wc -l | Count the number of lines in a file. | wc -l my.fq |
cut | Select one or more columns from a file. | Select columns 1-4: cut -f 1-4 my.csv ; -d "," comma as delimiter |
sort | Sort lines, alphabetically by default. | Sort column 1 alphabetically and column 2 reverse-numerically: sort -k1,1 -k2,2nr my.bed ; -k 1,1 by column 1 only ; -n numerical sorting ; -r reverse order ; -V recognize numbers within strings |
uniq | Remove consecutive duplicate lines (often from a single-column selection): i.e., removes all duplicates if the input is sorted. | Unique values for column 2: cut -f2 my.tsv \| sort \| uniq |
uniq -c | If the input is sorted, create a count table for occurrences of each line (often from a single-column selection). | Count table for column 3: cut -f3 my.tsv \| sort \| uniq -c |
tr | Substitute (translate) characters or character classes (like A-Z). Use -d to delete characters and -s to "squeeze" runs of repeated characters into one. | TSV to CSV: cat my.tsv \| tr "\t" "," ; Uppercase to lowercase: tr A-Z a-z < in.txt > out.txt ; -d delete ; -s squeeze |
grep | Search files for a pattern and print matching lines (or only the matching string, with -o). The default regex flavor is basic (GNU BRE): use -E for extended regex (ERE). To print lines surrounding a match, use -A / -B / -C. | Match AAC or AGC: grep "A[AG]C" my.fa ; Omit comment lines: grep -v "^#" my.gff ; -c count ; -i ignore case ; -r recursive ; -v invert ; -o print match only |
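These commands shine when combined in a pipeline. A self-contained sketch (the three TSV lines are made up):

```shell
# Count how often each value occurs in column 1 of a small TSV.
printf 'chr1\tgeneA\nchr2\tgeneB\nchr1\tgeneC\n' > tmp_genes.tsv
cut -f1 tmp_genes.tsv | sort | uniq -c   # chr1 occurs twice, chr2 once
rm tmp_genes.tsv
```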
Symbol | Meaning | Example |
---|---|---|
/ | Root directory. | cd / |
. | Current working directory. | cp data/file.txt . (copy to working dir) ; Use ./ to execute a script that is not in $PATH: ./myscript.sh |
.. | One directory level up. | cd ../.. (move 2 levels up) |
~ or $HOME | Home directory. | cp myfile.txt ~ (copy to home) |
$USER | User name. | mkdir $USER |
> | Redirect standard out to a file. | echo "My 1st line" > myfile.txt |
>> | Append standard out to a file. | echo "My 2nd line" >> myfile.txt |
2> | Redirect standard error to a file. | Send standard out and standard error for a script to separate files: myscript.sh > log.txt 2> err.txt |
&> | Redirect standard out and standard error to a file. | myscript.sh &> log.txt |
\| | Pipe standard out (output) of one command into standard in (input) of a second command. | The output of sort is piped into head to show the first lines: sort myfile.txt \| head |
{} | Brace expansion. Use .. to indicate numeric or character ranges (1..4 => 1, 2, 3, 4) and , to separate items. | mkdir Jan{01..31} (Jan01, Jan02, …, Jan31) ; touch fig1{A..F} (fig1A, fig1B, …, fig1F) ; mkdir fig1{A,D,H} (fig1A, fig1D, fig1H) |
$() | Command substitution. Allows for flexible use of the output of any command: e.g., use command output in an echo statement or assign it to a variable. | Report the number of FASTQ files: echo "I see $(ls *fastq \| wc -l) files" ; Make a dir with the date in YYYY-MM-DD format: mkdir results_$(date +%F) ; nlines=$(wc -l < $infile) |
$PATH | Contains a colon-separated list of directories with executables: these are searched when you try to execute a program by name. | Add a dir to the path: PATH=$PATH:/new/dir (But for lasting changes, edit the Bash configuration file ~/.bashrc.) |
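Brace expansion and command substitution combine naturally; a small demo (the demo_ dir names are invented, and brace expansion assumes Bash):

```shell
mkdir -p demo_{01..03}                     # brace expansion: demo_01 demo_02 demo_03
echo "I see $(ls -d demo_* | wc -l) dirs"  # command substitution inside a string
rmdir demo_{01..03}                        # clean up
```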
Wildcard | Matches | Examples |
---|---|---|
* | Any number of any character, including nothing. | ls data/*fastq.gz (matches any file ending in "fastq.gz") ; ls *R1* (matches any file containing "R1" somewhere in the name) |
? | Any single character. | ls sample1_?.fastq.gz (matches sample1_A.fastq.gz but not sample1_AA.fastq.gz) |
[] and [^] | One character from the "character set" within the brackets; with ^, one character not in the set. | ls fig1[A-C] (matches fig1A, fig1B, fig1C) ; ls fig[0-3] (matches fig0, fig1, fig2, fig3) ; ls fig[^4]* (does not match files with a "4" after "fig") |
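A quick way to see the difference between *, ?, and []: create a few empty files and glob them (the fig* filenames are invented):

```shell
touch fig1A fig1B fig1C fig2A
ls fig1[A-C]    # matches fig1A fig1B fig1C
ls fig?A        # single-character wildcard: fig1A fig2A
ls fig*         # matches all four files
rm fig1A fig1B fig1C fig2A
```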
Note: ERE = GNU "Extended Regular Expressions". If "yes" in the ERE column, the symbol needs ERE to work: use the -E flag for grep and sed to turn on ERE (note that awk uses ERE by default).
Symbol | ERE | Matches | Example |
---|---|---|---|
. | | Any single character | Match Olfr with none or any characters after it: grep -o "Olfr.*" |
* | | Quantifier: matches the preceding character any number of times | See previous example. |
+ | yes | Quantifier: matches the preceding character one or more times | One or more consecutive digits: grep -E "[0-9]+" |
? | yes | Quantifier: matches the preceding character at most once (zero or one times) | An optional digit: grep -E "[0-9]?" |
{m} / {m,} / {m,n} | yes | Quantifier: match the preceding character m times / at least m times / m to n times | Between 50 and 100 consecutive Gs: grep -E "G{50,100}" |
^ / $ | | Anchors: match the beginning / end of a line | Exclude empty lines: grep -v "^$" ; Exclude lines beginning with a "#": grep -v "^#" |
\t | | Tab (to match in grep, needs the -P flag for Perl-like regex) | echo -e "column1 \t column2" |
\n | | Newline (not straightforward to match, since Unix tools are line-based) | echo -e "Line1 \n Line2" |
\w | (yes) | "Word" character: any alphanumeric character or "_". Needs -E (ERE) in grep but not in sed. | Match gene_id followed by a space and a "word": grep -E -o 'gene_id "\w+"' ; Change every word character to X: sed 's/\w/X/g' |
\| | yes | Alternation / logical or: match either the string before or after the \| | Find lines with either intron or exon: grep -E "intron\|exon" |
() | yes | Grouping | Find "AAG" repeated 10 times: grep -E "(AAG){10}" |
\1, \2, etc. | yes | Backreferences to groups captured with (): the first group is \1, the second \2, etc. | Invert the order of two words: sed -E 's/(\w+) (\w+)/\2 \1/' |
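The ERE symbols above, exercised on a throwaway file (the feature lines are invented):

```shell
printf 'exon 12\nintron 7\nUTR x\n' > tmp_feat.txt
grep -E "intron|exon" tmp_feat.txt    # alternation: matches the first two lines
grep -E -o "[0-9]+" tmp_feat.txt      # one-or-more digits: prints 12 and 7
rm tmp_feat.txt
```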
less
Key | Function |
---|---|
q | Exit less |
space / b | Go down / up a page. (pgup / pgdn usually also work.) |
d / u | Go down / up half a page. |
g / G | Go to the first / last line (home / end also work). |
/<pattern> or ?<pattern> | Search for <pattern> forwards / backwards: type your search after / or ?
n / N | When searching, go to the next / previous match
sed
sed flags:

Flag | Meaning |
---|---|
-E | Use extended regular expressions |
-e | When using multiple expressions, precede each with -e |
-i | Edit a file in place |
-n | Don't print lines unless told to with the p modifier |
sed examples:
# Replace "chrom" by "chr" in every line,
# with "i": case insensitive, and "g": global (>1 replacements per line)
sed 's/chrom/chr/ig' chroms.txt
# Only print lines matching "abc":
sed -n '/abc/p' my.txt
# Print lines 20-50:
sed -n '20,50p'
# Change the genomic coordinates format chr1:431-874 ("chrom:start-end")
# ...to one that has a tab ("\t") between each field:
echo "chr1:431-874" | sed -e 's/:/\t/' -e 's/-/\t/'
#> chr1 431 874
# Invert the order of two words:
echo "inverted words" | sed -E 's/(\w+) (\w+)/\2 \1/'
#> words inverted
# Capture transcript IDs from a GTF file (format 'transcript_id "ID_I_WANT"'):
# (Needs "-n" and "p" so lines with no transcript_id are not printed.)
grep -v "^#" my.gtf | sed -E -n 's/.*transcript_id "([^"]+)".*/\1/p'
# When a pattern contains a `/`, use a different expression delimiter:
echo "data/fastq/sampleA.fastq" | sed 's#data/fastq/##'
#> sampleA.fastq
awk
Records and fields: by default, each line is a record (assigned to $0). Each column is a field (assigned to $1, $2, etc.).
Patterns and actions: a pattern is a condition to be tested, and an action is something to do when the pattern evaluates to true.
Omit the pattern: the action applies to every record.
awk '{ print $0 }' my.txt # Print entire file
awk '{ print $3,$2 }' my.txt # Print columns 3 and 2 for each line
Omit the action: print full records that match the pattern.
# Print all lines for which:
awk '$3 < 10' my.bed # Column 3 is less than 10
awk '$1 == "chr1"' my.bed # Column 1 is "chr1"
awk '/chr1/' my.bed # Regex pattern "chr1" matches
awk '$1 ~ /chr1/' my.bed # Column 1 _matches_ "chr1"
awk examples:
# Count columns in a GTF file after excluding the header
# (lines starting with "#"):
awk -F "\t" '!/^#/ {print NF; exit}' my.gtf
# Print all lines for which column 1 matches "chr1" and the difference
# ...between columns 3 and 2 (feature length) is more than 10:
awk '$1 ~ /chr1/ && $3 - $2 > 10' my.bed
# Select lines with "chr2" or "chr3", print all columns and add a column
# ...with the difference between column 3 and 2 (feature length):
awk '$1 ~ /chr2|chr3/ { print $0 "\t" $3 - $2 }' my.bed
# Calculate the mean feature length (column 3 minus column 2):
awk 'BEGIN{ sum = 0 };
{ sum += ($3 - $2) };
END{ print "mean: " sum/NR };' my.bed
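The examples above assume a file my.bed; the same mean-length calculation can be tried without any file by piping in made-up coordinates:

```shell
# Two features of lengths 50 and 100; the mean is 75.
printf 'chr1\t100\t150\nchr1\t200\t300\n' |
  awk '{ sum += $3 - $2 } END { print "mean: " sum/NR }'
#> mean: 75
```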
awk comparison and logical operators:

Comparison | Description |
---|---|
a == b | a is equal to b |
a != b | a is not equal to b |
a < b | a is less than b |
a > b | a is greater than b |
a <= b | a is less than or equal to b |
a >= b | a is greater than or equal to b |
a ~ /b/ | a matches regular expression pattern b |
a !~ /b/ | a does not match regular expression pattern b |
a && b | logical and: a and b |
a \|\| b | logical or: a or b (note the typo in Buffalo) |
!a | not a (logical negation) |
awk special variables and keywords:

Keyword / variable | Meaning |
---|---|
BEGIN | Used as a pattern that matches the start of the file |
END | Used as a pattern that matches the end of the file |
NR | Number of Records (running count; in END: the total number of lines) |
NF | Number of Fields (for each record) |
$0 | Contains the entire record (usually a line) |
$1 - $n | Contain one column each |
FS | Input Field Separator (default: any whitespace) |
OFS | Output Field Separator (default: single space) |
RS | Input Record Separator (default: newline) |
ORS | Output Record Separator (default: newline) |
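FS and OFS in action: a one-liner CSV-to-TSV conversion. Reassigning $1 to itself forces awk to rebuild the record with the new output separator:

```shell
# Convert CSV to TSV: read comma-separated, write tab-separated.
echo "a,b,c" | awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }'
```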
awk functions:

Function | Meaning |
---|---|
length(<string>) | Return the number of characters |
tolower(<string>) | Convert to lowercase |
toupper(<string>) | Convert to uppercase |
substr(<string>, <start>, <length>) | Return a substring |
split(<string>, <array>, <delimiter>) | Split a string into chunks stored in an array |
sub(<from>, <to>, <string>) | Substitute (replace) the first regex match |
gsub(<from>, <to>, <string>) | Like sub, but with >1 substitution per line |
print | Print, e.g. a column: print $1 |
exit | Break out of the record-processing loop, e.g. to stop when a match is found |
next | Stop processing the current record and move on to the next one |
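A few of these functions in one-liners (the inputs are invented):

```shell
echo "acgt" | awk '{ print toupper($0), length($0) }'   # prints: ACGT 4
# gsub replaces every ":" or "-" with a tab (/[:-]/ is a character class):
echo "chr1:100-200" | awk '{ gsub(/[:-]/, "\t"); print }'
```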
Shortcut | Function |
---|---|
Tab | Tab completion |
⇧ / ⇩ | Cycle through previously issued commands |
Ctrl+Shift+C | Copy selected text |
Ctrl+Shift+V | Paste text from clipboard |
Ctrl+A / Ctrl+E | Go to beginning/end of line |
Ctrl+U / Ctrl+K | Cut from cursor to beginning / end of line
Ctrl+W | Cut the word before the cursor
Ctrl+Y | Paste (“yank”) |
Alt+. | Last argument of previous command (very useful!) |
Ctrl+R | Search history: press Ctrl+R again to cycle through matches, Enter to put command in prompt. |
Ctrl+C | Kill (stop) currently active command |
Ctrl+D | Exit (a program or the shell depending on the context) |
Ctrl+Z | Suspend (pause) a process: then use bg to move to background. |
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".