library(tidyverse) # Misc. data manipulation and plotting
library(pheatmap) # Heatmap plot
library(DESeq2) # Differential expression analysisWeek 14 exercises: Additional gene expression analysis in R
1 Setting up
2 Heatmaps
Rather than plotting expression levels for many individual genes, we can create “heatmap” plots to plot dozens (possibly even hundreds) of genes at once.
We will create heatmaps with the pheatmap function, and let’s make a heatmap for the top-25 most highly significant DEGs for our focal contrast.
Unlike with some of the functions we used before, we unfortunately can’t directly use our DESeq2 object, but we have to extract and subset the count matrix, and also pass the metadata to the heatmap function:
dds_vst <- varianceStabilizingTransformation(dds)# We need a normalized count matrix, like for the PCA
# We can simply extract the matrix from the normalized dds object we created for the PCA
norm_mat <- assay(dds_vst)
# In the normalized count matrix, select only the genes of interest
# We'll reuse the 'top25_DE' vector that we created for the individual gene plots
norm_mat_sel <- norm_mat[match(top25_DE, rownames(norm_mat)), ]
# Sort the metadata
meta_sort <- meta |>
arrange(treatment, time) |>
select(treatment, time)Now we can create the plot:
pheatmap(
norm_mat_sel,
annotation_col = meta_sort, # Add the metadata
cluster_cols = FALSE, # Don't cluster samples (=columns, cols)
show_rownames = FALSE, # Don't show gene names
scale = "row", # Perform z-scaling for each gene
)Notes on the code and plot above:
The z-scaling with
scale =will make sure we can compare genes with very different expression levels: after all, we’re interested in relative expression levels across samples/sample groups.pheatmapwill by default perform hierarchical clustering both at the sample (col) and gene (row) level, such that more similar samples and genes will appear closer to each other. Above, we turned clustering off for samples, since we want to keep them in their by-group order.
Bonus exercise: heatmaps
Make a heatmap with the top-25 most-highly expressed genes (i.e., genes with the highest mean expression levels across all samples).
Click for a hint: how to get that top-25
top25_hi <- names(sort(rowMeans(norm_mat), decreasing = TRUE)[1:25])Click for the solution
# In the normalized count matrix, select only the genes of interest
norm_mat_sel <- norm_mat[match(top25_hi, rownames(norm_mat)), ]
# Sort the metadata
meta_sort <- meta |>
arrange(treatment, time) |>
select(treatment, time)
# Create the heatmap
pheatmap(
norm_mat_sel,
annotation_col = meta_sort,
cluster_cols = FALSE,
show_rownames = FALSE,
scale = "row"
)