Original Authors: Belinda Phipson, Anna Trigos, Matt Ritchie, Maria Doyle, Harriet Dashnow, Charity Law Based on the course RNAseq analysis in R delivered on May 11/12th 2016

Resources and data files

Original materials

This material has been created using the following resources:
http://www.statsci.org/smyth/pubs/QLedgeRPreprint.pdf (Lun, Chen, and Smyth 2016)
http://monashbioinformaticsplatform.github.io/RNAseq-DE-analysis-with-R/99-RNAseq_DE_analysis_with_R.html

Data files location

Data files needed for this analysis are available in the data directory.

Recap of pre-processing

The previous section walked-through the pre-processing and transformation of the count data. Here, for completeness, we list the minimal steps required to process the data prior to differential expression analysis.

First load the packages we need.

library(edgeR)
library(limma)
library(readr)
## Read the counts from the downloaded data
seqdata <- read_tsv("data/GSE60450_Lactation-GenewiseCounts.txt")

# Remove first two columns from seqdata
countdata <- as.matrix(seqdata[ ,-(1:2)])

# Store EntrezGeneID as rownames
rownames(countdata) <- seqdata$EntrezGeneID
# modify column names
colnames(countdata) <- substr(colnames(countdata), 1, 7)

# Calculate the Counts Per Million measure
myCPM <- cpm(countdata)

# Identify genes with at least 0.5 cpm in at least 2 samples
thresh <- myCPM > 0.5
keep <- rowSums(thresh) >= 2

# Subset the rows of countdata to keep the more highly expressed genes
counts.keep <- countdata[keep,]

# Convert to an edgeR object
dgeObj <- DGEList(counts.keep)

# Perform TMM normalisation
dgeObj <- calcNormFactors(dgeObj)

Differential expression with edgeR

Now that we are happy that we have normalised the data and that the quality looks good, we can proceed to testing for differentially expressed genes. There are a number of packages to analyse RNA-Seq data. Most people use DESEQ2 or edgeR.

We will use edgeR for the rest of this practical.

Load the data

Alternatively we can load the dgeObj and sampleinfo objects with the Rdata file we created in the pre-processing tutorial.

# before proceeding clean all objects out of the environment
rm(list=ls())
# load the Rdata object we created in the previous session
load("Robjects/preprocessing.Rdata")
ls()
[1] "dgeObj"     "sampleinfo"

Creating the design matrix

First we need to create a design matrix for the groups, as we have seen in the linear models lecture. We have two variables, status and cell type. We will fit two models under two assumptions; no interaction and interaction of these two factors.

Let’s start with the model with only main effects, that is no interaction. The main assumption here is that the effect of the status is the same in both type of cells.

# Specify a design matrix with an intercept term
design <- model.matrix(~ CellType + Status, data = sampleinfo)
colnames(design)[1] <- "Intercept"

Data exploration

An MDS plot shows distances, in terms of biological coefficient of variation (BCV), between samples. What do you think of the quality of the data? Can you anticipate if the interaction term will be important?

plotMDS(dgeObj, 
        labels=sampleinfo$Group, 
        xlim=c(-4, 5),
        col=c("black", "dark red", "blue")[sampleinfo$Status])

Estimating the dispersion

The common dispersion estimates the overall BCV of the dataset, averaged over all genes:

dgeObj <- estimateCommonDisp(dgeObj)

Then we estimate gene-wise dispersion estimates, allowing a possible trend with average count size:

dgeObj <- estimateGLMTrendedDisp(dgeObj)
dgeObj <- estimateTagwiseDisp(dgeObj)

Plot the estimated dispersions:

plotBCV(dgeObj)

Testing for differential expression

First, we fit genewise glms:

# Fit the linear model
fit <- glmFit(dgeObj, design)
names(fit)
 [1] "coefficients"          "fitted.values"         "deviance"              "iter"                  "failed"               
 [6] "method"                "counts"                "unshrunk.coefficients" "df.residual"           "design"               
[11] "offset"                "dispersion"            "prior.count"           "samples"               "prior.df"             
[16] "AveLogCPM"            
head(coef(fit))
        Intercept CellTypeluminal Statuspregnant Statusvirgin
497097 -11.187922     -7.58804851     -0.7085514  -0.09305118
20671  -12.715063     -1.85287334      0.2269001   0.49554506
27395  -11.221391      0.56368066     -0.1415910  -0.29221577
18777  -10.146793      0.08280255     -0.1845489  -0.48795441
21399   -9.909825     -0.24195503      0.1753606   0.13494615
58175  -16.310131      3.09936215      1.1975518   0.84742701

Conduct likelihood ratio tests for luminal vs basal and show the top genes:

lrt_BvsL <- glmLRT(fit, coef=2)
topTags(lrt_BvsL)
Coefficient:  CellTypeluminal 

Challenge 1

Conduct likelihood ratio tests for virgin vs lactate and show the top genes.

Solution

Contrasts

Suppose we want to find differentially expressed genes between pregnant and virgin. We don’t have a parameter that explicitly will allow us to test that hypothesis. We need to build a contrast:

PvsV <- makeContrasts(Statuspregnant - Statusvirgin, levels = design)
lrt_pVsV <- glmLRT(fit, contrast=PvsV)
topTags(lrt_pVsV)
Coefficient:  1*Statuspregnant -1*Statusvirgin 

Testing an interaction model

Challenge 2

1.Fit a model with interaction: What is the rationale to include the interaction (What assumption are you relaxing?)
2. Is the number of replicates good enough to include the interaction?
3. Is the interaction needed in the model?

Solution

Finally save the results in a new Rdata object

save(lrt_BvsL, dgeObj, sampleinfo, file="Robjects/DE.Rdata")

Lun, Aaron T L, Yunshun Chen, and Gordon K Smyth. 2016. “It’s DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.” Methods in Molecular Biology (Clifton, N.J.) 1418 (January): 391–416. doi:10.1007/978-1-4939-3578-9\_19.

LS0tCnRpdGxlOiAiUk5BLXNlcSBhbmFseXNpcyBpbiBSIgpzdWJ0aXRsZTogIkRpZmZlcmVudGlhbCBFeHByZXNzaW9uIG9mIFJOQS1zZXEgZGF0YSIKYXV0aG9yOiAiU3RlcGhhbmUgQmFsbGVyZWF1LCBNYXJrIER1bm5pbmcsIE9zY2FyIFJ1ZWRhLCBBc2hsZXkgU2F3bGUiCmRhdGU6ICdgciBmb3JtYXQoU3lzLnRpbWUoKSwgIkxhc3QgbW9kaWZpZWQ6ICVkICViICVZIilgJwpvdXRwdXQ6CiAgaHRtbF9ub3RlYm9vazoKICAgIHRvYzogeWVzCiAgaHRtbF9kb2N1bWVudDoKICAgIHRvYzogeWVzCm1pbnV0ZXM6IDMwMApsYXlvdXQ6IHBhZ2UKYmlibGlvZ3JhcGh5OiByZWYuYmliCi0tLQoKKipPcmlnaW5hbCBBdXRob3JzOiBCZWxpbmRhIFBoaXBzb24sIEFubmEgVHJpZ29zLCBNYXR0IFJpdGNoaWUsIE1hcmlhIERveWxlLCBIYXJyaWV0IERhc2hub3csIENoYXJpdHkgTGF3KioKQmFzZWQgb24gdGhlIGNvdXJzZSBbUk5Bc2VxIGFuYWx5c2lzIGluIFJdKGh0dHA6Ly9jb21iaW5lLWF1c3RyYWxpYS5naXRodWIuaW8vMjAxNi0wNS0xMS1STkFzZXEvKSBkZWxpdmVyZWQgb24gTWF5IDExLzEydGggMjAxNgoKIyBSZXNvdXJjZXMgYW5kIGRhdGEgZmlsZXMKCiMjIE9yaWdpbmFsIG1hdGVyaWFscwoKVGhpcyBtYXRlcmlhbCBoYXMgYmVlbiBjcmVhdGVkIHVzaW5nIHRoZSBmb2xsb3dpbmcgcmVzb3VyY2VzOiAgIApodHRwOi8vd3d3LnN0YXRzY2kub3JnL3NteXRoL3B1YnMvUUxlZGdlUlByZXByaW50LnBkZiBbQEx1bjIwMTZdICAKaHR0cDovL21vbmFzaGJpb2luZm9ybWF0aWNzcGxhdGZvcm0uZ2l0aHViLmlvL1JOQXNlcS1ERS1hbmFseXNpcy13aXRoLVIvOTktUk5Bc2VxX0RFX2FuYWx5c2lzX3dpdGhfUi5odG1sICAKCiMjIERhdGEgZmlsZXMgbG9jYXRpb24KCkRhdGEgZmlsZXMgbmVlZGVkIGZvciB0aGlzIGFuYWx5c2lzIGFyZSBhdmFpbGFibGUgaW4gdGhlIGBkYXRhYCBkaXJlY3RvcnkuCgojIFJlY2FwIG9mIHByZS1wcm9jZXNzaW5nCgpUaGUgcHJldmlvdXMgc2VjdGlvbiB3YWxrZWQtdGhyb3VnaCB0aGUgcHJlLXByb2Nlc3NpbmcgYW5kIHRyYW5zZm9ybWF0aW9uIG9mIHRoZSBjb3VudCBkYXRhLiBIZXJlLCBmb3IgY29tcGxldGVuZXNzLCB3ZSBsaXN0IHRoZSBtaW5pbWFsIHN0ZXBzIHJlcXVpcmVkIHRvIHByb2Nlc3MgdGhlIGRhdGEgcHJpb3IgdG8gZGlmZmVyZW50aWFsIGV4cHJlc3Npb24gYW5hbHlzaXMuCgpGaXJzdCBsb2FkIHRoZSBwYWNrYWdlcyB3ZSBuZWVkLgoKYGBge3IgbG9hZExpYnJhcmllcywgbWVzc2FnZT1GfQpsaWJyYXJ5KGVkZ2VSKQpsaWJyYXJ5KGxpbW1hKQpsaWJyYXJ5KHJlYWRyKQpgYGAKCmBgYHtyIHJlY2FwLCBldmFsPUZBTFNFfQojIyBSZWFkIHRoZSBjb3VudHMgZnJvbSB0aGUgZG93bmxvYWRlZCBkYXRhCnNlcWRhdGEgPC0gcmVhZF90c3YoImRhdGEvR1NFNjA0NTBfTGFjdGF0aW9uLUdlbmV3aXNlQ291bnRzLnR4dCIpCgojIFJlbW92ZSBmaXJzdCB0d28gY29sdW1ucyBmcm9tIHNlcWRhdGEKY291bnRkYXRhIDwtIGFzLm1hdHJpeChzZXFkYXRhWyAsLSgxOjIpXSkKCiMgU3RvcmUgRW50cmV6R2VuZUlEIGFzIHJvd25hbWVzCnJvd25hbWVzKGNvdW50ZGF0YSkgPC0gc2VxZGF0YSRFbnRyZXpHZW5lSUQKIyBtb2RpZnkgY29sdW1uIG5hbWVzCmNvbG5hbWVzKGNvdW50ZGF0YSkgPC0gc3Vic3RyKGNvbG5hbWVzKGNvdW50ZGF0YSksIDEsIDcpCgojIENhbGN1bGF0ZSB0aGUgQ291bnRzIFBlciBNaWxsaW9uIG1lYXN1cmUKbXlDUE0gPC0gY3BtKGNvdW50ZGF0YSkKCiMgSWRlbnRpZnkgZ2VuZXMgd2l0aCBhdCBsZWFzdCAwLjUgY3BtIGluIGF0IGxlYXN0IDIgc2FtcGxlcwp0aHJlc2ggPC0gbXlDUE0gPiAwLjUKa2VlcCA8LSByb3dTdW1zKHRocmVzaCkgPj0gMgoKIyBTdWJzZXQgdGhlIHJvd3Mgb2YgY291bnRkYXRhIHRvIGtlZXAgdGhlIG1vcmUgaGlnaGx5IGV4cHJlc3NlZCBnZW5lcwpjb3VudHMua2VlcCA8LSBjb3VudGRhdGFba2VlcCxdCgojIENvbnZlcnQgdG8gYW4gZWRnZVIgb2JqZWN0CmRnZU9iaiA8LSBER0VMaXN0KGNvdW50cy5rZWVwKQoKIyBQZXJmb3JtIFRNTSBub3JtYWxpc2F0aW9uCmRnZU9iaiA8LSBjYWxjTm9ybUZhY3RvcnMoZGdlT2JqKQpgYGAKCiMgRGlmZmVyZW50aWFsIGV4cHJlc3Npb24gd2l0aCBlZGdlUgoKTm93IHRoYXQgd2UgYXJlIGhhcHB5IHRoYXQgd2UgaGF2ZSBub3JtYWxpc2VkIHRoZSBkYXRhIGFuZCB0aGF0IHRoZSBxdWFsaXR5IGxvb2tzIGdvb2QsIHdlIGNhbiBwcm9jZWVkIHRvIHRlc3RpbmcgZm9yIGRpZmZlcmVudGlhbGx5IGV4cHJlc3NlZCBnZW5lcy4gVGhlcmUgYXJlIGEgbnVtYmVyIG9mIHBhY2thZ2VzIHRvIGFuYWx5c2UgUk5BLVNlcSBkYXRhLiBNb3N0IHBlb3BsZSB1c2UgW0RFU0VRMl0oaHR0cHM6Ly9iaW9jb25kdWN0b3Iub3JnL3BhY2thZ2VzL3JlbGVhc2UvYmlvYy9odG1sL0RFU2VxMi5odG1sKSBvciBbZWRnZVJdKGh0dHA6Ly9iaW9jb25kdWN0b3Iub3JnL3BhY2thZ2VzL3JlbGVhc2UvYmlvYy9odG1sL2VkZ2VSLmh0bWwpLiAKCldlIHdpbGwgdXNlICoqZWRnZVIqKiBmb3IgdGhlIHJlc3Qgb2YgdGhpcyBwcmFjdGljYWwuCgojIyBMb2FkIHRoZSBkYXRhCgpBbHRlcm5hdGl2ZWx5IHdlIGNhbiBsb2FkIHRoZSBgZGdlT2JqYCBhbmQgYHNhbXBsZWluZm9gIG9iamVjdHMgd2l0aCB0aGUgUmRhdGEgCmZpbGUgd2UgY3JlYXRlZCBpbiB0aGUgcHJlLXByb2Nlc3NpbmcgdHV0b3JpYWwuCgpgYGB7ciBsb2FkRGF0YX0KIyBiZWZvcmUgcHJvY2VlZGluZyBjbGVhbiBhbGwgb2JqZWN0cyBvdXQgb2YgdGhlIGVudmlyb25tZW50CnJtKGxpc3Q9bHMoKSkKCiMgbG9hZCB0aGUgUmRhdGEgb2JqZWN0IHdlIGNyZWF0ZWQgaW4gdGhlIHByZXZpb3VzIHNlc3Npb24KbG9hZCgiUm9iamVjdHMvcHJlcHJvY2Vzc2luZy5SZGF0YSIpCmxzKCkKYGBgCgojIyBDcmVhdGluZyB0aGUgZGVzaWduIG1hdHJpeAoKRmlyc3Qgd2UgbmVlZCB0byBjcmVhdGUgYSBkZXNpZ24gbWF0cml4IGZvciB0aGUgZ3JvdXBzLCBhcyB3ZSBoYXZlIHNlZW4gaW4gdGhlIGxpbmVhciBtb2RlbHMgbGVjdHVyZS4gCldlIGhhdmUgdHdvIHZhcmlhYmxlcywgc3RhdHVzIGFuZCBjZWxsIHR5cGUuIFdlIHdpbGwgZml0IHR3byBtb2RlbHMgdW5kZXIgdHdvIGFzc3VtcHRpb25zOyBubyBpbnRlcmFjdGlvbiBhbmQgaW50ZXJhY3Rpb24gb2YgdGhlc2UgdHdvIGZhY3RvcnMuIAoKTGV0J3Mgc3RhcnQgd2l0aCB0aGUgbW9kZWwgd2l0aCBvbmx5IG1haW4gZWZmZWN0cywgdGhhdCBpcyBubyBpbnRlcmFjdGlvbi4gVGhlIG1haW4gYXNzdW1wdGlvbiBoZXJlIGlzIHRoYXQgdGhlIGVmZmVjdCBvZiB0aGUgc3RhdHVzIGlzIHRoZSBzYW1lIGluIGJvdGggdHlwZSBvZiBjZWxscy4KCmBgYHtyIG1vZGVsTWF0cml4fQojIFNwZWNpZnkgYSBkZXNpZ24gbWF0cml4IHdpdGggYW4gaW50ZXJjZXB0IHRlcm0KZGVzaWduIDwtIG1vZGVsLm1hdHJpeCh+IENlbGxUeXBlICsgU3RhdHVzLCBkYXRhID0gc2FtcGxlaW5mbykKY29sbmFtZXMoZGVzaWduKVsxXSA8LSAiSW50ZXJjZXB0IgpgYGAKCiMjIERhdGEgZXhwbG9yYXRpb24KCkFuIE1EUyBwbG90IHNob3dzIGRpc3RhbmNlcywgaW4gdGVybXMgb2YgYmlvbG9naWNhbCBjb2VmZmljaWVudCBvZiB2YXJpYXRpb24gKEJDViksIGJldHdlZW4gc2FtcGxlcy4gV2hhdCBkbyB5b3UgdGhpbmsgb2YgdGhlIHF1YWxpdHkgb2YgdGhlIGRhdGE/IENhbiB5b3UgYW50aWNpcGF0ZSBpZiB0aGUgaW50ZXJhY3Rpb24gdGVybSB3aWxsIGJlIGltcG9ydGFudD8KCmBgYHtyIG1kc1Bsb3R9CgpwbG90TURTKGRnZU9iaiwgCiAgICAgICAgbGFiZWxzPXNhbXBsZWluZm8kR3JvdXAsIAogICAgICAgIHhsaW09YygtNCwgNSksCiAgICAgICAgY29sPWMoImJsYWNrIiwgImRhcmsgcmVkIiwgImJsdWUiKVtzYW1wbGVpbmZvJFN0YXR1c10pCgpgYGAKCiMjIEVzdGltYXRpbmcgdGhlIGRpc3BlcnNpb24KClRoZSBjb21tb24gZGlzcGVyc2lvbiBlc3RpbWF0ZXMgdGhlIG92ZXJhbGwgQkNWIG9mIHRoZSBkYXRhc2V0LCBhdmVyYWdlZCBvdmVyIGFsbCBnZW5lczoKCmBgYHtyIGNvbW1vbkRpc3BlcnNpb259CgpkZ2VPYmogPC0gZXN0aW1hdGVDb21tb25EaXNwKGRnZU9iaikKCmBgYAoKVGhlbiB3ZSBlc3RpbWF0ZSBnZW5lLXdpc2UgZGlzcGVyc2lvbiBlc3RpbWF0ZXMsIGFsbG93aW5nIGEgcG9zc2libGUgdHJlbmQgd2l0aCBhdmVyYWdlIGNvdW50IHNpemU6CgpgYGB7ciBnZW5ld2lzZURpc3BlcnNpb259CgpkZ2VPYmogPC0gZXN0aW1hdGVHTE1UcmVuZGVkRGlzcChkZ2VPYmopCmRnZU9iaiA8LSBlc3RpbWF0ZVRhZ3dpc2VEaXNwKGRnZU9iaikKCmBgYAoKUGxvdCB0aGUgZXN0aW1hdGVkIGRpc3BlcnNpb25zOgpgYGB7ciBkaXNwZXJzaW9uUGxvdH0KCnBsb3RCQ1YoZGdlT2JqKQoKYGBgCgojIyBUZXN0aW5nIGZvciBkaWZmZXJlbnRpYWwgZXhwcmVzc2lvbgoKRmlyc3QsIHdlIGZpdCBnZW5ld2lzZSBnbG1zOgoKYGBge3IgZml0R0xNfQoKIyBGaXQgdGhlIGxpbmVhciBtb2RlbApmaXQgPC0gZ2xtRml0KGRnZU9iaiwgZGVzaWduKQpuYW1lcyhmaXQpCmhlYWQoY29lZihmaXQpKQoKYGBgCgpDb25kdWN0IGxpa2VsaWhvb2QgcmF0aW8gdGVzdHMgZm9yIGx1bWluYWwgdnMgYmFzYWwgYW5kIHNob3cgdGhlIHRvcCBnZW5lczoKYGBge3IgZGVCeUxSVH0KCmxydF9CdnNMIDwtIGdsbUxSVChmaXQsIGNvZWY9MikKdG9wVGFncyhscnRfQnZzTCkKCmBgYAoKCj4gIyMjIENoYWxsZW5nZSAxIHsuY2hhbGxlbmdlfQo+IENvbmR1Y3QgbGlrZWxpaG9vZCByYXRpbyB0ZXN0cyBmb3IgdmlyZ2luIHZzIGxhY3RhdGUgYW5kIHNob3cgdGhlIHRvcCBnZW5lcy4KCioqU29sdXRpb24qKgpgYGB7ciBzb2x1dGlvbkNoYWxsZW5nZTF9CgoKCgpgYGAKCiMjIENvbnRyYXN0cwoKU3VwcG9zZSB3ZSB3YW50IHRvIGZpbmQgZGlmZmVyZW50aWFsbHkgZXhwcmVzc2VkIGdlbmVzIGJldHdlZW4gcHJlZ25hbnQgYW5kIHZpcmdpbi4gV2UgZG9uJ3QgaGF2ZSBhIHBhcmFtZXRlciB0aGF0IGV4cGxpY2l0bHkgd2lsbCBhbGxvdyB1cyB0byB0ZXN0IHRoYXQgaHlwb3RoZXNpcy4gV2UgbmVlZCB0byBidWlsZCBhIGNvbnRyYXN0OgoKYGBge3IgbWFrZUNvbnRyYXN0fQpQdnNWIDwtIG1ha2VDb250cmFzdHMoU3RhdHVzcHJlZ25hbnQgLSBTdGF0dXN2aXJnaW4sIGxldmVscyA9IGRlc2lnbikKbHJ0X3BWc1YgPC0gZ2xtTFJUKGZpdCwgY29udHJhc3Q9UHZzVikKdG9wVGFncyhscnRfcFZzVikKYGBgCgojIyBUZXN0aW5nIGFuIGludGVyYWN0aW9uIG1vZGVsCgo+ICMjIyBDaGFsbGVuZ2UgMiB7LmNoYWxsZW5nZX0KPgo+IDEuRml0IGEgbW9kZWwgd2l0aCBpbnRlcmFjdGlvbjogV2hhdCBpcyB0aGUgcmF0aW9uYWxlIHRvIGluY2x1ZGUgdGhlIGludGVyYWN0aW9uIChXaGF0IGFzc3VtcHRpb24gYXJlIHlvdSByZWxheGluZz8pICAKPiAyLiBJcyB0aGUgbnVtYmVyIG9mIHJlcGxpY2F0ZXMgZ29vZCBlbm91Z2ggdG8gaW5jbHVkZSB0aGUgaW50ZXJhY3Rpb24/ICAKPiAzLiBJcyB0aGUgaW50ZXJhY3Rpb24gbmVlZGVkIGluIHRoZSBtb2RlbD8gIAoKCioqU29sdXRpb24qKgpgYGB7ciBzb2x1dGlvbkNoYWxsZW5nZTJ9CgoKCgpgYGAKCgojIyBGaW5hbGx5IHNhdmUgdGhlIHJlc3VsdHMgaW4gYSBuZXcgUmRhdGEgb2JqZWN0CgpgYGB7ciBzYXZlT2JqZWN0c30KCnNhdmUobHJ0X0J2c0wsIGRnZU9iaiwgc2FtcGxlaW5mbywgZmlsZT0iUm9iamVjdHMvREUuUmRhdGEiKQoKYGBgCgotLS0tLS0tLS0tLQo=