RNA-seq analysis in R

Exercise 1

First we will recapitulate the above steps to generate a new DESeq2 object with the additive model. Then we will extract the results table for the contrast for “lactate v virgin”.

design <- as.formula(~ CellType + Status)
ddsObj.raw <- DESeqDataSetFromMatrix(countData = countdata, 
                                     colData = sampleinfo,
                                     design = design)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

ddsObj <- DESeq(ddsObj.raw)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

resLvV <- results(ddsObj, alpha=0.05)

Challenge 1

Obtain results for luminal vs basal. Call the new results object resLvB.

resultsNames(ddsObj)

## [1] "Intercept"                  "CellType_luminal_vs_basal" 
## [3] "Status_pregnant_vs_lactate" "Status_virgin_vs_lactate"

resLvB <- results(ddsObj, alpha=0.05, name="CellType_luminal_vs_basal")
resLvB

## log2 fold change (MLE): CellType luminal vs basal 
## Wald test p-value: CellType luminal vs basal 
## DataFrame with 25369 rows and 6 columns
##                       baseMean log2FoldChange     lfcSE       stat       pvalue
##                      <numeric>      <numeric> <numeric>  <numeric>    <numeric>
## ENSMUSG00000051951  194.697167       -9.84263  0.940542 -10.464855  1.25268e-25
## ENSMUSG00000102331    0.557693       -2.17264  2.244934  -0.967796  3.33146e-01
## ENSMUSG00000025900    3.838125        4.16201  1.057538   3.935563  8.30020e-05
## ENSMUSG00000025902   52.523551       -2.67254  0.341351  -7.829312  4.90547e-15
## ENSMUSG00000098104    0.686876       -1.34807  1.458074  -0.924552  3.55199e-01
##                            padj
##                       <numeric>
## ENSMUSG00000051951  1.92854e-24
## ENSMUSG00000102331           NA
## ENSMUSG00000025900  2.25328e-04
## ENSMUSG00000025902  3.85988e-14
## ENSMUSG00000098104           NA
##  [ reached getOption("max.print") -- omitted 6 rows ]

How many significantly upregulated genes are there?

sum(resLvB$padj < 0.05 & resLvB$log2FoldChange > 0, na.rm = TRUE)

## [1] 6410

Challenge 2 - Contrasts

Suppose we want to find genes that are differentially expressed between pregnant and lactate. We don’t have a parameter that explicitly will allow us to test that hypothesis. We need to provide a contrast. Look at the help page for results (?results) and read about the contrast argument (also look at Example 1 at the bottom of the help page)

Use the contrast argument to extract the results table for “pregnant v lactate”.

resPvL <- results(ddsObj, 
                  alpha=0.05,
                  contrast = c("Status", "pregnant", "lactate"))
resPvL

## log2 fold change (MLE): Status pregnant vs lactate 
## Wald test p-value: Status pregnant vs lactate 
## DataFrame with 25369 rows and 6 columns
##                       baseMean log2FoldChange     lfcSE      stat    pvalue
##                      <numeric>      <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000051951  194.697167      -1.660985  0.758521 -2.189767 0.0285411
## ENSMUSG00000102331    0.557693      -1.059737  2.714947 -0.390334 0.6962893
## ENSMUSG00000025900    3.838125       2.280314  1.306913  1.744810 0.0810180
## ENSMUSG00000025902   52.523551       0.305865  0.420506  0.727375 0.4669962
## ENSMUSG00000098104    0.686876      -1.698857  1.839139 -0.923724 0.3556298
##                         padj
##                    <numeric>
## ENSMUSG00000051951  0.101341
## ENSMUSG00000102331        NA
## ENSMUSG00000025900  0.214589
## ENSMUSG00000025902  0.669987
## ENSMUSG00000098104        NA
##  [ reached getOption("max.print") -- omitted 6 rows ]

Exercise 2

When we looked at the PCA it did seem that an interaction model might be warranted. Let’s test that.

Create a new DESeq2 object using a model with an interaction between CellType and Status (~ CellType * Status)

design <- as.formula(~ CellType + Status + CellType:Status)
ddsObj2.raw <- DESeqDataSetFromMatrix(countData = countdata, 
                                     colData = sampleinfo,
                                     design = design)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

Use the LRT to compare this to the simpler additive model (~CellType + Status)

# create the simpler model
design.reduced <- as.formula(~ CellType + Status)

# Compare the two designs
ddsObjC2 <- DESeq(ddsObj2.raw, test="LRT", reduced=design.reduced)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

resCSvCxS <- results(ddsObjC2)

For how many genes is interaction model a better fit?

sum(resCSvCxS$padj < 0.05, na.rm=TRUE)

## [1] 9119

RNA-seq analysis in R

Differential Expression of RNA-seq data

Stephane Ballereau, Dominique-Laurent Couturier, Mark Dunning, Abbi Edwards, Ashley Sawle

Last modified: 30 Jun 2020

Exercise 1

Challenge 1

Challenge 2 - Contrasts

Exercise 2