RNA-seq analysis in R

Challenge 1

Obtain results for luminal vs basal and find the top 200 genes (call the new results object resLvB.

resLvB <- results(ddsObj, name="CellType_luminal_vs_basal", alpha=0.05)
topGenesLvB <- as.data.frame(resLvB) %>%
    rownames_to_column("GeneID") %>% 
    arrange(padj) %>%
    head(200)
head(topGenesLvB, 10)

##                GeneID  baseMean log2FoldChange      lfcSE      stat pvalue
## 1  ENSMUSG00000039323  9060.194      -8.760171 0.22493130 -38.94598      0
## 2  ENSMUSG00000041559  8051.736      -8.891474 0.23194448 -38.33449      0
## 3  ENSMUSG00000026483 11515.064      -6.902183 0.14290812 -48.29804      0
## 4  ENSMUSG00000027750 40543.099      -7.704290 0.11933515 -64.56010      0
## 5  ENSMUSG00000028464 34755.226      -8.638337 0.21247668 -40.65546      0
## 6  ENSMUSG00000040118  2670.115      -8.383387 0.19064166 -43.97458      0
## 7  ENSMUSG00000059430  5343.648      -8.959208 0.23078343 -38.82085      0
## 8  ENSMUSG00000001435  4992.645      -6.392363 0.15807506 -40.43878      0
## 9  ENSMUSG00000031488  1843.299       3.182767 0.07926289  40.15457      0
## 10 ENSMUSG00000061048  4081.866      -7.130746 0.13296793 -53.62756      0
##    padj
## 1     0
## 2     0
## 3     0
## 4     0
## 5     0
## 6     0
## 7     0
## 8     0
## 9     0
## 10    0

Challenge 2

When we looked at the PCA it did seem that an interaction model might be warranted. Let’s test that. 1.Fit a model with interaction: What is the rationale to include the interaction?
2. Use the LRT to compare the two models. 3. Is the number of replicates good enough to include the interaction?
4. Is the interaction needed in the model?

designI <- as.formula(~ CellType * Status)

# Build model
ddsObjI <- DESeqDataSetFromMatrix(countData = countdata,
                              colData = sampleinfo,
                              design = designI)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

# Run DESeq
ddsObjI <- DESeq(ddsObjI)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

# Compare the models
ddsObjI <- DESeq(ddsObjI, test="LRT", reduced=design)

## using pre-existing size factors

## estimating dispersions

## found already estimated dispersions, replacing these

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

resIvA <- results(ddsObjI)
head(resIvA)

## log2 fold change (MLE): CellTypeluminal.Statuslactate 
## LRT p-value: '~ CellType * Status' vs '~ CellType + Status' 
## DataFrame with 6 rows and 6 columns
##                             baseMean    log2FoldChange             lfcSE
##                            <numeric>         <numeric>         <numeric>
## ENSMUSG00000051951  194.697166753892  3.42876696993572  2.25525294592593
## ENSMUSG00000102331 0.557692810163983 -1.18009452521664   6.4831382441703
## ENSMUSG00000025900  3.83812469441752 -1.58613077979856  2.94700581877308
## ENSMUSG00000025902  52.5235511181304 0.375067416581556 0.824377287953244
## ENSMUSG00000098104 0.686876046943845 -2.55234830195711  4.51410048646478
## ENSMUSG00000103922  143.632008465288  3.21835297941951 0.382568802664718
##                                  stat               pvalue
##                             <numeric>            <numeric>
## ENSMUSG00000051951    2.6180796927415    0.270079249503763
## ENSMUSG00000102331 0.0222544111112732    0.988934472814827
## ENSMUSG00000025900   1.20783586975494    0.546665634536119
## ENSMUSG00000025902  0.208223147093236    0.901124750127783
## ENSMUSG00000098104  0.480231180269215    0.786536939901103
## ENSMUSG00000103922   71.2318881630894 3.40558729995637e-16
##                                    padj
##                               <numeric>
## ENSMUSG00000051951    0.397589418207503
## ENSMUSG00000102331                   NA
## ENSMUSG00000025900     0.66992218679061
## ENSMUSG00000025902    0.940275409789841
## ENSMUSG00000098104                   NA
## ENSMUSG00000103922 4.80683955675595e-15

length(which(resIvA$padj<=0.05))

## [1] 9119

RNA-seq analysis in R

Differential Expression of RNA-seq data - Solutions

Stephane Ballereau, Mark Dunning, Oscar Rueda, Ashley Sawle

Last modified: 17 Jul 2019

Challenge 1

Challenge 2