Challenge 1
Obtain results for luminal vs basal and find the top 200 genes (call the new results object
resLvB
.
resLvB <- results(ddsObj, name="CellType_luminal_vs_basal", alpha=0.05)
topGenesLvB <- as.data.frame(resLvB) %>%
rownames_to_column("GeneID") %>%
arrange(padj) %>%
head(200)
head(topGenesLvB, 10)
## GeneID baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000039323 9060.194 -8.760171 0.22493130 -38.94598 0
## 2 ENSMUSG00000041559 8051.736 -8.891474 0.23194448 -38.33449 0
## 3 ENSMUSG00000026483 11515.064 -6.902183 0.14290812 -48.29804 0
## 4 ENSMUSG00000027750 40543.099 -7.704290 0.11933515 -64.56010 0
## 5 ENSMUSG00000028464 34755.226 -8.638337 0.21247668 -40.65546 0
## 6 ENSMUSG00000040118 2670.115 -8.383387 0.19064166 -43.97458 0
## 7 ENSMUSG00000059430 5343.648 -8.959208 0.23078343 -38.82085 0
## 8 ENSMUSG00000001435 4992.645 -6.392363 0.15807506 -40.43878 0
## 9 ENSMUSG00000031488 1843.299 3.182767 0.07926289 40.15457 0
## 10 ENSMUSG00000061048 4081.866 -7.130746 0.13296793 -53.62756 0
## padj
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
Challenge 2
When we looked at the PCA it did seem that an interaction model might be warranted. Let’s test that. 1.Fit a model with interaction: What is the rationale to include the interaction?
2. Use the LRT to compare the two models. 3. Is the number of replicates good enough to include the interaction?
4. Is the interaction needed in the model?
designI <- as.formula(~ CellType * Status)
# Build model
ddsObjI <- DESeqDataSetFromMatrix(countData = countdata,
colData = sampleinfo,
design = designI)
## converting counts to integer mode
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# Run DESeq
ddsObjI <- DESeq(ddsObjI)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
# Compare the models
ddsObjI <- DESeq(ddsObjI, test="LRT", reduced=design)
## using pre-existing size factors
## estimating dispersions
## found already estimated dispersions, replacing these
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
resIvA <- results(ddsObjI)
head(resIvA)
## log2 fold change (MLE): CellTypeluminal.Statuslactate
## LRT p-value: '~ CellType * Status' vs '~ CellType + Status'
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## ENSMUSG00000051951 194.697166753892 3.42876696993572 2.25525294592593
## ENSMUSG00000102331 0.557692810163983 -1.18009452521664 6.4831382441703
## ENSMUSG00000025900 3.83812469441752 -1.58613077979856 2.94700581877308
## ENSMUSG00000025902 52.5235511181304 0.375067416581556 0.824377287953244
## ENSMUSG00000098104 0.686876046943845 -2.55234830195711 4.51410048646478
## ENSMUSG00000103922 143.632008465288 3.21835297941951 0.382568802664718
## stat pvalue
## <numeric> <numeric>
## ENSMUSG00000051951 2.6180796927415 0.270079249503763
## ENSMUSG00000102331 0.0222544111112732 0.988934472814827
## ENSMUSG00000025900 1.20783586975494 0.546665634536119
## ENSMUSG00000025902 0.208223147093236 0.901124750127783
## ENSMUSG00000098104 0.480231180269215 0.786536939901103
## ENSMUSG00000103922 71.2318881630894 3.40558729995637e-16
## padj
## <numeric>
## ENSMUSG00000051951 0.397589418207503
## ENSMUSG00000102331 NA
## ENSMUSG00000025900 0.66992218679061
## ENSMUSG00000025902 0.940275409789841
## ENSMUSG00000098104 NA
## ENSMUSG00000103922 4.80683955675595e-15
length(which(resIvA$padj<=0.05))
## [1] 9119