Challenge 1
A reminder of the code we ran:
# lets set it up
ourCols <- c("SYMBOL", "GENEID", "ENTREZID")
ourKeys <- rownames(resLvV)[1:1000]
# run the query
annot <- AnnotationDbi::select(EnsDb.Mmusculus.v79, 
                keys=ourKeys, 
                columns=ourCols, 
                keytype="GENEID")
That was just 1000 genes. We need annotations for the entire results table.
Run the same query using all of the genes in our results table (
resLvV)Can we also have the biotype of our genes too? Hint: You can find the name of the column for this by running columns(EnsDb.Mmusculus.v79)
How many Ensembl genes have multipe Entrez IDs associated with them?
# lets set it up
ourCols <- c("SYMBOL", "GENEID", "ENTREZID", "GENEBIOTYPE")
ourKeys <- rownames(resLvV)
# run the query
annot <-AnnotationDbi::select(EnsDb.Mmusculus.v79, 
                keys=ourKeys, 
                columns=ourCols, 
                keytype="GENEID")
# multiple EntrezIDs
multiples <- annot %>%
  add_count(GENEID) %>%  
  dplyr::filter(n>1)
length(unique(multiples$SYMBOL))
Challenge 2
If you haven’t already make sure you load in our data and annotation. Then shrink the values. You can copy and paste the code below.
# First load data and annotations
load("../Robjects/DE.RData")
load("../Robjects/Ensembl_annotations.RData")
#Shrink our values
ddsShrink <- lfcShrink(ddsObj, coef="Status_lactate_vs_virgin")
shrinkLvV <- as.data.frame(ddsShrink) %>%
    rownames_to_column("GeneID") %>% 
    left_join(ensemblAnnot, "GeneID") %>% 
    rename(logFC=log2FoldChange, FDR=padj)
Use the log2 fold change (
logFC) on the x-axis, and use-log10(pvalue)on the y-axis. (This-log10transformation is commonly used for p-values as it means that more significant genes have a higher scale)
Create a new column of -log10(pvalue) values in shrinkLvV
Create a plot with points coloured by pvalue < 0.05 similar to how we did in the MA plot
# First load data and annotations
load("../Robjects/DE.RData")
load("../Robjects/Ensembl_annotations.RData")
#Shrink our values
ddsShrink <- lfcShrink(ddsObj, coef="Status_lactate_vs_virgin")
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
##     Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
##     sequence count data: removing the noise and preserving large differences.
##     Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
shrinkLvV <- as.data.frame(ddsShrink) %>%
    rownames_to_column("GeneID") %>% 
    left_join(ensemblAnnot, "GeneID") %>% 
    rename(logFC=log2FoldChange, FDR=padj)
# first remove the filtered genes (FDR=NA) and create a -log10(FDR) column
filtTab <- shrinkLvV %>% 
    filter(!is.na(FDR)) %>% 
    mutate(`-log10(FDR)` = -log10(FDR))
ggplot(filtTab, aes(x = logFC, y=`-log10(FDR)`)) + 
    geom_point(aes(colour=FDR < 0.05), size=1)