Challenge 1
A reminder of the code we ran:
# let's set it up
ourCols <- c("SYMBOL", "GENEID", "ENTREZID")
ourKeys <- rownames(resLvV)[1:1000]
# run the query
annot <- AnnotationDbi::select(EnsDb.Mmusculus.v79,
                               keys = ourKeys,
                               columns = ourCols,
                               keytype = "GENEID")
That was just 1000 genes. We need annotations for the entire results table.
1. Run the same query using all of the genes in our results table (resLvV).
2. Can we also have the biotype of our genes? Hint: you can find the name of the column for this by running columns(EnsDb.Mmusculus.v79).
3. How many Ensembl genes have multiple Entrez IDs associated with them?
# let's set it up
ourCols <- c("SYMBOL", "GENEID", "ENTREZID", "GENEBIOTYPE")
ourKeys <- rownames(resLvV)
# run the query
annot <- AnnotationDbi::select(EnsDb.Mmusculus.v79,
                               keys = ourKeys,
                               columns = ourCols,
                               keytype = "GENEID")
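# multiple Entrez IDs: keep Ensembl genes that appear on more than one row
multiples <- annot %>%
    add_count(GENEID) %>%
    dplyr::filter(n > 1)
# count the distinct Ensembl genes (GENEID), not the gene symbols
length(unique(multiples$GENEID))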
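To see how this counting step behaves, here is a minimal sketch on a made-up annotation table. The gene and Entrez IDs are hypothetical and are not taken from EnsDb.Mmusculus.v79. It counts distinct Entrez IDs per Ensembl gene directly with n_distinct(), which is an alternative to add_count() and ignores genes whose only Entrez ID is NA.

```r
library(dplyr)

# Toy annotation table with hypothetical IDs:
# geneB maps to two Entrez IDs, geneC has no Entrez ID at all
toyAnnot <- data.frame(
    GENEID   = c("geneA", "geneB", "geneB", "geneC"),
    ENTREZID = c(101, 201, 202, NA)
)

# Count the distinct, non-missing Entrez IDs for each Ensembl gene
entrezPerGene <- toyAnnot %>%
    group_by(GENEID) %>%
    summarise(nEntrez = n_distinct(ENTREZID, na.rm = TRUE))

# Keep only the genes with more than one Entrez ID
multiGenes <- entrezPerGene %>%
    dplyr::filter(nEntrez > 1)

nrow(multiGenes)
```

In this toy table only geneB is kept, so the count is 1.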
Challenge 2
If you haven’t already, make sure you load in our data and annotation, then shrink the values. You can copy and paste the code below.
# First load data and annotations
load("../Robjects/DE.RData")
load("../Robjects/Ensembl_annotations.RData")
# Shrink our values
ddsShrink <- lfcShrink(ddsObj, coef="Status_lactate_vs_virgin")
shrinkLvV <- as.data.frame(ddsShrink) %>%
    rownames_to_column("GeneID") %>%
    left_join(ensemblAnnot, "GeneID") %>%
    rename(logFC=log2FoldChange, FDR=padj)
Use the log2 fold change (logFC) on the x-axis, and use -log10(FDR) on the y-axis. (This -log10 transformation is commonly used for p-values as it means that more significant genes have a higher value.)
1. Create a new column of -log10(FDR) values in shrinkLvV
2. Create a plot with points coloured by FDR < 0.05, similar to how we did in the MA plot
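As a quick illustration of why the -log10 transformation is used, the sketch below applies it to a few made-up p-values (these numbers are illustrative only and do not come from the results table). Smaller p-values map to larger transformed values, so the most significant genes end up at the top of the plot.

```r
# Hypothetical p-values, ordered from least to most significant
pvals <- c(0.5, 0.05, 0.001, 1e-10)

# Apply the -log10 transformation
negLog <- -log10(pvals)

# Show the original and transformed values side by side
data.frame(pvalue = pvals, negLog10 = negLog)
```

A p-value of 0.001 becomes 3 and a p-value of 1e-10 becomes 10, while the usual 0.05 threshold sits at roughly 1.3.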
# First load data and annotations
load("../Robjects/DE.RData")
load("../Robjects/Ensembl_annotations.RData")
# Shrink our values
ddsShrink <- lfcShrink(ddsObj, coef="Status_lactate_vs_virgin")
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
## Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
## sequence count data: removing the noise and preserving large differences.
## Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
shrinkLvV <- as.data.frame(ddsShrink) %>%
    rownames_to_column("GeneID") %>%
    left_join(ensemblAnnot, "GeneID") %>%
    rename(logFC=log2FoldChange, FDR=padj)
# first remove the filtered genes (FDR=NA) and create a -log10(FDR) column
# (dplyr::filter is called explicitly in case another package masks filter)
filtTab <- shrinkLvV %>%
    dplyr::filter(!is.na(FDR)) %>%
    mutate(`-log10(FDR)` = -log10(FDR))
ggplot(filtTab, aes(x = logFC, y = `-log10(FDR)`)) +
    geom_point(aes(colour = FDR < 0.05), size = 1)
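If the course objects are not available, the same steps can be tried on simulated data. The sketch below is only an illustration: toyTab and its values are randomly generated stand-ins for shrinkLvV, and the dashed line marking FDR = 0.05 is an optional addition that is not part of the solution above.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for shrinkLvV (values are random, not real results);
# the last 10 genes get FDR = NA to mimic independently filtered genes
set.seed(42)
toyTab <- data.frame(
    logFC = rnorm(200, sd = 2),
    FDR   = c(runif(190), rep(NA, 10))
)

# Drop genes without an FDR and add the transformed column
toyFilt <- toyTab %>%
    dplyr::filter(!is.na(FDR)) %>%
    mutate(`-log10(FDR)` = -log10(FDR))

# Volcano plot, with a dashed line at the FDR = 0.05 threshold
p <- ggplot(toyFilt, aes(x = logFC, y = `-log10(FDR)`)) +
    geom_point(aes(colour = FDR < 0.05), size = 1) +
    geom_hline(yintercept = -log10(0.05), linetype = "dashed")
p
```

Points above the dashed line are the ones coloured as FDR < 0.05.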