Challenge 1

  1. Use pathview to export a figure for “mmu04060”, but this time only use genes that are statistically significant at FDR < 0.01
logFC <- shrinkLvV %>% 
  drop_na(FDR, Entrez) %>% 
  filter(FDR < 0.01) %>% 
  select(Entrez, logFC) %>% 
  deframe()

pathview(gene.data = logFC, 
         pathway.id = "mmu04060", 
         species = "mmu", 
         limit = list(gene=5, cpd=1))
## Info: Downloading xml files for mmu04060, 1/1 pathways..
## Info: Downloading png files for mmu04060, 1/1 pathways..
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory /home/cri.camres.org/sawle01/Documents/training/RNAseq_November_2020_remote/Course_Materials/solutions
## Info: Writing image file mmu04060.pathview.png

mmu04060.pathview.png:

mmu04060 -Cytokine-cytokine receptor interaction

Challenge 2 - GO term enrichment analysis

clusterProfiler can also perform over-representation analysis on GO terms. using the commmand enrichGO. Look at the help page for the command enrichGO (?enrichGO) and have a look at the instructions in the clusterProfiler book.

  1. Run the over-representation analysis for GO terms
    • Use genes that have an adjusted p-value (FDR) of less than 0.01 and an absolute fold change greater than 2.
    • For this analysis you can use Ensembl IDs rather then Entrez
    • You’ll need to provide the background (universe) genes, this should be all the genes in our analysis.
    • The mouse database package is called org.Mm.eg.db. You’ll need to load it using library before running the analysis.
    • As we are using Ensembl IDs, you’ll need to set the keyType parameter in the enrichGO command to indicate this.
    • Only test terms in the “Molecular Function” ontology
  2. Use the dotplot function to visualise the results.
library(org.Mm.eg.db)
sigGenes <- shrinkLvV %>% 
    drop_na(FDR) %>% 
    filter(FDR < 0.01 & abs(logFC) > 1) %>% 
    pull(GeneID)

universe <- shrinkLvV$GeneID

ego <- enrichGO(gene          = sigGenes, 
                universe      = universe,
                OrgDb         = org.Mm.eg.db,
                keyType       = "ENSEMBL",
                ont           = "MF",
                pvalueCutoff  = 0.01,
                readable      = TRUE)
dotplot(ego)

Challenge 3

Another common way to rank the genes is to order by pvalue, but also, sorting so that upregulated genes are at start and downregulated at the other - you can do this combining the sign of the fold change and the pvalue.

  1. Rank the genes by statisical significance - you will need to create a new ranking value using -log10({p value}) * sign({Fold Change})
  2. Load the “C2” pathways from the the Robjects/mouse_c2_v5.RData file
  3. Run fgsea using the new ranked genes and the C2 pathways
ranks <- -log10(gseaDat$pvalue) * sign(gseaDat$logFC) # or gseaDat$stat
names(ranks) <- gseaDat$Entrez  

load("Robjects/mouse_c2_v5.RData")

fgseaResC2 <- fgsea(Mm.c2, ranks, minSize=15, maxSize = 500)
head(fgseaResC2[order(padj, -abs(NES)), ], n=10)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.08% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## Warning in fgseaMultilevel(...): For some pathways, in reality P-values are less
## than 1e-10. You can set the `eps` argument to zero for better estimation.
## # A tibble: 10 x 8
##    pathway                   pval    padj log2err     ES   NES  size leadingEdge
##    <chr>                    <dbl>   <dbl>   <dbl>  <dbl> <dbl> <int> <list>     
##  1 REACTOME_INFLUENZA_…  1.00e-10 2.15e-8      NA -0.859 -2.70   128 <chr [80]> 
##  2 REACTOME_NONSENSE_M…  1.00e-10 2.15e-8      NA -0.887 -2.70   100 <chr [74]> 
##  3 REACTOME_3_UTR_MEDI…  1.00e-10 2.15e-8      NA -0.886 -2.70   100 <chr [68]> 
##  4 REACTOME_PEPTIDE_CH…  1.00e-10 2.15e-8      NA -0.908 -2.70    79 <chr [68]> 
##  5 REACTOME_INFLUENZA_…  1.00e-10 2.15e-8      NA -0.884 -2.69    95 <chr [69]> 
##  6 KEGG_RIBOSOME         1.00e-10 2.15e-8      NA -0.897 -2.66    81 <chr [66]> 
##  7 REACTOME_METABOLISM…  1.00e-10 2.15e-8      NA -0.774 -2.57   202 <chr [87]> 
##  8 REACTOME_TRANSLATION  1.00e-10 2.15e-8      NA -0.800 -2.56   141 <chr [71]> 
##  9 REACTOME_SRP_DEPEND…  1.00e-10 2.15e-8      NA -0.843 -2.55   103 <chr [67]> 
## 10 REACTOME_METABOLISM…  1.00e-10 2.15e-8      NA -0.755 -2.54   244 <chr [97]>
  1. Run fgsea using the new ranked genes and the H pathways. How do these results differ from the ones we got when ranking by the fold change alone?
fgseaResH <- fgsea(Mm.H, ranks, minSize=15, maxSize = 500)
head(fgseaResH[order(padj, -abs(NES)), ], n=20)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.08% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## Warning in fgseaMultilevel(...): For some pathways, in reality P-values are less
## than 1e-10. You can set the `eps` argument to zero for better estimation.
## # A tibble: 20 x 8
##    pathway                  pval     padj log2err     ES   NES  size leadingEdge
##    <chr>                   <dbl>    <dbl>   <dbl>  <dbl> <dbl> <int> <list>     
##  1 HALLMARK_MYC_TARG…   1.00e-10  5.00e-9  NA     -0.683 -2.24   201 <chr [56]> 
##  2 HALLMARK_OXIDATIV…   5.12e- 9  1.28e-7   0.761  0.648  2.07   196 <chr [128]>
##  3 HALLMARK_E2F_TARG…   8.51e- 7  1.42e-5   0.659 -0.562 -1.85   201 <chr [49]> 
##  4 HALLMARK_TGF_BETA…   2.25e- 6  2.82e-5   0.627 -0.742 -2.05    54 <chr [15]> 
##  5 HALLMARK_ADIPOGEN…   5.70e- 6  5.70e-5   0.611  0.573  1.83   199 <chr [79]> 
##  6 HALLMARK_MYC_TARG…   1.05e- 5  8.76e-5   0.593 -0.701 -1.96    58 <chr [25]> 
##  7 HALLMARK_CHOLESTE…   3.10e- 5  2.21e-4   0.557  0.701  1.94    73 <chr [29]> 
##  8 HALLMARK_FATTY_AC…   4.63e- 5  2.57e-4   0.557  0.576  1.78   155 <chr [54]> 
##  9 HALLMARK_MTORC1_S…   4.52e- 5  2.57e-4   0.557  0.550  1.76   203 <chr [47]> 
## 10 HALLMARK_G2M_CHEC…   7.54e- 5  3.77e-4   0.538 -0.506 -1.66   201 <chr [29]> 
## 11 HALLMARK_ESTROGEN…   4.44e- 4  2.02e-3   0.498 -0.482 -1.57   193 <chr [54]> 
## 12 HALLMARK_PEROXISO…   9.93e- 4  4.14e-3   0.455  0.609  1.75    95 <chr [39]> 
## 13 HALLMARK_ALLOGRAF…   1.08e- 3  4.14e-3   0.455 -0.460 -1.52   202 <chr [41]> 
## 14 HALLMARK_WNT_BETA…   2.16e- 3  7.70e-3   0.432 -0.658 -1.73    40 <chr [18]> 
## 15 HALLMARK_NOTCH_SI…   2.48e- 3  7.76e-3   0.432 -0.696 -1.73    31 <chr [15]> 
## 16 HALLMARK_TNFA_SIG…   2.44e- 3  7.76e-3   0.432 -0.449 -1.47   199 <chr [60]> 
## 17 HALLMARK_DNA_REPA…   2.99e- 3  8.78e-3   0.432 -0.459 -1.46   150 <chr [27]> 
## 18 HALLMARK_BILE_ACI…   5.90e- 3  1.64e-2   0.407  0.548  1.57    98 <chr [37]> 
## 19 HALLMARK_ESTROGEN…   1.09e- 2  2.86e-2   0.381 -0.416 -1.36   197 <chr [39]> 
## 20 HALLMARK_PROTEIN_…   1.71e- 2  4.27e-2   0.352  0.499  1.44    96 <chr [56]>