July 2020

Differential Gene Expression Analysis Workflow


Gene Set testing

There are many approaches to searching for biological meaning in the results of differential expression analysis.

Commonly we look to see if the differentially expressed genes tend to relate to specific pathways or ontological groups of genes.

We will look at two methods of doing this:

  • Over Representation Analysis (ORA)

  • Gene Set Enrichment Analysis (GSEA)

Gene Set testing

Common sources of Gene Sets:

  • KEGG pathways

  • Gene Ontologies

  • Reactome

  • MSigDB (GSEA)

  • Manually curated gene lists

Over Representation Analysis - Method

  • This method tests whether genes in a specific pathway appear in a subset of genes of interest from our data more often than expected by chance.

  • The genes of interest could be, for example, the statistically significant genes or a cluster of genes from hierarchical or k-means clustering.

  • Given the ratio of genes in the pathway to genes not in the pathway, is the number of pathway genes in our subset higher than we would expect by chance?

Over Representation Analysis - Method

Genes in the experiment are split in two ways:

  • annotated to the pathway or not
  • differentially expressed or not

Contingency table:

  • Analysed with the hypergeometric test or Fisher's exact test
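
As a rough illustration of the test named above, the sketch below computes the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the 2x2 contingency table. The function name `ora_p_value`, its parameter names and the example counts are all hypothetical and chosen only for this illustration.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_interest, n_overlap):
    """Probability of seeing at least n_overlap pathway genes in the subset.

    n_universe: all genes tested in the experiment
    n_pathway:  genes in the universe annotated to the pathway
    n_interest: genes of interest (e.g. differentially expressed)
    n_overlap:  genes that are both in the pathway and of interest
    """
    total = comb(n_universe, n_interest)
    max_overlap = min(n_pathway, n_interest)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_interest - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from real data).
p = ora_p_value(n_universe=20000, n_pathway=100, n_interest=500, n_overlap=10)
print(f"ORA p-value: {p:.3g}")
```

In practice this would be repeated for every gene set, followed by a multiple testing correction.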

Gene Set Enrichment Analysis (GSEA)

  • This method is based on a ranking of all the genes in our dataset.

  • If the gene set is significantly affected in our experiment, then the genes in the set should tend to be at one end or the other of our ranking.

  • The ranking method is arbitrary, but p-value and fold change are common choices.

  • GSEA calculates an enrichment score based on the ranking, and then uses permutation to calculate a p-value for how significant the enrichment score is.
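
The ranking step described above can be sketched as follows. This is a minimal example under stated assumptions: the `results` dictionary holds made-up values, and the signed `-log10(p)` score is one common way of combining the p-value with the direction of change, not the only option.

```python
import math

# Hypothetical differential expression results:
# gene -> (log2 fold change, p-value). Values are invented for illustration.
results = {
    "GeneA": (2.1, 0.0001),
    "GeneB": (-1.5, 0.002),
    "GeneC": (0.3, 0.40),
    "GeneD": (-0.1, 0.90),
}


def rank_by_fold_change(res):
    """Order genes from most up-regulated to most down-regulated."""
    return sorted(res, key=lambda gene: res[gene][0], reverse=True)


def rank_by_signed_p(res):
    """Order genes by -log10(p-value), signed by the direction of change.

    Assumes every p-value is greater than zero.
    """
    def score(gene):
        lfc, p_value = res[gene]
        return math.copysign(-math.log10(p_value), lfc)

    return sorted(res, key=score, reverse=True)


print(rank_by_fold_change(results))
print(rank_by_signed_p(results))
```

Either ordering places strongly up-regulated genes at one end of the list and strongly down-regulated genes at the other, which is what the enrichment score relies on.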

GSEA: Calculate the enrichment score

  • Ranking by Fold Change

GSEA: Calculate the enrichment score

  • Identify genes in list

GSEA: Calculate the enrichment score

  • Walk along genes and calculate a cumulative score
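
The walk along the ranked list can be sketched as a running sum. This is a simplified, unweighted version (a Kolmogorov-Smirnov style statistic): every gene in the set raises the score by the same amount and every other gene lowers it. The published GSEA method additionally weights each increase by the gene's ranking metric, which is omitted here. The function name and the example genes are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return (enrichment score, running-sum profile) for one gene set.

    ranked_genes: gene identifiers ordered by the ranking metric
    gene_set:     set of gene identifiers to test
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    step_up = 1.0 / n_hits      # added when the gene is in the set
    step_down = 1.0 / n_miss    # subtracted when it is not
    running = 0.0
    best = 0.0
    profile = []
    for is_hit in hits:
        running += step_up if is_hit else -step_down
        profile.append(running)
        # The enrichment score is the largest deviation from zero.
        if abs(running) > abs(best):
            best = running
    return best, profile


# Hypothetical ranked list g1..g10 and a gene set concentrated at the top.
ranked = [f"g{i}" for i in range(1, 11)]
es, profile = enrichment_score(ranked, {"g1", "g2", "g4"})
print(round(es, 3))
```

A gene set concentrated at the top of the ranking gives a positive score, one concentrated at the bottom gives a negative score, and the running sum always returns to zero at the end of the list.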

GSEA: Calculate the enrichment score

  • A different gene set

GSEA - estimate a p-value

  • Randomly permute the ranking and recalculate the Enrichment Score.

  • From the distribution of permuted Enrichment Scores, determine how likely our observed ES is to arise by chance.
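
The permutation step can be sketched as below, assuming the simplified unweighted enrichment score rather than the weighted score used by the published GSEA method. The function names, the number of permutations, the fixed seed and the example genes are all assumptions made for this illustration. The p-value is the proportion of permuted scores at least as extreme as the observed one, with one added to numerator and denominator so that it can never be exactly zero.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified sketch)."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Estimate a two-sided p-value by shuffling the gene ranking."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical example: 50 ranked genes, with the set made of the top five.
ranked = [f"g{i}" for i in range(1, 51)]
top_set = set(ranked[:5])
observed_es, p_value = permutation_p_value(ranked, top_set)
print(observed_es, p_value)
```

With more permutations the smallest attainable p-value decreases, at the cost of longer run time.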