October 2024

Differential Gene Expression Analysis Workflow


Transformation

Differential expression analysis is run on raw counts, but to visualise and explore the data we use transformed counts. Raw counts are awkward to visualise because:

  • The range of raw counts is very large
  • Variance increases with mean gene expression

  • Transforming the data reduces both effects, which allows us to assess differences between sample groups more clearly
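The points above can be illustrated with a minimal sketch. The gene names and counts below are made up for illustration, and the plain log2(count + 1) used here is only the simplest possible transformation, not what any particular package does internally.

```python
import math
from statistics import mean, variance

# Hypothetical raw counts (made-up numbers): gene -> counts in four samples.
raw_counts = {
    "gene_low": [4, 6, 5, 9],
    "gene_mid": [210, 180, 260, 150],
    "gene_high": [52000, 41000, 60500, 47500],
}

def log2_transform(counts, pseudocount=1):
    """Return log2(count + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]

def summarise(values):
    """Return (mean, sample variance) of a list of numbers."""
    return mean(values), variance(values)

raw_summary = {g: summarise(c) for g, c in raw_counts.items()}
log_summary = {g: summarise(log2_transform(c)) for g, c in raw_counts.items()}

for gene in raw_counts:
    raw_mean, raw_var = raw_summary[gene]
    log_mean, log_var = log_summary[gene]
    print(f"{gene}: raw mean={raw_mean:.1f} var={raw_var:.1f} | "
          f"log2 mean={log_mean:.2f} var={log_var:.3f}")
```

In this toy data the raw variances span several orders of magnitude and grow with the mean, whereas on the log2 scale all three variances are below 1. The low-count gene ends up with the largest log-scale variance, which is the kind of behaviour rlog and VST are intended to temper.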

Types of Transformations

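  • Log2 - Simple log2 transformation of the counts, usually with a pseudocount added so that zero counts can be transformed.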
  • Rlog - Regularized log transformation. Transforms the counts to the log2 scale in a way that minimizes differences between samples for genes with low read counts, and normalizes between samples for library size.
  • VST - Variance stabilizing transformation. Aims to generate a matrix of values whose variance is approximately constant across the range of mean values, especially for low means. It also accounts for library size.

Comparison of rlog and VST: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#count-data-transformations
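To show why accounting for library size matters, here is a rough sketch using made-up counts. It is not the rlog or VST algorithm: scaling to counts per million (CPM) is swapped in as the simplest depth correction, and the sample and gene names are hypothetical.

```python
import math

# Hypothetical counts (made-up numbers). sample_B is sample_A sequenced
# twice as deeply, so every count is doubled with no biological change.
counts = {
    "gene_1": {"sample_A": 10, "sample_B": 20},
    "gene_2": {"sample_A": 500, "sample_B": 1000},
    "gene_3": {"sample_A": 90, "sample_B": 180},
}
samples = ["sample_A", "sample_B"]

# Library size = total number of counts in each sample.
library_size = {s: sum(counts[g][s] for g in counts) for s in samples}

def log2_cpm(count, lib_size, pseudocount=1.0):
    """log2 of counts per million plus a pseudocount (simple depth correction)."""
    return math.log2(count / lib_size * 1e6 + pseudocount)

# Plain log2(count + 1), ignoring sequencing depth.
naive = {g: {s: math.log2(counts[g][s] + 1) for s in samples} for g in counts}
# Depth-corrected log2 values.
scaled = {g: {s: log2_cpm(counts[g][s], library_size[s]) for s in samples}
          for g in counts}

for g in counts:
    naive_diff = naive[g]["sample_B"] - naive[g]["sample_A"]
    scaled_diff = scaled[g]["sample_B"] - scaled[g]["sample_A"]
    print(f"{g}: naive log2 difference={naive_diff:.2f}, "
          f"depth-corrected difference={scaled_diff:.2f}")
```

Without the correction every gene appears to differ between the two samples purely because of sequencing depth; after scaling by library size the apparent difference disappears.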

Principal Component Analysis

  • Unsupervised analysis
  • If the experiment is well controlled and has worked well, replicate samples should cluster closely, whilst the greatest sources of variation in the data should be between treatments/sample groups
  • Useful tool for checking for outliers and batch effects
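The idea can be sketched on a tiny made-up data set. The sample names and values below are hypothetical, and power iteration is used only to keep the example dependency-free; in practice a dedicated PCA routine would be used.

```python
import math

# Hypothetical transformed expression values (made-up numbers):
# rows are samples, columns are genes.
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
expr = [
    [5.0, 8.1, 2.0, 6.0],
    [5.2, 7.9, 2.1, 6.1],
    [9.0, 4.0, 2.2, 6.0],
    [8.8, 4.2, 1.9, 5.9],
]

def centre_columns(matrix):
    """Subtract each gene's mean so PCA describes variation around the average."""
    n = len(matrix)
    col_means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[row[j] - col_means[j] for j in range(len(row))] for row in matrix]

def covariance(matrix):
    """Gene-by-gene sample covariance of a column-centred matrix."""
    n, p = len(matrix), len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    # Start from the first gene's axis; this works here because that gene
    # contributes to the main axis of variation in the toy data.
    vec = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(len(vec)))
               for i in range(len(cov))]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

centred = centre_columns(expr)
pc1 = first_component(covariance(centred))

# Project each sample onto PC1 to get its score.
scores = [sum(row[j] * pc1[j] for j in range(len(pc1))) for row in centred]

for name, score in zip(samples, scores):
    print(f"{name}: PC1 score = {score:.2f}")
```

In this toy example the two control replicates land close together on PC1 and far from the two treated replicates, which is the pattern expected from a well-controlled experiment. A sample sitting far from its own group on such a plot would be a candidate outlier or a sign of a batch effect.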