March 2023
Counting estimates the relative counts for each gene
Does this accurately represent the original population of RNAs?
The relationship between counts and RNA expression is not the same for all genes across all samples
Library Size
Differing sequencing depth
Gene properties
Length, GC content, sequence
Library composition
Quantification is relative - changes in relative abundance for one gene will affect the relative abundances of other genes
“Composition Bias”
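A small sketch of composition bias, using made-up molecule numbers (the gene names and values are hypothetical, not taken from any dataset). Only one gene truly changes, yet the relative abundance of every other gene shifts as well:

```python
def relative_abundance(counts):
    """Convert per-gene counts to fractions of the library total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical RNA populations: only geneD differs between conditions.
control = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100}
treated = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 700}

rel_control = relative_abundance(control)
rel_treated = relative_abundance(treated)

# geneA to geneC are unchanged in absolute terms, but their share of the
# library drops from 0.25 to 0.10 because geneD takes up more of the total.
for gene in control:
    print(gene, round(rel_control[gene], 3), round(rel_treated[gene], 3))
```

Scaling by library size alone would make geneA, geneB and geneC look down-regulated here, which is why composition needs to be accounted for separately from sequencing depth.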
Simple difference in means
Replication introduces variation
Normal (Gaussian) Distribution - t-test
Two parameters - \(mean\) and \(sd\) (\(sd^2 = variance\))
Suitable for microarray data but not for RNAseq data
Count data - Poisson distribution
One parameter - \(mean\) \((\lambda)\)
\(variance\) = \(mean\)
Use the Negative Binomial distribution
In the NB distribution the \(mean\) is not equal to the \(variance\)
Two parameters - \(mean\) and \(dispersion\)
\(dispersion\) describes how \(variance\) changes with \(mean\)
Anders, S. & Huber, W. (2010) Genome Biology
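The mean-variance relationships above can be compared side by side. This sketch assumes the parameterisation \(variance = mean + dispersion \times mean^2\), which is the form used by DESeq2. The function names and the dispersion value of 0.1 are illustrative choices only:

```python
def poisson_variance(mean):
    """Poisson: the variance always equals the mean."""
    return mean

def nb_variance(mean, dispersion):
    """Negative Binomial: variance = mean + dispersion * mean^2 (assumed form)."""
    return mean + dispersion * mean ** 2

# With a fixed dispersion, the extra variance grows with the square of the mean.
for mean in (10, 100, 1000):
    print(mean, poisson_variance(mean), nb_variance(mean, dispersion=0.1))
```

With a dispersion of zero the NB variance collapses to the Poisson case. Any positive dispersion gives overdispersion that becomes more pronounced for highly expressed genes.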
Estimating the dispersion parameter can be difficult with a small number of samples
DESeq2 models the variance as the sum of technical and biological variance
Estimate dispersion for each gene
‘Share’ dispersion information between genes to obtain fitted estimate
Shrink gene-wise estimates towards the fitted estimates
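The idea behind the three steps above can be illustrated with a deliberately simplified sketch. It is not DESeq2's actual procedure. The gene-wise estimate uses a method-of-moments formula, the fitted value is a hypothetical constant standing in for the trend fitted across genes, and the shrinkage weight is chosen by hand rather than estimated from the data:

```python
import math
import statistics

def moment_dispersion(counts):
    """Rough gene-wise dispersion from variance = mean + dispersion * mean^2."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mean) / mean ** 2, 1e-8)  # keep the estimate positive

def shrink(gene_wise, fitted, weight):
    """Blend on the log scale: weight 0 keeps gene_wise, weight 1 returns fitted."""
    return math.exp((1 - weight) * math.log(gene_wise) + weight * math.log(fitted))

counts = [8, 12, 10, 30]   # hypothetical counts for one gene in four samples
gene_wise = moment_dispersion(counts)
fitted = 0.1               # hypothetical value from the across-gene trend
shrunk = shrink(gene_wise, fitted, weight=0.5)
print(gene_wise, fitted, shrunk)
```

With few replicates the gene-wise estimate is noisy, so pulling it towards the value expected for genes of similar expression trades a little bias for a large reduction in variance.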
Bad dispersion plots from: https://github.com/hbctraining/DGE_workshop
Calculate coefficients describing change in gene expression
Linear Model \(\rightarrow\) Generalized Linear Model
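For a simple two-group comparison, the coefficients of a log-link count model have a direct interpretation: the intercept is the log2 mean of the reference group and the second coefficient is the log2 fold change. The sketch below assumes equal normalisation factors across samples, so the fitted group means equal the sample means. The function name and the counts are hypothetical:

```python
import math

def two_group_coefficients(control, treated):
    """Return (intercept, log2 fold change) for a two-group log-link model.

    Assumes already-normalised counts with equal scaling factors, so each
    fitted group mean is simply that group's sample mean.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    intercept = math.log2(mean_control)
    log2_fold_change = math.log2(mean_treated / mean_control)
    return intercept, log2_fold_change

# Hypothetical normalised counts for one gene, three replicates per group.
b0, b1 = two_group_coefficients([90, 100, 110], [380, 400, 420])
print(b0, b1)  # b1 is 2.0, i.e. a four-fold increase in the treated group
```

More complex designs (batch effects, interactions) add further coefficients, and in practice these are estimated by iterative fitting rather than from group means.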
Thank you