Differential expression and abundance between conditions

Sept 2022

Single Cell RNAseq Analysis Workflow

Once clusters and/or cell types have been identified, we want to compare sample groups:

Differential expression - Differences in expression between sample groups within a biological state.
Differential abundance - Differences in cell numbers between sample groups within a biological state.

Replicates are samples, not cells:

Pseudo-bulk:
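
A minimal sketch of pseudo-bulk aggregation, assuming that counts are summed over all cells sharing a sample and a cluster label, so that each sample contributes one profile per cluster. The function name `pseudobulk` and the toy data are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def pseudobulk(counts, samples, clusters):
    """Sum per-cell gene counts into one profile per (cluster, sample) pair.

    counts   -- list of per-cell count vectors (one list of ints per cell)
    samples  -- sample label of each cell
    clusters -- cluster label of each cell
    Returns (profiles, n_cells), both keyed by (cluster, sample).
    """
    n_genes = len(counts[0])
    profiles = defaultdict(lambda: [0] * n_genes)
    n_cells = defaultdict(int)
    for cell, sample, cluster in zip(counts, samples, clusters):
        key = (cluster, sample)
        n_cells[key] += 1
        for g, value in enumerate(cell):
            profiles[key][g] += value
    return dict(profiles), dict(n_cells)


# Toy data: 5 cells, 3 genes (illustrative values only).
counts = [[1, 0, 3], [2, 1, 0], [0, 0, 5], [4, 2, 1], [1, 1, 1]]
samples = ["s1", "s1", "s2", "s2", "s2"]
clusters = ["T", "T", "T", "B", "T"]
profiles, n_cells = pseudobulk(counts, samples, clusters)
```

The per-pair cell counts in `n_cells` are kept because they are what a later "too few cells" filter would be applied to.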

Workflow:

Method:

quasi-likelihood (QL) methods from the edgeR package
negative binomial generalized linear model (NB GLM)
- to handle overdispersed count data
- in experiments with limited replication
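
As a reminder of what "overdispersed" means here, the mean-variance relationship can be written out. The notation is an assumption of this note: $y_{gi}$ is the count for gene $g$ in sample $i$, $\mu_{gi}$ its mean, $\phi$ the NB dispersion and $\sigma_g^2$ the gene-specific quasi-likelihood dispersion.

```latex
% Negative binomial: variance grows quadratically with the mean
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi \, \mu_{gi}^{2}

% Quasi-likelihood extension: a gene-specific factor scales the NB variance
\operatorname{Var}(y_{gi}) = \sigma_g^{2} \left( \mu_{gi} + \phi \, \mu_{gi}^{2} \right)
```

With $\phi = 0$ and $\sigma_g^2 = 1$ this reduces to the Poisson case, where the variance equals the mean.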

Steps:

Remove samples with very low library sizes, e.g. < 20 cells
- better normalisation
Remove genes that are lowly expressed
- reduces computational work
- improves the accuracy of mean-variance trend modelling
- decreases the severity of the multiple testing correction
- filter: keep genes above a log-CPM threshold in a minimum number of samples, set by the size of the smallest sample group
Correct for composition biases
- by computing normalization factors with the trimmed mean of M-values (TMM) method
Test whether the log-fold change between sample groups is significantly different from zero
- estimate the negative binomial (NB) dispersions
- estimate the quasi-likelihood dispersions, which model the uncertainty and variability of the per-gene variance
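
The two filtering steps above can be sketched as follows. This is a simplified illustration, not edgeR's own logic: in edgeR the gene filter is done by `filterByExpr`, which is more nuanced, and the remaining steps (`calcNormFactors`, `estimateDisp`, `glmQLFit`, `glmQLFTest`) are not reproduced here. The function names, the prior count of 0.5 and the log-CPM threshold of 1 are assumptions chosen for the example.

```python
import math
from collections import Counter


def filter_samples(n_cells, min_cells=20):
    """Keep pseudo-bulk samples that were built from at least `min_cells` cells."""
    return [s for s, n in n_cells.items() if n >= min_cells]


def keep_genes(counts, groups, min_log_cpm=1.0, prior=0.5):
    """Flag genes whose log2-CPM reaches `min_log_cpm` in enough samples.

    counts -- dict: sample -> list of gene counts
    groups -- dict: sample -> sample-group label
    "Enough" is the size of the smallest sample group.
    """
    lib_sizes = {s: sum(v) for s, v in counts.items()}
    smallest = min(Counter(groups[s] for s in counts).values())
    n_genes = len(next(iter(counts.values())))
    keep = []
    for g in range(n_genes):
        n_pass = sum(
            math.log2((counts[s][g] + prior) / lib_sizes[s] * 1e6) >= min_log_cpm
            for s in counts
        )
        keep.append(n_pass >= smallest)
    return keep


# Toy pseudo-bulk data: 5 samples, 3 genes (illustrative values only).
n_cells = {"s1": 150, "s2": 90, "s3": 210, "s4": 75, "s5": 12}
all_counts = {
    "s1": [10, 0, 999990],
    "s2": [8, 1, 999991],
    "s3": [0, 0, 1000000],
    "s4": [12, 5, 999983],
    "s5": [1, 0, 2000],
}
groups = {"s1": "ctrl", "s2": "ctrl", "s3": "treat", "s4": "treat", "s5": "treat"}

kept_samples = filter_samples(n_cells)            # s5 has fewer than 20 cells
counts = {s: all_counts[s] for s in kept_samples}
keep = keep_genes(counts, groups)
```

Samples are filtered first so that the smallest group size, and therefore the gene filter, is computed only from the samples that will actually be tested.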

Aim:

Example:

Most methods require defined clusters as input. Assigning cells to discrete clusters can be problematic in the context of continuous differentiation, developmental or stimulation trajectories.
Methods that don't require clusters either don't model variability in cell numbers among replicates or can only carry out pairwise comparisons.

Milo

Uses a k-nearest neighbour (KNN) graph to model cellular states as overlapping neighbourhoods. Because neighbourhoods overlap, the tests are not independent; this is accounted for with a weighted version of the Benjamini–Hochberg method.
Determines neighbourhoods and groupings independently of our defined clusters
Can be used for complex models
Faster and scalable
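
A minimal sketch of a weighted Benjamini–Hochberg adjustment, to show how per-test weights enter the correction. How Milo derives the weights for each neighbourhood is not covered here; they are simply taken as input, and the function name `weighted_bh` is hypothetical. With equal weights the result reduces to the standard BH adjustment.

```python
def weighted_bh(pvalues, weights):
    """Weighted Benjamini-Hochberg adjusted p-values.

    Each sorted p-value is scaled by (total weight / cumulative weight up to
    its rank), then monotonicity is enforced from the largest p-value down.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    total = sum(weights)

    # Scaled p-values in increasing order of the raw p-value.
    scaled = []
    cum_w = 0.0
    for i in order:
        cum_w += weights[i]
        scaled.append(pvalues[i] * total / cum_w)

    # Running minimum from the end, capped at 1, mapped back to input order.
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, scaled[rank])
        adjusted[order[rank]] = running_min
    return adjusted


# Illustrative p-values; equal weights give the ordinary BH result.
pvals = [0.01, 0.04, 0.03, 0.5]
adj_equal = weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0])
adj_weighted = weighted_bh(pvals, [2.0, 0.5, 1.0, 0.5])
```

A test with a large weight advances the cumulative weight more, so it counts for more when the p-values ranked after it are scaled.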

Steps:

Construct KNN graph
- rescales UMI counts by per-cell sequencing depth
- log transforms
- uses PCA
- calculates the Euclidean distance between each cell and its k nearest neighbours in PC space
Defines Cell Neighbourhoods
Counts cells in Neighbourhoods
Tests for DA in Neighbourhoods
Does a multiple testing correction (Spatial FDR)
Visualisations
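
The graph, neighbourhood and counting steps above can be sketched as follows. This is a brute-force illustration under stated assumptions, not Milo's implementation: the cells are already given as coordinates in PC space, the index cells are supplied by hand rather than sampled, and the per-neighbourhood DA test is not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def knn(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def count_neighbourhoods(points, samples, index_cells, k):
    """Count cells per sample in the neighbourhood of each index cell.

    A neighbourhood is the index cell plus its k nearest neighbours, so
    neighbourhoods of nearby index cells can overlap.
    """
    nn = knn(points, k)
    counts = {}
    for idx in index_cells:
        members = [idx] + nn[idx]
        counts[idx] = Counter(samples[m] for m in members)
    return counts


# Toy data: 6 cells in a 2-D "PC space", from two samples (illustrative only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
samples = ["ctrl1", "ctrl1", "treat1", "treat1", "treat1", "ctrl1"]
nhood_counts = count_neighbourhoods(points, samples, index_cells=[0, 3], k=2)
```

The resulting neighbourhood-by-sample count table is the input to the DA test, in the same way that the gene-by-sample pseudo-bulk table is the input to the DE test.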