Sept 2022
Once clusters and/or cell types have been identified, we want to compare sample groups:
Differential expression - Differences in expression between sample groups within a biological state.
Differential abundance - Differences in cell numbers between sample groups within a biological state.
Replicates are samples, not cells:
Pseudo-bulk:
Workflow:
compute pseudo-bulk counts by summing across cells,
perform bulk analysis on the summed counts, where the replicates are the (far fewer) samples rather than cells
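The summing step can be sketched in Python as follows. This is a minimal illustration only: the function name `pseudo_bulk`, the toy counts and the sample labels are made up for the example, and counts are held in plain lists rather than a sparse matrix.

```python
from collections import defaultdict

def pseudo_bulk(counts, cell_samples):
    """Sum per-cell counts into one count vector per sample.

    counts:       list of per-cell count vectors (one entry per gene)
    cell_samples: sample label for each cell, in the same order as counts
    """
    n_genes = len(counts[0])
    summed = defaultdict(lambda: [0] * n_genes)
    n_cells = defaultdict(int)
    for cell, sample in zip(counts, cell_samples):
        for g, value in enumerate(cell):
            summed[sample][g] += value
        n_cells[sample] += 1
    return dict(summed), dict(n_cells)

# Hypothetical toy data: 5 cells x 3 genes from two samples.
counts = [[1, 0, 3], [2, 1, 0], [0, 0, 5], [4, 2, 1], [1, 1, 1]]
cell_samples = ["s1", "s1", "s2", "s2", "s2"]
bulk, n_cells = pseudo_bulk(counts, cell_samples)
```

In practice the sum is taken separately for each cluster (or cell type) and sample combination, and the number of cells behind each profile is kept so that poorly supported profiles can be dropped later.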
Method:
quasi-likelihood (QL) methods from the edgeR package
negative binomial generalized linear model (NB GLM)
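For reference, the NB GLM with a quasi-likelihood dispersion is commonly written as below. The notation is an assumption introduced here, not taken from these notes: y_gi is the pseudo-bulk count of gene g in sample i, N_i the effective library size, x_i the design-matrix row for sample i, beta_g the coefficients (log-fold changes), phi the NB dispersion and sigma_g^2 the gene-specific QL dispersion.

```latex
\begin{aligned}
\mathbb{E}[y_{gi}] &= \mu_{gi}, \qquad \log \mu_{gi} = x_i^{\top}\beta_g + \log N_i, \\
\operatorname{Var}(y_{gi}) &= \sigma_g^{2}\left(\mu_{gi} + \phi\,\mu_{gi}^{2}\right).
\end{aligned}
```

Testing a sample-group effect then amounts to testing whether the corresponding coefficient (or contrast of coefficients) in beta_g is zero.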
Steps:
Remove samples with very low library sizes, e.g. those built from fewer than 20 cells
Remove lowly expressed genes
Correct for composition biases
Test whether the log-fold change between sample groups is significantly different from zero
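The filtering steps and a naive log-fold change can be sketched as below. This is a simplified stand-in, not the edgeR procedure: it uses plain library-size scaling, so it does not correct for composition bias and does not perform the QL test (in edgeR those roles are played by `filterByExpr`, `calcNormFactors`, `glmQLFit` and `glmQLFTest`). All function names, thresholds other than the 20-cell cut-off, and the toy data are hypothetical.

```python
import math

def filter_samples(bulk, n_cells, min_cells=20):
    """Keep pseudo-bulk samples built from at least min_cells cells."""
    return {s: v for s, v in bulk.items() if n_cells[s] >= min_cells}

def filter_genes(bulk, min_total=10):
    """Indices of genes whose count summed over samples is >= min_total."""
    n_genes = len(next(iter(bulk.values())))
    return [g for g in range(n_genes)
            if sum(v[g] for v in bulk.values()) >= min_total]

def log2_cpm(bulk, genes, prior=1.0):
    """Library-size-normalised log2 counts per million (with a prior count)."""
    return {s: [math.log2((v[g] + prior) / sum(v) * 1e6) for g in genes]
            for s, v in bulk.items()}

def log_fold_change(logcpm, groups, a, b):
    """Mean log2-CPM in group b minus mean log2-CPM in group a, per gene."""
    def mean_of(group):
        rows = [logcpm[s] for s in logcpm if groups[s] == group]
        return [sum(col) / len(col) for col in zip(*rows)]
    return [mb - ma for ma, mb in zip(mean_of(a), mean_of(b))]

# Hypothetical pseudo-bulk counts (3 genes) and cell numbers per sample.
bulk = {"ctrl1": [100, 0, 50], "ctrl2": [120, 1, 60],
        "trt1": [90, 0, 200], "trt2": [110, 2, 220], "tiny": [3, 0, 1]}
n_cells = {"ctrl1": 150, "ctrl2": 200, "trt1": 180, "trt2": 160, "tiny": 5}
groups = {"ctrl1": "ctrl", "ctrl2": "ctrl", "trt1": "trt", "trt2": "trt"}

kept = filter_samples(bulk, n_cells)          # drops "tiny"
genes = filter_genes(kept)                    # drops the near-zero gene
lfc = log_fold_change(log2_cpm(kept, genes), groups, "ctrl", "trt")
```

In this toy data the first gene has similar raw counts in both groups yet gets a negative log-fold change, purely because the last gene takes up a larger share of the library in the treated samples. That is the composition bias the correction step is meant to remove.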
Aim:
Example:
Most methods require defined clusters as input, but assigning cells to discrete clusters can be problematic in the context of continuous differentiation, developmental or stimulation trajectories.
Methods that do not require clusters either do not model variability in cell numbers among replicates or can only carry out pairwise comparisons.
Milo
Uses a k-nearest neighbour (KNN) graph to model cellular states as overlapping neighbourhoods. Non-independence between overlapping neighbourhoods is accounted for with a weighted version of the Benjamini–Hochberg method.
Determines neighbourhoods and groupings independently of our defined clusters
Can be used for complex models
Fast and scalable
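A weighted Benjamini–Hochberg adjustment can be sketched as follows. This is a generic version under stated assumptions, not Milo's own code: the function name `weighted_bh` is made up, and the weights are simply passed in (in a neighbourhood setting they would reflect how densely neighbourhoods overlap).

```python
def weighted_bh(pvalues, weights):
    """Weighted Benjamini-Hochberg adjusted p-values.

    P-values are sorted in ascending order and each is scaled by
    sum(weights) / cumulative weight up to its rank. Monotonicity is then
    enforced from the largest p-value downwards and values are capped at 1.
    With equal weights this reduces to the standard BH adjustment.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    total = sum(weights)
    cumulative = 0.0
    scaled = []
    for i in order:
        cumulative += weights[i]
        scaled.append(total * pvalues[i] / cumulative)
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, scaled[rank])
        adjusted[order[rank]] = running_min
    return adjusted

# Hypothetical p-values; equal weights give ordinary BH.
pvals = [0.01, 0.04, 0.03, 0.5]
adj_equal = weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0])
# Unequal (made-up) weights change how strongly each test is penalised.
adj_weighted = weighted_bh(pvals, [2.0, 0.5, 0.5, 1.0])
```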
Steps:
Construct KNN graph
Define cell neighbourhoods
Count cells in neighbourhoods
Test for DA in neighbourhoods
Correct for multiple testing (Spatial FDR)
Visualise the results
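The first three steps (graph, neighbourhoods, counts) can be sketched in Python as below. This is a toy simplification rather than Milo itself: neighbours are found by brute force, a neighbourhood is taken to be an index cell plus its k nearest neighbours, and the index cells are chosen by hand, whereas Milo samples and refines them. All names and data are hypothetical.

```python
import math

def knn(points, k):
    """Brute-force k nearest neighbours (Euclidean distance) for each point."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def neighbourhood_counts(neighbours, index_cells, cell_samples):
    """Count cells per sample in the neighbourhood of each index cell.

    Neighbourhoods of nearby index cells may share cells, which is why the
    later multiple testing correction has to handle non-independence.
    """
    samples = sorted(set(cell_samples))
    table = []
    for idx in index_cells:
        members = [idx] + neighbours[idx]
        table.append({s: sum(cell_samples[m] == s for m in members)
                      for s in samples})
    return table

# Hypothetical cells in a 2-D reduced space, each from sample "A" or "B".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cell_samples = ["A", "A", "B", "B", "B", "A"]

neighbours = knn(points, k=2)
table = neighbourhood_counts(neighbours, index_cells=[0, 3],
                             cell_samples=cell_samples)
```

The resulting neighbourhood-by-sample count table plays the same role as a gene-by-sample count matrix in the pseudo-bulk analysis: each row is tested for a difference between sample groups, and the p-values are then corrected across neighbourhoods.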