Jun 2022
Our goal is to identify genes that are differentially expressed between clusters.
Calculate effect sizes that capture differences in:
These are calculated in pairwise cluster comparisons.
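As a rough illustration of what one pairwise comparison involves, the base R sketch below computes three effect sizes for a single gene between two clusters: a Cohen's d, an AUC and a difference in the proportion of cells where the gene is detected. The data are simulated and the helper names (cohen_d, auc, detected_diff) are hypothetical. The formulas are simplified and may differ in detail from what scran computes (for example, scran reports the detected difference as a log-fold change).

```r
set.seed(1)
# Simulated log-expression values for one gene in two clusters (made-up data)
cluster_a <- log2(rpois(50, lambda = 3) + 1)
cluster_b <- log2(rpois(60, lambda = 1) + 1)

# Cohen's d: difference in means scaled by a pooled standard deviation
# (assumption: pooled SD taken as the square root of the average variance)
cohen_d <- function(x, y) {
  (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
}

# AUC: probability that a random cell from x has higher expression than
# a random cell from y, counting ties as one half
auc <- function(x, y) {
  mean(outer(x, y, ">") + 0.5 * outer(x, y, "=="))
}

# Difference in the proportion of cells with non-zero expression
detected_diff <- function(x, y) {
  mean(x > 0) - mean(y > 0)
}

c(
  cohen    = cohen_d(cluster_a, cluster_b),
  AUC      = auc(cluster_a, cluster_b),
  detected = detected_diff(cluster_a, cluster_b)
)
```

An AUC of 0.5 means the two clusters are indistinguishable for that gene, while values near 1 or 0 indicate consistently higher or lower expression in the first cluster.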
scran::scoreMarkers() function
For each cluster, the function computes the effect size scores between it and every other cluster.
scoreMarkers(
  sce,
  groups = sce$louvain15,   # clusters to compare
  block = sce$SampleGroup   # covariates in statistical model
)
Outputs a list of DataFrame objects with summary statistics for the metrics we just covered (columns named with the suffixes cohen, AUC and detected).
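To see what this output looks like without a real dataset, the sketch below runs scoreMarkers() on simulated counts from scuttle::mockSCE(). The cluster and batch labels are random placeholders invented for the example, so the resulting scores carry no biological meaning. It assumes the scran and scuttle Bioconductor packages are installed.

```r
library(scran)    # scoreMarkers()
library(scuttle)  # mockSCE(), logNormCounts()

set.seed(100)
sce <- mockSCE(ncells = 120, ngenes = 500)  # simulated counts, not real data
sce <- logNormCounts(sce)                   # adds the "logcounts" assay

# Hypothetical labels standing in for real clusters and sample groups
sce$cluster <- sample(c("c1", "c2", "c3"), ncol(sce), replace = TRUE)
sce$batch <- sample(c("s1", "s2"), ncol(sce), replace = TRUE)

markers <- scoreMarkers(sce, groups = sce$cluster, block = sce$batch)

names(markers)          # one DataFrame per cluster
c1 <- markers[["c1"]]
colnames(c1)            # summary columns for each effect size

# Genes for cluster c1 ordered by their mean Cohen's d across comparisons
head(c1[order(c1$mean.logFC.cohen, decreasing = TRUE), ], 5)
```

Ordering by a different column (for example one of the AUC summaries) changes which genes come out on top, which is why the choice of summary statistic matters.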
scran::scoreMarkers(): summary statistics
Understand what we are trying to compare with the different scores:
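Because each cluster is compared with every other cluster, a gene ends up with several pairwise effect sizes that must be condensed into one score. scoreMarkers() reports the mean, median, minimum, maximum and a minimum rank across comparisons. The base R sketch below shows how these summaries behave on a small matrix of made-up Cohen's d values; it illustrates the idea and is not scran's internal code.

```r
# Hypothetical pairwise Cohen's d values for cluster 1 versus clusters 2 to 4
# (rows = genes, columns = the other clusters; all numbers are made up)
d <- matrix(
  c( 2.0,  1.5,  1.8,   # geneA: up against every other cluster
     3.0, -0.2,  0.1,   # geneB: clearly up against cluster 2 only
    -1.0, -0.8, -1.2),  # geneC: down against every other cluster
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("vs2", "vs3", "vs4"))
)

summaries <- data.frame(
  mean   = rowMeans(d),
  median = apply(d, 1, median),
  min    = apply(d, 1, min),   # large only if up against ALL other clusters
  max    = apply(d, 1, max),   # large if up against AT LEAST ONE cluster
  # min-rank: rank genes within each comparison (1 = largest effect),
  # then keep each gene's best rank across the comparisons
  rank   = apply(apply(-d, 2, rank), 1, min)
)
summaries
```

Here geneB has the largest maximum but a negative minimum, so it would look like a strong marker by max and a poor one by min. The minimum is the most stringent summary, the maximum the most permissive, and the mean and median sit in between.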
Strictly speaking, identifying genes differentially expressed between clusters is statistically flawed, since the clusters were themselves defined from the same gene expression data. Validation is crucial as a follow-up to these analyses.
Do not use batch-integrated expression data for calculating marker gene scores. Instead, include batch in the statistical model (the scoreMarkers() function has the block argument to achieve this).
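The toy example below, in base R, illustrates why the batch needs to enter the model. The values are made up: the gene has no difference between clusters A and B within either batch, but batch2 is shifted upwards and contains most of the cluster A cells. Averaging the per-batch differences is a simplification of blocking; scran's exact weighting across blocks is not reproduced here.

```r
# Made-up expression values for one gene; batch2 is shifted by +2
cells <- data.frame(
  expr    = c(1, 3,  1, 2, 3, 1, 2, 3,  3, 4, 5, 3, 4, 5,  3, 5),
  cluster = rep(c("A", "B", "A", "B"), times = c(2, 6, 6, 2)),
  batch   = rep(c("batch1", "batch2"), each = 8)
)

# Naive comparison ignoring batch: confounded by cluster composition
naive_diff <- with(cells, mean(expr[cluster == "A"]) - mean(expr[cluster == "B"]))

# Blocked comparison: compute the difference within each batch, then average
per_batch <- sapply(split(cells, cells$batch), function(df) {
  mean(df$expr[df$cluster == "A"]) - mean(df$expr[df$cluster == "B"])
})
blocked_diff <- mean(per_batch)

naive_diff
per_batch
blocked_diff
```

The naive difference is non-zero purely because of the batch shift, whereas the within-batch differences show that the gene does not separate the two clusters.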
The normalization strategy has a big influence on the observed differences in expression between cells and between clusters.
A lot of what you get might be noise. Take two random sets of cells and run DE, and you will probably have a few significant genes with most of the commonly used tests.
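A quick way to get a feel for this is to simulate it. The base R sketch below generates expression values with no real group structure, splits the cells at random into two groups and runs a Wilcoxon test per gene. The data, sizes and the 0.05 threshold are arbitrary choices for illustration.

```r
set.seed(42)
n_genes <- 1000
n_cells <- 200

# Pure noise: every gene has the same distribution in every cell
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes)

# Random split of the cells into two equally sized groups
group <- sample(rep(c("A", "B"), each = n_cells / 2))

# One Wilcoxon rank-sum test per gene
pvals <- apply(expr, 1, function(g) {
  wilcox.test(g[group == "A"], g[group == "B"])$p.value
})

n_nominal  <- sum(pvals < 0.05)                          # unadjusted
n_adjusted <- sum(p.adjust(pvals, method = "BH") < 0.05) # after BH correction

n_nominal
n_adjusted
```

Even though nothing distinguishes the two groups, some genes pass the unadjusted threshold by chance. Multiple-testing correction removes most of these in this idealized setting, but real data carry additional structure that the simulation does not capture.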
It’s important to assess and validate the results. Think of the results as hypotheses that need independent verification (e.g. microscopy, qPCR).