26th July 2021
Practicalities of our Experimental Design
Batch effects arise from different 10X runs at different times, OR even from the same sample run twice
They can obscure real biological changes
A few ways our data can be arranged (software dependent too)
single-sample SingleCellExperiment (SCE) objects, each QCed in isolation
one large SCE object containing many samples
multiple large SCE objects, each containing multiple samples
It is important to make sure things match up across objects before merging (e.g. gene sets and metadata columns)
Different Bioconductor versions
Different analysts may have formatted things differently
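The "match up" step can be sketched in a few lines. This is a minimal illustration in plain Python, not the course's R code: the `batches` dictionary, the gene IDs and the column names are all made up for the example, and `shared_in_order` is a hypothetical helper. In R the same idea is usually an `intersect()` over the row names and column-data names of each object.

```python
# Hypothetical per-batch metadata: gene IDs (rows) and per-cell annotation columns.
# All names and values here are invented for illustration.
batches = {
    "run1": {"genes": ["ENSG01", "ENSG02", "ENSG03"],
             "coldata": ["Sample", "Barcode", "sum"]},
    "run2": {"genes": ["ENSG02", "ENSG03", "ENSG04"],
             "coldata": ["Sample", "Barcode", "total"]},
}


def shared_in_order(lists):
    """Return the items present in every list, in the order of the first list."""
    common = set(lists[0]).intersection(*lists[1:])
    return [item for item in lists[0] if item in common]


# Keep only genes and annotation columns that every batch has,
# so the objects can be subset to a common layout before merging.
common_genes = shared_in_order([b["genes"] for b in batches.values()])
common_coldata = shared_in_order([b["coldata"] for b in batches.values()])

print(common_genes)
print(common_coldata)
```

Anything that drops out here (a gene missing from one run, a column named differently by another analyst) is worth checking by hand before it is discarded.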
A useful quick look
Gaussian/linear regression - removeBatchEffect (limma), ComBat (sva), rescaleBatches or regressBatches (batchelor)
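To make the regression idea concrete, here is a minimal sketch in plain Python of the simplest possible case: one gene, batch as the only factor, so the correction reduces to removing each batch's mean and adding back a common centre. It is not how limma, sva or batchelor implement their methods, and `centre_batches` and the toy values are assumptions made for the example.

```python
from statistics import mean


def centre_batches(values, batch):
    """Remove per-batch mean shifts from one gene's log-expression values.

    With batch as the only factor, this is the residual from a linear fit
    on batch, shifted back to a common centre (the mean of the batch means).
    """
    labels = sorted(set(batch))
    batch_means = {
        b: mean([v for v, lab in zip(values, batch) if lab == b]) for b in labels
    }
    target = mean(list(batch_means.values()))
    return [v - batch_means[lab] + target for v, lab in zip(values, batch)]


# Toy data: batch B sits 4 units above batch A for this gene.
logexpr = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
batch = ["A", "A", "A", "B", "B", "B"]

corrected = centre_batches(logexpr, batch)
print(corrected)
```

The sketch also shows the main weakness of this family of methods: it assumes the batches contain the same mix of cell populations, so any difference in batch means is treated as technical and removed.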
Mutual nearest neighbour correction - Haghverdi et al 2018 Nature Biotechnology
mnnCorrect (batchelor)
fastMNN (batchelor)
Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36(5):421-427
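The core of the mutual nearest neighbour idea, finding pairs of cells that are in each other's nearest-neighbour sets across batches, can be sketched as follows. This is a toy illustration in plain Python using Euclidean distance on made-up 2D coordinates; it is not batchelor's implementation, and `nearest`, `mnn_pairs` and the data are assumptions for the example.

```python
import math


def nearest(query, reference, k):
    """Indices of the k reference points closest (Euclidean) to the query."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mnn_pairs(batch1, batch2, k=1):
    """Return (i, j) pairs where cell i in batch1 and cell j in batch2
    are each within the other's k nearest neighbours in the other batch."""
    nn_1to2 = [nearest(cell, batch2, k) for cell in batch1]
    nn_2to1 = [nearest(cell, batch1, k) for cell in batch2]
    return [(i, j)
            for i in range(len(batch1))
            for j in nn_1to2[i]
            if i in nn_2to1[j]]


# Toy data: batch2 is batch1 shifted by (1, 1), and the third
# population in batch1 has no counterpart in batch2.
batch1 = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
batch2 = [(1.0, 1.0), (6.0, 6.0)]

pairs = sorted(mnn_pairs(batch1, batch2, k=1))
print(pairs)
```

The third cell in `batch1` does have a nearest neighbour in `batch2`, but that neighbour does not pick it back, so no pair is formed. This mutual requirement is what lets the approach tolerate populations that are present in only one batch; the differences between paired cells are then used to estimate the batch effect.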
If you use fastMNN when there is no batch effect, it may not work correctly.
It may remove genuine biological heterogeneity.
fastMNN can be instructed to skip the batch correction at a merge step if the batch effect is below a threshold (the min.batch.skip argument). You can use the effect sizes it calculates to choose this threshold.
In reality, the absence of any batch effect would warrant further investigation.
The value of batch correction is that it lets you see population heterogeneity within clusters/cell types across batches.
However, the corrected values should not be used for gene-based analyses, e.g. DE or marker detection.