Sept 2022

Single Cell RNAseq Analysis Workflow

Outline

Once clusters and/or cell types have been identified, we want to compare sample groups:

  • Differential expression - Differences in expression between sample groups within a biological state.

  • Differential abundance - Differences in cell numbers between sample groups within a biological state.

Differential Expression

Replicates are samples, not cells:

  • single cells within a sample are not independent of each other,
  • using cells as replicates amounts to studying variation inside an individual
  • while we want to study variation across a population

Pseudo-bulk:

  • gene expression levels for each cluster in each sample
  • are obtained by summing across cells

Differential expression between conditions

Workflow:

  • compute pseudo-bulk counts by summing across cells,

    • per cluster and per sample
  • perform bulk analysis with fewer replicates,

    • for each cluster separately
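
The aggregation step can be sketched in a few lines of plain Python. This is only an illustration of the idea (summing counts per cluster and per sample), not the implementation used by any particular package; the function name `pseudo_bulk` and the toy values are assumptions made for this example.

```python
from collections import defaultdict

def pseudo_bulk(counts, clusters, samples):
    """Sum per-cell gene counts into one profile per (cluster, sample) pair.

    counts   : one list of gene counts per cell
    clusters : cluster label of each cell
    samples  : sample label of each cell
    Returns the summed profiles and the number of cells behind each profile.
    """
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0] * n_genes)
    n_cells = defaultdict(int)
    for cell, cluster, sample in zip(counts, clusters, samples):
        key = (cluster, sample)
        n_cells[key] += 1
        for g, value in enumerate(cell):
            sums[key][g] += value
    return dict(sums), dict(n_cells)

# Toy data: 5 cells x 3 genes (hypothetical values).
counts = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2], [0, 0, 5]]
clusters = ["T", "T", "B", "T", "B"]
samples = ["s1", "s1", "s1", "s2", "s2"]

profiles, n_cells = pseudo_bulk(counts, clusters, samples)
for key in sorted(profiles):
    print(key, profiles[key], "cells:", n_cells[key])
```

Each (cluster, sample) profile then plays the role of one bulk library, and the cell counts can be used to drop profiles built from too few cells.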

Method:

  • quasi-likelihood (QL) methods from the edgeR package

  • negative binomial generalized linear model (NB GLM)

    • to handle overdispersed count data
    • in experiments with limited replication
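
For reference, the mean-variance relationships behind these two bullets are usually written as below. The notation is chosen here for illustration: y_{gi} is the pseudo-bulk count of gene g in sample i, \mu_{gi} its expected value, N_i the effective library size, x_i the design-matrix row, \phi the NB dispersion and \sigma_g^2 the gene-specific quasi-likelihood dispersion.

```latex
% Log-linear model for the mean, with library size as offset
\log \mu_{gi} = x_i^{\top} \beta_g + \log N_i

% Negative binomial: quadratic mean-variance relationship
\operatorname{var}(y_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2}

% Quasi-likelihood extension: an extra gene-specific factor on the NB variance
\operatorname{var}(y_{gi}) = \sigma_g^{2} \left( \mu_{gi} + \phi \, \mu_{gi}^{2} \right)
```

The term \phi \mu^2 captures overdispersion relative to Poisson counts, while \sigma_g^2 lets the test account for how uncertain each gene's variance estimate is when replicates are few.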

Differential Expression

Steps:

  • Remove pseudo-bulk samples with very low library sizes, e.g. those built from < 20 cells

    • better normalisation
  • Remove genes that are lowly expressed,

    • reduces computational work,
    • improves the accuracy of mean-variance trend modelling
    • decreases the severity of the multiple testing correction
    • filter: keep genes above a log-CPM threshold in a minimum number of samples, set by the size of the smallest sample group
  • Correct for composition biases

    • by computing normalization factors with the trimmed mean of M-values method
  • Test whether the log-fold change between sample groups is significantly different from zero

    • estimate the negative binomial (NB) dispersions
    • estimate the quasi-likelihood dispersions, which model the uncertainty and variability of the per-gene variance

Differential Abundance

Aim:

  • test for significant changes in grouped cell abundance across conditions

Example:

  • which cell types are depleted or enriched upon treatment?
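
The input to such a test is a table of cell counts per cell type (or cluster) and per sample. A minimal sketch of building that table follows; the function name `abundance_table` and the labels are made up for this example.

```python
from collections import Counter

def abundance_table(clusters, samples):
    """Count cells per cluster and per sample, filling absent pairs with 0."""
    pair_counts = Counter(zip(clusters, samples))
    sample_names = sorted(set(samples))
    return {
        cluster: {s: pair_counts[(cluster, s)] for s in sample_names}
        for cluster in sorted(set(clusters))
    }

# Toy data: 6 cells from one control and one treated sample.
clusters = ["T", "T", "B", "T", "NK", "T"]
samples = ["ctrl1", "ctrl1", "ctrl1", "trt1", "trt1", "trt1"]

table = abundance_table(clusters, samples)
for cluster, row in table.items():
    print(cluster, row)
```

With replicate samples in each condition, each row of this table can be tested for a change in abundance between conditions.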

Differential Abundance - Milo

  • Most methods require defined clusters as input. Assigning cells to discrete clusters can be problematic in the context of continuous differentiation, developmental or stimulation trajectories.

  • Methods that don’t require clusters either don’t model variability in cell numbers among replicates or can only carry out pairwise comparisons.

Milo

  • Uses a k-nearest neighbour (KNN) graph to model cellular states as overlapping neighbourhoods. Non-independence between overlapping neighbourhoods is accounted for with a weighted version of the Benjamini–Hochberg method.

  • Determines neighbourhoods and groupings independently of our defined clusters

  • Can be used for complex models

  • Faster and scalable
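
To make the weighted Benjamini–Hochberg idea concrete, here is a minimal sketch of a weighted BH adjustment. It is not Milo's own code: the function name `weighted_bh` is hypothetical and the weights are simply supplied by the caller, whereas Milo derives them from the KNN graph so that neighbourhoods in dense regions carry less weight.

```python
def weighted_bh(pvalues, weights):
    """Weighted Benjamini-Hochberg adjustment.

    Tests are sorted by p-value; each p-value is scaled by the total weight
    divided by the cumulative weight up to its rank, and monotonicity is
    enforced from the largest p-value downwards. With equal weights this
    reduces to the ordinary BH procedure.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    total = sum(weights)
    scaled = []
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        scaled.append(pvalues[i] * total / cumulative)
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, scaled[rank])
        adjusted[order[rank]] = running_min
    return adjusted

pvalues = [0.01, 0.04, 0.03, 0.20]
equal = weighted_bh(pvalues, [1, 1, 1, 1])      # ordinary BH
weighted = weighted_bh(pvalues, [1, 1, 1, 3])   # last test up-weighted
print(equal)
print(weighted)
```

Up-weighting a test with a large p-value makes the correction for the remaining tests more stringent, which is how overlapping neighbourhoods that carry little independent information are kept from dominating the correction.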

Differential Abundance - Milo

Steps:

  • Construct KNN graph

    • rescales UMI counts by per-cell sequencing depth
    • log transforms
    • uses PCA
    • calculates Euclidean distance between each cell and its k nearest neighbours in PC space
  • Defines Cell Neighbourhoods

  • Counts cells in Neighbourhoods

  • Tests for DA in Neighbourhoods

  • Does a multiple testing correction (Spatial FDR)

  • Visualisation
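
The first three steps can be sketched with the standard library alone. This is a simplified illustration, not Milo's implementation: the PCA step is left out (distances are taken directly on the log-normalised values), a neighbourhood is taken to be an index cell plus its k nearest neighbours, and the names `log_normalise`, `knn`, `neighbourhood_counts` and the scale factor of 10,000 are assumptions made for this example.

```python
import math
from collections import Counter

def log_normalise(counts, scale=1e4):
    """Rescale each cell's UMI counts by its sequencing depth, then log-transform."""
    return [[math.log1p(c / sum(cell) * scale) for c in cell] for cell in counts]

def knn(points, k):
    """For each cell, return the indices of its k nearest cells (Euclidean distance)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def neighbourhood_counts(index_cell, neighbours, samples):
    """Count cells per sample in the neighbourhood of one index cell."""
    members = [index_cell] + neighbours[index_cell]
    return dict(Counter(samples[m] for m in members))

# Toy data: 6 cells x 2 genes, forming two well-separated cell states.
counts = [[9, 1], [8, 2], [9, 2], [1, 9], [2, 8], [2, 9]]
samples = ["ctrl", "trt", "ctrl", "trt", "trt", "trt"]

neighbours = knn(log_normalise(counts), k=2)
nhood_a = neighbourhood_counts(0, neighbours, samples)
nhood_b = neighbourhood_counts(3, neighbours, samples)
print(nhood_a, nhood_b)
```

The per-neighbourhood, per-sample counts produced this way are what the differential abundance test and the Spatial FDR correction then operate on.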