1 Differential expression and abundance between conditions

splSetToGet <- "PBMMC,ETV6-RUNX1"
splSetVec <- unlist(strsplit(splSetToGet, ","))
splSetToGet2 <- gsub(",", "_", splSetToGet)
nbPcToComp <- 50
figSize <- 7
library(scater)
library(scran)
library(batchelor)
library(edgeR)
library(tidyverse)
library(patchwork)
library(DT)
fontsize <- theme(axis.text=element_text(size=12), axis.title=element_text(size=16))

Source: Multi-sample comparisons of the OSCA book.

1.1 Motivation

A powerful use of scRNA-seq technology lies in the design of replicated multi-condition experiments to detect changes in composition or expression between conditions. For example, a researcher could use this strategy to detect changes in cell type abundance after drug treatment (Richard et al. 2018) or genetic modifications (Scialdone et al. 2016). This provides more biological insight than conventional scRNA-seq experiments involving only one biological condition, especially if we can relate population changes to specific experimental perturbations.

Differential analyses of multi-condition scRNA-seq experiments can be broadly split into two categories - differential expression (DE) and differential abundance (DA) analyses. The former tests for changes in expression between conditions for cells of the same type that are present in both conditions, while the latter tests for changes in the composition of cell types (or states, etc.) between conditions.

1.2 Setting up the data

We have used the data set comprising the 7 samples (500 cells per sample) analysed with fastMNN.

The differential analyses will be predicated on many of the pre-processing steps covered previously. For brevity, we will not explicitly repeat them here, only noting that we have already merged cells from all samples into the same coordinate system and clustered the merged dataset to obtain a common partitioning across all samples.

Load the SCE object:

# Read object in:
sce <- readRDS("../Robjects/caron_postDeconv_5hCellPerSpl_dsi_PBMMC_ETV6-RUNX1_clust.Rds")
sce
## class: SingleCellExperiment 
## dim: 9479 3500 
## metadata(2): merge.info pca.info
## assays(1): reconstructed
## rownames(9479): TSPAN6 SCYL3 ... AL132709.8 AL109627.1
## rowData names(8): ID Symbol ... gene_sparsity rotation
## colnames(3500): 1_AAACGGGCAGTTCATG-1 1_AAACGGGGTTCACCTC-1 ...
##   12_TTTGTCACAATGAAAC-1 12_TTTGTCATCAGTTGAC-1
## colData names(28): SampleName Barcode ... cluster.out.10
##   cluster.out.10.col
## reducedDimNames(3): corrected TSNE UMAP
## mainExpName: NULL
## altExpNames(0):
colData(sce) %>% head
## DataFrame with 6 rows and 28 columns
##                        SampleName            Barcode    SampleId SampleGroup
##                       <character>        <character> <character>    <factor>
## 1_AAACGGGCAGTTCATG-1 ETV6-RUNX1_1 AAACGGGCAGTTCATG-1  SRR9264343  ETV6-RUNX1
## 1_AAACGGGGTTCACCTC-1 ETV6-RUNX1_1 AAACGGGGTTCACCTC-1  SRR9264343  ETV6-RUNX1
## 1_AAAGATGAGCGATGAC-1 ETV6-RUNX1_1 AAAGATGAGCGATGAC-1  SRR9264343  ETV6-RUNX1
## 1_AAAGATGCAGCCAATT-1 ETV6-RUNX1_1 AAAGATGCAGCCAATT-1  SRR9264343  ETV6-RUNX1
## 1_AAAGTAGCAATGCCAT-1 ETV6-RUNX1_1 AAAGTAGCAATGCCAT-1  SRR9264343  ETV6-RUNX1
## 1_AAATGCCAGAACAATC-1 ETV6-RUNX1_1 AAATGCCAGAACAATC-1  SRR9264343  ETV6-RUNX1
##                      DatasetName       sum  detected subsets_Mito_sum
##                      <character> <numeric> <integer>        <numeric>
## 1_AAACGGGCAGTTCATG-1       Caron      8103      2299              430
## 1_AAACGGGGTTCACCTC-1       Caron      9552      2843              752
## 1_AAAGATGAGCGATGAC-1       Caron      1693       894               75
## 1_AAAGATGCAGCCAATT-1       Caron      6164      2063              402
## 1_AAAGTAGCAATGCCAT-1       Caron      5816      1943              434
## 1_AAATGCCAGAACAATC-1       Caron      2533      1080              271
##                      subsets_Mito_detected subsets_Mito_percent     total
##                                  <integer>            <numeric> <numeric>
## 1_AAACGGGCAGTTCATG-1                    11              5.30668      8103
## 1_AAACGGGGTTCACCTC-1                    12              7.87270      9552
## 1_AAAGATGAGCGATGAC-1                    10              4.43001      1693
## 1_AAAGATGCAGCCAATT-1                    11              6.52174      6164
## 1_AAAGTAGCAATGCCAT-1                    12              7.46217      5816
## 1_AAATGCCAGAACAATC-1                    12             10.69878      2533
##                      low_lib_size low_n_features high_Mito_percent   discard
##                         <logical>      <logical>         <logical> <logical>
## 1_AAACGGGCAGTTCATG-1        FALSE          FALSE             FALSE     FALSE
## 1_AAACGGGGTTCACCTC-1        FALSE          FALSE             FALSE     FALSE
## 1_AAAGATGAGCGATGAC-1        FALSE          FALSE             FALSE     FALSE
## 1_AAAGATGCAGCCAATT-1        FALSE          FALSE             FALSE     FALSE
## 1_AAAGTAGCAATGCCAT-1        FALSE          FALSE             FALSE     FALSE
## 1_AAATGCCAGAACAATC-1        FALSE          FALSE             FALSE     FALSE
##                          sum.1 detected.1   total.1 sizeFactor cell_sparsity
##                      <numeric>  <integer> <numeric>  <numeric>     <numeric>
## 1_AAACGGGCAGTTCATG-1      8101       2298      8101   3.636419      0.870113
## 1_AAACGGGGTTCACCTC-1      9550       2841      9550   5.029599      0.839379
## 1_AAAGATGAGCGATGAC-1      1692        893      1692   0.953496      0.949492
## 1_AAAGATGCAGCCAATT-1      6160       2059      6160   3.072866      0.883446
## 1_AAAGTAGCAATGCCAT-1      5815       1942      5815   2.777673      0.890226
## 1_AAATGCCAGAACAATC-1      2531       1078      2531   1.233575      0.938983
##                         label  cluster  kmeans10 kmeans.best  louvain   leiden
##                      <factor> <factor> <numeric>   <numeric> <factor> <factor>
## 1_AAACGGGCAGTTCATG-1        2        4         5           6        4        4
## 1_AAACGGGGTTCACCTC-1        2        4         5           6        4        4
## 1_AAAGATGAGCGATGAC-1        5        3        10           7        3        2
## 1_AAAGATGCAGCCAATT-1        4        5         6          15        1        6
## 1_AAAGTAGCAATGCCAT-1        5        5         6          15        1        6
## 1_AAATGCCAGAACAATC-1        5        2         6           9        3        2
##                      cluster.out.10 cluster.out.10.col
##                            <factor>        <character>
## 1_AAACGGGCAGTTCATG-1              5            #CA8385
## 1_AAACGGGGTTCACCTC-1              5            #CA8385
## 1_AAAGATGAGCGATGAC-1              2            #AA38D5
## 1_AAAGATGCAGCCAATT-1              2            #AA38D5
## 1_AAAGTAGCAATGCCAT-1              2            #AA38D5
## 1_AAATGCCAGAACAATC-1              2            #AA38D5

A brief inspection of the results shows clusters contain varying contributions from batches:

colLabels(sce) <- sce$louvain # clusters.mnn
tab <- table(colLabels(sce), sce$SampleGroup)
tab
##     
##      ETV6-RUNX1 HHD PBMMC PRE-T
##   1         135   0   116     0
##   2         438   0    53     0
##   3         584   0    10     0
##   4         366   0   149     0
##   5           2   0   168     0
##   6         202   0   398     0
##   7          23   0    85     0
##   8           2   0   168     0
##   9         140   0   136     0
##   10         40   0   111     0
##   11         68   0   106     0
tab <- table(colLabels(sce), sce$SampleName)
pheatmap::pheatmap(tab,
           border_color      = NA,
           drop_levels       = TRUE,
           cluster_rows      = !FALSE,
           cluster_cols      = !FALSE
           )

On the t-SNE plots below, cells are coloured by type or sample (‘batch of origin’). Cluster numbers are superimposed based on the median coordinate of cells assigned to that cluster.

p1 <- plotTSNE(sce, colour_by="SampleGroup", text_by="label", point_size=0.3)
p2 <- plotTSNE(sce, colour_by="SampleName", point_size=0.3)
gridExtra::grid.arrange(p1, p2+facet_wrap(~colData(sce)$SampleName), ncol=2)