splSetToGet <- "PBMMC,ETV6-RUNX1"
splSetVec <- unlist(strsplit(splSetToGet, ","))
splSetToGet2 <- gsub(",", "_", splSetToGet)
nbPcToComp <- 50
figSize <- 7
library(scater)
library(scran)
library(batchelor)
library(edgeR)
library(tidyverse)
library(patchwork)
library(DT)
fontsize <- theme(axis.text=element_text(size=12), axis.title=element_text(size=16))
Source: Multi-sample comparisons of the OSCA book.
A powerful use of scRNA-seq technology lies in the design of replicated multi-condition experiments to detect changes in composition or expression between conditions. For example, a researcher could use this strategy to detect changes in cell type abundance after drug treatment (Richard et al. 2018) or genetic modifications (Scialdone et al. 2016). This provides more biological insight than conventional scRNA-seq experiments involving only one biological condition, especially if we can relate population changes to specific experimental perturbations.
Differential analyses of multi-condition scRNA-seq experiments can be broadly split into two categories - differential expression (DE) and differential abundance (DA) analyses. The former tests for changes in expression between conditions for cells of the same type that are present in both conditions, while the latter tests for changes in the composition of cell types (or states, etc.) between conditions.
We have used the data set comprising the 7 samples (500 cells per sample) analysed with fastMNN.
The differential analyses will be predicated on many of the pre-processing steps covered previously. For brevity, we will not explicitly repeat them here, only noting that we have already merged cells from all samples into the same coordinate system and clustered the merged dataset to obtain a common partitioning across all samples.
Load the SCE object:
# Read object in:
sce <- readRDS("../Robjects/caron_postDeconv_5hCellPerSpl_dsi_PBMMC_ETV6-RUNX1_clust.Rds")
sce
## class: SingleCellExperiment
## dim: 9479 3500
## metadata(2): merge.info pca.info
## assays(1): reconstructed
## rownames(9479): TSPAN6 SCYL3 ... AL132709.8 AL109627.1
## rowData names(8): ID Symbol ... gene_sparsity rotation
## colnames(3500): 1_AAACGGGCAGTTCATG-1 1_AAACGGGGTTCACCTC-1 ...
## 12_TTTGTCACAATGAAAC-1 12_TTTGTCATCAGTTGAC-1
## colData names(28): SampleName Barcode ... cluster.out.10
## cluster.out.10.col
## reducedDimNames(3): corrected TSNE UMAP
## mainExpName: NULL
## altExpNames(0):
colData(sce) %>% head
## DataFrame with 6 rows and 28 columns
## SampleName Barcode SampleId SampleGroup
## <character> <character> <character> <factor>
## 1_AAACGGGCAGTTCATG-1 ETV6-RUNX1_1 AAACGGGCAGTTCATG-1 SRR9264343 ETV6-RUNX1
## 1_AAACGGGGTTCACCTC-1 ETV6-RUNX1_1 AAACGGGGTTCACCTC-1 SRR9264343 ETV6-RUNX1
## 1_AAAGATGAGCGATGAC-1 ETV6-RUNX1_1 AAAGATGAGCGATGAC-1 SRR9264343 ETV6-RUNX1
## 1_AAAGATGCAGCCAATT-1 ETV6-RUNX1_1 AAAGATGCAGCCAATT-1 SRR9264343 ETV6-RUNX1
## 1_AAAGTAGCAATGCCAT-1 ETV6-RUNX1_1 AAAGTAGCAATGCCAT-1 SRR9264343 ETV6-RUNX1
## 1_AAATGCCAGAACAATC-1 ETV6-RUNX1_1 AAATGCCAGAACAATC-1 SRR9264343 ETV6-RUNX1
## DatasetName sum detected subsets_Mito_sum
## <character> <numeric> <integer> <numeric>
## 1_AAACGGGCAGTTCATG-1 Caron 8103 2299 430
## 1_AAACGGGGTTCACCTC-1 Caron 9552 2843 752
## 1_AAAGATGAGCGATGAC-1 Caron 1693 894 75
## 1_AAAGATGCAGCCAATT-1 Caron 6164 2063 402
## 1_AAAGTAGCAATGCCAT-1 Caron 5816 1943 434
## 1_AAATGCCAGAACAATC-1 Caron 2533 1080 271
## subsets_Mito_detected subsets_Mito_percent total
## <integer> <numeric> <numeric>
## 1_AAACGGGCAGTTCATG-1 11 5.30668 8103
## 1_AAACGGGGTTCACCTC-1 12 7.87270 9552
## 1_AAAGATGAGCGATGAC-1 10 4.43001 1693
## 1_AAAGATGCAGCCAATT-1 11 6.52174 6164
## 1_AAAGTAGCAATGCCAT-1 12 7.46217 5816
## 1_AAATGCCAGAACAATC-1 12 10.69878 2533
## low_lib_size low_n_features high_Mito_percent discard
## <logical> <logical> <logical> <logical>
## 1_AAACGGGCAGTTCATG-1 FALSE FALSE FALSE FALSE
## 1_AAACGGGGTTCACCTC-1 FALSE FALSE FALSE FALSE
## 1_AAAGATGAGCGATGAC-1 FALSE FALSE FALSE FALSE
## 1_AAAGATGCAGCCAATT-1 FALSE FALSE FALSE FALSE
## 1_AAAGTAGCAATGCCAT-1 FALSE FALSE FALSE FALSE
## 1_AAATGCCAGAACAATC-1 FALSE FALSE FALSE FALSE
## sum.1 detected.1 total.1 sizeFactor cell_sparsity
## <numeric> <integer> <numeric> <numeric> <numeric>
## 1_AAACGGGCAGTTCATG-1 8101 2298 8101 3.636419 0.870113
## 1_AAACGGGGTTCACCTC-1 9550 2841 9550 5.029599 0.839379
## 1_AAAGATGAGCGATGAC-1 1692 893 1692 0.953496 0.949492
## 1_AAAGATGCAGCCAATT-1 6160 2059 6160 3.072866 0.883446
## 1_AAAGTAGCAATGCCAT-1 5815 1942 5815 2.777673 0.890226
## 1_AAATGCCAGAACAATC-1 2531 1078 2531 1.233575 0.938983
## label cluster kmeans10 kmeans.best louvain leiden
## <factor> <factor> <numeric> <numeric> <factor> <factor>
## 1_AAACGGGCAGTTCATG-1 2 4 5 6 4 4
## 1_AAACGGGGTTCACCTC-1 2 4 5 6 4 4
## 1_AAAGATGAGCGATGAC-1 5 3 10 7 3 2
## 1_AAAGATGCAGCCAATT-1 4 5 6 15 1 6
## 1_AAAGTAGCAATGCCAT-1 5 5 6 15 1 6
## 1_AAATGCCAGAACAATC-1 5 2 6 9 3 2
## cluster.out.10 cluster.out.10.col
## <factor> <character>
## 1_AAACGGGCAGTTCATG-1 5 #CA8385
## 1_AAACGGGGTTCACCTC-1 5 #CA8385
## 1_AAAGATGAGCGATGAC-1 2 #AA38D5
## 1_AAAGATGCAGCCAATT-1 2 #AA38D5
## 1_AAAGTAGCAATGCCAT-1 2 #AA38D5
## 1_AAATGCCAGAACAATC-1 2 #AA38D5
A brief inspection of the results shows clusters contain varying contributions from batches:
colLabels(sce) <- sce$louvain # clusters.mnn
tab <- table(colLabels(sce), sce$SampleGroup)
tab
##
## ETV6-RUNX1 HHD PBMMC PRE-T
## 1 135 0 116 0
## 2 438 0 53 0
## 3 584 0 10 0
## 4 366 0 149 0
## 5 2 0 168 0
## 6 202 0 398 0
## 7 23 0 85 0
## 8 2 0 168 0
## 9 140 0 136 0
## 10 40 0 111 0
## 11 68 0 106 0
tab <- table(colLabels(sce), sce$SampleName)
pheatmap::pheatmap(tab,
border_color = NA,
drop_levels = TRUE,
cluster_rows = !FALSE,
cluster_cols = !FALSE
)
On the t-SNE plots below, cells are coloured by type or sample (‘batch of origin’). Cluster numbers are superimposed based on the median coordinate of cells assigned to that cluster.
p1 <- plotTSNE(sce, colour_by="SampleGroup", text_by="label", point_size=0.3)
p2 <- plotTSNE(sce, colour_by="SampleName", point_size=0.3)
gridExtra::grid.arrange(p1, p2+facet_wrap(~colData(sce)$SampleName), ncol=2)