Often, single-cell experiments are done by processing samples in multiple batches. This may be related to logistical constraints such as the inability to run all experimental conditions in parallel, or more extreme cases where samples are processed in different laboratories, by different people and even sequenced with different technologies (e.g. samples from human patients collected in different hospitals). These differences across sample batches very often result in global gene expression differences across those batches. Since batch-to-batch transcriptomic differences are likely unrelated to biological differences, we would ideally want “remove” them before drawing inferences about our cell populations.
Biases due to batch effects are not new to single-cell RNA-seq. Indeed, several methods have been previously developed for standard bulk RNA-seq approaches. Some of those approaches rely on linear models that “regress out” the batch effect, assuming that the cell composition is similar across batches. However, in single-cell RNA-seq we may very often expect changes in cell compositions across batches (e.g. in our course data we have data from cancer samples such as ETV6-RUNX as well as a reference panel of healthy blood cells, PBMMCs). Therefore, methods are necessary that can deal with with heterogeneity across batches.
In recent years, several methods have been developed to deal with this challenge (too many to list here!). Some of the most popular ones include the Mutual Nearest Neighbours (MNN) algorithm, a Principal Components Analysis-based clustering method implemented in the package HARMONY and a method that combines Canonical Correlation Analysis (CCA) and MNN implemented in the package Seurat 5. These methods have been shown to perform well in several benchmark studies (e.g. Luecken et al 2022 and Tran et al 2020), although one important message from these studies is that no single method is universally the best in all situations. For example, some methods may be better at preserving small populations of cells as separate groups in the integrated data at the cost of poorer overall integration, while others may be better at removing batch effects at the cost of also removing some biological signal.
Seurat makes it easy for us to use different batch correction methods but we must be able to assess when to use batch correction and how to adjust the parameters for our dataset.
There are several steps to deciding on our batch correction. First we must examine our uncorrected data and decide if we think a correction is needed. Then we run a correction method and visualise the results. Finally, we must assess the results and decide if we are happy with the correction or if we need to adjust the parameters or try a different method.
library(Seurat)
library(sctransform)
library(glmGamPoi)
library(tidyverse)
library(patchwork)
In the normalisation section this morning we used the
SCTransform function on a single sample. We have 11 samples
in this dataset from 4 different sample groups. Depending on the
variance of our data it may be more appropriate to normalise the data
across all samples together, or to normalise each sample or sample group
separately and then integrate the data.
Given what we saw in Day 1 with large differences in distributions between batches it would be a sensible assumption that the sample groups or even the samples themselves should have normalisation applied separately regardless of whether we decide to batch correct.
We will revert back to the filtered 500 cell dataset that we started
the day with and re-process the data by sample group. Seurat allows us
to do this still all within one object by using layers. We will use the
SplitObject function to split our data by sample group and
then re-process each sample group separately. The parameter
f specifies the variable that we want to split by, in this
case SampleGroup. This will create a new layer in our
Seurat object for each sample group. We can then run the normalisation
and dimensionality reduction steps separately for each sample group.
# Load the data
seurat_object <- readRDS("RObjects/Filtered.500.rds")
# Split the data by sample group
seurat_object[["RNA"]] <- split(seurat_object[["RNA"]],
f = seurat_object$SampleGroup)
seurat_object
## An object of class Seurat
## 29786 features across 5500 samples within 1 assay
## Active assay: RNA (29786 features, 0 variable features)
## 4 layers present: counts.ETV6RUNX1, counts.HHD, counts.PRET, counts.PBMMC
The RNA assay now has 4 layers, one for each of our sample groups.
# Re-process, Seurat will treat each sample group separately
seurat_object <- SCTransform(
seurat_object,
assay = "RNA",
vars.to.regress = "percent.mt",
verbose = FALSE
)
seurat_object <- RunPCA(seurat_object,
features = VariableFeatures(object = seurat_object))
## Warning in PrepDR(object = object, features = features, verbose = verbose): The
## following 141 features requested have not been scaled (running reduction
## without them): GYPA, FCN1, ALAS2, GYPB, CXCL8, HEMGN, LINC02945, SERPINA1,
## EPB42, KRT1, SELENBP1, ENSG00000189229, LINC02227, KLF1, CYGB, LINC01013,
## RPS4Y2, SMIM1, HBQ1, PELATON, APOC1, IGHA2, TSPO2, BAALC, IGHGP, TCL6, SH2D4B,
## MYLK, LINC02202, NETO1, KANK2, ENSG00000287778, S100A12, CCL17, VCAN, IFIT1B,
## HBG1, TMCC2, CSTA, CCL3L3, ENSG00000227706, IGLC7, MPO, PRTN3, TCL1B, TMPRSS15,
## ENSG00000287172, CLEC7A, ENSG00000224610, EREG, HS3ST4, ENSG00000288087, AZU1,
## CTSG, ELANE, CLEC14A, LINC00922, NHERF2, RETN, ENSG00000286581, TNFRSF17, CCN2,
## ARG1, HTR1F, BEST3, PPBP, MSR1, ENSG00000254951, RNASE2, IRF4, S100B, MEIS1,
## POU4F1, LRRC26, ENSG00000260271, CDA, ENSG00000248656, LINC00114, LINC01922,
## ADGRF1, FHIP1A, TFPI, IGLC5, SDC2, FPR1, MPPED2, RBP7, ENSG00000289949, ACY3,
## KEL, EPHA7, ENSG00000288691, PDZK1IP1, SLC40A1, LOXHD1, KCNH2, APOE,
## ENSG00000291034, PLCB4, CLEC4OP, ENSG00000253214, NRP1, MS4A3, FCAR, MYO18B,
## ENSG00000286546, LINC02363, MGST1, CDHR3, ENSG00000257883, IGHG4,
## ENSG00000253673, MYO3A, GNG11, CRMP1, NT5E, ZNF804A, BEX1, DEPP1, OVCH2, AJAP1,
## ABCB4, NKAIN4, TMED6, LINC00692, ENSG00000233064, ENSG00000258631, LARGE-AS1,
## ENSG00000251598, LGR5, SCN9A, SOCS2-AS1, TRIM10, ENSG00000232464,
## ENSG00000291325, PCLO, PEG10, EBF4, TMEM220-AS1, LY6H, ENSG00000253983
## PC_ 1
## Positive: BACH2, IGHM, PCDH9, AFF3, JUNB, CD74, PDE4D, B2M, CD79B, BTG1
## CD69, LINC-PINT, RPS27, CXCR4, IGKC, FOS, ACSM3, TCL1A, TSC22D3, HLA-DRA
## CD52, CD37, JUN, IFITM1, ENSG00000287092, LINC01619, SSBP2, SKAP1, TMSB10, IGLC3
## Negative: HBA1, HBB, HBA2, HBD, AHSP, HBM, CA1, SLC25A37, SLC4A1, BLVRB
## IFI27, SNCA, PRDX2, CA2, HMBS, RHAG, FECH, ANK1, SOX6, SLC25A39
## CD36, BPGM, GMPR, RHCE, STRADB, UROD, FAM210B, HMGB2, SLC2A1, GYPE
## PC_ 2
## Positive: IFITM1, IL32, SKAP1, JUNB, SRGN, S100A4, TRAC, S100A6, TC2N, CD3D
## CD247, CD3E, HBB, B2M, S100A10, CD7, HBA2, CCL5, HBA1, ANXA1
## TRBC2, ZNF331, RPS27, LINC-PINT, PCED1B-AS1, CD44, TNIK, TYROBP, SCML4, TRBC1
## Negative: TUBA1B, HMGB2, H4C3, STMN1, TUBB, IGLL1, TOP2A, H2AZ1, HMGN2, HMGB1
## AFF3, NUSAP1, TCL1A, MME, SSBP2, PCLAF, EBF1, TYMS, BACH2, CD24
## PTMA, UBE2C, ARPP21, RRM2, PCDH9, UHRF1, CD79B, MKI67, DNTT, VPREB1
## PC_ 3
## Positive: LYZ, S100A9, TYROBP, S100A8, LGALS1, S100A4, CST3, S100A6, IFI30, FCER1G
## ENSG00000257764, S100A11, FTL, CCDC200, LST1, MNDA, CTSS, NAMPT, CD74, CFD
## AIF1, SRGN, MS4A6A, TYMP, SAT1, RBM47, FOS, FTH1, HLA-DRA, HLA-DRB1
## Negative: IL32, SKAP1, BCL11B, CAMK4, CD3D, TRBC1, TRAC, CD247, THEMIS, IFITM1
## CD3E, TC2N, TRBC2, CD7, NELL2, PITPNC1, PCED1B-AS1, PRKCH, LCK, TNIK
## SCML4, ITK, CD3G, CD6, CD96, RPS27, LEPROTL1, INPP4B, IL7R, TCF7
## PC_ 4
## Positive: HBB, HBA2, HBA1, HBD, BACH2, HBM, AHSP, PCDH9, CA1, SLC25A37
## AFF3, PDE4D, SLC4A1, IGKC, IGHM, ENSG00000287092, SNCA, IFI27, CD79B, ACSM3
## IGLC3, BLVRB, BPGM, IGLC2, BANK1, BNIP3L, LINC01374, HMBS, CD74, NIBAN3
## Negative: TUBA1B, H4C3, HMGB2, TUBB, TOP2A, H2AZ1, HMGN2, STMN1, NUSAP1, UBE2C
## PCLAF, TYMS, RRM2, MKI67, BIRC5, CENPF, HMGB1, SMC4, TUBB4B, CKS1B
## PTTG1, VIM, AURKB, GAPDH, IL32, ACTB, TPX2, CKS2, IFITM1, S100A4
## PC_ 5
## Positive: IGHM, HMGB2, IGKC, MS4A1, H4C3, TOP2A, BACH2, TUBA1B, IGLC3, IGLC2
## LYN, NUSAP1, ANGPTL1, CD79B, UBE2C, JUNB, IGHD, BANK1, RALGPS2, CENPF
## PTTG1, MTSS1, CD74, AFF3, MKI67, CD37, CRIP1, CA2, CD52, FCRL1
## Negative: GAPDH, CALN1, DNTT, ERG, CCDC26, EEF1A1, RPS2, CDK6, RPS12, AIF1
## HSP90AB1, ENSG00000287092, CD99, NME2, ENSG00000271955, GNAQ, CACNB2, LYZ, RUNX1, GAB2
## MIF, RPL10, GPX1, FRMD4B, RPLP1, RPS15A, RPLP0, CASC15, HSPB1, FABP5
seurat_object <- RunUMAP(seurat_object,
reduction = "pca",
dims = 1:15)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 19:11:28 UMAP embedding parameters a = 0.9922 b = 1.112
## 19:11:28 Read 5500 rows and found 15 numeric columns
## 19:11:28 Using Annoy for neighbor search, n_neighbors = 30
## 19:11:28 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 19:11:29 Writing NN index file to temp file /var/folders/9s/hsfl94x96vn2g543r67q4vpn6rmr26/T//RtmpUfmyXk/file1654f362e0285
## 19:11:29 Searching Annoy index using 1 thread, search_k = 3000
## 19:11:30 Annoy recall = 100%
## 19:11:31 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 19:11:32 Initializing from normalized Laplacian + noise (using RSpectra)
## 19:11:33 Commencing optimization for 500 epochs, with 215632 positive edges
## 19:11:33 Using rng type: pcg
## 19:11:40 Optimization finished
seurat_object
## An object of class Seurat
## 50798 features across 5500 samples within 2 assays
## Active assay: SCT (21012 features, 3000 variable features)
## 3 layers present: counts, data, scale.data
## 1 other assay present: RNA
## 2 dimensional reductions calculated: pca, umap
We can now visualise the uncorrected data with a hopefully more appropriate normalisation strategy.
# Visualise the uncorrected data
uncorrected_plot <- DimPlot(seurat_object,
reduction = "umap",
group.by = "SampleName") +
ggtitle("Uncorrected data")
In order to examine our uncorrected data further we will run a quick clustering. clustering will be covered in much greater detail on day 3. This shows us how the cells group together in the uncorrected data. If we see that some of the clusters are mostly made up of cells from a single sample, this is an indication that batch effects are present and that we may want to run a batch correction method.
# Run clustering
seurat_object <- FindNeighbors(seurat_object, reduction = "pca",
dims = 1:15)
## Computing nearest neighbor graph
## Computing SNN
seurat_object <- FindClusters(seurat_object,
cluster.name = "uncorrected_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 5500
## Number of edges: 173164
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9174
## Number of communities: 20
## Elapsed time: 0 seconds
uncorrected_clusters_plot <- DimPlot(seurat_object, reduction = "umap",
group.by ="uncorrected_clusters") + ggtitle("Uncorrected data clusters")
uncorrected_plot + uncorrected_clusters_plot
It can also be useful to view this as a table to get a better idea of the data.
# Make a table of the clusters and samples
table(seurat_object$uncorrected_clusters, seurat_object$orig.ident)
##
## ETV6RUNX1-1 ETV6RUNX1-2 ETV6RUNX1-3 ETV6RUNX1-4 HHD-1 HHD-2 PBMMC-1
## 0 0 394 185 4 30 16 7
## 1 55 60 17 16 37 91 80
## 2 433 0 0 0 3 1 3
## 3 1 9 7 0 53 353 2
## 4 0 0 0 0 4 0 0
## 5 0 0 6 1 7 1 82
## 6 2 3 1 1 287 6 15
## 7 0 0 1 0 0 1 1
## 8 0 2 5 283 1 2 0
## 9 1 17 193 26 36 13 0
## 10 0 1 0 0 0 2 151
## 11 0 1 0 5 8 0 69
## 12 1 0 22 48 0 3 15
## 13 2 6 11 7 8 9 29
## 14 1 1 2 0 3 0 20
## 15 0 0 5 93 0 0 0
## 16 4 6 45 16 23 2 0
## 17 0 0 0 0 0 0 0
## 18 0 0 0 0 0 0 0
## 19 0 0 0 0 0 0 26
##
## PBMMC-2 PBMMC-3 PRET-1 PRET-2
## 0 2 24 2 6
## 1 15 53 97 14
## 2 2 0 0 1
## 3 5 4 1 2
## 4 0 0 392 0
## 5 129 128 0 3
## 6 16 14 0 4
## 7 1 1 0 295
## 8 0 2 0 0
## 9 0 0 0 0
## 10 21 64 0 0
## 11 37 57 2 25
## 12 87 14 0 11
## 13 36 34 2 36
## 14 50 79 0 23
## 15 0 0 2 3
## 16 0 0 0 0
## 17 87 4 0 0
## 18 0 0 2 77
## 19 12 22 0 0
It is worth noting that, although this suggests a batch effect, there might be cases where there are genuine differences in cell populations across batches (e.g. if the different batches represent samples from different tissues).
Data integration algorithms designed for single-cell RNA-seq do allow for unique cell types existing across batches, however, it’s always good to check the results of the integration using independent information (e.g. prior information about genes that are specific to particular cell types).
According to the biology of our experiment, we expect the samples to have different some cell type compositions but the majority of the data should have cell types in common.
Seurat allows us to integrate our data using its
IntegrateLayers function. This function takes the layers of
the RNA assay that we split into and integrates them using a specified
batch correction method. There are 4 methods wrapped in this
function:
Another popular method Mutual Nearest Neighbours (MNN) is not wrapped
in this function but can be run using the RunFastMNN
function from the SeuratWrappers package.
Harmony, Korsunsky et al. 2019 is a method that uses an iterative clustering and correction approach to remove batch effects. It is based on the idea of “harmonizing” the data by finding clusters of cells that are similar across batches and then adjusting the data to make those clusters more similar. It does not edit the raw expresion values but creates a corrected embedding space that can be used for downstream analysis.
PCA embeds cells into a space with reduced dimensionality. Harmony accepts the cell coordinates in this reduced space and runs an iterative algorithm to adjust for dataset specific effects.
A, Harmony uses fuzzy clustering to assign each cell to multiple clusters, while a penalty term ensures that the diversity of datasets within each cluster is maximized.
B, Harmony calculates a global centroid for each cluster, as well as dataset-specific centroids for each cluster.
C, Within each cluster, Harmony calculates a correction factor for each dataset based on the centroids.
D, Finally, Harmony corrects each cell with a cell-specific factor: a linear combination of dataset correction factors weighted by the cell’s soft cluster assignments made in step a. Harmony repeats steps a to d until convergence. The dependence between cluster assignment and dataset diminishes with each round. Datasets are represented with colors, cell types with different shapes.
It does not assume that all cell types are shared across batches and can therefore deal with unique cell types in different batches.
There are some parameters that can be adjusted when running Harmony.
These aren’t explicity part of the the IntegrateLayers
function but can be passed to the HarmonyIntegration method underneath.
For example, the theta parameter (diversity clustering
penalty parameter) controls the diversity of datasets within clusters,
with higher values leading to more diverse clusters. The
lambda parameter (Ridge Regression penalty parameter)
controls the strength of the correction, with higher values leading to
stronger corrections. The sigma parameter (width of soft
kmeans clusters) controls the bandwidth of the Gaussian kernel used in
the correction, with higher values leading to smoother corrections. In
practice we mostly adjust theta. The default is 2.
# Run Harmony integration
harmony_object <- IntegrateLayers(object = seurat_object,
method = HarmonyIntegration,
orig.reduction = "pca",
new.reduction = "harmony")
## The `features` argument is ignored by `HarmonyIntegration`.
## Transposing data matrix
##
## Using automatic lambda estimation
##
## Initializing state using k-means centroids initialization
##
## Harmony 1/10
##
## Harmony 2/10
##
## Harmony 3/10
##
## Harmony 4/10
##
## Harmony 5/10
##
## Harmony 6/10
##
## Harmony converged after 6 iterations
##
## This message is displayed once per session.
We can now visualise the results of the Harmony integration. We will repeat our clustering.
# Run UMAP
harmony_object <- RunUMAP(harmony_object,
reduction = "harmony",
dims = 1:15)
## 19:11:55 UMAP embedding parameters a = 0.9922 b = 1.112
## 19:11:55 Read 5500 rows and found 15 numeric columns
## 19:11:55 Using Annoy for neighbor search, n_neighbors = 30
## 19:11:55 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 19:11:55 Writing NN index file to temp file /var/folders/9s/hsfl94x96vn2g543r67q4vpn6rmr26/T//RtmpUfmyXk/file1654f3e05a46e
## 19:11:55 Searching Annoy index using 1 thread, search_k = 3000
## 19:11:57 Annoy recall = 100%
## 19:11:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 19:11:59 Initializing from normalized Laplacian + noise (using RSpectra)
## 19:11:59 Commencing optimization for 500 epochs, with 224864 positive edges
## 19:11:59 Using rng type: pcg
## 19:12:07 Optimization finished
# Run clustering
harmony_object <- FindNeighbors(harmony_object, reduction = "harmony",
dims = 1:15)
## Computing nearest neighbor graph
## Computing SNN
harmony_object <- FindClusters(harmony_object,
cluster.name = "harmony_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 5500
## Number of edges: 191703
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8768
## Number of communities: 16
## Elapsed time: 0 seconds
# make the plots
harmony_plot <- DimPlot(harmony_object,
reduction = "umap",
group.by = "SampleName") +
ggtitle("Harmony integrated data")
harmony_clusters_plot <- DimPlot(harmony_object, reduction = "umap",
group.by ="harmony_clusters") + ggtitle("Harmony data clusters")
harmony_plot + harmony_clusters_plot
We can see that the cells are now grouped together more by cluster rather than by sample, suggesting that the batch effect has been removed.
We have just carried out the integration using harmony with data processed separately at the sample group level. Try running the integration with data processed at the sample level and see if you get a different result.
SampleName.SCTransform and run
dimensionality reduction on the new layers.IntegrateLayers
function.# Read in our filtered data
seurat_object_sample <- readRDS("RObjects/Filtered.500.rds")
# Split the data by sample name
seurat_object_sample[["RNA"]] <- split(seurat_object_sample[["RNA"]],
f = seurat_object_sample$SampleName)
# Re-process, Seurat will treat each sample separately
seurat_object_sample <- SCTransform(
seurat_object_sample,
assay = "RNA",
vars.to.regress = "percent.mt",
verbose = FALSE
)
seurat_object_sample <- RunPCA(seurat_object_sample,
features = VariableFeatures(object = seurat_object_sample))
## Warning in PrepDR(object = object, features = features, verbose = verbose): The
## following 586 features requested have not been scaled (running reduction
## without them): LYZ, S100A8, ACSM3, CCL5, IGKC, IGLC2, HBD, AHSP, CA1, TYROBP,
## HBM, CCL3, GYPA, GYPB, CCL4, S100A9, KLRB1, TRBC1, ALAS2, CA2, CST3, GNLY,
## SLC4A1, GZMA, FCER1G, HEMGN, MS4A1, ENSG00000227706, JCHAIN, ENSG00000287092,
## ANXA1, LINC02446, CCDC26, LINC03000, GZMB, MNDA, ENSG00000231482, CFD, S100A11,
## TCL1A, CST7, GZMK, ENSG00000229425, SNCA, TMPRSS15, CD3D, IFNG-AS1, CD36, TRDC,
## KLRD1, CD9, RGS1, NELL2, EBF1, TC2N, IGHD, FCRL1, NLGN4X, MS4A6A,
## ENSG00000224610, COL19A1, CLEC7A, SERPINA1, KLHL14, ADARB2, ENSG00000257275,
## SOX5, HBG1, NRG3, MYL4, RHCE, SYT1, ID1, CD3E, HLA-DQA1, LINC01374, RHAG,
## S100A10, ENSG00000287172, LINC02945, ENSG00000288087, NCAM2, CD8B, ZNF385D,
## ANK3, CD7, PLXDC2, LINC01505, TCL1B, RETN, HLA-DPA1, HLA-DRA, XKR4, CCN2, HBG2,
## EPB42, ENSG00000271774, FCGR3A, SOX6, TAFA1, LINC01013, NPY, IGHG3, TNIK, ICOS,
## THEMIS, SEC11C, SAMD3, PPP1R14A, TENM4, KCNQ5, ATP8B4, NCALD, LINC00692,
## ENSG00000287784, HOPX, ADAM28, IGHA1, SCML4, BANK1, ANK1, RPS4Y2, CYGB, TGFBR3,
## GPR183, LINC01871, VPREB3, VPREB1, SCN3A, RUNX3, DACH1, HDAC9, KLRK1, SELENBP1,
## MAFB, KLF1, C16orf74, AREG, GPM6A, CD24, ENSG00000286863, ADAM7-AS1, GYPE,
## PRSS57, PSD3, EXT1, LINC02227, IGLL5, CCR7, EMP1, MIR4435-2HG, ENSG00000257060,
## LINC01811, LIX1-AS1, MDK, CD3G, DOCK4, ENSG00000285534, HLA-DPB1, SLCO3A1,
## STAT4, SHROOM3, PRF1, IFITM3, GPRIN3, IFI44L, FAM30A, PID1, CD8A, C1orf162,
## PTPRS, ENSG00000189229, C1QTNF4, LRRC26, TRGC2, DPEP1, IGHG1, NKAIN2, ITK,
## LRMDA, MS4A7, CD6, PDE4B, COBLL1, APBA2, IL1B, CEBPD, BNC2, HLA-DQB1, KIF14,
## TMTC2, PCAT1, RAB31, STK32B, ENSG00000253980, CCSER1, GRIP1, BASP1, HCK,
## MIR924HG, LINC00861, CSF3R, AHR, SMIM1, ENSG00000291325, SPIB, EPHA4, POU4F1,
## RGS6, MEF2C, PPM1L, CHRM3, PLCL1, TMEM176B, BLNK, RHD, CYBB, NFIA, GZMH,
## GP6-AS1, STX11, BEST3, CYRIA, FCER2, RASGRF2, C1QTNF7-AS1, PELATON, ARL4C,
## NHERF2, BIRC3, MSR1, PTPRK, LINC01922, DLGAP2, ENSG00000254951, RHEX, CYTL1,
## MPZL3, NAPSB, FGR, PTPRM, LINC02397, TSPO2, ADAMTS6, IL21R, ENSG00000290067,
## FRY, ZBTB16, GNG7, CREB5, GRAP2, SPON1, BTLA, BAALC, TRAT1, TMEM117, ARL4A,
## HELB, PCLO, H2BC5, ADARB1, AK5, STAP1, ENSG00000267723, CSF2RA, MYLK, ARHGEF4,
## LINC00623, PKIA, ENSG00000273118, PLD4, RNF144B, SYTL2, SOCS2-AS1, TRABD2A,
## SDC2, CR1L, LINC02363, TNFRSF13C, STEAP1B, GZMM, HLA-DRB1, LILRB1,
## ENSG00000258216, SOX2-OT, ENSG00000289949, DPYD-AS1, LINC02964, LINC01550,
## DTX1, COL24A1, CD2, LINC01163, CRIM1, ENSG00000287292, LHFPL6, ENSG00000260271,
## ENSG00000291034, CDH4, LINC01781, KMO, DGKH, IL10RA, UST, GIMAP4, BLK, SNED1,
## TCL6, PDE7B, SLC4A8, KANK2, PLEK, LY86, PDLIM1, MARCKS, MEF2C-AS1, RNASE6,
## JAML, CD5, EGLN3, MYO10, LINC02202, LCN6, MT1X, SNTB1, CLEC4E, TOX2, NCF1,
## P2RY10, NAALADL2, WDR49, ATRNL1, CCND2, SLFN5, APBB2, PIK3AP1, EGFL7, IRAK3,
## ENSG00000254777, LINC02694, TNFSF8, MAL, ENSG00000253673, LINC01036, C5AR1,
## CD86, ARHGAP42, BICDL1, NFIA-AS2, SPI1, FCRL5, SPTA1, FGD2, GADD45G, SH2D4B,
## PCSK5, CYTIP, LINC01891, TTC39C, LINC01588, RGL1, MPP7, HLA-DMB, FMNL2, CIITA,
## CCDC30, NETO1, SLC25A43, KLF3, PLCH1, CHST15, DEPP1, FAM135A, PTK2, MINPP1,
## SERPINB9, ALCAM, LGALS3, ENSG00000229418, LRRK1, ALOX5, SPON2, CTSC, MTMR8,
## ST3GAL6, LINC02273, ARHGEF12, SNX10, SOCS2, ITGB7, CD22, SCML1, SKI, IGFBP7,
## KCNQ1, LILRB2, BIN2, LARGE-AS1, DHRS3, ZNF608, TLCD4, ENSG00000289298, RAG1,
## WDFY3, TRPS1, ENSG00000255240, RFESD, OVCH2, EPB41L2, CACNA1C, CARD16, GPR132,
## CD28, MDFIC, C12orf42, ENSG00000290853, CD4, PTGER4, DIAPH2-AS1, GATA3,
## CFAP20DC-DT, STK33, FBXO32, TSBP1-AS1, TMEM65, CRMP1, TSPAN32, AGPAT4, ZBP1,
## CARMIL1, ENSG00000197585, EPB41L5, COL5A1, KIF13A, NT5E, TSPAN7, TANC2, ADCY9,
## CD40, TGFBR1, FAM169A, CYSLTR1, TMEM220-AS1, LAMC1, XKR6, ITGB2-AS1,
## ENSG00000290906, TRIM2, KLF3-AS1, ICOSLG, ENSG00000290905, LATS2, PRAM1,
## ENSG00000288891, MTCL1, WDR19, S100A12, CCL17, ENSG00000257764, VCAN, FCN1,
## CXCL8, IFI27, MPO, CSTA, PRTN3, TMCC2, CCL4L2, ENSG00000253693, ELANE, AZU1,
## SLIT2, ENSG00000270240, CD14, PJVK, CCL3L3, PTPRG, ROR1, SPINK2, FGF13, IRF4,
## CA3, MDGA2, RNASE2, HS3ST4, CLIC3, MTUS2, APOC1, CECR2, HTR1F, CEMIP, LINGO2,
## ENSG00000233928, PPP2R2B, TFPI, FCRLA, S100B, UNC79, LINC00114, MPEG1,
## SLC25A21, FGL2, LINC02384, EPHA7, WARS1, PLBD1, FHIP1A, FPR1, XG, MIR3681HG,
## RASD1, RORB, EDA, CYTOR, BMPR1B, MIR4432HG, NCR3, ADGRE2, KRT1, KCNH8,
## TNFSF13B, ENSG00000286481, CMTM2, AFF2, GMPR, CLIC5, FGFBP2, COL23A1, CFAP73,
## PBX1, ITPKB-AS1, GPR65, MAN1C1, TNF, CHODL, FXYD2, PLPP3, ENSG00000286742,
## SORBS2, PTCRA, KYNU, CASP1, COBL, DPP10, DIRC3-AS1, ENSG00000258631, PDGFD,
## DMTN, NETO2, PLCB4, NRN1, MYO1E, TUSC3, SLC11A1, TCFL5, MGST1, ENSG00000253214,
## ENAH, GNG11, GCNT2, ADGRG5, ENSG00000285783, HES6, RIN2, LRRC3B,
## ENSG00000228401, ENSG00000187229, TAFA5, ENSG00000285679, PPM1E, CD34, NCEH1,
## UPP1, C1orf21, IGF2BP2, LDLRAP1, SCN9A, ENSG00000289404, ENSG00000287125, CR1,
## PCBP3, SIT1, ENSG00000286393, MID1, CALD1, UGT3A2
## PC_ 1
## Positive: BACH2, PDE4D, LINC-PINT, PCDH9, AFF3, JUNB, CD74, CD37, B2M, MALAT1
## CD52, PTPRC, SKAP1, BTG1, FOXP1, PRKCB, MAML2, UTRN, ZBTB20, IGHM
## ARHGEF18, RIPOR2, IFITM1, CD79B, CD247, TXNIP, CD69, LTB, IL32, FOXO1
## Negative: TUBA1B, HMGB2, H4C3, TUBB, TOP2A, H2AZ1, PCLAF, STMN1, NUSAP1, HMGN2
## RRM2, TYMS, UBE2C, PTTG1, MKI67, HMGB1, BIRC5, CENPF, HBA1, TUBB4B
## CDK1, PCNA, CKS2, SMC4, CKS1B, DUT, TK1, CCNB2, PRDX2, DIAPH3
## PC_ 2
## Positive: HBA1, IFITM1, S100A4, IL32, S100A6, SRGN, SKAP1, RPS27, FTL, TRAC
## RPS18, RPL21, HBB, RPL13, HBA2, CD247, RPS29, BLVRB, SLC25A37, JUNB
## RPS12, RPS2, RPL34, RPS14, RPL32, RPL12, RPL41, RPS6, TRBC2, RPLP1
## Negative: BACH2, HMGB2, AFF3, H4C3, MME, CD79B, STMN1, TUBA1B, SSBP2, IGHM
## PCDH9, IGLL1, PDE4D, ARPP21, CD74, TOP2A, TUBB, HMGB1, TMEM131L, AKAP12
## PAX5, TCF4, AUTS2, RUBCNL, HMGN2, NUSAP1, RCSD1, NIBAN3, UBE2C, ZCCHC7
## PC_ 3
## Positive: HBA1, HBA2, HBB, SLC25A37, BLVRB, HMBS, FECH, BPGM, PRDX2, PCDH9
## SLC25A39, STRADB, BNIP3L, AFF3, BACH2, FAM210B, SLC2A1, UROD, SSBP2, STOM
## MME, CD79B, CAT, GYPC, MZB1, YBX3, BCL2L1, CASC15, DNTT, ERG
## Negative: TUBA1B, H4C3, HMGB2, S100A4, TOP2A, IFITM1, B2M, TMSB4X, IL32, SRGN
## TUBB, S100A6, JUNB, UBE2C, CRIP1, LGALS1, SKAP1, ACTB, VIM, COTL1
## CD247, NUSAP1, HMGN2, AOAH, IFI30, TRAC, H2AZ1, CTSS, PTPRC, MKI67
## PC_ 4
## Positive: LGALS1, S100A4, IFI30, FTL, S100A6, LST1, CTSS, CCDC200, NAMPT, CD74
## AIF1, SAT1, FTH1, TYMP, RBM47, LYN, NEAT1, SRGN, CD68, PSAP
## ANXA2, CFP, TSPO, KLF4, COTL1, TEX14, ZEB2, GPX1, FOS, GRN
## Negative: SKAP1, BCL11B, IL32, CAMK4, CD247, IFITM1, TRAC, TRBC2, PITPNC1, PRKCH
## RPS27, PCED1B-AS1, LCK, INPP4B, IL7R, LEPROTL1, ZNF331, TXK, MAML2, CD96
## RPS29, RNF125, PDE3B, RPL21, PRKCA, ABLIM1, TCF7, RORA, RPL13, RPL34
## PC_ 5
## Positive: HBA1, HBB, HBA2, SLC25A37, BLVRB, PRDX2, HMBS, TOP2A, ARL6IP1, PTTG1
## CENPF, FECH, JAZF1, MGST3, UBE2C, HMGB2, BACH2, JUNB, CCNB1, CCNB2
## TUBB4B, UROD, CENPE, SLC25A39, SLC2A1, HMMR, FAM210B, CDC20, STOM, NUSAP1
## Negative: GAPDH, RPS2, EEF1A1, RPS18, RPLP1, DNTT, RPL10, NPM1, RPLP0, RPS3A
## RPL39, RPS12, MIF, RPL7, RPL36A, HNRNPA1, RPL3, IGLL1, UHRF1, RPS6
## RPS19, STMN1, HSP90AB1, ENO1, NME2, FABP5, RPL13, PTMA, ACTG1, RPL41
seurat_object_sample <- RunUMAP(seurat_object_sample,
reduction = "pca",
dims = 1:15)
## 19:13:41 UMAP embedding parameters a = 0.9922 b = 1.112
## 19:13:41 Read 5500 rows and found 15 numeric columns
## 19:13:41 Using Annoy for neighbor search, n_neighbors = 30
## 19:13:41 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 19:13:41 Writing NN index file to temp file /var/folders/9s/hsfl94x96vn2g543r67q4vpn6rmr26/T//RtmpUfmyXk/file1654f56c18462
## 19:13:41 Searching Annoy index using 1 thread, search_k = 3000
## 19:13:43 Annoy recall = 100%
## 19:13:44 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 19:13:45 Initializing from normalized Laplacian + noise (using RSpectra)
## 19:13:45 Commencing optimization for 500 epochs, with 221882 positive edges
## 19:13:45 Using rng type: pcg
## 19:13:53 Optimization finished
uncorrected_plot_sample <- DimPlot(seurat_object_sample,
reduction = "umap",
group.by = "SampleName") +
ggtitle("Uncorrected data - Sample")
seurat_object_sample <- FindNeighbors(seurat_object_sample,
reduction = "pca",
dims = 1:15)
## Computing nearest neighbor graph
## Computing SNN
seurat_object_sample <- FindClusters(seurat_object_sample,
cluster.name = "uncorrected_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 5500
## Number of edges: 181101
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8754
## Number of communities: 19
## Elapsed time: 0 seconds
uncorrected_clusters_sample_plot <- DimPlot(seurat_object, reduction = "umap",
group.by ="uncorrected_clusters") + ggtitle("Uncorrected data clusters - Sample")
uncorrected_clusters_sample_plot
# Run Harmony integration
harmony_object_sample <- IntegrateLayers(object = seurat_object_sample,
method = HarmonyIntegration,
orig.reduction = "pca",
new.reduction = "harmony_sample")
## Transposing data matrix
## Using automatic lambda estimation
## Initializing state using k-means centroids initialization
## Harmony 1/10
## Harmony 2/10
## Harmony 3/10
## Harmony 4/10
## Harmony 5/10
## Harmony 6/10
## Harmony converged after 6 iterations
harmony_object_sample <- RunUMAP(harmony_object_sample,
reduction = "harmony_sample",
dims = 1:15)
## 19:14:12 UMAP embedding parameters a = 0.9922 b = 1.112
## 19:14:12 Read 5500 rows and found 15 numeric columns
## 19:14:12 Using Annoy for neighbor search, n_neighbors = 30
## 19:14:12 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 19:14:13 Writing NN index file to temp file /var/folders/9s/hsfl94x96vn2g543r67q4vpn6rmr26/T//RtmpUfmyXk/file1654f3296b02e
## 19:14:13 Searching Annoy index using 1 thread, search_k = 3000
## 19:14:15 Annoy recall = 100%
## 19:14:18 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 19:14:20 Initializing from normalized Laplacian + noise (using RSpectra)
## 19:14:20 Commencing optimization for 500 epochs, with 232812 positive edges
## 19:14:20 Using rng type: pcg
## 19:14:28 Optimization finished
harmony_object_sample_plot <- DimPlot(harmony_object_sample,
reduction = "umap",
group.by = "SampleName") +
ggtitle("Harmony integrated data - Sample")
uncorrected_plot_sample + harmony_object_sample_plot
harmony_object_sample <- FindNeighbors(harmony_object_sample,
reduction = "harmony_sample",
dims = 1:15)
## Computing nearest neighbor graph
## Computing SNN
harmony_object_sample <- FindClusters(harmony_object_sample,
cluster.name = "harmony_sample_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 5500
## Number of edges: 207415
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8744
## Number of communities: 14
## Elapsed time: 0 seconds
harmony_object_sample_clusters_plot <- DimPlot(harmony_object_sample, reduction = "umap",
group.by ="harmony_sample_clusters") + ggtitle("Harmony integrated data clusters - Sample")
uncorrected_clusters_sample_plot + harmony_object_sample_clusters_plot
The theta parameter controls the diversity of datasets
within clusters, with higher values leading to more diverse clusters. If
we have a strong batch effect we may want to increase this value to
encourage more mixing of the batches within clusters. However, if we
have a weaker batch effect or if we are concerned about over-correction
we may want to decrease this value to allow for more batch-specific
clusters. In our data we know we should have some celltypes that are
batch specific so we may want to try a lower value.
# Run Harmony integration with adjusted theta
harmony_object_theta <- IntegrateLayers(object = seurat_object,
method = HarmonyIntegration,
orig.reduction = "pca",
new.reduction = "harmony_theta_01",
theta = 0.1)
## Transposing data matrix
## Using automatic lambda estimation
## Initializing state using k-means centroids initialization
## Harmony 1/10
## Harmony 2/10
## Harmony 3/10
## Harmony converged after 3 iterations
harmony_object_theta <- RunUMAP(harmony_object_theta,
reduction = "harmony_theta_01",
dims = 1:15)
## 19:14:39 UMAP embedding parameters a = 0.9922 b = 1.112
## 19:14:39 Read 5500 rows and found 15 numeric columns
## 19:14:39 Using Annoy for neighbor search, n_neighbors = 30
## 19:14:39 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 19:14:39 Writing NN index file to temp file /var/folders/9s/hsfl94x96vn2g543r67q4vpn6rmr26/T//RtmpUfmyXk/file1654f7735dae0
## 19:14:39 Searching Annoy index using 1 thread, search_k = 3000
## 19:14:41 Annoy recall = 100%
## 19:14:41 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 19:14:43 Initializing from normalized Laplacian + noise (using RSpectra)
## 19:14:43 Commencing optimization for 500 epochs, with 223392 positive edges
## 19:14:43 Using rng type: pcg
## 19:14:50 Optimization finished
harmony_object_theta_plot <- DimPlot(harmony_object_theta,
reduction = "umap",
group.by = "SampleName") +
ggtitle("Harmony integrated data with theta = 0.1")
harmony_object_theta <- FindNeighbors(harmony_object_theta,
reduction = "harmony_theta_01",
dims = 1:15)
## Computing nearest neighbor graph
## Computing SNN
harmony_object_theta <- FindClusters(harmony_object_theta,
cluster.name = "harmony_theta_01_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 5500
## Number of edges: 186686
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8900
## Number of communities: 18
## Elapsed time: 0 seconds
harmony_object_theta_clusters_plot <- DimPlot(harmony_object_theta,
reduction = "umap",
group.by ="harmony_theta_01_clusters") +
ggtitle("Harmony integrated data clusters with theta = 0.1")
harmony_object_theta_plot + harmony_object_theta_clusters_plot
We can visually assess whether the correction has worked by looking at the UMAP plots like the above to see if the cells are grouped together more by cluster rather than by sample, this is an indication that the batch effect has been removed. If we have enough cells we can also look at the distribution of samples within clusters to see if they are more mixed after integration. With our downsampled dataset this is difficult.
We can also make a barplots to check if the clusters have mixed sufficiently.
#Make a barplot of the clusters and samples
data.frame(Cluster = seurat_object$uncorrected_clusters, Sample = seurat_object$SampleName) %>%
ggplot(aes(x = Cluster)) +
geom_bar(aes(fill = Sample), position = "fill") +
labs(title = "Uncorrected data") +
scale_y_continuous(labels = scales::percent)
#Make a barplot of the clusters and samples
data.frame(Cluster = harmony_object_theta$harmony_theta_01_clusters, Sample = harmony_object_theta$SampleName) %>%
ggplot(aes(x = Cluster)) +
geom_bar(aes(fill = Sample), position = "fill") +
labs(title = "Harmony Corrected data") +
scale_y_continuous(labels = scales::percent)
In the corrected barplot we have less clusters and the clusters are more mixed across samples, suggesting that the batch effect has been removed. However, we can still see some clusters that are mostly made up of cells from a single sample. Biologically this is not unexpected so this should reassure us that the correction has done what it should.
From our plots we can see that the cells have been more integrated, however we must also ensure that in doing this we have not removed genuine biological heterogeneity.
We can do this visually by looking at markers of celltypes. For example if cells of different types merge together after integration this may be an indication that the integration has removed some of the biological signal. We can also look at the expression of known marker genes for different cell types to see if they are still expressed in the expected clusters after integration.
# CD79A (B cells)
# CST3 (monocytes)
# CD3D (T cells)
# HBA1 (erythrocytes)
FeaturePlot(seurat_object, features = c("CD79A", "CST3", "CD3D", "HBA1"),
reduction = "umap")
FeaturePlot(harmony_object_theta, features = c("CD79A", "CST3", "CD3D", "HBA1"),
reduction = "umap")
We can see that for all of these celltype markers the expression is still in the expected clusters after integration, suggesting that the integration has not removed too much of the biological signal.
Another method is to use the Adjusted Rand Index. We can
use the adjusted Rand index to quantify the agreement between the
clusterings before and after batch correction.
The adjusted Rand index gives us a measure of the proportion of pairs of cells that have the same relative status in both clusterings - i.e. they were are in the same cluster in both clusterings or they are in different clusters in both clusterings, as oppose to e.g. being in the same cluster in one clustering and different clusters in the other.
Larger indices are more desirable as this indicates that within-batch heterogeneity is preserved, though this must be balanced against the ability of each method to actually perform batch correction.
We can use the pairwiseRand function from the
bluster package. The input is a list of clusterings. We can
test all three of our integrated clusterings against their uncorrected
clustering to see how much they differ.
library(bluster)
# Calculate the adjusted Rand index
# Input is a vector of the clusters
ari.SG <- pairwiseRand(seurat_object$uncorrected_clusters,
harmony_object$harmony_clusters,
mode = "index")
ari.SG
## [1] 0.3776693
ari.SN <- pairwiseRand(seurat_object_sample$uncorrected_clusters,
harmony_object_sample$harmony_sample_clusters,
mode = "index")
ari.SN
## [1] 0.3106236
ari.T <- pairwiseRand(seurat_object$uncorrected_clusters,
harmony_object_theta$harmony_theta_01_clusters,
mode = "index")
ari.T
## [1] 0.5711085
We can see that integrating on samples a gives us a worse Adjusted
Rand Index compared to sample group and decreasing the
theta parameter gives us a better Adjusted Rand Index,
suggesting that the integration with theta = 0.1 has
preserved more of the within batch heterogeneity than the other two
integrations. An ARI of 0.59 is still less that ideal and so we may want
to try other parameter changes or algorithms other than Harmony.
Once we are happy with the integration we can rejoin our dataset for the downstream analyis. We must make sure we specify the correct assay to rejoin, in this case the “RNA” assay. The current active assay is “SCT” but the “RNA” assay contains the original, uncorrected data that we want to use for downstream analyses and is what needs to be rejoined.
# Rejoin the data
harmony_object_theta <- JoinLayers(harmony_object_theta, assay = "RNA")
A warning: Do not use the batch corrected values for gene based downstream analyses like differential expression. Instead, you should use the original, uncorrected data (in the “RNA or”SCT” layers) for gene-based analyses and use the batch-corrected data for visualisation and clustering.
Harmony is often faster for large datasets, whereas MNN is often preferred when strong, non-linear batch effects are present.
sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.7.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bluster_1.20.0 future_1.69.0 patchwork_1.3.2 lubridate_1.9.4
## [5] forcats_1.0.1 stringr_1.6.0 dplyr_1.2.0 purrr_1.2.1
## [9] readr_2.1.6 tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.2
## [13] tidyverse_2.0.0 glmGamPoi_1.22.0 sctransform_0.4.3 Seurat_5.4.0
## [17] SeuratObject_5.3.0 sp_2.2-0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.18.0
## [3] jsonlite_2.0.0 magrittr_2.0.4
## [5] spatstat.utils_3.2-1 farver_2.1.2
## [7] rmarkdown_2.30 vctrs_0.7.1
## [9] ROCR_1.0-12 DelayedMatrixStats_1.32.0
## [11] spatstat.explore_3.7-0 htmltools_0.5.9
## [13] S4Arrays_1.10.1 BiocNeighbors_2.4.0
## [15] SparseArray_1.10.8 sass_0.4.10
## [17] parallelly_1.46.1 KernSmooth_2.23-26
## [19] bslib_0.10.0 htmlwidgets_1.6.4
## [21] ica_1.0-3 plyr_1.8.9
## [23] plotly_4.12.0 zoo_1.8-15
## [25] cachem_1.1.0 igraph_2.2.1
## [27] mime_0.13 lifecycle_1.0.5
## [29] pkgconfig_2.0.3 Matrix_1.7-4
## [31] R6_2.6.1 fastmap_1.2.0
## [33] MatrixGenerics_1.22.0 fitdistrplus_1.2-6
## [35] shiny_1.12.1 digest_0.6.39
## [37] S4Vectors_0.48.0 tensor_1.5.1
## [39] RSpectra_0.16-2 irlba_2.3.7
## [41] GenomicRanges_1.62.1 beachmat_2.26.0
## [43] labeling_0.4.3 progressr_0.18.0
## [45] timechange_0.4.0 spatstat.sparse_3.1-0
## [47] httr_1.4.7 polyclip_1.10-7
## [49] abind_1.4-8 compiler_4.5.2
## [51] withr_3.0.2 S7_0.2.1
## [53] BiocParallel_1.44.0 fastDummies_1.7.5
## [55] MASS_7.3-65 DelayedArray_0.36.0
## [57] tools_4.5.2 lmtest_0.9-40
## [59] otel_0.2.0 httpuv_1.6.16
## [61] future.apply_1.20.1 goftest_1.2-3
## [63] glue_1.8.0 nlme_3.1-168
## [65] promises_1.5.0 grid_4.5.2
## [67] Rtsne_0.17 cluster_2.1.8.1
## [69] reshape2_1.4.5 generics_0.1.4
## [71] gtable_0.3.6 spatstat.data_3.1-9
## [73] tzdb_0.5.0 hms_1.1.4
## [75] data.table_1.18.2.1 XVector_0.50.0
## [77] BiocGenerics_0.56.0 spatstat.geom_3.7-0
## [79] RcppAnnoy_0.0.23 ggrepel_0.9.6
## [81] RANN_2.6.2 pillar_1.11.1
## [83] spam_2.11-3 RcppHNSW_0.6.0
## [85] later_1.4.5 splines_4.5.2
## [87] lattice_0.22-7 survival_3.8-6
## [89] deldir_2.0-4 tidyselect_1.2.1
## [91] miniUI_0.1.2 pbapply_1.7-4
## [93] knitr_1.51 gridExtra_2.3
## [95] IRanges_2.44.0 Seqinfo_1.0.0
## [97] SummarizedExperiment_1.40.0 scattermore_1.2
## [99] RhpcBLASctl_0.23-42 stats4_4.5.2
## [101] xfun_0.56 Biobase_2.70.0
## [103] matrixStats_1.5.0 stringi_1.8.7
## [105] lazyeval_0.2.2 yaml_2.3.12
## [107] evaluate_1.0.5 codetools_0.2-20
## [109] cli_3.6.5 uwot_0.2.4
## [111] xtable_1.8-4 reticulate_1.44.1
## [113] jquerylib_0.1.4 harmony_1.2.4
## [115] dichromat_2.0-0.1 Rcpp_1.1.1
## [117] globals_0.19.0 spatstat.random_3.4-4
## [119] png_0.1-8 spatstat.univar_3.1-6
## [121] parallel_4.5.2 dotCall64_1.2
## [123] sparseMatrixStats_1.22.0 listenv_0.10.0
## [125] viridisLite_0.4.2 scales_1.4.0
## [127] ggridges_0.5.7 rlang_1.1.7
## [129] cowplot_1.2.0