10 Differential expression analysis

When we have data for multiple conditions (e.g. control vs treatment, healthy vs diseased, wild-type vs mutant, etc.) and multiple biological replicates for each of them, we can investigate which genes are differentially expressed between conditions. This is similar to what is done in more conventional bulk RNA-seq analysis, but in this case we have the added benefit of being able to do our analysis focusing on particular cell clusters/types (which would not be possible to do in a bulk experiment, unless we had done cell-sorting before the sequencing).

Differential analyses of multi-condition scRNA-seq experiments can be broadly split into two categories:

differential expression (DE) tests for changes in expression between conditions for cells of the same ‘type’ that are present in both conditions.
differential abundance (DA) tests for changes in the composition of cell types (or states, etc.) between conditions.

In this section we will cover the first of these, i.e. how to do a differential expression analysis for a simple design with a single factor (variable) with two levels (conditions). In our case this corresponds to the comparison between cells from healthy (PBMMC) versus leukemia (ETV6-RUNX1) donors.

The standard method for doing this in Seurat is to reuse the FindMarkers function, which we used earlier to find cluster markers genes. However, instead of comparing clusters, they compare conditions by inputing the groups to be compared into the ident.1 and ident.2 arguments.

The main problem with this approach is that it treats cells as replicates and ignores the fact that they are nested within samples (i.e. donors). The pseudoreplicates are not independent from one another and so this approach can lead to an inflated false positive rate (i.e. it will identify many genes as differentially expressed when they are not). Sometimes this is called pvalue inflation.

In order to avoid this problem, we can use a pseudo-bulk approach, which consists of aggregating the counts for all cells of the same type (e.g. cluster) and sample (e.g. donor) into a single ‘pseudo-bulk’ sample, and then running a bulk RNA-seq differential expression analysis on these pseudo-bulk samples. The methods designed for bulk RNA-seq data are appropriate for this type of analysis and we also avoid the previously discussed issue of sparsity of scRNA-seq data, since the pseudo-bulk samples will have much higher counts than individual cells.

There are three main steps for a bulk analysis:

Creating the pseudo-bulk samples
Filter low-count cells/samples
Run the differential analysis

10.1 Setup

We start by loading our packages and the single-cell object that will be used. We will focus our analysis on the set of PBMMC (healthy) and ETV6-RUNX1 (leukemia) samples. These data are the full dataset (rather than those downsamled to 500 cells per sample as we have used previously) and have gone through QC, normalised, batch-corrected and clustered as shown in the previous sections. We have clustered the cells using the Leiden algorithm with k=48, and did some basic cell type annotation based on known markers of blood/immune cells.

library(Seurat)
library(ggplot2)
library(tidyverse)
library(DESeq2)

# Load the data
seurat_object <- readRDS("RObjects/Annotated.full.ETV6.PBMMC.rds")
# plot umap
DimPlot(seurat_object,
        reduction = "umap",
        group.by = "Idents",
        label = TRUE,
        label.size = 5) +
  NoLegend()

10.2 Creating pseudo-bulk samples

Pseudo-bulk samples consist of summing the counts across all the cells for a given sample and cell Idents combination. It is up to you to decide what the cell labels should consist of. In our case, we’re using the Leiden clusters, which were further manually annotated according to cell types. But you may have defined these labels, for example, based on a more systematic cell type annotation using public atlases or reference gene panels (see Chapter 7 - Cell type annotation in the OSCA book).

To revise, here is how our sample + label combinations look like:

# tabulate number of cells per label + sample combination
table(seurat_object$Idents, seurat_object$SampleName)

##                     
##                      ETV6RUNX1-1 ETV6RUNX1-2 ETV6RUNX1-3 ETV6RUNX1-4 PBMMC-1
##   B (c1)                       6        4380          70          16     240
##   B (c12)                      6          76        1580          84       0
##   B (c16)                      3         105           7         850       1
##   B (c22)                      0           2           0           1       8
##   B (c23)                      0          23           8           5       4
##   B (c3)                    2183          14           3           5      28
##   B (c4)                       7          43          81        1438      44
##   B (c5)                     115         327         124         910      60
##   B (c6)                     316         495          61         264      70
##   B (c7)                       3         464          75          12       0
##   CD20+ B (c10)                7         113         464         141      56
##   Erythrocytes (c15)           0           5          80         909       0
##   Erythrocytes (c17)           0           0           3           6       0
##   Erythrocytes (c8)            4          17         259         898      49
##   Monocytes (c14)              5           6           3           9      95
##   Monocytes (c21)              1           3          15          36      17
##   NK T (c11)                  15          83         337         115      34
##   T (c13)                      7          41          49          28       0
##   T (c18)                      0           9          10           2       0
##   T (c19)                     11          61          24          18       1
##   T (c2)                      10         167        1701         228     104
##   T (c20)                      0           0           0           0      24
##   Unknown (c9)                16          99          58          16       0
##                     
##                      PBMMC-2 PBMMC-3
##   B (c1)                 175     596
##   B (c12)                 16      67
##   B (c16)                  8      14
##   B (c22)                 41      28
##   B (c23)                 49      29
##   B (c3)                  76      71
##   B (c4)                  65     181
##   B (c5)                  26     161
##   B (c6)                 106     177
##   B (c7)                   2       9
##   CD20+ B (c10)          448     709
##   Erythrocytes (c15)      25       3
##   Erythrocytes (c17)     935      41
##   Erythrocytes (c8)      941     218
##   Monocytes (c14)        273     420
##   Monocytes (c21)         80     157
##   NK T (c11)             433     314
##   T (c13)                  1       5
##   T (c18)                  1       1
##   T (c19)                 61      41
##   T (c2)                 999     750
##   T (c20)                 97     227
##   Unknown (c9)             4       0

We can see that the contribution of each sample to the different labels (clusters) is quite unbalanced for some of them. This may be biologically correct, or it may prompt us to revise our batch correction step. For now, we will proceed with these data as they are.

We use Idents and SampleName as the variables to aggregate by. We therefore will have a maximum of 7x23 (161) pseudo-samples. We will have less than this because, as we saw, there are some clusters with no cells from a particular sample.

10.3 Pseudo-bulk aggregation

# Aggregate counts by sample and celltype
pseudo_bulk <- AggregateExpression(seurat_object,
                                  group.by = c("Idents", "SampleName"),
                                  assays = "RNA",
                                  return.seurat = TRUE)
# check the new assay
pseudo_bulk

## An object of class Seurat 
## 33869 features across 142 samples within 1 assay 
## Active assay: RNA (33869 features, 0 variable features)
##  2 layers present: counts, data

We end up with 142 pseudo-bulk samples, which are the result of aggregating the counts for all cells of the same sample and cluster.

head(Cells(pseudo_bulk))

## [1] "B (c1)_ETV6RUNX1-1" "B (c1)_ETV6RUNX1-2" "B (c1)_ETV6RUNX1-3"
## [4] "B (c1)_ETV6RUNX1-4" "B (c1)_PBMMC-1"     "B (c1)_PBMMC-2"

It has named each cell by concatenating the sample name and the cluster name, so it is easy for us to tell which pseudosamples we are dealing with.

We can also have a look at the metadata it has retained.

head(pseudo_bulk@meta.data)

##                            orig.ident Idents  SampleName
## B (c1)_ETV6RUNX1-1 B (c1)_ETV6RUNX1-1 B (c1) ETV6RUNX1-1
## B (c1)_ETV6RUNX1-2 B (c1)_ETV6RUNX1-2 B (c1) ETV6RUNX1-2
## B (c1)_ETV6RUNX1-3 B (c1)_ETV6RUNX1-3 B (c1) ETV6RUNX1-3
## B (c1)_ETV6RUNX1-4 B (c1)_ETV6RUNX1-4 B (c1) ETV6RUNX1-4
## B (c1)_PBMMC-1         B (c1)_PBMMC-1 B (c1)     PBMMC-1
## B (c1)_PBMMC-2         B (c1)_PBMMC-2 B (c1)     PBMMC-2

We can see that although it has retained the SampleName and Idents variables, it has not retained the SampleGroup variable, which is the one we will need for our differential analysis.

Like we did in the first section, we can add this variable back to the metadata of the pseudo-bulk object by using the SampleName variable to match it to the original metadata.

# add SampleGroup variable to the metadata of the pseudo-bulk object
temp_metadata <- pseudo_bulk@meta.data %>%
  mutate(SampleGroup = str_remove(SampleName, "-.*")) %>%
  mutate(Cluster = Idents)
pseudo_bulk@meta.data <- temp_metadata
# check the new metadata
head(pseudo_bulk@meta.data)

##                            orig.ident Idents  SampleName SampleGroup Cluster
## B (c1)_ETV6RUNX1-1 B (c1)_ETV6RUNX1-1 B (c1) ETV6RUNX1-1   ETV6RUNX1  B (c1)
## B (c1)_ETV6RUNX1-2 B (c1)_ETV6RUNX1-2 B (c1) ETV6RUNX1-2   ETV6RUNX1  B (c1)
## B (c1)_ETV6RUNX1-3 B (c1)_ETV6RUNX1-3 B (c1) ETV6RUNX1-3   ETV6RUNX1  B (c1)
## B (c1)_ETV6RUNX1-4 B (c1)_ETV6RUNX1-4 B (c1) ETV6RUNX1-4   ETV6RUNX1  B (c1)
## B (c1)_PBMMC-1         B (c1)_PBMMC-1 B (c1)     PBMMC-1       PBMMC  B (c1)
## B (c1)_PBMMC-2         B (c1)_PBMMC-2 B (c1)     PBMMC-2       PBMMC  B (c1)

10.4 Filtering Pseudosamples

Before we run the differential expression analysis, it is important to filter out low-count samples, as these can lead to unreliable results. For example if a pseudosample was created from only 5 cells and another pseudosample was created from 1000 cells, the counts in the first pseudosample will be much lower than those in the second pseudosample, and this can lead to a bias in the differential expression analysis. Looking back at our table of the number of cells per sample and cluster combination, we can see that there are some combinations with very few cells (e.g. cluster 7 has only 3 cell from sample ETV6RUNX1-1). We will filter out any pseudosample that was created from less than 20 cells.

# filter out pseudosamples with less than 20 cells
# determine which pseudosamples have at least than 20 cells
ps_keep <- table(seurat_object$Idents, seurat_object$SampleName) %>%
  as.data.frame() %>%
  filter(Freq >= 20) %>%
  mutate(PseudoSample = str_c(Var1, "_", Var2)) %>%
  pull(PseudoSample)
# subset our pseudo-bulk object to keep only the pseudosamples with at least 20 cells
pseudo_bulk <- subset(pseudo_bulk, cells = ps_keep)

We should also filter out the very lowly detected genes as they can appear more significant that they are due to the low counts and the fact that they are more likely to be detected in one group than the other just by chance. We will filter out any rows in our counts matrix that have less than 5 counts across all pseudosamples.

# filter out genes with less than 5 counts across all pseudosamples
# determine which genes have at least 5 counts across all pseudosamples
genes_keep <- rowSums(pseudo_bulk@assays$RNA$counts) >= 5
# subset our pseudo-bulk object to keep only the genes with at least 5 counts across
pseudo_bulk <- subset(pseudo_bulk, features = names(genes_keep)[genes_keep])

10.5 Differential Testing of the whole dataset

Now that we have our pseudobulk samples we can proceed with our analyis. We could use standard bulk RNA-seq R/Bioconductor packages, such as edgeR, DESeq2 or limma. However the FindMarkers() function in Seurat also supports testing for differential expression using the pseudobulk samples, and it is a convenient way to do this without having to switch to a different package. Instead of using the Wilcoxon rank sum test, which is the default method for testing differential expression in FindMarkers(), we can specify test.use = "DESeq2" to use the DESeq2 method for testing differential expression on the pseudobulk samples.

For this to work we need the Idents of our pseudo-bulk object to be set to the variable that contains the sample group information (i.e. SampleGroup), so that FindMarkers() knows which groups to compare. We can do this by using the Idents() function to set the Idents of our pseudo-bulk object to the SampleGroup variable in the metadata.

Idents(pseudo_bulk) <- pseudo_bulk$SampleGroup

# Run DESeq2 on pseudobulk samples
de <- FindMarkers(pseudo_bulk,
                  ident.1 = "ETV6RUNX1",
                  ident.2 = "PBMMC",
                  test.use = "DESeq2")

## converting counts to integer mode

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

head(de)

##                         p_val avg_log2FC pct.1 pct.2     p_val_adj
## LINC03000       3.738789e-111   6.936124 0.977 0.511 1.044244e-106
## ENSG00000254951  3.174132e-80   5.949381 0.977 0.267  8.865350e-76
## TERF2            1.260590e-77   4.425474 1.000 0.956  3.520828e-73
## ENSG00000286393  1.088189e-76   5.003968 0.977 0.044  3.039312e-72
## KCNN1            1.527405e-76   5.263270 0.977 0.556  4.266042e-72
## ENSG00000285534  1.600099e-76   6.271876 1.000 0.444  4.469077e-72

The results output we get from this analysis is a summarised version of what we would get had we run the same analysis directly with DESeq2. The columns are as follows:

p_val: p-value of the test
avg_log2FC: average log2 fold change between the two groups
pct.1: percentage of pseudobulk samples in group 1 (ETV RUNX1) that have non-zero counts for the gene
pct.2: percentage of pseudobulk samples in group 2 (PBMMC ) that have non-zero counts for the gene
p_val_adj: adjusted p-value for multiple testing

The default statistical test in DeSeq2 is the Wald test, which tests whether the log2 fold change between the two groups is significantly different from zero and the default multiplicity testing correction is a Benjamini-Hochberg correction, which is a false discovery rate (FDR).

10.6 Differential Testing within clusters

This DE analysis is the same as doing a bulk experiment but what we really want to know is how the different sample groups (ETV6-RUNX1 vs PBMMC) differ in terms of their gene expression within each cluster (or celltype). We should also bear in mind that due to the pseudosamples we have after filtering, not all comparisons will be possible for all clusters. For example, cluster 7 has 2 pseudosamples from the ETV6-RUNX1 group and no pseudosamples from the PBMMC group, so we cannot do a DE analysis for this cluster.

In order to do this we need to add more detail to the Idents of our pseudo-bulk object, so that it contains both the sample group and the cluster information. We can do this by concatenating the SampleGroup and Cluster variables in the metadata of our pseudo-bulk object.

# add cluster information to the Idents of our pseudo-bulk object
pseudo_bulk$Cluster_SampleGroup <- str_c(pseudo_bulk$Cluster,
                                         pseudo_bulk$SampleGroup, sep = "_")
Idents(pseudo_bulk) <- pseudo_bulk$Cluster_SampleGroup

Now lets try to look for differentially expressed genes between the ETV6RUNX1 and PBMMC groups within our B cell cluster 6.

The simplicity of the FindMarkers function means it is very easy to run but as with any differential expression analysis, it is important to carefully consider the design of the experiment and the assumptions of the statistical test being used. There are some plots we can draw to reassure ourselves that the assumptions of the test are being met, for example we can draw a PCA plot of our pseudo-bulk samples to see if they cluster by sample group and if there are any outliers that may be driving the results.

# PCA plot of pseudo-bulk samples
c6 <- subset(pseudo_bulk, subset = Cluster == "B (c6)")
c6_counts <-  GetAssayData(c6, assay = "RNA", layer = "counts")
pca <- prcomp(t(c6_counts))
pca_df <- data.frame(pca$x, SampleGroup = c6$SampleGroup)
ggplot(pca_df, aes(x = PC1, y = PC2, color = SampleGroup)) +
  geom_point() +
  theme_classic()

We can run the comparision by specifying the ident.1 and ident.2 arguments in the FindMarkers() function to be the corresponding cluster and sample group combinations.

# Run DESeq2 on pseudobulk samples for cluster 6
de_cluster6 <- FindMarkers(pseudo_bulk,
                           ident.1 = "B (c6)_ETV6RUNX1",
                           ident.2 = "B (c6)_PBMMC",
                           test.use = "DESeq2")

## converting counts to integer mode

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

head(de_cluster6)

##                        p_val avg_log2FC pct.1 pct.2    p_val_adj
## ACSM3           6.300880e-59  -4.080780  0.75     1 1.759836e-54
## MDK             2.159060e-48   3.940974  1.00     1 6.030253e-44
## USP9Y           4.561823e-43  -2.404333  1.00     1 1.274117e-38
## AFF3            3.331994e-42  -3.710056  1.00     1 9.306259e-38
## ENSG00000229839 1.551937e-37   3.198410  1.00     1 4.334561e-33
## ENSG00000273118 2.264600e-37  -2.261956  0.75     1 6.325027e-33

The output dataframe has all the information for all genes tested.

We can draw a MA plot to see if the variance of our data is consistent with the assumptions of the DESeq2 method. An MA plot should have a “trumpet” like shape.

# MA plot of DE results for cluster 6
# calculate mean expression for each gene
mean_expression <- rowMeans(c6_counts)
# add mean expression to DE results
de_cluster6$mean_expression <- mean_expression[rownames(de_cluster6)]
# plot MA plot
ggplot(de_cluster6, aes(x = log2(mean_expression), y = avg_log2FC)) +
  geom_point() +
  theme_classic() +
  xlab("Log2 mean expression") +
  ylab("Log2 fold change") +
  ggtitle("MA plot for cluster 6")

We can also check the pvalue histogram to get an idea of how the dataset behaved in the analysis. If the p-value histogram is uniform (i.e. flat) it suggests that there are no differentially expressed genes between the two groups, while if there is a peak near zero it suggests that there are many differentially expressed genes between the two groups. This is a useful blog article which goes into more detail.

http://varianceexplained.org/statistics/interpreting-pvalue-histogram/

# p-value histogram for DE results for cluster 6
hist(de_cluster6$p_val, breaks = 50, 
     main = "P-value histogram for cluster 6", 
     xlab = "P-value")

The extra distrete peak/s we see are indicative of sparse data, which is common in scRNA-seq data. Since we have already pseudobulked this is like due to not having harsh enough filters on the number of cells per pseudosample or the number of counts per gene.

If we are satisfied with these plots we can have a look at our differentially expressed genes. The avg_log2FC column gives us the log2 fold change between the two groups, so a value of 1 means that the gene is expressed 2 times higher in the ETV6-RUNX1 group compared to the PBMMC group, while a value of -1 means that the gene is expressed 2 times lower in the ETV6-RUNX1 group compared to the PBMMC group. The p_val_adj column gives us the adjusted p-value for multiple testing, so we can use this to determine which genes are significantly differentially expressed between the two groups.

We can filter this dataframe to keep only the genes that are significantly differentially expressed between the two groups, for example by keeping only the genes with an adjusted p-value less than 0.05 and an absolute log2 fold change greater than 1.

# filter the DE results for cluster 6
de_cluster6_sig <- de_cluster6 %>%
  filter(p_val_adj < 0.05 & abs(avg_log2FC) >1)
dim(de_cluster6_sig)

## [1] 429   6

This gives us 429 significantly differentially expressed genes between the ETV6-RUNX1 and PBMMC groups within the B cell cluster 6 with a log fold change greater than 1.

We can draw some expression plots of the one of the top differentially expressed genes to see how it differs across the clusters.

# expression plots for one of the top differentially expressed genes
top_gene <- "MDK"
# counts for the top gene
top_gene_counts <- GetAssayData(pseudo_bulk, 
                                assay = "RNA", 
                                layer = "counts")[top_gene, ]
# plot expression of the top gene across the clusters and sample groups
top_gene_df <- data.frame(Expression = top_gene_counts,
                          Cluster = pseudo_bulk$Cluster,
                          SampleGroup = pseudo_bulk$SampleGroup,
                          SampleName = pseudo_bulk$SampleName)
ggplot(top_gene_df, aes(x = SampleName, y = Expression, color = SampleGroup
)) +
  geom_point() +
  xlab("Cluster") +
  ylab("Expression of MDK") +
  ggtitle("Expression of MDK across clusters and sample groups") +
  facet_wrap(~ Cluster) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

We can also see the expression of this gene on the UMAP plot to see how it differs across the sample groups in the single cell level data.

FeaturePlot(seurat_object, features = top_gene, split.by = "SampleGroup")

10.7 Exercise

Repeat the differential expression analysis for the CD20+ B cell cluster and see how many significantly differentially expressed genes you find between the ETV6-RUNX1 and PBMMC groups with a log fold change greater than 1. Plot one of the differentially expressed gene across the clusters and sample groups to see how it differs across the clusters.

Answer

# Run DESeq2 on pseudobulk samples for cluster 10
de_cluster10 <- FindMarkers(pseudo_bulk,
                           ident.1 = "CD20+ B (c10)_ETV6RUNX1",
                           ident.2 = "CD20+ B (c10)_PBMMC",
                           test.use = "DESeq2")

## converting counts to integer mode

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

# filter the DE results for cluster 10
de_cluster10_sig <- de_cluster10 %>%
  filter(p_val_adj < 0.05 & abs(avg_log2FC) >1)
dim(de_cluster10_sig)

## [1] 13  5

# plot the top differentially expressed gene for cluster 10
top_gene <- rownames(de_cluster10_sig)[1]
top_gene_counts <- GetAssayData(pseudo_bulk, 
                                assay = "RNA", 
                                layer = "counts")[top_gene, ]
# plot expression of the top gene across the clusters and sample groups
top_gene_df <- data.frame(Expression = top_gene_counts,
                          Cluster = pseudo_bulk$Cluster,
                          SampleGroup = pseudo_bulk$SampleGroup,
                          SampleName = pseudo_bulk$SampleName)
ggplot(top_gene_df, aes(x = SampleName, y = Expression, color = SampleGroup
)) +
  geom_point() +
  xlab("Cluster") +
  ylab("Expression of SOCS2") +
  ggtitle("Expression of SOCS2 across clusters and sample groups") +
  facet_wrap(~ Cluster) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Although the changes in the 13 genes we have filtered are significant in our cluster, they are changing more in other clusters. For example, the top differentially expressed gene in cluster 10 is SOCS2, which is also significantly differentially expressed in cluster 6 but with a higher log fold change. This is not unsurprising as cluster 6 and cluster 10 are both B cell clusters, so we would expect them to share some of the same differentially expressed genes between the two groups.

10.8 Other possible testing methods

Seurat’s FindMarkers() function also supports other methods for testing differential expression, which can be specified using the test.use parameter:

“wilcox” : Wilcoxon rank sum test (default, using ‘presto’ package) “wilcox_limma” : Wilcoxon rank sum test (using ‘limma’ package) “bimod” : Likelihood-ratio test for single cell feature expression, (McDavid et al., Bioinformatics, 2013) “roc” : Standard AUC classifier “t” : Student’s t-test “poisson” : Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets “negbinom” : Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets “LR” : Uses a logistic regression framework to determine differentially expressed genes. Constructs a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test. “MAST” : GLM-framework that treates cellular detection rate as a covariate (Finak et al, Genome Biology, 2015) (Installation instructions) “DESeq2” : DE based on a model using the negative binomial distribution (Love et al, Genome Biology, 2014) (Installation instructions) For MAST and DESeq2, please ensure that these packages are installed separately in order to use them as part of Seurat. Once installed, use the test.use parameter can be used to specify which DE test to use.

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.7.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DESeq2_1.50.2               SummarizedExperiment_1.40.0
##  [3] Biobase_2.70.0              MatrixGenerics_1.22.0      
##  [5] matrixStats_1.5.0           GenomicRanges_1.62.1       
##  [7] Seqinfo_1.0.0               IRanges_2.44.0             
##  [9] S4Vectors_0.48.0            BiocGenerics_0.56.0        
## [11] generics_0.1.4              lubridate_1.9.5            
## [13] forcats_1.0.1               stringr_1.6.0              
## [15] dplyr_1.2.0                 purrr_1.2.1                
## [17] readr_2.1.6                 tidyr_1.3.2                
## [19] tibble_3.3.1                tidyverse_2.0.0            
## [21] ggplot2_4.0.2               Seurat_5.4.0               
## [23] SeuratObject_5.3.0          sp_2.2-1                   
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.18.0      jsonlite_2.0.0        
##   [4] magrittr_2.0.4         spatstat.utils_3.2-1   farver_2.1.2          
##   [7] rmarkdown_2.30         vctrs_0.7.1            ROCR_1.0-12           
##  [10] spatstat.explore_3.7-0 S4Arrays_1.10.1        htmltools_0.5.9       
##  [13] SparseArray_1.10.8     sass_0.4.10            sctransform_0.4.3     
##  [16] parallelly_1.46.1      KernSmooth_2.23-26     bslib_0.10.0          
##  [19] htmlwidgets_1.6.4      ica_1.0-3              plyr_1.8.9            
##  [22] plotly_4.12.0          zoo_1.8-15             cachem_1.1.0          
##  [25] igraph_2.2.2           mime_0.13              lifecycle_1.0.5       
##  [28] pkgconfig_2.0.3        Matrix_1.7-4           R6_2.6.1              
##  [31] fastmap_1.2.0          fitdistrplus_1.2-6     future_1.69.0         
##  [34] shiny_1.12.1           digest_0.6.39          patchwork_1.3.2       
##  [37] tensor_1.5.1           RSpectra_0.16-2        irlba_2.3.7           
##  [40] labeling_0.4.3         progressr_0.18.0       spatstat.sparse_3.1-0 
##  [43] timechange_0.4.0       httr_1.4.8             polyclip_1.10-7       
##  [46] abind_1.4-8            compiler_4.5.2         withr_3.0.2           
##  [49] S7_0.2.1               BiocParallel_1.44.0    fastDummies_1.7.5     
##  [52] MASS_7.3-65            DelayedArray_0.36.0    tools_4.5.2           
##  [55] lmtest_0.9-40          otel_0.2.0             httpuv_1.6.16         
##  [58] future.apply_1.20.1    goftest_1.2-3          glue_1.8.0            
##  [61] nlme_3.1-168           promises_1.5.0         grid_4.5.2            
##  [64] Rtsne_0.17             cluster_2.1.8.2        reshape2_1.4.5        
##  [67] gtable_0.3.6           spatstat.data_3.1-9    tzdb_0.5.0            
##  [70] data.table_1.18.2.1    hms_1.1.4              XVector_0.50.0        
##  [73] spatstat.geom_3.7-0    RcppAnnoy_0.0.23       ggrepel_0.9.6         
##  [76] RANN_2.6.2             pillar_1.11.1          spam_2.11-3           
##  [79] RcppHNSW_0.6.0         later_1.4.6            splines_4.5.2         
##  [82] lattice_0.22-9         survival_3.8-6         deldir_2.0-4          
##  [85] tidyselect_1.2.1       locfit_1.5-9.12        miniUI_0.1.2          
##  [88] pbapply_1.7-4          knitr_1.51             gridExtra_2.3         
##  [91] scattermore_1.2        xfun_0.56              stringi_1.8.7         
##  [94] lazyeval_0.2.2         yaml_2.3.12            evaluate_1.0.5        
##  [97] codetools_0.2-20       cli_3.6.5              uwot_0.2.4            
## [100] xtable_1.8-4           reticulate_1.45.0      jquerylib_0.1.4       
## [103] dichromat_2.0-0.1      Rcpp_1.1.1             globals_0.19.0        
## [106] spatstat.random_3.4-4  png_0.1-8              spatstat.univar_3.1-6 
## [109] parallel_4.5.2         dotCall64_1.2          listenv_0.10.0        
## [112] viridisLite_0.4.3      scales_1.4.0           ggridges_0.5.7        
## [115] rlang_1.1.7            cowplot_1.2.0

Introduction to single-cell RNA-seq analysis

Differential expression analysis with Seurat