1 Introduction

In order to aid the interpretation of the clustering results that we covered in the previous section, it is helpful to identify genes that contribute to the separation of cells into those clusters.

The main approach to achieve this, is to identify genes that are differently expressed between clusters. These may be, for example, exclusively expressed in a single cluster or perhaps differentiate between a few different clusters. There are different methods to identify expression differences between clusters: using mean expression level, or the ranking of the gene by its expression, or the proportions of cells that express the gene.

Our main objective in this section is to cover some of the methods that can be used to achieve this goal, and obtain a summary table of results. As always, the OSCA chapter on marker detection contains additional detail and advice.

2 Setup

Before we start, let’s load our packages and read our data in.

# Load packages ----
library(scater)
library(scran)
library(pheatmap)
library(tidyverse) # always load tidyverse after other packages

We will load two SingleCellExperiment objects generated in previous sections:

The SCE with logcounts normalised within each batch generated in the Batch Correction section.
The SCE with batch-corrected data integration using the batchelor::fastMNN() algorithm, including clusters covered in the Clustering section.

# Read data ----

# read object (continued from the previous section)
sce <- readRDS("Robjects/Caron_clustering_material.rds")

Note that we also replace the gene names (rownames) of our objects to use common gene names instead of Ensembl IDs. Using the uniquifyFeatureNames() function makes this safely, avoiding duplicate gene names.

As a reminder, the corrected object contains the cell cluster assignments obtained, for example, with the “louvain” algorithm:

# visualise cluster assignments on the corrected data
plotReducedDim(sce, 
               dimred = "UMAP_corrected",
               colour_by = "louvain15", 
               text_by = "louvain15")

Our objective is to identify genes that distinguish between these clusters. For example genes such as the “CST3” gene, which is a known monocyte marker:

# visualise a previously known marker gene (for monocytes)
plotReducedDim(sce, 
               dimred = "UMAP_corrected",
               colour_by = "CST3", 
               text_by = "louvain15", 
               by_exprs_values = "logcounts")

3 Identifying cluster marker genes

Although we have defined our clusters based on the batch-corrected expression values, these should not be used for for gene-based analyses like marker gene detection. Instead, we should use the uncorrected (normalised) expression values for differential expression between clusters.

This is because data integration algorithms bring cells together based on their overall gene expression, but for each gene individually the data transformation may introduce artificial agreement between batches, which is not ideal for gene-level differential analysis. Also, depending on the parameters we used for data integration (such as the number of nearest neighbours in the fastMNN() method), these biases may be more or less strong.

Therefore, valid assays to use in differential analysis tests are the normalised counts obtained from the deconvolution method (using scuttle::computePooledFactors() + scuttle::logNormCounts()) or from the variance stabilising transformation method (using sctransform::vst()). In our SCE object, we have the normalised counts in the “logcounts” assay, which we can access with assay(sce, "logcounts") (or using the shortcut logcounts(sce)).

3.1 Pairwise Cluster Comparisons

The basic approach for marker gene identification across clusters is to perform statistical tests for each gene between every pair of clusters. The scoreMarkers() function can do this for us, while accounting for known factors (aka “blocking factors” or “blocks”), such as sample batch. The function outputs a list of DataFrames, one for each cluster compared to all others. However, note that the blocking assumes that each pair of clusters is present in at least one of the blocks. If there are two clusters which are not both present in at least one block (in our cases Samples), then that pairwise comparison will by necessity be omitted.

# Marker gene identification ----

# identify marker genes
# by default the function uses "logcounts" as the assay (see help)
markers <- scoreMarkers(sce, 
                        groups = sce$louvain15, 
                        block = sce$SampleName)

# returns a list of length equal to the number of clusters
markers

## List of length 8
## names(8): 1 2 3 4 5 6 7 8

# check the result of a particular cluster
markers[[8]]

## DataFrame with 17486 rows and 19 columns
##            self.average other.average self.detected other.detected
##               <numeric>     <numeric>     <numeric>      <numeric>
## AL627309.5  -0.00174891  -0.000160484    0.00000000    0.000053295
## LINC01409    0.00856616   0.025022073    0.00994609    0.021400943
## LINC01128    0.01758974   0.038739966    0.04440598    0.063123841
## LINC00115    0.00223649   0.005449232    0.00574655    0.008793942
## FAM41C       0.02046140   0.002475268    0.00603382    0.001813279
## ...                 ...           ...           ...            ...
## AL592183.1   0.03077966    0.03820566    0.04119730    3.93866e-02
## AC240274.1   0.01753945    0.00476643    0.00850419    7.49984e-03
## AC004556.3  -0.00696663   -0.00618943    0.00000000    9.97088e-05
## AC233755.1  -0.00142726   -0.00105565    0.00000000    3.99872e-04
## AC007325.4  -0.00632622    0.00131403    0.00000000    1.13196e-03
##            mean.logFC.cohen min.logFC.cohen median.logFC.cohen max.logFC.cohen
##                   <numeric>       <numeric>          <numeric>       <numeric>
## AL627309.5       -0.0250930      -0.0947183         0.00000000       0.0000000
## LINC01409        -0.0883646      -0.2159842        -0.10439837       0.0914996
## LINC01128        -0.0389992      -0.3645822        -0.06031501       0.2576382
## LINC00115        -0.0310189      -0.1190068        -0.00205363       0.0000000
## FAM41C            0.0806768      -0.0393471         0.05932790       0.2624242
## ...                     ...             ...                ...             ...
## AL592183.1       0.00257403      -0.1410224        -0.02410839        0.213209
## AC240274.1       0.03471196      -0.0408433        -0.00757032        0.245649
## AC004556.3       0.00000000       0.0000000         0.00000000        0.000000
## AC233755.1      -0.00664753      -0.0465327         0.00000000        0.000000
## AC007325.4      -0.04817517      -0.1127149        -0.06222675        0.000000
##            rank.logFC.cohen  mean.AUC   min.AUC median.AUC   max.AUC  rank.AUC
##                   <integer> <numeric> <numeric>  <numeric> <numeric> <integer>
## AL627309.5             8095  0.498612  0.494705   0.500000  0.500000      8323
## LINC01409              5770  0.495150  0.486673   0.496049  0.504648      6796
## LINC01128              2202  0.491892  0.445983   0.500158  0.521051      3921
## LINC00115              8095  0.498335  0.492935   0.499910  0.500000      7922
## FAM41C                 2366  0.506311  0.497635   0.505088  0.517595      3098
## ...                     ...       ...       ...        ...       ...       ...
## AL592183.1             3090  0.499645  0.487556   0.501104  0.513193      4770
## AC240274.1             2596  0.504325  0.491410   0.501073  0.526084      3236
## AC004556.3             5605  0.500000  0.500000   0.500000  0.500000      4188
## AC233755.1             5605  0.499677  0.497740   0.500000  0.500000      4188
## AC007325.4             8116  0.495989  0.487764   0.497902  0.500000      9265
##            mean.logFC.detected min.logFC.detected median.logFC.detected
##                      <numeric>          <numeric>             <numeric>
## AL627309.5         -0.09143670          -0.423593           0.000000000
## LINC01409          -0.31705049          -0.847080          -0.263667905
## LINC01128           0.00939697          -1.258733           0.065980291
## LINC00115          -0.10823583          -0.470412          -0.000511868
## FAM41C              0.44105315          -0.194180           0.461420272
## ...                        ...                ...                   ...
## AL592183.1         1.89173e-01       -5.09254e-01           7.51657e-02
## AC240274.1         1.97344e-01       -1.98865e-01          -3.02159e-03
## AC004556.3         5.83158e-17       -5.51650e-17           7.25275e-17
## AC233755.1        -2.93282e-02       -2.05297e-01           0.00000e+00
## AC007325.4        -2.32799e-01       -4.61679e-01          -3.58911e-01
##            max.logFC.detected rank.logFC.detected
##                     <numeric>           <integer>
## AL627309.5        2.38180e-16                8632
## LINC01409         3.34718e-01                6269
## LINC01128         1.35539e+00                1785
## LINC00115         7.67354e-17                7599
## FAM41C            1.06903e+00                2319
## ...                       ...                 ...
## AL592183.1        1.45514e+00                2481
## AC240274.1        1.50036e+00                1227
## AC004556.3        2.38180e-16                4320
## AC233755.1        2.38180e-16                4320
## AC007325.4        2.38180e-16                9276

This DataFrame contains the results for cluster 8. The first four columns contain summary statistics:

self.average - the mean log-expression in the cluster of interest
other.average - the grand mean across all other clusters
self.detected - the proportion of cells with detected expression in the cluster of interest
other.detected - the mean detected proportion across all other clusters.

The remaining columns contain summaries of three scores from the pairwise comparisons. The three scores are:

logFC.cohen - “Cohen’s d” - this is the log fold change of mean gene expression that has been standardized by scaling by the average of the standard deviation across the groups. This can be interpreted in a similar way to log fold change in that a positive value indicates upregulation in the cluster of interest.
AUC - “Area Under the Curve” - this quantifies the ability to distinguish between two gene expression distributions. It can be interpreted as the likelihood that any random cell in the cluster of interest will have a higher expression of the gene than any random cell in the other cluster. It ranges from 0 to 1, where 1 can be interpreted as upregulation, 0 downregulation, and 0.5 as no difference.
logFC.detected - this is the log fold change in the proportion of cells in which the gene is detected in the cluster of interest versus the proportion of cells in which the gene is detected in the other cluster. Positive values indicate that the gene is detected in more cells in the cluster of interest than the other cluster. Note, this takes no account of the magnitude of the gene expression, instead this metric helps to identify presence/absence differences in gene expression between clusters.

More detail on the differences between these effect size scores can be found in the “Advanced” Marker detection chapter of the OSCA book.

Whilst all the pairwise scores can be retrieved by adding the argument full.stats=TRUE to scoreMarkers(), by default this function returns 5 summary statistics for each score:

mean.X - this is the mean of the score across all pairwise comparisons. It gives the relative expression of the gene versus the average of the other clusters.
min.X - this is the minimum score obtained across all pairwise comparisons. This is the most stringent summary statistic for detecting upregulated genes, if the score is high, then the gene is upregulated in the cluster of interest relative to all other clusters. Conversely, if the minimum is low, then the gene is downregulated relative to at least one other cluster.
median.X - this is the median of the score across all pairwise comparisons. It is more robust to outliers than the mean. Whilst this is beneficial in avoiding the situation where the effect size is very large in only a small number of comparisons, it may conversely be detrimental to marker gene selection if there are small number of clusters with very similar effect sizes, as these will be effectively ignored.
max.X - this is the maximum score obtained across all pairwise comparisons. This is the least stringent summary statistic for detecting upregulated genes as a high score only indicates that the gene is upregulated in the cluster of interest relative to at least one other clusters. Conversely, if the maximum is low, then the gene is downregulated relative to all other clusters.
rank.X - This is the minimum ranking (“min-rank”) of that gene by that score across all clusters. For each pairwise comparison the genes are ranked according to the score, this summary provides the lowest rank for that gene across all comparisons. Essentially, a gene with a low “min-rank” will be one of the best genes according to that score in at least one comparison.

The choice of the summary used for ranking will effect the stringency of the selection. See the the OSCA books “Basic” chapter on Marker gene detection for further discussion of these different summaries. In general the mean and median make reasonable defaults for most applications. In practice, the minimum and maximum are most helpful for diagnosing discrepancies between the mean and median, rather than being used directly for ranking.

Selecting genes based on a given min-rank, say 5, is useful as it will generate a list of genes that is the union of genes with a rank of 5 or less for any pairwise comparison. This will ensure we get at least 5 genes that distinguish the cluster of interest from all other clusters.

For example using the min-rank for Cohen’s d on cluster 11 yields 19 marker genes:

# extract results for one of the clusters
c8_markers <- markers[["8"]] %>% 
  as.data.frame()

# look at top-ranked genes
c8_markers %>% 
  select(contains("cohen")) %>%
  filter(rank.logFC.cohen <= 5) %>%
  arrange(rank.logFC.cohen)

##         mean.logFC.cohen min.logFC.cohen median.logFC.cohen max.logFC.cohen
## S100A6         2.9753231      1.37568609          2.9518078        4.731097
## FCGR3A         6.5298023     -0.28670567          1.7947271       25.335142
## CST3           3.9561976      2.89955595          4.1828470        4.589024
## FCER1G         3.2602231      1.82038162          3.6537518        4.252583
## SRGN           2.5778638      1.05560785          2.5423087        4.661317
## LYZ            3.6839303      2.49739592          3.7537947        4.532474
## TMSB4X         1.6271218     -0.03096264          1.9579224        2.906194
## LGALS1         2.9095930      2.13901741          2.9251589        3.552885
## ANXA1          1.9536038      0.73244727          2.1447801        2.740984
## VIM            1.9280077      1.07057708          2.1084486        2.436820
## SMIM25         1.5487054      0.67684848          0.9229746        3.251185
## S100A4         2.6377149      1.33624034          2.5930960        3.645940
## HLA-DRA        0.9305274     -0.63406472          0.3733123        2.658081
## TYROBP         2.8439259      1.63250555          3.0093774        3.423302
##         rank.logFC.cohen
## S100A6                 1
## FCGR3A                 1
## CST3                   1
## FCER1G                 2
## SRGN                   2
## LYZ                    2
## TMSB4X                 2
## LGALS1                 3
## ANXA1                  4
## VIM                    4
## SMIM25                 4
## S100A4                 5
## HLA-DRA                5
## TYROBP                 5

As we suspected, cluster 8 is likely to contain monocytes, based on the expression of CST3 and TYROBP, for example.

We can visualise the expression of some of the other top-ranked genes:

# visualise one of the top genes on the MNN-corrected UMAP
plotReducedDim(sce, 
               dimred = "UMAP_corrected",
               colour_by = "LYZ", 
               text_by = "louvain15")

# visualise the logcounts distribution for this gene
plotExpression(sce, 
               features = "LYZ", 
               x = "louvain15")

The CD3D gene is a known marker for T cells. From visualising this gene’s expression, it seems like it is expressed in clusters 4 and 5 (but mostly in cluster 4):

# Exercise ----

# visualise CD3D (T cell marker)
plotReducedDim(sce, 
               dimred = "UMAP_corrected",
               colour_by = "CD3D", 
               text_by = "louvain15")

plotExpression(sce, 
               features = "CD3D", 
               x = "louvain15")

Fix the code below (where the word “FIXME” appears) to extract the data frame with metrics for this cluster (from our markers object):

# extract results from cluster 4 and convert to data.frame
c4_markers <- FIXME

# filter the data.frame using your choice of ranking statistic
# `rank.logFC.detected` or `rank.logFC.cohen`
# or a combination of both!
c4_markers %>% 
  filter(FIXME)

# visualise the expression of genes that seem interesting from your filters
plotExpression(sce, 
               features = FIXME, 
               x = "louvain15")

Answer

In our code, we start by extracting the marker results for cluster 4 from our list:

# extract results from cluster 4 and convert to data.frame
c4_markers <- markers[["4"]] %>% 
  as.data.frame()

In our case, we chose to then filter our table with two criteria to find genes that:

are in the top 2 ranking for being mostly detected in this cluster (i.e. a presence/absence criteria) AND
have a Cohen’s d in the top 10 (i.e. the average expression difference should be high, relative to the gene’s variance)

We obtain 4 genes with this criteria:

c4_markers %>% 
  filter(rank.logFC.detected <= 2 & rank.logFC.cohen <= 10)

##      self.average other.average self.detected other.detected mean.logFC.cohen
## CD3E     1.259064     0.1705965     0.3672631     0.04722092         1.400344
## CD3D     1.534314     0.1966673     0.3819505     0.04611212         1.640297
## TRAC     1.465896     0.2953109     0.8258313     0.19513233         1.441731
## IL32     1.565295     0.3828596     0.4536128     0.09486134         1.349687
##      min.logFC.cohen median.logFC.cohen max.logFC.cohen rank.logFC.cohen
## CD3E       0.1586294           1.596783        1.831546                5
## CD3D       0.1245614           1.828942        2.276455                1
## TRAC       0.2008197           1.526173        1.915234                8
## IL32      -0.7183806           1.632037        2.048769                2
##       mean.AUC   min.AUC median.AUC   max.AUC rank.AUC mean.logFC.detected
## CD3E 0.7571838 0.5432453  0.7854802 0.8311599       21            3.307209
## CD3D 0.7826552 0.5196454  0.8315106 0.8471660       12            3.231604
## TRAC 0.7528929 0.5491772  0.7822182 0.8315600       26            2.779592
## IL32 0.7256337 0.2972637  0.7942938 0.8307177       26            3.113590
##      min.logFC.detected median.logFC.detected max.logFC.detected
## CD3E          0.1583214              3.838541           4.428539
## CD3D          0.2832248              3.395978           4.774811
## TRAC          0.2792971              3.137827           3.886991
## IL32         -0.3047522              3.729108           4.528497
##      rank.logFC.detected
## CD3E                   1
## CD3D                   1
## TRAC                   2
## IL32                   1

Finally, we visualise the expression of these genes:

plotExpression(sce, 
               features = c("CD3E", "CD3D", "IL32", "TRAC"), 
               x = "louvain15")

We can see all these genes behave in a similar way to our known marker CD3D, suggesting they are good markers for these cell types. This makes sense as CD3D, CD3E and TRAC encode T cell surface proteins. IL32 encodes a citokine, which is often associated with cancer. This result could open an avenue for further investigation in this study.

3.2 Heatmap of marker genes

We have already seen how we can use the plotExpression() function to visualise the distribution of expression in our data between clusters. We have also seen how to use plotReducedDim() to visualise a gene’s expression on the projected reduced dimensionality space.

Another useful type of visualisation is to use heatmaps to show the expression of these genes of interest. We can make two types of heatmap, either showing the expression in individual cells, or averaged across clusters. There are two functions we can use, respectively: plotHeatmap() and plotGroupedHeatmap()

# Heatmaps ----

# select some top genes for cluster 8
c8_top10 <- c8_markers %>% 
  filter(rank.logFC.cohen <= 10)

# heatmap of expression for each cell
plotHeatmap(sce, 
            features = rownames(c8_top10),
            order_columns_by = c("louvain15", "SampleGroup"))

# heatmap of expression with average per cluster
plotGroupedHeatmap(sce, 
                   features = rownames(c8_top10),
                   group = "louvain15",
                   block = "SampleGroup")

In both cases, the colour scale of expression is showing the logcounts in their original scale. However, for this kind of visualisation, it may sometimes be useful to scale the data (aka Z-score), which brings all the genes to the same relative scale.

# heatmap of Z-scores for each cell
plotHeatmap(sce, 
            features = rownames(c8_top10),
            order_columns_by = c("louvain15", "SampleGroup"),
            scale = TRUE, zlim = c(-3, 3))

# heatmap of Z-scores averaged per cluster
plotGroupedHeatmap(sce, 
                   features = rownames(c8_top10),
                   group = "louvain15",
                   block = "SampleGroup",
                   scale = TRUE, zlim = c(-3, 3))

In this case, the colour scale can be interpreted as the number of standard deviations above/below the mean of that gene across all cells.

3.3 Adjusting the log-fold change

The AUC and Cohen’s d scores incorporate both the gene expression differences between the clusters and the variance in gene expression scores within each cluster. If a gene has low variance, it is possible that it will be ranked highly even if the magnitude of the difference between the clusters is low. These genes will not necessarily make good marker genes. It may therefore be desirable to favour the detection of genes with larger log-fold changes. A log-fold change threshold can be set using the lfc= argument in score markers.

For example, in our results from cluster 8, the gene FCGR3A was one of three genes with a min-rank for Cohen’s d of 1:

# LFC threshold ----

# genes with rank 1 in cluster 8
c8_markers %>% 
  filter(rank.logFC.cohen == 1) %>% 
  select(contains("cohen"))

##        mean.logFC.cohen min.logFC.cohen median.logFC.cohen max.logFC.cohen
## S100A6         2.975323       1.3756861           2.951808        4.731097
## FCGR3A         6.529802      -0.2867057           1.794727       25.335142
## CST3           3.956198       2.8995560           4.182847        4.589024
##        rank.logFC.cohen
## S100A6                1
## FCGR3A                1
## CST3                  1

However, we can also see that its LFC goes from -0.3 to 25.3, which is a huge range! Looking at its expression, we can see what migth be going on:

# plot expression of FCGR3A
plotExpression(sce,
               features = "FCGR3A",
               x = "louvain15")

This gene has generally very low variation in expression, and because Cohen’s d measures average differences scaled by variance, the gene comes up as having a high value for that metric in some comparisons.

To make our analysis more restrictive, we can instead indicate to the scoreMarkers() function what is the minimum LFC threshold we want to use to consider a gene for ranking. For example, a LFC > 2:

# run gene marker analysis using a stricter LFC threshold
markers_lfc <- scoreMarkers(sce,
                           groups = sce$louvain15,
                           block = sce$SampleName,
                           lfc = 2)

# extract the results for cluster 8
c8_markers_lfc <- markers_lfc[["8"]] %>% as.data.frame()

RPS18 no longer appears in the candidate cluster marker genes list by min-rank of Cohen’s d.

c8_markers_lfc %>%
  select(contains("cohen")) %>%
  filter(rank.logFC.cohen <= 5)

##        mean.logFC.cohen min.logFC.cohen median.logFC.cohen max.logFC.cohen
## S100A9        1.0044175      0.38590399          1.0198932       1.3040889
## S100A8        0.9287267      0.44646680          0.9404558       1.1803394
## S100A6        0.9566286     -0.03946771          1.0271012       2.1452412
## S100A4        0.9119760     -0.09228689          0.8524573       1.6418924
## TMSB10       -0.9879259     -2.66398544         -1.1741091       0.6156175
## SRGN          0.5135508     -0.76156729          0.5384299       1.8291905
## LYZ           2.1022087      1.19916636          2.2358600       2.7100383
## CST3          1.3078106      0.77699080          1.4468664       1.6238249
## LGALS1        0.9957345      0.47123671          0.9913956       1.3573375
## TMSB4X       -1.1630819     -3.82185076         -0.4259003       1.2681214
##        rank.logFC.cohen
## S100A9                3
## S100A8                4
## S100A6                2
## S100A4                3
## TMSB10                5
## SRGN                  3
## LYZ                   1
## CST3                  2
## LGALS1                3
## TMSB4X                1

In fact it’s min-rank for Cohen’s d has dropped to:

c8_markers_lfc["FCGR3A", c("rank.logFC.cohen")]

## [1] 2970

Note that you could have also eliminated this gene as an interesting marker gene by using not just Cohen’s d, but also the ranking on this gene’s detection:

# we could have also used other ranks to eliminate this gene in the original analysis
c8_markers %>% 
  filter(rank.logFC.cohen == 1) %>% 
  select(contains("rank"))

##        rank.logFC.cohen rank.AUC rank.logFC.detected
## S100A6                1        2                  82
## FCGR3A                1     1042                1115
## CST3                  1        1                  33

You can see that when it comes to detection differences (i.e. presence/absence), this gene comes poorly ranked in the original analysis.

From the OSCA book:

Given that scoreMarkers() already reports effect sizes, it is tempting to take the next step and obtain p-values for the pairwise comparisons. Unfortunately, the p-values from the relevant tests cannot be reliably used to reject the null hypothesis. This is because DE analysis is performed on the same data used to obtain the clusters, which represents “data dredging” (also known as fishing or data snooping). The hypothesis of interest - are there differences between clusters? - is formulated from the data, so we are more likely to get a positive result when we re-use the data set to test that hypothesis.

The main thing to remember is that, in practice, this is a valid approach to help us annotate groups of cells based on the expression of genes with known cell-specificity and to find new interesting genes for further experiments and validation (e.g. using microscopy or qPCR). In other words, identifying cluster marker genes should be taken as a way to generate new hypothesis from our data, rather than a valid statistical model to test for differential expression between cell types.

4 Session information

sessionInfo()

sessionInfo()

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.1               stringr_1.4.0              
##  [3] dplyr_1.0.7                 purrr_0.3.4                
##  [5] readr_2.1.1                 tidyr_1.1.4                
##  [7] tibble_3.1.6                tidyverse_1.3.1            
##  [9] pheatmap_1.0.12             scran_1.20.1               
## [11] scater_1.20.1               ggplot2_3.3.5              
## [13] scuttle_1.2.1               SingleCellExperiment_1.14.1
## [15] SummarizedExperiment_1.22.0 Biobase_2.52.0             
## [17] GenomicRanges_1.44.0        GenomeInfoDb_1.28.4        
## [19] IRanges_2.26.0              S4Vectors_0.30.2           
## [21] BiocGenerics_0.38.0         MatrixGenerics_1.4.3       
## [23] matrixStats_0.61.0         
## 
## loaded via a namespace (and not attached):
##  [1] ggbeeswarm_0.6.0          colorspace_2.0-2         
##  [3] ellipsis_0.3.2            rprojroot_2.0.2          
##  [5] bluster_1.2.1             XVector_0.32.0           
##  [7] BiocNeighbors_1.10.0      fs_1.5.2                 
##  [9] rstudioapi_0.13           farver_2.1.0             
## [11] fansi_1.0.2               lubridate_1.8.0          
## [13] xml2_1.3.3                codetools_0.2-18         
## [15] sparseMatrixStats_1.4.2   knitr_1.37               
## [17] jsonlite_1.7.3            broom_0.7.11             
## [19] cluster_2.1.2             dbplyr_2.1.1             
## [21] compiler_4.1.1            httr_1.4.2               
## [23] dqrng_0.3.0               backports_1.4.1          
## [25] assertthat_0.2.1          Matrix_1.3-4             
## [27] fastmap_1.1.0             cli_3.1.1                
## [29] limma_3.48.3              BiocSingular_1.8.1       
## [31] htmltools_0.5.2           tools_4.1.1              
## [33] rsvd_1.0.5                igraph_1.2.11            
## [35] gtable_0.3.0              glue_1.6.0               
## [37] GenomeInfoDbData_1.2.6    Rcpp_1.0.8               
## [39] cellranger_1.1.0          jquerylib_0.1.4          
## [41] vctrs_0.3.8               DelayedMatrixStats_1.14.3
## [43] xfun_0.29                 beachmat_2.8.1           
## [45] rvest_1.0.2               lifecycle_1.0.1          
## [47] irlba_2.3.5               statmod_1.4.36           
## [49] edgeR_3.34.1              zlibbioc_1.38.0          
## [51] scales_1.1.1              hms_1.1.1                
## [53] RColorBrewer_1.1-2        yaml_2.2.1               
## [55] gridExtra_2.3             sass_0.4.0               
## [57] stringi_1.7.6             highr_0.9                
## [59] ScaledMatrix_1.0.0        BiocParallel_1.26.2      
## [61] rlang_0.4.12              pkgconfig_2.0.3          
## [63] bitops_1.0-7              evaluate_0.14            
## [65] lattice_0.20-44           labeling_0.4.2           
## [67] cowplot_1.1.1             tidyselect_1.1.1         
## [69] here_1.0.1                magrittr_2.0.1           
## [71] R6_2.5.1                  generics_0.1.1           
## [73] metapod_1.0.0             DelayedArray_0.18.0      
## [75] DBI_1.1.2                 pillar_1.6.4             
## [77] haven_2.4.3               withr_2.4.3              
## [79] RCurl_1.98-1.5            modelr_0.1.8             
## [81] crayon_1.4.2              utf8_1.2.2               
## [83] tzdb_0.2.0                rmarkdown_2.11           
## [85] viridis_0.6.2             locfit_1.5-9.4           
## [87] grid_4.1.1                readxl_1.3.1             
## [89] reprex_2.0.1              digest_0.6.29            
## [91] munsell_0.5.0             beeswarm_0.4.0           
## [93] viridisLite_0.4.0         vipor_0.4.5              
## [95] bslib_0.3.1

Introduction to single-cell RNA-seq analysis

Cluster marker genes

Ashley Sawle, Stephane Ballereau, Zeynep Kalender Atak, Hugo Tavares

Jun 2022