1 Normalisation Practical

In the demonstration we ran the normalisation using just 500 cells per sample. For this practical you will carry out normalisation, but this time using the all of the cells in one sample - ETV6_RUNX_1. Some of the commands will take a little longer to run this time.

2 Load packages

#install.packages('irlba')
library(irlba)
library(knitr)
library(scater)
library(scran)
library(tidyverse)
library(BiocParallel)
library(patchwork)
bpp <- MulticoreParam(7)

2.1 Prepare the data object

First we will load the full filtered data set. This data was previously filtered to remove poor quality cells.

sce <- readRDS("R_objects/Caron_filtered.full.rds")

Now we need to extract just the cells for ETV6_RUNX_1.

etvr1 <- which(sce$SampleName=="ETV6-RUNX1_1")
sce <- sce[, etvr1]

Now that we have reduced the number of cells in the data set, it may be that there are genes in the object that have not been detected in any of the remaining cells. Filter the object to remove any genes that have not been detected.

Hint

Any gene that has not been detected will have total UMI count across all cells of zero. You can use rowSums on the counts matrix of the single cell object to determine the total counts for each each gene. Then keep only rows with total counts greater than 0.

Answer
detected_genes <- rowSums(counts(sce)) > 0
sce <- sce[detected_genes,]

Q1. How many cells and genes are there in the ETV6_RUNX1_1 data set?

Answer
dim(sce)
## [1] 26248  2773

or

sce
## class: SingleCellExperiment 
## dim: 26248 2773 
## metadata(1): Samples
## assays(1): counts
## rownames(26248): ENSG00000238009 ENSG00000241860 ... ENSG00000276345
##   ENSG00000278817
## rowData names(4): ID Symbol Type Chromosome
## colnames(2773): 1_AAACCTGAGACTTTCG-1 1_AAACCTGGTCTTCAAG-1 ...
##   1_TTTGTCAAGGACGAAA-1 1_TTTGTCAGTTCGGCAC-1
## colData names(10): Sample Barcode ... subsets_Mito_percent total
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

There are 2773 cells remaining and 26248 genes detected.

Q2. In how many cells has the gene ORC1 been detected with at least 1 UMI?

Hint

You will need to search the rowData of the single cell experiment object to determine which row of the counts matrix contains the data for the gene ORC1.

You can either use the row name (in this case the Ensembl ID) or the row number to retrieve the counts for ORC1 from the counts matrix. You can then determine how many of these are greater than 0.

Answer
orc1_row <- which(rowData(sce)$Symbol == "ORC1")
sum(counts(sce)[orc1_row, ] > 0)
## [1] 164

2.2 Normalisation by deconvolution

Now normalise the data set using the deconvolution method. You will need to

  1. cluster the cells - remember to set a seed so that your results are reproducible
  2. compute size factors using the deconvolution method
  3. log normalize the counts by applying the size factors
  4. check that the single cell experiment object contains a new “logcounts” assay
Hint

These are the commands you will need to use for each step:

  1. cluster the cells: quickCluster
  2. compute size factors: computePooledFactors
  3. log normalize the counts: logNormCounts
  4. check assay: assayNames
Answer

2.2.1 Cluster cells

set.seed(100) 
clust <- quickCluster(sce, BPPARAM = bpp)

2.2.2 Compute size factors

sce <- computePooledFactors(sce,
             clusters = clust,
             min.mean = 0.1,
             BPPARAM = bpp)

2.2.3 Apply size factors

sce <- logNormCounts(sce)

2.2.4 Check assays

assayNames(sce)
## [1] "counts"    "logcounts"