Alignment and feature counting

March 2024

Single Cell RNAseq Analysis Workflow

10x technology overview

GEM: Gel Bead-In-EMulsion
Millions of GEMs
Each GEM comes with thousands of oligonucleotide sequences
Each oligo sequence has cell barcode + UMI + capture sequence

10x library file structure

The 10x library contains four pieces of information, in the form of DNA sequences, for each “read”.

sample index - identifies the library, with one or two indexes per sample
10x barcode - identifies the droplet in the library
UMI - identifies the transcript molecule within a cell and gene
insert - the transcript molecule

Raw fastq files

The sequences for any given fragment will generally be delivered in 3 or 4 files:

I1: I7 sample index
I2: I5 sample index if present (dual indexing only)
R1: 10x barcode + UMI
R2: insert sequence
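
As a rough illustration of the R1 layout, the sketch below splits an R1 sequence into its barcode and UMI. The default lengths (16 bp barcode followed by a 12 bp UMI) are an assumption corresponding to 10x 3' v3 chemistry; v2 chemistry uses a 10 bp UMI. The function name `split_r1` and the example read are hypothetical.

```python
from typing import NamedTuple


class R1Fields(NamedTuple):
    barcode: str  # 10x barcode (identifies the droplet)
    umi: str      # unique molecular identifier


def split_r1(seq: str, barcode_len: int = 16, umi_len: int = 12) -> R1Fields:
    """Split an R1 sequence into its barcode and UMI parts.

    The default lengths are an assumption (3' v3 chemistry); adjust them
    for other chemistries.
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError("R1 read is shorter than barcode + UMI")
    return R1Fields(seq[:barcode_len], seq[barcode_len:barcode_len + umi_len])


# Made-up read, for illustration only
fields = split_r1("ACGTACGTACGTACGT" + "TTTTGGGGCCCC")
print(fields.barcode, fields.umi)
```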

QC of Raw Reads - FASTQC

QC of Raw Reads - MultiQC

Alignment and counting

The first steps in the analysis of single cell RNAseq data:

Align reads to genome
Annotate reads with feature (gene)
Quantify gene expression
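
The quantification step can be pictured as counting distinct UMIs per cell barcode and gene, so that PCR duplicates of the same molecule are counted once. The following is a toy sketch of that idea only: it assumes reads have already been aligned and annotated, and it ignores the correction of sequencing errors in barcodes and UMIs that real tools perform. The barcodes, gene names and UMIs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_umis(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs for each (cell barcode, gene) pair."""
    seen = defaultdict(set)
    for barcode, gene, umi in records:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Each record is (cell barcode, gene, UMI); all values are hypothetical
toy_reads = [
    ("CELL_A", "GeneX", "AAAC"),
    ("CELL_A", "GeneX", "AAAC"),  # duplicate of the same molecule
    ("CELL_A", "GeneX", "GGTT"),
    ("CELL_B", "GeneX", "AAAC"),
    ("CELL_B", "GeneY", "CCCA"),
]
counts = count_umis(toy_reads)
```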

Cell Ranger

10x Cell Ranger - carries out the alignment and feature counting, and will also:
- Call cells
- Generate a summary report in html format
- Generate a “cloupe” file

Alternative methods include:

STARsolo:
- Generates outputs very similar to Cell Ranger minus the cloupe file and the QC report
- Will run with lower memory requirements in a shorter time than Cell Ranger
Alevin:
- Based on the popular Salmon tool for bulk RNAseq feature counting
- Alevin supports both 10x-Chromium and Drop-seq derived data

Obtaining Cell Ranger

Cell Ranger tools

Cell Ranger includes a number of different tools for analysing scRNAseq data, including:

cellranger mkref - for making custom references
cellranger count - for aligning reads and generating a count matrix
cellranger aggr - for combining multiple samples and normalising the counts

Preparing the raw fastq files

Cell Ranger requires the fastq file names to follow a convention:

<SampleName>_S<SampleNumber>_L00<Lane>_<Read>_001.fastq.gz

e.g. for a single sample we may want:

    SITTA11_S1_L001_I1_001.fastq.gz
    SITTA11_S1_L001_I2_001.fastq.gz
    SITTA11_S1_L001_R1_001.fastq.gz
    SITTA11_S1_L001_R2_001.fastq.gz
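
A file name can be checked against this convention with a regular expression. This is a minimal sketch: the pattern is written from the convention shown above and only allows the I1, I2, R1 and R2 reads discussed here; `parse_fastq_name` is a hypothetical helper name.

```python
import re

# <SampleName>_S<SampleNumber>_L00<Lane>_<Read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[IR][12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Return the name's components, or None if it does not match."""
    match = FASTQ_NAME.match(name)
    return match.groupdict() if match else None


parsed = parse_fastq_name("SITTA11_S1_L001_R2_001.fastq.gz")
```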

Unfortunately, the files we receive from the Genomics server will be named like this:

    SLX-21334.SITTA11.HTLCWDRXY.s_2.i_1.fq.gz
    SLX-21334.SITTA11.HTLCWDRXY.s_2.i_2.fq.gz
    SLX-21334.SITTA11.HTLCWDRXY.s_2.r_1.fq.gz
    SLX-21334.SITTA11.HTLCWDRXY.s_2.r_2.fq.gz
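
One way to derive the required names from the delivered ones is sketched below. It assumes the delivered name has the dot-separated form shown above, with the sample name in the second field and the read type in the last field before `.fq.gz`. The sample number and lane are set to 1 by default to reproduce the example names above; `cellranger_name` is a hypothetical helper.

```python
import re

# e.g. SLX-21334.SITTA11.HTLCWDRXY.s_2.i_1.fq.gz
GENOMICS_NAME = re.compile(
    r"^(?P<pool>[^.]+)\.(?P<sample>[^.]+)\.(?P<flowcell>[^.]+)"
    r"\.s_(?P<lane>\d+)\.(?P<kind>[ir])_(?P<number>[12])\.fq\.gz$"
)


def cellranger_name(name, sample_number=1, lane=1):
    """Translate a delivered file name into the Cell Ranger convention."""
    match = GENOMICS_NAME.match(name)
    if match is None:
        raise ValueError(f"Unrecognised file name: {name}")
    read = match["kind"].upper() + match["number"]  # i_1 -> I1, r_2 -> R2
    return f"{match['sample']}_S{sample_number}_L{lane:03d}_{read}_001.fastq.gz"


new_name = cellranger_name("SLX-21334.SITTA11.HTLCWDRXY.s_2.i_1.fq.gz")
print(new_name)
```

In practice it may be preferable to create symbolic links with the new names rather than renaming the original files, so that the delivered data are left untouched.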

Genome/Transcriptome Reference

As with other aligners, Cell Ranger requires the information about the genome and transcriptome of interest to be provided in a specific format.

Obtain a pre-built reference from the 10x website for human or mouse (or both combined, e.g. for PDX samples)
Build a custom reference with cellranger mkref
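
A `cellranger mkref` call takes a name for the new reference, a genome FASTA file and a gene annotation GTF file. The sketch below only assembles the command line; the reference name and file paths are placeholders, and the exact options should be checked against `cellranger mkref --help` for the installed version.

```python
import shlex


def build_mkref_command(genome_name, fasta, gtf):
    """Assemble (but do not run) a cellranger mkref command line."""
    return [
        "cellranger", "mkref",
        f"--genome={genome_name}",  # name of the output reference directory
        f"--fasta={fasta}",         # genome sequence
        f"--genes={gtf}",           # gene annotation
    ]


# Placeholder names, for illustration only
mkref_cmd = build_mkref_command("my_reference", "genome.fa", "genes.gtf")
print(shlex.join(mkref_cmd))
```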

Running `cellranger count`

Computationally very intensive
High memory requirements
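
Because of these requirements it is usually worth setting the number of cores and the amount of memory explicitly. The following sketch assembles a `cellranger count` command line without running it. The run ID, paths, core count and memory value are placeholders. Recent Cell Ranger releases are understood to also require an explicit `--create-bam` setting, so this is included as an optional argument; check `cellranger count --help` for the installed version.

```python
import shlex
from typing import List, Optional


def build_count_command(
    run_id: str,
    transcriptome: str,
    fastq_dir: str,
    sample: str,
    cores: int = 8,
    mem_gb: int = 64,
    create_bam: Optional[bool] = None,
) -> List[str]:
    """Assemble (but do not run) a cellranger count command line."""
    cmd = [
        "cellranger", "count",
        f"--id={run_id}",                    # name of the output directory
        f"--transcriptome={transcriptome}",  # reference directory
        f"--fastqs={fastq_dir}",             # directory holding the fastq files
        f"--sample={sample}",                # sample name prefix of the fastq files
        f"--localcores={cores}",             # cap on CPU cores
        f"--localmem={mem_gb}",              # cap on memory, in GB
    ]
    if create_bam is not None:
        cmd.append(f"--create-bam={'true' if create_bam else 'false'}")
    return cmd


# Placeholder values, for illustration only
count_cmd = build_count_command("SITTA11", "references/my_reference", "fastq", "SITTA11")
print(shlex.join(count_cmd))
```

The resulting list could be passed to `subprocess.run(count_cmd, check=True)` on a machine where Cell Ranger is installed.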

Cell Ranger outputs

One directory per sample
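
Within each sample directory, the main results sit in an `outs` subdirectory. The sketch below builds the paths to a few of the commonly used outputs (the html report, the cloupe file and the count matrices). It assumes the standard `cellranger count` output layout; the run directory name is a placeholder and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(run_dir):
    """Paths to some key outputs of a cellranger count run (assumed layout)."""
    outs = Path(run_dir) / "outs"
    return {
        "report": outs / "web_summary.html",
        "cloupe": outs / "cloupe.cloupe",
        "filtered_matrix": outs / "filtered_feature_bc_matrix",
        "raw_matrix": outs / "raw_feature_bc_matrix",
    }


outputs = expected_outputs("SITTA11")
for label, path in outputs.items():
    print(label, path.as_posix())
```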

Cell Ranger report

Loupe Browser

Cell Ranger cell calling
