Introduction to single-cell RNA-seq data analysis
18 Jan, 25 Jan, 1 Feb || 09:30 - 17:30
Online (Zoom)
Instructors
- Abigail Edwards - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Adam Reid - The Gurdon Institute, University of Cambridge
- Ashley Sawle - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Chandra Chilamakuri - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Hugo Tavares - Bioinformatics Training Facility, University of Cambridge
- Katarzyna Kania - Genomics Core, Cancer Research UK Cambridge Institute
- Roderik Kortlever - Dept. Biochemistry, University of Cambridge
Helpers:
- Jon Price - Biochemistry Department, University of Cambridge
- Jiayin Hong - Biochemistry Department, University of Cambridge
- Ulrika Yuan - Biochemistry Department, University of Cambridge
Outline
This workshop is aimed at biologists interested in learning how to perform standard single-cell RNA-seq analyses.
The course focuses on the droplet-based assay from 10x Genomics. It covers running
the accompanying cellranger pipeline to align reads to a reference genome and
count the number of reads per gene, reading the count data into R, quality control,
normalisation, data set integration, clustering and identification of cluster
marker genes, as well as differential expression and differential abundance analyses.
You will also learn how to generate common plots for the analysis and visualisation
of gene expression data, such as t-SNE, UMAP and violin plots.
Prerequisites
**Some basic experience of using a UNIX/Linux command line is assumed.**
**Some R knowledge is assumed and essential. Without it, you will struggle on this course.** If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before attempting these materials. We recommend our Introduction to R course.
Data set
- The course data is based on ‘CaronBourque2020’, relating to pediatric leukemia, with four sample types:
  - pediatric Bone Marrow Mononuclear Cells (PBMMCs)
  - three tumour types: ETV6-RUNX1, HHD, PRE-T
- The data used in the course can be downloaded from Dropbox (the file is 4.2GB compressed and XXGB when uncompressed, so make sure you have enough space on your computer). Please note that:
  - these data have been processed for teaching purposes and are therefore not suitable for research use;
  - all the data are provided on our training machines, so you don’t need to download them to attend the course.
Schedule
PDF of materials: if you want a PDF version of the materials, use the “Print” option in your browser and select “Print to PDF” (all major browsers have this functionality).
Day 1
Training helpers: Jon Price, Abbi Edwards
Training observers: Jiayin Hong, Ulrika Yuan
- 09:30 - 09:40 Welcome
- 09:40 - 10:25 Introduction to Single Cell Technologies - Katarzyna Kania
- 10:25 - 10:30 Break
- 10:30 - 10:40 Preamble: data set and workflow - Adam Reid
- 10:40 - 12:00 Library structure, cellranger for alignment and cell calling - Adam Reid
- 12:00 - 12:30 Loupe browser demo - Roderik Kortlever
- 12:30 - 13:30 Lunch break
- 13:30 - 17:00 QC and exploratory analysis - Ashley Sawle
Day 2
Training helpers: Jon Price, Ash Sawle
Training observers: Jiayin Hong, Ulrika Yuan
- 09:30 - 09:40 Recap
- 09:40 - 12:30 Normalisation - Adam Reid
- 12:30 - 13:30 Lunch break
- 13:30 - 15:25 Feature selection and dimensionality reduction - Abigail Edwards
- 15:25 - 15:35 10 min break
- 15:35 - 17:30 Batch correction and data set integration - Abigail Edwards
Day 3
Training helpers: Chandra Chilamakuri
Training observers: Jiayin Hong, Ulrika Yuan
- 09:30 - 09:40 Recap
- 09:40 - 11:05 Clustering - Adam Reid
- 11:05 - 11:15 10 min break
- 11:15 - 12:30 Identification of cluster marker genes - Ashley Sawle
- 12:30 - 13:30 Lunch break
- 13:30 - 17:30 Differential Expression and Abundance Analysis - Hugo Tavares
Extended Materials
- Seurat walkthrough:
  - Part 1: Data pre-processing
  - Part 2: Cell clustering and annotation
Software Installation
We will give you access to an online environment with all the necessary software installed. However, if you want to run the analysis on your own computer, you can follow these instructions.
- Download and install R: https://cloud.r-project.org/
- (Windows users only): Download and install RTools: https://cran.r-project.org/bin/windows/Rtools/
- Download and install RStudio: https://www.rstudio.com/products/rstudio/download/#download
- Open RStudio and run the following commands from the console:
install.packages("BiocManager")
BiocManager::install(c("AnnotationHub", "BiocParallel", "BiocSingular", "DropletUtils", "PCAtools", "batchelor", "bluster", "cluster", "clustree", "dynamicTreeCut", "edgeR", "ensembldb", "ggplot2", "igraph", "patchwork", "pheatmap", "scater", "scran", "tidyverse"))
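Once the installation has finished, you may want to confirm that every package can actually be loaded. The following is a minimal sketch rather than part of the official course setup: `missing_packages` is a hypothetical helper name introduced here for illustration, and the package list is copied from the installation command above.

```r
# Packages named in the installation command above.
course_pkgs <- c(
  "AnnotationHub", "BiocParallel", "BiocSingular", "DropletUtils",
  "PCAtools", "batchelor", "bluster", "cluster", "clustree",
  "dynamicTreeCut", "edgeR", "ensembldb", "ggplot2", "igraph",
  "patchwork", "pheatmap", "scater", "scran", "tidyverse"
)

# Hypothetical helper (not from any package): return the subset of `pkgs`
# whose namespace cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_available])
}

# Report the result for the course packages.
not_found <- missing_packages(course_pkgs)
if (length(not_found) > 0) {
  message("Not available: ", paste(not_found, collapse = ", "))
} else {
  message("All course packages are available.")
}
```

If any packages are reported as not available, re-running `BiocManager::install()` with just those names is usually the simplest next step.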
For cellranger, you will need a Linux machine. See the installation instructions from 10x Genomics.
Acknowledgments:
Much of the material in this course has been derived from the demonstrations found in
the OSCA book and the Hemberg Group course materials. Additional material concerning miloR
is based on the demonstration from the Marioni Lab.
The materials have been contributed to by many individuals over the last 2 years, including:
Abigail Edwards, Ashley D Sawle, Chandra Chilamakuri, Kamal Kishore, Stephane Ballereau, Zeynep Kalendar Atak, Hugo Tavares, Jon Price, Katarzyna Kania, Roderik Kortlever, Adam Reid, Tom Smith
Apologies if we have missed anyone!