Introduction to single-cell RNA-seq data analysis
17, 24 June, 1 July 2022, 09:30 - 17:30
Taught remotely (Zoom link provided by email)
Bioinformatics Training Facility, University of Cambridge
Instructors
- Abigail Edwards - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Ashley Sawle - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Chandra Chilamakuri - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Katarzyna Kania - Genomics Core, Cancer Research UK Cambridge Institute
- Stephane Ballereau - Cellular Genetics programme, Wellcome Sanger Institute
- Roderik Kortlever - Dept. Biochemistry, University of Cambridge
Helpers:
- Adam Reid - The Gurdon Institute, University of Cambridge
- Hugo Tavares - Bioinformatics Training Facility, University of Cambridge
- Jon Price - The Gurdon Institute, University of Cambridge
- Raquel Manzano Garcia - Cancer Research UK Cambridge Institute
- Tom Smith - MRC Toxicology, University of Cambridge
Outline
This workshop is aimed at biologists interested in learning how to perform standard single-cell RNA-seq analyses.
It focuses on the droplet-based assay by 10x Genomics and covers running the accompanying cellranger pipeline to align reads to a reference genome and count the number of reads per gene, reading the count data into R, quality control, normalisation, data set integration, clustering and identification of cluster marker genes, as well as differential expression and abundance analyses.
You will also learn how to generate common plots for the analysis and visualisation of gene expression data, such as t-SNE, UMAP and violin plots.
Prerequisites
**Some basic experience of using a Unix/Linux command line is assumed.**
**Some R knowledge is assumed and essential. Without it, you will struggle on this course.** If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before attempting these materials. We recommend our Introduction to R course.
Data sets
The course uses two data sets:
- ‘CaronBourque2020’: pediatric leukemia, with four sample types:
  - pediatric Bone Marrow Mononuclear Cells (PBMMCs)
  - three tumour types: ETV6-RUNX1, HHD, PRE-T
- ‘HCA’: adult BMMCs (ABMMCs) obtained from the Human Cell Atlas (HCA)
  - (here downsampled from 25,000 to 5,000 cells per sample)
Schedule
Please note that we may adjust these times as the pace of the course requires.
Day 1
- 09:30 - 09:40 Welcome
- 09:40 - 10:25 Introduction - Katarzyna Kania
- 10:25 - 10:30 5 min break
- 10:30 - 10:40 Preamble: data set and workflow - Ashley Sawle
- 10:40 - 12:30 Library structure, cellranger for alignment and cell calling - Ashley Sawle
- 12:30 - 13:30 lunch break
- 13:30 - 14:00 Loupe browser demo - Roderik Kortlever
- 14:00 - 17:30 QC and exploratory analysis - Chandra Chilamakuri
Day 2
- 09:30 - 09:40 Recap
- 09:40 - 12:30 Normalisation - Stephane Ballereau
- 12:30 - 13:30 lunch break
- 13:30 - 15:25 Feature selection and dimensionality reduction - Chandra Chilamakuri
- 15:25 - 15:35 10 min break
- 15:35 - 17:30 Batch correction and data set integration - Hugo Tavares
Day 3
- 09:30 - 09:40 Recap
- 09:40 - 11:05 Clustering - Ashley Sawle
- 11:05 - 11:15 10 min break
- 11:15 - 12:30 Identification of cluster marker genes - Hugo Tavares
- 12:30 - 13:30 lunch break
- 13:30 - 15:25 Differential expression between conditions - Abigail Edwards
- 15:25 - 15:35 10 min break
- 15:35 - 17:30 Differential abundance between conditions - Abigail Edwards