Introduction to Bulk RNA-seq data analysis
17th, 24th February and 3rd March 2022
Taught remotely
Bioinformatics Training, Craik-Marshall Building, Downing Site, University of Cambridge
Instructors
- Abigail Edwards - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Ashley D Sawle - Bioinformatics Core, Cancer Research UK Cambridge Institute
- Jon Price - Miska Group, Gurdon Institute, Cambridge
- Hugo Tavares - Bioinformatics Training Facility, Dept of Genetics
- Chloe Pacyna - Wellcome Sanger Institute
- Sankari Nagarajan - University of Manchester
- Carolin Sauer - Brenton Group, Cancer Research UK Cambridge Institute
- Tom Smith - Bioinformatics Facility, MRC Toxicology Unit, Cambridge
Outline
In this workshop, you will be learning how to analyse RNA-seq data. This will include read alignment, quality control, quantification against a reference, reading the count data into R, performing differential expression analysis, and gene set testing, with a focus on the DESeq2 analysis workflow. You will learn how to generate common plots for analysis and visualisation of gene expression data, such as boxplots and heatmaps.
This workshop is aimed at biologists interested in learning how to perform differential expression analysis of RNA-seq data.
Whilst we have run this course for several years, we are still learning how to
teach it remotely, and this is the first time it has been split over three weeks.
Please bear with us if there are any technical hitches, and
be aware that the timings for different sections laid out in the schedule below may
not be adhered to. We may need to make adjustments to the course
as we go.
Prerequisites
**Some basic experience of using a UNIX/LINUX command line is assumed**
**Some R knowledge is assumed and essential. Without it, you will struggle on this course.** If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before attempting these materials. We recommend our Introduction to R course.
Shared Google Document
This Google Document contains useful information and links.
Please use it to post any questions you have during the course.
The trainers will be monitoring the document and will answer questions as quickly as they can.
Introduce Yourself
There is another Google Document. Please write a couple of sentences there to introduce yourself to the class: tell us a bit about your background and what you hope to get out of this course. If you are a student or member of staff at the University of Cambridge, tell us which Department you are in.
Course etiquette
As this course is being taught online and there are a large number of participants, we will all need to follow a few simple rules to ensure things run as smoothly as possible:
- Please mute your microphone.
- To get help from a tutor, please click the “Raise Hand” button in Zoom. This can be found by clicking on the ‘Participants’ button. A tutor will then contact you in the chat. If necessary, you and the tutor can be moved to a breakout room where you can discuss your issue in more detail.
- Please ask any general question by typing it into the Google Doc mentioned above.
- During practicals, when you are done, please press the green “Yes” button. This way we will know when we can move on.
Timetable
We are still learning how to teach this course remotely, so all times here should be regarded as aspirations.
Day 1
9:30 - 9:45 - Welcome!
9:45 - 10:15 - Introduction to RNAseq Methods - Sankari Nagarajan
10:15 - 11:15 Raw read file format and QC - Jon Price
- Practical (pdf)
- Practical solutions (pdf)
11:15 - 12:45 Short read alignment with HISAT2 - Jon Price
- Practical (pdf)
12:45 - 13:45 Lunch
13:45 - 15:30 QC of alignment - Jon Price
- Practical (pdf)
15:30 - 17:00 Quantification of Gene Expression with Salmon - Ashley D Sawle
- Practical (pdf)
- Practical solutions (pdf)
Day 2
9:30 - 10:15 Introduction to RNAseq Analysis in R - Sankari Nagarajan
10:15 - 12:15 - RNA-seq Data Exploration (pdf) - Abbi Edwards
12:15 - 13:15 Lunch
13:15 - 15:45 Statistical Analysis of Bulk RNAseq Data
- Part I: Statistics of RNA-seq analysis - Ashley D Sawle
- Part II: Linear Models in R and DESeq2 (pdf) - Ashley D Sawle
  - Slides
  - Find the worksheet in Course_Materials/stats/models_in_r_worksheet.R
15:45 - 17:00 - Differential Expression for RNA-seq - Part 1 (pdf) - Abbi Edwards
Day 3
9:30 - 10:00 - Recap of Day 1 and 2 - Ashley D Sawle
10:00 - 12:00 - Differential Expression for RNA-seq - Part 2 (pdf) - Abbi Edwards
12:00 - 12:45 Annotation and Visualisation of RNA-seq results - Ashley D Sawle
12:45 - 13:45 Lunch
13:45 - 15:00 Annotation and Visualisation of RNA-seq results (pdf) - Ashley D Sawle
- practical solutions
- live script live_scripts/DESeq2_part1.R live_scripts/DataExploration.R
15:00 - 17:00 Gene-set testing - Ashley D Sawle
Source Materials for Practicals
The lecture slides and other source materials, including R code and practical solutions, can be found in the course’s GitHub repository.
Extended materials
The Extended Materials contain extensions to some of the sessions and additional materials, including instruction on downloading and processing the raw data for this course, a link to an excellent R course, and where to get further help after the course.
Additional Resources
Acknowledgements
This course is based on the course RNAseq analysis in R prepared by Combine Australia and delivered on 11th and 12th May 2016 in Carlton. We are extremely grateful to the authors for making their materials available: Maria Doyle, Belinda Phipson, Matt Ritchie, Anna Trigos, Harriet Dashnow, Charity Law.
The materials have been rewritten/modified/corrected/updated by various contributors over the past 5 years including:
Abigail Edwards, Ashley D Sawle, Chandra Chilamakuri, Dominique-Laurent Couturier, Guillermo Parada González, Hugo Tavares, Jon Price, Mark Dunning, Mark Fernandes, Oscar Rueda, Sankari Nagarajan, Stephane Ballereau, Zeynep Kalender Atak
Apologies if we have missed anyone!