Introduction to RNA-seq data analysis

19th - 21st May 2020

Taught remotely

Bioinformatics Training, Craik-Marshall Building, Downing Site, University of Cambridge



In this workshop, you will be learning how to analyse RNA-seq data. This will include read alignment, quality control, quantification against a reference, reading the count data into R, performing differential expression analysis, and gene set testing, with a focus on the DESeq2 analysis workflow. You will learn how to generate common plots for analysis and visualisation of gene expression data, such as boxplots and heatmaps.

This workshop is aimed at biologists interested in learning how to perform differential expression analysis of RNA-seq data.

Whilst we have run this course for several years, this is the first time that we will be teaching it remotely. Please bear with us if there are any technical hitches, and be aware that the timings for the different sections laid out in the schedule below may not be adhered to. We may need to make adjustments to the course as we go.

Google Document

There is a Google Document for the course.

It contains useful information and links relevant to the course.

Please use it to post any questions you have during the course. The trainers will be monitoring the document and will answer questions as quickly as they can.

There were over 140 questions during the course! A PDF of the final Google Document can be found here.

Course etiquette

As this course is being taught online and there are a large number of participants, we will all need to follow a few simple rules to ensure things run as smoothly as possible:

  1. Please mute your microphone

  2. To get help from a tutor, please click the “Raise Hand” button in Zoom:

    This can be found by clicking on the ‘Participants’ button. A tutor will then contact you in the chat. If necessary, you and the tutor can be moved to a breakout room where you can discuss your issue in more detail.

  3. Please ask any general questions by typing them into the Google Doc mentioned above

  4. During practicals, when you are done, please press the green “Yes” button:

    This way we will know when we can move on.


As we have not taught this course remotely before, all times here should be regarded as aspirational.

Day 1

9:30 - 9:45 - Welcome!

9:45 - 10:15 - Introduction to RNAseq Methods - Sankari Nagarajan

10:15 - 11:15 Raw read file format and QC - Abbi Edwards
- Introductory slides
- Practical
- Practical solutions

11:15 - 12:30 Short read alignment with HISAT2 - Ashley Sawle
- Introductory slides
- Practical
- Practical solutions

12:30 - 13:30 Lunch

13:30 - 15:00 QC of alignment - Ashley Sawle
- Introductory slides
- Practical
- Practical solutions

15:00 - 16:30 Quantification with SubRead - Abbi Edwards
- Introductory slides
- Practical
- Practical solutions

Day 2

9:30 - 10:00 Introduction to RNAseq Analysis in R - Sankari Nagarajan

10:00 - 12:00 - RNA-seq Pre-processing - Stephane Ballereau
- Practical solutions
- R script from live session

13:00 - 15:00 Statistical Analysis of Bulk RNAseq Data - Dominique-Laurent
- Slides
- Practical (html) (rmd)

15:00 - 16:00 Experimental Design of Bulk RNAseq studies - Sankari Nagarajan
- Slides
- Practical
- Practical Answers

Day 3

9:30 - 12:00 - Differential Expression for RNA-seq - Ashley Sawle
- Practical solutions
- R script from session

12:00 - 13:00 Lunch

13:00 - 15:00 Annotation and Visualisation of RNA-seq results - Abbi Edwards
- Practical solutions
- R script from session

15:00 - 16:00 Gene-set testing - Stephane Ballereau
- Practical solutions


**Some basic experience of using a UNIX/LINUX command line is assumed**

**Some R knowledge is assumed (and is essential). Without it, you will struggle on this course.** If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before attempting these materials. We recommend our Introduction to R course.

Source Materials for Practicals

All of the lecture slides and other source materials, including R code and practical solutions, can be found in the course’s GitHub repository.

Extended materials

The materials linked to from this page are somewhat cut down from the complete course that we normally teach. The Extended Materials contain the full course materials and links to additional RNAseq materials, including instructions on downloading and processing the raw data for this course, a link to an excellent R course, and where to get further help after the course.

Additional Resources


This course is based on the course RNAseq analysis in R prepared by Combine Australia and delivered on May 11/12th 2016 in Carlton. We are extremely grateful to the authors for making their materials available: Maria Doyle, Belinda Phipson, Matt Ritchie, Anna Trigos, Harriet Dashnow, Charity Law.