Introduction to RNA-seq data analysis

27th - 29th March 2019

Bioinformatics Training Room, Craik-Marshall Building, Downing Site, University of Cambridge

Outline

In this workshop, you will learn how to analyse RNA-seq count data using R. This will include reading the data into R, performing quality control, differential expression analysis and gene set testing, with a focus on the DESeq2 analysis workflow. You will also learn how to generate common plots for the analysis and visualisation of gene expression data, such as boxplots and heatmaps.

This workshop is aimed at biologists interested in learning how to perform differential expression analysis of RNA-seq data when reference genomes are available.

Etherpad

There is a course Etherpad. Please post questions here and we will answer them as soon as we can (or, if you can answer someone else's question, please do so!). The trainers may also post useful code snippets here for you.

Timetable

Day 1

9:30 - 10:15 - Introduction to RNAseq Methods - Ashley Sawle

10:15 - 11:00 - Introduction to Alignment and Quantification - Guillermo Parada Gonzalez

11:00 - 12:30 - Practical: QC and Alignment with HISAT2

12:30 - 13:30 - Lunch

13:30 - 17:30 - Practical: Transcriptome assembly and quantification with StringTie

Day 2

9:30 - 10:30 - Normalisation; Quasi-mapping and quantification with Salmon - Guillermo Parada Gonzalez

10:30 - 12:30 - Practical: Mapping and quantification with STAR; Quantification with Salmon

12:30 - 13:30 - Lunch

13:30 - 14:00 - Introduction to RNAseq Analysis in R - Ashley Sawle

14:00 - 14:45 - RNA-seq Pre-processing - Ashley Sawle

14:45 - 17:30 - Linear Model and Statistics for Differential Expression - Dominique-Laurent Couturier

Day 3

9:30 - 12:00 - Differential Expression for RNA-seq - Stephane Ballereau

12:00 - 13:00 - Lunch

13:00 - 15:30 - Annotation and Visualisation of RNA-seq results - Abbi Edwards

15:30 - 17:30 - Gene-set testing - Ashley Sawle

Prerequisites

**Some basic R knowledge is assumed (and is essential). Without it, you will struggle on this course.** If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before attempting these materials. We recommend reading our R crash course before attending, which should take around 1 hour.

Running these materials on your own computer.

Source Materials for Practicals

All of the lecture slides and other source materials, including R code and practical solutions, can be found in the course's GitHub repository.

Supplementary lessons

Introductory R materials:

Additional RNAseq materials:

Data: Example Mouse mammary data (fastq files): https://figshare.com/s/f5d63d8c265a05618137

Additional resources

Bioconductor help
Biostars
SEQanswers

Acknowledgements

This course is based on the course RNAseq analysis in R prepared by Combine Australia and delivered on May 11/12th 2016 in Carlton. We are extremely grateful to the authors for making their materials available: Maria Doyle, Belinda Phipson, Matt Ritchie, Anna Trigos, Harriet Dashnow and Charity Law.