In collaboration with The Gulbenkian Training Programme in Bioinformatics

Overview

High-throughput technologies such as next generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome or to detect the whole set of transcripts that are present in a cell or tissue. However, analyzing, annotating and interpreting such datasets poses new challenges that are not trivial and can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for NGS RNA-seq and ChIP-seq data analysis, which are of major relevance in today’s genomic and gene expression studies.

Methods

The course consists of practical exercises preceded by short lectures. Exercises will be conducted primarily in the R programming language.

Target Audiences

Enthusiastic and motivated wet-lab biologists who want to gain more of an understanding of NGS data and eventually progress to analysing their own data.

Pre-requisites

There is a lot of material to cover in the course, so we will assume that you are familiar with a few basics before you come. We will do most of the analysis in R. There will be a short recap of the key concepts at the beginning of the course; however, it will be beneficial if you are already familiar with how to read data into R, perform basic subset operations and produce simple plots.

Several online videos are available that cover this material.

Some introductory statistics, such as summary statistics for continuous data (mean, variance etc.) and interpreting the results of a t-test, will also be assumed. See "Statistics at Square One", Chapters 1, 2, 3 and 7 (http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one), for a good overview.

Basic Unix skills, such as being able to list the contents of a directory and copy files, would also be an advantage. See "Session 1" of the Software Carpentry training for a Unix introduction (http://bioinformatics-core-shared-training.github.io/shell-novice/).
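As a rough guide to the level expected, the two Unix tasks mentioned above look something like the following. This is only an illustrative sketch: the file name samples.txt and the directory name backup are made up for the example and are not part of the course materials.

```shell
# Work in a scratch directory so the example does not touch real data
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example file (hypothetical name, not a course file)
printf 'sample1\nsample2\n' > samples.txt

# List the contents of the current directory
ls -l

# Copy the file into a new sub-directory
mkdir backup
cp samples.txt backup/

# Check that the copy is there
ls backup
```

If these commands look familiar, you have the Unix background assumed for the course.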

Course Materials

All material is available to download under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence.

Day 1

Morning

  • Course Introduction
  • R Refresher

Afternoon

  • Intro to NGS
  • QC of NGS data (Lecture)
  • QC of NGS data (Practical)
  • QC of NGS data (Quiz)
  • IGV notes

Day 2

Morning

  • Representing NGS data in Bioconductor (Lecture)
  • Representing NGS data in Bioconductor (Practical)

Afternoon

  • Statistical Models for Sequencing Data
  • Introduction to RNA-seq
  • Mapping and Counting for RNA-seq (Practical)

Day 3

Morning

  • Differential Expression for RNA-seq (Practical)

Afternoon

  • Annotation of NGS data in Bioconductor (Lecture)
  • Annotation of NGS data in Bioconductor (Practical)
  • Further RNA-seq analysis (Lecture)
  • Further RNA-seq analysis (Practical)

Day 4

  • See separate page

Docker

  • You can replicate the environment for the course using this Docker file
  • docker.com

Acknowledgements

The materials from this course are based on NGS analysis courses in Cambridge and the MRC Clinical Sciences Centre. Mark would like to thank Oscar Rueda, Bernard Pereira (Caldas group, CRUK Cambridge), Ines de Santiago and Shamith Samarajiwa for making their materials available.