Description

High-throughput technologies such as next-generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome, and international collaborative efforts such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have begun to catalogue and release data on genomic variation in a variety of cancer types.

However, such datasets pose new challenges in how the data are analyzed, annotated and interpreted. These challenges are not trivial and can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for the analysis of cancer genomes. We describe, and give hands-on experience of, the entire analysis workflow, from raw data generated by a sequencing machine to variant calls (SNVs, copy-number and structural variants) that are ready for downstream analysis and interpretation.

Instructors

We are very grateful for the support of the Bioinformatics Training facility at the University of Cambridge.

Aims

Appreciation of the nature and scale of NGS data and the requirement for sophisticated computational methods
Describe the theory behind current methods for calling SNVs and copy-number changes from NGS data, and their outputs
Encourage exploration of NGS data using interactive tools such as the Integrative Genomics Viewer (IGV)
Increase awareness of existing cancer cohorts and how they can be exploited

Objectives

Understand the main file formats used for NGS analysis (bam, vcf, bed, etc.), what each file contains, and the appropriate tools for manipulating it
Know the metrics and tools that can be used to assess whether a given sequencing run is of adequate quality for analysis
Understand the concepts and challenges involved in calling SNVs from whole-genome data
Given a set of called SNVs, be able to i) assess quantitatively and qualitatively which calls might be "real" or not, and ii) assess which calls might be biologically meaningful and warrant further investigation
Know how to access TCGA and ICGC data and how they can inform other studies

Venue

Craik-Marshall room, Department of Genetics

Accommodation

If required, free bed and breakfast accommodation will be provided for attendees in Downing College, close to the course venue. Please let us know on the registration form if you need accommodation and when you plan to check in and check out.

Timetable

Day One 09:30 - 17:00

09:30 - 12:30 Course Introductions and R Recap:-

Course Introduction

R Recap

Introduction to sequencing data

12:30 - 13:30 LUNCH (provided)

13:30 - 14:30 Understanding raw sequencing reads:-

Hands-on fastq Practical

QA of reads

14:30 - 17:00 Understanding aligned reads:-

Introduction

Introduction to IGV

Hands-on with bam files Practical

Day Two 09:30 - 17:00

09:30 - 10:30 SNV Introduction and calling germline SNVs (Lecture)

10:30 - 12:30 Calling germline variants and working with VCF files

12:30 - 13:30 LUNCH (provided)

13:30 - 17:00 Copy-Number analysis

Lecture

Practical

Day Three 09:30 - 17:00

09:30 - 10:00 Somatic SNV calling (Lecture)

10:00 - 10:45 Filtering SNVs (Lecture)

10:45 - 12:30 Visualising and Assessing Somatic SNVs

12:30 - 13:30 LUNCH (provided)

13:30 - 16:00 Annotating, Filtering and Prioritising SNVs

16:00 - 17:00 Further issues and considerations

Lecture

Practical

Day Four 09:30 - 17:00

Index

Solutions to Exercises

09:30 - 10:30 Introduction to Structural Variants (SV) and methods for calling SVs

10:30 - 12:30 Visualising SVs

12:30 - 13:30 LUNCH (provided)

13:30 - 14:30 Intersecting and filtering SVs

14:30 - 16:00 Understanding and annotating SVs

16:00 - 17:00 Complex rearrangements and detecting "chromothripsis"

Bonus: Docker Demo

WORKSHOP DINNER

Day Five (1/2 day) 09:30 - 12:30

Slides

Dealing with large collections of Genomes

Obtaining TCGA data

Mutational signatures

Integrating different data types

12:30 - 13:30 LUNCH (provided)

Software

Practical exercises can be completed in either RStudio or Docker.

RStudio instructions

Download and install the latest version of RStudio for your operating system: LINK

Once RStudio is loaded, install the R packages required for the course by typing the command:-


  source("https://raw.githubusercontent.com/bioinformatics-core-shared-training/cruk-summer-school-2016/master/installBioCPkgs.R")

CGPbox instructions

The Wellcome Trust Sanger Institute have made their cancer genome analysis pipeline available as a Docker container, cgpbox.

Install Docker for your operating system. Docker for Mac requires OS X 10.10.3 Yosemite or newer. Docker for Windows requires Windows 10.

If you do not have OS X 10.10.3 (or newer) or Windows 10, you will have to install Docker Toolbox instead.

Follow the installation instructions

Once Docker is successfully running, you can download the cgpbox container by running the following at the command line:-

docker pull quay.io/wtsicgp/cgp_in_a_box

CRUK Docker instructions

Install Docker as above

Download the latest version of the docker container with:-

docker pull markdunning/summer-school2016
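
After pulling, it can be useful to confirm that the images are actually available locally before the course starts. The following is a minimal sketch, not part of the official course instructions: the function names have_cmd and check_images are hypothetical helpers introduced here for illustration, and the script assumes that the two image names above are the ones you want to check. It relies only on the standard "docker images -q" listing command.

```shell
#!/bin/sh
# Sketch: report whether the two course images have been pulled.
# The image names are copied from the instructions above; the helper
# functions are illustrative and not part of the course material.
IMAGES="quay.io/wtsicgp/cgp_in_a_box markdunning/summer-school2016"

# Return success if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per image, or a hint if docker is not installed.
check_images() {
    if ! have_cmd docker; then
        echo "docker not found on PATH; install Docker first"
        return 1
    fi
    for image in $IMAGES; do
        # 'docker images -q NAME' prints an image ID only if NAME exists locally.
        # An empty result is also reported as "missing" if the Docker daemon
        # is not running.
        if [ -n "$(docker images -q "$image" 2>/dev/null)" ]; then
            echo "present: $image"
        else
            echo "missing: $image (run: docker pull $image)"
        fi
    done
}

check_images || true
```

An image reported as "missing" either has not been pulled yet or cannot be listed because Docker is not running, so it is worth checking both before re-running the pull command.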

Analysis of Cancer Genomes

Cancer Research UK Bioinformatics Summer School: Cambridge, 25th - 29th July 2016
