Description
High-throughput technologies such as next-generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome, and international collaborative efforts such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have begun to catalogue and release data on genomic variation in a variety of cancer types.
However, analyzing, annotating and interpreting such datasets poses new, non-trivial challenges that can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for the analysis of cancer genomes. We describe, and give hands-on experience of, the entire analysis workflow, from raw data generated by a sequencing machine to variant calls (SNVs, copy-number and structural variants) that are ready for downstream analysis and interpretation.
Instructors
We are very grateful for the support of the Bioinformatics Training facility at the University of Cambridge.
Aims
- Foster an appreciation of the nature and scale of NGS data and the requirement for sophisticated computational methods
- Describe the theory behind current methods for calling SNVs and copy-number changes from NGS data, and their outputs
- Encourage exploration of NGS data using interactive tools such as the Integrative Genomics Viewer (IGV)
- Increase awareness of existing cancer cohorts and how they can be exploited
Objectives
- Understand the main file formats used for NGS analysis (BAM, VCF, BED, etc.), what each file contains, and the appropriate tools for manipulating each one
- Know the metrics and tools that can be used to assess whether a given sequencing run is of adequate quality for analysis
- Understand the concepts and challenges involved in calling SNVs from whole-genome data
- Given a set of called SNVs, be able to i) assess quantitatively and qualitatively which calls are likely to be "real" and ii) assess which calls might be biologically meaningful and warrant further investigation
- Know how to access TCGA and ICGC data and how they can inform other studies
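As a taste of the file-format objectives above, the sketch below lists the variants in an uncompressed VCF file that have a FILTER value of PASS and a QUAL at or above a chosen threshold. It relies only on the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `filter_pass_variants`, the threshold of 30 and the example records are all made up for illustration and are not taken from the course materials.

```shell
#!/bin/sh
# Sketch: list PASS variants at or above a QUAL threshold from an uncompressed VCF.
# VCF data lines are tab-separated: CHROM POS ID REF ALT QUAL FILTER INFO ...
# filter_pass_variants is a hypothetical helper name, not a course tool.
filter_pass_variants() {
  vcf_path=$1
  min_qual=$2
  awk -F '\t' -v min="$min_qual" '
    /^#/ { next }                       # skip meta-information and header lines
    $7 == "PASS" && $6 != "." && $6 + 0 >= min {
      print $1 ":" $2 " " $4 ">" $5     # one line per variant: CHROM:POS REF>ALT
    }
  ' "$vcf_path"
}

# Made-up example records, for illustration only.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf 'chr1\t12345\t.\tA\tT\t50\tPASS\t.\n' >> "$demo_vcf"
printf 'chr1\t22222\t.\tG\tC\t10\tPASS\t.\n' >> "$demo_vcf"
printf 'chr2\t33333\t.\tC\tG\t80\tLowDepth\t.\n' >> "$demo_vcf"

filter_pass_variants "$demo_vcf" 30   # prints: chr1:12345 A>T
rm -f "$demo_vcf"
```

Only the first record is printed: the second falls below the QUAL threshold and the third did not pass the caller's filters. Real analyses would normally use dedicated tools rather than `awk`, but the exercise shows what each column carries.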
Venue
- Craik-Marshall room, Department of Genetics
Accommodation
If required, free bed and breakfast accommodation will be provided for attendees in Downing College, close to the course venue. Please let us know on the registration form if you need accommodation and when you plan to check in and check out.
Timetable
Day One 09:30 - 17:00
12:30 - 13:30 LUNCH (provided)
Day Two 09:30 - 17:00
12:30 - 13:30 LUNCH (provided)
Day Three 09:30 - 17:00
12:30 - 13:30 LUNCH (provided)
Day Four 09:30 - 17:00
WORKSHOP DINNER
Day Five (1/2 day) 09:30 - 12:30
12:30 - 13:30 LUNCH (provided)
Software
Practical exercises can be completed in either RStudio or Docker.
RStudio instructions
source("https://raw.githubusercontent.com/bioinformatics-core-shared-training/cruk-summer-school-2016/master/installBioCPkgs.R")
CGPbox instructions
The Wellcome Trust Sanger Institute have made their cancer genome analysis pipeline available as a Docker container, cgpbox:
docker pull quay.io/wtsicgp/cgp_in_a_box
CRUK Docker instructions
docker pull markdunning/summer-school2016