Description

High-throughput technologies such as next-generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome, and international collaborative efforts such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have begun to catalogue and release data on genomic variation in a variety of cancer types.

However, such datasets pose new challenges in how the data are analyzed, annotated and interpreted. These challenges are not trivial and can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for the analysis of cancer genomes. We describe, and give hands-on experience of, the entire analysis workflow, from raw data generated by a sequencing machine to variant calls (SNVs, copy-number and structural variants) that are ready for downstream analysis and interpretation.

Instructors

We are very grateful for the support of the Bioinformatics Training facility at the University of Cambridge.

Aims

Objectives

Venue

Accommodation

If required, free bed and breakfast accommodation will be provided for attendees in Downing College, close to the course location. Please let us know on the registration form if you need accommodation and when you plan to check in and check out.

Timetable

Day One 09:30 - 17:00

  • 09:30 - 12:30 Course Introductions and R Recap
  • 12:30 - 13:30 LUNCH (provided)

  • 13:30 - 14:30 Understanding raw sequencing reads
  • 14:30 - 17:00 Understanding aligned reads

    Day Two 09:30 - 17:00

  • 09:30 - 10:30 SNV Introduction and calling germline SNVs (Lecture)
  • 10:30 - 12:30 Calling germline variants and working with VCF files
  • 12:30 - 13:30 LUNCH (provided)

  • 13:30 - 17:00 Copy-Number analysis
  • Lecture
  • Practical

    Day Three 09:30 - 17:00

  • 09:30 - 10:00 Somatic SNV calling (Lecture)
  • 10:00 - 10:45 Filtering SNVs (Lecture)
  • 10:45 - 12:30 Visualising and Assessing Somatic SNVs
  • 12:30 - 13:30 LUNCH (provided)

  • 13:30 - 16:00 Annotating, Filtering and Prioritising SNVs
  • 16:00 - 17:00 Further issues and considerations
  • Lecture
  • Practical

    Day Four 09:30 - 17:00

    Index

    Solutions to Exercises

  • 09:30 - 10:30 Introduction to Structural Variants (SV) and methods for calling SVs
  • 10:30 - 12:30 Visualising SVs
  • 12:30 - 13:30 LUNCH (provided)
  • 13:30 - 14:30 Intersecting and filtering SVs
  • 14:30 - 16:00 Understanding and annotating SVs
  • 16:00 - 17:00 Complex rearrangements and detecting "chromothripsis"
  • Bonus: Docker Demo

    WORKSHOP DINNER

    Day Five (1/2 day) 09:30 - 12:30

    Slides

  • Dealing with large collections of Genomes
  • Obtaining TCGA data
  • Mutational signatures
  • Integrating different data types
  • 12:30 - 13:30 LUNCH (provided)

    Software

    Practical exercises can be completed in either RStudio, or Docker

    RStudio instructions

  • Download and install the latest version of RStudio for your operating system: LINK
  • Once RStudio is loaded, install the R packages required for the course by typing the command:-
  • source("https://raw.githubusercontent.com/bioinformatics-core-shared-training/cruk-summer-school-2016/master/installBioCPkgs.R")

    CGPbox instructions

    The Wellcome Trust Sanger Institute have made their cancer genome analysis pipeline available as a Docker container: cgpbox
  • Install Docker for your operating system. Docker for Mac requires OS X 10.10.3 Yosemite or newer. Docker for Windows requires Windows 10.
  • If you do not have OS X 10.10.3 (or newer) or Windows 10, you will have to install Docker Toolbox instead
  • Follow the installation instructions
  • Once Docker is successfully running, you can download the container from the command line:-
  • docker pull quay.io/wtsicgp/cgp_in_a_box
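
The operating-system requirements listed above amount to a small decision rule. A minimal shell sketch of that rule follows. The function names `version_ge` and `docker_flavour` are hypothetical helpers introduced here for illustration; they are not part of Docker or of the course materials, and the OS labels they accept (`Darwin`, `Windows`) are an assumed convention.

```shell
# version_ge A B: succeed when dotted version A is at least B.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# docker_flavour OS VERSION: print which Docker download applies.
# OS is "Darwin" for a Mac or "Windows" (labels assumed for this sketch).
docker_flavour() {
    case "$1" in
        Darwin)
            if version_ge "$2" 10.10.3; then echo "Docker for Mac"
            else echo "Docker Toolbox"; fi ;;
        Windows)
            if version_ge "$2" 10; then echo "Docker for Windows"
            else echo "Docker Toolbox"; fi ;;
        *)
            # Other systems are not covered by the instructions above.
            echo "unknown: check the Docker documentation" ;;
    esac
}
```

On a Mac, the helper could be called as `docker_flavour Darwin "$(sw_vers -productVersion)"`, since `sw_vers -productVersion` prints the OS X version number.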

    CRUK Docker instructions

  • Install docker as above
  • Download the latest version of the docker container with:-
  • docker pull markdunning/summer-school2016
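
Before pulling either container, it can help to confirm that the `docker` command is actually available. The sketch below wraps `docker pull` in a check of that kind. The function name `pull_image` and the `DOCKER` variable are hypothetical, introduced only for this example. The block defines the function without running it.

```shell
# DOCKER names the client executable; it defaults to "docker" and can be
# overridden (an assumption of this sketch, useful for dry runs).
DOCKER="${DOCKER:-docker}"

# pull_image IMAGE: pull IMAGE if the Docker client is on the PATH,
# otherwise print a hint to stderr and return a non-zero status.
pull_image() {
    if ! command -v "$DOCKER" >/dev/null 2>&1; then
        echo "$DOCKER not found on PATH; install Docker first" >&2
        return 1
    fi
    "$DOCKER" pull "$1"
}
```

With Docker installed, `pull_image markdunning/summer-school2016` and `pull_image quay.io/wtsicgp/cgp_in_a_box` would fetch the two containers named in the instructions above.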