Functional impact of indels in cancer
Short exercise: exploring TGF-beta receptor 2 indels in COSMIC
Indel calling tools
How well do somatic indel callers perform?
Some characteristics of indels called in HCC1143
July 2017
Indels range from 1 to 10,000 bases in length, but here we mostly consider small indels of 1-50 bp
Less frequent than SNPs, except in highly repetitive regions such as homopolymers and microsatellites
1000 Genomes Project: loss-of-function variants (including frameshift indels) in each individual [Nature 2010]
340-400 premature stop codons, splice-site disruptions and frameshifts
250-300 genes affected
Colorectal cancer
15% of colorectal tumours characterized by microsatellite instability (MSI), caused by defective mismatch repair
Replication slippage at microsatellites within coding sequences causes frameshift mutations that produce truncated, functionally inactive proteins (TGFBR2 and BAX are frequently targeted genes)
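The frameshift mechanism can be illustrated with a toy coding sequence (invented for illustration, not the real TGFBR2 sequence) that contains an A10 tract: deleting one A shifts the reading frame, and an in-frame stop codon appears much earlier. A minimal Python sketch:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based offset of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_frameshift(length_change):
    """An indel shifts the frame unless its length is a multiple of 3."""
    return length_change % 3 != 0


# Toy sequence: start codon, an A10 tract, then downstream sequence.
cds = "ATG" + "A" * 10 + "TAGCCGCTTAA"
mutant = cds[:3] + cds[4:]  # delete one A from the tract (1-bp deletion)

print(first_stop_codon(cds) // 3 + 1)     # 8: normal stop is the 8th codon
print(first_stop_codon(mutant) // 3 + 1)  # 5: premature stop at the 5th codon
print(is_frameshift(len(mutant) - len(cds)))  # True
```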
In-depth analyses in large cohort studies typically focus on substitutions or copy number aberrations/rearrangements
A recent Sanger Institute study of 560 breast cancers searched non-coding regions for novel indel drivers showing significant recurrence (e.g. in functional regulatory elements) [Nik-Zainal et al., Nature 2016]
Search for TGFBR2 on the COSMIC website
Explore the breakdown of mutation types for samples with TGFBR2 mutations (Distribution tab)
What proportion of samples have frameshift indels?
What size are most of these indels?
Look at the location of the indels within the gene (Gene View tab)
What is immediately striking about this?
Zoom in on the most frequently observed indel and turn on the DNA sequence display
What is the sequence context at this indel?
Click on the c.374delA deletion to access details of the cancers in which this mutation was observed
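Whether an indel is a frameshift comes down to its length modulo 3. As a sketch, the Python below parses a small, simplified subset of HGVS coding-DNA notation (single-position or ranged del, dup and ins, such as c.374delA) and reports the net length change. The regular expression and function names are hypothetical and do not cover the full HGVS specification (intronic offsets, UTR positions and delins are not handled).

```python
import re

# Simplified subset of HGVS c. notation: c.374delA, c.374_376del, c.374dupA,
# c.374_375insAG.
HGVS_INDEL = re.compile(
    r"^c\.(?P<start>\d+)(?:_(?P<end>\d+))?(?P<kind>del|dup|ins)(?P<bases>[ACGT]*)$"
)


def length_change(hgvs):
    """Return the net change in sequence length (negative for deletions)."""
    m = HGVS_INDEL.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported notation: {hgvs}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    if m["kind"] == "ins":
        if not m["bases"]:
            raise ValueError("an insertion must list the inserted bases")
        return len(m["bases"])
    span = end - start + 1  # number of reference bases deleted or duplicated
    return -span if m["kind"] == "del" else span


def is_frameshift(hgvs):
    return length_change(hgvs) % 3 != 0


for variant in ["c.374delA", "c.374_376del", "c.374dupA", "c.374_375insAG"]:
    print(variant, length_change(variant), is_frameshift(variant))
```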
Several tools call both SNVs and indels, particularly those that perform local reassembly or realignment around likely indel sites
Pindel is the somatic indel caller used in the Sanger CGP pipeline
Uses a pattern growth algorithm on unmapped reads from read pairs whose mate is mapped, using the mapped mate as an anchor
Can identify short and medium-sized indels up to 10 kb
A modified version used in the CGP pipeline takes advantage of the additional alignment information available for longer reads (>100 bases) and can make use of split-read alignments
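The split-read idea behind this kind of caller can be sketched in a few lines. This is a deliberately simplified illustration, not Pindel's actual pattern growth implementation: it assumes the start of the read is already anchored at a known reference position (standing in for the position inferred from the mapped mate), requires exact matches, and only looks for deletions. The function name and the `min_suffix` and `max_deletion` parameters are hypothetical.

```python
def split_read_deletion(ref, read, anchor, min_suffix=5, max_deletion=100):
    """Return (start, end) of a deletion in ref (0-based, half-open), or None.

    Simplified split-read logic: extend an exact match from the anchor, then
    look for the unmatched remainder of the read further downstream.
    """
    # 1. Grow the anchored prefix while read and reference agree.
    k = 0
    while k < len(read) and anchor + k < len(ref) and read[k] == ref[anchor + k]:
        k += 1
    suffix = read[k:]
    if len(suffix) < min_suffix:
        return None  # read matches contiguously, or remainder is too short
    # 2. Place the remainder downstream; the skipped reference is the deletion.
    gap_start = anchor + k
    window_end = gap_start + max_deletion + len(suffix)
    pos = ref.find(suffix, gap_start + 1, window_end)
    if pos == -1:
        return None
    return gap_start, pos


ref = "TTGCAG" + "A" * 10 + "CTGCATGGTC"
read = "TTGCAG" + "A" * 9 + "CTGCATGGTC"  # one A missing
print(split_read_deletion(ref, read, anchor=0))  # (15, 16): one A deleted
```

The deletion is reported at the right end of the A10 run because the prefix match greedily consumes as many As as possible; any of the ten positions would explain the read equally well.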
Many indels occur in long homopolymers and short sequence repeats (dinucleotides, trinucleotides, etc.)
It is not easy to distinguish true variants caused by replication slippage from sequencing errors
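Part of the difficulty is that an indel in a repeat has no unique position: a 1-bp deletion anywhere in an A10 run produces the same sequence, so callers usually report a normalized (typically leftmost) representation, as `bcftools norm` does for VCF records. A minimal sketch, with a hypothetical function name, that finds the full range of equivalent start positions for a deletion:

```python
def equivalent_deletion_starts(ref, start, length):
    """For a deletion of ref[start:start + length], return the leftmost and
    rightmost start positions (0-based) that give the same mutant sequence."""
    left = start
    # Shifting left by one is equivalent when the base entering the deleted
    # window equals the base leaving it.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


ref = "TTGCAG" + "A" * 10 + "CTGCATGGTC"
print(equivalent_deletion_starts(ref, 15, 1))  # (6, 15): ten equivalent placements

dinuc = "GG" + "CA" * 4 + "TT"
print(equivalent_deletion_starts(dinuc, 2, 2))  # (2, 8)
```

Seen this way, a slippage variant and a slippage error both simply change the run length by one unit, which helps explain why read evidence alone often cannot tell them apart.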
Benchmark exercises show that indels are harder to call accurately than SNVs
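Benchmarks of this kind typically score a call set against a truth set using precision, recall and F1. A minimal Python sketch, assuming both sets are already normalized to the same representation (for example left-aligned) and keyed as (chrom, pos, ref, alt) tuples; the variants below are invented for illustration. Dedicated comparison tools such as RTG Tools vcfeval or hap.py additionally reconcile representation differences that exact matching misses.

```python
def benchmark(calls, truth):
    """Score exact-match concordance between a call set and a truth set."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and true
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # true but missed
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Invented example variants, keyed as (chrom, pos, ref, alt).
truth = {("chr1", 100, "CA", "C"), ("chr1", 200, "G", "GT"),
         ("chr2", 50, "TAA", "T"), ("chr3", 10, "A", "AT")}
calls = {("chr1", 100, "CA", "C"), ("chr1", 200, "G", "GT"),
         ("chr2", 300, "T", "TA")}
print(benchmark(calls, truth))  # tp=2, fp=1, fn=2, recall=0.5, F1 about 0.57
```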
A total of 208 somatic insertions and 564 somatic deletions were called by Pindel.
272 homopolymer A/T indels (35% of all indels), almost all of which are 1-bp insertions or deletions.
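The 35% figure follows directly from the Pindel counts for HCC1143:

```latex
N_{\text{indel}} = 208 + 564 = 772, \qquad
\frac{N_{\text{A/T homopolymer}}}{N_{\text{indel}}} = \frac{272}{772} \approx 0.352 \approx 35\%
```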