CRUK Summer School 2017

The biology underlying somatic structural variation

Lecture 1

Exercise 1: Understanding the relationship between SVs and short read sequencing

How might the following double-stranded break repair mechanisms manifest in short read sequence data aligned to reference genome at the breakpoint?
- Homologous recombination
- Non-homologous end-joining
How would the following structural variants manifest in short read sequence data aligned to reference genome?
- Deletion
- Tandem duplication
- Inversion

(Hint: How many breaks are there? Consider both reads that overlap the break and those that span the break.)

Advanced: Can you sketch how reads might overlap translocations?

Methods for calling SVs

Lecture 2

Exercise 2: SV classification

Manually (using pen and paper) reconstruct the following set of breakpoints into a single rearrangement event:
- 2:47676199 + 2:50629634 +
- 2:49437116 - 2:47678469 -
- 2:50632712 - 2:49431472 +
What might the functional consequence of this rearrangement be?

## [1] "http://www.nature.com/ncomms/2015/150401/ncomms7605/extref/ncomms7605-s1.pdf"

Working with SVs

Lecture 3

Exercise 3 - Visualising SVs

Open the test tumour and normal genomes in IGV, navigate to the following locations and record the number of reads supporting each breakpoint.
- chr1:150447295-150447299
- chr4:92896530-92896534
- chr6:24910138-24910139

(Hint: switch “Show soft-clipped bases” on under Tracks -> Preferences -> Alignments)

Load the deep whole-genome cell-line BRASS output (SV bedpe format) into R.

(Hint: Use the read.table function with the parameters: header, sep, stringsAsFactors, skip and comment.char. You will also need to reformat the chromosome co-ordinates e.g. change 1 to chr1.)

SVbedpe<-read.table(gzfile("HCC1143_vs_HCC1143_BL.annot.bedpe.gz"), 
                    header = T, sep="\t",stringsAsFactors = F, skip=69, comment.char = "")
colnames(SVbedpe)[1]<-"chr1"
SVbedpe$chr1<-paste0("chr",SVbedpe$chr1)
SVbedpe$chr2<-paste0("chr",SVbedpe$chr2)

Generate a plot showing the size distributions of the different SV classes.

library(ggplot2)
ggplot(SVbedpe[SVbedpe$bkdist>0,],aes(bkdist))+geom_density()+facet_grid(svclass ~ .,scales = "free_y")+xlim(0,1e6)

Using the circlize package generate a circos plot with SVs as links (use this documentation to assist you)

suppressMessages(library(circlize))
par(mar = c(1, 1, 1, 1))
circos.initializeWithIdeogram(plotType = c('axis', 'labels'))
bed1<-SVbedpe[,1:3]
bed2<-SVbedpe[,4:6]
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
colours<-cbbPalette[1:length(unique(SVbedpe$svclass))]
names(colours)<-unique(SVbedpe$svclass)
circos.genomicLink(bed1, bed2,col=colours[SVbedpe$svclass])
legend("topright", legend = names(colours),fill=colours,cex=0.35)

Advanced: Add copy-number calls to your circos plot

Load the ASCAT copy-number calls into R (for convenience these are stored in HCC1143.Rdata). Find the data object which contains segmented copy numbers and output the first 10 lines.

ascat.output<-readRDS("ascat.output.rds")
head(ascat.output$segments)

##    sample chr startpos   endpos nMajor nMinor
## 1 HCC1143   1    61735  6529823      2      2
## 2 HCC1143   1  6530189  8712280      2      1
## 3 HCC1143   1  8716357  9010627      4      1
## 4 HCC1143   1  9011627 20096288      2      1
## 5 HCC1143   1 20099109 23443472      2      2
## 6 HCC1143   1 23444113 43537981      2      1

Using these segments, add the copy-number calls to your circos plot.

(Hint: use the genomicTrackPlotRegion function)

totalcn<-ascat.output$segments[,c("chr","startpos","endpos")]
totalcn<-cbind(totalcn,value=ascat.output$segments$nMajor+ascat.output$segments$nMinor)
totalcn$chr<-paste0("chr",totalcn$chr)
totalcn$value[totalcn$value>10]<-10
circos.initializeWithIdeogram(plotType = c('axis', 'labels'))
circos.genomicTrackPlotRegion(totalcn,ylim=c(0,10),
                              panel.fun=function(region,value,...){
                                i=getI(...)
                 circos.genomicLines(region,value,type="segment",lwd=3,col="blue",...)})
circos.genomicLink(bed1, bed2,col=colours[SVbedpe$svclass])
legend("topright", legend = names(colours),fill=colours,cex=0.35)

CRUK Summer School 2017 - Day 4

Geoff Macintyre