Overview

In this section we look at how IGV can be used to visualize mutations in sequence data. We consider the scenario in which genome sequencing has been performed on a DNA sample, the sequence reads have been aligned to the reference genome, and a variant caller such as GATK HaplotypeCaller or MuTect2 has been run.

We will inspect some regions of the genome where there are possible variants in a breast cancer cell line to determine whether these are real events or artifacts. These will include single nucleotide variants (SNVs), small insertions and deletions (indels) and larger structural rearrangements.

Acknowledgement

The material presented here is adapted from the excellent RNA-seq tutorial from the Griffith lab at the McDonnell Genome Institute, Washington University School of Medicine, St. Louis.

https://github.com/griffithlab/rnaseq_tutorial/wiki


HCC1143 data set

We will be using publicly available Illumina sequence data generated for the HCC1143 cell line, which was derived from a 52-year-old Caucasian woman with breast cancer. Two samples are available for this cell line: HCC1143 (tumor, TNM stage IIA, grade 3, primary ductal carcinoma) and HCC1143/BL (matched normal, EBV-transformed lymphoblast cell line).

Sequence reads were aligned to version GRCh37 of the human reference genome. We will be working with subsets of aligned reads in the region: chromosome 21: 19,000,000 - 20,000,000.

The BAM files containing these reads for the cancer cell line and the matched normal are:

  • HCC1143.tumour.21.19M-20M.bam
  • HCC1143.normal.21.19M-20M.bam

These BAM files must be indexed before IGV can read them. The index files have the .bai suffix and allow IGV to quickly access and display the reads aligned to a specified genomic location.
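
As a rough sketch of the indexing step, the following Python checks whether an index file sits beside each BAM file and, if not, runs samtools index to create one. It assumes that samtools is installed and on the PATH and that the BAM files are in the current directory; the helper names index_path and ensure_index are illustrative only and are not part of IGV or samtools.

```python
import shutil
import subprocess
from pathlib import Path

BAM_FILES = [
    "HCC1143.tumour.21.19M-20M.bam",
    "HCC1143.normal.21.19M-20M.bam",
]


def index_path(bam):
    """Return the conventional index path: the BAM file name plus '.bai'."""
    return Path(str(bam) + ".bai")


def ensure_index(bam):
    """Create a .bai index with samtools if one is not already present.

    Assumes the samtools executable is on the PATH (an assumption of this
    sketch, not something the tutorial requires).
    """
    bai = index_path(bam)
    if bai.exists():
        return bai
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found; cannot index " + str(bam))
    # 'samtools index' expects a coordinate-sorted BAM file.
    subprocess.run(["samtools", "index", str(bam)], check=True)
    return bai


if __name__ == "__main__":
    for bam in BAM_FILES:
        if Path(bam).exists():
            print("index:", ensure_index(bam))
        else:
            print("not found:", bam)
```

If the index files are already supplied alongside the BAM files, the function simply returns their paths and nothing is re-created.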

The reads are from paired-end sequencing: DNA fragments of approximately 350 base pairs were sequenced from each end, with a read length of 101 bp.
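
To make the read-pair geometry concrete, here is a small worked calculation of how much of a typical fragment is left unsequenced between the two reads. It treats the fragment length as exactly 350 bp, which is only the approximate figure quoted above; the function name unsequenced_gap is illustrative.

```python
def unsequenced_gap(fragment_length, read_length):
    """Bases in the middle of a fragment not covered by either read.

    A negative value means the two reads overlap by that many bases.
    """
    return fragment_length - 2 * read_length


# 350 bp fragment, 101 bp read from each end: 350 - 2 * 101
print(unsequenced_gap(350, 101))  # 148
```

So for a typical fragment the two reads of a pair do not overlap, and roughly 150 bases in the middle are not sequenced directly.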


Load aligned sequence data

First we need to ensure that IGV is using the same reference genome that the sequence data were aligned to: GRCh37, also known as hg19.

  • Select Human hg19 from the drop-down list in the top left of the IGV window.

Now we’re ready to load the sequence data.

  • Select File > Load from File... from the main menu and select the BAM file HCC1143.normal.21.19M-20M.bam using the file browser.
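
The same steps can also be written as an IGV batch script rather than carried out through the menus. The sketch below generates such a script using the batch commands new, genome, load and goto. The function name igv_batch_script is illustrative, the locus string assumes that the chromosome is named chr21 in the hg19 genome, and the BAM path may need to be given in full depending on where IGV is started.

```python
def igv_batch_script(genome, bam_files, locus=None):
    """Build the text of an IGV batch script.

    genome    -- genome identifier, e.g. "hg19"
    bam_files -- paths of the BAM files to load, in order
    locus     -- optional region to jump to after loading
    """
    lines = ["new", "genome " + genome]
    for bam in bam_files:
        lines.append("load " + bam)
    if locus is not None:
        lines.append("goto " + locus)
    return "\n".join(lines) + "\n"


script = igv_batch_script(
    "hg19",
    ["HCC1143.normal.21.19M-20M.bam"],
    locus="chr21:19,000,000-20,000,000",  # assumed chromosome naming
)
print(script)
```

The printed text can be saved to a file and run from within IGV; this is an optional alternative, and the menu-based steps above are all that is needed to follow the rest of the section.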