Contents

1 Introduction

In this practical session, we will familiarize ourselves with IGV (Integrative Genomics Viewer) to assess the ChIP-seq quality. IGV is used to visualize and explore next-generation sequencing data and annotations. Open the virtual desktop and type igv in the terminal. A window of IGV should open.

This material for IGV was adapted from the practical material by Shoko Hirosue in 2020.

1.1 Loading IGV and basic navigation on the genome browser

  1. Open IGV and select the genome build we were using for the alignment in the top left corner (“Human (hg38)”). If it’s not listed, select Genomes > Load Genome from Server, select hg38 and load the genome.

  2. Select a single chromosome (eg. chr3).

  3. Navigate to chr3:39,000,000-43,000,000.

  4. Double click on one of the genes to recentre the window on that gene.

  5. Navigate to your favourite gene (or ZMAT3).

  6. Zoom out to see the surrounding genes.

  7. Right click the left side panel next to your genome track and select “Expanded” to see all the transcripts.

1.2 Loading your data into IGV

  1. Now click File > Load from File… and open the bam files and peak files. Let’s open tp53_r2.fastq_trimmed.fastq_sorted.bam and its input file input.fastq_trimmed.fastq_sorted.bam, as well as the narrowPeak file tp53_r2.fastq_trimmed.fastq_sorted_peaks.narrowPeak and summit bed file tp53_r2.fastq_trimmed.fastq_sorted_summite.bed in the data directory (/home/ubuntu/Course_Materials/ChIPSeq/practicals/data or /home/ubuntu/Course_Materials/ChIPSeq/practicals/output/macs_output).

  2. Zoom into one of the peak regions.

  3. Check tp53 targets from (TRRUST): OGG1, XPC, SLC6A6, CTNNB1, MAP4, RASSF1.

  4. Select bam files you have loaded, right click them and select “Group Autoscale”.

  5. Bookmark this region: Go to Regions > Region Navigator. Click Add, and give your region a name (eg. MyFirstRegion) in the “Description” field. Click “View”. This way if you navigate somewhere else on the genome you can always easily access this region from Regions > Region Navigator.

  6. Remove all the files (just so it’s easier to see.) Load BigWig files (tp53_r2.fastq_trimmed.fastq_sorted_standard_treat_pileup.bw and tp53_r2.fastq_trimmed.fastq_sorted_standard_control_lambda.bw)

  7. Group autoscale the two tracks so they are comparable.

  8. Set different colours for each of the tracks (Right click at the file name, choose Change Track Colour (Positive values)…).

  9. Export an image from File > Save Image and have a look at the saved file.

Exercise 1

  1. Explore the IGV
  2. Load tp53 or p73 data

2 QC part 2

Now let’s have a quick look of some metrics we learnt from the lecture. The bioconductor package we are using here is ChIC.

# get the working directory
getwd()
# load the library
library(ChIC)

# load the data
chipName <- 'practicals/data/tp53_r2.fastq_trimmed.fastq_sorted'
chipBam <- readBamFile(chipName)
inputName <- 'practicals/data/input.fastq_trimmed.fastq_sorted'
inputBam <- readBamFile(inputName)

Select chromosome 3 of interest:

subset_chromosomes<-c("chr3")

chipSubset<-lapply(chipBam, FUN=function(x) {x[subset_chromosomes]})
inputSubset<-lapply(inputBam, FUN=function(x) {x[subset_chromosomes]})
str(chipSubset)

Now we can generate quality scores and strand cross-correlation plot using qualityScores_EM from ChIC packages.

# if you get error with plotting: /home/ubuntu/.local/share/rstudio/notebooks
# go to /home/ubuntu/.local/share/rstudio/notebooks
# change your permission `chmod +x /home/ubuntu/.local/share/rstudio/notebooks`
# or add savePlotPath='/home/ubuntu/Course_Materials/ChIPSeq/practicals/output/',

EM_Results <- qualityScores_EM(
       chipName=chipName,
       inputName=inputName,
       chip.data=chipSubset,
       input.data=inputSubset,
       annotationID="hg38",
       read_length=32,
       mc=8)

Remember the metrics we covered before lunch. You can check those PBC, RSC, NSC metics below:

PBC <- EM_Results$QCscores_ChIP$CC_PBC

RSC <- EM_Results$QCscores_ChIP$CC_RSC

NSC <- EM_Results$QCscores_ChIP$CC_NSC

PBC

Exercise 2

  1. Try to play around with it yourself (May take a few mins to generate plot)
  2. (Optional) Generate strand cross-correlation plot for `p73`.

3 Acknowledgements

Joanna Krupka

Shoko Hirosue

Dora Bihary