A course on Analysing Next Generation (/High Throughput etc.) Sequencing data using Bioconductor



This course provides an introduction to the tools available through the Bioconductor project for manipulating and analysing high-throughput sequencing (HTS) data. We will present workflows for the analysis of ChIP-Seq and RNA-seq data starting from aligned reads in bam format. We will also describe the various resources available through Bioconductor to annotate and visualize HTS data, which can be applied to any type of sequencing experiment.


  • Thomas Carroll
  • Mark Dunning
  • Suraj Menon
  • Bernard Pereira
  • Oscar Rueda
  • Roslin Russell
  • Shamith Samarajiwa
Prerequisites.

  • A knowledge of current sequencing technologies, data formats (e.g. fastq and bam) and alignment
  • A very basic knowledge of UNIX would be an advantage, but nothing will be assumed and extremely little will be required
  • Attendees should be comfortable with using the R statistical language to read and manipulate data, and produce simple graphs
Aims.

  • To provide an understanding of how aligned sequencing reads, genome sequences and genomic regions are represented in R.
  • To encourage confidence in reading sequencing reads into R, performing quality assessment and executing standard pipelines for RNA-Seq and ChIP-Seq analysis.
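As a rough illustration of the first aim, the sketch below shows how a sequence and a genomic region might be represented with the Biostrings and GenomicRanges packages. The sequence and coordinates are invented toy values, not course data, and the variable names are chosen only for this example.

```r
library(Biostrings)
library(GenomicRanges)

# A toy DNA sequence (invented for illustration) stored as a DNAString
toy_seq <- DNAString("ACGTTGCAAGGC")

# Common sequence operations: reverse complement and G/C count
toy_rc <- reverseComplement(toy_seq)
toy_gc <- letterFrequency(toy_seq, "GC")

# A genomic region stored as a GRanges object: chromosome, interval, strand
region <- GRanges("chr1", IRanges(start = 3, end = 8), strand = "+")

# Use the region's coordinates to extract the matching part of the sequence
region_seq <- subseq(toy_seq, start = start(region), end = end(region))
```

Aligned reads are handled in a similar spirit: they are read into objects that carry coordinates, so the same accessors (`start`, `end`, `width`, `strand`) apply.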
Objectives.

  • Know what tools are available in Bioconductor for HTS analysis and understand the basic object-types that are utilised.
  • Given a set of gene identifiers, find out whereabouts in the genome they are located, and vice-versa (i.e. go from genomic coordinates to genes).
  • Produce a list of differentially expressed genes from an RNA-Seq experiment.
  • Import a set of ChIP-Seq peaks and investigate their biological context.
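To make the objectives about moving between genes and genomic coordinates more concrete, here is a minimal sketch using GenomicRanges. The gene names, peaks and coordinates are hypothetical; in practice the gene models would more likely come from an annotation package such as TxDb.Hsapiens.UCSC.hg19.knownGene, and the peaks from a peak caller.

```r
library(GenomicRanges)

# Hypothetical gene models (coordinates invented for illustration)
genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500, 200), end = c(300, 900, 600)),
  strand   = c("+", "-", "+"),
  gene_id  = c("geneA", "geneB", "geneC")
)

# Hypothetical ChIP-Seq peaks (unstranded)
peaks <- GRanges(
  seqnames = c("chr1", "chr2", "chr2"),
  ranges   = IRanges(start = c(250, 50, 550), end = c(350, 150, 650))
)

# Coordinates -> genes: find which peaks overlap which genes
hits <- findOverlaps(peaks, genes)
peak_to_gene <- data.frame(
  peak = queryHits(hits),
  gene = genes$gene_id[subjectHits(hits)]
)

# Genes -> coordinates: look up the location of a gene by identifier
geneB_location <- genes[genes$gene_id == "geneB"]
```

The same overlap logic applies to any pair of coordinate-based objects, which is why a single representation is useful across RNA-Seq and ChIP-Seq workflows.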
Day One.

  • Introduction to NGS Sequencing (L)
  • R recap (P)
  • Representing Sequencing data in Bioconductor (L)
  • Representing Sequencing data in Bioconductor (P)
  • Linear Models and Experimental Design (L)
Day Two.

  • Introduction to RNA Sequencing
  • RNA-seq Practical 1
  • RNA-seq Practical 2
  • Genome Annotation in Bioconductor
  • Genome Annotation Practical
Day Three.

  • Introduction to ChIP-Seq
  • QC of ChIP-seq data (L)
  • QC of ChIP-Seq data (P)
  • Downstream analysis of ChIP-Seq (L)
  • Downstream analysis of ChIP-Seq (P)
How to Run the course.

    We recommend using RStudio for the practicals.

    Download the materials from this repository and install the required R and Bioconductor packages from within RStudio. This may take several minutes.

            # biocLite() is defined by the Bioconductor installation script (older Bioconductor releases)
            source("https://bioconductor.org/biocLite.R")
            biocLite(c("Biostrings", "ShortRead", "DESeq", "edgeR", "biomaRt", "BSgenome",
            "BSgenome.Dmelanogaster.UCSC.dm6",
            "TxDb.Dmelanogaster.UCSC.dm3.ensGene", "pasillaBamSubset", "pasilla",
            "rtracklayer", "ggbio", "vsn", "gplots", "RColorBrewer", "chipseq", "htSeqTools", "limma", "NBPSeq", "tweeDEseqCountData", "Rcade", "exomeCopy", "CNAnorm", "ChIPQC", "TxDb.Hsapiens.UCSC.hg19.knownGene", "BSgenome.Hsapiens.UCSC.hg19", "ChIPpeakAnno", "statmod", "locfit"))

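    Before starting the practicals, it may be worth confirming that the packages actually installed. The following is a minimal sketch in base R; the helper name `missing_packages` is hypothetical, and the vector shown is only a subset of the packages listed above.

```r
# Return the names in `required` that are not found among `installed`.
# By default, `installed` is taken from the current R library paths.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# A subset of the course packages, used here as an example
required <- c("Biostrings", "ShortRead", "edgeR", "rtracklayer", "ChIPQC")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

    Any names reported as still to install can be passed back to the installation command.
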
    Using Docker.

    docker run -p 8787:8787 markdunning/ngs-in-bioc
    Then load your web browser of choice and enter the address
    This will allow you to use RStudio in your web browser; the username and password are both 'rstudio'.

    Example Data.

    Some bam files required for the RNA-seq analysis are too large to distribute via GitHub. They can be downloaded from the following links and should be placed in the folder Day2/bam.

  • RNA-seq sample 16N aligned bam
  • RNA-seq sample 16N aligned bam index
  • RNA-seq sample 16T aligned bam
  • RNA-seq sample 16T aligned bam index
  • RNA-seq sample 18N aligned bam
  • RNA-seq sample 18N aligned bam index
  • RNA-seq sample 18T aligned bam
  • RNA-seq sample 18T aligned bam index
  • RNA-seq sample 19N aligned bam
  • RNA-seq sample 19N aligned bam index
  • RNA-seq sample 19T aligned bam
  • RNA-seq sample 19T aligned bam index
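Once the files are downloaded, a quick check that every bam file has its index alongside it can save time in the practical. This is a sketch in base R under two assumptions: the helper name `check_bam_indexes` is hypothetical, and each index is assumed to be named after its bam file with `.bai` appended, which may not match the downloaded file names.

```r
# List the bam files in a folder and report whether each has an index.
# Assumption: the index for "x.bam" is called "x.bam.bai".
check_bam_indexes <- function(bam_dir) {
  bams <- list.files(bam_dir, pattern = "\\.bam$", full.names = TRUE)
  has_index <- vapply(
    bams,
    function(b) file.exists(paste0(b, ".bai")),
    logical(1),
    USE.NAMES = FALSE
  )
  data.frame(bam = basename(bams), has_index = has_index)
}

# Folder used by the course for the RNA-seq example data
index_report <- check_bam_indexes(file.path("Day2", "bam"))
print(index_report)
```

An empty table means no bam files were found in the folder, which usually indicates that the files were saved somewhere else.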