Ngs-in-bioc

A course on analysing next-generation (high-throughput) sequencing data using Bioconductor


Description.

This course introduces the tools available through the Bioconductor project for manipulating and analysing high-throughput sequencing (HTS) data. We present workflows for the analysis of ChIP-seq and RNA-seq data, starting from aligned reads in BAM format. We also describe the resources available through Bioconductor for annotating and visualising HTS data, which can be applied to any type of sequencing experiment.

Authors.

Thomas Carroll

Mark Dunning

Suraj Menon

Bernard Pereira

Oscar Rueda

Roslin Russell

Shamith Samarajiwa

Prerequisites.

Knowledge of current sequencing technologies, data formats (e.g. FASTQ and BAM) and alignment.

Very basic knowledge of UNIX would be an advantage, but none is assumed and very little is required.

Attendees should be comfortable using the R statistical language to read and manipulate data and to produce simple graphs.

Aims.

To provide an understanding of how aligned sequencing reads, genome sequences and genomic regions are represented in R.

To build confidence in importing sequencing reads into R, performing quality assessment and running standard pipelines for RNA-seq and ChIP-seq analysis.
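
As a rough illustration of the first aim, the sketch below shows how genomic regions and a DNA sequence can be held in Bioconductor objects. It assumes the GenomicRanges and Biostrings packages are installed; the coordinates, chromosome names and sequence are made up for the example and are not taken from the course data.

```r
library(GenomicRanges)  # GRanges: genomic regions
library(Biostrings)     # DNAString: nucleotide sequences

# Three illustrative regions on two chromosomes (hypothetical coordinates)
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 150, 500), end = c(200, 250, 600)),
  strand   = c("+", "-", "+")
)

# Width of each region in bases (start and end are both inclusive)
region_widths <- width(regions)

# Count how many times each region overlaps a query interval,
# ignoring strand so that "+" and "-" regions are treated alike
query <- GRanges("chr1", IRanges(start = 180, end = 220))
hits  <- countOverlaps(regions, query, ignore.strand = TRUE)

# A short DNA sequence and its reverse complement
dna    <- DNAString("ATGC")
rc_dna <- as.character(reverseComplement(dna))
```

Aligned reads are typically held in closely related objects, so the same accessor and overlap functions tend to carry over once reads have been imported.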

Objectives.

Know what tools are available in Bioconductor for HTS analysis and understand the basic object-types that are utilised.

Given a set of gene identifiers, find where in the genome they are located, and vice versa (i.e. go from genomic coordinates to genes).

Produce a list of differentially expressed genes from an RNA-Seq experiment.

Import a set of ChIP-Seq peaks and investigate their biological context.
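
The second objective (going from gene identifiers to genomic coordinates and back) might look something like the following. This is only a sketch: it assumes the org.Hs.eg.db and TxDb.Hsapiens.UCSC.hg19.knownGene packages from the install list are available, and the gene symbols and the query region are arbitrary choices for illustration, not part of the course practicals.

```r
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(GenomicRanges)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# 1. Gene symbols -> Entrez IDs (example symbols chosen for illustration)
symbols <- c("TP53", "BRCA1")
ids <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = symbols,
                             keytype = "SYMBOL",
                             columns = "ENTREZID")

# 2. Entrez IDs -> genomic coordinates (hg19)
all_genes <- genes(txdb)  # GRanges with a gene_id metadata column
gene_locs <- all_genes[all_genes$gene_id %in% ids$ENTREZID]

# 3. The reverse direction: genomic region -> genes that overlap it
region <- GRanges("chr17", IRanges(start = 7500000, end = 7600000))
nearby <- subsetByOverlaps(all_genes, region)
back <- AnnotationDbi::select(org.Hs.eg.db,
                              keys = nearby$gene_id,
                              keytype = "ENTREZID",
                              columns = "SYMBOL")
```

Calling `select()` with the `AnnotationDbi::` prefix avoids clashes with other packages that export a function of the same name.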

How to Run the course.

We recommend using RStudio for the practicals.

Download the materials from this repository and install the required R and Bioconductor packages from within RStudio. This may take several minutes.

# Install the required packages (dm3 genome build, matching the dm3 TxDb annotation)
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("Biostrings", "ShortRead", "DESeq", "edgeR", "biomaRt", "BSgenome",
           "BSgenome.Dmelanogaster.UCSC.dm3", "org.Dm.eg.db",
           "TxDb.Dmelanogaster.UCSC.dm3.ensGene", "pasillaBamSubset", "pasilla",
           "rtracklayer", "ggbio", "vsn", "gplots", "RColorBrewer", "chipseq",
           "htSeqTools", "limma", "NBPSeq", "tweeDEseqCountData", "org.Hs.eg.db",
           "Rcade", "exomeCopy", "CNAnorm", "ChIPQC", "TxDb.Hsapiens.UCSC.hg19.knownGene",
           "BSgenome.Hsapiens.UCSC.hg19", "ChIPpeakAnno", "statmod", "locfit"))

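On recent versions of R, the `biocLite()` script has been superseded by the BiocManager package. A possible alternative, sketched below, checks which packages are missing and installs only those. The short package vector is an abbreviated stand-in for the full list above, and some of the older packages in that list may no longer be available in current Bioconductor releases.

```r
# Abbreviated stand-in for the full course package list (extend as needed)
pkgs <- c("Biostrings", "ShortRead", "edgeR", "limma", "rtracklayer")

# Work out which of the requested packages are not yet installed
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing. The interactive() guard is a choice made
# for this sketch so that sourcing the script non-interactively never
# triggers a long download.
if (interactive() && length(missing_pkgs) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_pkgs)
}
```
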
Using Docker.

docker run -p 8787:8787 markdunning/ngs-in-bioc

Then open your web browser and go to the address

http://localhost:8787

This lets you use RStudio in your web browser. The username and password are both 'rstudio'.

Example Data.

The RNA-seq analysis requires some BAM files that are too large to distribute via GitHub. Download them from the following links and place them in the folder Day2/bam:

RNA-seq sample 16N aligned bam

RNA-seq sample 16N aligned bam index

RNA-seq sample 16T aligned bam

RNA-seq sample 16T aligned bam index

RNA-seq sample 18N aligned bam

RNA-seq sample 18N aligned bam index

RNA-seq sample 18T aligned bam

RNA-seq sample 18T aligned bam index

RNA-seq sample 19N aligned bam

RNA-seq sample 19N aligned bam index

RNA-seq sample 19T aligned bam

RNA-seq sample 19T aligned bam index
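
After downloading, it can be worth confirming that every BAM file has its index alongside it. The helper below is a minimal base-R sketch; `check_bam_indexes` is a hypothetical name introduced here, and the two index naming styles it accepts (`sample.bam.bai` and `sample.bai`) are assumptions about how the downloaded files are named.

```r
# Report, for each .bam file in a directory, whether an index file exists
check_bam_indexes <- function(bam_dir = "Day2/bam") {
  bams <- list.files(bam_dir, pattern = "\\.bam$", full.names = TRUE)

  # No BAM files found: return an empty table rather than failing
  if (length(bams) == 0) {
    return(data.frame(bam = character(0), has_index = logical(0),
                      stringsAsFactors = FALSE))
  }

  # Accept either naming style for the index file
  has_index <- file.exists(paste0(bams, ".bai")) |
    file.exists(sub("\\.bam$", ".bai", bams))

  data.frame(bam = basename(bams), has_index = has_index,
             stringsAsFactors = FALSE)
}

# Run from the top level of the course materials
status <- check_bam_indexes("Day2/bam")
print(status)
```

Any row with `has_index` equal to `FALSE` points to an index file that still needs to be downloaded.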