Description.
High-throughput technologies such as next-generation sequencing (NGS) can routinely produce massive amounts of data. These technologies allow us to describe all variants in a genome or to detect the whole set of transcripts that are present in a cell or tissue. However, such datasets pose new challenges in the way the data have to be analyzed, annotated and interpreted; these challenges are not trivial and can be daunting to the wet-lab biologist. This course covers state-of-the-art and best-practice tools for NGS RNA-seq and ChIP-seq data analysis, which are of major relevance in today’s genomic and gene expression studies.
Instructors.
Prerequisites.
There is a lot of material to cover in the course, so we will assume that you are familiar with a few basics before you come. The tool that we will do most of the analysis in is R. There will be a short recap of the key concepts at the beginning of the course; however, it will be beneficial if you are already familiar with the following:
Several online videos are available that cover this material. For example:
Aims.
Objectives.
Course Materials.
Day One.
Day Two.
Day Three.
Day Four.
Day Five.
How to Run the course.
We recommend using RStudio for the practicals, along with R version 3.2.1.
Download the materials from this repository and install the required R and Bioconductor packages from within RStudio. This may take several minutes.
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("Biostrings", "ShortRead", "DESeq", "edgeR", "biomaRt", "BSgenome",
           "pasillaBamSubset", "pasilla", "rtracklayer", "ggbio", "vsn", "gplots",
           "RColorBrewer", "chipseq", "htSeqTools", "limma", "NBPSeq",
           "tweeDEseqCountData", "org.Hs.eg.db", "Rcade", "ChIPQC",
           "TxDb.Hsapiens.UCSC.hg19.knownGene", "BSgenome.Hsapiens.UCSC.hg19",
           "ChIPpeakAnno", "statmod", "locfit", "Rsubread", "goseq", "GO.db"))
The Download zip file link at the top of this page will download all the lectures and practicals, along with some example data. However, the larger data files have to be downloaded from elsewhere because they are too large to share on GitHub.
Example Data.
Day 1
A breast cancer dataset is also required for the Bioconductor introductory practical. This folder can be downloaded from Dropbox. Once downloaded and unzipped, the folder should be placed inside the Day1 directory.
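As a rough sketch of the step above, the unzipped folder can be moved into place from a command line. The function name `place_in_day1` and both of its arguments are hypothetical and used only for illustration; substitute the real location of your unzipped download and of the course materials.

```shell
# Hypothetical helper: move an unzipped data folder into the Day1 directory
# of the course materials. Both paths are assumptions supplied by the caller.
place_in_day1() {
  data_dir="$1"     # the unzipped folder downloaded from Dropbox
  course_dir="$2"   # the top-level folder of the downloaded course materials

  # Stop early if the data folder is not where we expect it to be
  if [ ! -d "$data_dir" ]; then
    echo "Data folder not found: $data_dir" >&2
    return 1
  fi

  # Day1 should already exist in the course materials; create it if it does not
  mkdir -p "$course_dir/Day1"

  # Move the whole folder so that it sits inside Day1
  mv "$data_dir" "$course_dir/Day1/"
}
```

For example, `place_in_day1 ~/Downloads/my-unzipped-folder ~/course-materials` would leave the data at `~/course-materials/Day1/my-unzipped-folder` (both paths are placeholders).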
Day 2
Using Docker.
If you are not attending one of our courses in person, you can still run the course materials using the Docker system.
First, you will need to install the boot2Docker software.
Once you have boot2docker installed, an icon should appear on your Desktop (Windows) or in your Applications folder (Mac). After running this new application, a new window should appear with various lines of white text on a black background. The last line should read:
docker@boot2docker:~$
Now carefully type the following line of text (using the correct spaces and punctuation is very important!):
docker run -p 8787:8787 markdunning/cruk-bioinf-sschool
This will download and install some data. Once this has finished, open a web browser and enter the address below to launch a version of RStudio within your browser. You will need to enter the username 'rstudio' and password 'rstudio'.
http://localhost:8787
For exercises that use the command line (e.g. the alignment and QA practicals), run the following command in boot2docker:
docker run -ti markdunning/cruk-bioinf-sschool /bin/bash
License
This work is licensed under the Creative Commons Attribution-ShareAlike 2.0 UK: England & Wales License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/uk/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Resources