Analysis of publicly available microarray data

Although microarrays have been superseded by high-throughput sequencing technologies for gene expression profiling, years of experience analysing microarray data have produced a variety of analysis techniques and datasets that can be exploited in other contexts. In this course, we will focus on retrieving and exploring microarray data from public repositories such as the Gene Expression Omnibus (GEO).

Aims: During the course you will learn about

Objectives: After this course you should be able to

Prerequisites

R packages

We recommend using RStudio for the practicals, with R version 3.2.3.

Download the materials from this repository and install the required R and Bioconductor packages from within RStudio. This may take several minutes.

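# biocLite() is the Bioconductor installer for older R releases such as the recommended 3.2.3;
# on R >= 3.5 use install.packages("BiocManager") followed by BiocManager::install() instead
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("limma","affy","affyPLM","beadarray","GEOquery","genefilter","illuminaHumanv3.db","cluster",
           "ggplot2","GOstats","breastCancerVDX","breastCancerTRANSBIG","breastCancerNKI","pamr","survival",
           "estrogen","ArrayExpress","RColorBrewer","arrayQualityMetrics","hgu95av2cdf","hgu95av2.db","org.Hs.eg.db","wakefield"))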
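Because the installation pulls in many packages, it can be worth confirming that each one actually loads before starting the practicals. The snippet below is a minimal, optional sketch using only base R (`requireNamespace()` and `vapply()`). It re-lists the packages from the biocLite() call above and reports any that cannot be loaded. It is a convenience check and not part of the course materials; the variable names are illustrative.

```r
# Packages requested in the biocLite() call above
pkgs <- c("limma", "affy", "affyPLM", "beadarray", "GEOquery", "genefilter",
          "illuminaHumanv3.db", "cluster", "ggplot2", "GOstats",
          "breastCancerVDX", "breastCancerTRANSBIG", "breastCancerNKI",
          "pamr", "survival", "estrogen", "ArrayExpress", "RColorBrewer",
          "arrayQualityMetrics", "hgu95av2cdf", "hgu95av2.db",
          "org.Hs.eg.db", "wakefield")

# requireNamespace() returns TRUE if the package's namespace can be loaded
# and FALSE otherwise; quietly = TRUE suppresses messages for failures
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Collect the names of packages that could not be loaded
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed")
}
```

If anything is reported, passing `not_installed` back to `biocLite()` retries just those packages rather than the whole list.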

Data and Scripts

Please download this zip file (~50 MB), which contains all the data and R Markdown files you will need during the course. If you have problems downloading and importing GSE33126, you can download a pre-processed R object here.

Day 1

  • Introduction to Microarray technologies
  • A workflow for the analysis of Affymetrix arrays (template code)
  • A workflow for the analysis of Illumina arrays (Reference)
  • Linear Models and Statistics for Differential Expression
  • Getting data from public repositories (template code)

Day 2

  • Differential Expression tutorial (template code)
  • Downstream Analysis tutorial (template code)
  • Clustering and Survival Analysis Lecture
  • Enrichment and ontologies tutorial (template code)

References

  • Thomas Girke's Bioconductor manual
  • R Programming wiki