Shiny for Bioinformatics Tutorial

Mark Dunning

Last modified: 13 Sep 2017

A use-case in Genomics

We are going to develop a simple app to allow a user to perform statistical tests and generate plots from a published dataset. For us, this seems to be quite a common use-case. Assuming that you know how to interrogate and analyse a given gene in R, it is not much more effort to build the interface with Shiny. The advantage being that you can use your existing familiarity with R and not having to learn new languages.

We will be using the breastCancerNKI dataset, which is already available as part of Bioconductor. If you didn’t install this already, you will need to do:-

source("http://www.bioconductor.org/biocLite.R")
biocLite("breastCancerNKI")

Briefly, this dataset comprises the gene expression profiling of breast cancer patients. There is extensive clinical data available, but we will be focusing on the ER (Estrogen Receptor) status of each patient. This variable (either positive or negative) is known to predict survival.

The R code for this app is available as part of the materials for this course cancer-app.R

The UI for the app will contain a drop-down where the user can select the name of the gene they wish to interrogate. For this demonstration we will provide a limited set of options and a default value of ESR1, which we expect to show a significant difference between the conditions.

ui <- fluidPage(
   
   # Application title
   titlePanel("Interrogating the NKI breast cancer dataset"),
   
   sidebarLayout(
      sidebarPanel(
         selectInput("thegene","Gene to Analyse",
                     choices=c("ESR1","ERBB2","PTEN"),
                       selected  = "ESR1")
      ),
      
      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("boxplot")
      )
   )
)

The server code has to deal with loading the Bioconductor package and creating data objects that we can interrogate. The standard way to access data from an experimental data package involves the use of the functions exprs (to get expression values), fData (for gene annotation) and pData (for sample annotations).

Note that this code only needs to be run once for the initial setup of the app. It could be placed in a separate script and run with the source(...) function to make the code a bit cleaner.

Note that we could also include code here for loading R objects or reading from a file as required.

  library(breastCancerNKI)
  library(Biobase)
  data(nki)
  expression.values <- exprs(nki)
  features <- fData(nki)
  er.status <- pData(nki)$er

The matrix expression.values contains the values we would like to plot; each row is a different gene and the columns are breast cancer patients. The er.status vector is the categorical variable corresponding to ER status.

table(er.status)
## er.status
##   0   1 
##  88 249
expression.values[1:5,1:5]
##                 NKI_4  NKI_6  NKI_7  NKI_8  NKI_9
## Contig45645_RC -0.215  0.071  0.182 -0.343 -0.134
## Contig44916_RC -0.207  0.055  0.077  0.302  0.051
## D25272         -0.158 -0.010  0.059  0.169 -0.007
## J00129         -0.819 -0.391 -0.624 -0.528 -0.811
## Contig29982_RC -0.267 -0.310 -0.120 -0.447 -0.536

To make a boxplot of the expression level of a particular gene, we can use the formula syntax in R

boxplot(expression.values[1,] ~ er.status)

A slight complication is that the rows of the expression matrix are the manufacturer identifier and the user is going to be inputting a gene name (gene symbol). So to identify the gene we are interested in (ESR1 for example) we have to perform a mapping via the features matrix.

features[1:5,1:5]
##                         probe EntrezGene.ID     probe.name Alignment.score
## Contig45645_RC Contig45645_RC         64388 Contig45645_RC              60
## Contig44916_RC Contig44916_RC        140883 Contig44916_RC              60
## D25272                 D25272            NA           <NA>              NA
## J00129                 J00129          2244         J00129              60
## Contig29982_RC Contig29982_RC        286133 Contig29982_RC              60
##                Length.of.probe
## Contig45645_RC              60
## Contig44916_RC              60
## D25272                      NA
## J00129                      60
## Contig29982_RC              60

The relevant column in features is HUGO.gene.symbol and we will use the match function to check which rows match a particular bit of text. To keep things simple, we will ignore the fact that a gene might have more than one probe.

probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
values <- expression.values[probe.id,]

In this first iteration of the app, the user is expected to type the name of the gene from a drop-down list. The value they have typed can be retrieved from the server code using input$thegene.

The code for the app then becomes:-

library(shiny)

ui <- fluidPage(
   
   # Application title
   titlePanel("Interrogating the NKI breast cancer dataset"),
   
   sidebarLayout(
      sidebarPanel(
         selectInput("thegene","Gene to Analyse",
                     choices=c("ESR1","ERBB2","PTEN"),
                       selected  = "ESR1")
      ),
      
      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("boxplot")
      )
   )
)

server <- function(input, output) {
   
  library(breastCancerNKI)
  
  data(nki)
  expression.values <- exprs(nki)
  features <- fData(nki)
  er.status <- pData(nki)$er
  
   output$boxplot <- renderPlot({

     gene <- input$thegene
     probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
     
     values <- expression.values[probe.id,]
     boxplot(values ~ er.stats)
   })
   
}

# Run the application 
shinyApp(ui = ui, server = server)

As we have seen previously, we can also add options to allow the user to customise the appearance of the plot

library(shiny)

ui <- fluidPage(
   
   # Application title
   titlePanel("Interrogating the NKI breast cancer dataset"),
   
   sidebarLayout(
      sidebarPanel(
         selectInput("thegene","Gene to Analyse",
                     choices=c("ESR1","ERBB2","PTEN"),
                       selected  = "ESR1"),
                 radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red")
      ),
      
      # Show a plot of the generated distribution
      mainPanel(
         plotOutput("boxplot")
      )
   )
)

server <- function(input, output) {
   
  library(breastCancerNKI)
  
  data(nki)
  expression.values <- exprs(nki)
  features <- fData(nki)
  er.status <- pData(nki)$er
  
   output$boxplot <- renderPlot({

     gene <- input$thegene
     probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
     
     values <- expression.values[probe.id,]
     boxplot(values ~ er.status,col=input$colour)
   })
   
}

# Run the application 
shinyApp(ui = ui, server = server)

Exercise

  • Add an output which is the result of running a two-sided t-test to assess whether the selected gene has different expression level in ER positive versus ER negative tumours
    • HINT we suggest using the t.test function

Code Re-use

Hopefully you will have recognised that the code required to apply the t-test and the boxplot is pretty similar, and share the common step of filtering our expression matrix according to gene. Hence, we can write this filtering procedure as a function that is called when the boxplots and t-test output are produced.

An R script to run the app we just described is supplied in the file cancer-app-2.R. If you open the R script in RStudio, you will see that a function has been created to retrieve the gene expression values for a particular gene. The code chunks for the boxplot and t-test then call this code. Generally, it is a good idea to re-use code in this manner đź‘Ť.

We have added a Sys.sleep call to make the code wait for a while before returning the final data frame. This is to simulate the use-case when the filtering operations we need to perform require more computation than the simple dataset presented here.

  filterByExpression <- function(){
    gene <- input$thegene
     probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
      Sys.sleep(5)
     expression.values[probe.id,]
  }

Exercise

  • Run the app described by the file cancer-app-2.R. In particular, change the option for colour whilst keeping the gene name the same
  • What do you notice?
    • should changing the colour of the boxplot cause you to filter the data again?
    • does it seem efficient to run the app in this manner?

Introducing “reactives”

Unfortunately, there is some inefficiency into our app. The code inside the filterByExpression function will be triggered whenever the user requests a new plot by changing the gene name or the color of the plot. This code includes the steps to filter the expression data based on the gene that is currently selected. It makes sense for this code to be re-run when a new gene is selected, but not if a different colour is selected (which is the current situation). It would be much better to retrieve the data for a particular gene once only.

A solution is to define a reactive function. These are special functions in shiny that caches it’s values, and will only be re-run when any of these values become outdated.

  • The first time the function is called, the output values are stored in memory
  • The next time the function is called, the function will check if the saved value is out of date (e.g. whether any of th inputs have been updated)
  • If not, the saved values will be returned and no computation will be performed
  • If an update is required, the function will be re-run and a new output output will be saved

The R script cancer-app-3.R contains a slightly modified server script with filterByExpression being a reactive rather than a standard R function.

  filterByExpression <- reactive({
     gene <- input$thegene
     probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
      Sys.sleep(5)
     expression.values[probe.id,]
  })

Try changing the values for gene name and boxplot colour in this new shiny app. You should notice that the plot colour can be changed instantaneously, whereas selecting a different gene triggers the filterByExpression function which includes the sleep statement.

For more information on reactive programming, see the RStudio article

Other features

Dynamic tables

  • A table widget to allow users to plot any available gene
    • uses the DT library
    • the table can be sorted, filtered in a familiar manner
    • shiny keeps track of which row has been clicked; which can be accessed via the input list
    • see cancer-app-4.R

Actions. e.g. downloading the plot

There are various options to allow users to download data from your app.

  • create a clickable download button in the UI with downloadButton; specifying what the output object is to be called
  • use the downloadHandler function in the server script to define how the output file is created
    • in our case, we use the pdf function to start writing graphics to a temporary file, which then get saved into the specified file name
    • the name of the file can be hard-coded, or defined using a function call to get the current date (for example)
  • similar actions are supported such as a Go! button to execute an analysis
   sidebarLayout(
      sidebarPanel(
         selectInput("thegene","Gene to Analyse",
                     choices=c("ESR1","ERBB2","PTEN"),
                       selected  = "ESR1"),
                 radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red"),
         downloadButton("plotPDF",label="Download")
      ),
....
  

server <- function(input, output) {

....
  
   output$plotPDF <- downloadHandler(
    filename = "boxplot.pdf",   
    ## filename could also be the result of a function call
    content = function(file){
     pdf(file)
      
     ## Could also have the plot generated by a function which is re-used
     ## i.e. the same function could generate the plot regardless of whether plot is being printed to pdf
     ## or to the screen
     
      gene <- input$thegene
     probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
     
     values <- filterByExpression()
     
     boxplot(values ~ er.status,col=input$colour)
     dev.off()
    }
    
...


}