We are going to develop a simple app to allow a user to perform statistical tests and generate plots from a published dataset. For us, this seems to be quite a common use-case. Assuming that you know how to interrogate and analyse a given gene in R, it is not much more effort to build the interface with Shiny. The advantage being that you can use your existing familiarity with R and not having to learn new languages.
We will be using the breastCancerNKI dataset, which is already available as part of Bioconductor. If you didn’t install this already, you will need to do:-
source("http://www.bioconductor.org/biocLite.R")
biocLite("breastCancerNKI")
Briefly, this dataset comprises the gene expression profiling of breast cancer patients. There is extensive clinical data available, but we will be focusing on the ER (Estrogen Receptor) status of each patient. This variable (either positive or negative) is known to predict survival.
The R code for this app is available as part of the materials for this course cancer-app.R
The UI for the app will contain a drop-down where the user can select the name of the gene they wish to interrogate. For this demonstration we will provide a limited set of options and a default value of ESR1
, which we expect to show a significant difference between the conditions.
ui <- fluidPage(
# Application title
titlePanel("Interrogating the NKI breast cancer dataset"),
sidebarLayout(
sidebarPanel(
selectInput("thegene","Gene to Analyse",
choices=c("ESR1","ERBB2","PTEN"),
selected = "ESR1")
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("boxplot")
)
)
)
The server
code has to deal with loading the Bioconductor package and creating data objects that we can interrogate. The standard way to access data from an experimental data package involves the use of the functions exprs
(to get expression values), fData
(for gene annotation) and pData
(for sample annotations).
Note that this code only needs to be run once for the initial setup of the app. It could be placed in a separate script and run with the source(...)
function to make the code a bit cleaner.
Note that we could also include code here for loading R objects or reading from a file as required.
library(breastCancerNKI)
library(Biobase)
data(nki)
expression.values <- exprs(nki)
features <- fData(nki)
er.status <- pData(nki)$er
The matrix expression.values
contains the values we would like to plot; each row is a different gene and the columns are breast cancer patients. The er.status
vector is the categorical variable corresponding to ER status.
table(er.status)
## er.status
## 0 1
## 88 249
expression.values[1:5,1:5]
## NKI_4 NKI_6 NKI_7 NKI_8 NKI_9
## Contig45645_RC -0.215 0.071 0.182 -0.343 -0.134
## Contig44916_RC -0.207 0.055 0.077 0.302 0.051
## D25272 -0.158 -0.010 0.059 0.169 -0.007
## J00129 -0.819 -0.391 -0.624 -0.528 -0.811
## Contig29982_RC -0.267 -0.310 -0.120 -0.447 -0.536
To make a boxplot of the expression level of a particular gene, we can use the formula syntax in R
boxplot(expression.values[1,] ~ er.status)
A slight complication is that the rows of the expression matrix are the manufacturer identifier and the user is going to be inputting a gene name (gene symbol). So to identify the gene we are interested in (ESR1
for example) we have to perform a mapping via the features
matrix.
features[1:5,1:5]
## probe EntrezGene.ID probe.name Alignment.score
## Contig45645_RC Contig45645_RC 64388 Contig45645_RC 60
## Contig44916_RC Contig44916_RC 140883 Contig44916_RC 60
## D25272 D25272 NA <NA> NA
## J00129 J00129 2244 J00129 60
## Contig29982_RC Contig29982_RC 286133 Contig29982_RC 60
## Length.of.probe
## Contig45645_RC 60
## Contig44916_RC 60
## D25272 NA
## J00129 60
## Contig29982_RC 60
The relevant column in features
is HUGO.gene.symbol
and we will use the match
function to check which rows match a particular bit of text. To keep things simple, we will ignore the fact that a gene might have more than one probe.
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
values <- expression.values[probe.id,]
In this first iteration of the app, the user is expected to type the name of the gene from a drop-down list. The value they have typed can be retrieved from the server code using input$thegene
.
The code for the app then becomes:-
library(shiny)
ui <- fluidPage(
# Application title
titlePanel("Interrogating the NKI breast cancer dataset"),
sidebarLayout(
sidebarPanel(
selectInput("thegene","Gene to Analyse",
choices=c("ESR1","ERBB2","PTEN"),
selected = "ESR1")
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("boxplot")
)
)
)
server <- function(input, output) {
library(breastCancerNKI)
data(nki)
expression.values <- exprs(nki)
features <- fData(nki)
er.status <- pData(nki)$er
output$boxplot <- renderPlot({
gene <- input$thegene
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
values <- expression.values[probe.id,]
boxplot(values ~ er.stats)
})
}
# Run the application
shinyApp(ui = ui, server = server)
As we have seen previously, we can also add options to allow the user to customise the appearance of the plot
library(shiny)
ui <- fluidPage(
# Application title
titlePanel("Interrogating the NKI breast cancer dataset"),
sidebarLayout(
sidebarPanel(
selectInput("thegene","Gene to Analyse",
choices=c("ESR1","ERBB2","PTEN"),
selected = "ESR1"),
radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red")
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("boxplot")
)
)
)
server <- function(input, output) {
library(breastCancerNKI)
data(nki)
expression.values <- exprs(nki)
features <- fData(nki)
er.status <- pData(nki)$er
output$boxplot <- renderPlot({
gene <- input$thegene
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
values <- expression.values[probe.id,]
boxplot(values ~ er.status,col=input$colour)
})
}
# Run the application
shinyApp(ui = ui, server = server)
t.test
functionHopefully you will have recognised that the code required to apply the t-test and the boxplot is pretty similar, and share the common step of filtering our expression matrix according to gene. Hence, we can write this filtering procedure as a function that is called when the boxplots and t-test output are produced.
An R script to run the app we just described is supplied in the file cancer-app-2.R. If you open the R script in RStudio, you will see that a function has been created to retrieve the gene expression values for a particular gene. The code chunks for the boxplot and t-test then call this code. Generally, it is a good idea to re-use code in this manner đź‘Ť.
We have added a Sys.sleep
call to make the code wait for a while before returning the final data frame. This is to simulate the use-case when the filtering operations we need to perform require more computation than the simple dataset presented here.
filterByExpression <- function(){
gene <- input$thegene
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
Sys.sleep(5)
expression.values[probe.id,]
}
Unfortunately, there is some inefficiency into our app. The code inside the filterByExpression
function will be triggered whenever the user requests a new plot by changing the gene name or the color of the plot. This code includes the steps to filter the expression data based on the gene that is currently selected. It makes sense for this code to be re-run when a new gene is selected, but not if a different colour is selected (which is the current situation). It would be much better to retrieve the data for a particular gene once only.
A solution is to define a reactive function. These are special functions in shiny
that caches it’s values, and will only be re-run when any of these values become outdated.
The R script cancer-app-3.R contains a slightly modified server
script with filterByExpression
being a reactive
rather than a standard R function.
filterByExpression <- reactive({
gene <- input$thegene
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
Sys.sleep(5)
expression.values[probe.id,]
})
Try changing the values for gene name and boxplot colour in this new shiny app. You should notice that the plot colour can be changed instantaneously, whereas selecting a different gene triggers the filterByExpression
function which includes the sleep statement.
For more information on reactive programming, see the RStudio article
DT
libraryshiny
keeps track of which row has been clicked; which can be accessed via the input
listThere are various options to allow users to download data from your app.
UI
with downloadButton
; specifying what the output object is to be calleddownloadHandler
function in the server
script to define how the output file is created
pdf
function to start writing graphics to a temporary file, which then get saved into the specified file name sidebarLayout(
sidebarPanel(
selectInput("thegene","Gene to Analyse",
choices=c("ESR1","ERBB2","PTEN"),
selected = "ESR1"),
radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red"),
downloadButton("plotPDF",label="Download")
),
....
server <- function(input, output) {
....
output$plotPDF <- downloadHandler(
filename = "boxplot.pdf",
## filename could also be the result of a function call
content = function(file){
pdf(file)
## Could also have the plot generated by a function which is re-used
## i.e. the same function could generate the plot regardless of whether plot is being printed to pdf
## or to the screen
gene <- input$thegene
probe.id <- as.character(features$probe[match(gene, features$HUGO.gene.symbol)])
values <- filterByExpression()
boxplot(values ~ er.status,col=input$colour)
dev.off()
}
...
}