This site contains the materials for an R course run by the Bioinformatics Core at the Cancer Research UK Cambridge Institute.
April – June 2020
R is one of the leading programming languages in data science and the most widely used language within CRUK CI for interacting with, analyzing and visualizing cancer biology datasets.
In this training course, we aim to provide a friendly introduction to R pitched at beginner level. It is also suitable for those who have attended R training courses previously and would like a refresher or to consolidate their skills.
The first lesson will be on Tuesday 14 April, rather than on the Easter Monday bank holiday, and thereafter every Monday for the following 5 weeks. Similarly, the Friday recap for the week of 4 – 8 May will be run on Thursday 7 May instead, as 8 May is the special VE Day bank holiday.
We will be using Microsoft Teams to run this course, and members of the Bioinformatics Core will be available throughout for 1:1 support via Teams chats and calls.
The course will be run over 6 weeks with the following structure:
Getting set up (6 April) - Installing R and RStudio
Introduction to R (14 April) - Interacting with R using RStudio and introducing objects, data types and functions
Working with data (20 April) - Creating R scripts, working with tabular data and other types of objects in R, reading data into R
Data visualization with ggplot2 (27 April) - A common grammar for creating scatter plots, bar charts, boxplots, histograms and line graphs for time series data
Data manipulation using dplyr (4 May) - Filtering and modifying tabular data, computing summary values, faceting with ggplot2
Grouping and combining data (11 May) - Advanced grouping and summarization operations, joining data from different tables, customizing ggplot2 plots
Restructuring data for analysis (18 May) - The concept of ‘tidy data’, pivoting and separating operations, ggplot2 extras
Capstone project – putting it all together in a typical data analysis including:
reading in a data set
handling missing values
selecting and filtering subsets of interest
creating plots
generating summary statistics
saving the data, transformed into a tidy format, as a CSV file for later analysis