Introduction

This site contains the materials for an R course run by the Bioinformatics Core at the Cancer Research UK Cambridge Institute.

April – June 2020

Instructors

  • Chandra Chilamakuri
  • Matt Eldridge
  • Mark Fernandes
  • Kamal Kishore
  • Sergio Martinez Cuesta
  • Ashley Sawle
  • Rory Stark

Description

R is one of the leading programming languages in Data Science and the most widely used within CRUK CI for interacting with, analyzing and visualizing cancer biology datasets.

In this training course, we aim to provide a friendly introduction to R pitched at beginner level. It is also suitable for those who have attended R training courses before and would like a refresher or to consolidate their skills.

The course will be run over 6 weeks with the following structure:

  • Online lesson each Monday at 11am lasting 45 minutes via a video call in Microsoft Teams
    • the instructor will share their screen, showing an RStudio window, during this call
    • the call will be recorded for those unable to join the meeting and so that you can replay the lesson
  • More in-depth material covering the concepts introduced in the Monday lesson to go through in your own time
  • A weekly assignment consisting of exercises to practice some of the concepts covered in that and previous weeks’ lessons
  • An online recap session each Friday at 11am to go through the assignment and answer any questions you may have
    • again this is expected to last around 45 minutes via a video call in Microsoft Teams

The first lesson will be on Tuesday 14 April, because Monday 13 April is the Easter Monday bank holiday, and thereafter every Monday for the following 5 weeks. Similarly, the Friday recap for the week of 4 – 8 May will run on Thursday 7 May instead, as 8 May is the special VE Day bank holiday.

We will use Microsoft Teams to run this course, and members of the Bioinformatics Core will be available throughout for 1:1 support via chats and calls within Teams.

Schedule

  1. Getting set up (6 April) - Installing R and RStudio

  2. Introduction to R (14 April) - Interacting with R using RStudio and introducing objects, data types and functions

  3. Working with data (20 April) - Creating R scripts, working with tabular data and other types of objects in R, reading data into R

  4. Data visualization with ggplot2 (27 April) - A common grammar to create scatter plots, bar charts, boxplots, histograms and line graphs for time series data

  5. Data manipulation using dplyr (4 May) - Filtering and modifying tabular data, computing summary values, faceting with ggplot2

  6. Grouping and combining data (11 May) - Advanced grouping and summarization operations, joining data from different tables, customizing ggplot2 plots

  7. Restructuring data for analysis (18 May) - The concept of ‘tidy data’, pivoting and separating operations, ggplot2 extras

  8. Capstone project - Putting it all together in a typical data analysis, including:

    • reading in a data set

    • handling missing values

    • selecting and filtering subsets of interest

    • creating plots

    • generating summary statistics

    • saving the data, transformed into a tidy format, as a CSV file for later analysis
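
To give a flavour of what the capstone steps might look like in practice, here is a minimal sketch in R. It is not the course's actual capstone material: the data set, column names (`sample`, `treatment`, `day1`, `day7`) and file paths are hypothetical and invented purely for illustration. It assumes the dplyr, tidyr and ggplot2 packages are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical example data, written to a temporary CSV file so that
# the sketch is self-contained (a real analysis would start from your own file)
raw <- data.frame(
  sample    = c("S1", "S2", "S3", "S4"),
  treatment = c("control", "control", "drug", "drug"),
  day1      = c(1.2, 1.4, 1.1, NA),
  day7      = c(2.5, 2.9, 1.3, 1.6)
)
input_file <- tempfile(fileext = ".csv")
write.csv(raw, input_file, row.names = FALSE)

# Reading in a data set
measurements <- read.csv(input_file)

# Handling missing values: here we simply drop rows containing any NA
complete <- measurements %>%
  drop_na()

# Selecting and filtering subsets of interest
controls <- complete %>%
  filter(treatment == "control") %>%
  select(sample, day1, day7)

# Creating plots: a scatter plot of the two measurements, coloured by treatment
scatter <- ggplot(complete, aes(x = day1, y = day7, colour = treatment)) +
  geom_point()

# Generating summary statistics for each treatment group
summary_stats <- complete %>%
  group_by(treatment) %>%
  summarise(mean_day7 = mean(day7), n = n(), .groups = "drop")

# Saving data transformed into a tidy format (one measurement per row) as a CSV file
tidy <- complete %>%
  pivot_longer(c(day1, day7), names_to = "day", values_to = "size")
output_file <- tempfile(fileext = ".csv")
write.csv(tidy, output_file, row.names = FALSE)
```

Dropping incomplete rows is only one way of handling missing values; depending on the data, you may prefer to keep them and pass `na.rm = TRUE` to summary functions such as `mean()` instead.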