The first part requires you to create plots with faceting. Each of the plots you are asked to create is shown below so that you can compare the end result with your own.
Load the tidyverse and read in the cleaned patient data from patient-data-cleaned.txt:

```r
library(tidyverse)
patients <- read_tsv("data/patient-data-cleaned.txt")
patients
## # A tibble: 100 x 15
## ID Name Sex Smokes Height Weight Birth State Grade Died
## <chr> <chr> <chr> <chr> <dbl> <dbl> <date> <chr> <dbl> <lgl>
## 1 AC/A… Mich… Male Non-S… 183. 76.6 1972-02-06 Geor… 2 FALSE
## 2 AC/A… Derek Male Non-S… 179. 80.4 1972-06-15 Colo… 2 FALSE
## 3 AC/A… Todd Male Non-S… 169. 75.5 1972-07-09 New … 2 FALSE
## 4 AC/A… Rona… Male Non-S… 176. 94.5 1972-08-17 Colo… 1 FALSE
## 5 AC/A… Chri… Fema… Non-S… 164. 71.8 1973-06-12 Geor… 2 TRUE
## 6 AC/A… Dana Fema… Smoker 158. 69.9 1973-07-01 Indi… 2 FALSE
## 7 AC/A… Erin Fema… Non-S… 162. 68.8 1972-03-26 New … 1 FALSE
## 8 AC/A… Rach… Fema… Non-S… 166. 70.4 1973-05-11 Colo… 1 FALSE
## 9 AC/A… Rona… Male Non-S… 181. 76.9 1971-12-31 Geor… 1 FALSE
## 10 AC/A… Bryan Male Non-S… 167. 79.1 1973-07-19 New … 2 FALSE
## # … with 90 more rows, and 5 more variables: Count <dbl>,
## # Date.Entered.Study <date>, Age <dbl>, BMI <dbl>, Overweight <lgl>
```
```r
ggplot(data = patients, mapping = aes(x = BMI, y = Weight, colour = Height)) +
  geom_point() +
  facet_grid(Sex ~ Smokes)
```
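In `facet_grid()` the formula reads rows ~ columns, so `Sex ~ Smokes` gives one row of panels per sex and one column per smoking status. From ggplot2 3.0.0 onwards the same layout can also be written with `rows = vars(...)` and `cols = vars(...)`. The sketch below uses R's built-in `mtcars` data instead of the patient file, purely so that it runs on its own, and uses `layer_data()` to check that both spellings put each point in the same panel.

```r
library(ggplot2)

# Formula interface: rows ~ columns
p_formula <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  facet_grid(vs ~ am)

# The same layout written with vars() (ggplot2 >= 3.0.0)
p_vars <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  facet_grid(rows = vars(vs), cols = vars(am))

# layer_data() returns the computed layer data, including the PANEL each point is drawn in
panels_formula <- layer_data(p_formula)$PANEL
panels_vars <- layer_data(p_vars)$PANEL

nlevels(panels_formula)                # one panel per (vs, am) combination
identical(panels_formula, panels_vars) # both interfaces assign points identically
```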
```r
ggplot(data = patients, mapping = aes(x = Smokes, y = BMI, fill = Sex)) +
  geom_boxplot() +
  facet_wrap(~ Age)
```
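At this point `Age` is still numeric, and `facet_wrap()` simply creates one panel per distinct value, wrapping the panels into a grid whose shape can be set with `ncol` or `nrow`. A small self-contained check of that behaviour, again using `mtcars` as a stand-in because the patient file does not ship with R:

```r
library(ggplot2)

# cyl is a numeric column, yet facet_wrap() gives each distinct value its own panel
p <- ggplot(mtcars, aes(x = factor(am), y = mpg)) +
  geom_boxplot() +
  facet_wrap(~ cyl, ncol = 2)  # lay the panels out in two columns

# Count the panels ggplot2 actually built
n_panels <- nlevels(layer_data(p)$PANEL)
n_panels == length(unique(mtcars$cyl))
```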
```r
# Convert Age to a factor so that fill = Age uses a discrete scale
patients$Age <- factor(patients$Age)
ggplot(data = patients, mapping = aes(x = Sex, y = BMI, fill = Age)) +
  geom_boxplot() +
  facet_wrap(~ Smokes)
```
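Making `Age` a factor is what turns `fill = Age` into a discrete scale, so each `Sex` gets one box per age that occurs there. The same conversion can be written in tidyverse style with `dplyr::mutate()`. The sketch below shows it on `mtcars`, where `cyl` plays the role of `Age`; `mtcars_f` is just an illustrative name.

```r
library(ggplot2)
library(dplyr)

# Tidyverse equivalent of df$col <- factor(df$col)
mtcars_f <- mutate(mtcars, cyl = factor(cyl))
levels(mtcars_f$cyl)

# With a factor fill, each x position gets one box per level present in the data
p <- ggplot(mtcars_f, aes(x = factor(am), y = mpg, fill = cyl)) +
  geom_boxplot()

nrow(layer_data(p))  # one row of layer data per box drawn
```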
```r
ggplot(data = patients, mapping = aes(x = Sex, y = BMI, fill = Age)) +
  geom_violin() +
  facet_wrap(~ Smokes)
```
```r
ggplot(data = patients, mapping = aes(x = BMI)) +
  geom_density(aes(fill = Sex), alpha = 0.5) +
  facet_wrap(~ Grade)
```
Read in the clinical data from clinical-data.txt:

```r
library(tidyverse)
clinical_data <- read_tsv("data/clinical-data.txt")
clinical_data
## # A tibble: 10 x 7
## Subject Placebo.1 Placebo.2 Drug1.1 Drug1.2 Drug2.1 Drug2.2
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Patient1 49.8 53.8 48.4 48.4 40.8 38.3
## 2 Patient2 46.8 49.8 49.6 41.6 39.1 41.9
## 3 Patient3 48.7 48.1 40.5 49.2 40.3 35.1
## 4 Patient4 51.7 48.1 38.3 41.1 40.7 41.2
## 5 Patient5 48.9 48.3 43.1 39.4 43.3 34.9
## 6 Patient6 53.5 44.7 47.5 42.9 39.5 35.6
## 7 Patient7 53.6 47.0 49.2 46.4 37.4 38.8
## 8 Patient8 46.2 43.2 47.3 38.3 44.0 33.8
## 9 Patient9 50.5 56.2 43.4 48.6 41.6 34.4
## 10 Patient10 47.0 44.8 44.9 50.1 39.0 36.2
```
Currently the columns are Placebo.1, Placebo.2, …, Drug1.1, etc. However, “Placebo..” and “Drug..” are values, not variables. There should really be two variables: one called something like Treatment, containing values such as ‘Placebo..’ and ‘Drug..’, and another called something like Value or Measure, containing the numbers. Possibly the Treatment values are “Placebo”, “Drug1” and “Drug2”, and the number after the ‘.’ indicates a replicate, but we don’t know this for sure.
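If the suffix really is a replicate number (still an assumption, as noted above), `tidyr::pivot_longer()` can split each old column name into two variables in a single step via `names_sep`. The sketch uses only the first three rows of `clinical_data` as printed above, so the values are the rounded printed ones; `clinical_head` and `Replicate` are illustrative names, not part of the original data.

```r
library(tibble)
library(tidyr)

# First three rows of clinical_data, copied from the printed output above
clinical_head <- tribble(
  ~Subject,   ~Placebo.1, ~Placebo.2, ~Drug1.1, ~Drug1.2, ~Drug2.1, ~Drug2.2,
  "Patient1",       49.8,       53.8,     48.4,     48.4,     40.8,     38.3,
  "Patient2",       46.8,       49.8,     49.6,     41.6,     39.1,     41.9,
  "Patient3",       48.7,       48.1,     40.5,     49.2,     40.3,     35.1
)

# Assumes the part after "." is a replicate: "Drug1.2" -> Treatment "Drug1", Replicate "2"
tidy_head <- pivot_longer(
  clinical_head,
  cols = -Subject,
  names_to = c("Treatment", "Replicate"),
  names_sep = "\\.",  # treated as a regular expression, so the dot is escaped
  values_to = "Value"
)
tidy_head
```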
Reshape the data with the gather function from the tidyr package:

```r
clinical_data <- gather(clinical_data, key = "Treatment", value = "Value", -Subject)
```
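`gather()` still works, but since tidyr 1.0.0 it has been superseded by `pivot_longer()`, which takes the same information with more descriptive argument names. Below is a self-contained comparison on the first three rows as printed above (`clinical_head`, `long_gather` and `long_pivot` are illustrative names). The two functions return the same rows, but in a different order.

```r
library(tibble)
library(tidyr)

# First three rows of clinical_data, copied from the printed output above
clinical_head <- tribble(
  ~Subject,   ~Placebo.1, ~Placebo.2, ~Drug1.1, ~Drug1.2, ~Drug2.1, ~Drug2.2,
  "Patient1",       49.8,       53.8,     48.4,     48.4,     40.8,     38.3,
  "Patient2",       46.8,       49.8,     49.6,     41.6,     39.1,     41.9,
  "Patient3",       48.7,       48.1,     40.5,     49.2,     40.3,     35.1
)

# Approach used in the text: key/value columns, keeping Subject as an identifier
long_gather <- gather(clinical_head, key = "Treatment", value = "Value", -Subject)

# Current tidyr equivalent
long_pivot <- pivot_longer(clinical_head, cols = -Subject,
                           names_to = "Treatment", values_to = "Value")

# gather() stacks one column at a time; pivot_longer() works through one row at a time
head(long_gather$Treatment, 3)
head(long_pivot$Treatment, 6)
```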
```r
ggplot(clinical_data, mapping = aes(x = Treatment, y = Value)) +
  geom_boxplot()
```
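If, as speculated earlier, the number after the ‘.’ is a replicate, the gathered Treatment column can be split with `tidyr::separate()` and the boxplot faceted by replicate, which ties back to the faceting in the first part. This is a sketch on the first three printed rows only; `Drug` and `Replicate` are hypothetical column names chosen for illustration.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# First three rows of clinical_data, copied from the printed output above
clinical_head <- tribble(
  ~Subject,   ~Placebo.1, ~Placebo.2, ~Drug1.1, ~Drug1.2, ~Drug2.1, ~Drug2.2,
  "Patient1",       49.8,       53.8,     48.4,     48.4,     40.8,     38.3,
  "Patient2",       46.8,       49.8,     49.6,     41.6,     39.1,     41.9,
  "Patient3",       48.7,       48.1,     40.5,     49.2,     40.3,     35.1
)

# Same reshaping step as in the text
clinical_long <- gather(clinical_head, key = "Treatment", value = "Value", -Subject)

# Assumes the suffix is a replicate: "Drug1.2" -> Drug "Drug1", Replicate "2"
clinical_split <- separate(clinical_long, Treatment,
                           into = c("Drug", "Replicate"), sep = "\\.")

# One panel per (assumed) replicate, one box per drug within each panel
p <- ggplot(clinical_split, aes(x = Drug, y = Value)) +
  geom_boxplot() +
  facet_wrap(~ Replicate)
```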