November 2024

RNAseq Workflow

Library Preparation

Sequencing

Bioinformatics Analysis

Image adapted from: Wang, Z., et al. (2009), Nature Reviews Genetics, 10, 57–63.

Library preparation

  1. RNA → Reverse Transcription → cDNA …
  2. Fragmentation - short fragments ~200-300 nt …
  3. Adapter and index ligation …
  4. PCR Amplification.

Sequencing

Bioinformatics Analysis - Preprocessing

Fastq file format

QC with FastQC

Alignment based quantification

Quantification with Quasi-mapping (Salmon)

QC of aligned reads

QC of aligned reads - Transcript coverage


Bioinformatics Analysis - Data Exploration

Reading in the count data

library(tximport)
txi <- tximport(salmon_files, type = "salmon", tx2gene = tx2gene)
str(txi)
## List of 4
##  $ abundance          : num [1:35896, 1:12] 20.381 0 1.966 1.059 0.949 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:35896] "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028" "ENSMUSG00000000037" ...
##   .. ..$ : chr [1:12] "SRR7657878" "SRR7657881" "SRR7657880" "SRR7657874" ...
##  $ counts             : num [1:35896, 1:12] 1039 0 65 39 8 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:35896] "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028" "ENSMUSG00000000037" ...
##   .. ..$ : chr [1:12] "SRR7657878" "SRR7657881" "SRR7657880" "SRR7657874" ...
##  $ length             : num [1:35896, 1:12] 2905 541 1884 2100 480 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:35896] "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028" "ENSMUSG00000000037" ...
##   .. ..$ : chr [1:12] "SRR7657878" "SRR7657881" "SRR7657880" "SRR7657874" ...
##  $ countsFromAbundance: chr "no"

Total counts per sample

Distribution of counts per gene

  • VST : variance stabilizing transformation
  • rlog : regularized log transformation
rlogCounts <- rlog(filtCounts)

Principal Component Analysis
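
As an illustration of what the PCA step computes, here is a minimal sketch using base R's prcomp on a simulated matrix. The object toyCounts and its values are hypothetical stand-ins for a matrix of transformed counts such as rlogCounts; nothing here is taken from the course data.

```r
# Toy stand-in for transformed counts (genes x samples); values are simulated.
set.seed(1)
toyCounts <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
                    dimnames = list(paste0("gene", 1:200),
                                    paste0("sample", 1:6)))
# Shift the first 50 genes in samples 4-6 to mimic a group effect.
toyCounts[1:50, 4:6] <- toyCounts[1:50, 4:6] + 4

# prcomp expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(toyCounts), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(varExplained[1:2], 2))

# Sample coordinates on the first two components (what a PCA plot shows).
print(pca$x[, 1:2])
```

Plotting the first two columns of pca$x, coloured by sample group, gives the usual PCA overview of how samples cluster.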

Bioinformatics Analysis Differential Gene Expression Analysis

DESeq2 analysis workflow

Normalization
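
The median-of-ratios idea behind DESeq2's default size factors can be sketched in base R. This is a simplified illustration on a made-up count matrix, not DESeq2's own code; the names counts, logGeoMeans, sizeFactors and normCounts are hypothetical.

```r
# Toy count matrix (genes x samples); values are made up for illustration.
counts <- matrix(c(100, 200,  50,
                    20,  40,  10,
                   300, 600, 150,
                    10,  20,   5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# 1. Per-gene geometric mean across samples, computed on the log scale.
logGeoMeans <- rowMeans(log(counts))

# 2. Per-sample size factor: median over genes of count / geometric mean.
#    Genes with a zero count in any sample (infinite log mean) are excluded.
sizeFactors <- apply(counts, 2, function(cnts) {
  usable <- is.finite(logGeoMeans) & cnts > 0
  exp(median(log(cnts[usable]) - logGeoMeans[usable]))
})

# 3. Divide each sample's counts by its size factor.
normCounts <- sweep(counts, 2, sizeFactors, "/")
print(sizeFactors)
print(normCounts)
```

In this toy matrix the samples differ only in sequencing depth, so the normalized columns come out identical.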

Differential Expression - Modelling population distributions

Differential Expression - estimating dispersion

GLM for Differential Expression Analysis

One factor - three levels

Two factors - two levels each - Additive Model

Two factors - two levels each - Interaction Model
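
The difference between the additive and interaction models shows up in their design matrices. The sketch below uses base R's model.matrix on a hypothetical sample sheet; the factor names and levels mirror those used in the case study, but the data frame itself is invented for illustration.

```r
# Hypothetical sample sheet: two factors, two levels each, two replicates per group.
sampleSheet <- data.frame(
  TimePoint = factor(rep(c("d11", "d33"), each = 4)),
  Status    = factor(rep(c("Uninfected", "Infected"), times = 4),
                     levels = c("Uninfected", "Infected"))
)

# Additive model: intercept plus one coefficient per non-reference level.
additiveDesign <- model.matrix(~ TimePoint + Status, data = sampleSheet)

# Interaction model: one extra coefficient for the Infected-at-d33 combination.
interactionDesign <- model.matrix(~ TimePoint + Status + TimePoint:Status,
                                  data = sampleSheet)

print(colnames(additiveDesign))
print(colnames(interactionDesign))
```

The additive model assumes the infection effect is the same at both time points; the interaction term lets it differ.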

Multiple testing correction

  • When we perform many tests, the chance of obtaining false positive results increases.
  • We therefore adjust the p-values using the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR).
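
To make the adjustment concrete, here is a small sketch of the Benjamini-Hochberg procedure in base R, first with the built-in p.adjust and then by hand. The p-values are made up, and the variable names are hypothetical.

```r
# Made-up p-values from five hypothetical tests.
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Built-in Benjamini-Hochberg adjustment.
padjBuiltin <- p.adjust(pvals, method = "BH")

# The same by hand: order from largest to smallest, scale each p-value by
# n / rank, then take a running minimum so adjusted values stay monotone.
n <- length(pvals)
ord <- order(pvals, decreasing = TRUE)
scaled <- pvals[ord] * n / (n:1)
padjManual <- numeric(n)
padjManual[ord] <- pmin(1, cummin(scaled))

print(padjBuiltin)
print(padjManual)
```

Here four of the raw p-values fall below 0.05, but only two remain below 0.05 after adjustment.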

Case Study

Applying the workflow using DESeq2

Load Data

library(DESeq2)
library(tidyverse)

txiObj <- readRDS("RObjects/txi.rds")
sampleinfo <- read_tsv("data/samplesheet_corrected.tsv", col_types="cccc") %>%
  mutate(Status = fct_relevel(Status, "Uninfected"))

Define model

model <- as.formula(~ TimePoint + Status + TimePoint:Status)

Create DESeqDataSet object

ddsObj <- DESeqDataSetFromTximport(txi = txiObj,
                                   colData = sampleinfo,
                                   design = model)

Filter out uninformative genes

# Keep genes with more than 5 reads in total across all samples
keep <- rowSums(counts(ddsObj)) > 5
ddsObj <- ddsObj[keep,]

Run DESeq workflow: estimate size factors, estimate dispersion, run GLM

ddsObj <- DESeq(ddsObj)

Extract results

results.day11 <- results(ddsObj,
                         name="Status_Infected_vs_Uninfected",
                         alpha=0.05)

results.day33 <- results(ddsObj,
                         contrast = list(c("Status_Infected_vs_Uninfected", "TimePointd33.StatusInfected")),
                         alpha=0.05)
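
Why the day 33 comparison needs two coefficients can be checked with a small base R sketch. It builds the interaction design for the four groups and subtracts the Uninfected row from the Infected row at each time point. The data frame groups is hypothetical; note that model.matrix labels the coefficients StatusInfected and TimePointd33:StatusInfected, which correspond to the DESeq2 names Status_Infected_vs_Uninfected and TimePointd33.StatusInfected used above.

```r
# One row per group; level names follow the case study code.
groups <- data.frame(
  TimePoint = factor(c("d11", "d33", "d11", "d33")),
  Status    = factor(c("Uninfected", "Uninfected", "Infected", "Infected"),
                     levels = c("Uninfected", "Infected"))
)
design <- model.matrix(~ TimePoint + Status + TimePoint:Status, data = groups)
rownames(design) <- paste(groups$Status, groups$TimePoint, sep = "_")

# Infected minus Uninfected within each time point, as coefficient weights.
contrastDay11 <- design["Infected_d11", ] - design["Uninfected_d11", ]
contrastDay33 <- design["Infected_d33", ] - design["Uninfected_d33", ]

print(contrastDay11)
print(contrastDay33)
```

At day 11 only the Status coefficient has a non-zero weight, while at day 33 both the Status coefficient and the interaction coefficient do, which is why the second results() call sums the two.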

DESeq2 Results Table