Library Preparation
Sequencing
Bioinformatics Analysis
Image adapted from: Wang, Z., et al. (2009), Nature Reviews Genetics, 10, 57–63.
November 2024
- Ribosomal RNA
- Poly-A transcripts
- Other RNAs, e.g. tRNA, miRNA
Total RNA extraction
Poly-A Selection
Poly-A transcripts e.g.:
Ribominus selection
Poly-A transcripts + Other mRNAs e.g.:
Check the raw reads for problems before we put time and effort into analysing potentially bad data.
Good Data
Bad Data
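One common thing to inspect at this stage is how base quality changes along the read. The sketch below is only an illustration of that idea, not a replacement for a dedicated QC tool: it assumes four-line FASTQ records with Phred+33 quality encoding, and the function name and the two example reads are made up.

```python
from io import StringIO


def mean_quality_per_position(fastq_handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumes every record spans exactly four lines and that qualities
    are encoded as Phred+33 (both are assumptions of this sketch).
    """
    totals, counts = [], []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):  # first time we see this read position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads; the second one loses quality towards its 3' end.
example_fastq = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII5!\n"
)
mean_quals = mean_quality_per_position(example_fastq)
```

A steady drop in `mean_quals` towards the end of the read is expected to some degree; a sharp or early drop is the kind of problem worth catching before alignment.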
AIM: Given a reference sequence and a set of short reads, align each read to the reference to find the most likely origin of the read sequence.
Aligners: STAR, HISAT2
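To make the aim concrete, here is a deliberately naive sketch that slides a read along a reference and keeps the placement with the fewest mismatches. This is not how STAR or HISAT2 work (they use indexed references and handle spliced reads); the function names and sequences are made up for illustration, and gaps and splicing are ignored.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(reference, read):
    """Return (position, mismatches) of the best ungapped placement.

    Positions are 0-based. Ties are resolved in favour of the leftmost
    placement, which is an arbitrary choice made for this sketch.
    """
    best_pos, best_mm = None, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(reference[pos:pos + len(read)], read)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "ACGTTAGCCGATAGGCT"                    # toy reference
hit_exact = best_alignment(reference, "GCCGAT")    # read with no errors
hit_mismatch = best_alignment(reference, "GCCTAT") # read with one error
```

Both reads are placed at the same position; the second simply carries one mismatch, which is how a sequencing error or a variant would appear.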
Counting: How many reads have come from a genomic feature?
* A genomic feature can be a gene, a transcript or an exon, but is usually a gene.
Once the reads are mapped, we know where on the genome each RNA fragment originated.
We also know the locations of exons of genes on the genome.
So the simplest approach is to count how many reads overlap each gene.
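The simplest approach can be sketched in a few lines. The gene and read coordinates below are made up, each gene is treated as a single interval rather than a set of exons, and reads that overlap more than one gene are skipped; all of these are simplifying assumptions of the sketch, not a description of any particular counting tool.

```python
# Hypothetical annotation: gene -> (chromosome, start, end), 1-based inclusive.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical mapped reads: (chromosome, start, end), 1-based inclusive.
reads = [
    ("chr1", 120, 170),    # inside geneA
    ("chr1", 480, 530),    # partly overlaps the end of geneA
    ("chr1", 600, 650),    # intergenic
    ("chr1", 1000, 1050),  # inside geneB
    ("chr2", 100, 150),    # chromosome with no annotated gene
]


def count_reads_per_gene(genes, reads):
    """Count the reads that overlap exactly one gene."""
    counts = {gene: 0 for gene in genes}
    for chrom, start, end in reads:
        hits = [
            gene
            for gene, (g_chrom, g_start, g_end) in genes.items()
            if chrom == g_chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:  # skip unassigned and ambiguous reads
            counts[hits[0]] += 1
    return counts


gene_counts = count_reads_per_gene(genes, reads)
```

How to treat reads that overlap several features, or that only partly overlap a feature, is a design decision; here a partial overlap counts and an ambiguous read does not.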
Salmon does not simply count reads, but uses a dual-phase parallel modelling and inference algorithm along with bias models to estimate expression at the transcript level.
Salmon also takes account of biases:
Multimapping: Reads which map equally well to multiple locations
GC bias: Sequences with high GC content are less likely to be observed because PCR amplifies them less efficiently.
Positional bias: For most sequencing methods, the 3′ ends of transcripts are more likely to be observed.
Complexity bias: Some sequences are bound and amplified more easily than others.
Sequence-based bias: Bias in read start positions arising from the differential binding efficiency of random hexamer primers.
Fragment length bias: Induced by size selection.
Methods like Salmon attempt to mitigate the effect of technical biases by estimating sample-specific bias parameters.
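The output file “quant.sf” is a tab-separated table with one row per transcript and the columns Name, Length, EffectiveLength, TPM and NumReads: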
https://salmon.readthedocs.io/en/latest/file_formats.html
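As a minimal sketch of working with this file, the code below parses a quant.sf-style table and recomputes TPM from the estimated read counts and effective lengths. The column names (Name, Length, EffectiveLength, TPM, NumReads) follow the Salmon documentation linked above; the two transcripts and their values are made up, and the helper names are hypothetical.

```python
import csv
from io import StringIO

# Made-up two-transcript example in the quant.sf layout.
example_quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1300.0\t600000.0\t780.0\n"
    "tx2\t900\t700.0\t400000.0\t280.0\n"
)


def read_quant_sf(handle):
    """Parse a quant.sf-style table into a dict keyed by transcript name."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["Name"]] = {
            "length": int(row["Length"]),
            "effective_length": float(row["EffectiveLength"]),
            "tpm": float(row["TPM"]),
            "num_reads": float(row["NumReads"]),
        }
    return table


def tpm_from_counts(table):
    """Recompute TPM: reads per effective base, scaled to sum to one million."""
    rates = {
        name: rec["num_reads"] / rec["effective_length"]
        for name, rec in table.items()
    }
    total = sum(rates.values())
    return {name: rate / total * 1e6 for name, rate in rates.items()}


quant = read_quant_sf(StringIO(example_quant_sf))
recomputed_tpm = tpm_from_counts(quant)
```

In this made-up example the recomputed values agree with the TPM column, which illustrates how TPM relates to NumReads and EffectiveLength.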
Picard Tools: