March 2023
AIM: Given a reference sequence and a set of short reads, align each read to the reference to find the most likely origin of that read.
Aligners: STAR, HISAT2
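To make the aim concrete, here is a toy sketch of ungapped alignment: slide each read along the reference and keep the placement with the fewest mismatches. This is only an illustration of "most likely origin"; it is not how STAR or HISAT2 work (they use indexes and handle spliced reads), and the names `hamming` and `best_alignment` and the sequences are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(reference, read):
    """Return (position, mismatches) for the ungapped placement of `read`
    on `reference` with the fewest mismatches (first one wins on ties).
    Returns (-1, len(read) + 1) if the read is longer than the reference."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(reference[pos:pos + len(read)], read)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGCTA"  # hypothetical reference
    for read in ["GCATGC", "CGATTG"]:   # hypothetical reads
        pos, mm = best_alignment(reference, read)
        print(f"{read}: position {pos}, {mm} mismatch(es)")
```

Scanning every position costs time proportional to reference length times read length per read, which is why real aligners rely on precomputed indexes instead.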
Switch to quasi-mapping or pseudo-alignment
We now have the locations of our reads on the genome.
We also know the locations of exons of genes on the genome.
So the simplest approach is to count how many reads overlap each gene.
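The counting idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single chromosome, unstranded reads, 0-based half-open intervals, and made-up gene names and coordinates. Reads that overlap more than one gene are skipped here for simplicity.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def count_reads(reads, exons):
    """Count reads per gene. A read is assigned to a gene if it overlaps
    any exon of that gene; reads hitting zero or several genes are not counted."""
    counts = {gene: 0 for gene in exons}
    for r_start, r_end in reads:
        hits = [
            gene
            for gene, gene_exons in exons.items()
            if any(overlaps(r_start, r_end, s, e) for s, e in gene_exons)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical annotation: gene -> list of (start, end) exons.
    exons = {
        "geneA": [(100, 200), (300, 400)],
        "geneB": [(1000, 1200)],
    }
    # Hypothetical aligned reads as (start, end).
    reads = [(150, 225), (190, 310), (390, 450), (500, 575), (1100, 1175)]
    print(count_reads(reads, exons))
```

A read spanning two exons of the same gene is still counted once, because assignment is per gene rather than per exon.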
GC bias: Sequences with high GC content are less likely to be observed because PCR amplifies them less efficiently.
Positional bias: for most sequencing methods, the 3' ends of transcripts are more likely to be observed.
Complexity bias: some sequences bind and amplify more readily than others.
Sequence-based bias: Bias in read start positions arising from the differential binding efficiency of random hexamer primers.
Fragment length bias: Induced by size selection
The above biases are sample-specific.
Methods like Salmon attempt to mitigate the effect of technical biases by estimating sample-specific bias parameters.
Patro et al. (2017) Nature Methods doi:10.1038/nmeth.4197
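As a simplified illustration of how one sample-specific parameter can enter quantification, the sketch below computes a transcript's effective length from a fragment length distribution: the expected number of start positions at which a fragment could fit. This is a textbook-style simplification, not Salmon's actual bias model; the function name `effective_length` and the example distribution are assumptions made for the example.

```python
def effective_length(transcript_length, frag_len_probs):
    """Expected number of valid fragment start positions on a transcript.

    frag_len_probs maps fragment length -> probability, as estimated for
    one sample. Fragments longer than the transcript cannot arise from it,
    so the distribution is truncated and renormalised.
    """
    usable = {
        f: p for f, p in frag_len_probs.items() if 1 <= f <= transcript_length
    }
    total = sum(usable.values())
    if total == 0:
        return 0.0
    return sum(p * (transcript_length - f + 1) for f, p in usable.items()) / total


if __name__ == "__main__":
    # Hypothetical fragment length distribution for one sample.
    frag_len_probs = {200: 0.5, 300: 0.5}
    for length in (1000, 250):
        print(length, effective_length(length, frag_len_probs))
```

Because the fragment length distribution differs between libraries, the same transcript can have a different effective length in each sample, which is one reason such parameters are estimated per sample.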