Different flavours:
mRNAseq
Targeted
Small RNA
Single Cell RNA-Seq
Discovery:
Transcripts
Isoforms
Splice junctions
Fusion genes
Differential expression:
Gene level expression changes
Relative isoform abundance
Splicing patterns
Variant calling
July 2018
The length of the transcript affects the number of RNA fragments present in the library from that gene.
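A minimal sketch of why length matters, using RPKM (reads per kilobase per million mapped reads) as one common length-aware measure. The gene lengths, counts and library size below are made-up values for illustration only.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical library of one million mapped reads.
total_reads = 1_000_000

# Two hypothetical genes expressed at the same level: the gene that is
# four times longer yields four times as many fragments.
short_gene = rpkm(count=100, gene_length_bp=1_000, total_mapped_reads=total_reads)
long_gene = rpkm(count=400, gene_length_bp=4_000, total_mapped_reads=total_reads)

print(short_gene, long_gene)
```

Raw counts differ four-fold here, but dividing by length puts both genes on the same scale.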
The development of larger suites of unique dual-indexes should eliminate the index swapping issue.
Genome-based features
Exon or gene boundaries
Isoform structures
Gene multireads
Transcript-based features
Transcript assembly
Novel structures
Isoform multireads
HTSeq or Subread
Counting estimates the relative counts for each gene
Does this accurately represent the original population of RNAs?
The relationship between counts and RNA expression is not the same for all genes across all samples
Library Size
Differing sequencing depth
Gene properties
GC content, length, sequence
Library composition
Highly expressed genes overrepresented at the cost of lowly expressed genes
"Composition Bias"
Total Count
Normalise each sample by total number of reads sequenced.
Can also use another statistic similar to total count, e.g. median or upper quartile
Does not account for composition bias
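The sketch below illustrates total count normalisation and how composition bias can mislead it. The gene names and counts are invented: both samples have the same library size, but in sample B a single gene (geneD) takes up most of the reads, so the other genes look reduced even though nothing about them has changed.

```python
def counts_per_million(sample):
    """Scale raw counts by the sample's total count (library size)."""
    library_size = sum(sample.values())
    return {gene: count * 1e6 / library_size for gene, count in sample.items()}


# Hypothetical counts. geneA-geneC are assumed unchanged in the cells;
# only geneD is strongly induced in sample B.
sample_a = {"geneA": 300, "geneB": 300, "geneC": 300, "geneD": 300}
sample_b = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 900}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# Apparent fold change of an unchanged gene after total count normalisation.
ratio_gene_a = cpm_b["geneA"] / cpm_a["geneA"]
print(ratio_gene_a)
```

Because reads are a fixed budget, the induced gene crowds out the rest, and scaling by the total alone cannot distinguish that from a real decrease.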
Comparing feature abundance under different conditions
Assumes linearity of signal
When feature=gene, well-established pre- and post-analysis strategies exist
Mortazavi, A. et al. (2008) Nature Methods
Simple difference in means
Replication introduces variation
Normal Distribution - t-test
Two parameters - mean and sd
Suitable for microarray data but not for RNAseq data
Count data - Poisson distribution
One parameter - mean \((\lambda)\)
Variance = mean
RNAseq counts for lowly expressed genes vary more than for highly expressed genes
Use the Negative Binomial distribution
In the NB distribution the mean is not equal to the variance
Two parameters - mean and dispersion
Anders, S. & Huber, W. (2010) Genome Biology
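To make the Poisson versus Negative Binomial contrast concrete, here is a small sketch using a common NB parameterisation (the one used by DESeq2), in which the variance is mean + dispersion × mean². The dispersion value of 0.1 and the three means are arbitrary choices for illustration.

```python
def poisson_variance(mean):
    """Poisson: the variance equals the mean."""
    return mean


def nb_variance(mean, dispersion):
    """Negative Binomial: Poisson term plus an extra dispersion term."""
    return mean + dispersion * mean ** 2


def squared_cv(mean, dispersion):
    """Squared coefficient of variation, equal to 1/mean + dispersion."""
    return nb_variance(mean, dispersion) / mean ** 2


dispersion = 0.1  # hypothetical value, not estimated from data
for mean in (10, 100, 1000):
    print(mean,
          poisson_variance(mean),
          nb_variance(mean, dispersion),
          round(squared_cv(mean, dispersion), 3))
```

With dispersion set to zero the NB variance reduces to the Poisson case. With a positive dispersion the absolute variance grows faster than the mean, while the relative variability (the squared CV) is largest for low counts.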
Estimating the dispersion parameter can be difficult with a small number of samples
DESeq2 models the variance as the sum of technical and biological variance
Estimate dispersion for each gene
‘Share’ dispersion information between genes to obtain a fitted estimate
Shrink gene-wise estimates towards the fitted estimates
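The idea of shrinkage can be sketched as a weighted average on the log scale. This is a toy illustration only, not DESeq2's actual procedure: the function name and the fixed `weight` are hypothetical, whereas DESeq2 decides the strength of shrinkage from the data.

```python
import math


def shrink_dispersion(gene_estimate, fitted_estimate, weight):
    """Move a gene-wise dispersion towards the fitted value on the log scale.

    weight = 0 keeps the gene-wise estimate unchanged;
    weight = 1 returns the fitted estimate.
    """
    log_shrunk = ((1 - weight) * math.log(gene_estimate)
                  + weight * math.log(fitted_estimate))
    return math.exp(log_shrunk)


# Hypothetical gene whose own estimate (0.5) sits well above the
# fitted value for genes of similar expression (0.1).
shrunk = shrink_dispersion(gene_estimate=0.5, fitted_estimate=0.1, weight=0.5)
print(shrunk)
```

The shrunken value lies between the noisy gene-wise estimate and the fitted one, which is the stabilising effect that matters most when there are few samples.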
Hamy et al. (2016) PLOS One
http://software.broadinstitute.org/gsea
Hamy et al. (2016) PLOS One
Liu et al. (2014) Bioinformatics
Thank you