April 2021
AIM: Given a reference sequence and a set of short reads, align each read to the reference to find the most likely origin of that read.
Aligners: STAR, HISAT2
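To make the aim concrete, here is a toy sketch of what "finding the most likely origin" means. It slides the read along the reference and keeps the offset with the fewest mismatches. This is only an illustration: it is not how STAR or HISAT2 work (they use a pre-built genome index and handle gaps and splicing), and the function name and the example sequences are made up for this sketch.

```python
from typing import Tuple


def best_alignment(reference: str, read: str) -> Tuple[int, int]:
    """Return (offset, mismatches) for the best ungapped placement of the read.

    Brute force: every possible start offset is tried and scored by the
    number of mismatching bases. Ties are resolved in favour of the
    leftmost offset.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")

    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


# Hypothetical example data: the read matches the reference at one place
# apart from a single substituted base.
reference = "ACGTTAGCCGATAGGCTA"
read = "GCCGTTAG"
offset, mismatches = best_alignment(reference, read)
print(f"best 0-based offset: {offset}, mismatches: {mismatches}")
```

The cost of this scan grows with reference length times read length for every read, which is why real aligners build an index of the genome first.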
Sequence Alignment/Map (SAM) format is the standard format for files containing aligned reads.
Definition of the format is available at https://samtools.github.io/hts-specs/SAMv1.pdf.
Two main parts:
……………………..
……………………..
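As a rough illustration of the file layout: in the SAM specification, a file has an optional header section, whose lines start with "@", followed by an alignment section with one tab-separated record per line and 11 mandatory fields. The sketch below separates the two. The function name and the example records are invented for this sketch, and real pipelines would normally use an existing parser rather than hand-written code.

```python
from typing import Dict, List, Tuple

# The 11 mandatory alignment fields, in the order given by the SAM specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def split_sam(text: str) -> Tuple[List[str], List[Dict[str, object]]]:
    """Split SAM text into header lines and parsed alignment records."""
    headers: List[str] = []
    records: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            # Header section: @HD, @SQ, @RG, @PG, @CO lines.
            headers.append(line)
        else:
            # Alignment section: 11 mandatory columns, then optional tags.
            columns = line.split("\t")
            record: Dict[str, object] = dict(zip(SAM_FIELDS, columns[:11]))
            record["TAGS"] = columns[11:]
            records.append(record)
    return headers, records


# Hypothetical example content, not output from a real aligner run.
example_sam = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t7\t60\t8M\t*\t0\t0\tGCCGTTAG\tIIIIIIII\tNM:i:1\n"
)
headers, records = split_sam(example_sam)
print(len(headers), "header lines;", len(records), "alignment record(s)")
print("first read maps to", records[0]["RNAME"], "at 1-based position", records[0]["POS"])
```

Note that POS in SAM is 1-based, so position 7 here corresponds to a 0-based offset of 6.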
HISAT2 is fast and performs well in published benchmark tests
First, generate an index for the reference genome with the hisat2-build command
Indexing is where much of the heavy work takes place, so it is computationally intensive
Then we can align reads to the genome with hisat2
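The two HISAT2 steps can be sketched as command lines assembled in Python. The file names (genome.fa, genome_index, reads.fastq, aligned.sam) are placeholders chosen for this sketch, and single-end reads are assumed (the -U option); paired-end data would use -1 and -2 instead. The block only builds and prints the commands, so it runs even where HISAT2 is not installed.

```python
import shlex
from typing import List


def index_command(genome_fasta: str, index_base: str) -> List[str]:
    """Command that builds a HISAT2 index: hisat2-build <reference> <index base name>."""
    return ["hisat2-build", genome_fasta, index_base]


def align_command(index_base: str, reads_fastq: str, sam_out: str) -> List[str]:
    """Command that aligns single-end reads and writes the result as SAM.

    -x: index base name, -U: unpaired reads file, -S: SAM output file.
    """
    return ["hisat2", "-x", index_base, "-U", reads_fastq, "-S", sam_out]


# Placeholder file names (assumptions for this sketch).
commands = [
    index_command("genome.fa", "genome_index"),
    align_command("genome_index", "reads.fastq", "aligned.sam"),
]
for command in commands:
    print(shlex.join(command))
```

To execute the steps, each list can be passed to subprocess.run(command, check=True). The index only needs to be built once per genome and can be reused for every set of reads.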
Create an index to the genome with HISAT2
Align reads to the genome with HISAT2 and store the outcome in a SAM file
Convert the SAM file (human-readable text) to BAM (binary) with samtools
Index the BAM file with samtools (the BAM file must be coordinate-sorted first, e.g. with samtools sort)
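The samtools steps (SAM to BAM conversion and indexing) can be sketched in the same style. One step is added here as an assumption: samtools index expects a coordinate-sorted BAM, so a samtools sort call sits between conversion and indexing. The file names are placeholders, and the block only builds and prints the commands.

```python
import shlex
from typing import List


def bam_commands(sam_path: str, prefix: str) -> List[List[str]]:
    """Commands that convert a SAM file to BAM, sort it, and index it.

    prefix is a hypothetical base name used for the output files.
    """
    bam_path = f"{prefix}.bam"
    sorted_path = f"{prefix}.sorted.bam"
    return [
        # Convert text SAM to binary BAM (-b: BAM output, -o: output file).
        ["samtools", "view", "-b", "-o", bam_path, sam_path],
        # Sort by coordinate (the default sort order), needed before indexing.
        ["samtools", "sort", "-o", sorted_path, bam_path],
        # Write the index alongside the sorted BAM.
        ["samtools", "index", sorted_path],
    ]


steps = bam_commands("aligned.sam", "aligned")
for step in steps:
    print(shlex.join(step))
```

As before, each list can be run with subprocess.run(step, check=True). The indexed, sorted BAM is what genome browsers and most downstream tools expect.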