September 2019
AIM: Given a reference sequence and a set of short reads, align each read to the reference sequence to find the most likely origin of the read.
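To make the aim concrete, here is a minimal brute-force sketch. It assumes ungapped alignment on the forward strand only, with no splicing, no reverse complement and no base qualities, and it treats "most likely origin" as "the position with the fewest mismatches". Real aligners such as HISAT2 and STAR do far more than this. The helper names `hamming` and `best_origin` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_origin(reference: str, read: str) -> tuple:
    """Return (0-based position, mismatches) of the best ungapped placement of read.

    Assumption: forward strand only, no gaps; ties keep the leftmost position.
    """
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(reference[pos:pos + len(read)], read)
        if mm < best_mm:  # strictly fewer mismatches, so the first best position wins
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "ACGTACGGTCAGTTAGCA"  # toy reference, made up for illustration
for read in ["CGGTCA", "TTAGCA", "CGGTCT"]:
    print(read, best_origin(reference, read))
```

Scanning every position for every read is far too slow for a real genome and millions of reads. This cost is the reason aligners build an index of the reference first.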
Aligners: STAR, HISAT2
Sequence Alignment/Map (SAM) format is the standard format for files containing aligned reads.
The definition of the format is available at https://samtools.github.io/hts-specs/SAMv1.pdf.
Two main parts:
……………………..
……………………..
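The following minimal sketch reads SAM text in plain Python. It is meant to show the layout of the format, not to replace samtools or a library such as pysam. It follows the SAM v1 specification: header lines start with `@`, and each alignment line holds 11 mandatory TAB-separated fields followed by optional `TAG:TYPE:VALUE` fields. POS is 1-based. The function name `parse_sam` and the example records are hypothetical.

```python
# The 11 mandatory fields of a SAM alignment line, in order (SAM v1 spec).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam(text):
    """Split SAM text into header lines and parsed alignment records (no validation)."""
    header, records = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):  # header lines begin with '@'
            header.append(line)
            continue
        cols = line.split("\t")  # fields are TAB-separated
        rec = dict(zip(SAM_FIELDS, cols[:11]))
        for key in INT_FIELDS:
            rec[key] = int(rec[key])  # POS is 1-based; 0 means no position
        rec["TAGS"] = cols[11:]  # optional fields, kept as raw strings
        rec["unmapped"] = bool(rec["FLAG"] & 0x4)  # FLAG bit 0x4: segment unmapped
        records.append(rec)
    return header, records


sam_text = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:18",
    "read1\t0\tchr1\t6\t60\t6M\t*\t0\t0\tCGGTCA\tIIIIII\tNM:i:0",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGG\tIIIIII",
])
header, records = parse_sam(sam_text)
print(len(header), [(r["QNAME"], r["POS"], r["unmapped"]) for r in records])
# prints: 2 [('read1', 6, False), ('read2', 0, True)]
```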
HISAT2: fast, with good performance in published benchmark tests.
First, we need to generate an index for the reference genome with the hisat2-build command.
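Indexing is where most of the work takes place, so it is computationally intensive; however, the index only needs to be built once per reference genome and can then be reused for every alignment.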
Then we can align reads to the genome with hisat2
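The "index once, align many times" split can be illustrated with a toy seed-and-verify aligner. The index here is a simple hash table mapping each k-mer to its positions. That is a deliberately simpler technique than HISAT2's actual index, which is an FM-index (Burrows-Wheeler) based structure. The names `build_kmer_index` and `align_read` and the parameters `k` and `max_mm` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """One-off, expensive step: map every k-mer to all of its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_read(reference, index, k, read, max_mm=1):
    """Cheap per-read step: seed with the read's first k-mer, then verify the whole read.

    Assumption: a mismatch inside the seed makes the read unalignable here;
    real aligners use several seeds to avoid this.
    """
    hits = []
    for pos in index.get(read[:k], []):  # candidate origins suggested by the seed
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm <= max_mm:
            hits.append((pos, mm))
    return sorted(hits, key=lambda hit: hit[1])  # fewest mismatches first


reference = "ACGTACGGTCAGTTAGCA"  # toy reference, made up for illustration
K = 4
idx = build_kmer_index(reference, K)  # built once, reused for every read
for read in ["CGGTCA", "TTAGCT", "ACGTAC"]:
    print(read, align_read(reference, idx, K, read))
```

Building the index touches the whole reference. After that, each read costs only a dictionary lookup plus a short comparison. This mirrors, in miniature, why hisat2-build is the expensive step and hisat2 is comparatively quick per read.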
Create an index to the genome with HISAT2
Align reads to the genome with HISAT2, producing a SAM file
Convert the SAM file to a coordinate-sorted BAM file with samtools (samtools index requires a sorted BAM)
Index the BAM file with samtools
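The four steps above can be scripted. The sketch below assumes single-end reads (`hisat2 -U`; paired-end data would use `-1`/`-2` instead) and hypothetical file names. It uses `samtools sort` to perform the SAM to BAM conversion and the coordinate sort in one step. `samtools view -b -o sample.bam sample.sam` would convert without sorting, but the result could not be indexed. By default the script only prints the commands. Set `dry_run=False` to execute them, which requires hisat2 and samtools on the `PATH`.

```python
import shlex
import subprocess


def hisat2_workflow(reference_fa, index_base, reads_fq, prefix):
    """Return the workflow commands in order (all file names are hypothetical)."""
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    return [
        ["hisat2-build", reference_fa, index_base],               # 1. index the genome
        ["hisat2", "-x", index_base, "-U", reads_fq, "-S", sam],  # 2. align reads -> SAM
        ["samtools", "sort", "-o", bam, sam],                     # 3. SAM -> sorted BAM
        ["samtools", "index", bam],                               # 4. index BAM (writes .bai)
    ]


def run(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run(hisat2_workflow("genome.fa", "genome_idx", "reads.fastq", "sample"))
```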