Reference genomes and GRC.
Fasta and FastQ (Unaligned sequences).
SAM/BAM (Aligned sequences).
BED (Genomic Intervals).
GFF/GTF (Gene annotation).
Wiggle files, BEDgraphs and BigWigs (Genomic scores).
The human genome isn't complete!
In fact, the reference genomes of most model organisms are regularly updated.
A reference genome consists of a mixture of known chromosomes and unplaced contigs, together called a "Genome Reference Assembly".
The latest genome assembly for humans is GRCh38.
Patches add information to the assembly without disrupting the chromosome coordinates, e.g. GRCh38.p3.
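Because a patch leaves chromosome coordinates unchanged, it can help to separate the assembly name from its patch level when comparing files. A minimal sketch, assuming names follow the "GRCh38.p3" pattern shown above; `split_assembly_name` is a hypothetical helper, not part of any library.

```python
import re

def split_assembly_name(name):
    """Split a GRC-style name such as 'GRCh38.p3' into (assembly, patch)."""
    # Assumed pattern: 'GRC', a species letter code, a version number,
    # and an optional '.p<N>' patch suffix.
    match = re.fullmatch(r"(GRC[a-z]+\d+)(?:\.p(\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognised assembly name: {name!r}")
    patch = int(match.group(2)) if match.group(2) else 0
    return match.group(1), patch

assembly, patch = split_assembly_name("GRCh38.p3")
same_coordinates = assembly == split_assembly_name("GRCh38")[0]
```

Two names with the same assembly part (here GRCh38) share chromosome coordinates, whatever their patch level.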
[Workflow diagram: DNA/cDNA, fragment DNA (PCR amplify), sequence DNA, unaligned sequence, aligned sequences (against the reference genome)]
Unaligned sequence files generated from HTS machines are mapped to a reference genome to produce aligned sequence files.
FASTQ - Unaligned sequences.
SAM - Aligned sequences.
FastQ (FASTA with Qualities)
FastQ - Header
FastQ - Qualities
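A FASTQ record spans four lines: a header starting with "@", the sequence, a separator starting with "+", and a quality string with one character per base. The sketch below reads one record and decodes the qualities. It assumes Phred+33 encoding (the Sanger / Illumina 1.8+ convention); the record itself is made up, and `parse_fastq_record` is a hypothetical helper.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one four-line FASTQ record into (read_id, sequence, phred_scores)."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # The read ID is the header text up to the first whitespace.
    read_id = header[1:].split()[0]
    # Each quality character encodes a Phred score as ASCII code minus offset.
    phred = [ord(char) - offset for char in quality]
    return read_id, sequence, phred

# Made-up record for illustration only.
record = ["@read1 example", "GATTACA", "+", "IIIIH#!"]
read_id, sequence, phred = parse_fastq_record(record)
```

Older Illumina pipelines used an offset of 64, so the `offset` argument would need changing for such files.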
SAM format
SAM - Header
SAM - Aligned Reads
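In a SAM file, header lines start with "@" and every aligned read is one tab-separated line with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional tags. A minimal sketch of splitting such a line, using a made-up read; `parse_sam_line` is a hypothetical helper, and real work would normally go through a dedicated library.

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}

def parse_sam_line(line):
    """Return a dict for an alignment line, or None for a header line."""
    if line.startswith("@"):
        return None  # header lines (@HD, @SQ, @RG, @PG, @CO)
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("alignment line has fewer than 11 fields")
    read = {}
    for name, value in zip(SAM_FIELDS, columns):
        read[name] = int(value) if name in INT_FIELDS else value
    # FLAG is a bit field: 0x4 marks an unmapped read, 0x10 the reverse strand.
    read["is_unmapped"] = bool(read["flag"] & 0x4)
    read["is_reverse"] = bool(read["flag"] & 0x10)
    read["tags"] = columns[11:]  # optional TAG:TYPE:VALUE fields
    return read

# Made-up alignment for illustration only.
example = "read1\t16\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
aligned = parse_sam_line(example)
```

Note that POS in SAM is 1-based, unlike the 0-based starts used in BED.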
After alignment, sequence reads are typically summarised into scores over or within genomic intervals.
BED - Genomic intervals and information.
Wiggle/BedGraph - Genomic intervals and scores.
GFF - Genomic annotation with information and scores.
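As an illustration of summarising reads into scores over intervals, the sketch below counts how many aligned reads overlap each interval and writes the result as bedGraph-style lines (chromosome, start, end, score). It assumes reads and intervals are both given as 0-based, half-open coordinates; the data and the function names are made up for this example.

```python
def count_overlaps(intervals, reads):
    """Count the reads overlapping each (chrom, start, end) interval."""
    scored = []
    for chrom, start, end in intervals:
        # Two half-open ranges overlap when each starts before the other ends.
        count = sum(1 for r_chrom, r_start, r_end in reads
                    if r_chrom == chrom and r_start < end and r_end > start)
        scored.append((chrom, start, end, count))
    return scored

def to_bedgraph_lines(scored):
    """Format (chrom, start, end, score) tuples as tab-separated lines."""
    return ["\t".join(str(value) for value in row) for row in scored]

# Made-up reads and intervals for illustration only.
reads = [("chr1", 90, 140), ("chr1", 150, 200),
         ("chr1", 200, 250), ("chr2", 100, 150)]
intervals = [("chr1", 100, 200), ("chr1", 200, 300)]
lines = to_bedgraph_lines(count_overlaps(intervals, reads))
```

This scans every read for every interval, which is fine for a small example but too slow for real data, where sorted or indexed files are used instead.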
BED format (BED3)
BED format (BED6)
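The smallest BED file has three tab-separated columns (chromosome, start, end); BED6 adds name, score and strand. Starts are 0-based and ends are exclusive, so the interval length is simply end minus start. A minimal sketch with made-up lines; `parse_bed_line` is a hypothetical helper.

```python
def parse_bed_line(line):
    """Parse a BED line with 3 to 6 columns into a dict."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 3:
        raise ValueError("BED lines need at least chrom, start and end")
    interval = {
        "chrom": columns[0],
        "start": int(columns[1]),  # 0-based, inclusive
        "end": int(columns[2]),    # exclusive
    }
    # Columns 4 to 6 are optional and kept as text here.
    for key, value in zip(["name", "score", "strand"], columns[3:6]):
        interval[key] = value
    interval["length"] = interval["end"] - interval["start"]
    return interval

# Made-up intervals for illustration only.
bed3 = parse_bed_line("chr1\t100\t200")
bed6 = parse_bed_line("chr1\t100\t200\tpeak1\t0\t+")
```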