Mark Dunning
Last modified: 18 Jul 2015
http://core-genomics.blogspot.co.uk/2014/08/seqc-kills-microarrays-not-quite.html
* Other sequencing technologies are available
“Uses the raw TIF files to locate clusters on the image, and outputs the cluster intensity, X,Y positions, and an estimate of the noise for each cluster. The output from image analysis provides the input for base calling.”
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
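The fourth line of a FASTQ record holds one quality character per base. As a minimal sketch, assuming the Sanger (Phred+33) encoding, the characters can be turned into Phred scores and error probabilities as follows; the function names are illustrative and not taken from any particular library.

```python
# Decode a FASTQ quality string, assuming the Sanger / Phred+33 encoding.
# (Assumption: some older Illumina pipelines used a +64 offset instead.)
sequence = "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT"
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"


def phred_scores(qual, offset=33):
    """Return the integer Phred score for each quality character."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Convert a Phred score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


scores = phred_scores(quality)

# One quality value per base, so both strings have the same length.
print(len(sequence) == len(scores))

# Report the lowest- and highest-quality base calls in the read.
print(min(scores), error_probability(min(scores)))
print(max(scores), error_probability(max(scores)))
```

A score of 0 (the `!` character) means the base call carries no information, while a score of 20 corresponds to a 1 in 100 chance of error.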
~250 million reads (sequences) per HiSeq lane
@HWUSI-EAS100R:6:73:941:1973#0/1
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
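The identifier above follows the older Illumina naming scheme, in which the colon-separated fields are the instrument name, flowcell lane, tile number, and the x and y coordinates of the cluster, followed by the index number after `#` and the pair member after `/`. A rough sketch of splitting it into named fields, assuming that layout (newer Illumina headers use a different one):

```python
import re

header = "@HWUSI-EAS100R:6:73:941:1973#0/1"

# Pattern for the older Illumina identifier layout only (an assumption);
# headers written by more recent software will not match.
HEADER_PATTERN = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<pair>[12])$"
)


def parse_header(line):
    """Split an old-style Illumina read identifier into named fields."""
    match = HEADER_PATTERN.match(line)
    if match is None:
        raise ValueError("not an old-style Illumina header: " + line)
    fields = match.groupdict()
    # Convert the numeric fields; instrument and index stay as text.
    for key in ("lane", "tile", "x", "y", "pair"):
        fields[key] = int(fields[key])
    return fields


fields = parse_header(header)
print(fields)
```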
The header gives information about the alignment and the reference sequences used
@HD VN:1.0 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
HWI-ST1001:137:C12FPACXX:7:1115:14131:66670 0 chr1 12805 1 42M4I5M *
0 0 TTGGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCACCAATATG
CCCFFFFFHHGHHJJJJJHJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJJ
AS:i:-28 XN:i:0 XM:i:2 XO:i:1 XG:i:4 NM:i:6 MD:Z:2C41C2 YT:Z:UU NH:i:3
CC:Z:chr15 CP:i:102518319 XS:A:+ HI:i:0
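The CIGAR string in the record above (`42M4I5M`) describes how the read lines up against the reference. As an illustrative sketch (the helper names are made up for this example), it can be split into operations and used to work out how many read bases and how many reference bases the alignment covers:

```python
import re

# One CIGAR operation is a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read and of the reference,
# as defined in the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def query_length(cigar):
    """Number of read bases described by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


cigar = "42M4I5M"
pos = 12805  # 1-based leftmost mapping position from the record above

print(parse_cigar(cigar))
print(query_length(cigar))
print(pos + reference_span(cigar) - 1)  # last reference base covered
```

The four inserted bases count towards the read length (51, matching the sequence above) but not towards the stretch of reference covered (47 bases).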
samtools view mysequences.bam | head
$ samtools flagstat NA19914.chr22.bam
2109857 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplimentary
40096 + 0 duplicates
2064356 + 0 mapped (97.84%:-nan%)
2011540 + 0 paired in sequencing
1005911 + 0 read1
1005629 + 0 read2
1903650 + 0 properly paired (94.64%:-nan%)
1920538 + 0 with itself and mate mapped
45501 + 0 singletons (2.26%:-nan%)
5134 + 0 with mate mapped to a different chr
4794 + 0 with mate mapped to a different chr (mapQ>=5)
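The percentages in the flagstat report are simple ratios of the counts. A small sketch, using a few lines copied from the output above, that recomputes them from the QC-passed counts; the parsing is tailored to this layout and is only an illustration, not part of samtools:

```python
flagstat = """\
2109857 + 0 in total (QC-passed reads + QC-failed reads)
2064356 + 0 mapped (97.84%:-nan%)
2011540 + 0 paired in sequencing
1903650 + 0 properly paired (94.64%:-nan%)
45501 + 0 singletons (2.26%:-nan%)
"""


def qc_passed_counts(text):
    """Map each flagstat label to its QC-passed read count."""
    counts = {}
    for line in text.splitlines():
        passed, _plus, _failed, label = line.split(maxsplit=3)
        # Drop any trailing bracketed note such as "(97.84%:-nan%)".
        label = label.split(" (")[0]
        counts[label] = int(passed)
    return counts


def percent(part, whole):
    """Format part/whole as a percentage with two decimal places."""
    return "{:.2f}".format(100 * part / whole)


counts = qc_passed_counts(flagstat)

# Mapped reads are reported relative to all reads ...
print(percent(counts["mapped"], counts["in total"]))
# ... whereas pairing statistics are relative to the paired reads.
print(percent(counts["properly paired"], counts["paired in sequencing"]))
print(percent(counts["singletons"], counts["paired in sequencing"]))
```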
It is often said that bioinformaticians love coming up with new file formats
bed; only the first three columns are required
track name=pairedReads description="Clone Paired Reads" useScore=1
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
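To make the twelve BED columns concrete, here is a rough sketch that reads the two feature lines above and converts the block sizes and block starts into absolute coordinates. BED start positions are 0-based and the end is exclusive, so the length of a feature is simply end minus start. The function name and the returned dictionary are illustrative choices.

```python
bed_lines = [
    "chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512",
    "chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601",
]


def parse_bed12(line):
    """Parse one 12-column BED line into a dictionary."""
    f = line.split()
    start, end = int(f[1]), int(f[2])
    # The comma-separated lists may end with a trailing comma.
    sizes = [int(s) for s in f[10].split(",") if s]
    offsets = [int(s) for s in f[11].split(",") if s]
    # Block starts are given relative to the feature start.
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return {
        "chrom": f[0],
        "start": start,
        "end": end,
        "name": f[3],
        "score": int(f[4]),
        "strand": f[5],
        "length": end - start,
        "blocks": blocks,
    }


features = [parse_bed12(line) for line in bed_lines]
for feature in features:
    print(feature["name"], feature["length"], feature["blocks"])
```

Note that the last block of each clone ends exactly at the feature end, as the format expects.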
gff; (general feature format)
track name=regulatory description="TeleGene(tm) Regulatory Regions" visibility=2
chr22 TeleGene enhancer 10000000 10001000 500 + . touch1
chr22 TeleGene promoter 10010000 10010100 900 + . touch1
chr22 TeleGene promoter 10020000 10025000 800 - . touch2
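Unlike BED, GFF coordinates are 1-based and include the end position, which is a common source of off-by-one errors when moving between the two formats. A minimal sketch of parsing the feature lines above and converting them to BED-style intervals; the field names follow the usual GFF column order and the helper names are hypothetical:

```python
gff_lines = [
    "chr22 TeleGene enhancer 10000000 10001000 500 + . touch1",
    "chr22 TeleGene promoter 10010000 10010100 900 + . touch1",
    "chr22 TeleGene promoter 10020000 10025000 800 - . touch2",
]

GFF_FIELDS = ("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "group")


def parse_gff(line):
    """Parse one GFF line; real files are tab-delimited, but splitting
    on any whitespace is enough for the example lines here."""
    record = dict(zip(GFF_FIELDS, line.split(None, 8)))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


def gff_to_bed_interval(record):
    """Convert 1-based inclusive GFF coordinates to a 0-based,
    end-exclusive BED interval."""
    return record["seqname"], record["start"] - 1, record["end"]


records = [parse_gff(line) for line in gff_lines]
for record in records:
    length = record["end"] - record["start"] + 1  # inclusive end
    print(record["feature"], length, gff_to_bed_interval(record))
```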
wig;
variableStep chrom=chr2
300701 12.5
300702 12.5
300703 12.5
300704 12.5
300705 12.5
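A variableStep wig block is a declaration line followed by position and value pairs, with positions counted from 1. The sketch below reads the example above into a list of records; the handling of the optional `span` parameter and the function name are this example's own choices, so treat it as an illustration of the layout.

```python
wig_lines = [
    "variableStep chrom=chr2",
    "300701 12.5",
    "300702 12.5",
    "300703 12.5",
    "300704 12.5",
    "300705 12.5",
]


def parse_variable_step(lines):
    """Return (chrom, start, end, value) tuples with 1-based,
    inclusive coordinates, one per data line."""
    chrom = None
    span = 1  # number of bases each value covers unless span= is given
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("variableStep"):
            params = dict(p.split("=", 1) for p in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            continue
        pos, value = line.split()
        records.append((chrom, int(pos), int(pos) + span - 1, float(value)))
    return records


records = parse_variable_step(wig_lines)
print(records[0])
print(len(records))
```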