Importing into Bioconductor

The VariantAnnotation package allows .vcf files to be imported. The readVcf function requires the path to a .vcf file and the name of the genome build. As we have already seen with other Bioconductor objects, typing the name of the object prints a summary to the screen. In the case of a “CollapsedVCF” object, this summary is very detailed.

We will use the file combined.chr20.subset.freebayes.vcf that you should have generated in the previous section.

library(VariantAnnotation)
hapmap.calls <- readVcf("combined.chr20.subset.freebayes.vcf", "hg19")

The “header” information can be extracted using header. This contains the definitions of all the per-variant (INFO) and per-sample (FORMAT) fields stored in the file.

INFO

As described above, the INFO column in a .vcf file gives per-variant metadata as a series of key-value pairs. We can interrogate these data using the info function, which returns a data-frame-like object. Consequently, we can use the $ operator to select a particular column of interest.

The name of each row is derived from the genomic location and base-change for each variant. There are too many columns to go through in detail, so we will just describe a few that are likely to be common to different callers.

infoMatrix <- info(hapmap.calls)
infoMatrix
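
To make the key-value structure of the INFO column concrete, the following base-R sketch splits a single INFO string into named values. The example string and the helper name parse_info are made up for illustration and are not part of VariantAnnotation; in practice info() returns these values already parsed.

```r
# Hypothetical INFO field from a single .vcf record (made up for illustration)
info_string <- "DP=35;AF=0.5;TYPE=snp"

# Illustrative helper (not part of VariantAnnotation): split "KEY=value;..."
# into a named character vector; keys without a value (flags) become NA
parse_info <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- vapply(pairs,
                   function(p) if (length(p) > 1) p[2] else NA_character_,
                   character(1))
  names(values) <- vapply(pairs, function(p) p[1], character(1))
  values
}

parsed <- parse_info(info_string)
parsed[["DP"]]              # values are still character strings at this point
as.numeric(parsed[["AF"]])  # convert explicitly when a number is needed
```

Selecting a value by name here plays the same role as using the $ operator on the object returned by info.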

Accessing genotypes

The called genotypes can be accessed using the geno function. Recall that in the .vcf file, we have one column of genotype information for each sample, with each column consisting of key:value pairs. Using geno we can access all the values for a particular key and compare them across samples.

The names of the list-like object returned by geno are the same as the FORMAT keys in the .vcf description. Thus we can use the $ operator to access a particular set of values, GT in this case. Each element is a matrix with one row per variant and one column per sample.

geno(hapmap.calls)
head(geno(hapmap.calls)$GT)

The output allows us to compare the genotype for a particular variant across all samples, which we could not easily do from the .vcf file.

Usually each entry is 0/0 for a homozygous reference call, 0/1 for a heterozygous call and 1/1 for a homozygous alternate call. An entry of . indicates a position where no call could be made due to insufficient data. In rare cases where a second alternate allele was found, we can also see 0/2 and 1/2.

With table we can tabulate the calls for each sample, or cross-tabulate the calls of one sample against another.

table(geno(hapmap.calls)$GT[,1])
table(geno(hapmap.calls)$GT[,2])
table(geno(hapmap.calls)$GT[,1], geno(hapmap.calls)$GT[,2])
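
The same idea can be tried on a small genotype matrix without any real data. The sketch below uses made-up genotypes and sample names; the concordance measure is just one simple way to summarise the cross-tabulation and is not part of the original analysis.

```r
# Toy genotype matrix: rows are variants, columns are samples (values made up)
gt <- matrix(c("0/0", "0/1", "1/1", ".",   "0/1",
               "0/0", "0/1", "0/1", "0/0", "."),
             ncol = 2,
             dimnames = list(paste0("var", 1:5), c("sampleA", "sampleB")))

# Cross-tabulate the calls of the two samples
cross_tab <- table(gt[, "sampleA"], gt[, "sampleB"])
cross_tab

# Keep variants where both samples have a call, then measure agreement
called <- gt[, "sampleA"] != "." & gt[, "sampleB"] != "."
concordance <- mean(gt[called, "sampleA"] == gt[called, "sampleB"])
concordance
```

Entries on the diagonal of the cross-tabulation are variants where the two samples agree; off-diagonal entries are discordant calls.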

Overlapping variant positions with genes

In this section we will see how we can overlap the calls we have made with other genomic features. For example, we are often interested in how many calls were made within a particular gene of interest.

For this section, we are going to use all calls made on chromosome 20 for a single sample, NA12878.

The rowRanges function will retrieve the positions of our variants as a familiar GRanges object. Along with the usual positional information, we also get extra “metadata” (mcols) about the base-change, a quality-score from freebayes and a placeholder for a filter (which has not been applied in this case).

NA12878.calls <- readVcf("NA12878.chr20.freebayes.vcf","hg19")
NA12878.calls.ranges <- rowRanges(NA12878.calls)
NA12878.calls.ranges
GRanges object with 74812 ranges and 5 metadata columns:
                                               seqnames               ranges strand |
                                                  <Rle>            <IRanges>  <Rle> |
                                  20:61795_G/T       20       [61795, 61795]      * |
                                  20:63244_A/C       20       [63244, 63244]      * |
                                  20:65900_G/A       20       [65900, 65900]      * |
                                  20:66370_G/A       20       [66370, 66370]      * |
                      20:67500_TT/TTGGTATCTAGT       20       [67500, 67501]      * |
                                           ...      ...                  ...    ... .
  20:62961014_CTTTTTTTTTTTTTA/CTTTTTTTTTTTTTTA       20 [62961014, 62961028]      * |
                               20:62961318_G/A       20 [62961318, 62961318]      * |
                               20:62961724_C/T       20 [62961724, 62961724]      * |
                               20:62962130_C/T       20 [62962130, 62962130]      * |
                               20:62962891_C/T       20 [62962891, 62962891]      * |
                                               paramRangeID             REF
                                                   <factor>  <DNAStringSet>
                                  20:61795_G/T         <NA>               G
                                  20:63244_A/C         <NA>               A
                                  20:65900_G/A         <NA>               G
                                  20:66370_G/A         <NA>               G
                      20:67500_TT/TTGGTATCTAGT         <NA>              TT
                                           ...          ...             ...
  20:62961014_CTTTTTTTTTTTTTA/CTTTTTTTTTTTTTTA         <NA> CTTTTTTTTTTTTTA
                               20:62961318_G/A         <NA>               G
                               20:62961724_C/T         <NA>               C
                               20:62962130_C/T         <NA>               C
                               20:62962891_C/T         <NA>               C
                                                              ALT       QUAL
                                               <DNAStringSetList>  <numeric>
                                  20:61795_G/T                  T    95.0616
                                  20:63244_A/C                  C    88.3208
                                  20:65900_G/A                  A    44.2954
                                  20:66370_G/A                  A    57.5310
                      20:67500_TT/TTGGTATCTAGT       TTGGTATCTAGT    12.1582
                                           ...                ...        ...
  20:62961014_CTTTTTTTTTTTTTA/CTTTTTTTTTTTTTTA   CTTTTTTTTTTTTTTA  7.2667200
                               20:62961318_G/A                  A 46.1486000
                               20:62961724_C/T                  T  0.0518025
                               20:62962130_C/T                  T 57.5250000
                               20:62962891_C/T                  T 18.2903000
                                                    FILTER
                                               <character>
                                  20:61795_G/T           .
                                  20:63244_A/C           .
                                  20:65900_G/A           .
                                  20:66370_G/A           .
                      20:67500_TT/TTGGTATCTAGT           .
                                           ...         ...
  20:62961014_CTTTTTTTTTTTTTA/CTTTTTTTTTTTTTTA           .
                               20:62961318_G/A           .
                               20:62961724_C/T           .
                               20:62962130_C/T           .
                               20:62962891_C/T           .
  -------
  seqinfo: 86 sequences from hg19 genome

If we happen to know the genomic region corresponding to a particular gene, we can restrict our list of variants using standard R syntax.

  • let’s take the gene PRND, which is located at chr20:4,700,556-4,711,106 in the hg19 version of the human genome
NA12878.calls.ranges[start(NA12878.calls.ranges) > 4700556 & end(NA12878.calls.ranges) < 4711106]
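
Note that the command above uses strict inequalities, so a variant lying exactly on either boundary of the region would be excluded. The following base-R sketch, using made-up coordinates rather than the real call set, shows how the result changes with inclusive comparisons.

```r
# Made-up variant coordinates (illustrative only, not from the real call set)
variants <- data.frame(start = c(4700000, 4700556, 4705000, 4711106, 4720000),
                       end   = c(4700000, 4700556, 4705002, 4711106, 4720000))
gene_start <- 4700556
gene_end   <- 4711106

# Strict inequalities, as in the command above: boundary positions are dropped
strict <- variants$start > gene_start & variants$end < gene_end

# Inclusive inequalities: boundary positions are kept
inclusive <- variants$start >= gene_start & variants$end <= gene_end

sum(strict)
sum(inclusive)
```

Which form is appropriate depends on whether the quoted gene coordinates are meant to include their end points.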

However, we might want to do something more sophisticated and only consider variants in coding regions. For this we can take advantage of some pre-built packages in Bioconductor.

Pre-built databases of gene coordinates

Aside from the many useful software packages, Bioconductor also provides numerous annotation resources that we can utilise in our analysis. Firstly, we have a set of organism-level packages that can translate between different types of identifier. The package for humans is called org.Hs.eg.db. The advantage of such a package, rather than services such as biomaRt, is that we can do queries offline. The packages are updated every 6 months, so we can always be sure which version of the relevant databases is being used.

library(org.Hs.eg.db)
org.Hs.eg.db

There are several types of “key” we can use to make a query, and we have to specify one of these names.

keytypes(org.Hs.eg.db)

For the given keytype we have chosen, we can also choose what data we want to retrieve. We can think of these as columns in a table, and the pre-defined values are given by:-

columns(org.Hs.eg.db)
eg <- select(org.Hs.eg.db, keys=c("BRCA1","PTEN"), keytype = "SYMBOL",columns = c("REFSEQ","ENSEMBL"))

You should see that the above command prints a message to the screen:- select() returned 1:many mapping between keys and columns. This is not an error message and R has still been able to generate the output requested.

eg

In this case, we have “many” (well, two) values of ENSEMBL for the gene PTEN. In practice this means we should think carefully before merging these data with other tables, because each extra mapping adds an extra row for the same gene.
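
To see why a 1:many mapping needs care, consider the following base-R sketch. All gene symbols, IDs and counts are placeholders made up for illustration, and keeping the first ID per symbol is only one possible way of resolving the duplication.

```r
# Made-up annotation table with a 1:many mapping (the IDs are placeholders)
annot <- data.frame(SYMBOL  = c("GENE1", "GENE2", "GENE2"),
                    ENSEMBL = c("ID_A", "ID_B", "ID_C"))

# Made-up results table with one row per gene
results <- data.frame(SYMBOL = c("GENE1", "GENE2"),
                      n_variants = c(4, 7))

# A plain merge repeats the GENE2 row once per matching ENSEMBL ID
merged <- merge(results, annot, by = "SYMBOL")
nrow(merged)
sum(merged$n_variants)   # larger than sum(results$n_variants): GENE2 counted twice

# One possible workaround: keep only the first ID per symbol before merging
annot_unique <- annot[!duplicated(annot$SYMBOL), ]
merged_unique <- merge(results, annot_unique, by = "SYMBOL")
nrow(merged_unique)
```

Any summary computed on the merged table would count the duplicated gene twice unless the duplication is handled first.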




Exercise

  • Use the org.Hs.eg.db package to retrieve the Entrez Gene ID for the gene PRND



You might expect to be able to retrieve information about the coordinates for a particular gene using the same interface. This was supported until recently, but the recommended approach now is to use another class of packages which describe the structure of genes in more detail.

The packages with the prefix TxDb.... represent the structure of all genes for a given organism in an efficient manner. For humans, we can use the package TxDb.Hsapiens.UCSC.hg19.knownGene to tell us about transcripts for the hg19 build of the genome. The package was generated using tables from the UCSC genome browser.

As with the org.Hs.eg.db package we can load the package and inspect the kind of mappings available to us.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
columns(txdb)
keytypes(txdb)

You’ll see that all the mappings concern the coordinates and IDs of various genomic features. Only one type of gene identifier is used, in this case the Entrez ID. If we know a gene’s Entrez ID, we can retrieve its exon coordinates with a query, as in the following exercise.




Exercise

  • Use the TxDb.Hsapiens.UCSC.hg19.knownGene package (the txdb object) to retrieve the exon coordinates for the gene PRND
    • you will need to use the Entrez gene ID you found in the previous exercise
mygene <- 



It is useful to be able to retrieve the coordinates in this manner. However, we should now be familiar with the way intervals can be represented using GRanges. We can create a GRanges object from the result:-

my.gr <- GRanges(mygene$EXONCHROM, IRanges(mygene$EXONSTART,mygene$EXONEND))
my.gr

The TxDb packages also allow us to construct GenomicRanges representations of the exon structure of all genes. The result is a list whose names are gene IDs in Entrez format.

all.exons <- exonsBy(txdb, "gene")
all.exons
my.gene <- all.exons[["23627"]]
my.gene

We are almost ready to do the overlap. However, there is an inconsistency in the naming conventions of the two sets of regions we are trying to overlap:

seqlevelsStyle(NA12878.calls.ranges)
seqlevelsStyle(my.gene)

Fortunately, we can rename the chromosomes of the variants with a single call. We can also modify the object so that only information about chromosome 20 is retained (otherwise we would later receive an error because the MT chromosome has a different length in the two annotations).

seqlevelsStyle(NA12878.calls.ranges) <- "UCSC"
NA12878.calls.ranges <- keepSeqlevels(NA12878.calls.ranges, "chr20")
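
Conceptually, switching styles amounts to translating one set of chromosome names into another. The base-R sketch below illustrates the idea for the primary chromosomes only; the helper ncbi_to_ucsc is hypothetical and is not how seqlevelsStyle is implemented, and it makes no attempt to handle unplaced contigs.

```r
# Illustrative helper (hypothetical, not part of Bioconductor): convert
# NCBI-style names of primary chromosomes to UCSC-style names
ncbi_to_ucsc <- function(x) {
  ifelse(x == "MT", "chrM", paste0("chr", x))
}

ncbi_names <- c("1", "20", "X", "MT")
ucsc_names <- ncbi_to_ucsc(ncbi_names)
ucsc_names

# Analogue of keeping a single sequence level
ucsc_names[ucsc_names == "chr20"]
```

Two sets of ranges can only be overlapped meaningfully once their chromosome names follow the same convention.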

The %over% operator compares two sets of ranges and produces a logical vector. Each entry in this vector indicates whether a particular range in the first set overlaps any range in the second. The vector can then be used to subset the variants in the usual manner.

in.gene <- NA12878.calls.ranges %over% my.gene
NA12878.calls.ranges[in.gene]

We can even write out a .vcf file containing just the positions we have identified.

writeVcf(NA12878.calls[in.gene],filename = "selected.variants.vcf")

If all we cared about was the number of variants overlapping each exon, we could use countOverlaps.

countOverlaps(my.gene, NA12878.calls.ranges)
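
Both %over% and countOverlaps rest on the same test of whether two intervals overlap. The following base-R sketch spells out that test on made-up intervals from a single chromosome; it is meant to illustrate the logic only and is not how GenomicRanges computes overlaps internally.

```r
# Made-up closed intervals on a single chromosome (illustrative only)
variants <- data.frame(start = c(100, 250, 400, 900),
                       end   = c(100, 252, 400, 900))
exons    <- data.frame(start = c(90, 240),
                       end   = c(120, 410))

# Two closed intervals overlap when each one starts before the other ends.
# Rows of the matrix are variants, columns are exons.
hits <- outer(variants$start, exons$end, "<=") &
        outer(variants$end, exons$start, ">=")

# Analogue of `variants %over% exons`: does each variant hit any exon?
in_exon <- rowSums(hits) > 0
in_exon

# Analogue of countOverlaps(exons, variants): number of variants per exon
per_exon <- colSums(hits)
per_exon
```

Note the argument order: counting with the exons first gives one number per exon, whereas putting the variants first would give one number per variant.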



Exercise

  • Navigate to PRND in IGV and verify that the number of variants we have identified is correct
  • Count the number of variants called in NA12878 for each of the regions defined in the file regions.of.interest.bed
    • you can import this .bed file using the import function in the rtracklayer package
  • the file regions.of.interest.bed contains all exon coordinates of genes on chromosome 20. How could you generate such a file?
    • HINT: unlist(all.exons) will give a GRanges object with one entry per-exon (not in the list structure)
    • HINT: export in rtracklayer will write out various file types from GRanges objects
  • (Optional) Repeat the counting exercise from above, but this time counting variants within introns on chromosome 20
    • HINT: check out the help for exonsBy to see what other options are available for extracting genomic features



Summary

  • We have used a respectable genotype caller, freebayes, to call SNVs from a set of healthy individuals
    • there are many parameters that can be tweaked that we haven’t described here
  • The .vcf format contains a rich description of the called variants
  • Bioconductor tools can be used to import and parse .vcf files
  • We can use GenomicRanges to overlap our calls with other genomic features of interest
  • Production-level manipulation of .vcf files would probably involve other non-R tools, such as bedtools and vcftools

Appendix

Files used in session

Commands used to generate bam files for genotype calling

samtools view -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/other_exome_alignments/NA19239/exome_alignment/NA19239.mapped.solid.mosaik.YRI.exome.20111114.bam 22 | samtools view -bS - > /data/hapmap/NA19239.chr22.bam
samtools index /data/hapmap/NA19239.chr22.bam

wget ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/data/NA12878/alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam -O NA12878.chr20.bam
samtools index NA12878.chr20.bam

wget ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/data/NA12874/alignment/NA12874.chrom20.ILLUMINA.bwa.CEU.low_coverage.20130415.bam -O NA12874.chr20.bam
samtools index NA12874.chr20.bam