Overview

Schematic outlining the data model and main data sources used in Open Targets Genetics

The aim of Open Targets Genetics is to aggregate evidence linking (i) variants to disease, and (ii) variants to genes, so that, for a specific disease, potential drug targets can be prioritised based on robust genetic information.

Disease association information (= study) is obtained from genome-wide association studies (GWAS), which link disease status (or other trait measurements) to common genetic variation. Because of how GWAS results are reported, we often know only the lead variant at each associated locus. However, it cannot be assumed that the lead variant causes the association. Instead, we expand the lead variant to include all tag variants, which make up a more complete set of potentially causal variants. The lead-to-tag expansions are made using two methods: (i) fine-mapping / credible set analysis, where full summary statistics are available; (ii) linkage-disequilibrium expansion.
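
The two expansion methods can be illustrated with a minimal Python sketch. It is not the Open Targets Genetics implementation: the variant IDs, the posterior probabilities, the r² values, the 95% coverage and the r² cut-off of 0.7 are all assumptions chosen for illustration, and the function names `credible_set` and `ld_expand` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: variant IDs and all numbers are invented for illustration.
# Posterior probability of each variant being causal (as produced by fine-mapping).
POSTERIORS: Dict[str, float] = {
    "1_1000_A_G": 0.60,
    "1_1050_C_T": 0.30,
    "1_1200_G_A": 0.08,
    "1_5000_T_C": 0.02,
}
# Pairwise linkage disequilibrium (r^2) between the lead variant and its neighbours.
LD_R2: Dict[Tuple[str, str], float] = {
    ("1_1000_A_G", "1_1050_C_T"): 0.95,
    ("1_1000_A_G", "1_1200_G_A"): 0.72,
    ("1_1000_A_G", "1_5000_T_C"): 0.10,
}


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the top-ranked variants whose posteriors sum to at least `coverage`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


def ld_expand(lead: str, ld_r2: Dict[Tuple[str, str], float],
              min_r2: float = 0.7) -> List[str]:
    """Return the lead variant plus every variant in LD with it at r^2 >= min_r2."""
    tags = [lead]  # the lead variant always tags itself
    tags += [other for (first, other), r2 in ld_r2.items()
             if first == lead and r2 >= min_r2]
    return tags


print(credible_set(POSTERIORS))
print(ld_expand("1_1000_A_G", LD_R2))
```

With this toy data both methods keep the three variants closest to the lead and drop the weakly linked fourth one. In practice the two methods need not agree, which is one reason the credible set approach is preferred where full summary statistics are available.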

Given a set of potentially causal tag variants, we next assign these to genes using our variant-to-gene (V2G) pipeline. The V2G pipeline combines data from four main sources:

  1. Molecular phenotype quantitative trait loci experiments (e.g. eQTLs and pQTLs)

  2. Chromatin interaction experiments (e.g. Promoter Capture Hi-C)

  3. In silico functional predictions (e.g. Variant Effect Predictor from Ensembl)

  4. Distance from the transcription start site (TSS) of the canonical transcript

For each variant, the pipeline first assigns functional evidence to variant-gene pairs (V, G) across all sources, then applies a scoring algorithm to produce aggregated V2G scores. Detailed methods can be found here.
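
The two-step shape of this process (collect evidence per variant-gene pair, then aggregate) can be sketched in Python as below. This is only an illustration under stated assumptions and does not reproduce the actual V2G scoring algorithm: the source labels, the equal weights, the per-source scores, the gene names and the weighted-mean aggregation are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical weights, one per evidence source; equal weighting is an assumption.
SOURCE_WEIGHTS: Dict[str, float] = {
    "qtl": 1.0,
    "interaction": 1.0,
    "functional_prediction": 1.0,
    "distance": 1.0,
}

# Hypothetical evidence rows: (variant, gene, source, score in [0, 1]).
EVIDENCE: List[Tuple[str, str, str, float]] = [
    ("1_1050_C_T", "GENE_A", "qtl", 0.50),
    ("1_1050_C_T", "GENE_A", "qtl", 0.25),       # second QTL study, weaker signal
    ("1_1050_C_T", "GENE_A", "distance", 0.25),
    ("1_1050_C_T", "GENE_B", "distance", 0.25),
]


def aggregate_v2g(evidence: List[Tuple[str, str, str, float]],
                  weights: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Aggregate per-source evidence into one score per (variant, gene) pair."""
    # Step 1: assign evidence to (V, G) pairs, keeping the best score per source.
    per_pair: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    for variant, gene, source, score in evidence:
        previous = per_pair[(variant, gene)].get(source, 0.0)
        per_pair[(variant, gene)][source] = max(previous, score)

    # Step 2: weighted mean over all sources; a missing source contributes zero.
    total_weight = sum(weights.values())
    return {
        pair: sum(weights[src] * val for src, val in by_source.items()) / total_weight
        for pair, by_source in per_pair.items()
    }


scores = aggregate_v2g(EVIDENCE, SOURCE_WEIGHTS)
for (variant, gene), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(variant, gene, round(score, 4))
```

In this toy example GENE_A ranks above GENE_B because it is supported by two sources rather than one. Dividing by the total weight of all sources, rather than only those observed, is a design choice of the sketch that rewards breadth of evidence.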
