FAQs

What genome build are data in the Genetics Portal based on?

All data are based on GRCh38 from the Genome Reference Consortium.

The Genetics Portal is a variant-centric resource that complements the Open Targets Platform. Users can navigate from the Genetics Portal to the Open Targets Platform to find drug information (for drugs in clinical trials or already marketed) for any target of interest.

How do I cite discoveries made using Open Targets Genetics?

Please cite our latest paper, "Open Targets Platform: new developments and updates two years on".

How can I stay informed about new features and developments?

For updates, please subscribe to our newsletter. Questions and feedback can be directed to us via email.

Why are there two variants mapped to the same rsID?

RsIDs are not unique to a single variant. Many multi-allelic sites are assigned a single rsID, and some rsIDs can point to different positions in the genome. We have mapped all rsIDs from GWAS Catalog to unique variants. A small minority of rsIDs will map to multiple variant IDs (approximately 0.6% of lead variants). When this occurs, variants will be duplicated in the portal.
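As an illustration of the one-to-many case described above, the following minimal sketch groups (rsID, variant ID) pairs and reports rsIDs that resolve to more than one variant ID. The function name and the example IDs are hypothetical and are not taken from the portal's pipeline.

```python
from collections import defaultdict


def ambiguous_rsids(pairs):
    """Return rsIDs that map to more than one variant ID.

    `pairs` is an iterable of (rsid, variant_id) tuples. The function name
    and input shape are assumptions made for this sketch.
    """
    mapping = defaultdict(set)
    for rsid, variant_id in pairs:
        mapping[rsid].add(variant_id)
    # Keep only rsIDs whose mapping is not one-to-one.
    return {rsid: sorted(ids) for rsid, ids in mapping.items() if len(ids) > 1}


# Made-up identifiers, for illustration only.
example_pairs = [
    ("rs1", "1_100_A_G"),
    ("rs1", "1_100_A_T"),  # second alt allele at a multi-allelic site
    ("rs2", "2_200_C_T"),
]
print(ambiguous_rsids(example_pairs))
```

An rsID reported by this check corresponds to the situation in which variants appear duplicated in the portal.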

Why are betas and odds ratios displayed inconsistently in the portal?

Effect sizes are derived from summary statistics where available; otherwise, they are taken from GWAS Catalog curated data. All effects have been harmonised to be with respect to the alternative allele.

An effect size may not be shown in the portal if: (1) the effect was not curated for that association by GWAS Catalog; (2) the variant is palindromic, so it is not possible to accurately infer the strand, and therefore the direction of effect; (3) the reported risk allele is not concordant with the alleles in our variant index; (4) the rsID to variant ID mapping was ambiguous (not one-to-one).
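The harmonisation rules above can be sketched roughly as follows. This is an illustrative simplification and not the portal's actual pipeline: the function names are hypothetical, only single-base palindromic variants are detected, and strand flipping of non-palindromic variants is not attempted.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(ref, alt):
    """True for single-base variants whose alleles are complementary (A/T or C/G)."""
    return len(ref) == 1 and len(alt) == 1 and COMPLEMENT.get(ref) == alt


def harmonise_beta(beta, effect_allele, ref, alt):
    """Return beta with respect to the alt allele, or None if it cannot be harmonised."""
    if is_palindromic(ref, alt):
        return None  # strand, and so direction, cannot be inferred
    if effect_allele == alt:
        return beta
    if effect_allele == ref:
        return -beta  # flip the sign so the effect refers to alt
    return None  # reported allele is not concordant with the variant's alleles


def harmonise_odds_ratio(odds_ratio, effect_allele, ref, alt):
    """Same logic for odds ratios: invert rather than negate."""
    if is_palindromic(ref, alt):
        return None
    if effect_allele == alt:
        return odds_ratio
    if effect_allele == ref:
        return 1.0 / odds_ratio
    return None


print(harmonise_beta(0.2, "A", "A", "G"))
print(harmonise_odds_ratio(2.0, "A", "A", "G"))
```

Returning None stands in for the cases in which no effect size is displayed.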

Sometimes GWAS Catalog data has been curated from multiple tables in a publication, some with betas, others with odds ratios. In these cases a mixture of betas and odds ratios may be displayed for a single study.

Why is there no LD information for my associated locus of interest?

Linkage disequilibrium is calculated using the 1000 Genomes Phase 3 reference panel. If your variant is not in this panel post-QC (MAF > 1% and max missing rate < 0.05) then we will not provide any LD information for it.

Why is there no credible set information for my associated locus of interest?

Fine-mapping can only be conducted for studies for which we have full summary statistics. Currently, these are only the UK Biobank summary statistics from the Neale lab. We are working with the GWAS Catalog to create a summary statistics repository, which will then be included in Open Targets Genetics. We encourage the scientific community to submit their full summary statistics to the GWAS Catalog.

Why isn't my variant in the portal?

Our variant index is built from the gnomAD (v2.1) site list, filtered to keep only variants with minor allele frequency > 0.1% in any population (code). If a variant is not in our index, it will not exist in the portal.
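A minimal sketch of the "in any population" filter described above, assuming per-population alt allele frequencies are available as a dictionary. The function name and population labels are hypothetical; the actual filtering code may differ.

```python
def in_variant_index(alt_freq_by_population, threshold=0.001):
    """True if minor allele frequency exceeds the threshold in at least one population.

    `alt_freq_by_population` maps a population label to the alt allele
    frequency in that population. MAF is taken as min(af, 1 - af).
    """
    return any(
        min(af, 1.0 - af) > threshold
        for af in alt_freq_by_population.values()
    )


# Hypothetical frequencies: rare in one population, above 0.1% in another.
print(in_variant_index({"pop_a": 0.0005, "pop_b": 0.002}))
print(in_variant_index({"pop_a": 0.0005, "pop_b": 0.0001}))
```

A variant for which this check fails in every population would be absent from the index, and so from the portal.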

Why doesn't my variant report the GTEx QTL?

We apply a multiple testing correction that is different from the GTEx method. We need a method that is applicable across datasets, and not all datasets conduct a permutation analysis. We therefore use a Bonferroni correction based on the number of variants tested per gene, i.e. p < 0.05 / (number of tests per gene). For example, GTEx assigns rs4734621 (8_102432699_T_C) to UBR5, whereas our V2G pipeline assigns it to ODF1 and NCALD. More details on filtering can be found in the pre-processing help page.
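The per-gene Bonferroni threshold can be illustrated with a short sketch. The function name, input shape, and p-values below are assumptions made for the example and are not taken from the pipeline.

```python
def significant_qtls(pvalues_by_variant, alpha=0.05):
    """Keep variants passing a per-gene Bonferroni threshold.

    `pvalues_by_variant` maps each variant tested against ONE gene to its
    p-value. The threshold is alpha / (number of tests for that gene).
    """
    n_tests = len(pvalues_by_variant)
    if n_tests == 0:
        return {}
    threshold = alpha / n_tests
    return {v: p for v, p in pvalues_by_variant.items() if p < threshold}


# Four hypothetical variants tested against one gene: threshold = 0.05 / 4 = 0.0125.
gene_tests = {"var_a": 0.001, "var_b": 0.02, "var_c": 0.04, "var_d": 0.2}
print(significant_qtls(gene_tests))
```

Because the threshold depends on the number of variants tested for each gene, a variant-gene pair reported by GTEx may not pass here, and vice versa.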

How do I download the credible set of variants for an association of interest?

Credible set information is available for all studies that have gone through our fine-mapping pipeline. The full set of variants in the 95% credible set can be downloaded using the Tag Variant table on the Variant page for the lead variant at your locus of interest.

What Variant-to-Gene (V2G) score threshold should I use as a "significance" cut-off?

The V2G scores are intended as a way to rank genes based on all available functional data. We do not provide an arbitrary cut-off for V2G scores.

The data used to calculate V2G scores are already pre-filtered to remove associations with low evidence after multiple testing procedures are applied. Therefore, any (V, G) pair with a non-zero score has at least one good string of evidence in the data. The higher the V2G score, the more evidence there is for a functional association.

Why are case counts missing for some case-control studies?

Sample case counts are stored as part of a text string in GWAS Catalog. This makes the information difficult to parse reliably. We have decided not to show case numbers for these studies.

What is the alternative allele? Why not use the minor allele?

The reference (ref) and alternative (alt) alleles can be determined by looking at the variant ID, which takes the form: chromosome_position_reference_alternative

The ref allele refers to the base that is found in the reference genome, currently GRCh38 in the portal. The alt allele refers to any base, other than the reference, that is found at the locus. The alt allele is not necessarily the minor allele.

For example, for rs2476601 (1_114377568_A_G), A is the ref and G is the alt. The allele frequencies and effect are with respect to the alt. G has a frequency of 0.88 in Non-Finnish Europeans, making it the major allele and A the minor allele.

There can be more than one alt allele per position in the genome, in which case each alt allele will appear as a separate variant ID in the portal.

Using ref/alt, as opposed to major/minor, keeps things consistent across studies/populations.
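The variant ID format described above can be split into its parts with a few lines of code. This is a minimal sketch: the class and function names are hypothetical, and the frequency is the example value quoted above.

```python
from typing import NamedTuple


class VariantId(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_id(variant_id):
    """Split 'chromosome_position_reference_alternative' into its four fields."""
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {variant_id!r}")
    chromosome, position, ref, alt = parts
    return VariantId(chromosome, int(position), ref, alt)


def minor_allele(variant, alt_freq):
    """Return the less frequent allele given the alt allele frequency.

    Ties (alt_freq == 0.5) return ref; that choice is arbitrary in this sketch.
    """
    return variant.alt if alt_freq < 0.5 else variant.ref


variant = parse_variant_id("1_114377568_A_G")
print(variant)
print(minor_allele(variant, alt_freq=0.88))
```

With an alt frequency of 0.88, the minor allele is the ref (A), which is why effects are reported against alt rather than against the minor allele.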

Why is the number of independently associated loci different in the portal compared to the study's publication?

We report any association that is curated by the GWAS Catalog (see inclusion criteria), except for a subset of studies (N=162) for which we apply an additional step of distance-based clumping (±500kb).
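Distance-based clumping can be sketched as a greedy procedure: take associations in order of increasing p-value and drop any that fall within the window of an association already kept on the same chromosome. The text above only states the ±500kb window; the greedy ordering by p-value, the function name, and the example data are assumptions made for illustration.

```python
def distance_clump(associations, window=500_000):
    """Greedy distance-based clumping.

    `associations` is a list of (chromosome, position, pvalue) tuples.
    An association is kept only if no already-kept association on the same
    chromosome lies within `window` base pairs of it.
    """
    kept = []
    for chrom, pos, pval in sorted(associations, key=lambda a: a[2]):
        too_close = any(
            c == chrom and abs(p - pos) <= window for c, p, _ in kept
        )
        if not too_close:
            kept.append((chrom, pos, pval))
    return kept


# Hypothetical associations: the second one is 200kb from a stronger signal.
example = [
    ("1", 1_000_000, 1e-10),
    ("1", 1_200_000, 1e-8),
    ("1", 2_000_000, 1e-9),
    ("2", 1_100_000, 1e-8),
]
for locus in distance_clump(example):
    print(locus)
```

Clumping of this kind merges nearby signals, which is one reason the number of loci in the portal can differ from the number reported in a publication.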