Open Targets Genetics is a variant-centric resource that complements the Open Targets Platform. From Open Targets Genetics, users can navigate to the Open Targets Platform to obtain, for example, information on drugs in clinical trials (or already marketed) for any target of interest.
Many multi-allelic sites can be assigned a single rsID, and some rsIDs can point to different positions in the genome. This means that rsIDs are not unique to a single variant. We have mapped all rsIDs from GWAS Catalog to unique variants. A small minority of rsIDs will map to multiple variant IDs (approximately 0.6% of lead variants). When this occurs, variants will be duplicated in the portal.
The lead variant is the variant with the most significant (smallest) p-value at a given associated locus. A tag variant is a variant that is either correlated with the lead variant (r2 > 0.7) or present in the credible set at a GWAS-associated signal.
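The two definitions above can be sketched in a few lines of Python. This is only an illustration, not the portal's pipeline: the function names, the input shapes and the p-values used in the usage note are hypothetical.

```python
from typing import Dict, Optional


def pick_lead_variant(pvalues: Dict[str, float]) -> str:
    """Return the variant ID with the smallest p-value at a locus."""
    if not pvalues:
        raise ValueError("no variants supplied for this locus")
    return min(pvalues, key=pvalues.get)


def is_tag_variant(
    r2_with_lead: Optional[float] = None,
    in_credible_set: bool = False,
    r2_threshold: float = 0.7,
) -> bool:
    """A variant tags the lead if r2 > 0.7 OR it is in the credible set."""
    in_ld = r2_with_lead is not None and r2_with_lead > r2_threshold
    return in_ld or in_credible_set
```

For instance, given made-up p-values `{"var_a": 1e-12, "var_b": 3e-9}`, `pick_lead_variant` selects `var_a`. A variant with r2 of 0.85 to the lead counts as a tag, as does a variant with no LD information that appears in the credible set.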
The effect allele is the allele whose effects in relation to disease are being studied. In Open Targets Genetics, this is always the alternative allele.
The direction of the effect of the alternative allele can be obtained from the PheWAS plot. If the association has a positive beta coefficient, the alternative allele (the effect allele) increases the risk. If the beta coefficient is negative, the alternative allele decreases the risk.
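As a minimal sketch of this rule, the helper below maps a beta coefficient to a direction of effect. The function names are hypothetical. Converting an odds ratio to the beta scale with a natural logarithm is a standard convention assumed here, not something the portal documentation specifies.

```python
import math


def beta_from_odds_ratio(odds_ratio: float) -> float:
    """Convert an odds ratio to the log-odds (beta) scale (assumed convention)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


def effect_direction(beta: float) -> str:
    """Describe the effect of the alternative (effect) allele on risk."""
    if beta > 0:
        return "increases"
    if beta < 0:
        return "decreases"
    return "neutral"
```

With this convention, an odds ratio below 1 gives a negative beta, so the alternative allele is read as decreasing risk.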
Effect sizes are derived from summary statistics where available; otherwise they are taken from GWAS Catalog curated data. All effects have been harmonised to be with respect to the alternative allele.
An effect size may not be shown in the portal if: (1) the effect was not curated for that association by GWAS Catalog; (2) the variant is palindromic, so it is not possible to accurately infer the strand and therefore the direction; (3) the reported risk allele is not concordant with the alleles in our variant index; (4) the rsID to variant ID mapping was ambiguous (not one-to-one).
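To illustrate condition (2): a variant is palindromic when its two alleles are reverse complements of each other (A/T or C/G for SNPs), so the alleles look the same on either strand. The check below is a minimal sketch with hypothetical function names, not the portal's harmonisation code.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(ref: str, alt: str) -> bool:
    """True when alt is the reverse complement of ref, e.g. A/T or C/G."""
    ref, alt = ref.upper(), alt.upper()
    # Alleles containing anything other than A, C, G, T cannot be checked.
    if not ref or not alt or set(ref + alt) - set(COMPLEMENT):
        return False
    return reverse_complement(ref) == alt
```

An A/T or C/G variant returns `True`, while a G/A variant such as 19_44886339_G_A returns `False`, because its strand can be inferred from the alleles.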
Sometimes GWAS Catalog data has been curated from multiple tables in a publication, some with betas, others with odds ratios. In these cases a mixture of betas and odds ratios may be displayed for a single study.
For each variant that is independently and significantly associated with a study, we display its beta coefficient with respect to that variant's alternative allele. Examples are the variants 19_44886339_G_A, 19_44908822_C_T and 1_109274968_G_T, which are associated with LDL cholesterol.
On the other hand, we display the study beta coefficient in the colocalisation table of the study locus page, e.g. LDL cholesterol (GCST002222) with the locus around 19_44886339_G_A (rs7254892). This beta is with respect to the alternative allele of a single variant: the lead variant at the top of the study locus page (i.e. rs7254892 for the LDL cholesterol study).
We display the study beta to facilitate comparison of the direction of effect across different colocalising tissues.
Summary statistics are the aggregate p-values and association data for every variant analysed in a genome-wide association study.
Linkage disequilibrium is calculated using the 1000 Genomes Phase 3 reference panel. If your variant is not in this panel post-QC (MAF > 1% and max missing rate < 0.05), then we will not provide any LD information for it.
Fine-mapping can only be conducted for studies for which we have full summary statistics. Currently this consists of a subset of GWAS Catalog studies, UK Biobank summary statistics from the Neale lab and credible sets derived by FinnGen. We encourage the scientific community to submit their full summary statistics to the GWAS Catalog.
Our variant index is built from the gnomAD (v2.1) site list, filtered to keep only variants with minor allele frequency > 0.1% in any population (code). If a variant is not in our index, it will not exist in the portal.
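The index filter can be pictured with the sketch below. It assumes that each population's minor allele frequency is the smaller of the alt allele frequency and one minus that frequency. The function name and population labels are hypothetical, and this is not the pipeline code referred to above.

```python
from typing import Dict


def in_variant_index(pop_alt_freqs: Dict[str, float], threshold: float = 0.001) -> bool:
    """Keep a variant if its MAF exceeds 0.1% in at least one population."""
    return any(min(af, 1 - af) > threshold for af in pop_alt_freqs.values())
```

A variant with alt frequencies `{"pop_a": 0.0005, "pop_b": 0.002}` is kept because it passes the threshold in `pop_b`, whereas a variant that is rare (or nearly fixed) in every population is dropped.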
We apply a multiple testing correction that is different from the GTEx method, because we need a method that is applicable across datasets and not all datasets conduct a permutation analysis. We use a Bonferroni correction based on the number of variants tested per gene, i.e. p < 0.05 / (number of tests per gene). As a result, assignments can differ from those of GTEx. For example, GTEx assigns rs4734621 (8_102432699_T_C) to UBR5, whereas our V2G pipeline assigns it to both ODF1 and NCALD. More details on filtering can be found in the pre-processing help page.
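The per-gene Bonferroni rule, p < 0.05 / (number of tests per gene), can be sketched as follows. The function names and the p-values in the usage note are hypothetical, and the real pipeline may apply further filtering.

```python
from typing import Dict, Set


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-gene significance threshold: alpha / number of variants tested."""
    if n_tests < 1:
        raise ValueError("at least one test is required")
    return alpha / n_tests


def significant_variants(pvalues: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Return the variants for one gene whose p-value passes the threshold."""
    threshold = bonferroni_threshold(len(pvalues), alpha)
    return {variant for variant, p in pvalues.items() if p < threshold}
```

For a gene with four tested variants the threshold is 0.05 / 4 = 0.0125, so made-up p-values of 0.001, 0.02, 0.3 and 0.5 would leave only the first variant as significant.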
Credible set information is available for all studies that have gone through our fine-mapping pipeline. The full set of variants in the 95% credible set can be downloaded using the Tag Variant table on the Variant page for the lead variant at your locus of interest.
The data used to calculate V2G scores are already pre-filtered to remove associations with low evidence after multiple testing procedures are applied. Therefore, any variant-gene pair with a non-zero score has at least one good line of evidence in the data. The higher the V2G score, the more evidence there is for a functional association.
Sample case counts are stored as part of a text string in GWAS Catalog. This makes the information difficult to parse reliably. We have decided not to show case numbers for these studies.
The reference (ref) and alternative (alt) alleles can be determined by looking at the variant ID, which takes the form chromosome_position_reference_alternative (for example, 19_44886339_G_A).
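A variant ID in this form can be split into its four fields with a few lines of Python. This is a minimal sketch: the `Variant` type and `parse_variant_id` function are hypothetical names, not part of any Open Targets API.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split chromosome_position_reference_alternative into its fields."""
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {variant_id!r}")
    chromosome, position, ref, alt = parts
    # int() raises ValueError if the position field is not a whole number.
    return Variant(chromosome, int(position), ref, alt)
```

For example, `parse_variant_id("19_44886339_G_A")` yields chromosome `"19"`, position `44886339`, ref `"G"` and alt `"A"`.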
The ref allele refers to the base that is found in the reference genome, currently GRCh38 in the portal. The alt allele refers to any base, other than the reference, that is found at the locus. The alt allele is not necessarily the minor allele.
For example, in rs2476601 (1_114377568_A_G), A is the ref and G is the alt. The allele frequencies and effect are given with respect to the alt. G has a frequency of 0.88 in the Non-Finnish European population, making it the major allele and A the minor allele.
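The reasoning in this example can be written out as a small helper that takes the alt allele frequency and reports which allele is the minor one. The function name is hypothetical, and returning the ref on an exact 0.5 tie is an arbitrary choice made for this sketch.

```python
from typing import Tuple


def minor_allele(ref: str, alt: str, alt_frequency: float) -> Tuple[str, float]:
    """Return (minor allele, minor allele frequency) from the alt frequency."""
    if not 0.0 <= alt_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    ref_frequency = 1.0 - alt_frequency
    if alt_frequency < ref_frequency:
        return alt, alt_frequency
    # Ties (frequency exactly 0.5) fall through to the ref by convention here.
    return ref, ref_frequency
```

Calling `minor_allele("A", "G", 0.88)` identifies A as the minor allele, with a frequency of about 0.12 in that population.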
There can be more than one alt allele per position in the genome, in which case each alt allele will appear as a separate variant ID in the portal.
Using ref/alt, as opposed to major/minor, keeps things consistent across studies/populations.