You can get all data from Open Targets Genetics via:
- EMBL-EBI FTP, database Open Targets Genetics
- Google BigQuery, project open-targets-genetics:190505
- Google Cloud Storage (GCS) public bucket (Requester Pays)
Please note that if you download this data using Google Cloud Storage, all charges for access to the open-targets-genetics-releases bucket are billed to the requester. Please refer to the Requester Pays feature of Google Cloud Storage for more detail.
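Because the bucket is Requester Pays, gsutil calls need a billing project passed with the `-u` flag. A minimal sketch, where `my-billing-project` is a placeholder for your own Google Cloud project ID:

```bash
# List the available releases in the bucket, billing the request to your own project.
# Replace my-billing-project with your Google Cloud project ID.
gsutil -u my-billing-project ls 'gs://open-targets-genetics-releases/'
```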
| Component | 19.03 release | 19.05 release | 20.02 release |
| --- | --- | --- | --- |
| Data | 19.03.03 | 19.05.05 | 20.02.01 |
| Backend scripts | 19.03.08 | 19.05.28 | 20.02.03 |
| Spark pipeline | 19.03.10 | 19.05.15 | 20.02.01 |
| GraphQL API | 19.03.11 | 19.05.26 | 20.02.07 |
The list of datasets, with each corresponding data schema, is shown below. Change the release tag in the URLs to the release you need from the table above, as required; a download sketch follows the table.
| Folder name | Format | Spark schema | SQL schema (ClickHouse dialect) |
| --- | --- | --- | --- |
| variant-index | parquet | - | - |
| v2g | jsonl | schema link | schema link |
| v2d | jsonl | schema link | schema link |
| d2v2g | jsonl | - | schema link |
| lut/genes-index | jsonl | - | schema link |
| lut/overlap-index | jsonl | - | schema link |
| lut/study-index | jsonl | - | schema link |
| lut/variant-index | jsonl | - | schema link |
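As a sketch of how the folder names above map onto the bucket layout, the snippet below copies one of the JSONL datasets locally. The billing project ID is a placeholder, and the 19.03.04 release tag matches the streaming example further down; swap in the release you need.

```bash
# Copy the genes look-up table locally; -m parallelises the transfer and
# -u names the project billed for the Requester Pays bucket.
# "my-billing-project" is a placeholder for your own Google Cloud project ID.
gsutil -m -u my-billing-project cp -r \
  'gs://open-targets-genetics-releases/19.03.04/lut/genes-index/' ./genes-index/
```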
You can stream the content directly from the Google Cloud bucket using the gsutil command:

```bash
gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -1 | jq .
```

which prints the first variant record:

```json
{
  "chr_id": "1",
  "position": 55545,
  "ref_allele": "C",
  "alt_allele": "T",
  "rs_id": "rs28396308",
  "most_severe_consequence": "downstream_gene_variant",
  "gene_id_any_distance": 13546,
  "gene_id_any": "ENSG00000186092",
  "gene_id_prot_coding_distance": 13546,
  "gene_id_prot_coding": "ENSG00000186092",
  "raw": 0.028059,
  "phred": 3.065,
  "gnomad_afr": 0.3264216148287779,
  "gnomad_amr": 0.4533582089552239,
  "gnomad_asj": 0.26666666666666666,
  "gnomad_eas": 0.35822021116138764,
  "gnomad_fin": 0.31313131313131315,
  "gnomad_nfe": 0.26266330506532204,
  "gnomad_nfe_est": 0.3397858319604613,
  "gnomad_nfe_nwe": 0.23609443777511005,
  "gnomad_nfe_onf": 0.2256,
  "gnomad_nfe_seu": 0.1,
  "gnomad_oth": 0.27403846153846156
}
```
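Building on the same stream, here is a small sketch of pulling out just a few of the fields shown above with jq; the field names are taken from the sample record and the release tag is unchanged:

```bash
# Print chromosome, position, rsID and gnomAD NFE frequency for the first
# 100 variants as tab-separated values.
gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' \
  | head -100 \
  | jq -r '[.chr_id, .position, .rs_id, .gnomad_nfe] | @tsv'
```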
There is an initial bash script you can use to load all of the data into a ClickHouse instance. In that script you will find lines like these:
```bash
echo create studies tables
clickhouse-client -m -n < studies_log.sql
gsutil cat "${base_path}/lut/study-index/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.studies_log format JSONEachRow "
clickhouse-client -m -n < studies.sql
clickhouse-client -m -n -q "drop table ot.studies_log;"
```
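The pattern above stages the raw JSON into a log table, transforms it with studies.sql, and then drops the staging table. As a quick sanity check after the load, and assuming the script materialises a final ot.studies table (an assumption based on the staging table name), you could run:

```bash
# Count the rows loaded into the final studies table.
# The table name ot.studies is an assumption based on the ot.studies_log staging table.
clickhouse-client -h 127.0.0.1 --query="select count() from ot.studies"
```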