Data Download

Download Options

You can get all Open Targets Genetics data from the Google Cloud Storage bucket open-targets-genetics-releases.

Please note that this bucket uses the Requester Pays feature of Google Cloud Storage: although Open Targets makes the data publicly available, all data charges for open-targets-genetics-releases are billed to the requester, so you must supply a billable Google Cloud project when downloading (see the example below).
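A minimal sketch of a Requester Pays download with gsutil; my-gcp-project is a placeholder for your own billable project ID, and the release tag matches the one used in the examples further down:

# Bill the transfer to your own project via gsutil's top-level -u flag
gsutil -u my-gcp-project ls gs://open-targets-genetics-releases/
gsutil -u my-gcp-project -m cp -r 'gs://open-targets-genetics-releases/19.03.04/lut/study-index' .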

We are currently copying this data to the EMBL-EBI public FTP / databases / Open Targets site as well; the final URL will be added to this post shortly.

Versioning Table

| Component       | Version  |
| --------------- | -------- |
| Data            | 19.03.03 |
| Backend scripts | 19.03.08 |
| Spark pipeline  | 19.03.10 |
| GraphQL API     | 19.03.11 |

Data Schema

Below is the list of datasets, each with its corresponding data schema. When building a download URL, replace the version tag with the corresponding version from the versioning table above (see the example after the table).

| Folder name       | Format  | Spark Schema | SQL Schema (ClickHouse dialect) |
| ----------------- | ------- | ------------ | ------------------------------- |
| variant-index     | parquet | -            | -                               |
| v2g               | jsonl   | schema link  | schema link                     |
| v2d               | jsonl   | schema link  | schema link                     |
| d2v2g             | jsonl   | -            | schema link                     |
| lut/genes-index   | jsonl   | -            | schema link                     |
| lut/overlap-index | jsonl   | -            | schema link                     |
| lut/study-index   | jsonl   | -            | schema link                     |
| lut/variant-index | jsonl   | -            | schema link                     |
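The folder names above sit directly under each release prefix in the bucket. As an illustration (again assuming my-gcp-project is your billable project), you can list them with:

gsutil -u my-gcp-project ls gs://open-targets-genetics-releases/19.03.04/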

Some Tips

Checking some lines from the GCS Bucket

You can stream the content directly from the Google Cloud bucket with the gsutil command:

gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -1 | jq .
{
"chr_id": "1",
"position": 55545,
"ref_allele": "C",
"alt_allele": "T",
"rs_id": "rs28396308",
"most_severe_consequence": "downstream_gene_variant",
"gene_id_any_distance": 13546,
"gene_id_any": "ENSG00000186092",
"gene_id_prot_coding_distance": 13546,
"gene_id_prot_coding": "ENSG00000186092",
"raw": 0.028059,
"phred": 3.065,
"gnomad_afr": 0.3264216148287779,
"gnomad_amr": 0.4533582089552239,
"gnomad_asj": 0.26666666666666666,
"gnomad_eas": 0.35822021116138764,
"gnomad_fin": 0.31313131313131315,
"gnomad_nfe": 0.26266330506532204,
"gnomad_nfe_est": 0.3397858319604613,
"gnomad_nfe_nwe": 0.23609443777511005,
"gnomad_nfe_onf": 0.2256,
"gnomad_nfe_seu": 0.1,
"gnomad_oth": 0.27403846153846156
}
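The same pattern works for extracting individual fields; for instance, this sketch prints the rs IDs of the first five variants (the rs_id field name is taken from the record above):

gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -5 | jq -r '.rs_id'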

Loading data into a ClickHouse instance

There is an initial bash script you can use to load all the data into a ClickHouse instance. In that script, you will find blocks like this one:

echo create studies tables
# Create the staging table from its DDL
clickhouse-client -m -n < studies_log.sql
# Stream the study index straight from GCS into the staging table
gsutil cat "${base_path}/lut/study-index/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.studies_log format JSONEachRow "
# Build the final studies table from the staged data, then drop the staging table
clickhouse-client -m -n < studies.sql
clickhouse-client -m -n -q "drop table ot.studies_log;"
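For orientation, the two SQL files appear to implement a stage-then-materialise pattern: studies_log.sql creates a staging table that the JSONL rows are inserted into, and studies.sql builds the final query-optimised table from it before the staging table is dropped. Below is a minimal, hypothetical sketch of that pattern; the column list and table engines are assumptions, and the real DDL ships alongside the loader script:

# Sketch of studies_log.sql: a plain staging table for the raw JSONL rows
clickhouse-client -m -n -q "
create table if not exists ot.studies_log (
  study_id String,
  trait_reported String
) engine = Log;
"
# Sketch of studies.sql: materialise a MergeTree table for fast queries
clickhouse-client -m -n -q "
create table if not exists ot.studies
engine = MergeTree() order by (study_id)
as select * from ot.studies_log;
"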