Data Download
You can get all data from Open Targets Genetics from the Google Cloud Storage bucket gs://open-targets-genetics-releases.
Please note that if you download this data from Google Cloud Storage, all charges for accessing the open-targets-genetics-releases bucket are billed to the requester.
Please refer to the Requester Pays documentation for Google Cloud Storage for more detail.
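Because the bucket uses Requester Pays, gsutil has to be told which Google Cloud project to bill. A minimal sketch, assuming you have gsutil configured and a billable project of your own (YOUR_PROJECT is a placeholder, not a value from this documentation):

```bash
# List the release folders available in the bucket;
# -u bills the access charges to your own project (Requester Pays).
gsutil -u YOUR_PROJECT ls gs://open-targets-genetics-releases/
```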

Versioning Table

| Data |  |  |  |
| --- | --- | --- | --- |
| 19.03.03 | 19.05.05 | 20.02.01 | 21.06.08 |
| 19.03.08 | 19.05.28 | 20.02.03 | 21.06.02 |
| 19.03.10 | 19.05.15 | 20.02.01 | 21.06.03 |
| 19.03.11 | 19.05.26 | 20.02.07 | 21.06.02 |

Data Schema

The list of datasets, each with its corresponding data schema.
Adjust the release tag in the URLs to the release you need, as listed in the versioning table above.
| Folder name | Format | Spark Schema | SQL Schema (ClickHouse dialect) |
| --- | --- | --- | --- |
| variant-index | parquet | - | - |
| v2g | jsonl |  |  |
| v2d | jsonl |  |  |
| d2v2g | jsonl | - |  |
| lut/genes-index | jsonl | - |  |
| lut/overlap-index | jsonl | - |  |
| lut/study-index | jsonl | - |  |
| lut/variant-index | jsonl | - |  |
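To work with one of these datasets locally, you can copy its folder out of the bucket. A minimal sketch using the 19.03.04 release tag from the example further down; v2d and YOUR_PROJECT are illustrative choices you should adjust:

```bash
# Copy the v2d dataset of the 19.03.04 release to the current directory.
# -m parallelises the transfer; -u bills the Requester Pays charges to your project.
gsutil -m -u YOUR_PROJECT cp -r "gs://open-targets-genetics-releases/19.03.04/v2d" .
```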

Some Tips

Checking some lines from the GCS Bucket

You can stream content directly from the Google Cloud Storage bucket using the gsutil command:
```
gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -1 | jq .
{
  "chr_id": "1",
  "position": 55545,
  "ref_allele": "C",
  "alt_allele": "T",
  "rs_id": "rs28396308",
  "most_severe_consequence": "downstream_gene_variant",
  "gene_id_any_distance": 13546,
  "gene_id_any": "ENSG00000186092",
  "gene_id_prot_coding_distance": 13546,
  "gene_id_prot_coding": "ENSG00000186092",
  "raw": 0.028059,
  "phred": 3.065,
  "gnomad_afr": 0.3264216148287779,
  "gnomad_amr": 0.4533582089552239,
  "gnomad_asj": 0.26666666666666666,
  "gnomad_eas": 0.35822021116138764,
  "gnomad_fin": 0.31313131313131315,
  "gnomad_nfe": 0.26266330506532204,
  "gnomad_nfe_est": 0.3397858319604613,
  "gnomad_nfe_nwe": 0.23609443777511005,
  "gnomad_nfe_onf": 0.2256,
  "gnomad_nfe_seu": 0.1,
  "gnomad_oth": 0.27403846153846156
}
```

Loading data into a ClickHouse instance

There is a bash script you can use as a starting point to load all the data into a ClickHouse instance. In that script, you will find lines like this:
```bash
echo create studies tables
# Create the staging (log) table ot.studies_log
clickhouse-client -m -n < studies_log.sql
# Stream the study index JSON lines from the bucket straight into the staging table
gsutil cat "${base_path}/lut/study-index/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.studies_log format JSONEachRow "
# Create the final studies table from studies.sql
clickhouse-client -m -n < studies.sql
# Drop the staging table
clickhouse-client -m -n -q "drop table ot.studies_log;"
```
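Once the script has finished, you can check what was loaded. A minimal sketch: the ot database name comes from the script above, but ot.studies as the name of the final table is an assumption, so check studies.sql for the actual name it creates:

```bash
# List the tables that now exist in the ot database.
clickhouse-client -q "SHOW TABLES FROM ot"

# Count the rows loaded into the final studies table
# (ot.studies is assumed here; adjust to whatever studies.sql creates).
clickhouse-client -q "SELECT count() FROM ot.studies"
```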