Data Download
You can get all data from Open Targets Genetics via:
Please note that if you download this data using Google Cloud Storage, all charges to the bucket open-targets-genetics-releases will be billed to the requester.
The table below lists each dataset with its corresponding data schema. Please change the URL tags to their corresponding tables as required.
Folder name | Format | Spark Schema | SQL Schema (ClickHouse Dialect) |
variant-index | parquet | - | - |
v2g | jsonl | | |
v2d | jsonl | | |
d2v2g | jsonl | - | |
lut/genes-index | jsonl | - | |
lut/overlap-index | jsonl | - | |
lut/study-index | jsonl | - | |
lut/variant-index | jsonl | - | |
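As a rough sketch of how the folder names above map onto bucket paths, the helper below builds a `gs://` URL from a release and a folder name. It assumes every folder sits under `<release>/<folder>`, as `lut/variant-index` does in release 19.03.04; `dataset_url` and `MY_BILLING_PROJECT` are hypothetical names used for illustration and are not part of the Open Targets tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the bucket path for a release and folder.
# Assumption: all folders follow the <release>/<folder> layout.
BUCKET="gs://open-targets-genetics-releases"

dataset_url() {
  local release="$1" folder="$2"
  printf '%s/%s/%s\n' "${BUCKET}" "${release}" "${folder}"
}

# Example (not run here): list a folder while billing a project you own.
# gsutil's top-level -u option names the project charged for the request.
#   gsutil -u "${MY_BILLING_PROJECT}" ls "$(dataset_url 19.03.04 lut/study-index)"
```

Keeping the path construction in one place makes it easier to switch releases without editing every command.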
The gsutil command can be used to preview datasets prior to downloading:
gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -1 | jq .
{
"chr_id": "1",
"position": 55545,
"ref_allele": "C",
"alt_allele": "T",
"rs_id": "rs28396308",
"most_severe_consequence": "downstream_gene_variant",
"gene_id_any_distance": 13546,
"gene_id_any": "ENSG00000186092",
"gene_id_prot_coding_distance": 13546,
"gene_id_prot_coding": "ENSG00000186092",
"raw": 0.028059,
"phred": 3.065,
"gnomad_afr": 0.3264216148287779,
"gnomad_amr": 0.4533582089552239,
"gnomad_asj": 0.26666666666666666,
"gnomad_eas": 0.35822021116138764,
"gnomad_fin": 0.31313131313131315,
"gnomad_nfe": 0.26266330506532204,
"gnomad_nfe_est": 0.3397858319604613,
"gnomad_nfe_nwe": 0.23609443777511005,
"gnomad_nfe_onf": 0.2256,
"gnomad_nfe_seu": 0.1,
"gnomad_oth": 0.27403846153846156
}
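To pull only a few columns out of such a preview, one option is a small `jq` filter. The sketch below assumes `jq` is installed and that records carry the field names shown in the sample record; `variant_tsv` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read JSON Lines on stdin and print selected
# fields as tab-separated values, one record per line.
variant_tsv() {
  jq -r '[.chr_id, .position, .ref_allele, .alt_allele, .rs_id] | @tsv'
}

# Example (not run here): preview the first five variants as TSV.
#   gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -5 | variant_tsv
```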
An initial bash script is available for loading all data into a ClickHouse instance. In that script, you will find lines like these:
echo create studies tables
clickhouse-client -m -n < studies_log.sql
gsutil cat "${base_path}/lut/study-index/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.studies_log format JSONEachRow "
clickhouse-client -m -n < studies.sql
clickhouse-client -m -n -q "drop table ot.studies_log;"
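The four commands above follow a pattern: create a staging log table, stream the JSON Lines into it, build the final table, then drop the staging table. The sketch below prints that sequence for a given table so it can be reviewed before running. Only the `studies` case is confirmed by the script excerpt; that other tables use the same `<name>_log.sql` / `<name>.sql` file names and an `ot.<name>_log` staging table is an assumption, and `load_plan` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the load commands for one table.
# Arguments: table name, bucket folder, base path of the release.
# Assumption: <name>_log.sql and <name>.sql exist for the table.
load_plan() {
  local name="$1" folder="$2" base_path="$3"
  # Unquoted heredoc: variables expand, quotes and globs stay literal.
  cat <<EOF
clickhouse-client -m -n < ${name}_log.sql
gsutil cat "${base_path}/${folder}/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.${name}_log format JSONEachRow"
clickhouse-client -m -n < ${name}.sql
clickhouse-client -m -n -q "drop table ot.${name}_log;"
EOF
}

# Example (not run here): review the plan, then pipe it to bash.
#   load_plan studies lut/study-index "${base_path}"
#   load_plan studies lut/study-index "${base_path}" | bash
```

Printing the commands instead of executing them directly keeps the helper safe to try without a running ClickHouse instance.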