Data Download
You can get all data from Open Targets Genetics via:
Please note that if you download this data using Google Cloud Storage, all charges to the bucket open-targets-genetics-releases will be billed to the requester.
The list of datasets, each with its corresponding data schema:
Please change the folder part of each URL to the corresponding folder name listed in the table below, as required.
Folder name | Format | Spark Schema | SQL Schema (ClickHouse Dialect) |
---|---|---|---|
variant-index | parquet | - | - |
v2g | jsonl | | |
v2d | jsonl | | |
d2v2g | jsonl | - | |
lut/genes-index | jsonl | - | |
lut/overlap-index | jsonl | - | |
lut/study-index | jsonl | - | |
lut/variant-index | jsonl | - | |
The gsutil command can be used to preview datasets prior to downloading:
gsutil cat 'gs://open-targets-genetics-releases/19.03.04/lut/variant-index/part-*' | head -1 | jq .
{
"chr_id": "1",
"position": 55545,
"ref_allele": "C",
"alt_allele": "T",
"rs_id": "rs28396308",
"most_severe_consequence": "downstream_gene_variant",
"gene_id_any_distance": 13546,
"gene_id_any": "ENSG00000186092",
"gene_id_prot_coding_distance": 13546,
"gene_id_prot_coding": "ENSG00000186092",
"raw": 0.028059,
"phred": 3.065,
"gnomad_afr": 0.3264216148287779,
"gnomad_amr": 0.4533582089552239,
"gnomad_asj": 0.26666666666666666,
"gnomad_eas": 0.35822021116138764,
"gnomad_fin": 0.31313131313131315,
"gnomad_nfe": 0.26266330506532204,
"gnomad_nfe_est": 0.3397858319604613,
"gnomad_nfe_nwe": 0.23609443777511005,
"gnomad_nfe_onf": 0.2256,
"gnomad_nfe_seu": 0.1,
"gnomad_oth": 0.27403846153846156
}
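The preview command above hard-codes one release and one folder. A minimal sketch of the same preview for any jsonl folder in the table could look like the following. The helper names dataset_glob and preview_dataset and the BILLING_PROJECT variable are hypothetical and not part of any Open Targets tooling. The sketch also assumes that every folder stores its files as part-*, which the source only shows for the lut folders, and it will not work for the parquet variant-index folder.

```shell
# Hypothetical helper: build the object glob for one folder of one release.
# $1 = release tag (e.g. 19.03.04), $2 = folder name from the table (e.g. v2g).
# Assumption: files in every folder are named part-*.
dataset_glob() {
  printf 'gs://open-targets-genetics-releases/%s/%s/part-*\n' "$1" "$2"
}

# Hypothetical helper: print the first record of a jsonl dataset.
# Needs gsutil and jq on the PATH. If BILLING_PROJECT (an assumed variable
# name) is set, it is passed to gsutil's top-level -u option, which names
# the project to bill for the request.
preview_dataset() {
  local glob
  glob="$(dataset_glob "$1" "$2")"
  if [ -n "${BILLING_PROJECT:-}" ]; then
    gsutil -u "$BILLING_PROJECT" cat "$glob" | head -1 | jq .
  else
    gsutil cat "$glob" | head -1 | jq .
  fi
}

# Example call (commented out so that sourcing this file has no side effects):
# preview_dataset 19.03.04 lut/variant-index
```

Keeping the path construction in its own function makes it easy to check the resulting URL before any data is transferred.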
An initial bash script is available for loading all the data into a ClickHouse instance. In that script, you will find lines like these:
echo create studies tables
clickhouse-client -m -n < studies_log.sql
gsutil cat "${base_path}/lut/study-index/part-*" | clickhouse-client -h 127.0.0.1 --query="insert into ot.studies_log format JSONEachRow "
clickhouse-client -m -n < studies.sql
clickhouse-client -m -n -q "drop table ot.studies_log;"
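The lines above follow a four-step pattern: create a staging table, stream the jsonl files into it, build the final table, then drop the staging table. A rough sketch of that pattern as a reusable function is shown below. It is not the actual loading script. The names run, load_table and DRY_RUN are hypothetical, and the sketch assumes that every table has a pair of files named like studies_log.sql and studies.sql and a staging table named like ot.studies_log, which the source only shows for the studies table.

```shell
# Sketch only, not the original loading script.
# Assumption: for a table <name>, the files <name>_log.sql and <name>.sql
# exist in the working directory and the staging table is ot.<name>_log.

base_path="${base_path:-gs://open-targets-genetics-releases/19.03.04}"

# run: execute a command line, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

# load_table NAME FOLDER
# Stops at the first step that fails.
load_table() {
  local name="$1" folder="$2"
  # 1. create the staging table
  run "clickhouse-client -m -n < ${name}_log.sql" &&
  # 2. stream the jsonl files from the bucket into the staging table
  run "gsutil cat \"${base_path}/${folder}/part-*\" | clickhouse-client -h 127.0.0.1 --query=\"insert into ot.${name}_log format JSONEachRow\"" &&
  # 3. build the final table from the staging table
  run "clickhouse-client -m -n < ${name}.sql" &&
  # 4. drop the staging table
  run "clickhouse-client -m -n -q \"drop table ot.${name}_log;\""
}

# Example (commented out so that sourcing this file has no side effects):
# DRY_RUN=1 load_table studies lut/study-index
```

With DRY_RUN=1 the function only prints the four commands, which is a cheap way to check paths and table names before anything is read from the bucket.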