Phase 1 of the pipeline prepares the input data (V2D, V2G and summary statistics tables) in a standardised format. Workflows are written in Python and run with the Snakemake workflow management system so that analyses are reproducible and portable. They run on a Google Compute Engine instance or on the Sanger Institute cluster, and the output is stored on Google Cloud Storage (GCS).