Downloads and demos

This page collects every artifact Registry Forge - Patient Edition produces from the synthetic sample_data/ files, along with the downloadable zip bundles. Each artifact embeds an "exploratory" disclaimer in its header so it can't be confused with a clinical product.

Synthetic data, exploratory output

All the artifacts on this page were generated from three synthetic patients (Jane Demo, Joe Demo, Alex Demo). The patients are not real. Distributions, time series, code patterns, and feature values shown are illustrative only. Do not use these outputs for clinical or epidemiological decisions. These demos show what the tool produces, not what your cohort will look like.

Download the source bundle

The fastest way to get started is to download the bundle, extract it, and run pip install . from the extracted directory:

Download source bundle (.zip)

The repository itself may be private; releases are published as zip artifacts on the public release page. If your organization needs a specific commit, request access from the Boyce Lab.

Download the demo output bundle

Every artifact below, plus the sample_data/ input files and the master CSV, packaged into a single zip:

Download demo bundle (.zip)

The zip is roughly 200 KB and is useful when you want to see the file layout without running the pipeline yourself.
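If you only want to inspect the layout, you don't need to extract anything: Python's standard-library zipfile module can list the archive in place. A minimal sketch; list_bundle is an illustrative helper, not part of Registry Forge:

```python
import zipfile


def list_bundle(path: str) -> list[tuple[str, int]]:
    """Return (entry path, uncompressed size in bytes) for every file in a zip.

    Directory entries are skipped so only real files are reported.
    """
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]
```

For example, list_bundle("demo_bundle.zip") returns the paths and sizes inside the archive; the filename is a placeholder for whatever your browser saved the download as.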

Core outputs

Record dashboard

A self-contained HTML record explorer. Open it in any browser; it needs no server and makes no network calls. Search by patient, category, vocabulary, or any text in any column.

Open in a new tab

EDA dashboard

A self-contained Chart.js dashboard with category breakdown, vocabulary mix, time series, top diagnoses and medications, top labs/vitals, and demographics.

Open in a new tab

Standards exports

OMOP CDM v5.4

Seven tables produced from the three demo patients. Download links open the CSV directly:

Table	Demo file	Rows	What's in it
PERSON	PERSON.csv	3	One row per patient
OBSERVATION_PERIOD	OBSERVATION_PERIOD.csv	3	Date span of records per patient
CONDITION_OCCURRENCE	CONDITION_OCCURRENCE.csv	15	Problem-list entries
DRUG_EXPOSURE	DRUG_EXPOSURE.csv	15	Medication entries with SIG
MEASUREMENT	MEASUREMENT.csv	3	Labs and vitals with numeric values
OBSERVATION	OBSERVATION.csv	0	Allergies, social/family history (none in this synthetic cohort)
DEVICE_EXPOSURE	DEVICE_EXPOSURE.csv	0	Implants, durable equipment (none in this synthetic cohort)
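To check that a local run on the same sample_data/ reproduces these counts, you can count the data rows in each CSV. This sketch assumes the seven files sit side by side in one directory under the names shown in the table; EXPECTED_ROWS, count_rows, and compare_counts are hypothetical names for illustration:

```python
import csv
from pathlib import Path

# Data-row counts (header excluded) for the three synthetic demo patients.
EXPECTED_ROWS = {
    "PERSON": 3,
    "OBSERVATION_PERIOD": 3,
    "CONDITION_OCCURRENCE": 15,
    "DRUG_EXPOSURE": 15,
    "MEASUREMENT": 3,
    "OBSERVATION": 0,
    "DEVICE_EXPOSURE": 0,
}


def count_rows(csv_path: Path) -> int:
    """Count CSV records after the header; quoted multi-line fields count once."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row (no-op on an empty file)
        return sum(1 for _ in reader)


def compare_counts(omop_dir: str) -> dict[str, tuple[int, int]]:
    """Return {table: (expected, actual)} for each table whose count differs."""
    mismatches = {}
    for table, expected in EXPECTED_ROWS.items():
        actual = count_rows(Path(omop_dir) / f"{table}.csv")
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches
```

An empty result means every table matches. Because csv.reader counts records rather than physical lines, a quoted SIG value containing a newline still counts as one row. The expected counts only apply to the demo patients, not to your own data.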

These demo files have their *_concept_id columns set to 0, the OMOP value for "no matching concept", because they were generated without an Athena vocabulary attached. Real runs with --omop-vocab=<path> populate them.
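To tell whether a given OMOP CSV was built with or without a vocabulary, you can scan its *_concept_id columns for any non-zero value. A minimal sketch using only the csv module; unmapped_concept_columns is a hypothetical helper, not a Registry Forge API:

```python
import csv
from pathlib import Path


def unmapped_concept_columns(csv_path: str) -> list[str]:
    """Return the *_concept_id columns whose values are all 0 or blank.

    OMOP uses concept_id 0 for "no matching concept", so a column that never
    holds anything else suggests the table was built without a vocabulary.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        columns = [c for c in (reader.fieldnames or []) if c.endswith("_concept_id")]
        mapped = set()
        for row in reader:
            for col in columns:
                value = (row.get(col) or "").strip()  # short rows yield None
                if value not in ("", "0"):
                    mapped.add(col)
    # Preserve the column order from the CSV header.
    return [c for c in columns if c not in mapped]
```

On the demo files every *_concept_id column should come back as unmapped. A table with no data rows (OBSERVATION and DEVICE_EXPOSURE here) also reports all of its concept columns, since there is nothing to map.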

GA4GH Phenopackets v2

One JSON file per patient. Disease terms use Mondo IDs because Mondo mapping was applied before phenopacket generation:

PT-8E3474A39B.json - Jane Demo (focal epilepsy, 17yo, mainstream)
PT-1E49D84549.json - Joe Demo (Dravet syndrome, 8yo, refractory)
PT-C93A5A73D6.json - Alex Demo (generalized epilepsy, 22yo, mild ID)

Validate with pxf validate <file.json> from the phenopacket-tools CLI.
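As a quick pre-check alongside pxf validate, you can confirm that every disease term carries a Mondo CURIE. This sketch reads the Phenopackets v2 JSON layout, where each entry in diseases has a term object with id and label; disease_terms and non_mondo_terms are illustrative names, and the sketch does no schema validation:

```python
import json
from pathlib import Path


def disease_terms(phenopacket_path: str) -> list[tuple[str, str]]:
    """Return (id, label) for each disease term in a Phenopackets v2 JSON file."""
    data = json.loads(Path(phenopacket_path).read_text(encoding="utf-8"))
    terms = []
    for disease in data.get("diseases", []):
        term = disease.get("term", {})
        terms.append((term.get("id", ""), term.get("label", "")))
    return terms


def non_mondo_terms(phenopacket_path: str) -> list[str]:
    """Return disease term IDs that do not use the MONDO: prefix."""
    return [tid for tid, _ in disease_terms(phenopacket_path) if not tid.startswith("MONDO:")]
```

For example, non_mondo_terms("PT-1E49D84549.json") should return an empty list for the demo files if Mondo mapping was applied as described above.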

What's NOT in the demo bundle

A complete demo bundle would also include:

profile_master.html and profile_features.html from ydata-profiling
sweetviz_master.html from sweetviz
An OMOP run with *_concept_id columns populated against a real Athena vocab

Those aren't in the public demo because the profiling reports require the optional [eda] extras and the populated OMOP run requires a multi-GB Athena vocabulary download. Generate them locally:

pip install "registryforge-patient[eda]"
registryforge-patient parse ./sample_data --output ./out \
    --omop --omop-vocab /path/to/athena/vocab \
    --phenopackets --mondo --eda --flag-notes

Real-data privacy reminder

The demos on this page are safe to share because they describe synthetic patients. Outputs from real patient data are not. Every generated HTML file built with eda_is_phi=True carries a red banner instructing the reader not to email, sync, or commit the file. The same rule applies to patient_master.csv, the record dashboard, OMOP tables, and Phenopackets. See Privacy & PHI for the full guidance.