
Downloads and demos

This page collects every artifact Registry Forge - Patient Edition produces from the synthetic sample_data/ files, plus the bundled zip. Each artifact embeds an "exploratory" disclaimer in its header so it can't be confused with a clinical product.

Synthetic data, exploratory output

All the artifacts on this page were generated from three synthetic patients (Jane Demo, Joe Demo, Alex Demo). The patients are not real. Distributions, time series, code patterns, and feature values shown are illustrative only. Do not use these outputs for clinical or epidemiological decisions. These demos are intended to show what the tool produces - not what your cohort will look like.

Download the source bundle

The fastest way to get started is to download the bundle, extract it, and run pip install . from the extracted directory:

Download source bundle (.zip)

The repository itself may be private; releases are published as zip artifacts on the public release page. If your organization needs a specific commit, request access from the Boyce Lab.

Download the demo output bundle

Every artifact below, plus the source sample_data/ and the master CSV, packaged into a single zip:

Download demo bundle (.zip)

The zip is roughly 200 KB. It is useful when you want to see the file layout without running the pipeline yourself.
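To inspect the layout without extracting anything, a short standard-library script is enough. This is a minimal sketch: the function name summarize_zip and the file name in the usage note are illustrative assumptions, not part of the tool.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(path):
    """Return (file_count, total_uncompressed_bytes, files_per_top_level_entry)."""
    with zipfile.ZipFile(path) as zf:
        # Skip directory entries so only real files are counted.
        infos = [info for info in zf.infolist() if not info.is_dir()]
    # Zip member names always use forward slashes, hence PurePosixPath.
    top_level = Counter(PurePosixPath(info.filename).parts[0] for info in infos)
    total_bytes = sum(info.file_size for info in infos)
    return len(infos), total_bytes, top_level
```

Calling summarize_zip("demo_bundle.zip") (a hypothetical file name; use whatever name the download has) returns the number of files, their combined uncompressed size, and a count of files under each top-level folder.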

Core outputs

Record dashboard

A self-contained HTML record explorer. Open it in any browser - no server, no network calls. Search by patient, category, vocabulary, or any text in any column.

Open in a new tab
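If you want to spot-check the "no network calls" property on a generated file, one rough heuristic is to look for tags that load resources from a remote URL. The sketch below uses only the standard library; the names ExternalRefFinder and external_refs are illustrative, and the check does not catch requests made from inside inline JavaScript.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect tags that would fetch a resource from a remote URL on page load."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # src on any tag, and href on <link>, trigger a fetch on load.
            # Ordinary <a href> hyperlinks are ignored: they fetch nothing until clicked.
            loads_resource = name == "src" or (tag == "link" and name == "href")
            if loads_resource and value.strip().lower().startswith(
                ("http://", "https://", "//")
            ):
                self.external.append((tag, name, value))


def external_refs(html_text):
    """Return a list of (tag, attribute, url) for remotely loaded resources."""
    finder = ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external
```

An empty list from external_refs(open(path, encoding="utf-8").read()) is consistent with a self-contained file, though it is not proof of one.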

EDA dashboard

A self-contained Chart.js dashboard with category breakdown, vocabulary mix, time series, top diagnoses and medications, top labs/vitals, and demographics.

Open in a new tab

Standards exports

OMOP CDM v5.4

Seven tables produced from the three demo patients. Download links open the CSV directly:

| Table | Demo file | Rows | What's in it |
|---|---|---|---|
| PERSON | PERSON.csv | 3 | One row per patient |
| OBSERVATION_PERIOD | OBSERVATION_PERIOD.csv | 3 | Date span of records per patient |
| CONDITION_OCCURRENCE | CONDITION_OCCURRENCE.csv | 15 | Problem-list entries |
| DRUG_EXPOSURE | DRUG_EXPOSURE.csv | 15 | Medication entries with SIG |
| MEASUREMENT | MEASUREMENT.csv | 3 | Labs and vitals with numeric values |
| OBSERVATION | OBSERVATION.csv | 0 | Allergies, social/family history (none in this synthetic cohort) |
| DEVICE_EXPOSURE | DEVICE_EXPOSURE.csv | 0 | Implants, durable equipment (none in this synthetic cohort) |

These demo files have *_concept_id columns set to 0 because they were generated without an Athena vocabulary attached. Real runs with --omop-vocab=<path> populate them.
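To confirm whether a given OMOP CSV was generated with or without a vocabulary, you can count the non-zero values in its concept columns. This is a minimal sketch, assuming the CSVs have a header row and that concept columns are named with a _concept_id suffix; the function name concept_id_summary is illustrative.

```python
import csv


def concept_id_summary(csv_path):
    """Return (row_count, {column: number_of_non_zero_values}) for *_concept_id columns."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        # Assumption: concept columns end in "_concept_id" (matched case-insensitively).
        cols = [c for c in (reader.fieldnames or []) if c.lower().endswith("_concept_id")]
        nonzero = {c: 0 for c in cols}
        rows = 0
        for row in reader:
            rows += 1
            for c in cols:
                # Empty cells and a literal "0" both count as unmapped.
                if (row[c] or "").strip() not in ("", "0"):
                    nonzero[c] += 1
    return rows, nonzero
```

For the demo files described above, the expectation is that the main concept columns report zero non-zero values. The row count returned alongside can be compared with the Rows column of the table.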

GA4GH Phenopackets v2

One JSON file per patient. Disease terms use Mondo IDs because Mondo mapping was applied before phenopacket generation:

Validate with pxf validate <file.json> from the phenopacket-tools CLI.
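Separately from schema validation, you can check that the disease terms in a file really carry Mondo IDs. The sketch below relies only on the Phenopacket v2 JSON layout, where each entry in diseases has a term with id and label; the helper names are illustrative, and this does not replace the validator.

```python
import json


def load_phenopacket(path):
    """Read one phenopacket JSON file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def disease_terms(phenopacket):
    """Return (id, label) pairs for every entry in the top-level diseases list."""
    pairs = []
    for disease in phenopacket.get("diseases", []):
        term = disease.get("term", {})
        pairs.append((term.get("id", ""), term.get("label", "")))
    return pairs


def non_mondo_ids(phenopacket):
    """Return disease term IDs that do not use the MONDO: CURIE prefix."""
    return [tid for tid, _ in disease_terms(phenopacket) if not tid.startswith("MONDO:")]
```

An empty list from non_mondo_ids(load_phenopacket(path)) means every disease term in that file uses a MONDO: identifier.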

What's NOT in the demo bundle

A complete demo bundle would also include:

  • profile_master.html and profile_features.html from ydata-profiling
  • sweetviz_master.html from sweetviz
  • An OMOP run with *_concept_id columns populated against a real Athena vocab

Those aren't in the public demo because they require the optional [eda] extras or a multi-GB vocab download. Generate them locally:

pip install "registryforge-patient[eda]"
registryforge-patient parse ./sample_data --output ./out \
    --omop --omop-vocab /path/to/athena/vocab \
    --phenopackets --mondo --eda --flag-notes

Real-data privacy reminder

The demos on this page are safe to share because they describe synthetic patients. Outputs from real patient data are not safe to share. Every generated HTML file built with eda_is_phi=True carries a red banner instructing the reader not to email, sync, or commit the file. The same applies to patient_master.csv, the record dashboard, OMOP tables, and Phenopackets. See Privacy & PHI for the full guidance.