
Installation

Registry Forge - Patient Edition ships as a pip-installable Python package with a command-line interface and importable modules. Notebooks are demos that import the package.

Prerequisites

  • Python 3.9 or newer
  • About 200 MB of disk space for dependencies (pandas, numpy)
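A short standard-library script can check both prerequisites before you install. This is a sketch: the 200 MB threshold is the estimate above, the function name `preflight` is hypothetical, and it measures free space on the filesystem that holds `path` (the current directory by default), which may not be where pip puts your virtual environment or cache.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)                  # from the prerequisites above
MIN_FREE_BYTES = 200 * 1024 * 1024   # ~200 MB, the estimate above (assumption)


def preflight(path: str = ".") -> dict:
    """Report whether this interpreter and disk meet the stated prerequisites."""
    free = shutil.disk_usage(path).free
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "disk_ok": free >= MIN_FREE_BYTES,
        "free_mb": free // (1024 * 1024),
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```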

The source repository is private during early development. Releases are published as public zip assets on GitHub Releases, and that is the supported install path for external users:

pip install https://github.com/BoyceLab/RegistryForge4Patients/releases/latest/download/RegistryForgePatient.zip

To pin a specific version, use the tag URL instead of latest:

pip install https://github.com/BoyceLab/RegistryForge4Patients/releases/download/v0.1.0/RegistryForgePatient.zip
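If you script installs (for example in CI), the two URL forms above differ only in the path after releases/. A minimal sketch with hypothetical helper names that builds either URL and a matching pip invocation:

```python
import sys
from typing import List, Optional

RELEASE_BASE = "https://github.com/BoyceLab/RegistryForge4Patients/releases"
ASSET = "RegistryForgePatient.zip"


def release_url(tag: Optional[str] = None) -> str:
    """Return the Release asset URL: latest when tag is None, else the pinned tag."""
    if tag is None:
        return f"{RELEASE_BASE}/latest/download/{ASSET}"
    return f"{RELEASE_BASE}/download/{tag}/{ASSET}"


def pip_command(tag: Optional[str] = None) -> List[str]:
    """Build the argv for installing ASSET_URL with this interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", release_url(tag)]
```

Run it with subprocess.run(pip_command("v0.1.0"), check=True). Invoking pip as sys.executable -m pip installs into the interpreter running the script, which avoids picking up a pip that belongs to a different Python.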

The Release page also publishes a demo output bundle (registry_forge_patient_bundle.zip) showing every artifact the pipeline produces against the synthetic sample patients.

Install from PyPI (when published)

pip install registryforge-patient

This becomes the simplest path once the package is published. Until then, use the Release asset URL above.

Install from a local checkout (maintainers / collaborators with repo access)

git clone https://github.com/BoyceLab/RegistryForge4Patients.git
cd RegistryForge4Patients

# Optional but recommended: virtual environment
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

# Install in editable mode
pip install -e .

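The -e (editable) install means changes you make to the Python source are picked up the next time the module is imported, without reinstalling. Changes to the packaging metadata (dependencies, entry points) still require re-running pip install -e . to take effect.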
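To confirm that you are in a virtual environment and to see which kind of install you ended up with, you can read the metadata pip records. Recent versions of pip write a direct_url.json file (PEP 610) next to the package metadata for installs from a URL or local directory; installs from an index do not get one. The sketch assumes the distribution name registryforge-patient (the PyPI name used above), and the helper names are hypothetical. Legacy editable installs made by older tooling may not record this file, so they would be reported as "index".

```python
import json
import sys
from importlib import metadata
from typing import Optional

DIST_NAME = "registryforge-patient"  # assumed distribution name


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from the base interpreter's prefix."""
    return sys.prefix != sys.base_prefix


def classify_direct_url(text: Optional[str]) -> str:
    """Classify a PEP 610 direct_url.json payload; None means installed from an index."""
    if text is None:
        return "index"
    info = json.loads(text)
    if info.get("dir_info", {}).get("editable"):
        return "editable"       # pip install -e .
    if "archive_info" in info:
        return "archive"        # e.g. the Release zip URL
    if "vcs_info" in info:
        return "vcs"
    return "directory"          # non-editable install from a local directory


def install_kind(dist: str = DIST_NAME) -> Optional[str]:
    """Return how `dist` was installed, or None if it is not installed."""
    try:
        d = metadata.distribution(dist)
    except metadata.PackageNotFoundError:
        return None
    return classify_direct_url(d.read_text("direct_url.json"))
```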

Verify the install

registryforge-patient --version
registryforge-patient --help

You should see the installed version (for example, 0.1.0) and a help message listing four subcommands: parse, omop, phenopackets, mondo.
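If the registryforge-patient command is not found (often because the virtual environment is not activated), you can check the same things from Python. A sketch assuming the distribution name registryforge-patient; the helper names are hypothetical:

```python
import shutil
from importlib import metadata
from typing import Optional

DIST_NAME = "registryforge-patient"  # assumed distribution name
CLI_NAME = "registryforge-patient"   # console entry point documented above


def installed_version(dist: str = DIST_NAME) -> Optional[str]:
    """Version recorded in the package metadata, or None if not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def cli_path(name: str = CLI_NAME) -> Optional[str]:
    """Absolute path of the console script if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("version:", installed_version())
    print("cli:", cli_path())
```

If the version prints but the CLI path is None, the package is installed in this interpreter but its scripts directory is not on PATH. pip show registryforge-patient reports the same version from the command line.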

Run the smoke test against the included synthetic data:

registryforge-patient parse ./sample_data --output ./out --omop --phenopackets --mondo

This produces the full output bundle (master CSV, dashboard, feature matrix, OMOP tables, Phenopackets v2 JSONs, Mondo-mapped CSV, and a zip of everything) in ./out/.
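To sanity-check ./out/ after the smoke test, you can count the artifacts by file extension. This is a heuristic sketch: it assumes the CSV outputs end in .csv, the Phenopackets files in .json, and the bundle in .zip, and it does not know the actual file names or folder layout, so treat `looks_complete` (a hypothetical helper) as a quick signal rather than a validation.

```python
from collections import Counter
from pathlib import Path


def summarize_outputs(out_dir: str = "./out") -> Counter:
    """Count files under out_dir (recursively) by lowercase extension."""
    root = Path(out_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; did the parse run succeed?")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def looks_complete(counts: Counter) -> bool:
    """Heuristic: at least one CSV, one JSON, and one zip bundle are present."""
    return all(counts.get(ext, 0) > 0 for ext in (".csv", ".json", ".zip"))


if __name__ == "__main__":
    counts = summarize_outputs()
    print(dict(counts), "complete:", looks_complete(counts))
```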

Optional extras

# EDA reports (ydata-profiling, sweetviz)
pip install "registryforge-patient[eda]"

# Notebooks (Jupyter, matplotlib, seaborn)
pip install "registryforge-patient[notebook]"

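If you installed from a Release asset, appending [eda] to the URL does not work, because pip treats the whole string as the download address. Use a PEP 508 direct reference instead (the package name with its extras, then @ and the URL):

pip install "registryforge-patient[eda] @ https://github.com/BoyceLab/RegistryForge4Patients/releases/latest/download/RegistryForgePatient.zip"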
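Installed package metadata tags each optional requirement with an extra == "..." marker, so you can see what [eda] or [notebook] pulls in for the version you have. The sketch below groups the requirement strings returned by importlib.metadata.requires by extra. It assumes the distribution name registryforge-patient, uses a simplified regex parse (the packaging library's Requirement class is the robust choice), and `installed_extras_report` raises PackageNotFoundError if the package is not installed. Helper names are hypothetical.

```python
import re
from collections import defaultdict
from importlib import metadata
from typing import Dict, Iterable, List

DIST_NAME = "registryforge-patient"  # assumed distribution name

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_EXTRA = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements: Iterable[str]) -> Dict[str, List[str]]:
    """Map each extra ('' for core dependencies) to the project names it requires."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")   # split off any environment marker
        name = _NAME.match(spec).group(1)      # project name before version/extras
        match = _EXTRA.search(marker)
        groups[match.group(1) if match else ""].append(name)
    return dict(groups)


def installed_extras_report(dist: str = DIST_NAME) -> Dict[str, List[str]]:
    """Group the installed package's declared requirements by extra."""
    return group_by_extra(metadata.requires(dist) or [])
```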

What gets installed

The package installs:

  • A Python module registryforge_patient with submodules: parser, builder, dashboard, identity, phi, harmonize, omop, vocab, mondo, phenopackets, eda, notes, pipeline, cli.
  • A command-line entry point registryforge-patient.
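To confirm that every submodule listed above is present, importlib.util.find_spec can locate each one without executing it (only the parent package is imported), so submodules that depend on optional extras are not triggered. The helper name is hypothetical; the submodule list is copied from above.

```python
import importlib.util
from typing import Iterable, List

PACKAGE = "registryforge_patient"
SUBMODULES = [
    "parser", "builder", "dashboard", "identity", "phi", "harmonize", "omop",
    "vocab", "mondo", "phenopackets", "eda", "notes", "pipeline", "cli",
]


def missing_submodules(package: str = PACKAGE, names: Iterable[str] = SUBMODULES) -> List[str]:
    """Return the submodules of `package` that cannot be located."""
    if importlib.util.find_spec(package) is None:
        return list(names)  # the package itself is not importable
    return [n for n in names if importlib.util.find_spec(f"{package}.{n}") is None]


if __name__ == "__main__":
    print(missing_submodules() or "all submodules found")
```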

Dependencies: pandas>=1.3 and numpy>=1.21. Everything else is in the Python standard library.
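A sketch that checks the installed pandas and numpy against these minimums using only the standard library. It compares the leading numeric part of each version string, so a pre-release such as 1.3.0rc1 counts as meeting 1.3; use packaging.version.Version if you need exact PEP 440 ordering. Helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

MINIMUMS = {"pandas": (1, 3), "numpy": (1, 21)}  # from the dependency list above


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # stop at the first pre/post/dev suffix
    return tuple(parts)


def check_dependencies(minimums=MINIMUMS) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each dependency to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (found, found is not None and release_tuple(found) >= minimum)
    return report
```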

For the demo notebooks under notebooks/, you'll also want Jupyter (or install the [notebook] extra).

Run the demo notebook

jupyter notebook notebooks/01_quickstart.ipynb

The quickstart notebook is a thin wrapper around the package: it imports registryforge_patient.build_outputs and runs it. It is useful for exploring the outputs interactively; for production runs, use the CLI, which is faster.
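For production or scheduled runs driven from Python, you can call the CLI through subprocess instead of the notebook. This sketch uses only the parse subcommand and the flags shown in the smoke test; the helper names are hypothetical, and any other options the CLI accepts are not covered.

```python
import subprocess
from typing import List

CLI = "registryforge-patient"  # entry point documented above


def parse_argv(input_dir: str, output_dir: str,
               omop: bool = True, phenopackets: bool = True, mondo: bool = True) -> List[str]:
    """Build the argv for the documented `parse` subcommand."""
    argv = [CLI, "parse", input_dir, "--output", output_dir]
    if omop:
        argv.append("--omop")
    if phenopackets:
        argv.append("--phenopackets")
    if mondo:
        argv.append("--mondo")
    return argv


def run_parse(input_dir: str, output_dir: str, **flags: bool) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(parse_argv(input_dir, output_dir, **flags), check=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.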

Privacy note for installation

The package itself makes no network calls. Pip needs network access to download dependencies during installation, which is standard pip behavior; your patient data never leaves your machine through this package.
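If you want to check the no-network claim during your own runs, one option is to make Python's socket layer refuse connections while the pipeline executes. This is a sketch, not a guarantee: it patches socket.socket.connect, socket.socket.connect_ex, and socket.getaddrinfo in the current process only, so child processes and native code that open sockets on their own are not covered. An OS-level firewall or an offline machine gives stronger assurance. The names block_network and NetworkBlockedError are hypothetical.

```python
import contextlib
import socket


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to reach the network while the guard is active."""


def _refuse(*args, **kwargs):
    raise NetworkBlockedError("network access attempted while block_network() was active")


@contextlib.contextmanager
def block_network():
    """Temporarily make Python-level socket connects and DNS lookups fail."""
    saved = (socket.socket.connect, socket.socket.connect_ex, socket.getaddrinfo)
    socket.socket.connect = _refuse
    socket.socket.connect_ex = _refuse
    socket.getaddrinfo = _refuse
    try:
        yield
    finally:
        # Restore the originals even if the wrapped code raised.
        socket.socket.connect, socket.socket.connect_ex, socket.getaddrinfo = saved
```

Wrap whatever you use to run the pipeline in a with block_network(): block. Any attempted connection then raises NetworkBlockedError instead of leaving the machine.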

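If your IT setup blocks pip from reaching pypi.org, you can install offline. Until the package is on PyPI, start from the Release asset: pip wheel builds it into a wheel and collects wheels for its dependencies, so nothing has to be built on the offline machine.

# On an internet-connected machine
pip wheel https://github.com/BoyceLab/RegistryForge4Patients/releases/latest/download/RegistryForgePatient.zip -w ./offline_wheels

# Transfer ./offline_wheels to the target machine, then
pip install --no-index --find-links ./offline_wheels registryforge-patient

Once the package is published on PyPI, you can pass registryforge-patient to pip wheel instead of the URL.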
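pandas and numpy ship compiled wheels that are specific to the operating system, CPU architecture, and Python version, so wheels collected on one machine install only on a matching one. Running this sketch (the function name is hypothetical) on both machines and comparing the output is a quick check:

```python
import platform
import sys
import sysconfig


def environment_fingerprint() -> dict:
    """Facts that must match between the download and target machines."""
    return {
        "implementation": platform.python_implementation(),  # e.g. CPython
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform": sysconfig.get_platform(),                # e.g. linux-x86_64, win-amd64
    }


if __name__ == "__main__":
    print(environment_fingerprint())
```

If the two outputs differ, collect the wheels on a machine that matches the target.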

See Privacy & PHI for more.