Contributing
Issues, pull requests, and parser samples are welcome.
What helps most
In rough priority order:
- Vendor-specific parsing samples. If your patient-portal export produces zero records or weird records, a scrubbed C-CDA file plus a description of what went wrong is the most useful thing you can contribute.
- Section-code mappings. The SECTION_TO_CATEGORY table in the notebook handles common LOINC section codes. If your vendor uses one we don't recognize and you can identify what category it should map to, that's a one-line PR.
- Documentation clarifications. If a stage page doesn't match what you actually saw the parser do, that's a bug in the docs.
- Privacy / de-identification tooling. A separate companion notebook that takes patient_master.csv and produces a Safe Harbor-compatible version would unlock external sharing of dashboards. Big lift but high value.
- Code-system harmonization. Mapping SNOMED → ICD-10, or RxNorm → ATC, against a local copy of Athena vocabularies. Optional add-on, not core.
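To make the "one-line PR" for section-code mappings concrete, here is a minimal sketch of what such a change might look like. It is an illustration under assumptions: the real SECTION_TO_CATEGORY table in the notebook may use different keys, category names, or structure, and the category_for helper is hypothetical. The LOINC codes shown are standard C-CDA section codes.

```python
# Hypothetical sketch: the notebook's actual SECTION_TO_CATEGORY table may
# differ in structure and in the category names it uses.
SECTION_TO_CATEGORY = {
    "11450-4": "problems",     # Problem list
    "10160-0": "medications",  # History of medication use
    "30954-2": "labs",         # Relevant diagnostic tests / laboratory data
    "48765-2": "allergies",    # Allergies and adverse reactions
}


def category_for(section_code: str) -> str:
    """Return the category for a LOINC section code, or 'unmapped'.

    Hypothetical helper, used here only to show the lookup behaviour.
    """
    return SECTION_TO_CATEGORY.get(section_code.strip(), "unmapped")


# The kind of one-line addition a contributor might propose:
SECTION_TO_CATEGORY["8716-3"] = "vitals"  # Vital signs

print(category_for("8716-3"))
print(category_for("00000-0"))  # made-up placeholder code, not a real mapping
```

When proposing a mapping like this, it helps to say in the PR description which vendor export emitted the code and how you decided on the category.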
What does NOT help
- Untested vendor-specific code paths. Adding logic that handles your specific export but breaks generic CDA isn't useful. Tests against synthetic C-CDA samples are required.
- Sample data containing real PHI. We won't accept any data we can't verify is synthetic or fully scrubbed. Even partial PHI is a hard no.
- Renaming things to match a different schema standard. The Registry Forge schema is what gives this tool interoperability with the parent project. Pick a different name if you want to fork in a different direction.
Development setup
git clone https://github.com/BoyceLab/RegistryForge4Patients.git
cd RegistryForge4Patients
python -m venv .venv
source .venv/bin/activate
pip install pandas numpy matplotlib jupyter mkdocs-material
Run the notebook against synthetic samples in notebook/sample_data/. Run the docs site locally with:
mkdocs serve
It'll open at http://127.0.0.1:8000.
Testing changes to the parser
The notebook ships with a test_ccda.xml synthetic document covering the main entry types (problems with SNOMED + ICD-10 translation, medications with RxNorm, labs with LOINC + PQ values, allergies). Run the notebook end-to-end against that file as a smoke test before opening a PR.
For a heavier regression test, run against any synthetic C-CDA corpus you have access to. A vendor-neutral synthetic test suite is on the roadmap.
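Before running the full notebook, a quick structural check on a synthetic document can confirm that it contains the sections and entries you expect. The sketch below is not the project's parser; it is a standalone check using only the standard library, and the count_entries_by_section function and the inline SAMPLE document are assumptions made for illustration. In practice you would read test_ccda.xml instead of the inline string.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard HL7 v3 namespace used by C-CDA documents.
NS = {"hl7": "urn:hl7-org:v3"}

# Tiny synthetic stand-in for a real test file (no PHI, no real patient).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="11450-4" codeSystem="2.16.840.1.113883.6.1"/>
      <entry><act classCode="ACT" moodCode="EVN"/></entry>
    </section></component>
    <component><section>
      <code code="10160-0" codeSystem="2.16.840.1.113883.6.1"/>
      <entry><substanceAdministration classCode="SBADM" moodCode="EVN"/></entry>
      <entry><substanceAdministration classCode="SBADM" moodCode="EVN"/></entry>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""


def count_entries_by_section(xml_text: str) -> Counter:
    """Count <entry> elements per section, keyed by the section's LOINC code."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for section in root.iterfind(".//hl7:section", NS):
        code_el = section.find("hl7:code", NS)
        code = code_el.get("code") if code_el is not None else "(no code)"
        # Only direct <entry> children of the section are counted.
        counts[code] += len(section.findall("hl7:entry", NS))
    return counts


counts = count_entries_by_section(SAMPLE)
# A document that yields zero entries is the "zero records" failure mode.
assert sum(counts.values()) > 0, "smoke test: no entries found"
print(dict(counts))
```

Comparing these raw counts with the number of records the notebook produces for the same file is one way to spot sections the parser silently skipped.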
Style
- Python: standard pep8-ish. The notebook is the source of truth; if you're refactoring, keep cell boundaries meaningful.
- Docs: short paragraphs, plain prose. Avoid jargon when a clear word exists.
- Pull requests: one logical change per PR. A docs typo and a parser bugfix should be separate PRs.
Code of conduct
Be kind. Researchers working with patient-shared data are usually solving a real problem, often for one specific patient or family they care about. Treat questions and bug reports accordingly.
License contributions
By submitting a pull request, you agree your contribution is licensed under the project's MIT License.