Configuration¶
All configuration lives in constants near the top of run_pipeline.py. To
change behavior, edit the file -- there's no separate config file.
Paths¶
BASE_DIR = '/path/to/working/directory/'
CCDA_CSV = os.path.join(BASE_DIR, 'CCDA and FHIR data/ccda_chunks.csv')
FHIR_CSV = os.path.join(BASE_DIR, 'CCDA and FHIR data/fhir_chunks.csv')
CCDA_DIR = os.path.join(BASE_DIR, 'ccda_assembled/')
FHIR_DIR = os.path.join(BASE_DIR, 'fhir_assembled/')
OUT_JSON = os.path.join(BASE_DIR, 'dashboard_data.json')
OUT_XLSX = os.path.join(BASE_DIR, 'dashboard_data.xlsx')
OUT_CSV_DIR = os.path.join(BASE_DIR, 'csv_exports/')
BASE_DIR is the only path you typically need to change. Everything else
is derived from it.
Mapping CSV candidates¶
The pipeline checks several locations for uuid_mapping.csv and uses the
first one that exists:
MAPPING_CANDIDATES = [
os.path.join(BASE_DIR, 'uuid_mapping.csv'),
# add more paths if your mapping lives elsewhere
]
The CSV must have at minimum document_uuid and patient_id columns
(in any order). Optional columns: first_name, last_name, mrn,
dob, gender. The pipeline auto-detects the document UUID column
under any of these names: document_uuid, id, doc_uuid,
fhirdocument_id.
Test-patient exclusions¶
Two sources, both always applied:
Hardcoded list¶
Near the top of run_pipeline.py:
HARDCODED_TEST_EXCLUSIONS = [
'name: Test Patient',
'name_contains: training',
'mrn_contains: 99999',
]
File-based¶
The pipeline checks for test_patients.txt in BASE_DIR. If present,
its rules are merged with the hardcoded list. See
Stage 6 for the rule syntax.
Code-system preferences¶
Which coding to pick when a CodeableConcept has multiple parallel codings:
CODE_SYSTEM_PREF = {
'medication': ['rxnorm', '2.16.840.1.113883.6.88'],
'problem': ['snomed', '2.16.840.1.113883.6.96',
'icd-10', '2.16.840.1.113883.6.90'],
'lab': ['loinc', '2.16.840.1.113883.6.1'],
# ...
}
To prefer ICD-10 over SNOMED for problems, swap the order. The pipeline
matches by substring against both the system name (e.g. "SNOMED-CT")
and the OID, so either works.
Display-enrichment tables¶
LOINC_DISPLAY and SNOMED_DISPLAY are plain Python dicts. Add or
remove entries directly:
Section template OIDs¶
CCDA section template IDs:
TPL = {
'problems_section': '2.16.840.1.113883.10.20.22.2.5',
'medications_section':'2.16.840.1.113883.10.20.22.2.1',
# ...
}
These are HL7 CDA R2 standard OIDs and rarely need changing. If your warehouse uses non-standard template IDs, override them here.
Output disabling¶
To skip Excel output (useful for very large bundles where openpyxl is slow):
To skip CSV exports, do the same for the per-tab CSV loop. JSON output is the canonical format and shouldn't be skipped.