# Stage 3 -- FHIR resource extraction
## Purpose
Walk each FHIR Bundle and emit structured records by category. Cross-bundle references resolve via a global pre-pass.
## Resource types extracted
The extracted FHIR R4 resource types map to 13 bundle keys that feed the dashboard tabs. Observation is split across two keys, and the three medication resource types share one:

| FHIR resource | Bundle key |
|---|---|
| Patient | patients |
| Condition | problems |
| Observation (vital-signs category) | vitals |
| Observation (other) | labs |
| MedicationRequest / MedicationStatement / MedicationAdministration | medications |
| Procedure | procedures |
| Encounter | encounters |
| AllergyIntolerance | allergies |
| Immunization | immunizations |
| CarePlan | careplans |
| DiagnosticReport | diagnostic_reports |
| Goal | goals |
| DocumentReference | document_references (also used for cross-bundle subject linkage) |
Resources that are not patient-scoped (Practitioner, Organization,
Location, Device, etc.) are skipped intentionally -- they're not part
of the patient record.
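The routing in the table can be pictured as a small dispatch function. This is a minimal sketch, not the pipeline's code. The names `bundle_key_for` and `is_vital_sign` are hypothetical. It assumes vitals are recognised by the `vital-signs` code in `Observation.category`, which comes from the HL7 observation-category code system. Any type missing from the map returns `None` and is skipped.

```python
from typing import Optional

# Resource type -> bundle key for the one-to-one rows of the table.
SIMPLE_KEYS = {
    "Patient": "patients",
    "Condition": "problems",
    "Procedure": "procedures",
    "Encounter": "encounters",
    "AllergyIntolerance": "allergies",
    "Immunization": "immunizations",
    "CarePlan": "careplans",
    "DiagnosticReport": "diagnostic_reports",
    "Goal": "goals",
    "DocumentReference": "document_references",
}
# The three medication resource types share a single key.
MEDICATION_TYPES = {"MedicationRequest", "MedicationStatement", "MedicationAdministration"}


def is_vital_sign(observation: dict) -> bool:
    """True if any Observation.category coding carries the code 'vital-signs'."""
    for category in observation.get("category") or []:
        if not isinstance(category, dict):
            continue
        for coding in category.get("coding") or []:
            if isinstance(coding, dict) and coding.get("code") == "vital-signs":
                return True
    return False


def bundle_key_for(resource: dict) -> Optional[str]:
    """Return the bundle key for a resource, or None if it is not extracted."""
    rtype = resource.get("resourceType")
    if not isinstance(rtype, str):
        return None
    if rtype == "Observation":
        return "vitals" if is_vital_sign(resource) else "labs"
    if rtype in MEDICATION_TYPES:
        return "medications"
    # Practitioner, Organization, Location, Device, ... are not in the map.
    return SIMPLE_KEYS.get(rtype)
```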
## Code-system-aware coding walk
A FHIR CodeableConcept can carry multiple parallel codings (RxNorm + NDC
for medications, SNOMED + ICD-10 for problems). The "right" code depends on
the domain:
| Domain | Preferred system |
|---|---|
| Medication | RxNorm |
| Problem | SNOMED, then ICD-10 |
| Lab | LOINC |
| Vital | LOINC |
| Procedure | CPT, then SNOMED |
| Allergy | RxNorm, then SNOMED |
| Immunization | CVX |
| Encounter | CPT, then SNOMED |
`fhir_prioritize_coding(concept, domain)` walks every coding, identifies
each one's system from the URL or name, and returns the highest-preference
match. The full list of codings is preserved as `all_codings` on the
record so downstream code can audit the choice.
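The walk described above might look roughly like the following. This is a hedged sketch, not the body of `fhir_prioritize_coding`. The helper names `identify_system` and `prioritize_coding`, the substring patterns, and the fallback to the first coding when no preferred system is present are all assumptions. OID-style systems such as `urn:oid:...` are not recognised here.

```python
from typing import Optional

# Preference order per domain, taken from the table above.
PREFERENCES = {
    "medication": ["rxnorm"],
    "problem": ["snomed", "icd10"],
    "lab": ["loinc"],
    "vital": ["loinc"],
    "procedure": ["cpt", "snomed"],
    "allergy": ["rxnorm", "snomed"],
    "immunization": ["cvx"],
    "encounter": ["cpt", "snomed"],
}

# Assumed substrings that identify a system from its URL or name.
SYSTEM_PATTERNS = [
    ("rxnorm", ("rxnorm",)),
    ("snomed", ("snomed",)),
    ("icd10", ("icd-10", "icd10")),
    ("loinc", ("loinc",)),
    ("cpt", ("cpt",)),
    ("cvx", ("cvx",)),
    ("ndc", ("ndc",)),
]


def identify_system(system: str) -> Optional[str]:
    """Map a coding.system URL (or a free-text name) to a short label."""
    lowered = system.lower()
    for label, needles in SYSTEM_PATTERNS:
        if any(needle in lowered for needle in needles):
            return label
    return None


def prioritize_coding(concept: dict, domain: str) -> Optional[dict]:
    """Return the preferred coding plus every coding, or None if there are none."""
    raw = concept.get("coding") if isinstance(concept, dict) else None
    codings = [c for c in (raw or []) if isinstance(c, dict)]
    if not codings:
        return None
    rank = {label: i for i, label in enumerate(PREFERENCES.get(domain, []))}
    unranked = len(rank)

    def score(coding: dict) -> int:
        return rank.get(identify_system(str(coding.get("system", ""))), unranked)

    # min() keeps the first of equally ranked codings, so with no preferred
    # system present the first coding in the concept wins.
    best = min(codings, key=score)
    return {"primary": best, "all_codings": codings}
```

Keeping `all_codings` next to the chosen code means an audit can always see which parallel codings were available.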
The same logic applies to CCDA `<code>` elements via
`extract_all_codings(element)`, which descends into every `<translation>`
child. This is how an outer NDC code with an inner RxNorm translation
resolves to the RxNorm code.
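For the CCDA side, a rough equivalent using the standard library's `xml.etree.ElementTree` is sketched below. It is illustrative only. The `OID_LABELS` table covers just four systems, `pick_preferred` is a hypothetical helper, and elements without a `code` attribute (for example `nullFlavor` codes) are assumed to be skipped. The OIDs are the standard HL7 identifiers for RxNorm, NDC, SNOMED CT and LOINC.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Sequence

TRANSLATION = "{urn:hl7-org:v3}translation"  # CDA elements live in this namespace

# codeSystem OIDs -> short labels (only a few systems shown).
OID_LABELS = {
    "2.16.840.1.113883.6.88": "rxnorm",
    "2.16.840.1.113883.6.69": "ndc",
    "2.16.840.1.113883.6.96": "snomed",
    "2.16.840.1.113883.6.1": "loinc",
}


def extract_all_codings(code_el: ET.Element) -> List[Dict[str, Optional[str]]]:
    """Collect the element's own coding and every <translation> beneath it."""
    codings = []
    # iter() yields code_el first, then all descendants in document order,
    # so nested translations are found as well.
    for el in code_el.iter():
        if el is not code_el and el.tag != TRANSLATION:
            continue
        code = el.get("code")
        if code is None:  # e.g. nullFlavor="UNK" with no code attribute
            continue
        oid = el.get("codeSystem")
        # Unknown systems keep their name (or raw OID) so nothing is lost.
        label = OID_LABELS.get(oid) or el.get("codeSystemName") or oid
        codings.append({"code": code, "system": label,
                        "display": el.get("displayName")})
    return codings


def pick_preferred(codings: Sequence[dict], order: Sequence[str]) -> Optional[dict]:
    """First coding whose system matches the earliest label in `order`."""
    for label in order:
        for coding in codings:
            if coding["system"] == label:
                return coding
    return codings[0] if codings else None
```

For a medication `<code>` carrying an NDC code with an RxNorm `<translation>`, `pick_preferred(extract_all_codings(el), ["rxnorm"])` selects the inner RxNorm entry.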
## Cross-bundle pre-pass
```python
# Before the main FHIR pass, walk every bundle once to build:
global_med_index = {}   # Medication resource id/URL -> CodeableConcept
global_docref_pid = {}  # DocumentReference id/URL -> patient_id
```
The main pass then uses these indices when:
- A `MedicationRequest.medicationReference` points to a `Medication` that isn't in the current bundle (common in per-resource-type-per-bundle exports).
- A document UUID isn't in `uuid_mapping.csv`, but a FHIR `DocumentReference` for it exists in some bundle and that `DocumentReference` has a `subject.reference`.
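One way the pre-pass and both lookups could fit together is sketched below on already-parsed bundle dicts. Reading files from disk is left out. All function names are hypothetical. The sketch assumes three things that are not taken from the pipeline:

- Resources are indexed under both `fullUrl` and the relative `Type/id` form.
- Patient references look like `Patient/<id>` or `urn:uuid:<id>`.
- `uuid_mapping` is a plain dict loaded from `uuid_mapping.csv`.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def _entries(bundle: object) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (fullUrl, resource) for each well-formed Bundle.entry."""
    if not isinstance(bundle, dict):
        return
    for entry in bundle.get("entry") or []:
        if isinstance(entry, dict) and isinstance(entry.get("resource"), dict):
            yield entry.get("fullUrl"), entry["resource"]


def _patient_id(ref: str) -> Optional[str]:
    """Assumed reference shapes: 'Patient/<id>' or 'urn:uuid:<id>'."""
    if ref.startswith("Patient/"):
        return ref.split("/", 1)[1]
    if ref.startswith("urn:uuid:"):
        return ref[len("urn:uuid:"):]
    return None


def build_global_indices(bundles: Iterable[object]) -> Tuple[Dict[str, dict], Dict[str, str]]:
    global_med_index: Dict[str, dict] = {}   # Medication id/URL -> CodeableConcept
    global_docref_pid: Dict[str, str] = {}   # DocumentReference id/URL -> patient_id
    for bundle in bundles:
        for full_url, res in _entries(bundle):
            rtype, rid = res.get("resourceType"), res.get("id")
            # Index under both the entry's fullUrl and the relative "Type/id" form.
            candidates = (full_url, f"{rtype}/{rid}" if rid else None)
            keys = [k for k in candidates if isinstance(k, str) and k]
            if rtype == "Medication" and isinstance(res.get("code"), dict):
                for k in keys:
                    global_med_index[k] = res["code"]
            elif rtype == "DocumentReference":
                subject = res.get("subject")
                ref = subject.get("reference") if isinstance(subject, dict) else None
                pid = _patient_id(ref) if isinstance(ref, str) else None
                if pid:
                    for k in keys:
                        global_docref_pid[k] = pid
    return global_med_index, global_docref_pid


def resolve_medication(request: dict, local_index: Dict[str, dict],
                       global_index: Dict[str, dict]) -> Optional[dict]:
    """Inline concept first, then the current bundle, then the global index."""
    inline = request.get("medicationCodeableConcept")
    if isinstance(inline, dict):
        return inline
    ref_obj = request.get("medicationReference")
    ref = ref_obj.get("reference") if isinstance(ref_obj, dict) else None
    if not isinstance(ref, str):
        return None
    return local_index.get(ref) or global_index.get(ref)


def resolve_document_patient(doc_uuid: str, uuid_mapping: Dict[str, str],
                             global_docref_pid: Dict[str, str]) -> Optional[str]:
    """uuid_mapping.csv wins; fall back to a DocumentReference subject."""
    return (uuid_mapping.get(doc_uuid)
            or global_docref_pid.get(f"DocumentReference/{doc_uuid}")
            or global_docref_pid.get(f"urn:uuid:{doc_uuid}"))
```

Indexing both key forms costs a little memory. In exchange, a lookup succeeds whether the referencing resource uses an absolute `urn:uuid:` reference or a relative one.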
The pre-pass is one extra disk read per bundle. For 1,800 bundles it takes a few seconds.
## Defensive parsing
FHIR bundles in the wild contain malformed entries: strings where dicts are expected, missing required fields, occasional encoding artifacts. The parser:
- Wraps each resource in `try`/`except` and logs the failure rather than aborting the run.
- Guards every `dict.get` call with `isinstance(x, dict)` checks before recursing.
- Skips resources without a resolvable `patient_id` rather than emitting orphan records.
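The three safeguards listed above can be combined in a per-resource loop like the one below. This is a minimal sketch under stated assumptions. `safe_get`, `extract_bundle` and the `extract_one` callback are hypothetical names, and the log format only mirrors the "file name and exception type" mentioned in this section.

```python
import logging
from typing import Callable, Iterable, List, Optional

log = logging.getLogger("stage3")


def safe_get(obj: object, *path: str) -> object:
    """Follow dict keys; return None as soon as a level is not a dict."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def extract_bundle(bundle_file: str, resources: Iterable[object],
                   extract_one: Callable[[dict], Optional[dict]]) -> List[dict]:
    """Run extract_one on each resource, logging failures instead of raising."""
    records: List[dict] = []
    for res in resources:
        try:
            if not isinstance(res, dict):
                raise TypeError(f"expected dict, got {type(res).__name__}")
            record = extract_one(res)
        except Exception as exc:  # one bad resource must not abort the run
            log.warning("%s: %s while parsing a resource", bundle_file, type(exc).__name__)
            continue
        if not isinstance(record, dict) or not record.get("patient_id"):
            continue  # no resolvable patient_id: skip instead of emitting an orphan
        records.append(record)
    return records
```

The broad `except Exception` is deliberate in this sketch. The goal is to isolate one malformed resource, not to diagnose it inline, and the logged file name and exception type are what a follow-up needs.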
A typical export sees <2% of bundles error out. The errors are logged
with the file name and exception type for follow-up.