Stage 3 -- FHIR resource extraction

Purpose

Walk each FHIR Bundle and emit structured records by category. Cross-bundle references resolve via a global pre-pass.

Resource types extracted

The following FHIR R4 resource types are extracted into 13 bundle keys, which map to dashboard tabs:

FHIR resource                                                      | Bundle key
Patient                                                            | patients
Condition                                                          | problems
Observation (vital-signs category)                                 | vitals
Observation (other)                                                | labs
MedicationRequest / MedicationStatement / MedicationAdministration | medications
Procedure                                                          | procedures
Encounter                                                          | encounters
AllergyIntolerance                                                 | allergies
Immunization                                                       | immunizations
CarePlan                                                           | careplans
DiagnosticReport                                                   | diagnostic_reports
Goal                                                               | goals
DocumentReference                                                  | document_references (also used for cross-bundle subject linkage)

Resources that are not patient-scoped (Practitioner, Organization, Location, Device, etc.) are skipped intentionally -- they're not part of the patient record.
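
The routing in the table can be sketched as a lookup plus one special case for Observation. This is a minimal illustration, not the pipeline's actual code: the names RESOURCE_TO_KEY, is_vital_sign, and bundle_key_for are hypothetical, and the sketch assumes a vital sign is recognized by a "vital-signs" code in Observation.category.

```python
from typing import Optional

# Mapping copied from the table above; Observation is handled separately.
RESOURCE_TO_KEY = {
    "Patient": "patients",
    "Condition": "problems",
    "MedicationRequest": "medications",
    "MedicationStatement": "medications",
    "MedicationAdministration": "medications",
    "Procedure": "procedures",
    "Encounter": "encounters",
    "AllergyIntolerance": "allergies",
    "Immunization": "immunizations",
    "CarePlan": "careplans",
    "DiagnosticReport": "diagnostic_reports",
    "Goal": "goals",
    "DocumentReference": "document_references",
}


def is_vital_sign(observation: dict) -> bool:
    """True if any Observation.category coding has the code 'vital-signs'."""
    categories = observation.get("category")
    if not isinstance(categories, list):
        return False
    for category in categories:
        codings = category.get("coding") if isinstance(category, dict) else None
        for coding in codings if isinstance(codings, list) else []:
            if isinstance(coding, dict) and coding.get("code") == "vital-signs":
                return True
    return False


def bundle_key_for(resource: dict) -> Optional[str]:
    """Return the bundle key for a resource, or None if it is skipped."""
    resource_type = resource.get("resourceType")
    if resource_type == "Observation":
        return "vitals" if is_vital_sign(resource) else "labs"
    # Practitioner, Organization, Location, Device, etc. fall through to None.
    return RESOURCE_TO_KEY.get(resource_type)
```

Returning None for unmapped types keeps the "skipped intentionally" rule in one place instead of scattering type checks through the main pass.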

Code-system-aware coding walk

A FHIR CodeableConcept can carry multiple parallel codings (RxNorm + NDC for medications, SNOMED + ICD-10 for problems). The "right" code depends on the domain:

Domain       | Preferred system
Medication   | RxNorm
Problem      | SNOMED, then ICD-10
Lab          | LOINC
Vital        | LOINC
Procedure    | CPT, then SNOMED
Allergy      | RxNorm, then SNOMED
Immunization | CVX
Encounter    | CPT, then SNOMED

fhir_prioritize_coding(concept, domain) walks every coding, identifies each one's system from the URL or name, and returns the highest-preference match. The full list of codings is preserved as all_codings on the record so downstream code can audit the choice.
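
A rough sketch of how such a preference walk might look is given below. Only the function name and its (concept, domain) parameters come from the text above. The lowercase domain names, the substring hints used to recognize a system, and the choice to return None when nothing matches are assumptions made for illustration. The code values are placeholders, not real codes.

```python
from typing import Optional

# Domain -> preferred systems, highest preference first (from the table above).
DOMAIN_PREFERENCES = {
    "medication": ["rxnorm"],
    "problem": ["snomed", "icd10"],
    "lab": ["loinc"],
    "vital": ["loinc"],
    "procedure": ["cpt", "snomed"],
    "allergy": ["rxnorm", "snomed"],
    "immunization": ["cvx"],
    "encounter": ["cpt", "snomed"],
}

# Assumed substring hints for recognizing a system from its URL or name.
SYSTEM_HINTS = {
    "rxnorm": ("rxnorm",),
    "snomed": ("snomed",),
    "icd10": ("icd-10", "icd10"),
    "loinc": ("loinc",),
    "cpt": ("cpt",),
    "cvx": ("cvx",),
    "ndc": ("ndc",),
}


def identify_system(coding: dict) -> Optional[str]:
    """Guess a coding's system from its 'system' string."""
    text = str(coding.get("system") or "").lower()
    for name, hints in SYSTEM_HINTS.items():
        if any(hint in text for hint in hints):
            return name
    return None


def fhir_prioritize_coding(concept, domain) -> Optional[dict]:
    """Return the highest-preference coding for the domain, or None."""
    raw = concept.get("coding") if isinstance(concept, dict) else None
    codings = [c for c in raw if isinstance(c, dict)] if isinstance(raw, list) else []
    for preferred in DOMAIN_PREFERENCES.get(domain, []):
        for coding in codings:
            if coding.get("code") and identify_system(coding) == preferred:
                return coding
    return None


# Usage: an NDC + RxNorm pair resolves to RxNorm in the medication domain.
concept = {
    "coding": [
        {"system": "http://hl7.org/fhir/sid/ndc", "code": "NDC-EXAMPLE"},
        {"system": "http://www.nlm.nih.gov/research/umls/rxnorm", "code": "RXNORM-EXAMPLE"},
    ]
}
chosen = fhir_prioritize_coding(concept, "medication")
record = {"code": chosen["code"], "all_codings": concept["coding"]}
```

Iterating over preferences in the outer loop, rather than over codings, is what makes the order of codings in the source data irrelevant to the result.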

The same logic applies to CCDA <code> elements via extract_all_codings(element), which descends into every <translation> child. This is where outer NDC + inner RxNorm pairs are correctly resolved to the RxNorm code.
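
For the CCDA side, a minimal sketch of collecting the outer code and its translations is shown below, using the standard library's xml.etree.ElementTree. Only the name extract_all_codings(element) comes from the text above. The shape of the returned dictionaries is an assumption, and the code values in the sample XML are placeholders.

```python
import xml.etree.ElementTree as ET

HL7_NS = "{urn:hl7-org:v3}"            # CDA namespace
RXNORM_OID = "2.16.840.1.113883.6.88"  # RxNorm code system OID
NDC_OID = "2.16.840.1.113883.6.69"     # NDC code system OID


def extract_all_codings(element):
    """Collect the element's own coding plus every nested <translation>."""
    codings = []
    # Element.iter() yields the element itself first, then all descendants.
    for node in element.iter():
        if node is not element and node.tag != HL7_NS + "translation":
            continue  # e.g. <originalText> children carry no coding
        code = node.get("code")
        if code:
            codings.append(
                {
                    "code": code,
                    "system_oid": node.get("codeSystem"),
                    "system_name": node.get("codeSystemName"),
                    "display": node.get("displayName"),
                }
            )
    return codings


# Usage: an outer NDC code with an inner RxNorm translation.
sample = (
    '<code xmlns="urn:hl7-org:v3" code="NDC-EXAMPLE" '
    'codeSystem="2.16.840.1.113883.6.69" codeSystemName="NDC">'
    '<originalText>example drug</originalText>'
    '<translation code="RXNORM-EXAMPLE" '
    'codeSystem="2.16.840.1.113883.6.88" codeSystemName="RxNorm"/>'
    "</code>"
)
codings = extract_all_codings(ET.fromstring(sample))
rxnorm = next(c for c in codings if c["system_oid"] == RXNORM_OID)
```

Matching on the code system OID rather than codeSystemName is a design choice of this sketch, since the name attribute is free text and optional in practice.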

Cross-bundle pre-pass

# Before the main FHIR pass, walk every bundle once to build:
global_med_index = {}        # Medication resource id/URL -> CodeableConcept
global_docref_pid = {}       # DocumentReference id/URL -> patient_id
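
One way the two indices might be populated is sketched here. The dictionary names come from the snippet above. The helper names (reference_tail, index_bundle, build_global_indices), the choice of keys (bare id, "Type/id", and the entry's fullUrl), and the way a patient id is derived from subject.reference are all assumptions for illustration.

```python
import json
from pathlib import Path


def reference_tail(reference):
    """'Patient/123' -> '123'; 'urn:uuid:abc' -> 'abc' (assumed id convention)."""
    if not isinstance(reference, str) or not reference:
        return None
    if reference.startswith("urn:uuid:"):
        return reference[len("urn:uuid:"):]
    return reference.rsplit("/", 1)[-1]


def index_bundle(bundle, global_med_index, global_docref_pid):
    """Add one parsed bundle's Medication and DocumentReference entries."""
    entries = bundle.get("entry") if isinstance(bundle, dict) else None
    for entry in entries if isinstance(entries, list) else []:
        resource = entry.get("resource") if isinstance(entry, dict) else None
        if not isinstance(resource, dict):
            continue
        rtype, rid = resource.get("resourceType"), resource.get("id")
        keys = [rid, f"{rtype}/{rid}"] if isinstance(rid, str) else []
        if isinstance(entry.get("fullUrl"), str):
            keys.append(entry["fullUrl"])
        if rtype == "Medication" and isinstance(resource.get("code"), dict):
            for key in keys:
                global_med_index[key] = resource["code"]
        elif rtype == "DocumentReference":
            subject = resource.get("subject")
            pid = reference_tail(subject.get("reference")) if isinstance(subject, dict) else None
            if pid:
                for key in keys:
                    global_docref_pid[key] = pid


def build_global_indices(bundle_paths):
    """Read every bundle file once and return both indices."""
    global_med_index, global_docref_pid = {}, {}
    for path in bundle_paths:
        try:
            bundle = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it in the pre-pass
        index_bundle(bundle, global_med_index, global_docref_pid)
    return global_med_index, global_docref_pid
```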

The main pass then uses these indices when:

  • A MedicationRequest.medicationReference points to a Medication that isn't in the current bundle (common in per-resource-type-per-bundle exports).
  • A document UUID isn't in uuid_mapping.csv, but a FHIR DocumentReference for it exists in some bundle and that DocumentReference has a subject.reference.
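
The two fallbacks above could look roughly like this. The function names and the local_med_index parameter are hypothetical, uuid_mapping stands for uuid_mapping.csv already loaded into a dict (an assumption), and the sketch assumes the indices are keyed by the same strings that appear in the references.

```python
def resolve_medication_concept(med_request, local_med_index, global_med_index):
    """Find the CodeableConcept for a MedicationRequest, local bundle first."""
    inline = med_request.get("medicationCodeableConcept")
    if isinstance(inline, dict):
        return inline  # no reference to chase
    ref = med_request.get("medicationReference")
    reference = ref.get("reference") if isinstance(ref, dict) else None
    if not isinstance(reference, str):
        return None
    # Fall back to the global index only when the current bundle lacks it.
    return local_med_index.get(reference) or global_med_index.get(reference)


def resolve_document_patient(doc_uuid, uuid_mapping, global_docref_pid):
    """Prefer the CSV mapping; fall back to the DocumentReference index."""
    return uuid_mapping.get(doc_uuid) or global_docref_pid.get(doc_uuid)
```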

The pre-pass is one extra disk read per bundle. For 1,800 bundles it takes a few seconds.

Defensive parsing

FHIR bundles in the wild contain malformed entries: strings where dicts are expected, missing required fields, occasional encoding artifacts. The parser:

  • Wraps each resource in try/except and logs the failure rather than aborting the run.
  • Guards every dict.get call with isinstance(x, dict) checks before recursing.
  • Skips resources without a resolvable patient_id rather than emitting orphan records.
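
The three rules above can be combined in a short sketch. All names here (safe_get, patient_id_of, extract_bundle, the logger name, and the handle_resource callback) are hypothetical, and the sketch assumes the patient link lives in subject.reference or patient.reference.

```python
import logging

logger = logging.getLogger("fhir_extract")  # hypothetical logger name


def safe_get(obj, key, default=None):
    """dict.get that tolerates strings or None where a dict was expected."""
    return obj.get(key, default) if isinstance(obj, dict) else default


def patient_id_of(resource):
    """Return the patient id a resource points at, or None if unresolvable."""
    if safe_get(resource, "resourceType") == "Patient":
        return safe_get(resource, "id")
    for field in ("subject", "patient"):
        reference = safe_get(safe_get(resource, field), "reference")
        if isinstance(reference, str) and reference:
            return reference.rsplit("/", 1)[-1]
    return None


def extract_bundle(bundle, file_name, handle_resource):
    """Run handle_resource on each entry; log failures instead of aborting."""
    records, errors = [], 0
    entries = safe_get(bundle, "entry", [])
    for entry in entries if isinstance(entries, list) else []:
        try:
            resource = safe_get(entry, "resource")
            if not isinstance(resource, dict):
                raise TypeError("entry has no resource object")
            if patient_id_of(resource) is None:
                continue  # skip rather than emit an orphan record
            records.append(handle_resource(resource))
        except Exception as exc:
            errors += 1
            logger.warning("%s: %s while parsing an entry", file_name, type(exc).__name__)
    return records, errors
```

Catching a broad Exception is deliberate in this sketch: one malformed resource should cost one log line, not the whole run.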

In a typical export, fewer than 2% of bundles produce an error. Each error is logged with the file name and exception type for follow-up.