Stage 4 -- Joining & assembly

Purpose

Merge CCDA and FHIR sources into a single coherent patient set, link documents to patients across all formats, deduplicate, and assemble the output bundle.

Patient discovery order

Patients can appear from three sources:

  1. FHIR Patient resources -- authoritative for demographics; processed first.
  2. CCDA <recordTarget> -- carries name, MRN, DOB, gender; used to create patients not seen in FHIR, or to fill gaps.
  3. uuid_mapping.csv demographics -- if the mapping CSV has first_name, last_name, mrn, dob, or gender columns, those values fill in any fields a patient is still missing.

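Sources are applied in the order listed. Each source can only fill fields that are still empty; it never overwrites a value set by an earlier source. The first non-empty value wins.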
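
The fill-only rule can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names (fill_missing, merge_sources) and the plain-dict patient shape are assumptions, and "empty" is taken here to mean None or a blank string.

```python
from typing import Optional

# Field names taken from the mapping CSV columns listed above.
DEMOGRAPHIC_FIELDS = ("first_name", "last_name", "mrn", "dob", "gender")


def _has_value(value) -> bool:
    # Assumption: None and whitespace-only strings count as empty.
    return value is not None and str(value).strip() != ""


def fill_missing(patient: dict, candidate: dict) -> dict:
    """Copy a field from candidate only if patient has no value for it yet."""
    for field in DEMOGRAPHIC_FIELDS:
        if not _has_value(patient.get(field)) and _has_value(candidate.get(field)):
            patient[field] = candidate[field]
    return patient


def merge_sources(fhir: Optional[dict], ccda: Optional[dict],
                  csv_row: Optional[dict]) -> dict:
    """Apply sources in discovery order: FHIR, then CCDA, then the mapping CSV."""
    patient: dict = {}
    for source in (fhir, ccda, csv_row):
        fill_missing(patient, source or {})
    return patient
```

In this sketch a first_name from FHIR survives even if the CCDA spells it differently, while a dob that FHIR left blank is taken from the CCDA.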

Cross-format document linkage

A document is linked to a patient via the first match in this chain:

  1. The document's CCDA <recordTarget> (if it's a CCDA).
  2. uuid_mapping.csv lookup by document UUID.
  3. Global DocumentReference index built in Stage 3.
  4. MRN-based fallback: the CCDA's MRN normalized (alphanumeric, uppercase) matches a known FHIR patient's normalized MRN.

A document with no resolved patient is still emitted in the bundle's documents array with patient_id: null -- it shows up in the dashboard under "Unlinked" but doesn't pollute any patient's record.
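
The linkage chain, including the patient_id: null outcome, can be sketched as a sequence of lookups that stops at the first hit. Everything about the data shapes here is an assumption made for illustration: documents are dicts with hypothetical keys (format, uuid, mrn, record_target_patient_id), and the three lookup tables are plain dicts keyed by document UUID or normalized MRN.

```python
import re
from typing import Optional


def normalize_mrn(mrn: Optional[str]) -> str:
    """Keep only alphanumeric characters and uppercase them."""
    return re.sub(r"[^A-Za-z0-9]", "", mrn or "").upper()


def resolve_patient_id(doc: dict, uuid_map: dict, docref_index: dict,
                       fhir_mrn_index: dict) -> Optional[str]:
    """Return the first match in the linkage chain, or None if unlinked."""
    is_ccda = doc.get("format") == "ccda"

    # 1. CCDA <recordTarget> (hypothetical pre-resolved field).
    if is_ccda and doc.get("record_target_patient_id"):
        return doc["record_target_patient_id"]

    # 2. uuid_mapping.csv lookup by document UUID.
    patient_id = uuid_map.get(doc.get("uuid"))
    if patient_id:
        return patient_id

    # 3. Global DocumentReference index from Stage 3
    #    (assumed here to be keyed by document UUID).
    patient_id = docref_index.get(doc.get("uuid"))
    if patient_id:
        return patient_id

    # 4. MRN fallback: normalized CCDA MRN against normalized FHIR MRNs.
    if is_ccda:
        mrn = normalize_mrn(doc.get("mrn"))
        if mrn:
            return fhir_mrn_index.get(mrn)

    # No match: the document is emitted with patient_id set to null.
    return None
```

The fhir_mrn_index in this sketch would be built once by applying normalize_mrn to each FHIR patient's MRN, so both sides of the comparison go through the same normalization.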

Deduplication

Patients are merged on the key (lower(first_name), lower(last_name), dob). All three components must be non-empty for a merge to fire. Within each group of matching patients, the one with the most filled fields becomes canonical; records and documents from the duplicates are reassigned to it.

This handles the common case where a single person has multiple Epic patient IDs from organizational mergers, ID re-issuance, or data feed glitches.
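
A minimal sketch of the merge, under assumptions: patients are dicts with an id key, "filled" means neither None nor an empty string, and ties on filled-field count go to whichever patient was seen first. Only documents are reassigned here; records would be remapped the same way.

```python
from collections import defaultdict
from typing import Optional


def dedup_key(patient: dict) -> Optional[tuple]:
    """Build (lower(first_name), lower(last_name), dob), or None if any part is empty."""
    first = (patient.get("first_name") or "").lower()
    last = (patient.get("last_name") or "").lower()
    dob = patient.get("dob") or ""
    if not (first and last and dob):
        return None  # incomplete key: this patient never merges
    return (first, last, dob)


def filled_count(patient: dict) -> int:
    """Count fields that hold a non-empty value."""
    return sum(1 for value in patient.values() if value not in (None, ""))


def deduplicate(patients: list, documents: list):
    """Return (kept patients, {duplicate_id: canonical_id}); documents are updated in place."""
    groups = defaultdict(list)
    kept = []
    for patient in patients:
        key = dedup_key(patient)
        if key is None:
            kept.append(patient)
        else:
            groups[key].append(patient)

    remap = {}
    for group in groups.values():
        canonical = max(group, key=filled_count)  # first seen wins on ties
        kept.append(canonical)
        for duplicate in group:
            if duplicate is not canonical:
                remap[duplicate["id"]] = canonical["id"]

    # Point each document at the canonical patient; unlinked documents stay None.
    for doc in documents:
        doc["patient_id"] = remap.get(doc.get("patient_id"), doc.get("patient_id"))
    return kept, remap
```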

Schema aliasing

The bundle emits both canonical and downstream-friendly field names so that consumers using either schema work without translation:

Canonical      | Aliases
-------------- | -------
effective_date | effective_datetime, plus per-tab: start_date (encounters, medications), authored_on (medications), onset_datetime (problems), recorded_date (allergies, immunizations), performed_datetime (procedures), occurrence_datetime (immunizations)
display_name   | allergen (allergies), vaccine (immunizations), visit_type_display (encounters)
code           | allergen_code (allergies), vaccine_code (immunizations)
code_system    | allergen_system (allergies), vaccine_system (immunizations)

Aliases are added in-place; both names refer to the same value.
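
The alias table can be expressed as a lookup keyed by canonical name and tab. The sketch below is illustrative only: the ALIASES structure, the "*" key for aliases that apply to every tab, and the add_aliases function are assumptions, and whether an alias should overwrite a key that already exists on the record is not specified here (this sketch overwrites).

```python
# Canonical field -> {tab name -> alias names}; "*" applies to every tab.
ALIASES = {
    "effective_date": {
        "*": ["effective_datetime"],
        "encounters": ["start_date"],
        "medications": ["start_date", "authored_on"],
        "problems": ["onset_datetime"],
        "allergies": ["recorded_date"],
        "immunizations": ["recorded_date", "occurrence_datetime"],
        "procedures": ["performed_datetime"],
    },
    "display_name": {
        "allergies": ["allergen"],
        "immunizations": ["vaccine"],
        "encounters": ["visit_type_display"],
    },
    "code": {
        "allergies": ["allergen_code"],
        "immunizations": ["vaccine_code"],
    },
    "code_system": {
        "allergies": ["allergen_system"],
        "immunizations": ["vaccine_system"],
    },
}


def add_aliases(record: dict, tab: str) -> dict:
    """Add alias keys to record in place; each alias carries the canonical value."""
    for canonical, per_tab in ALIASES.items():
        if canonical not in record:
            continue
        for alias in per_tab.get("*", []) + per_tab.get(tab, []):
            record[alias] = record[canonical]
    return record
```

For an allergies record, this adds effective_datetime, recorded_date, allergen, allergen_code, and allergen_system alongside the canonical names.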

Computed fields

Each patient record gets a computed num_documents field equal to the count of documents whose patient_id matches.

The bundle also emits a labs_vitals convenience array equal to labs + vitals for downstream consumers who don't want to merge them client-side.
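
Both computed fields are simple to derive. A minimal sketch, assuming the bundle is a dict with patients, documents, labs, and vitals arrays and that each patient carries an id key (the documents array and patient_id come from the description above; the other key names are assumptions):

```python
from collections import Counter


def add_computed_fields(bundle: dict) -> dict:
    """Attach num_documents to each patient and build the labs_vitals array."""
    # Count documents per patient, skipping unlinked ones (patient_id is None).
    counts = Counter(
        doc["patient_id"]
        for doc in bundle.get("documents", [])
        if doc.get("patient_id") is not None
    )
    for patient in bundle.get("patients", []):
        patient["num_documents"] = counts.get(patient["id"], 0)

    # labs_vitals is labs followed by vitals, as a new list.
    bundle["labs_vitals"] = list(bundle.get("labs", [])) + list(bundle.get("vitals", []))
    return bundle
```

Unlinked documents do not contribute to any patient's num_documents in this sketch, which matches their patient_id being null.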