Stage 4 -- Joining & assembly

Purpose

Merge CCDA and FHIR sources into a single coherent patient set, link documents to patients across all formats, deduplicate, and assemble the output bundle.

Patient discovery order

Patients can appear from three sources:

1. FHIR Patient resources -- authoritative for demographics; processed first.
2. CCDA <recordTarget> -- carries name, MRN, DOB, and gender; used to create patients not seen in FHIR, or to fill gaps in those that were.
3. uuid_mapping.csv demographics -- if the mapping CSV has first_name, last_name, mrn, dob, or gender columns, their values fill any fields still missing on existing patients.

Each source can only add values, never overwrite existing ones: for every field, the first non-empty value wins.
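The fill-only rule can be sketched as follows. This is a minimal illustration, not the pipeline's code: it assumes each source has already been reduced to a dict of `patient_id -> demographics` keyed by a shared patient ID (the text above does not say how a CCDA patient is matched to an existing FHIR patient), and the function names `fill_missing` and `discover_patients` are hypothetical. Following the list above, FHIR and CCDA may introduce new patients while the mapping CSV only fills gaps.

```python
# Demographic fields a source may contribute (the columns listed above).
FIELDS = ("first_name", "last_name", "mrn", "dob", "gender")


def fill_missing(patient: dict, source: dict) -> None:
    """Copy each field from `source` only where `patient` has no non-empty value."""
    for field in FIELDS:
        value = source.get(field)
        if value and not patient.get(field):
            patient[field] = value


def discover_patients(fhir: dict, ccda: dict, mapping_csv: dict) -> dict:
    """Apply sources in priority order; each maps patient_id -> demographics dict.

    Hypothetical shape: the real stage may key and match patients differently.
    """
    patients: dict = {}
    # (source, may_create_patients): FHIR and CCDA may add new patients,
    # the mapping CSV only fills fields on patients that already exist.
    for source, may_create in ((fhir, True), (ccda, True), (mapping_csv, False)):
        for pid, demographics in source.items():
            if pid not in patients:
                if not may_create:
                    continue
                patients[pid] = {}
            fill_missing(patients[pid], demographics)
    return patients
```

Because later sources run through the same `fill_missing` check, an earlier non-empty value (for example a FHIR first name) is never replaced by a CCDA or CSV value, while an empty FHIR field can still be filled later.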

Cross-format document linkage

A document is linked to a patient via the first match in this chain:

1. The document's CCDA <recordTarget> (if it's a CCDA).
2. uuid_mapping.csv lookup by document UUID.
3. Global DocumentReference index built in Stage 3.
4. MRN-based fallback: the CCDA's MRN, normalized to uppercase alphanumerics, matches a known FHIR patient's normalized MRN.

A document with no resolved patient is still emitted in the bundle's documents array with patient_id: null -- it shows up in the dashboard under "Unlinked" but doesn't pollute any patient's record.
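A sketch of the resolution chain, assuming documents are dicts with hypothetical keys `uuid`, `format`, `record_target_patient_id` (the patient resolved from a CCDA's <recordTarget>), and `ccda_mrn`. The lookup tables `uuid_map` and `docref_index` stand in for uuid_mapping.csv and the Stage 3 DocumentReference index. MRN normalization here drops every character that is not an ASCII letter or digit and uppercases the rest; whether the pipeline treats non-ASCII characters the same way is an assumption.

```python
import re
from typing import Optional


def normalize_mrn(mrn: Optional[str]) -> str:
    """Keep only ASCII letters and digits, uppercased (assumed normalization)."""
    return re.sub(r"[^A-Za-z0-9]", "", mrn or "").upper()


def build_mrn_index(fhir_patients: dict) -> dict:
    """Map normalized MRN -> patient_id for FHIR patients that have an MRN."""
    index = {}
    for pid, patient in fhir_patients.items():
        key = normalize_mrn(patient.get("mrn"))
        if key:
            index[key] = pid
    return index


def resolve_patient(doc: dict, uuid_map: dict, docref_index: dict,
                    mrn_index: dict) -> Optional[str]:
    """Return the first patient_id found along the chain, or None if unlinked."""
    is_ccda = doc.get("format") == "ccda"
    # 1. CCDA <recordTarget>, only meaningful for CCDA documents.
    if is_ccda and doc.get("record_target_patient_id"):
        return doc["record_target_patient_id"]
    # 2. uuid_mapping.csv lookup by document UUID.
    if doc["uuid"] in uuid_map:
        return uuid_map[doc["uuid"]]
    # 3. Global DocumentReference index from Stage 3.
    if doc["uuid"] in docref_index:
        return docref_index[doc["uuid"]]
    # 4. MRN fallback: compare normalized MRNs against known FHIR patients.
    if is_ccda:
        mrn = normalize_mrn(doc.get("ccda_mrn"))
        if mrn in mrn_index:
            return mrn_index[mrn]
    # No match: the document is still emitted, with patient_id None.
    return None
```

Returning `None` rather than raising keeps unlinked documents in the output, matching the "Unlinked" behavior described above.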

Deduplication

Patients are merged when they share the key (lower(first_name), lower(last_name), dob). All three components must be non-empty for a merge to fire. The patient with the most filled fields becomes canonical, and records and documents from the duplicates are reassigned to it.
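A minimal sketch of the merge, assuming patients are held as `patient_id -> fields` and every linked item (document, lab, medication, ...) is a dict with a `patient_id` key. Two details are assumptions not stated above: a field counts as "filled" when its value is non-empty, and ties on filled-field count go to the first patient ID encountered. `dedupe_patients` is a hypothetical name.

```python
from collections import defaultdict


def filled_count(patient: dict) -> int:
    # Assumption: a field is "filled" when its value is non-empty.
    return sum(1 for value in patient.values() if value)


def dedupe_patients(patients: dict, *linked: list) -> dict:
    """Merge duplicate patients in place; return a duplicate -> canonical map.

    Each list in `linked` holds dicts whose "patient_id" is rewritten to the
    canonical ID when it points at a merged-away duplicate.
    """
    groups = defaultdict(list)
    for pid, p in patients.items():
        first, last, dob = p.get("first_name"), p.get("last_name"), p.get("dob")
        if first and last and dob:  # all three key parts must be non-empty
            groups[(first.lower(), last.lower(), dob)].append(pid)

    remap = {}
    for pids in groups.values():
        if len(pids) < 2:
            continue
        # Most filled fields wins; max() keeps the first on ties (assumption).
        canonical = max(pids, key=lambda candidate: filled_count(patients[candidate]))
        for pid in pids:
            if pid != canonical:
                remap[pid] = canonical
                del patients[pid]

    # Reassign records and documents from duplicates to the canonical patient.
    for items in linked:
        for item in items:
            if item.get("patient_id") in remap:
                item["patient_id"] = remap[item["patient_id"]]
    return remap
```

Requiring all three key parts guards against collapsing distinct people who merely share a name when a date of birth is missing.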

This handles the common case where a single person has multiple Epic patient IDs from organizational mergers, ID re-issuance, or data feed glitches.

Schema aliasing

The bundle emits both canonical and downstream-friendly field names so that consumers using either schema work without translation:

| Canonical | Aliases |
| --- | --- |
| `effective_date` | `effective_datetime`, plus per-tab: `start_date` (encounters, medications), `authored_on` (medications), `onset_datetime` (problems), `recorded_date` (allergies, immunizations), `performed_datetime` (procedures), `occurrence_datetime` (immunizations) |
| `display_name` | `allergen` (allergies), `vaccine` (immunizations), `visit_type_display` (encounters) |
| `code` | `allergen_code` (allergies), `vaccine_code` (immunizations) |
| `code_system` | `allergen_system` (allergies), `vaccine_system` (immunizations) |

Aliases are added in-place; both names refer to the same value.
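The alias table can be applied with a small lookup, sketched below. The mappings are transcribed from the table; the tab-name keys (`"allergies"`, `"labs"`, ...) and the assumption that `effective_datetime` applies to every tab, including ones the table does not list, are illustrative choices rather than confirmed behavior.

```python
# Transcribed from the alias table; tab-name keys are hypothetical.
GLOBAL_ALIASES = {"effective_date": ["effective_datetime"]}
TAB_ALIASES = {
    "encounters": {"effective_date": ["start_date"], "display_name": ["visit_type_display"]},
    "medications": {"effective_date": ["start_date", "authored_on"]},
    "problems": {"effective_date": ["onset_datetime"]},
    "allergies": {
        "effective_date": ["recorded_date"],
        "display_name": ["allergen"],
        "code": ["allergen_code"],
        "code_system": ["allergen_system"],
    },
    "immunizations": {
        "effective_date": ["recorded_date", "occurrence_datetime"],
        "display_name": ["vaccine"],
        "code": ["vaccine_code"],
        "code_system": ["vaccine_system"],
    },
    "procedures": {"effective_date": ["performed_datetime"]},
}


def add_aliases(tab: str, record: dict) -> dict:
    """Copy each canonical field's value onto its alias names, in place."""
    for table in (GLOBAL_ALIASES, TAB_ALIASES.get(tab, {})):
        for canonical, aliases in table.items():
            if canonical in record:
                for alias in aliases:
                    record[alias] = record[canonical]
    return record
```

Because the alias is assigned from the canonical key, both names hold the same value object; a record lacking a canonical field simply gets no aliases for it.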

Computed fields

Each patient record gets a computed num_documents field equal to the count of documents whose patient_id matches.

The bundle also emits a labs_vitals convenience array, the concatenation of labs and vitals, for downstream consumers who don't want to merge them client-side.
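Both computed fields are sketched below under an assumed bundle layout: top-level `patients`, `documents`, `labs`, and `vitals` lists, with each patient carrying an `id` key. These key names are hypothetical. Documents with `patient_id: None` are left out of every count, so unlinked documents never inflate a patient's total.

```python
from collections import Counter


def add_computed_fields(bundle: dict) -> dict:
    """Set num_documents on each patient and add the labs_vitals array, in place."""
    counts = Counter(
        doc["patient_id"] for doc in bundle["documents"] if doc.get("patient_id") is not None
    )
    for patient in bundle["patients"]:
        patient["num_documents"] = counts.get(patient["id"], 0)
    # Concatenation only: labs first, then vitals, no re-sorting or dedup.
    bundle["labs_vitals"] = bundle.get("labs", []) + bundle.get("vitals", [])
    return bundle
```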