Stage 4 -- Joining & assembly
Purpose
Merge CCDA and FHIR sources into a single coherent patient set, link documents to patients across all formats, deduplicate, and assemble the output bundle.
Patient discovery order
Patients can appear from three sources:
- FHIR Patient resources -- authoritative for demographics; processed first.
- CCDA <recordTarget> -- carries name, MRN, DOB, gender; used to create patients not seen in FHIR, or to fill gaps.
- uuid_mapping.csv demographics -- if the mapping CSV has first_name, last_name, mrn, dob, or gender columns, those fill in any patients still missing fields.
Each source can only add values, never overwrite. The first non-empty value wins.
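The fill-only rule can be sketched as a fold over the sources in priority order. This is a minimal illustration, not the pipeline's actual code: the function name merge_demographics and the dict-per-source shape are assumptions made for the example.

```python
from typing import Iterable

# Field names taken from the uuid_mapping.csv columns listed above.
DEMOGRAPHIC_FIELDS = ("first_name", "last_name", "mrn", "dob", "gender")


def merge_demographics(sources: Iterable[dict]) -> dict:
    """Fold demographic dicts in priority order; the first non-empty value wins.

    Hypothetical helper: the real pipeline's function names are not documented.
    """
    merged: dict = {}
    for source in sources:
        for field in DEMOGRAPHIC_FIELDS:
            value = source.get(field)
            # Only fill a gap; never overwrite a value set by an earlier source.
            if value not in (None, "") and not merged.get(field):
                merged[field] = value
    return merged


# Sources are passed in discovery order: FHIR, then CCDA, then the mapping CSV.
fhir = {"first_name": "Ana", "last_name": "Silva", "dob": "1980-02-01"}
ccda = {"first_name": "Anna", "mrn": "A-100", "gender": "F"}
csv_row = {"mrn": "ignored", "gender": "", "dob": "1999-01-01"}

patient = merge_demographics([fhir, ccda, csv_row])
```

Here the FHIR name and DOB survive, while the CCDA contributes only the MRN and gender that FHIR left empty.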
Cross-format document linkage
A document is linked to a patient via the first match in this chain:
- The document's CCDA <recordTarget> (if it's a CCDA).
- uuid_mapping.csv lookup by document UUID.
- Global DocumentReference index built in Stage 3.
- MRN-based fallback: the CCDA's MRN normalized (alphanumeric, uppercase) matches a known FHIR patient's normalized MRN.
A document with no resolved patient is still emitted in the bundle's
documents array with patient_id: null -- it shows up in the dashboard
under "Unlinked" but doesn't pollute any patient's record.
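The chain above might look roughly like the following sketch. The document keys uuid, record_target_patient_id, and record_target_mrn, and the three lookup dicts, are hypothetical names chosen for illustration; only the order of the steps and the MRN normalization rule come from this page.

```python
import re
from typing import Optional


def normalize_mrn(mrn: Optional[str]) -> str:
    """Keep alphanumeric characters only and uppercase them."""
    return re.sub(r"[^A-Za-z0-9]", "", mrn or "").upper()


def resolve_patient_id(doc: dict, uuid_mapping: dict, docref_index: dict,
                       fhir_mrn_index: dict) -> Optional[str]:
    """Return the first patient ID found along the chain, or None if unlinked."""
    # 1. Patient taken from the document's own <recordTarget> (CCDA only).
    #    Assumed to be absent when no patient could be created from it.
    if doc.get("record_target_patient_id"):
        return doc["record_target_patient_id"]
    # 2. uuid_mapping.csv lookup by document UUID.
    if doc["uuid"] in uuid_mapping:
        return uuid_mapping[doc["uuid"]]
    # 3. Global DocumentReference index built in Stage 3.
    if doc["uuid"] in docref_index:
        return docref_index[doc["uuid"]]
    # 4. MRN fallback against FHIR patients' normalized MRNs.
    mrn = normalize_mrn(doc.get("record_target_mrn"))
    if mrn and mrn in fhir_mrn_index:
        return fhir_mrn_index[mrn]
    # No match: the caller emits the document with patient_id set to null.
    return None
```

For example, a CCDA whose MRN reads "a-100 " would match a FHIR patient indexed under "A100", because both sides are compared after normalization.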
Deduplication
Patients are merged by (lower(first_name), lower(last_name), dob). All
three keys must be non-empty for a merge to fire. The patient with the
most filled fields becomes canonical; records and documents from the
duplicates are reassigned.
This handles the common case where a single person has multiple Epic patient IDs from organizational mergers, ID re-issuance, or data feed glitches.
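A minimal sketch of the merge rule, assuming each patient is a dict with an id key and each record or document carries a patient_id. The helper names are hypothetical, and tie-breaking between equally complete duplicates (here, the first one seen) is an assumption the text does not specify.

```python
from collections import defaultdict


def dedup_key(patient: dict):
    """Return (lower(first_name), lower(last_name), dob), or None if any part is empty."""
    first = (patient.get("first_name") or "").lower()
    last = (patient.get("last_name") or "").lower()
    dob = patient.get("dob") or ""
    return (first, last, dob) if first and last and dob else None


def filled_count(patient: dict) -> int:
    """Number of non-empty fields; used to pick the canonical record."""
    return sum(1 for value in patient.values() if value not in (None, ""))


def deduplicate(patients: list, items: list) -> list:
    """Drop duplicate patients and repoint items (records, documents) in place."""
    groups = defaultdict(list)
    for patient in patients:
        key = dedup_key(patient)
        if key is not None:  # incomplete keys never trigger a merge
            groups[key].append(patient)

    # Map every grouped patient ID to its group's canonical ID.
    remap = {}
    for group in groups.values():
        canonical = max(group, key=filled_count)
        for patient in group:
            remap[patient["id"]] = canonical["id"]

    for item in items:
        if item.get("patient_id") in remap:
            item["patient_id"] = remap[item["patient_id"]]
    return [p for p in patients if remap.get(p["id"], p["id"]) == p["id"]]
```

Note that a patient with an empty DOB is kept as-is even when the names match, since all three keys must be present for a merge.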
Schema aliasing
The bundle emits both canonical and downstream-friendly field names so that consumers using either schema work without translation:
| Canonical | Aliases |
|---|---|
| effective_date | effective_datetime, plus per-tab: start_date (encounters, medications), authored_on (medications), onset_datetime (problems), recorded_date (allergies, immunizations), performed_datetime (procedures), occurrence_datetime (immunizations) |
| display_name | allergen (allergies), vaccine (immunizations), visit_type_display (encounters) |
| code | allergen_code (allergies), vaccine_code (immunizations) |
| code_system | allergen_system (allergies), vaccine_system (immunizations) |
Aliases are added in-place; both names refer to the same value.
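One way to express the table as data is sketched below. The ALIASES layout, the "*" key for aliases that apply to every tab, and the add_aliases function are assumptions for illustration; the alias names and the tabs they apply to are copied from the table.

```python
# Canonical field -> {tab name -> aliases}; "*" marks aliases emitted for every tab.
ALIASES = {
    "effective_date": {
        "*": ["effective_datetime"],
        "encounters": ["start_date"],
        "medications": ["start_date", "authored_on"],
        "problems": ["onset_datetime"],
        "allergies": ["recorded_date"],
        "immunizations": ["recorded_date", "occurrence_datetime"],
        "procedures": ["performed_datetime"],
    },
    "display_name": {
        "allergies": ["allergen"],
        "immunizations": ["vaccine"],
        "encounters": ["visit_type_display"],
    },
    "code": {"allergies": ["allergen_code"], "immunizations": ["vaccine_code"]},
    "code_system": {"allergies": ["allergen_system"], "immunizations": ["vaccine_system"]},
}


def add_aliases(record: dict, tab: str) -> dict:
    """Copy each canonical value onto its alias keys, mutating the record in place."""
    for canonical, per_tab in ALIASES.items():
        if canonical not in record:
            continue
        for alias in per_tab.get("*", []) + per_tab.get(tab, []):
            # Assumption: an alias always mirrors the canonical value.
            record[alias] = record[canonical]
    return record


allergy = add_aliases(
    {"effective_date": "2020-01-01", "display_name": "Peanut",
     "code": "91935009", "code_system": "SNOMED"},
    tab="allergies",
)
```

After the call, the allergy record carries both display_name and allergen with the same value, and no encounter-only aliases such as start_date.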
Computed fields
Each patient record gets a computed num_documents field equal to the
count of documents whose patient_id matches.
The bundle also emits a labs_vitals convenience array equal to
labs + vitals for downstream consumers who don't want to merge them
client-side.
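Both computed fields can be derived in one pass over the assembled bundle. The sketch below assumes the bundle is a dict with patients, documents, labs, and vitals arrays and that patients are keyed by id; only num_documents, labs_vitals, documents, and patient_id are names taken from this page.

```python
from collections import Counter


def add_computed_fields(bundle: dict) -> dict:
    """Attach num_documents to each patient and build the labs_vitals array."""
    # Count documents per patient; unlinked documents (patient_id null) are skipped.
    counts = Counter(
        doc["patient_id"]
        for doc in bundle.get("documents", [])
        if doc.get("patient_id") is not None
    )
    for patient in bundle.get("patients", []):
        patient["num_documents"] = counts[patient["id"]]  # Counter yields 0 if absent

    # Concatenate rather than interleave: labs first, then vitals.
    bundle["labs_vitals"] = bundle.get("labs", []) + bundle.get("vitals", [])
    return bundle


bundle = add_computed_fields({
    "patients": [{"id": "p1"}, {"id": "p2"}],
    "documents": [
        {"uuid": "d1", "patient_id": "p1"},
        {"uuid": "d2", "patient_id": "p1"},
        {"uuid": "d3", "patient_id": None},
    ],
    "labs": [{"code": "2345-7"}],
    "vitals": [{"code": "8867-4"}],
})
```

In this example p1 ends up with num_documents of 2, p2 with 0, and the unlinked document d3 is counted for neither.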