Sample data

The documentation includes a complete set of synthetic, deidentified sample data so you can run the pipeline end-to-end without any real EHR access.

Files

| File | Size | Description |
| --- | --- | --- |
| sample_ccda.xml | ~19 KB | A rich CCDA continuity-of-care document with nine populated sections (Problems, Medications, Vital Signs, Allergies, Immunizations, Procedures, Laboratory Results, Encounters, Plan of Care) for one synthetic ALS patient |
| sample_fhir_bundle.json | ~80 KB | A FHIR R4 Bundle of 57 resources covering every category the pipeline extracts |
| uuid_mapping.csv | <1 KB | Document-to-patient mapping with full demographic columns |
| ccda_chunks.csv | ~25 KB | The CCDA, base64-encoded and reformatted as the pipeline-ready chunked CSV |
| fhir_chunks.csv | ~85 KB | The FHIR bundle reformatted as the pipeline-ready chunked CSV |
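
To sanity-check the FHIR sample before running anything, you can tally its resources by type. The sketch below relies only on the standard FHIR Bundle layout (an `entry` array whose items carry a `resource` with a `resourceType`). The function names and the file location are assumptions, not part of the pipeline.

```python
import json
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle, grouped by resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry  # skip entries that carry no resource
    )


def load_bundle(path: str) -> dict:
    """Read a FHIR Bundle from a JSON file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `sum(count_resource_types(load_bundle("sample_fhir_bundle.json")).values())` should match the resource count listed in the table, assuming the file sits in the current directory.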

The chunked CSVs use exactly the same format that the Databricks export produces. Drop them into <work_dir>/CCDA and FHIR data/ and the pipeline reads them without further transformation.
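
The copy step can also be scripted. This is a minimal sketch: the folder name and the two file names come from this page, while `stage_sample_data`, `sample_dir`, and `work_dir` are hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path

# Folder and file names as given on this page.
DATA_SUBDIR = "CCDA and FHIR data"
CHUNK_FILES = ("ccda_chunks.csv", "fhir_chunks.csv")


def stage_sample_data(sample_dir: Path, work_dir: Path) -> list[Path]:
    """Copy the chunked sample CSVs into <work_dir>/CCDA and FHIR data/."""
    target = Path(work_dir) / DATA_SUBDIR
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    staged = []
    for name in CHUNK_FILES:
        copied = shutil.copy2(Path(sample_dir) / name, target / name)
        staged.append(Path(copied))
    return staged
```

Calling `stage_sample_data(Path("docs/sample_data"), Path("work"))` would return the two destination paths; both directory arguments here are placeholders for wherever your sample files and work directory actually live.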

The demo patient

The demo patient is a synthetic ALS patient approximately 12 months post-diagnosis, managed in a multidisciplinary clinic and transitioning to NIV and PEG support:

| Field | Value |
| --- | --- |
| Name | Jane Marie Demo |
| MRN | MRN-DEMO-001 |
| DOB | 1965-03-15 |
| Gender | Female |
| Marital status | Married |

What's in the bundle

After running the pipeline against the sample data, the output bundle contains:

| Category | Count | Examples |
| --- | --- | --- |
| Patients | 1 | Jane Marie Demo |
| Documents | 1 | The CCDA continuity-of-care document |
| Encounters | 4 | Initial neurology consult, MDC clinic visits, inpatient admission |
| Problems | 6 | ALS, dysphagia, respiratory weakness, dysarthria, fatigue, depression |
| Medications | 8 | Riluzole, edaravone, baclofen, dextromethorphan-quinidine, trazodone, tizanidine, vitamin D3, lorazepam |
| Procedures | 4 | EMG, MRI brain, BiPAP initiation, PEG tube placement |
| Lab results | 10 | CMP (Na/K/Cl/CO2/BUN/Cr/glucose), Hgb, TSH, vitamin D |
| Vital signs | 7 | BP, HR, RR, weight, SpO2, FVC |
| Allergies | 3 | Penicillin (high), sulfa (moderate), latex (low) |
| Immunizations | 4 | Flu, COVID-19 booster, PCV13, Tdap (with lot numbers) |
| Care plans | 3 | Multidisciplinary care, palliative consult, communication strategies |
| Diagnostic reports | 3 | EMG, MRI brain, pulmonary function testing |
| Goals | 3 | Functional independence, symptom management, advance care planning |
| Section narratives | 9 | One per CCDA section |
| Total records | 84 | |

Every record has all relevant fields populated: dates, codes, code systems, display names, status, units, values, dose, route, frequency, severity, clinical status, etc. This lets the dashboard demonstrate every column it can show.

CCDA + FHIR overlap

The CCDA and FHIR sources describe the same patient, with overlapping data. When the pipeline runs, the dedupe step merges them into one patient record with combined evidence. Some records (e.g. blood pressure) still appear once per source in their respective tabs; the source field distinguishes them.

This mirrors what real exports look like: the same clinical fact often appears in multiple formats from the EHR's various export pipelines.
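
The overlap behaviour described above can be pictured with a small sketch. It is an illustration only, not the pipeline's actual dedupe logic: matching patients on MRN and the field names `mrn`, `name`, and `source` are assumptions made for this example.

```python
def merge_patients(patients):
    """Collapse patient records that share an MRN into one entry per patient.

    Matching on MRN is an assumption made for this sketch.
    """
    merged = {}
    for patient in patients:
        entry = merged.setdefault(
            patient["mrn"],
            {"mrn": patient["mrn"], "name": patient["name"], "sources": []},
        )
        # Record each contributing source once, in first-seen order.
        if patient["source"] not in entry["sources"]:
            entry["sources"].append(patient["source"])
    return list(merged.values())


def records_for_source(records, source):
    """Keep only the clinical records whose source field matches."""
    return [r for r in records if r["source"] == source]
```

Given one patient record from each source with MRN `MRN-DEMO-001`, `merge_patients` returns a single entry whose `sources` list is `["CCDA", "FHIR"]`, while a blood pressure record from each source stays separate and can be filtered with `records_for_source`.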

Running the demo

See the Quickstart for end-to-end instructions. To see the bundle in the dashboard without running anything yourself, jump to Live demo.