Sample data
The documentation includes a complete set of synthetic, deidentified sample data so you can run the pipeline end-to-end without any real EHR access.
Files
| File | Size | Description |
|---|---|---|
| sample_ccda.xml | ~19 KB | A rich CCDA continuity-of-care document with nine populated sections (Problems, Medications, Vital Signs, Allergies, Immunizations, Procedures, Laboratory Results, Encounters, Plan of Care) for one synthetic ALS patient |
| sample_fhir_bundle.json | ~80 KB | A FHIR R4 Bundle of 57 resources covering every category the pipeline extracts |
| uuid_mapping.csv | <1 KB | Document-to-patient mapping with full demographic columns |
| ccda_chunks.csv | ~25 KB | The CCDA, base64-encoded and reformatted as the pipeline-ready chunked CSV |
| fhir_chunks.csv | ~85 KB | The FHIR bundle reformatted as the pipeline-ready chunked CSV |
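To sanity-check the resource count quoted for sample_fhir_bundle.json, a short script can tally resources by type. This is a minimal sketch that relies only on the standard FHIR Bundle layout (`entry[].resource.resourceType`); the helper names `load_bundle` and `count_resource_types` are illustrative and not part of the pipeline.

```python
import json
from collections import Counter


def load_bundle(path: str) -> dict:
    """Read a FHIR Bundle from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_resource_types(bundle: dict) -> Counter:
    """Tally resourceType across the entries of a FHIR Bundle.

    Entries without a "resource" key are skipped rather than counted.
    """
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )
```

Calling `count_resource_types(load_bundle("sample_fhir_bundle.json"))` and summing the values should give the total number of resources in the file, assuming the file sits in the current directory.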
The chunked CSVs use exactly the format the Databricks export produces.
Drop them into <work_dir>/CCDA and FHIR data/ and the pipeline reads them
without further transformation.
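For readers who want to inspect a chunked CSV by hand, the sketch below shows one way to reassemble a document and decode it. It rests on assumptions this page does not confirm: the column names `document_id`, `chunk_index`, and `chunk_data` are hypothetical placeholders, and the chunks are assumed to be slices of a single base64 string. Check the real header row of ccda_chunks.csv and adjust before relying on it.

```python
import base64
import csv
import io
from collections import defaultdict


def reassemble_chunks(
    csv_text: str,
    id_col: str = "document_id",      # hypothetical column name
    index_col: str = "chunk_index",   # hypothetical column name
    data_col: str = "chunk_data",     # hypothetical column name
) -> dict:
    """Group chunk rows by document and join them in chunk order."""
    pieces = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pieces[row[id_col]].append((int(row[index_col]), row[data_col]))
    return {
        doc_id: "".join(data for _, data in sorted(chunks, key=lambda c: c[0]))
        for doc_id, chunks in pieces.items()
    }


def decode_ccda(payload: str) -> str:
    """Decode a reassembled base64 payload into XML text (assumes UTF-8)."""
    return base64.b64decode(payload).decode("utf-8")
```

Sorting by the chunk index before joining means the rows do not have to appear in order in the file.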
The demo patient
The sample describes a synthetic ALS patient approximately 12 months after diagnosis who is managed in a multidisciplinary clinic and is transitioning to NIV and PEG support:
| Field | Value |
|---|---|
| Name | Jane Marie Demo |
| MRN | MRN-DEMO-001 |
| DOB | 1965-03-15 |
| Gender | Female |
| Marital status | Married |
What's in the bundle
After running the pipeline against the sample data, the output bundle contains:
| Category | Count | Examples |
|---|---|---|
| Patients | 1 | Jane Marie Demo |
| Documents | 1 | The CCDA continuity-of-care document |
| Encounters | 4 | Initial neurology consult, MDC clinic visits, inpatient admission |
| Problems | 6 | ALS, dysphagia, respiratory weakness, dysarthria, fatigue, depression |
| Medications | 8 | Riluzole, edaravone, baclofen, dextromethorphan-quinidine, trazodone, tizanidine, vitamin D3, lorazepam |
| Procedures | 4 | EMG, MRI brain, BiPAP initiation, PEG tube placement |
| Lab results | 10 | CMP (Na/K/Cl/CO2/BUN/Cr/glucose), Hgb, TSH, vitamin D |
| Vital signs | 7 | BP, HR, RR, weight, SpO2, FVC |
| Allergies | 3 | Penicillin (high), sulfa (moderate), latex (low) |
| Immunizations | 4 | Flu, COVID-19 booster, PCV13, Tdap (with lot numbers) |
| Care plans | 3 | Multidisciplinary care, palliative consult, communication strategies |
| Diagnostic reports | 3 | EMG, MRI brain, pulmonary function testing |
| Goals | 3 | Functional independence, symptom management, advance care planning |
| Section narratives | 9 | One per CCDA section |
| Total records | 84 | |
Every record has its relevant fields populated (dates, codes, code systems, display names, status, units, values, dose, route, frequency, severity, clinical status, and so on), so the dashboard can demonstrate every column it is able to show.
CCDA + FHIR overlap
The CCDA and FHIR sources describe the same patient, and their data
overlaps. When the pipeline runs, the dedupe step merges them into one
patient record with combined evidence. Some records (e.g. blood pressure)
appear from both sources in their respective tabs; the source field
distinguishes them.
This mirrors what real exports look like: the same clinical fact often appears in multiple formats from the EHR's various export pipelines.
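The overlap described above can be explored with a small helper that groups records by clinical fact and reports which facts arrive from more than one source. This is an illustrative sketch, not the pipeline's dedupe logic: the record shape (dicts with `category`, `code`, `date`, and `source` keys) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def group_sources_by_fact(records):
    """Map each (category, code, date) key to the set of sources reporting it.

    The key fields are hypothetical; a real pipeline may match on other fields.
    """
    groups = defaultdict(set)
    for rec in records:
        key = (rec["category"], rec["code"], rec["date"])
        groups[key].add(rec["source"])
    return dict(groups)


def overlapping_facts(records):
    """Return the keys of facts reported by more than one source."""
    return [
        key
        for key, sources in group_sources_by_fact(records).items()
        if len(sources) > 1
    ]
```

Keeping the set of sources per fact, rather than discarding one copy, matches the idea of combined evidence: the merged view still shows where each fact came from.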
Running the demo
See the Quickstart for end-to-end instructions. To see the bundle in the dashboard without running anything yourself, jump to Live demo.