Quickstart
Run the pipeline end to end against the included sample data in under a minute. No real EHR access is required.
1. Set up a working directory
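One way to do this from Python is sketched below. The directory names come from the layout shown in step 3; the helper name `create_working_dir` and the choice of parent folder are assumptions, not part of the project.

```python
from pathlib import Path


def create_working_dir(root: Path) -> Path:
    """Create the demo directory and its data subfolder; return the subfolder.

    `root` is wherever you want the demo to live (an assumption; any
    writable location should do).
    """
    data_dir = root / "CCDA and FHIR data"
    # parents=True also creates `root`; exist_ok=True makes reruns harmless.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

For example, `create_working_dir(Path.home() / "arc-pipeline-demo")` would place the demo directory in your home folder.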
2. Drop in the pipeline files
Copy these from the project release into the working directory:
3. Drop in the sample data
Copy the sample files from docs/assets/ (or download them from the
Sample data page) into the working directory:
arc-pipeline-demo/
├── CCDA and FHIR data/
│ ├── ccda_chunks.csv
│ └── fhir_chunks.csv
├── uuid_mapping.csv
├── run_pipeline.py
├── dashboard.html
└── test_patients.txt
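Before running anything, it can help to confirm that the working directory matches the layout above. The following is a minimal sketch, not part of the project: the file list is copied from the tree, and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the layout shown above.
EXPECTED_FILES = [
    "CCDA and FHIR data/ccda_chunks.csv",
    "CCDA and FHIR data/fhir_chunks.csv",
    "uuid_mapping.csv",
    "run_pipeline.py",
    "dashboard.html",
    "test_patients.txt",
]


def missing_files(root):
    """Return the expected relative paths that are not present under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("arc-pipeline-demo")` returns an empty list once every file is in place; otherwise it lists what still needs to be copied.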
4. Adjust the working directory path (one line)
Open run_pipeline.py, find the BASE_DIR constant near the top, and set
it to your working directory:
The trailing slash matters. See Configuration for the full list of paths.
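How run_pipeline.py builds its file paths is not shown here. If it joins BASE_DIR and file names by plain string concatenation (an assumption), a missing trailing slash silently produces a wrong path. The sketch below illustrates the effect with two hypothetical helpers and an example path.

```python
import os


def join_by_concat(base_dir, name):
    """Build a path by plain concatenation; needs base_dir to end in a slash."""
    return base_dir + name


def join_with_os_path(base_dir, name):
    """Build a path with os.path.join; works with or without a trailing slash."""
    return os.path.join(base_dir, name)


# Example values only; substitute your own working directory.
with_slash = join_by_concat("/home/me/arc-pipeline-demo/", "uuid_mapping.csv")
without_slash = join_by_concat("/home/me/arc-pipeline-demo", "uuid_mapping.csv")

print(with_slash)     # /home/me/arc-pipeline-demo/uuid_mapping.csv
print(without_slash)  # /home/me/arc-pipeline-demouuid_mapping.csv
```

The second result points at a file that does not exist, which is the kind of failure a missing trailing slash would cause under this assumption.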
5. Install dependencies
6. Run
Expected output (truncated):
[12:00:00] ALS TDI ETL Pipeline -- starting
[12:00:00] Mapping CSV: uuid_mapping.csv (1 rows)
[12:00:00] doc->patient entries: 1
[12:00:00] STAGE 1 -- Decoding & reassembly
[12:00:01] Wrote 1 files to ccda_assembled/
[12:00:01] Wrote 1 files to fhir_assembled/
[12:00:01] STAGE 2+3 -- Parsing CCDA and FHIR
[12:00:01] Format counts: {'ccda_xml': 1}
[12:00:01] Patients (post-mapping fill): 1
[12:00:01] problems: 1
[12:00:01] medications: 0
[12:00:01] ...
[12:00:01] STAGE 4 -- Assembling bundle
[12:00:01] STAGE 5 -- Enriching display names
[12:00:01] Enriched 0 display names
[12:00:01] STAGE 6 -- Filtering test patients
[12:00:01] Total active exclusion rules: 0
[12:00:01] Removed 0 test patients
[12:00:01] Pipeline complete.
7. View the results
Open dashboard.html in any modern browser. Click the file picker and
select dashboard_data.json. You'll see one demo patient with their
problem, medication, encounter, and CCDA section narrative.
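To take a quick look at dashboard_data.json without the browser, a short script can summarize its top-level structure. This is a sketch only: the schema of the file is not described here, so the code assumes nothing beyond valid JSON, and the helper names are hypothetical.

```python
import json


def summarize(data):
    """Map each top-level key to a short description of its value."""
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} items"
        else:
            summary[key] = type(value).__name__
    return summary


def summarize_file(path):
    """Load a JSON file and summarize its top-level keys."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle))
```

For example, `summarize_file("dashboard_data.json")` run from the folder that contains the file returns a dictionary you can print to see which sections it holds.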
You can now scale up to your real data by replacing the sample chunked CSVs with output from the Databricks export.