
Quickstart

Run the pipeline end-to-end against the included sample data in under a minute. No real EHR access is required.

1. Set up a working directory

mkdir arc-pipeline-demo
cd arc-pipeline-demo
mkdir "CCDA and FHIR data"

2. Drop in the pipeline files

Copy these from the project release into the working directory:

arc-pipeline-demo/
├── run_pipeline.py
├── dashboard.html
└── test_patients.txt

3. Drop in the sample data

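Copy the sample files from docs/assets/ (or download them from the Sample data page) into the working directory. The two chunk CSVs go inside the "CCDA and FHIR data" folder, and uuid_mapping.csv goes at the top level: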

arc-pipeline-demo/
├── CCDA and FHIR data/
│   ├── ccda_chunks.csv
│   └── fhir_chunks.csv
├── uuid_mapping.csv
├── run_pipeline.py
├── dashboard.html
└── test_patients.txt
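
Before running anything, it can help to confirm the layout matches the tree above. The following is a minimal sketch, not part of the project: the `missing_files` helper and the `EXPECTED_FILES` list are hypothetical names, and the file list is copied from the tree in this step.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "CCDA and FHIR data/ccda_chunks.csv",
    "CCDA and FHIR data/fhir_chunks.csv",
    "uuid_mapping.csv",
    "run_pipeline.py",
    "dashboard.html",
    "test_patients.txt",
]


def missing_files(base_dir):
    """Return the expected relative paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Run from inside arc-pipeline-demo/ so "." is the working directory.
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are in place.")
```

Run it from inside arc-pipeline-demo/; an empty result means every file from the tree was found.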

4. Adjust the working directory path (one line)

Open run_pipeline.py, find the BASE_DIR constant near the top, and set it to your working directory:

BASE_DIR = '/path/to/arc-pipeline-demo/'

Keep the trailing slash; it matters. See Configuration for the full list of paths.
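
One plausible reason the trailing slash matters is that file paths are built by plain string concatenation. That is an assumption about run_pipeline.py, not something this page confirms. The sketch below shows what goes wrong in that case; `join_by_concat` and `ensure_trailing_slash` are hypothetical helpers for illustration only.

```python
def join_by_concat(base_dir, name):
    # Assumed style: plain concatenation, which relies on base_dir ending in "/".
    return base_dir + name


def ensure_trailing_slash(path):
    # Append "/" only when it is missing, so the call is safe to repeat.
    return path if path.endswith("/") else path + "/"


with_slash = "/path/to/arc-pipeline-demo/"
without_slash = "/path/to/arc-pipeline-demo"

print(join_by_concat(with_slash, "uuid_mapping.csv"))
# /path/to/arc-pipeline-demo/uuid_mapping.csv

print(join_by_concat(without_slash, "uuid_mapping.csv"))
# /path/to/arc-pipeline-demouuid_mapping.csv  (folder and file name run together)

print(join_by_concat(ensure_trailing_slash(without_slash), "uuid_mapping.csv"))
# /path/to/arc-pipeline-demo/uuid_mapping.csv
```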

5. Install dependencies

pip install pandas pypdf openpyxl
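
To confirm the install worked in the interpreter you will use for the run, a short check like the following can help. It is a sketch using only the standard library; `missing_packages` is a hypothetical helper, and the package list is the one from the pip command above.

```python
import importlib.util

# Package names from the pip install command above.
REQUIRED = ["pandas", "pypdf", "openpyxl"]


def missing_packages(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

If a package is reported missing even though pip succeeded, pip and python may be pointing at different environments.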

6. Run

python run_pipeline.py

Expected output (truncated):

[12:00:00] ALS TDI ETL Pipeline -- starting
[12:00:00] Mapping CSV: uuid_mapping.csv (1 rows)
[12:00:00]   doc->patient entries: 1
[12:00:00] STAGE 1 -- Decoding & reassembly
[12:00:01]   Wrote 1 files to ccda_assembled/
[12:00:01]   Wrote 1 files to fhir_assembled/
[12:00:01] STAGE 2+3 -- Parsing CCDA and FHIR
[12:00:01]   Format counts: {'ccda_xml': 1}
[12:00:01]   Patients (post-mapping fill): 1
[12:00:01]     problems: 1
[12:00:01]     medications: 0
[12:00:01]     ...
[12:00:01] STAGE 4 -- Assembling bundle
[12:00:01] STAGE 5 -- Enriching display names
[12:00:01]   Enriched 0 display names
[12:00:01] STAGE 6 -- Filtering test patients
[12:00:01]   Total active exclusion rules: 0
[12:00:01]   Removed 0 test patients
[12:00:01] Pipeline complete.
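
If you capture the log to a file or string, a small parser can confirm that the run reached the end. The sketch below assumes only the line format shown above (`[HH:MM:SS] message`); `summarize_log` is a hypothetical helper, not part of the pipeline.

```python
import re

# Matches lines such as "[12:00:01]   Wrote 1 files to ccda_assembled/".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")


def summarize_log(text):
    """Return (list of stage header messages, whether the run completed)."""
    stages = []
    complete = False
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a timestamped log line
        message = match.group(2)
        if message.startswith("STAGE "):
            stages.append(message)
        elif message == "Pipeline complete.":
            complete = True
    return stages, complete


sample = """[12:00:00] ALS TDI ETL Pipeline -- starting
[12:00:00] STAGE 1 -- Decoding & reassembly
[12:00:01]   Wrote 1 files to ccda_assembled/
[12:00:01] Pipeline complete."""

stages, complete = summarize_log(sample)
print(stages)    # ['STAGE 1 -- Decoding & reassembly']
print(complete)  # True
```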

7. View the results

Open dashboard.html in any modern browser, click the file picker, and select dashboard_data.json. You'll see one demo patient with their problem, medication, encounter, and CCDA section narrative.
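
To look at dashboard_data.json outside the browser, you can summarize its top-level shape from Python. This page does not document the file's schema, so the sketch below makes no assumptions about key names; `describe` and `_shape` are hypothetical helpers, and the file is assumed to sit in the current directory.

```python
import json
from pathlib import Path


def _shape(value):
    # Describe containers by size and everything else by type name.
    if isinstance(value, (dict, list)):
        return f"{type(value).__name__} with {len(value)} entries"
    return type(value).__name__


def describe(data):
    """Summarize the top level of a parsed JSON document."""
    if isinstance(data, dict):
        return {key: _shape(value) for key, value in data.items()}
    return _shape(data)


if __name__ == "__main__":
    path = Path("dashboard_data.json")  # assumed location: current directory
    if path.is_file():
        with path.open(encoding="utf-8") as handle:
            print(describe(json.load(handle)))
    else:
        print("dashboard_data.json not found in the current directory.")
```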

You can now scale up to your real data by replacing the sample chunked CSVs with output from the Databricks export.
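
Before swapping in a real export, it may be worth checking that its column headers match the sample files. The sketch below compares only the header rows using the standard library. `read_header` and `header_differences` are hypothetical helpers, and the assumption that the export is UTF-8 CSV with a header row is not confirmed by this page.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file, or an empty list if the file is empty."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def header_differences(sample_path, export_path):
    """Compare the header rows of the sample CSV and a real export."""
    sample = read_header(sample_path)
    export = read_header(export_path)
    return {
        "missing_in_export": [col for col in sample if col not in export],
        "extra_in_export": [col for col in export if col not in sample],
    }
```

For example, `header_differences("CCDA and FHIR data/ccda_chunks.csv", "/path/to/export.csv")` returns two empty lists when the headers agree; the export path here is a placeholder.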