Quickstart

Run the pipeline end-to-end against the included sample data in under a minute. No real EHR access is required.

1. Set up a working directory

mkdir arc-pipeline-demo
cd arc-pipeline-demo
mkdir "CCDA and FHIR data"

2. Drop in the pipeline files

Copy these from the project release into the working directory:

arc-pipeline-demo/
├── run_pipeline.py
├── dashboard.html
└── test_patients.txt

3. Drop in the sample data

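Copy the sample files from docs/assets/ (or download them from the Sample data page) into the working directory. The two chunk CSVs go inside the "CCDA and FHIR data" folder, and uuid_mapping.csv goes at the top level: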

arc-pipeline-demo/
├── CCDA and FHIR data/
│   ├── ccda_chunks.csv
│   └── fhir_chunks.csv
├── uuid_mapping.csv
├── run_pipeline.py
├── dashboard.html
└── test_patients.txt
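
Before running anything, it can help to confirm that the layout matches the tree above. The following is a minimal sketch of such a check, not part of the project release. The helper name missing_files and the EXPECTED_FILES list are illustrative. The file names are copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this step.
EXPECTED_FILES = [
    "CCDA and FHIR data/ccda_chunks.csv",
    "CCDA and FHIR data/fhir_chunks.csv",
    "uuid_mapping.csv",
    "run_pipeline.py",
    "dashboard.html",
    "test_patients.txt",
]


def missing_files(base_dir):
    """Return the expected relative paths that are not regular files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Run from inside the working directory (arc-pipeline-demo).
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Layout looks complete.")
```

Run it from inside arc-pipeline-demo. An empty result means every file from the tree is in place.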

4. Adjust the working directory path (one line)

Open run_pipeline.py, find the BASE_DIR constant near the top, and set it to your working directory:

BASE_DIR = '/path/to/arc-pipeline-demo/'

(Trailing slash matters.) See Configuration for the full list of paths.
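
One plausible reason the trailing slash matters is that the script builds file paths by plain string concatenation. That is an assumption for illustration, not something confirmed here about run_pipeline.py. Under that assumption, the sketch below shows what goes wrong without the slash. The names concat_path and with_trailing_slash are hypothetical helpers.

```python
def concat_path(base_dir, name):
    # Plain string concatenation (assumed behavior): this only yields a
    # valid path when base_dir already ends with a separator.
    return base_dir + name


def with_trailing_slash(base_dir):
    # Hypothetical guard: append "/" only when it is missing.
    return base_dir if base_dir.endswith("/") else base_dir + "/"


print(concat_path("/path/to/arc-pipeline-demo/", "uuid_mapping.csv"))
# /path/to/arc-pipeline-demo/uuid_mapping.csv

print(concat_path("/path/to/arc-pipeline-demo", "uuid_mapping.csv"))
# /path/to/arc-pipeline-demouuid_mapping.csv

print(concat_path(with_trailing_slash("/path/to/arc-pipeline-demo"), "uuid_mapping.csv"))
# /path/to/arc-pipeline-demo/uuid_mapping.csv
```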

5. Install dependencies

pip install pandas pypdf openpyxl
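
To confirm the install reached the interpreter you will use to run the pipeline, a short check like the one below may help. It is a sketch using only the standard library. The helper name missing_modules is illustrative, and it assumes each package is imported under the same name it is installed under.

```python
import importlib.util

# Import names assumed to match the pip package names above.
REQUIRED = ["pandas", "pypdf", "openpyxl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, rerun the pip command with the same Python interpreter that will execute run_pipeline.py.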

6. Run

python run_pipeline.py

Expected output (truncated):

[12:00:00] ALS TDI ETL Pipeline -- starting
[12:00:00] Mapping CSV: uuid_mapping.csv (1 rows)
[12:00:00]   doc->patient entries: 1
[12:00:00] STAGE 1 -- Decoding & reassembly
[12:00:01]   Wrote 1 files to ccda_assembled/
[12:00:01]   Wrote 1 files to fhir_assembled/
[12:00:01] STAGE 2+3 -- Parsing CCDA and FHIR
[12:00:01]   Format counts: {'ccda_xml': 1}
[12:00:01]   Patients (post-mapping fill): 1
[12:00:01]     problems: 1
[12:00:01]     medications: 0
[12:00:01]     ...
[12:00:01] STAGE 4 -- Assembling bundle
[12:00:01] STAGE 5 -- Enriching display names
[12:00:01]   Enriched 0 display names
[12:00:01] STAGE 6 -- Filtering test patients
[12:00:01]   Total active exclusion rules: 0
[12:00:01]   Removed 0 test patients
[12:00:01] Pipeline complete.
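
If you save the console output to a file, a small parser can confirm which stages ran and whether the run finished. This is a sketch that assumes the log format shown above (a bracketed HH:MM:SS timestamp followed by the message). The function names are illustrative, and the sample text is a few lines copied from the expected output.

```python
import re

# "[HH:MM:SS]" followed by one space, then the message.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s(.*)$")


def stage_headers(log_text):
    """Return the message of every 'STAGE ...' line, in log order."""
    stages = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2).startswith("STAGE "):
            stages.append(match.group(2))
    return stages


def finished(log_text):
    """True if any line ends with the completion message."""
    return any(
        line.rstrip().endswith("Pipeline complete.")
        for line in log_text.splitlines()
    )


SAMPLE_LOG = """\
[12:00:00] ALS TDI ETL Pipeline -- starting
[12:00:00] STAGE 1 -- Decoding & reassembly
[12:00:01]   Wrote 1 files to ccda_assembled/
[12:00:01] STAGE 6 -- Filtering test patients
[12:00:01] Pipeline complete.
"""

print(stage_headers(SAMPLE_LOG))
# ['STAGE 1 -- Decoding & reassembly', 'STAGE 6 -- Filtering test patients']
print(finished(SAMPLE_LOG))
# True
```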

7. View the results

Open dashboard.html in any modern browser. Click the file picker and select the dashboard_data.json file produced by the pipeline run. You'll see one demo patient with their problem, medication, encounter, and CCDA section narrative.
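
To peek at the output without the browser, the sketch below loads a JSON file and lists its top-level keys with their types. It makes no assumption about the schema of dashboard_data.json. It does assume the file is in the current directory, which this page does not state. The helper name summarize_json is illustrative.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported under a placeholder key.
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    # Assumed location: the current directory. Adjust if the file is elsewhere.
    target = Path("dashboard_data.json")
    if target.is_file():
        for key, kind in summarize_json(target).items():
            print(f"{key}: {kind}")
    else:
        print("dashboard_data.json not found here; run the pipeline first or adjust the path.")
```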

You can now scale up to your real data by replacing the sample chunked CSVs with output from the Databricks export.