Clinical-note extraction dashboard¶
Preview — under active development. This module is still being refined. Patterns, mappings, and category boundaries should be validated against your own corpus before being relied on for analysis or publication.
note_dashboard.py is the visualization layer for
note_extraction.py. It reads
note_extractions.csv and emits a single, self-contained HTML report
(no external dependencies) showing the ALS-specific clinical content
captured from unstructured narrative: functional rating scores, FVC %
predicted, diagnostic certainty, family history, genetic mutations, FTD
spectrum mentions, and dated treatment milestones. Privacy controls
(pseudonymization, truncation, and small-cell suppression) are built in.
Input¶
| File | Description |
|---|---|
| note_extractions.csv | Regex matches against notes[].narrative_text and documents[].plain_text for ALSFRS-R total + 4 subdomains, ECAS / FTD spectrum, El Escorial certainty, onset region, family history (negative / positive / genetic mutation), FVC % predicted, and dated treatment milestones (PEG / tracheostomy / NIV / riluzole / edaravone start). |
Produced by note_extraction.py against the same
bundle the other modules consume.
Output¶
A single HTML file — note_dashboard.html — with six tabs:
1. Overview by category¶
Patient counts and total record counts grouped into six clinical-content categories: ALSFRS-R, Pulmonary, Diagnosis, Family history & genetics, ECAS / FTD, and Treatment milestones.
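The two counts on this tab can be reproduced by hand when checking the report. The following is a minimal sketch, not the module's actual implementation; the row layout `(patient_id, pattern, category)` and the pattern names are assumptions made for illustration, so adapt them to the real columns of note_extractions.csv.

```python
from collections import defaultdict

# Hypothetical rows: (patient_id, pattern, category).
# The real CSV columns and pattern names may differ.
rows = [
    ("a1", "alsfrs_total", "ALSFRS-R"),
    ("a1", "alsfrs_total", "ALSFRS-R"),
    ("b2", "alsfrs_bulbar", "ALSFRS-R"),
    ("b2", "fvc_percent", "Pulmonary"),
]


def summarize_by_category(rows):
    """Return {category: (unique_patient_count, record_count)}."""
    patients = defaultdict(set)
    records = defaultdict(int)
    for patient_id, _pattern, category in rows:
        patients[category].add(patient_id)  # a patient counts once per category
        records[category] += 1              # every matched record counts
    return {cat: (len(patients[cat]), records[cat]) for cat in records}


overview = summarize_by_category(rows)
```

A large gap between the two numbers for a category usually means a few heavily documented patients account for most of the matches.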
2. All patterns¶
Filterable per-pattern table with patient counts. Filter by pattern name (e.g. "fvc", "alsfrs", "el_escorial") or category.
3. Patient × pattern matrix¶
Heatmap-style grid of pseudonymized patients (PT-NNNN) × patterns. Useful for separating patients with complete documentation profiles from those with partial ones.
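As a rough illustration of what such a grid holds, the sketch below builds a patient × pattern count matrix from `(pseudonym, pattern)` pairs and computes a per-patient coverage fraction. The input shape, the pattern names, and the `coverage` helper are assumptions for this example, not part of note_dashboard.py.

```python
def build_matrix(rows):
    """rows: iterable of (pseudonym, pattern) pairs.

    Returns (patients, patterns, grid), where grid[i][j] is the number of
    records for patients[i] that matched patterns[j].
    """
    counts = {}
    for pseudonym, pattern in rows:
        key = (pseudonym, pattern)
        counts[key] = counts.get(key, 0) + 1
    patients = sorted({p for p, _ in counts})
    patterns = sorted({q for _, q in counts})
    grid = [[counts.get((p, q), 0) for q in patterns] for p in patients]
    return patients, patterns, grid


def coverage(grid_row):
    """Fraction of patterns with at least one match for a single patient."""
    return sum(1 for cell in grid_row if cell > 0) / len(grid_row)


# Hypothetical pattern names, for illustration only.
sample = [
    ("PT-0001", "alsfrs_total"),
    ("PT-0001", "fvc_percent"),
    ("PT-0002", "alsfrs_total"),
    ("PT-0001", "alsfrs_total"),
]
patients, patterns, grid = build_matrix(sample)
```

Sorting patients by `coverage` is one simple way to bring partially documented profiles to the top for chart review.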
4. Captured-value distribution¶
- Numeric patterns (ALSFRS-R total + subdomains, ECAS total + ALS-specific, FVC % predicted): shows captured-value count, min, median, and max
- Categorical / freetext patterns (El Escorial certainty, onset region, genetic mutation): shows the top distinct values and their record counts
This is the most useful tab for pattern tuning: if captured values look implausible (for example, a number outside the instrument's score range), the pattern needs site-specific refinement before those values can be admitted to analysis.
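The two summaries described for this tab can be sketched with the standard library alone. This is an illustrative approximation under stated assumptions: captured values arrive as strings, non-numeric captures are skipped for numeric patterns, and categorical values are compared after trimming and lower-casing. The dashboard's own normalization may differ.

```python
from collections import Counter
from statistics import median


def summarize_numeric(values):
    """Return count, min, median, and max of the values that parse as numbers."""
    numbers = []
    for raw in values:
        try:
            numbers.append(float(raw))
        except ValueError:
            continue  # a non-numeric capture is itself a hint the pattern needs tuning
    if not numbers:
        return None
    return {
        "count": len(numbers),
        "min": min(numbers),
        "median": median(numbers),
        "max": max(numbers),
    }


def top_values(values, n=5):
    """Return the n most frequent distinct values with their record counts."""
    return Counter(v.strip().lower() for v in values).most_common(n)


numeric_summary = summarize_numeric(["38", "41", "x", "12"])
categorical_summary = top_values(["Definite", "definite ", "probable"])
```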
5. Source-record snippets¶
Up to 3 representative excerpts per pattern, prioritizing unique-patient diversity. Each snippet shows pseudonym, source kind, captured value, and the surrounding 200 characters of narrative for verification.
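One way to read "prioritizing unique-patient diversity" is to take one match per patient first and fall back to repeat patients only if slots remain. The sketch below follows that reading. The function names, the dict keys, and the interpretation of the 200-character context as a total window roughly centred on the match are all assumptions, not the module's documented behaviour.

```python
def pick_snippets(matches, limit=3):
    """Choose up to `limit` matches, preferring distinct patients.

    matches: list of dicts, each with at least a 'pseudonym' key.
    """
    first_per_patient = []
    repeats = []
    seen = set()
    for match in matches:
        if match["pseudonym"] in seen:
            repeats.append(match)
        else:
            seen.add(match["pseudonym"])
            first_per_patient.append(match)
    # Repeat patients are used only when distinct patients run out.
    return (first_per_patient + repeats)[:limit]


def context_window(text, start, end, width=200):
    """Return at most `width` characters of text around the span [start, end)."""
    pad = max(0, (width - (end - start)) // 2)
    lo = max(0, start - pad)
    return text[lo:lo + width]


# Hypothetical match records, for illustration only.
matches = [
    {"pseudonym": "PT-0001", "value": "38"},
    {"pseudonym": "PT-0001", "value": "36"},
    {"pseudonym": "PT-0002", "value": "41"},
    {"pseudonym": "PT-0003", "value": "29"},
]
chosen = pick_snippets(matches)
```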
6. About¶
Methodology, privacy controls, and scope-of-use caveats.
Privacy controls (baked in)¶
- Patient identifiers replaced with PT-NNNN pseudonyms, stable within one run
- Snippet text truncated to 200 characters
- Captured values truncated to 60 characters
- Patterns matched in fewer than k unique patients (default k=2) are suppressed entirely
- Resource UUIDs are never emitted
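The controls listed above can be pictured as three small steps. The following is a minimal sketch under assumptions: pseudonyms are numbered in first-seen order, truncation is a plain character cut, and rows are `(pseudonym, pattern)` pairs. None of these details are confirmed by the module, and the helper names are hypothetical.

```python
SNIPPET_LIMIT = 200  # characters of snippet text kept
VALUE_LIMIT = 60     # characters of captured value kept


def assign_pseudonyms(patient_ids):
    """Map each identifier to PT-NNNN in first-seen order (stable within one run)."""
    mapping = {}
    for pid in patient_ids:
        if pid not in mapping:
            mapping[pid] = f"PT-{len(mapping) + 1:04d}"
    return mapping


def truncate(text, limit):
    """Cut text to at most `limit` characters."""
    return text[:limit]


def suppress_small_patterns(rows, k=2):
    """Drop every row whose pattern was seen in fewer than k unique patients."""
    patients_per_pattern = {}
    for pseudonym, pattern in rows:
        patients_per_pattern.setdefault(pattern, set()).add(pseudonym)
    return [row for row in rows if len(patients_per_pattern[row[1]]) >= k]


mapping = assign_pseudonyms(["u-9", "u-3", "u-9"])
kept = suppress_small_patterns(
    [("PT-0001", "fvc_percent"), ("PT-0002", "fvc_percent"), ("PT-0001", "c9orf72")],
    k=2,
)
```

Because the mapping is rebuilt on each run in this sketch, pseudonyms should not be used to link patients across separate reports.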
Usage¶
import note_dashboard

note_dashboard.main(
    extractions_csv_path='./note_extractions.csv',
    out_path='./note_dashboard.html',
    cohort_name='Your cohort name',
    k=2,  # k-anonymity threshold: minimum unique patients per pattern
)
Scope and limitations¶
The patterns shipped with note_extraction.py are
seed patterns calibrated against a single registry (ARC). Adopters
should validate every pattern against their own narrative corpus before
using captured values for analysis; site-specific phrasing conventions,
EHR-vendor template structures, and individual-clinician dictation
patterns vary substantially. Treat this dashboard as a
chart-review-preparation and pattern-tuning aid; do not admit numeric
values (ALSFRS-R, FVC % predicted, ECAS) into downstream analysis without
validation against the patient's structured measurement record or the
original note.
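A range check is a cheap first filter before that validation, though it does not replace it. The sketch below flags captured values that are non-numeric or fall outside an instrument's score range. The pattern names are hypothetical, the ALSFRS-R (0 to 48) and ECAS total (0 to 136) bounds are the instruments' score ranges, and the FVC cap of 150 is an assumed plausibility bound rather than a clinical limit.

```python
# Pattern names are hypothetical; adapt them to the names in your CSV.
PLAUSIBLE_RANGES = {
    "alsfrs_total": (0, 48),   # ALSFRS-R total score range
    "ecas_total": (0, 136),    # ECAS total score range
    "fvc_percent": (0, 150),   # assumption: values above 150 % are treated as suspect
}


def flag_implausible(records):
    """records: iterable of (pseudonym, pattern, captured_value_as_text).

    Returns the records whose value is non-numeric or outside the known range.
    Patterns without a known range are passed over.
    """
    flagged = []
    for pseudonym, pattern, raw in records:
        bounds = PLAUSIBLE_RANGES.get(pattern)
        if bounds is None:
            continue
        try:
            value = float(raw)
        except ValueError:
            flagged.append((pseudonym, pattern, raw))
            continue
        if not bounds[0] <= value <= bounds[1]:
            flagged.append((pseudonym, pattern, raw))
    return flagged


suspect = flag_implausible([
    ("PT-0001", "alsfrs_total", "38"),
    ("PT-0002", "alsfrs_total", "2019"),  # likely a year captured by mistake
    ("PT-0003", "fvc_percent", "n/a"),
    ("PT-0004", "onset_region", "bulbar"),
])
```

A value that passes this check can still be wrong, for example a subdomain score captured as the total, so in-range values still need comparison against the structured record or the original note.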
Visual design¶
Like the other Registry Forge demos, the dashboard is a single self-contained HTML file with no external scripts and no network calls at view time. The header uses a purple → indigo → navy gradient to visually distinguish it from the device dashboard (navy → purple) and the exposure dashboard (teal → navy → purple).
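The "no external scripts, no network calls" property can be spot-checked on any generated report. The sketch below uses the standard-library `html.parser` to list resource-loading tags whose `src` or `href` points at a remote URL. It is a hypothetical helper, not part of the module, and it does not inspect CSS `url(...)` references or URLs assembled by inline JavaScript.

```python
from html.parser import HTMLParser

# Tags that fetch a resource when the page is viewed (ordinary <a> links are ignored).
RESOURCE_TAGS = {"script", "link", "img", "iframe", "source", "video", "audio"}
REMOTE_PREFIXES = ("http://", "https://", "//")


class ExternalRefFinder(HTMLParser):
    """Collect (tag, attribute, url) triples for remotely loaded resources."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(REMOTE_PREFIXES):
                self.external.append((tag, name, value))


def find_external_refs(html_text):
    finder = ExternalRefFinder()
    finder.feed(html_text)
    return finder.external


sample_html = (
    '<script src="https://cdn.example.com/x.js"></script>'
    "<style>body{margin:0}</style>"
    '<a href="https://example.com">docs</a>'
)
external = find_external_refs(sample_html)
```

An empty result for a generated note_dashboard.html is consistent with the report being self-contained, within the limits noted above.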
Related¶
- Note extraction — module that produces the CSV this dashboard reads
- Device dashboard — companion privacy-safe dashboard for clinical equipment / ALS-care-indicator data
- Exposure dashboard — companion dashboard for environmental, occupational, and toxic exposures
- Cohort EDA report — broader cohort exploration for demographics, comorbidities, and code-vocabulary distribution