Clinical-note extraction dashboard

Preview — under active development. This module is still being refined. Patterns, mappings, and category boundaries should be validated against your own corpus before being relied on for analysis or publication.

note_dashboard.py is the visualization layer for note_extraction.py. It reads note_extractions.csv and emits a single, self-contained HTML report with no external dependencies. The report shows the ALS-specific clinical content captured from unstructured narrative: functional rating scores, FVC % predicted, diagnostic certainty, family history, genetic mutations, FTD spectrum mentions, and dated treatment milestones. The controls listed under "Privacy controls (baked in)" are applied to everything the report displays.

Live demo

Pre-loaded against a synthetic ALS cohort.

Input

File | Description
note_extractions.csv | Regex matches against notes[].narrative_text and documents[].plain_text for ALSFRS-R total + 4 subdomains, ECAS / FTD spectrum, El Escorial certainty, onset region, family history (negative / positive / genetic mutation), FVC % predicted, and dated treatment milestones (PEG / tracheostomy / NIV / riluzole / edaravone start).

Produced by note_extraction.py against the same bundle the other modules consume.
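
The shipped patterns live in note_extraction.py and are not reproduced here. As a rough illustration of what "a regex match with a captured value" means for one of the fields above, the sketch below uses a hypothetical FVC % predicted pattern. The pattern itself and the name find_fvc_percent are assumptions made for this example, not the module's actual code.

```python
import re

# Hypothetical seed pattern, NOT the one shipped in note_extraction.py.
# It looks for "FVC", then up to 40 characters on the same line, then a
# 1-3 digit number followed by "%" and "pred" / "predicted".
FVC_PATTERN = re.compile(
    r"\bFVC\b[^\n]{0,40}?(\d{1,3})\s*%\s*(?:of\s+)?pred(?:icted)?",
    re.IGNORECASE,
)


def find_fvc_percent(text):
    """Return every captured FVC % predicted value in the text as an int."""
    return [int(m.group(1)) for m in FVC_PATTERN.finditer(text)]


print(find_fvc_percent("FVC was 72% predicted today."))
print(find_fvc_percent("Seated FVC 2.1 L (64 % of predicted)"))
```

A pattern this loose will misfire on some phrasings, which is the reason captured values need review before use.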

Output

A single HTML file — note_dashboard.html — with six tabs:

1. Overview by category

Patient counts and total record counts grouped into six clinical-content categories: ALSFRS-R, Pulmonary, Diagnosis, Family history & genetics, ECAS / FTD, and Treatment milestones.
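
The two counts on this tab can be sketched as a single pass over the extraction rows. The column names patient_id and category are assumptions for illustration; the real CSV header may differ.

```python
from collections import defaultdict

# Hypothetical rows; the real column names in note_extractions.csv may differ.
sample_rows = [
    {"patient_id": "p1", "category": "ALSFRS-R"},
    {"patient_id": "p1", "category": "ALSFRS-R"},
    {"patient_id": "p2", "category": "ALSFRS-R"},
    {"patient_id": "p2", "category": "Pulmonary"},
]


def summarize_by_category(rows):
    """Count distinct patients and total records for each category."""
    patients = defaultdict(set)
    records = defaultdict(int)
    for row in rows:
        patients[row["category"]].add(row["patient_id"])
        records[row["category"]] += 1
    return {
        cat: {"patients": len(patients[cat]), "records": records[cat]}
        for cat in records
    }


summary = summarize_by_category(sample_rows)
print(summary)
```

Keeping the two numbers side by side shows whether a category is driven by many patients or by a few patients with many notes.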

2. All patterns

Filterable per-pattern table with patient counts. Filter by pattern name (e.g. "fvc", "alsfrs", "el_escorial") or category.

3. Patient × pattern matrix

Heatmap-style grid of pseudonymized patients (PT-NNNN) against patterns. Useful for separating patients with complete documentation profiles from those with partial ones.
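
A minimal sketch of how such a grid could be assembled, assuming each row carries a pseudonym and a pattern name. Both field names and the pattern names in the sample are hypothetical.

```python
def build_matrix(rows):
    """Return (patients, patterns, grid); grid[i][j] is a record count."""
    patients = sorted({r["pseudonym"] for r in rows})
    patterns = sorted({r["pattern"] for r in rows})
    row_index = {p: i for i, p in enumerate(patients)}
    col_index = {c: j for j, c in enumerate(patterns)}
    grid = [[0] * len(patterns) for _ in patients]
    for r in rows:
        grid[row_index[r["pseudonym"]]][col_index[r["pattern"]]] += 1
    return patients, patterns, grid


def fully_documented(patients, grid):
    """Patients with at least one record for every pattern in the grid."""
    return [p for p, counts in zip(patients, grid) if all(counts)]


# Hypothetical, already-pseudonymized rows.
matrix_rows = [
    {"pseudonym": "PT-0001", "pattern": "alsfrs_total"},
    {"pseudonym": "PT-0001", "pattern": "fvc_percent"},
    {"pseudonym": "PT-0002", "pattern": "alsfrs_total"},
    {"pseudonym": "PT-0001", "pattern": "alsfrs_total"},
]
patients, patterns, grid = build_matrix(matrix_rows)
print(patients, patterns, grid)
print(fully_documented(patients, grid))
```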

4. Captured-value distribution

  • Numeric patterns (ALSFRS-R total + subdomains, ECAS total + ALS-specific, FVC % predicted): shows captured-value count, min, median, and max
  • Categorical / freetext patterns (El Escorial certainty, onset region, genetic mutation): shows the top distinct values and their record counts

This is the most useful tab for pattern tuning — if you see strange captured values, the pattern needs site-specific refinement before those values can be admitted to analysis.
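
The two summaries can be sketched as follows. This is an illustration under assumptions, not the module's code: captured values are taken to arrive as strings, non-numeric captures are skipped for numeric patterns, and categorical values are lower-cased before counting.

```python
from collections import Counter
from statistics import median


def summarize_numeric(values):
    """Count, min, median, and max of the captures that parse as numbers."""
    nums = []
    for v in values:
        try:
            nums.append(float(v))
        except ValueError:
            continue  # a non-numeric capture; worth inspecting during tuning
    if not nums:
        return None
    return {
        "count": len(nums),
        "min": min(nums),
        "median": median(nums),
        "max": max(nums),
    }


def top_values(values, n=5):
    """Most frequent distinct values, ignoring case and outer whitespace."""
    normalized = (v.strip().lower() for v in values)
    return Counter(normalized).most_common(n)


print(summarize_numeric(["38", "41", "x", "29"]))
print(top_values(["Definite", "definite ", "Probable"]))
```

An ALSFRS-R total outside 0 to 48 in the numeric summary, for instance, is a quick sign that the pattern is capturing the wrong number.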

5. Source-record snippets

Up to 3 representative excerpts per pattern, prioritizing unique-patient diversity. Each snippet shows pseudonym, source kind, captured value, and the surrounding 200 characters of narrative for verification.
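
One way to prioritize unique-patient diversity is to take the first record from each patient before allowing any patient a second excerpt. The sketch below assumes that strategy and hypothetical field names (pseudonym, snippet); the module may select differently.

```python
def pick_snippets(records, limit=3, max_chars=200):
    """Pick up to `limit` records for one pattern, new patients first."""
    first_per_patient, repeats, seen = [], [], set()
    for rec in records:
        if rec["pseudonym"] not in seen:
            seen.add(rec["pseudonym"])
            first_per_patient.append(rec)
        else:
            repeats.append(rec)
    # Repeats from an already-seen patient are used only to fill remaining slots.
    picked = (first_per_patient + repeats)[:limit]
    return [{**rec, "snippet": rec["snippet"][:max_chars]} for rec in picked]


# Hypothetical records for a single pattern.
snippet_records = [
    {"pseudonym": "PT-0001", "snippet": "a" * 300},
    {"pseudonym": "PT-0001", "snippet": "second note"},
    {"pseudonym": "PT-0002", "snippet": "b"},
    {"pseudonym": "PT-0003", "snippet": "c"},
]
chosen = pick_snippets(snippet_records)
print([rec["pseudonym"] for rec in chosen])
```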

6. About

Methodology, privacy controls, and scope-of-use caveats.

Privacy controls (baked in)

  • Patient identifiers replaced with PT-NNNN pseudonyms, stable within one run
  • Snippet text truncated to 200 characters
  • Captured values truncated to 60 characters
  • Patterns matched in fewer than k unique patients are suppressed entirely (k = 2 in the example under Usage)
  • Resource UUIDs are never emitted
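
The controls above can be sketched as one filtering pass. This is a minimal illustration under assumptions: the field names are hypothetical, pseudonyms are numbered in order of first appearance, and identifiers are kept out of the output by copying only an explicit set of fields.

```python
def apply_privacy_controls(rows, k=2, snippet_chars=200, value_chars=60):
    """Pseudonymize, truncate, and suppress small-cell patterns."""
    pseudonyms = {}
    patients_per_pattern = {}
    for row in rows:
        # Stable within this call: the same patient_id always maps to one PT-NNNN.
        pseudonyms.setdefault(row["patient_id"], f"PT-{len(pseudonyms) + 1:04d}")
        patients_per_pattern.setdefault(row["pattern"], set()).add(row["patient_id"])

    safe = []
    for row in rows:
        if len(patients_per_pattern[row["pattern"]]) < k:
            continue  # fewer than k unique patients: drop the whole pattern
        # Copy only the fields meant for display; raw IDs never pass through.
        safe.append({
            "pseudonym": pseudonyms[row["patient_id"]],
            "pattern": row["pattern"],
            "value": row["value"][:value_chars],
            "snippet": row["snippet"][:snippet_chars],
        })
    return safe


# Hypothetical input rows.
raw_rows = [
    {"patient_id": "uuid-a", "pattern": "fvc", "value": "72", "snippet": "x" * 500},
    {"patient_id": "uuid-b", "pattern": "fvc", "value": "v" * 90, "snippet": "short"},
    {"patient_id": "uuid-a", "pattern": "rare", "value": "1", "snippet": "only one"},
]
safe_rows = apply_privacy_controls(raw_rows)
print(safe_rows)
```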

Usage

import note_dashboard

# Build the HTML report from the CSV produced by note_extraction.py.
note_dashboard.main(
    extractions_csv_path='./note_extractions.csv',
    out_path='./note_dashboard.html',
    cohort_name='Your cohort name',
    k=2,  # k-anonymity threshold
)

Scope and limitations

The patterns shipped with note_extraction.py are seed patterns calibrated against a single registry (ARC). Adopters should validate every pattern against their own narrative corpus before using captured values for analysis; site-specific phrasing conventions, EHR-vendor template structures, and individual-clinician dictation patterns vary substantially. Treat this dashboard as a chart-review-preparation and pattern-tuning aid; do not admit numeric values (ALSFRS-R, FVC%, ECAS) into downstream analysis without validation against the patient's structured measurement record or the original note.
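
The validation step recommended above could start from a check like the one below. It is only a sketch: the function name needs_review, its arguments, and the tolerance handling are assumptions, and a real workflow would also need to match the captured value to the correct structured measurement by patient and date.

```python
def needs_review(captured, structured, tolerance=0.0):
    """True when a captured value is not confirmed by the structured record."""
    if structured is None:
        return True  # nothing to validate against; go back to the original note
    try:
        return abs(float(captured) - float(structured)) > tolerance
    except ValueError:
        return True  # the capture is not a number at all


print(needs_review("38", 38))
print(needs_review("38", 35))
print(needs_review("70", 72, tolerance=2))
```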

Visual design

Like the other Registry Forge demos, the dashboard is a single self-contained HTML file with no external scripts and no network calls at view time. The header uses a purple → indigo → navy gradient to visually distinguish it from the device dashboard (navy → purple) and the exposure dashboard (teal → navy → purple).

Related modules

  • Note extraction — module that produces the CSV this dashboard reads
  • Device dashboard — companion privacy-safe dashboard for clinical equipment / ALS-care-indicator data
  • Exposure dashboard — companion dashboard for environmental, occupational, and toxic exposures
  • Cohort EDA report — broader cohort exploration for demographics, comorbidities, and code-vocabulary distribution