Device & equipment extraction¶

device_extraction.py walks the bundle that run_pipeline.py produces and finds mentions of medical devices and durable medical equipment (DME) — both structured codes (HCPCS Level II, SNOMED CT procedure codes, CPT-4) and unstructured mentions (regex against CCDA section narratives and decoded note text). It emits two CSVs that share a patient_id column so you can join them.

Equipment matters disproportionately for motor neuron disease and other progressive neurologic disease: speech-generating devices, BiPAP / non-invasive ventilation, cough-assist, hospital beds, power wheelchairs, feeding tubes, and all of the OT / PT / SLP referrals that come with them. Most of that information rarely lives in discrete code fields; even when DME is coded, registries often miss it because their pipeline focuses on diagnoses, labs, and medications rather than HCPCS Level II. This module covers both surfaces.

The module is best characterized as an equipment-identification ontology combining billing, procedural, terminology, and NLP concepts — not a pure code lookup and not a pure NLP extractor. Every output row makes its provenance explicit so downstream phenotyping can decide which surface to trust.

Coverage¶

The module ships a hand-curated mapping table and a regex pattern set covering the equipment classes most relevant to neurodegenerative disease:

Category	Structured codes (HCPCS / SNOMED / CPT)	Regex patterns (free text)
AAC / speech-generating devices	E2500–E2512, CPT 92605–92609, CPT 97755 (assistive-tech assessment)	"AAC device", "speech-generating device", "SGD", "speech device", "communication device", "Tobii Dynavox", "Prentke-Romich / PRC", "Accent", "NovaChat", "Lingraphica", "TouchChat", "Proloquo", "TD Pilot", "Grid Pad", "Smartbox", "eye-gaze", "eye-tracking", "communication board", "letter board", "ETRAN", "voice banking", "message banking", "ModelTalker", "VocaliD"
Wheelchairs (manual)	K0001–K0009 (canonical set; the broader E1037–E1170 range previously listed was dropped because it mixes pediatric and custom-seating items)	"wheelchair", "manual w/c", "transport chair"
Wheelchairs (power)	E1230, E1231–E1234, K0813–K0890, CPT 97542	"power wheelchair", "motorized wheelchair", "PWC", "tilt-in-space", "wheelchair tilt / recline", "sip-and-puff", "head array"
Pressure-relief seating cushions	E2601–E2608, E2624, E2625	(covered by "cushion" in the wheelchair context)
Standers	E0637, E0638, E0642	"stander", "standing frame", "sit-to-stand"
Mobility aids	E0100–E0148	"walker", "rollator", "quad cane", "crutches"
Transfer aids	E0635, E0639, E0641	"Hoyer lift", "patient lift", "mechanical lift", "ceiling lift", "transfer board / belt / sling / pole", "gait belt", "Sara Stedy", "Sabina"
Bath / toileting safety	E0163, E0240, E0241, E0244, E0247	"shower chair", "tub bench", "commode", "raised toilet seat", "grab bars"
Beds / mattresses	E0250–E0256, E0260, E0261, E0277, E0297 (E0181 moved out — it is a positioning cushion, not a bed)	"hospital bed", "alternating-pressure mattress", "low-air-loss mattress", "pressure-reducing mattress"
Respiratory equipment (high relevance for motor neuron disease)	E0470, E0471, E0472, E0464, E0465, E0466, E0481, E0482, E0483, E0500, E0600, E1390–E1392	"BiPAP", "BPAP", "bi-level", "AVAPS", "CPAP", "NIV", "NPPV", "NIPPV", "noninvasive ventilation", "sip ventilation", "home ventilator", "tracheostomy ventilator", "Trilogy", "Astral", "Vivo", "VOCSN", "Luisa", "LTV", "cough assist", "MI-E", "CoughAssist", "VitalCough", "Comfort Cough", "HFCWO", "high-frequency chest wall oscillation", "The Vest", "InCourage", "SmartVest", "AffloVest", "suction machine", "Yankauer", "oxygen concentrator", "HomeFill", "Inogen", "SimplyGo", "Eclipse", "portable oxygen", "home oxygen"
Feeding equipment	B9002, B4034–B4036 (supply kits), B4081–B4088 (tubes and accessories)	"PEG tube", "G-tube", "GJ-tube", "NG tube", "gastrostomy", "jejunostomy", "feeding pump", "enteral pump", "Kangaroo", "Joey pump", "Infinity pump", "EnteraLite", "Flexiflo", "enteral feeds / tube feeds / bolus feeds", "continuous feeds", "gravity feeds"
Orthotic / bracing	CPT 97760, 97761, 97763	"AFO", "KAFO", "HKAFO", "ankle-foot orthosis", "cervical collar", "neck brace", "head support collar", "wrist splint", "hand splint", "resting hand splint"
Environmental control / assistive tech	(no dedicated codes)	"environmental control unit", "ECU", "smart-home accessibility", "voice-controlled environment", "switch access"
Home modifications	(no dedicated codes)	"wheelchair ramp", "stairlift", "chair lift", "accessible bathroom", "grab bar installation", "home modification"
Equipment referrals	(no dedicated codes — captured by regex)	"DME order / referral / prescription", "OT eval for equipment", "PT consult for wheelchair", "SLP eval for AAC", "wheelchair fitting", "seating clinic", "home safety eval"

The dicts and pattern list at the top of the module are the right place to extend coverage; everything is plain Python with no dependencies beyond the standard library.

A note on brand-name detection¶

Brand-name patterns are kept alongside generic patterns. Real EHR narrative routinely names the device by brand ("patient uses a Trilogy 100 overnight", "trial of Tobii Dynavox", "Kangaroo pump with bolus feeds"), so collapsing everything to generic terms would lose recall. Both surfaces feed the same output; downstream phenotyping decides whether to roll brand variants up to a generic equipment class (e.g. all ventilator_brand matches under "Home ventilator").

A note on SNOMED CT¶

SNOMED CT concept IDs in this module are seed values. Before production clinical use, validate against the current SNOMED CT release for active status, preferred term, descendants, and whether you want device concepts versus procedure concepts. Hierarchies vary by implementation, and concepts are sometimes inactive or replaced. The module's source notes which SNOMED IDs are flagged for validation.

A note on the structured-versus-NLP split¶

HCPCS = equipment / supplies billing codes. CPT = clinician services / evaluations. SNOMED CT = clinical terminology. Regex = NLP extraction. The four are intentionally mixed because all four surfaces carry equipment information in real EHR data, but they do not mean the same thing — a CPT 97542 ("wheelchair management / training") tells you that a clinician spent time on wheelchair-related care, while HCPCS K0848 tells you a specific power wheelchair was ordered. Downstream phenotyping should treat the four surfaces as separate evidence rather than collapsing them.

A note on calibration¶

This module's HCPCS ranges have been tightened from previous iterations based on a structured critique of the original mappings. For high-sensitivity research screening, the current set is a reasonable starting point. For production claims phenotyping or registry abstraction, expect to do further codebook refinement against your local data — particularly around SNOMED concept validation and the manual-wheelchair classes.

Outputs¶

Two side-by-side CSVs, both UTF-8 with BOM (Excel-friendly), all cells quoted, sorted by patient_id then source.

`device_codes.csv`¶

One row per coded record in the bundle whose (vocabulary, code) matches a known equipment code. Columns:

Column	Description
`patient_id`	Patient identifier from `dashboard_data.json`
`source_kind`	Bundle category the record came from (`procedures`, `medications`, `care_plans`, etc.)
`source_id`	Record-level ID for traceability
`code_system`	Vocabulary the code is drawn from (`HCPCS`, `SNOMED-CT`, `CPT-4`)
`code`	The code itself
`recorded_display_name`	The display name the EHR provided (often empty for HCPCS)
`forge_label`	The label the module assigns from its mapping table
`category`	Equipment category (e.g. `wheelchair_power`, `respiratory`, `aac_device`)
`effective_date`	Record's effective date or start date
`status`	Record status if present

`device_extractions.csv`¶

One row per regex match in CCDA section narratives, decoded note text, and diagnostic-report narratives. Columns:

Column	Description
`patient_id`	Patient identifier
`source_kind`	`ccda_section`, `note`, or `diagnostic_report`
`source_id`	Document or section identifier for traceability
`pattern`	Internal name of the regex pattern that matched
`category`	Equipment category
`value`	The exact text the regex matched
`snippet`	±60 characters of surrounding context, for spot-review
`char_offset`	Character position of the match in the source text
`description`	Human-readable description of the pattern

The two files together let you build per-patient equipment summaries — flag a patient if either file has any row for them, count by category to track equipment uptake over time, or feed both as columns into the patient master CSV for a one-row-per-patient view.

Running it¶

From the command line, sitting in a directory with dashboard_data.json:

python device_extraction.py

From Python:

import device_extraction
device_extraction.main(
    bundle_path = './dashboard_data.json',
    out_root    = './',
)

In Colab, after the main pipeline cell:

import sys, importlib

WORK = '/content/work'
DRIVE = '/content/drive/MyDrive/ALS_TDI_complete_FINAL_PIPELINE'

if WORK not in sys.path: sys.path.insert(0, WORK)
for m in [k for k in list(sys.modules) if k == 'device_extraction']:
    del sys.modules[m]
import device_extraction

device_extraction.main(
    bundle_path = f'{WORK}/dashboard_data.json',
    out_root    = WORK,
)

# Copy the two CSVs back to Drive
import shutil
for name in ('device_codes.csv', 'device_extractions.csv'):
    shutil.copy(f'{WORK}/{name}', f'{DRIVE}/{name}')

What this is and isn't¶

Is: a focused, dependency-free module that complements the ALS-specific note extraction with everything an ALS or progressive-neurologic registry needs to track around adaptive equipment, respiratory equipment, feeding equipment, and the referral pathways that bring them in.
Is: designed for joinability — same patient_id key as patient_master.csv, note_extractions.csv, and the per-patient Phenopacket JSONs, so equipment status can be cross-referenced with diagnoses, ALSFRS-R scores, and milestone dates.
Is: explicit about what it captures versus what it doesn't — every regex pattern carries a description string so reviewers can audit the extraction rationale, and every code maps to a category for downstream grouping.
Isn't: a replacement for a clinical equipment-tracking workflow. The patterns are seed patterns; site-specific tuning against real narrative samples is expected.
Isn't: an attempt to capture supplies (catheters, dressings, batteries) that aren't the device itself. Scope is intentionally limited to the equipment that affects functional independence.