
Add CCEP ECoG dataset, Localize SOZ task, and SPES models#1027

Open
mkoretsky1 wants to merge 7 commits into sunlabuiuc:master from dtavana:ccep_ecog

Conversation

@mkoretsky1

Contributors

Mathew Koretsky (NetID: mathewk4) (mathewk4@illinois.edu)
Kevin Splinter (NetID: kevints5) (kevints5@illinois.edu)
Darian Tavana (NetID: dtavana2) (dtavana2@illinois.edu)

Type of Contribution

Full Pipeline: Dataset + Task + Model (Option 4)

Original Paper

Norris, J.; Chari, A.; van Blooijs, D.; Cooray, G. K.; Friston, K.; Tisdall, M. M.; and Rosch, R. E. 2024. Localising the Seizure Onset Zone from Single-Pulse Electrical Stimulation Responses with a CNN Transformer. Proceedings of the 9th Machine Learning for Healthcare Conference, volume 252 of Proceedings of Machine Learning Research. PMLR. (https://proceedings.mlr.press/v252/norris24a.html).

Description

Adds a full PyHealth pipeline for electrode-level seizure onset zone (SOZ) localization from CCEP ECoG recordings, based on Norris et al. (2024). Includes a BIDS-compatible dataset class (CCEPECoGDataset), a task class (LocalizeSOZ) that preprocesses raw stimulation EEG into paired divergent/convergent response tensors, and two models (SPESResNet, SPESTransformer) for binary SOZ classification. An end-to-end example script demonstrates 5-fold patient-level cross-validation with per-fold normalization and an ablation study isolating the contribution of trial variability and the hybrid embedding in the transformer model.

Files to Review

  • pyhealth/datasets/ccep_ecog.py - CCEP ECoG dataset implementation
  • pyhealth/tasks/localize_soz.py - Localize SOZ task implementation
  • pyhealth/models/spes.py - SPES models implementation
  • tests/core/test_ccep_ecog.py - Dataset and task test suite
  • tests/core/test_spes.py - Model test suite
  • examples/ccep_ecog_localize_soz_spes.py - End-to-end example demonstrating dataset, task, and model usage
  • pyproject.toml - Added requirements specific to this implementation

@joshuasteier joshuasteier self-requested a review April 19, 2026 21:04
@joshuasteier
Collaborator

Hi @mkoretsky1 , Kevin, and Darian, thanks for putting this together. A few items I'd like to see addressed, plus some smaller things.

Blockers

1. The Destrieux atlas .rda file may be missing.

get_destrieux_lobe in localize_soz.py does:

rda_path = Path(__file__).resolve().parent.parent / "destrieux.rda"
_destrieux_df = pyreadr.read_r(str(rda_path))["destrieux"]

That resolves to pyhealth/destrieux.rda, but I don't see the file in the PR diff. If it's not committed, the except Exception: return "Unknown" fallback silently fires on every electrode and every sample gets electrode_lobe = region_to_index["Unknown"]. Could you confirm the .rda is either added to the PR or document how users obtain it? Either way, please change the silent except to log a warning (once per process) when the file is missing so this failure mode is visible.

2. Bare except: with state leak in process_metric_for_analysis.

try:
    ...
    response_df['distances'] = distances
    response_df = response_df.sort_values(by='distances', ascending=True)
except:
    print("Error with subject", subject)

The bare except: catches KeyboardInterrupt and SystemExit, prints, then the code keeps going with response_df and distances in whatever state they were in at the point of failure. Please change to except Exception as e: with a logger.warning and either return None or re-raise — continuing after a caught exception with partial state is dangerous.
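Concretely, the return-None variant could look like this (a sketch; `attach_distances` is a hypothetical helper wrapping the failing lines, not a function in the PR):

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def attach_distances(response_df, distances, subject):
    """Attach a distances column and sort by it. On failure, log a warning
    and return None instead of continuing with partially mutated state."""
    try:
        response_df = response_df.copy()  # avoid mutating the caller's frame
        response_df["distances"] = distances
        return response_df.sort_values(by="distances", ascending=True)
    except Exception as e:
        logger.warning("Error with subject %s: %s", subject, e)
        return None
```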

3. Metadata CSV written into the data root.

output_path = os.path.join(root, "ccep_ecog-metadata-pyhealth.csv")
df.to_csv(output_path, index=False)

BIDS datasets are often on mounted storage with restricted write permissions. Matt McKenna's fix for the same pattern in #981 shipped as:

def __init__(self, root: str = ".", config_path: Optional[str] = ..., **kwargs):
    self._verify_data(root)
    self._tmp_dir = tempfile.mkdtemp(prefix="pyhealth_ccep_ecog_")
    self._index_data(root, self._tmp_dir)
    super().__init__(
        root=self._tmp_dir,
        tables=["ecog"],
        ...
    )

def __del__(self):
    shutil.rmtree(self._tmp_dir, ignore_errors=True)

Could you mirror that pattern? _index_data writes the CSV into the tmp dir, the tmp dir becomes the root passed to super().__init__ so the YAML's file_path resolves correctly, and __del__ cleans up.

4. dataset.set_shuffle(...) will crash on real data.

In get_spes_dataloader:

def get_spes_dataloader(dataset, batch_size, shuffle=False, norm_stats=None):
    dataset.set_shuffle(shuffle)
    return DataLoader(...)

set_shuffle is only defined on InMemorySampleDataset, not on the disk-backed SampleDataset that BaseDataset.set_task(...) returns. Your example path is CCEPECoGDataset(root=...).set_task(LocalizeSOZ(), ...), which returns the disk-backed class. The first time anyone runs the example end-to-end against real ds004080 data, this will raise AttributeError: 'SampleDataset' object has no attribute 'set_shuffle'.

Simplest fix: drop the set_shuffle call and pass shuffle=shuffle directly to DataLoader(...). The in-memory shuffle hook isn't buying you anything the DataLoader can't do.
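That is, something along these lines (a sketch; any extra `DataLoader` arguments from your current call are elided):

```python
from torch.utils.data import DataLoader


def get_spes_dataloader(dataset, batch_size, shuffle=False, norm_stats=None):
    # Shuffling is delegated to the DataLoader itself, which works for both
    # the in-memory and disk-backed SampleDataset variants.
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```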

Code quality (non-blocking)

  1. process_metric_for_analysis has several commented-out np.save('../data/main/...') lines with hardcoded relative paths. Please remove.

  2. process_run_data and process_metric_for_analysis use print(...) for status and error reporting; the dataset file uses logger. Please convert the print calls to logger.warning / logger.info for consistency.

  3. LocalizeSOZ.__call__ iterates over ("ecog", "train", "eval") splits, but only ecog is a real table in the YAML. The dummy patient in the test deliberately returns empty for train/eval. Is iterating over those two extra splits intentional future-proofing, or vestigial? If vestigial, drop them.

  4. 25 commits with several "Update localize_soz.py" messages. Please squash to a small number of logical commits before merge (dataset, task, models, example + tests, docs is a reasonable split).

  5. LocalizeSOZ tests mock process_run_data and process_for_analysis entirely, so the MNE/BIDS epoching and the 13 mm distance filter are not exercised in tests. A single end-to-end test using a tiny synthetic BrainVision file (maybe 4 channels, 2 stim events) would lock in the actual preprocessing behaviour. Not a blocker, but worth considering as a follow-up.

Minor

  1. SPESResNet._sample_channels samples with replacement when len(valid_rows) < self.input_channels. A one-line comment explaining that duplicated channels are acceptable here would help future readers.

  2. In _index_data, has_soz is set from the first electrodes.tsv found and then breaks. If a patient has multiple sessions with different labeling completeness, the single has_soz flag may not reflect the specific session on each downstream row. Low priority but worth a look.
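On minor item 1, the comment plus fallback might read something like this (a sketch; the standalone helper signature is assumed, not taken from the PR):

```python
import numpy as np


def sample_channels(valid_rows, input_channels, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample with replacement only when there are fewer valid channels than
    # requested; duplicated channels are acceptable here.
    replace = len(valid_rows) < input_channels
    return rng.choice(valid_rows, size=input_channels, replace=replace)
```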

The model implementations are clean, and the per-fold normalization is done correctly. Once the blockers are addressed I'm happy to approve.

@mkoretsky1
Author

mkoretsky1 commented Apr 19, 2026

@joshuasteier We've fixed the blockers and made some code quality updates. When you have a chance could you please re-review the PR?

The one thing we're unsure about is blocker 1, the Destrieux atlas .rda file. For now, we have hard-coded the brain region indexes into pyhealth/tasks/localize_soz.py. Please let us know if this is sufficient or if you would prefer to have the file itself, and if so, where that file should go in the PyHealth repo. Thanks!
