Add MIMIC-III TLS Dataset, IHM Task, and StepwiseEmbedding Model #1057

Open
akshadpai wants to merge 3 commits into sunlabuiuc:master from akshadpai:full-pipeline-4-15

akshadpai commented Apr 21, 2026

Pull Request Description

Contributors

  • Akshad Pai, NetID: avpai2
  • Matthew Ruth, NetID: mrruth2

Type of Contribution

Type: Full Pipeline (Dataset + Task + Model)

  • Dataset
  • Task
  • Model
  • Example ablation pipeline
  • Tests
  • Documentation

Original Paper

This contribution reproduces part of the pipeline from:

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Kuznetsova et al., 2023
Paper link: https://arxiv.org/abs/2311.08902

High-Level Description

This PR implements a StepwiseEmbedding pipeline in PyHealth for TLS-preprocessed MIMIC-III time-series, targeting in-hospital mortality prediction.

Specifically, this work reproduces the step-wise embedding component of the paper, including grouped feature embeddings and ablation comparisons between direct and grouped representations.

The contribution includes:

  • MIMIC3TLSDataset, a dataset adapter for TLS-style hourly MIMIC-III time-series with 42 features per timestep
  • InHospitalMortalityTLS, a binary in-hospital mortality task that converts per-stay timestep events into dense (T, F) time-series samples
  • StepwiseEmbedding, a PyHealth BaseModel implementation supporting direct embedding baselines and grouped step-wise embedding variants
  • Feature grouping metadata for organ-system and variable-type grouping
  • A utility to convert TLS HDF5 outputs such as Standard_scaled.h5 into PyHealth-compatible timeseries.csv
  • An example script demonstrating the full dataset/task/model pipeline and ablations
  • Synthetic unit tests for the dataset/task and model
  • API documentation and docs index updates

The paper-scale data path expects MIMIC-III to be preprocessed externally by the TLS pipeline into HDF5 format. This PR converts that HDF5 output into the flat timeseries.csv format consumed by MIMIC3TLSDataset.
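
As a rough illustration of the flattening step, the sketch below writes one CSV row per (stay, hour). It is not the PR's `export_h5_to_csv()`: the function name `write_timeseries_csv`, the column names `stay_id`, `hour`, and `label`, and the in-memory input layout are all assumptions made for this example, and the HDF5 reading step is omitted because the layout of `Standard_scaled.h5` is not described here.

```python
import csv
import io


def write_timeseries_csv(stays, feature_names, out):
    """Write one CSV row per (stay, hour) from nested per-stay data.

    `stays` maps a stay id to (label, rows), where `rows` is a list of
    per-hour feature lists. The column names are hypothetical.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["stay_id", "hour", "label", *feature_names])
    for stay_id, (label, rows) in stays.items():
        for hour, values in enumerate(rows):
            if len(values) != len(feature_names):
                raise ValueError(
                    f"stay {stay_id}, hour {hour}: "
                    f"expected {len(feature_names)} values, got {len(values)}"
                )
            writer.writerow([stay_id, hour, label, *values])


# Tiny synthetic input: two features instead of the real 42.
features = ["hr", "sbp"]
stays = {
    "s1": (0, [[80.0, 120.0], [82.0, 118.0]]),
    "s2": (1, [[95.0, 100.0]]),
}
buf = io.StringIO()
write_timeseries_csv(stays, features, buf)
csv_text = buf.getvalue()
```

Writing to any file-like object keeps the sketch testable in memory; passing an open file handle instead of `io.StringIO` would produce a file on disk.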

Reproduction / Example Usage

The main example is:

python examples/mimic3tls_ihm_stepwiseembedding.py \
  --data_root /path/to/tls_csv_dir \
  --n_epochs 5 \
  --batch_size 32 \
  --hidden_dim 64

For a lightweight smoke test with synthetic data:

python examples/mimic3tls_ihm_stepwiseembedding.py \
  --synthetic \
  --n_epochs 1 \
  --batch_size 16 \
  --hidden_dim 16

The example evaluates:

  • Task ablation: 48-hour vs 24-hour observation windows
  • Input/grouping ablation: direct features vs variable-type groups vs organ-system groups
  • Model ablation: backbone-only, linear, MLP, direct FTT (FT-Transformer-style embedding), type-grouped FTT, and organ-grouped FTT

File Guide

Dataset

  • pyhealth/datasets/mimic3_tls.py

    • Adds MIMIC3TLSDataset
    • Defines the 42 TLS feature names
    • Defines organ-system and variable-type feature groupings
    • Includes export_h5_to_csv() for converting TLS HDF5 output to timeseries.csv
  • pyhealth/datasets/configs/mimic3_tls.yaml

    • Defines how the flat timeseries.csv table is parsed by PyHealth
  • pyhealth/datasets/__init__.py

    • Exports MIMIC3TLSDataset

Task

  • pyhealth/tasks/ihm_tls.py

    • Adds InHospitalMortalityTLS
    • Handles timestep sorting, observation window truncation, feature extraction, and label validation
  • pyhealth/tasks/__init__.py

    • Exports InHospitalMortalityTLS
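
The task steps listed above (sorting, window truncation, feature extraction, label validation) can be sketched in plain Python as follows. This is a minimal sketch and not the code in `ihm_tls.py`: the helper names `to_dense` and `validate_label`, the event dictionary layout, the `hour < window_hours` cutoff, and the choice to fill missing or non-numeric values with `0.0` are all assumptions for illustration.

```python
import math


def to_dense(events, feature_names, window_hours):
    """Turn unordered per-hour events into a dense T x F list of floats.

    `events` is a list of dicts with an "hour" key plus feature keys.
    Missing, NaN, or non-numeric values become 0.0 (an assumed fill value).
    """
    ordered = sorted(events, key=lambda e: e["hour"])
    kept = [e for e in ordered if e["hour"] < window_hours]
    dense = []
    for event in kept:
        row = []
        for name in feature_names:
            try:
                value = float(event.get(name))
            except (TypeError, ValueError):
                value = 0.0  # non-numeric or missing entry
            if math.isnan(value):
                value = 0.0
            row.append(value)
        dense.append(row)
    return dense


def validate_label(label):
    """Return 0 or 1 for a valid binary label, otherwise None."""
    return int(label) if label in (0, 1) else None


events = [
    {"hour": 2, "hr": "bad", "sbp": 110},
    {"hour": 0, "hr": 80, "sbp": float("nan")},
    {"hour": 50, "hr": 70, "sbp": 115},  # outside a 48-hour window
]
dense = to_dense(events, ["hr", "sbp"], window_hours=48)
```

A stay whose label fails validation, or whose window is empty, would presumably be skipped rather than emitted as a sample; the sketch leaves that decision to the caller.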

Model

  • pyhealth/models/stepwise_embedding.py

    • Adds StepwiseEmbedding and supporting embedding layers
    • Supports linear, MLP, FTT-style, and grouped FTT-style embeddings
  • pyhealth/models/__init__.py

    • Exports StepwiseEmbedding and StepwiseEmbeddingLayer
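
To make the difference between direct and grouped embeddings concrete, here is a minimal NumPy sketch. It is not the `StepwiseEmbedding` implementation: the function `grouped_step_embedding`, the linear per-group projection, the group names, and the mean aggregation across groups are assumptions chosen to keep the example small.

```python
import numpy as np


def grouped_step_embedding(x, groups, weights):
    """Embed each timestep by projecting feature groups separately.

    x:       (T, F) array of features per timestep
    groups:  dict mapping group name -> list of feature indices
    weights: dict mapping group name -> (len(indices), D) projection
    Returns a (T, D) array: the mean of the per-group embeddings.
    """
    parts = [x[:, idx] @ weights[name] for name, idx in groups.items()]
    return np.mean(parts, axis=0)


rng = np.random.default_rng(0)
T, F, D = 4, 6, 3
x = rng.normal(size=(T, F))

# Hypothetical two-group split; the real groupings cover 42 features.
groups = {"vitals": [0, 1], "labs": [2, 3, 4, 5]}
weights = {name: rng.normal(size=(len(idx), D)) for name, idx in groups.items()}
grouped = grouped_step_embedding(x, groups, weights)

# The direct baseline is the special case of a single group with all features.
all_features = {"all": list(range(F))}
direct_weights = {"all": rng.normal(size=(F, D))}
direct = grouped_step_embedding(x, all_features, direct_weights)
```

Both variants return one `D`-dimensional vector per timestep, so a downstream sequence backbone can consume either without changes.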

Example

  • examples/mimic3tls_ihm_stepwiseembedding.py

    • Demonstrates the end-to-end dataset/task/model pipeline
    • Includes synthetic mode for lightweight execution
    • Runs ablations and reports AUROC/AUPRC

Tests

  • tests/core/test_mimic3_tls.py

    • Tests dataset constants, feature groupings, and task behavior using synthetic data
  • tests/core/test_stepwise_embedding.py

    • Tests model initialization, forward pass, output shapes, gradients, and grouped configurations

Running Tests

Run:

python -m pytest tests/core/test_mimic3_tls.py tests/core/test_stepwise_embedding.py

Latest local result:

45 passed (0 failed)

The tests are lightweight and use only synthetic/pseudo data.

Notes for Reviewers

  • This PR does not reimplement the raw MIMIC → TLS preprocessing pipeline; it focuses on the PyHealth-compatible portion after TLS preprocessing
  • Real usage expects a TLS HDF5 file such as Standard_scaled.h5
  • The dataset utility converts this artifact into timeseries.csv
  • The example script includes a synthetic mode so the full pipeline can be executed without restricted MIMIC-III access

akshadpai and others added 3 commits April 15, 2026 21:32
* test: exercise InHospitalMortalityTLS on synthetic Patient events

Add TestInHospitalMortalityTLSCall for timestamp sort, observation
window truncation, IHM label after sort, NaN and non-numeric feature
coercion, feature_subset, and empty or invalid cases.

Include PR_FULL_PIPELINE_FOLLOWUP.md with rubric-style contributor
and paper text (avpai2, mrruth2) for GitHub PR description.

Made-with: Cursor

* docs: add contributor headers (names, NetIDs, paper, description)

Add course-required top-of-file metadata for avpai2/mrruth2 across TLS
dataset, IHM task, StepwiseEmbedding model, example, tests, and YAML
config. Remove duplicate paper line and unused os import in example.

Made-with: Cursor

* test: shrink synthetic fixtures for faster unit tests

Reduce synthetic patient counts and sequence lengths in TLS and
StepwiseEmbedding tests to keep fixtures tiny (2-3 patients, short
in-memory tensors) while preserving behavior coverage.

Made-with: Cursor

* chore: remove PR_FULL_PIPELINE_FOLLOWUP helper doc

Made-with: Cursor

akshadpai marked this pull request as ready for review April 21, 2026 05:46