Add MIMIC-III TLS Dataset, IHM Task, and StepwiseEmbedding Model #1057
Open
akshadpai wants to merge 3 commits into sunlabuiuc:master
* test: exercise InHospitalMortalityTLS on synthetic Patient events
  Add TestInHospitalMortalityTLSCall for timestamp sort, observation window truncation, IHM label after sort, NaN and non-numeric feature coercion, feature_subset, and empty or invalid cases. Include PR_FULL_PIPELINE_FOLLOWUP.md with rubric-style contributor and paper text (avpai2, mrruth2) for GitHub PR description.
  Made-with: Cursor
* docs: add contributor headers (names, NetIDs, paper, description)
  Add course-required top-of-file metadata for avpai2/mrruth2 across TLS dataset, IHM task, StepwiseEmbedding model, example, tests, and YAML config. Remove duplicate paper line and unused os import in example.
  Made-with: Cursor
* test: shrink synthetic fixtures for faster unit tests
  Reduce synthetic patient counts and sequence lengths in TLS and StepwiseEmbedding tests to keep fixtures tiny (2-3 patients, short in-memory tensors) while preserving behavior coverage.
  Made-with: Cursor
* chore: remove PR_FULL_PIPELINE_FOLLOWUP helper doc
  Made-with: Cursor
Pull Request Description
Contributors
- avpai2
- mrruth2

Type of Contribution
Type: Full Pipeline (Dataset + Task + Model)
Original Paper
This contribution reproduces part of the pipeline from:
On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Kuznetsova et al., 2023
Paper link: https://arxiv.org/abs/2311.08902
High-Level Description
This PR implements a StepwiseEmbedding pipeline in PyHealth for TLS-preprocessed MIMIC-III time-series, targeting in-hospital mortality prediction.
Specifically, this work reproduces the step-wise embedding component of the paper, including grouped feature embeddings and ablation comparisons between direct and grouped representations.
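To make the direct-versus-grouped distinction concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the feature grouping, the random weights, and the use of a plain average to combine group embeddings are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def make_weights(in_dim, out_dim, rng):
    """Random (in_dim x out_dim) projection matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def project(x, w):
    """Linear map of vector x (length in_dim) through w (in_dim x out_dim)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def direct_embed(step, w):
    """Baseline: a single projection over all features of one timestep."""
    return project(step, w)


def grouped_embed(step, groups, group_weights):
    """Embed each feature group separately, then average the group embeddings.

    Averaging is an assumed aggregation, chosen here only for simplicity.
    """
    parts = [
        project([step[i] for i in idx], w)
        for idx, w in zip(groups, group_weights)
    ]
    return [sum(col) / len(parts) for col in zip(*parts)]


rng = random.Random(0)
embed_dim = 4
groups = [[0, 1], [2, 3, 4], [5]]  # hypothetical grouping of 6 features
series = [[rng.random() for _ in range(6)] for _ in range(3)]  # T=3, F=6

w_direct = make_weights(6, embed_dim, rng)
w_groups = [make_weights(len(g), embed_dim, rng) for g in groups]

# Both variants map a (T, F) series to a (T, embed_dim) sequence.
direct = [direct_embed(step, w_direct) for step in series]
grouped = [grouped_embed(step, groups, w_groups) for step in series]
```

Both variants produce one embedding per timestep, so a downstream sequence model can consume either; the ablation then only changes how each timestep's features are embedded.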
The contribution includes:
- MIMIC3TLSDataset, a dataset adapter for TLS-style hourly MIMIC-III time-series with 42 features per timestep
- InHospitalMortalityTLS, a binary in-hospital mortality task that converts per-stay timestep events into dense (T, F) time-series samples
- StepwiseEmbedding, a PyHealth BaseModel implementation supporting direct embedding baselines and grouped step-wise embedding variants
- Export of TLS Standard_scaled.h5 into PyHealth-compatible timeseries.csv

The paper-scale data path expects MIMIC-III to be preprocessed externally by the TLS pipeline into HDF5 format. This PR converts that HDF5 output into the flat timeseries.csv format consumed by MIMIC3TLSDataset.

Reproduction / Example Usage
The main example is:
For a lightweight smoke test with synthetic data:
The example evaluates:
File Guide
Dataset
- pyhealth/datasets/mimic3_tls.py
  - MIMIC3TLSDataset
  - export_h5_to_csv() for converting TLS HDF5 output to timeseries.csv
- pyhealth/datasets/configs/mimic3_tls.yaml
  - Defines how the timeseries.csv table is parsed by PyHealth
- pyhealth/datasets/__init__.py
  - MIMIC3TLSDataset

Task
- pyhealth/tasks/ihm_tls.py
  - InHospitalMortalityTLS
- pyhealth/tasks/__init__.py
  - InHospitalMortalityTLS

Model
- pyhealth/models/stepwise_embedding.py
  - StepwiseEmbedding and supporting embedding layers
- pyhealth/models/__init__.py
  - StepwiseEmbedding and StepwiseEmbeddingLayer

Example
- examples/mimic3tls_ihm_stepwiseembedding.py

Tests
- tests/core/test_mimic3_tls.py
- tests/core/test_stepwise_embedding.py

Tests
Run:
Latest local result:
The tests are lightweight and use only synthetic/pseudo data.
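As an illustration of the behaviors the commit messages list for the task tests (timestamp sort, observation window truncation, label read after sort, NaN and non-numeric coercion, feature_subset, empty input), a synthetic check might look like the sketch below. It does not use the PyHealth API: the event dictionaries, the `build_sample` helper, the 48-hour window, and the choice of 0.0 as the fill value are all assumptions for illustration.

```python
import math


def to_float(value):
    """Coerce a raw feature value to float; NaN or non-numeric becomes 0.0 (assumed fill)."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return 0.0
    return 0.0 if math.isnan(x) else x


def build_sample(events, feature_names, window_hours=48, feature_subset=None):
    """Turn per-timestep events for one stay into a dense (T, F) sample.

    Each event is a hypothetical dict with 'hour', 'label', and one key per feature.
    Returns None when there is nothing usable inside the observation window.
    """
    if not events:
        return None
    names = feature_subset or feature_names
    ordered = sorted(events, key=lambda e: e["hour"])  # timestamp sort
    label = int(ordered[-1]["label"])  # label is read only after sorting
    kept = [e for e in ordered if e["hour"] < window_hours]  # window truncation
    if not kept:
        return None
    series = [[to_float(e.get(n)) for n in names] for e in kept]
    return {"time_series": series, "label": label}


# Synthetic, deliberately unsorted events with messy feature values.
events = [
    {"hour": 2, "label": 1, "hr": "80", "temp": 37.0},
    {"hour": 0, "label": 1, "hr": "abc", "temp": 36.5},
    {"hour": 50, "label": 1, "hr": 90, "temp": 38.0},
    {"hour": 1, "label": 1, "hr": float("nan"), "temp": None},
]

sample = build_sample(events, ["hr", "temp"])
subset = build_sample(events, ["hr", "temp"], feature_subset=["temp"])
empty = build_sample([], ["hr", "temp"])
```

Here `sample["time_series"]` has three rows because the hour-50 event falls outside the assumed window, and `empty` is `None`.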
Notes for Reviewers
- Standard_scaled.h5
- timeseries.csv