
PR from PyHealth_for_labrador#1048

Open
Haoming0161 wants to merge 44 commits into sunlabuiuc:master from coconight01:master

@Haoming0161

Motivation
Reproduce and extend the Labrador paper in PyHealth as a Model contribution (Option 2).

Provide a clear, review-friendly implementation with complete docs, index registration, a runnable ablation example, and fast synthetic tests.

Ensure contribution quality aligns with the course rubric (implementation, documentation, code style, testing, and PR formatting).

Contributors (Group)
Ying Liang (ying24)

Haoming Qin (hqin11)

Yuhan Ding (yuhand7)

Type of Contribution
Option 2: Model contribution

Original Paper
Bellamy et al., Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data (ML4H 2024)

Link: https://arxiv.org/abs/2312.11502

High-level Description
Added LabradorModel (inherits BaseModel) with:

joint lab-code/value embedding

transformer encoder

optional classifier head

optional MLM head for categorical + continuous predictions

masking-aware MLM losses (categorical_mlm_loss, continuous_mlm_loss)
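As a rough illustration of what "masking-aware" means for the two MLM losses, the sketch below averages a cross-entropy loss (lab codes) and a squared-error loss (lab values) over masked positions only. It is framework-free for readability; the helper names `masked_categorical_loss` and `masked_continuous_loss`, the choice of cross-entropy and MSE, and the list-based inputs are assumptions, not the actual signatures or internals of the PR's `categorical_mlm_loss` / `continuous_mlm_loss`.

```python
import math
from typing import Sequence

# Hypothetical helpers; the PR's categorical_mlm_loss / continuous_mlm_loss may differ.


def masked_categorical_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean cross-entropy over masked positions only."""
    total, count = 0.0, 0
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # unmasked positions contribute nothing to the MLM loss
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


def masked_continuous_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    mask: Sequence[bool],
) -> float:
    """Mean squared error over masked positions only."""
    errors = [(p - t) ** 2 for p, t, is_masked in zip(preds, targets, mask) if is_masked]
    return sum(errors) / len(errors) if errors else 0.0
```

Returning 0.0 when nothing is masked avoids a division by zero; a PyTorch version would more likely return a zero tensor so the value can still participate in the training graph.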

Added module-level and class-level documentation with usage context and the paper citation.

Added a compact ablation example that uses synthetic data and varies model hyperparameters.

Added/updated fast synthetic unit tests covering model instantiation, forward behavior, output shapes, loss computation, the all-padding edge case, and gradient flow.

File Guide (What to Review)
Core model

pyhealth/models/labrador.py

Model API docs

docs/api/models/pyhealth.models.Labrador.rst

Docs index update

docs/api/models.rst

Example / ablation

examples/mimic4_mortality_labrador.py

Tests

tests/test_labrador.py

Testing
pytest -q tests/test_labrador.py --durations=10

python -m py_compile pyhealth/models/labrador.py tests/test_labrador.py examples/mimic4_mortality_labrador.py

Result: all Labrador tests pass; individual test calls complete in milliseconds.

Known Limits
The ablation example uses synthetic data for fast, reproducible CI-style execution; it is a demonstration script, not a full benchmark pipeline on a real dataset.

Total pytest wall-clock time includes startup and import overhead; individual test calls remain fast.

coconight01 and others added 30 commits April 9, 2026 03:51
…-implementation

Improve Labrador model tests with synthetic fixtures
…-implementation-tc2cpe

Add MLM head and losses to LabradorModel, make classifier optional, and expand tests
…-implementation-jyoaeb

Labrador: add joint value embedding, MLM head, losses, model refactor and tests
Haoming0161 and others added 14 commits April 17, 2026 14:39
Align Labrador tests & model
…-implementation-drlpou

Motivation
This PR contributes a PyHealth adaptation of the Labrador architecture for lab-centric modeling.
The goal is to provide a review-friendly model contribution with complete required components: model implementation, API docs, index registration, ablation example, and fast synthetic tests.

Paper reference:
Bellamy et al., Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data (ML4H 2024): https://arxiv.org/abs/2312.11502. 

Required Files (Course Checklist)
1) Core implementation (model)
pyhealth/models/labrador.py: implements LabradorModel, LabradorEmbedding, LabradorValueEmbedding, and LabradorMLMHead with optional classifier/MLM heads and masking-aware losses. 
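To make the idea of a joint lab-code/value embedding concrete, here is a minimal, dependency-free sketch in which each token's vector is a learned code embedding plus a linear projection of its scalar value. Element-wise addition, the random initialization, and reserving code 0 for padding are assumptions for illustration; `JointLabEmbeddingSketch` is a hypothetical class and is not the PR's `LabradorEmbedding` or `LabradorValueEmbedding`.

```python
import random
from typing import List


class JointLabEmbeddingSketch:
    """Hypothetical joint embedding: code vector + linear projection of the value.

    Not the PR's LabradorEmbedding / LabradorValueEmbedding; assumes the two
    parts are combined by element-wise addition and that code 0 is padding.
    """

    def __init__(self, vocab_size: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One vector per lab code; row 0 is zeroed to act as a padding code.
        self.code_table = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)] for _ in range(vocab_size)
        ]
        self.code_table[0] = [0.0] * hidden_dim
        # Weight and bias of a 1 -> hidden_dim linear map applied to the scalar value.
        self.value_weight = [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
        self.value_bias = [0.0] * hidden_dim

    def embed(self, codes: List[int], values: List[float]) -> List[List[float]]:
        """Return one hidden_dim vector per (code, value) pair."""
        out: List[List[float]] = []
        for code, value in zip(codes, values):
            value_vec = [w * value + b for w, b in zip(self.value_weight, self.value_bias)]
            out.append([c + v for c, v in zip(self.code_table[code], value_vec)])
        return out
```

In this sketch, because the value enters through a learned projection rather than being bucketed into discrete tokens, the same lab code with different values maps to different vectors.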

2) API docs
docs/api/models/pyhealth.models.Labrador.rst: model doc page with paper link, main model docs, supporting modules, and usage pointer. 

3) Index update
docs/api/models.rst: added Labrador to the models toctree for discoverability. 

4) Example / ablation script
examples/mimic4_mortality_labrador.py: a fast, synthetic ablation-style example that varies hidden_dim, num_layers, and dropout.
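One way to enumerate a sweep over `hidden_dim`, `num_layers`, and `dropout` is a full Cartesian grid, sketched below. Whether the example script sweeps a full grid or varies one parameter at a time is not stated here; `build_ablation_grid` is a hypothetical helper and the values in the test are illustrative assumptions.

```python
import itertools
from typing import Dict, List, Union

Config = Dict[str, Union[int, float]]


def build_ablation_grid(
    hidden_dims: List[int], num_layers: List[int], dropouts: List[float]
) -> List[Config]:
    """Hypothetical helper: every combination of the three varied hyperparameters."""
    return [
        {"hidden_dim": h, "num_layers": n, "dropout": d}
        for h, n, d in itertools.product(hidden_dims, num_layers, dropouts)
    ]
```

With two values per parameter this yields eight runs; varying one parameter at a time from a baseline needs fewer runs but cannot reveal interactions. Passing each dict to `LabradorModel` assumes its constructor accepts these exact keyword names, which is not confirmed here.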

5) Tests
tests/test_labrador.py: synthetic unit tests for embedding outputs, validation errors, MLM head outputs/losses, forward behavior, all-padding stability, and gradient flow. 
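A common source of instability with all-padding inputs is pooling: averaging over the number of non-padding tokens becomes 0 / 0, which yields NaN in floating-point tensor code. The sketch below shows masked mean pooling with a clamped denominator. Whether `LabradorModel` pools this way (rather than, say, reading a single summary token) is an assumption, and `masked_mean_pool` is a hypothetical name.

```python
from typing import List


def masked_mean_pool(
    hidden: List[List[float]], pad_mask: List[bool], eps: float = 1e-8
) -> List[float]:
    """Mean over non-padding tokens; returns zeros if every token is padding."""
    dim = len(hidden[0])
    summed = [0.0] * dim
    count = 0
    for vec, is_pad in zip(hidden, pad_mask):
        if is_pad:
            continue  # padding tokens are excluded from the average
        summed = [s + x for s, x in zip(summed, vec)]
        count += 1
    # Clamping the denominator turns the all-padding case into 0 / eps = 0.0
    # instead of a division by zero.
    denom = max(count, eps)
    return [s / denom for s in summed]
```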

Example
Run the ablation-style example:

python examples/mimic4_mortality_labrador.py
What it does:

Builds synthetic lab sequences and binary labels.

Splits train/validation sets.

Trains and evaluates several Labrador hyperparameter configurations quickly.
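The data-preparation steps above can be sketched in plain Python. The toy label rule, the right-padding convention (code 0, value 0.0), the length range, and the function names `make_synthetic_labs` and `train_val_split` are hypothetical and are not taken from `examples/mimic4_mortality_labrador.py`.

```python
import random
from typing import List, Tuple

# (lab codes, lab values, binary label); hypothetical sample layout
Sample = Tuple[List[int], List[float], int]


def make_synthetic_labs(
    n_samples: int, vocab_size: int, max_len: int, seed: int = 0
) -> List[Sample]:
    """Random lab sequences right-padded to max_len, with a toy binary label."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(n_samples):
        length = rng.randint(1, max_len)
        codes = [rng.randint(1, vocab_size - 1) for _ in range(length)]  # 0 is reserved for padding
        values = [rng.random() for _ in range(length)]
        label = int(sum(values) / length > 0.5)  # toy rule so the label is learnable
        pad = max_len - length
        data.append((codes + [0] * pad, values + [0.0] * pad, label))
    return data


def train_val_split(
    data: List[Sample], val_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle with a fixed seed, then hold out val_frac of the samples."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]
```

Fixed seeds keep the split and the data identical across runs, which matches the goal of fast, reproducible execution.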

Tests
Executed commands:

pytest -q tests/test_labrador.py --durations=10
python -m py_compile pyhealth/models/labrador.py examples/mimic4_mortality_labrador.py
Observed:

tests/test_labrador.py: all tests passed.

Per-case timings are in the millisecond range, as expected for fast unit tests.

Compile checks passed. 

Known Limits
The example script is intentionally synthetic for speed and reproducibility; it is not intended as a full real-dataset benchmark. 

Total pytest wall-clock time includes Python/pytest startup and module-import overhead; per-test-case execution remains very fast.
…-implementation-dr4v5o

Implement Labrador model with MLM/value heads, docs, example, and tests
Polish Labrador model docs and add structured ablation example
…-for-model-implementation-ghfhpt

Add Labrador model (embeddings, MLM head), docs, example, and tests