PTB-XL, ECG-QA datasets and preprocess, resampling tasks #1051
Open
yiyunw3 wants to merge 37 commits into sunlabuiuc:master from
add: ecg-qa dataset
feat: implement ECG-QA dataset download capability and testing
Add: PTBXLDataset for PyHealth
use BaseDataset
modify task to do resampling
add: ECGQA example and update datasets and tasks
Rename files and task classes for consistency
Contributor: Jovian Wang (jovianw2@illinois.edu), Matthew Pham (mdpham2@illinois.edu), Yiyun Wang (yiyunw3@illinois.edu)
Contribution Type: Dataset + task
Original paper: https://arxiv.org/abs/2410.14464
Original datasets:
Description
This PR includes a dataset + task contribution.
We added two new PyHealth datasets, for PTB-XL and ECG-QA, along with a preprocessing task, a resampling task, and an ECG-QA example that shows how the datasets can be used.
Our main goal is to reproduce and extend the multimodal meta-learning framework for few-shot ECG question answering described in the paper, by exploring how including more patient information, such as age and gender, in the ECG questions helps improve the overall accuracy of the output.
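As a rough illustration of what adding patient information to a question could look like, the sketch below prefixes an ECG-QA question with age and sex. The helper `add_demographics` and its prompt template are hypothetical, not the format used in this PR or in the paper, and the sex value is assumed to be already decoded to a string.

```python
from typing import Optional


def add_demographics(question: str, age: Optional[float] = None, sex: Optional[str] = None) -> str:
    """Prefix a question with known patient demographics.

    Hypothetical helper: the "Patient age X, sex Y." template is an
    assumption for illustration only.
    """
    context = []
    if age is not None:
        context.append(f"age {int(age)}")
    if sex:
        context.append(f"sex {sex.lower()}")
    if not context:
        # No demographics available: leave the question unchanged.
        return question
    return f"Patient {', '.join(context)}. {question}"


# Example: add_demographics("Does this ECG show atrial fibrillation?", age=63, sex="Male")
# returns "Patient age 63, sex male. Does this ECG show atrial fibrillation?"
```

Keeping the demographics as a plain-text prefix means the question encoder needs no architectural change; a structured side input would be an alternative design.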
Adding these two datasets to PyHealth is also valuable in its own right: neither was previously available in PyHealth, and PyHealth's dataset and task features remove much of the complexity of reproducing this work.
Files to Review
Datasets
ecgqa.py (ecgqa.yaml)
ptbxl.py (ptbxl.yaml)
The following features apply to both datasets:
Tasks
ptbxl_resampling.py
ecgqa_preprocess.py
The ptbxl_resampling task standardizes PTB-XL data for the FSL ECG QA model. It uses Fourier-based interpolation (scipy.signal.resample) to downsample 12-lead ECG signals from 500 Hz to 250 Hz, changing the data shape from (12 x 5000) to (12 x 2500) while preserving morphological integrity. The task output is also formatted for multi-label classification, reflecting the clinical reality that a patient can have multiple co-occurring cardiac diagnoses.
The ecg_preprocess task optionally joins the QA dataset with an ECG signal dataset (such as PTB-XL) on patient_id, producing a combined output for efficient few-shot training. It also generates a key for episodic sampling. The outputs of the two tasks can then be fed directly into existing training pipelines for the few-shot ECG question-answering framework we are studying.
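A minimal sketch of the two steps described for ptbxl_resampling, assuming each record is a NumPy array of shape (leads, samples) and the labels come from a fixed vocabulary. `downsample_ecg` and `multi_hot` are hypothetical helpers, not the task's actual code; the PTB-XL diagnostic superclasses in the usage line are only an example vocabulary.

```python
import numpy as np
from scipy.signal import resample


def downsample_ecg(signal: np.ndarray, orig_fs: int = 500, target_fs: int = 250) -> np.ndarray:
    """Resample a (leads, samples) ECG array along the time axis.

    scipy.signal.resample works in the frequency domain (FFT-based),
    so 10 s at 500 Hz (12 x 5000) becomes 10 s at 250 Hz (12 x 2500).
    """
    n_samples = signal.shape[1]
    n_target = int(round(n_samples * target_fs / orig_fs))
    return resample(signal, n_target, axis=1)


def multi_hot(codes, vocabulary) -> np.ndarray:
    """Encode co-occurring diagnosis codes as a multi-hot target vector."""
    index = {code: i for i, code in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary), dtype=np.float32)
    for code in codes:
        if code in index:  # codes outside the vocabulary are ignored
            vec[index[code]] = 1.0
    return vec


# Usage (example vocabulary, assumed):
# x = np.random.randn(12, 5000); downsample_ecg(x).shape  -> (12, 2500)
# multi_hot(["MI", "HYP"], ["NORM", "MI", "STTC", "CD", "HYP"])  -> [0, 1, 0, 0, 1]
```

A multi-hot target pairs naturally with a per-class sigmoid and binary cross-entropy loss, which allows several diagnoses to be active at once, unlike a single softmax label.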
Example
ecgqa_fsl.py
This example runs the full preprocessing pipeline, combining the ECG signals from PTB-XL with the questions and answers from the ECG-QA dataset.
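The combining step, plus grouping by an episode key for few-shot sampling, could be sketched in plain Python as follows. The field names (`patient_id`, `signal`, `question_type`, `answer`), the `question_type|answer` episode key, and the helpers `join_qa_with_ecg`, `group_by_episode_key`, and `sample_episode` are all assumptions for illustration; the PR's preprocessing task may build its key differently.

```python
import random
from collections import defaultdict


def join_qa_with_ecg(qa_samples, ecg_samples):
    """Inner-join QA samples with ECG samples on patient_id (one row per match)."""
    ecg_by_patient = defaultdict(list)
    for ecg in ecg_samples:
        ecg_by_patient[ecg["patient_id"]].append(ecg)
    joined = []
    for qa in qa_samples:
        for ecg in ecg_by_patient.get(qa["patient_id"], []):
            row = {**qa, "signal": ecg["signal"]}
            # Hypothetical episode key: samples sharing it form one "class".
            row["episode_key"] = f"{qa['question_type']}|{qa['answer']}"
            joined.append(row)
    return joined


def group_by_episode_key(samples):
    """Bucket joined samples by their episode key."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["episode_key"]].append(s)
    return dict(groups)


def sample_episode(groups, n_way, k_shot, rng):
    """Draw an N-way K-shot episode from keys with at least k_shot samples."""
    eligible = sorted(k for k, v in groups.items() if len(v) >= k_shot)
    keys = rng.sample(eligible, n_way)
    return {k: rng.sample(groups[k], k_shot) for k in keys}


# Usage: episode = sample_episode(group_by_episode_key(joined), 2, 1, random.Random(0))
```

Precomputing the key once during preprocessing means the episodic sampler only has to index into buckets at training time instead of re-scanning the joined data for every episode.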
The workflow is:
Unit tests
test_ecgqa.py
test_ptbxl.py