Dev #320
* hackathon 2024 draft
* Update 202410-sunlab-hackthon.md
* update intro based on 10/18 meeting
* add more intro
* update mentor
* merged commits for multimodal pr
* fix small bug in Mortality30DaysMIMIC4

Co-authored-by: Jimeng Sun <jimeng.sun@gmail.com>
Testing PR. Deleted the file I accidentally pushed earlier. The contribution guide seems to be correct so far.
Medical Coding Pull Request, Reviewer Notes: Medical Coding demo here: https://colab.research.google.com/drive/1ThYP_5ng5xPQwscv5XztefkkoTruhjeK?usp=sharing
1. A Patient is now a sequence of events.
2. Updated the Patient class to initialize with a Polars DataFrame for event management.
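A minimal sketch of the event-sequence idea described above. It deliberately uses plain Python dataclasses instead of a Polars DataFrame so that it runs without dependencies; `Event`, `PatientSketch`, and `get_events` are hypothetical names for illustration and are not taken from the PyHealth code in this PR.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    """One timestamped record from any source table (hypothetical)."""
    event_type: str
    timestamp: datetime
    attrs: Dict[str, Any] = field(default_factory=dict)


class PatientSketch:
    """A patient represented as a single time-ordered sequence of events."""

    def __init__(self, patient_id: str, events: List[Event]):
        self.patient_id = patient_id
        # Keep the stream sorted so downstream tasks can rely on the order.
        self.events = sorted(events, key=lambda e: e.timestamp)

    def get_events(self, event_type: Optional[str] = None) -> List[Event]:
        """Return all events, or only those of the given type."""
        if event_type is None:
            return list(self.events)
        return [e for e in self.events if e.event_type == event_type]


patient = PatientSketch("p1", [
    Event("diagnosis", datetime(2024, 1, 3), {"code": "I10"}),
    Event("admission", datetime(2024, 1, 1)),
    Event("diagnosis", datetime(2024, 1, 2), {"code": "E11"}),
])
codes = [e.attrs["code"] for e in patient.get_events("diagnosis")]
```

In a DataFrame-backed version, the same filter would presumably become a column filter on an event-type column rather than a list comprehension.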
- Unified APIs for all modalities.
- Enabled data loading based on YAML configs.
- Switched to Polars backend.
- Removed deprecated base_dataset, sample_dataset.
- Renamed base_dataset_v2 as base_dataset.
- Renamed sample_dataset_v2 as sample_dataset.
- Moved padding to collate_fn.
- Cleaned up unused featurizer classes.
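To illustrate what moving padding into the collate step can look like, here is a rough sketch that pads variable-length code sequences at batch time. It uses plain Python lists rather than tensors, and the function name, the `codes`/`label` keys, and the mask output are assumptions made for this example, not the PR's actual `collate_fn`.

```python
from typing import Any, Dict, List


def collate_with_padding(batch: List[Dict[str, Any]], pad_value: int = 0) -> Dict[str, list]:
    """Pad every sample's `codes` to the longest sequence in this batch."""
    max_len = max(len(sample["codes"]) for sample in batch)
    padded, mask = [], []
    for sample in batch:
        codes = sample["codes"]
        n_pad = max_len - len(codes)
        padded.append(codes + [pad_value] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(codes) + [0] * n_pad)
    return {
        "codes": padded,
        "mask": mask,
        "label": [sample["label"] for sample in batch],
    }


batch = [
    {"codes": [5, 8, 2], "label": 1},
    {"codes": [7], "label": 0},
]
collated = collate_with_padding(batch)
```

Padding per batch rather than per dataset means each batch is only as wide as its own longest sample.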
Simplified the MIMIC4Dataset class by merging loading functions. Introduced a YAML configuration file for dataset management, detailing file paths and attributes for various tables.
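As a rough illustration of config-driven loading, the sketch below uses a Python dict standing in for a parsed YAML file and turns each configured table into flat event records. The schema keys (`file_path`, `patient_id`, `timestamp`, `attributes`), the table and column names, and `load_events` are all assumptions for this example; the real config files in this PR may be organized differently.

```python
import csv
import io
from typing import Callable, Dict, Iterator, TextIO

# What a parsed YAML config might look like (hypothetical schema).
CONFIG = {
    "tables": {
        "diagnoses": {
            "file_path": "diagnoses.csv",
            "patient_id": "subject_id",
            "timestamp": "charttime",
            "attributes": ["icd_code"],
        },
    },
}


def load_events(config: dict, open_file: Callable[[str], TextIO]) -> Iterator[Dict[str, str]]:
    """Yield one flat event dict per row of every configured table."""
    for table_name, spec in config["tables"].items():
        with open_file(spec["file_path"]) as handle:
            for row in csv.DictReader(handle):
                event = {
                    "patient_id": row[spec["patient_id"]],
                    "event_type": table_name,
                    "timestamp": row[spec["timestamp"]],
                }
                for attr in spec["attributes"]:
                    event[attr] = row[attr]
                yield event


# In-memory stand-in for files on disk, so the sketch runs anywhere.
FILES = {"diagnoses.csv": "subject_id,charttime,icd_code\n1,2024-01-02,E11\n"}
events = list(load_events(CONFIG, lambda path: io.StringIO(FILES[path])))
```

With this shape, supporting a new table is a config change rather than a new loading function, which is the kind of simplification the commit describes.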
- Renamed `TaskTemplate` to `BaseTask`.
- Introduced `InHospitalMortalityMIMIC4`.
- Introduced `Readmission30DaysMIMIC4`.
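One way to picture a task base class with concrete tasks built on it is sketched below. `TaskSketch`, `MortalitySketch`, the dict-based patient layout, and the `died_in_hospital` field are hypothetical; they are not the signatures of `BaseTask` or `InHospitalMortalityMIMIC4` in this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class TaskSketch(ABC):
    """A task turns one patient into zero or more training samples."""

    task_name: str

    @abstractmethod
    def __call__(self, patient: Dict[str, Any]) -> List[Dict[str, Any]]:
        ...


class MortalitySketch(TaskSketch):
    task_name = "in_hospital_mortality_sketch"

    def __call__(self, patient: Dict[str, Any]) -> List[Dict[str, Any]]:
        samples = []
        for admission in patient["admissions"]:
            # Skip admissions that have no input features at all.
            if not admission["codes"]:
                continue
            samples.append({
                "patient_id": patient["patient_id"],
                "codes": admission["codes"],
                "label": int(admission["died_in_hospital"]),
            })
        return samples


patient = {
    "patient_id": "p1",
    "admissions": [
        {"codes": ["E11", "I10"], "died_in_hospital": False},
        {"codes": [], "died_in_hospital": True},
    ],
}
samples = MortalitySketch()(patient)
```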
- Introduced a new processor registry to manage different data processors.
- Implemented base processor classes: `Processor`, `FeatureProcessor`, `SampleProcessor`, and `DatasetProcessor`.
- Added specific processors for images (`ImageProcessor`), labels (`BinaryLabelProcessor`, `MultiClassLabelProcessor`, `MultiLabelProcessor`, `RegressionLabelProcessor`), sequences (`SequenceProcessor`), signals (`SignalProcessor`), and time series (`TimeseriesProcessor`).
- Each processor includes methods for processing data and managing state, with appropriate error handling and configuration options.
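The registry pattern described above can be sketched as follows. This is only an illustration of the general technique: `register`, `get_processor`, `ProcessorSketch`, and `BinaryLabelSketch` are hypothetical names, and the `fit`/`process` split is an assumption about how state is managed, not a copy of the PR's classes.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable

_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a processor class under a string key."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"processor {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_processor(name: str) -> "ProcessorSketch":
    """Instantiate a registered processor by name."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown processor {name!r}")
    return _REGISTRY[name]()


class ProcessorSketch(ABC):
    def fit(self, values: Iterable[Any]) -> None:
        """Optional state-building pass; stateless processors skip it."""

    @abstractmethod
    def process(self, value: Any) -> Any:
        ...


@register("binary_label")
class BinaryLabelSketch(ProcessorSketch):
    def __init__(self) -> None:
        self.label_to_index: Dict[Any, int] = {}

    def fit(self, values: Iterable[Any]) -> None:
        labels = sorted(set(values))
        if len(labels) != 2:
            raise ValueError(f"expected exactly 2 labels, got {len(labels)}")
        self.label_to_index = {label: i for i, label in enumerate(labels)}

    def process(self, value: Any) -> int:
        return self.label_to_index[value]


processor = get_processor("binary_label")
processor.fit(["alive", "dead", "alive"])
encoded = [processor.process(v) for v in ["dead", "alive"]]
```

Looking processors up by string key is what would let a task or config refer to them by name instead of importing each class directly.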
- Updated `BaseModel` to streamline initialization and remove deprecated parameters.
- Introduced `EmbeddingModel` for handling embedding layers for various input types.
- Refactored `RNN` class to utilize `EmbeddingModel` for embedding inputs, enhancing modularity.
- Cleaned up unused code and improved type annotations for better clarity and maintainability.
Co-authored-by: John Wu
Major Refactor: Unified Event Stream, YAML Config, Multimodal Processor, Simplified Model
I think it looks good so far; we can iterate if we find more issues. It is easier to break things in the dev branch and hotfix later, given our small team size.
FYI: We’ve decided to remove the |
Massive refactor:
This pull request includes several changes to improve the PyHealth framework, particularly in dataset management and configuration. The most important changes include the addition of a new hackathon announcement, updates to dataset initialization and configuration, and the introduction of a new base dataset class.
Hackathon Announcement:
- `hackthon/202410-sunlab-hackthon.md`: Added a detailed announcement for the Sunlab PyHealth Hackathon 2024, including schedule, project ideas, and resources.

Dataset Initialization:
- `pyhealth/data/__init__.py`: Simplified import statements by removing unused imports.
- `pyhealth/datasets/__init__.py`: Commented out unused dataset imports and added a new import for `BaseDataset`.

New Base Dataset Class:
- `pyhealth/datasets/base_dataset.py`: Introduced a new `BaseDataset` class with detailed methods for loading and processing dataset tables, handling joins, and generating task-specific sample datasets.

Configuration Updates:
- `pyhealth/datasets/configs/mimic3.yaml`: Added configuration for the MIMIC-III dataset, including file paths, patient attributes, and table joins.
- `pyhealth/datasets/configs/mimic4.yaml`: Added configuration for the MIMIC-IV dataset, detailing file paths, patient attributes, and table joins.

Miscellaneous Fixes:
- `pyhealth/datasets/covid19_cxr.py`: Fixed typos and updated patient processing to use the `Patient` class.
- `pyhealth/datasets/featurizers/__init__.py`: Removed unused featurizer imports.