@jhnwu3 jhnwu3 commented Aug 31, 2025

This pull request introduces a new social determinants of health (SDoH) sentence classifier based on a fine-tuned large language model, adds its dependencies, and provides a corresponding test. It also adds a test utility that sets random seeds so that test runs are reproducible. The main changes are grouped below:

New SDoH Classifier Implementation:

  • Added SdohClassifier in pyhealth/models/sdoh.py, which uses a fine-tuned Llama model with PEFT adapters to predict SDoH labels from clinical sentences. Includes prompt engineering, model loading, and response parsing logic.
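
As a rough illustration of the prompt and response-parsing steps mentioned above, here is a minimal sketch. It is not the code in pyhealth/models/sdoh.py: the function names `build_prompt` and `parse_response`, the prompt wording, and the label set `EXAMPLE_LABELS` are all hypothetical, and model loading is left out so the example stays self-contained.

```python
import re
from typing import Iterable, Set

# Hypothetical label set for illustration only; the real classifier's labels may differ.
EXAMPLE_LABELS = frozenset({"housing", "employment", "transportation", "isolation"})


def build_prompt(sentence: str, labels: Iterable[str]) -> str:
    """Format a single-sentence classification prompt (the wording is assumed)."""
    options = ", ".join(sorted(labels))
    return (
        "Classify the sentence into zero or more of these labels: "
        f"{options}.\nSentence: {sentence}\nLabels:"
    )


def parse_response(response: str, labels: Iterable[str]) -> Set[str]:
    """Keep only the known labels found in the generated text, ignoring case."""
    allowed = {label.lower() for label in labels}
    # Split the response into lowercase word-like tokens and filter by the allowed set.
    tokens = re.findall(r"[a-z_]+", response.lower())
    return {token for token in tokens if token in allowed}
```

Filtering against a fixed label set means any extra text the model generates around the labels is discarded rather than surfaced as a prediction.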

Dependency Management:

  • Added peft and accelerate to the pyproject.toml dependencies to support the new classifier's model loading and inference.

Testing and Reproducibility:

  • Added TestSdoh in tests/core/test_sdoh.py to validate classifier predictions with a sample sentence, ensuring correct label extraction.
  • Added set_random_seed utility in tests/base.py for deterministic test runs, including CUDA and cuDNN configuration for reproducibility.
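
A seed-setting helper of the kind described above might look roughly like the following. This is a sketch under assumptions, not the implementation in tests/base.py: the signature and the `deterministic_cudnn` parameter are hypothetical, and NumPy and PyTorch are imported lazily so the function still runs when they are not installed.

```python
import random


def set_random_seed(seed: int = 0, deterministic_cudnn: bool = True) -> None:
    """Seed the common random number generators (hypothetical signature)."""
    random.seed(seed)

    try:
        import numpy as np
    except ImportError:
        np = None
    if np is not None:
        np.random.seed(seed)

    try:
        import torch
    except ImportError:
        return  # nothing more to seed without PyTorch

    torch.manual_seed(seed)
    if torch.cuda.is_available():
        # Seed every visible GPU, not only the current device.
        torch.cuda.manual_seed_all(seed)
    if deterministic_cudnn:
        # Prefer deterministic cuDNN kernels and disable autotuning,
        # which can select different algorithms between runs.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

Calling the helper at the start of each test, for example from `setUp`, keeps results independent of test ordering.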

@jhnwu3 jhnwu3 requested a review from plandes September 9, 2025 15:20
@plandes plandes merged commit 6989978 into master Sep 9, 2025
1 check passed
@jhnwu3 jhnwu3 deleted the sdoh branch September 10, 2025 19:40
dalloliogm pushed a commit to dalloliogm/PyHealth that referenced this pull request Nov 26, 2025
* add sentence level sdoh multi-label classification model and test

* revert test mask

* sdoh: llama download HF api key; response parse test; fix inf test

* doc

* just added more details to the docs

* readthedocs updates

---------

Co-authored-by: Paul Landes <landes@mailc.net>
Co-authored-by: John Wu <johnwu3@sunlab-serv-03.cs.illinois.edu>