
Conversation


@charanw charanw commented Dec 10, 2025

Contributor: Charan Williams (charanw2@illinois.edu)

Contribution Type: New Model

Description:
Added a new clinical summarization model, BHCToAVS, which converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS). The model wraps a fine-tuned Mistral-7B LoRA adapter hosted on Hugging Face and integrates with the PyHealth model API. This contribution includes the full model implementation, unit tests, documentation, and an example usage script.

Files to Review:

  • pyhealth/models/bhc_to_avs.py — Main model implementation
  • pyhealth/models/__init__.py — Added import for the new model
  • tests/core/test_bhc_to_avs.py — Unit test for the BHCToAVS model
  • docs/api/models/pyhealth.models.bhc_to_avs.rst — Sphinx documentation file
  • docs/api/models.rst — Updated model index to include BHCToAVS
  • examples/bhc_to_avs_example.py — Example usage demonstrating model prediction

Introduces the BHCToAVS model, which converts clinical Brief Hospital Course (BHC) notes into After-Visit Summaries (AVS) using a fine-tuned Mistral 7B model with LoRA adapters. Adds model implementation, documentation, an example usage script, and unit tests.
@Logiquo Logiquo added the component: model Contribute a new model to PyHealth label Dec 18, 2025
@Logiquo Logiquo requested a review from Copilot December 27, 2025 10:24
@Logiquo Logiquo self-requested a review December 27, 2025 10:25
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new clinical summarization model, BHCToAVS, that converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS) using a fine-tuned Mistral-7B LoRA adapter. The implementation integrates with PyHealth's model API and includes comprehensive documentation and examples.

  • Implements a new text generation model wrapping a Hugging Face LoRA adapter
  • Adds unit tests with graceful handling of model download failures
  • Provides example usage demonstrating the model's clinical text summarization capabilities

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pyhealth/models/bhc_to_avs.py Core model implementation with predict() method for generating patient-friendly summaries
pyhealth/models/__init__.py Added BHCToAVS to module exports
tests/core/test_bhc_to_avs.py Unit test validating the predict method with error handling for gated models
docs/api/models/pyhealth.models.BHCToAVS.rst Sphinx autodoc configuration for the new model
docs/api/models.rst Updated model index to include BHCToAVS
examples/bhc_to_avs_example.py Example script demonstrating model usage with synthetic clinical text


Collaborator

Logiquo commented Dec 27, 2025

The CI has failed.

@Logiquo Logiquo added the status: wait response Pending PR author's response label Dec 27, 2025
charanw and others added 2 commits December 29, 2025 11:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Logiquo Logiquo added status: need review Pending maintainer's review and removed status: wait response Pending PR author's response labels Dec 29, 2025
charanw and others added 6 commits December 29, 2025 11:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Enhanced the BHCToAVS model with improved docstrings, error handling for Hugging Face token requirements, and more robust pipeline initialization. Updated and expanded the test suite to include both unit tests with a mocked pipeline and an optional integration test for real model inference.
Introduces a __post_init__ method in BHCToAVS to call the BaseModel initializer, ensuring proper nn.Module setup. Also updates docstring formatting and attribute documentation for clarity.
Copy link
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.



Comment on lines +86 to +92
raise RuntimeError(
    "Hugging Face token not found. This model requires access to a gated repository.\n\n"
    "Set the HF_TOKEN environment variable or pass hf_token=... when initializing BHCToAVS.\n\n"
    "Example:\n"
    " export HF_TOKEN='hf_...'\n"
    " model = BHCToAVS()\n"
)

Copilot AI Dec 29, 2025


The error message spans lines 87-92 and provides good guidance, but the message could be more specific about where to obtain a HuggingFace token. Consider adding a link to the HuggingFace token generation page (https://huggingface.co/settings/tokens) to help users quickly resolve this issue.
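A hedged sketch of what the suggested change could look like; `resolve_token` is an illustrative stand-in for the model's private token helper, and only the added settings-page URL is the substance of the suggestion:

```python
import os

# Illustrative stand-in for the model's token-resolution helper; the
# added https://huggingface.co/settings/tokens line is the suggested fix.
def resolve_token(hf_token=None):
    token = hf_token or os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError(
            "Hugging Face token not found. This model requires access to "
            "a gated repository.\n"
            "Create a token at https://huggingface.co/settings/tokens, "
            "then set HF_TOKEN or pass hf_token=... when initializing "
            "BHCToAVS."
        )
    return token
```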

Comment on lines +36 to +41
# Prompt used during fine-tuning
_PROMPT = (
    "Summarize for the patient what happened during the hospital stay based on this doctor's note:\n"
    "{bhc}\n\n"
    "Summary for the patient:\n"
)

Copilot AI Dec 29, 2025


The comment on line 36 states "Prompt used during fine-tuning" but this prompt is actually used during inference (as seen on line 146). If this prompt was indeed used during fine-tuning and is also being reused during inference, the comment should clarify this. If it's only used during inference, the comment is misleading and should be corrected to "Prompt template used during inference" or similar.
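For reference, filling this template at inference time looks like the sketch below; the BHC text is an invented example, not from the PR:

```python
# The PR's prompt template, filled as it would be at inference time.
# The example note is made up for illustration.
_PROMPT = (
    "Summarize for the patient what happened during the hospital stay based on this doctor's note:\n"
    "{bhc}\n\n"
    "Summary for the patient:\n"
)

prompt = _PROMPT.format(
    bhc="Pt admitted with community-acquired pneumonia, treated with IV antibiotics."
)
```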

Comment on lines +78 to +120
def _get_pipeline(self):
    """Create and cache the text-generation pipeline."""
    if not hasattr(self, "_pipeline"):
        # Resolve HuggingFace token
        token = self._resolve_token()

        # Throw RuntimeError if token is not found
        if token is None:
            raise RuntimeError(
                "Hugging Face token not found. This model requires access to a gated repository.\n\n"
                "Set the HF_TOKEN environment variable or pass hf_token=... when initializing BHCToAVS.\n\n"
                "Example:\n"
                " export HF_TOKEN='hf_...'\n"
                " model = BHCToAVS()\n"
            )

        # Load base model
        base = AutoModelForCausalLM.from_pretrained(
            self.base_model_id,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            token=token,
        )

        # Load LoRA adapter
        model = PeftModelForCausalLM.from_pretrained(
            base,
            self.adapter_model_id,
            torch_dtype=torch.bfloat16,
            token=token,
        )

        tokenizer = AutoTokenizer.from_pretrained(self.base_model_id, token=token)

        # Create HF pipeline
        self._pipeline = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            model_kwargs={"torch_dtype": torch.bfloat16},
        )

    return self._pipeline

Copilot AI Dec 29, 2025


The _get_pipeline method loads a large 7B parameter model without any explicit guidance on resource requirements or expected load time. Consider adding documentation (either in the class docstring or method docstring) about: (1) expected memory requirements (GPU/CPU), (2) approximate model loading time, and (3) recommended hardware specifications. This would help users understand the resource implications before attempting to use the model.
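One way to surface this is sketched below; the memory and timing figures are rough placeholders for illustration, not measurements from this PR:

```python
# Sketch of a resource-requirements note for the class docstring; the
# numbers quoted are ballpark placeholders, not benchmarks.
class BHCToAVSDocSketch:
    """Convert Brief Hospital Course notes into After-Visit Summaries.

    Resource notes (approximate, hardware-dependent):
        * Loading Mistral-7B in bfloat16 takes on the order of 15 GB of
          GPU memory; CPU-only inference works but is very slow.
        * The first call downloads the base model and LoRA adapter
          (roughly 15 GB of weights) and can take several minutes.
    """
```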

Comment on lines +71 to +74
def __post_init__(self):
    # Ensure nn.Module (via BaseModel) is initialized
    super().__init__()


Copilot AI Dec 29, 2025


The BHCToAVS class uses a dataclass decorator but also inherits from BaseModel (which is an nn.Module). The __post_init__ method calls super().__init__(), but this pattern is inconsistent with how BaseModel is typically used in the codebase. Looking at SdohClassifier (another dataclass-based model in the same codebase), it does not override __post_init__. Additionally, BaseModel's __init__ expects an optional dataset parameter, but this implementation calls it without any arguments. This could lead to incorrect initialization. Consider removing __post_init__ entirely or ensuring it properly initializes the BaseModel with appropriate parameters.

Suggested change
def __post_init__(self):
    # Ensure nn.Module (via BaseModel) is initialized
    super().__init__()

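A minimal, self-contained sketch of the initialization concern, with a plain class standing in for pyhealth's BaseModel/nn.Module (all names here are illustrative):

```python
from dataclasses import dataclass

class FakeBaseModel:
    """Stand-in for pyhealth's BaseModel (an nn.Module subclass)."""
    def __init__(self, dataset=None):
        self.dataset = dataset
        self._base_initialized = True

@dataclass(eq=False)
class SketchModel(FakeBaseModel):
    base_model_id: str = "mistralai/Mistral-7B-Instruct-v0.3"

    def __post_init__(self):
        # The dataclass-generated __init__ never calls the base class
        # initializer, so without this call the base state (for
        # nn.Module: parameters, buffers, hooks) is never set up.
        super().__init__()
```

This shows why the PR's `__post_init__` exists at all, while the review point stands: `super().__init__()` is called with no `dataset`, so the base class sees `dataset=None`.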
Comment on lines +67 to +69
base_model_id: str = field(default="mistralai/Mistral-7B-Instruct-v0.3")
adapter_model_id: str = field(default="williach31/mistral-7b-bhc-to-avs-lora")
hf_token: str | None = None

Copilot AI Dec 29, 2025


The BHCToAVS model deviates from the standard PyHealth model API pattern. Most models in the codebase (e.g., LogisticRegression, AdaCare, GAMENet) accept a dataset parameter in their initialization to query information like tokens and schemas. This model does not accept or use a dataset parameter, which is inconsistent with the BaseModel interface it inherits from. Consider whether this model should follow the standard pattern or if the deviation is intentional for this use case. If intentional, this should be clearly documented.

self.assertGreater(len(summary.strip()), 0)

# Output should be different from input
self.assertNotIn(bhc_text[:40], summary)

Copilot AI Dec 29, 2025


The test assertion on line 66 uses a weak check that only verifies the first 40 characters of the input are not present in the output. This could pass even if the model is simply copying most of the input text or behaving incorrectly. Consider adding more robust assertions that verify the output actually represents a simplified, patient-friendly summary (e.g., checking for absence of medical jargon, checking for specific expected transformations, or verifying the output differs meaningfully from the input).
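A hedged sketch of the kind of stronger checks this comment has in mind; the jargon list and line-overlap heuristic are illustrative choices, not requirements from the PR:

```python
# Illustrative stronger assertions; the jargon terms and the overlap
# heuristic are assumptions, not taken from the PR's test suite.
def check_patient_friendly(bhc_text, summary):
    assert len(summary.strip()) > 0
    # The summary should not echo whole lines of the clinical note.
    echoed = [ln for ln in bhc_text.splitlines() if ln.strip() and ln in summary]
    assert not echoed, f"summary copies input lines: {echoed}"
    # Patient-facing text should avoid raw clinical shorthand.
    for term in ("s/p", "NSTEMI", "BID", "PRN"):
        assert term not in summary, f"jargon leaked into summary: {term}"

check_patient_friendly(
    "Pt admitted s/p NSTEMI. Started ASA 81mg BID.",
    "You had a heart attack. You were given medicines and are recovering.",
)
```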

charanw and others added 3 commits December 29, 2025 13:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Author

charanw commented Dec 29, 2025

Thank you for the review. I updated the tests by adding a separate optional integration test and modifying the unit test to patch in a dummy pipeline, so CI does not require model downloads or gated access.
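The mocking approach described can be sketched as below; `Model`, `_get_pipeline`, and `predict` are simplified stand-ins for the PR's actual BHCToAVS code, and only the pipeline's return shape is assumed from the transformers API:

```python
from unittest.mock import patch

# Simplified stand-in for BHCToAVS; the predict() shape and the
# text-generation pipeline's return format are the only assumptions.
class Model:
    _PROMPT = "Summarize for the patient:\n{bhc}\n\nSummary for the patient:\n"

    def _get_pipeline(self):
        raise RuntimeError("would download gated 7B weights in CI")

    def predict(self, bhc):
        prompt = self._PROMPT.format(bhc=bhc)
        out = self._get_pipeline()(prompt, max_new_tokens=200)
        # Strip the echoed prompt, keeping only the generated summary.
        return out[0]["generated_text"][len(prompt):]

def dummy_pipeline(prompt, **kwargs):
    # Mimics a transformers text-generation pipeline's return shape.
    return [{"generated_text": prompt + "You recovered well."}]

# Patch the factory so the unit test never touches the network.
with patch.object(Model, "_get_pipeline", return_value=dummy_pipeline):
    summary = Model().predict("Pt admitted with pneumonia.")
```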

Collaborator

Logiquo commented Dec 30, 2025

Typically, a BaseModel should take a SampleDataset and examine the first few batches of data to determine how to construct the model. I don't think your model follows this structure?
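For reference, the structure being described might look like the sketch below; a plain list of sample dicts stands in for SampleDataset, and all names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Illustrative only: a model that accepts a dataset and inspects the
# first few samples at construction time, per the maintainer's comment.
@dataclass
class DatasetAwareModel:
    dataset: Optional[Any] = None
    input_keys: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.dataset is not None:
            # Peek at up to three samples to learn the input schema.
            preview = [self.dataset[i] for i in range(min(3, len(self.dataset)))]
            if preview:
                self.input_keys = sorted(preview[0].keys())
```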

@Logiquo Logiquo added status: wait response Pending PR author's response and removed status: need review Pending maintainer's review labels Dec 30, 2025

Labels

component: model Contribute a new model to PyHealth status: wait response Pending PR author's response
