2 changes: 1 addition & 1 deletion docs/install.rst
@@ -11,7 +11,7 @@ We recommend installing the latest alpha version from PyPi, which offers signifi

.. code-block:: bash

pip install pyhealth==2.0a10
pip install pyhealth==2.0a13

This version includes optimized implementations and enhanced features compared to the legacy version.

1 change: 1 addition & 0 deletions docs/newsletter.rst
@@ -8,6 +8,7 @@ Welcome to the PyHealth Newsletter! Here you'll find regular updates about new f
:caption: Newsletter Archive
:reversed:

newsletter/2025-12-memory-optimization
newsletter/2025-12-research-initiative-call
newsletter/2025-11-pyhealth-updates

82 changes: 82 additions & 0 deletions docs/newsletter/2025-12-memory-optimization.rst
@@ -0,0 +1,82 @@
Memory Optimization: PyHealth Now Runs on 16GB RAM
==================================================

*Published: December 18, 2025*

Hey everyone,

Over the past month, we've been working hard to reduce PyHealth's memory footprint when working with large public EHR datasets like MIMIC-IV. The result? **A major backend overhaul that lets you run PyHealth on machines with as little as 16GB of RAM—even when processing MIMIC-IV's 300,000+ patients with both medical codes and lab events spanning millions of records.** This makes clinical predictive modeling substantially more accessible, especially as memory prices are expected to keep rising [1]_.


----


The Scale We're Dealing With
----------------------------

To construct a multimodal mortality task in MIMIC-IV, we parse and scan:

* 315,460 patients
* 454,324 admissions
* 5,006,884 diagnosis codes
* 704,124 procedure codes
* 124,342,638 lab events

.. list-table::
   :header-rows: 1
   :widths: 30 25 25 25

   * - Metric
     - Pandas
     - PyHealth 2.0a10
     - PyHealth 2.0a13
   * - Memory Required
     - 49.23GB
     - 385GB
     - **16GB** ⚡
   * - Run Time (h)
     - 26.03
     - 2.25
     - 3.97
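
For reference, building this task end to end looks roughly like the sketch below. The identifiers follow PyHealth's pre-2.0 API (``MIMIC4Dataset``, ``mortality_prediction_mimic4_fn``), so treat them as assumptions for the 2.0 alpha series, and the dataset path is a placeholder.

.. code-block:: python

    # Rough sketch of building the multimodal mortality task described above.
    # Identifiers follow PyHealth's pre-2.0 API and may differ in 2.0a13+.
    from pyhealth.datasets import MIMIC4Dataset
    from pyhealth.tasks import mortality_prediction_mimic4_fn

    dataset = MIMIC4Dataset(
        root="/path/to/mimiciv/hosp",  # placeholder: local MIMIC-IV hosp module
        tables=["diagnoses_icd", "procedures_icd", "labevents"],  # codes + labs
    )

    # Convert the raw events into per-admission mortality prediction samples.
    samples = dataset.set_task(mortality_prediction_mimic4_fn)
    print(len(samples))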


----


What Changed?
-------------

Previously, PyHealth loaded all patient data into memory whenever any computation was needed, meaning you'd pull in every patient's records just to analyze a single patient's EHR. That was fast with Polars, but wildly wasteful.

In PyHealth 2.0a13+, we've switched to lazy-loading patient data, pulling records into memory only when actually needed.
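
As a concrete illustration of the pattern (not PyHealth's actual implementation), the sketch below keeps only a lazy Polars scan of an events table and materializes a single patient's rows on first access; ``LazyPatientStore``, the file path, and the caching policy are hypothetical.

.. code-block:: python

    # Illustrative lazy-loading pattern, not PyHealth's internal code:
    # keep a lazy Polars scan and collect one patient's rows only on demand.
    import polars as pl


    class LazyPatientStore:
        def __init__(self, events_path: str):
            # scan_csv builds a lazy query plan; nothing is read into memory yet.
            self._events = pl.scan_csv(events_path)
            self._cache: dict[int, pl.DataFrame] = {}

        def get_patient(self, subject_id: int) -> pl.DataFrame:
            # Materialize only this patient's rows the first time they are needed.
            if subject_id not in self._cache:
                self._cache[subject_id] = (
                    self._events
                    .filter(pl.col("subject_id") == subject_id)
                    .collect()
                )
            return self._cache[subject_id]


    store = LazyPatientStore("hosp/labevents.csv")  # hypothetical path
    one_patient = store.get_patient(10000032)       # loads just this patient's rows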


----


We Need Your Help!
------------------

Please test the new approach and report any issues here: `[Tracking] Tracking issue for the new memory efficient dataset · Issue #740 <https://github.com/sunlabuiuc/PyHealth/issues/740>`_

Don't hesitate to share comments or suggestions.


----


Looking Ahead
-------------

This memory optimization is just the first step in our mission to make healthcare AI truly accessible. By dramatically lowering the computational barriers to entry, we're opening the door for researchers and clinicians with limited resources—whether that's a PhD student with a laptop or a hospital in a low-resource setting—to build and deploy meaningful healthcare AI solutions.

We believe democratizing these tools is essential for ensuring healthcare AI benefits everyone, not just those with access to massive compute infrastructure.


----


References
----------

.. [1] https://www.tomsguide.com/news/live/ram-price-crisis-updates