Welcome to PyHealth!

Note

This README may be out of date. For the most up-to-date documentation, tutorials, and API reference, please visit our official documentation site at pyhealth.readthedocs.io.

Important

Join our PyHealth Discord community! We are actively looking for contributors and want to get to know our users better. Click here to join Discord
Sign up for our mailing list! We will email you about significant upcoming PyHealth changes. Click here to subscribe

Citing PyHealth 🤝

Yang, Chaoqi, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. 2023. "PyHealth: A Deep Learning Toolkit for Healthcare Applications." In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5788–89. KDD '23. New York, NY, USA: Association for Computing Machinery.

@inproceedings{pyhealth2023yang,
    author = {Yang, Chaoqi and Wu, Zhenbang and Jiang, Patrick and Lin, Zhen and Gao, Junyi and Danek, Benjamin and Sun, Jimeng},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Predictive Modeling},
    url = {https://github.com/sunlabuiuc/PyHealth},
    booktitle = {Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023},
    year = {2023}
}

PyHealth is a comprehensive deep learning toolkit for clinical predictive modeling, designed for both ML researchers and medical practitioners. It aims to make healthcare AI applications easier to develop, test, and deploy, while remaining flexible and customizable. [Tutorials]

Key Features

Modular 5-stage pipeline for healthcare ML
Healthcare-first: medical codes and clinical datasets (MIMIC, eICU, OMOP)
33+ pre-built models and production-ready trainer/metrics
10+ supported healthcare tasks and datasets
Fast (~3x faster than pandas) data processing for quick experimentation

[News!] We are continuously implementing influential papers and benchmarks in PyHealth; check out the [Planned List]. You are welcome to pick one from the list and send us a PR, or to add more influential and new papers to the list.

1. Installation 🚀

Python Version Recommendation

We recommend using Python 3.12 for optimal parallel processing and memory management performance. While PyHealth supports Python 3.8+, Python 3.12 provides significant improvements in these areas.

Recommended Installation (Alpha Version)

We recommend installing the latest alpha version from PyPI, which offers significant performance improvements:

pip install pyhealth==2.0a13

This version includes optimized implementations and enhanced features compared to the legacy version.

Legacy Version

The older stable version is still available for backward compatibility:

pip install pyhealth

For Contributors and Developers

If you are contributing to PyHealth or need the latest development features, install from GitHub source:

git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install -e .

Note: PyHealth implements multiple neural-network-based models in PyTorch, but it does NOT install PyTorch for you. If you want to use these models, please make sure PyTorch is installed.
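A quick way to confirm the two prerequisites above (Python version and PyTorch) before installing is sketched below. It uses only the standard library; the function name `check_environment` and the result keys are illustrative and not part of PyHealth.

```python
import importlib.util
import sys

# Versions taken from the text above: 3.8 is the stated minimum,
# 3.12 is the recommended version.
MIN_PYTHON = (3, 8)
RECOMMENDED_PYTHON = (3, 12)


def check_environment(version=None):
    """Report whether the interpreter and the optional PyTorch dependency look ready.

    `version` is a (major, minor) tuple; it defaults to the running interpreter.
    """
    version = tuple(version or sys.version_info[:2])
    return {
        "python_supported": version >= MIN_PYTHON,
        "python_recommended": version >= RECOMMENDED_PYTHON,
        # find_spec returns None when the package cannot be located.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `torch_installed` is reported as `no`, install PyTorch separately before using the neural models.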

2. Introduction 📖

pyhealth provides the following functionalities (we are still enriching some modules). Each of them can be used independently:

Dataset: MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, EHRShot, COVID19-CXR, SleepEDF, SHHS, ISRUC, customized EHR datasets, etc.
Tasks: diagnosis-based drug recommendation, patient hospitalization and mortality prediction, readmission prediction, length of stay forecasting, sleep staging, etc.
ML models: RNN, LSTM, GRU, Transformer, RETAIN, SafeDrug, GAMENet, MoleRec, AdaCare, ConCare, StageNet, GRASP, SparcNet, ContraWR, Deepr, TCN, Dr. Agent, etc.

Building a healthcare AI pipeline can be as short as 10 lines of code in PyHealth.

3. Build ML Pipelines 🏆

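All healthcare tasks in our package follow a five-stage pipeline, with one module per stage: pyhealth.datasets, pyhealth.tasks, pyhealth.models, pyhealth.trainer, and pyhealth.metrics.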

We try hard to make sure each stage is as separate as possible, so that people can customize their own pipeline by only using our data processing steps or the ML models.

Module 1: <pyhealth.datasets>

pyhealth.datasets provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, and more. The output (mimic3base) is a multi-level dictionary structure (see illustration below).

from pyhealth.datasets import MIMIC3Dataset

mimic3base = MIMIC3Dataset(
    # root directory of the dataset
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    # raw CSV table names
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes to CCS codes in these tables
    code_mapping={"NDC": "CCSCM"},
)

Module 2: <pyhealth.tasks>

pyhealth.tasks defines how to process each patient's data into a set of samples for the tasks. In the package, we provide several task examples, such as drug recommendation, mortality prediction, and readmission prediction. It is easy to customize your own tasks following our template.

from pyhealth.tasks import ReadmissionPredictionMIMIC3

mimic3sample = mimic3base.set_task(ReadmissionPredictionMIMIC3())
mimic3sample[0] # show the information of the first sample
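To make the idea of "processing each patient's data into a set of samples" concrete, here is a minimal pure-Python sketch of a readmission-style task. It does not use the PyHealth task template: the function name, the dictionary fields, and the 15-day window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def readmission_samples(patient_id, visits, window_days=15):
    """Turn one patient's visits into labeled samples.

    Every visit except the last becomes one sample. The label is 1 when the
    next admission starts within `window_days` of this visit's discharge.
    """
    visits = sorted(visits, key=lambda v: v["admit"])
    samples = []
    # Pair each visit with the one that follows it in time.
    for current, following in zip(visits, visits[1:]):
        gap = following["admit"] - current["discharge"]
        samples.append({
            "patient_id": patient_id,
            "visit_id": current["visit_id"],
            "conditions": current["conditions"],
            "label": int(gap <= timedelta(days=window_days)),
        })
    return samples


# Made-up visits for a hypothetical patient "p1".
visits = [
    {"visit_id": "v1", "admit": datetime(2020, 1, 1),
     "discharge": datetime(2020, 1, 5), "conditions": ["428.0"]},
    {"visit_id": "v2", "admit": datetime(2020, 1, 12),
     "discharge": datetime(2020, 1, 14), "conditions": ["428.0", "401.9"]},
    {"visit_id": "v3", "admit": datetime(2020, 6, 1),
     "discharge": datetime(2020, 6, 3), "conditions": ["250.00"]},
]
samples = readmission_samples("p1", visits)
print([s["label"] for s in samples])  # [1, 0]
```

The last visit yields no sample because there is no later admission to derive a label from.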

from pyhealth.datasets import split_by_patient, get_dataloader

train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
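Splitting by patient rather than by sample keeps all of a patient's samples in the same subset, so no patient appears in both training and test data. The sketch below illustrates that idea in plain Python; it is not the `split_by_patient` implementation, and the helper name `split_patients` and the sample layout are assumptions.

```python
import random


def split_patients(samples, ratios, seed=0):
    """Split samples so that each patient's samples land in exactly one subset."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)

    # Convert the first two ratios into cut points over the shuffled patients;
    # the last subset takes whatever remains.
    n = len(patient_ids)
    first_cut = round(n * ratios[0])
    second_cut = round(n * (ratios[0] + ratios[1]))
    groups = (
        set(patient_ids[:first_cut]),
        set(patient_ids[first_cut:second_cut]),
        set(patient_ids[second_cut:]),
    )
    return tuple([s for s in samples if s["patient_id"] in g] for g in groups)


# 10 hypothetical patients with 3 samples each.
samples = [{"patient_id": f"p{i % 10}", "visit": i} for i in range(30)]
train, val, test = split_patients(samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))  # 24 3 3
```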

Module 3: <pyhealth.models>

pyhealth.models provides different ML models with very similar argument configs.

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
)

Module 4: <pyhealth.trainer>

pyhealth.trainer lets you specify training arguments such as the number of epochs, the optimizer, and the learning rate. The trainer automatically saves the best model and outputs its path at the end.

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc_samples",
)
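The `monitor` argument names the validation metric used to decide which checkpoint counts as "best". The selection logic can be sketched as follows; this is a simplified illustration with made-up scores, not the Trainer's actual code, and `best_epoch` and its `mode` parameter are hypothetical names.

```python
def best_epoch(history, monitor, mode="max"):
    """Return (epoch_index, score) for the best value of `monitor`.

    `history` holds one dict of validation scores per epoch. Use mode="max"
    for metrics where higher is better and mode="min" for losses.
    """
    pick = max if mode == "max" else min
    return pick(
        ((epoch, scores[monitor]) for epoch, scores in enumerate(history)),
        key=lambda item: item[1],
    )


# Made-up validation scores for three epochs.
history = [
    {"pr_auc_samples": 0.41, "loss": 0.69},
    {"pr_auc_samples": 0.55, "loss": 0.52},
    {"pr_auc_samples": 0.53, "loss": 0.48},
]
print(best_epoch(history, "pr_auc_samples"))    # (1, 0.55)
print(best_epoch(history, "loss", mode="min"))  # (2, 0.48)
```

Note that the two criteria pick different epochs here, which is why the choice of monitored metric matters.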

Module 5: <pyhealth.metrics>

pyhealth.metrics provides several common evaluation metrics (refer to the documentation to see which ones are available).

# method 1
trainer.evaluate(test_loader)

# method 2
from pyhealth.metrics.binary import binary_metrics_fn

y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])
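For readers unfamiliar with the `roc_auc` metric requested above, the following standard-library sketch computes it from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It is an illustration of the metric, not the code behind `binary_metrics_fn`.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Each positive/negative pair counts as a win, a half win (tie), or a loss.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it is suited to explanation rather than large test sets.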

4. Medical Code Map 🏥

pyhealth.medcode provides two core functionalities. This module can be used independently.

For code ontology lookup within one medical coding system (e.g., name, category, sub-concept);

from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# `Congestive heart failure, unspecified`
icd9cm.get_ancestors("428.0")
# ['428', '420-429.99', '390-459.99', '001-999.99']

atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# `ibuprofen, combinations`
atc.lookup("M01AE51", "drugbank_id")
# `DB01050`
atc.lookup("M01AE51", "description")
# Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) derived ...
atc.lookup("M01AE51", "indication")
# Ibuprofen is the most commonly used and prescribed NSAID. It is very common over the ...

For code mapping between two coding systems (e.g., ICD9CM to CCSCM).

from pyhealth.medcode import CrossMap

codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("428.0")
# ['108']

codemap = CrossMap.load("NDC", "RxNorm")
codemap.map("50580049698")
# ['209387']

codemap = CrossMap.load("NDC", "ATC")
codemap.map("50090539100")
# ['A10AC04', 'A10AD04', 'A10AB04']
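Conceptually, both functionalities are table lookups: an ontology lookup walks child-to-parent links within one coding system, and a cross mapping returns the target codes for a source code. The toy sketch below uses only values that appear in the examples above; the dictionaries and function names are illustrative and say nothing about how pyhealth.medcode stores its data.

```python
# Toy parent table copied from the ICD9CM ancestor chain shown above.
PARENT = {
    "428.0": "428",
    "428": "420-429.99",
    "420-429.99": "390-459.99",
    "390-459.99": "001-999.99",
}

# Toy cross-system table; its single entry is the ICD9CM -> CCSCM example above.
ICD9CM_TO_CCSCM = {"428.0": ["108"]}


def get_ancestors(code, parent=PARENT):
    """Follow child -> parent links until a code with no parent is reached."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


def map_code(code, table=ICD9CM_TO_CCSCM):
    """Return all target codes for `code`; in this sketch, unmapped codes give []."""
    return list(table.get(code, []))


print(get_ancestors("428.0"))  # ['428', '420-429.99', '390-459.99', '001-999.99']
print(map_code("428.0"))       # ['108']
print(map_code("999.9"))       # []
```

A mapping returns a list because one source code can correspond to several target codes, as in the NDC to ATC example above.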

5. Medical Code Tokenizer 💬

pyhealth.tokenizer is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used independently.

from pyhealth.tokenizer import Tokenizer

# Example: we use a list of ATC3 codes as the tokens
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
        'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', 'A07C',
        'A12B', 'A12C', 'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])

# 2d encode (the indices shown in these examples are illustrative; exact values depend on the token space)
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens)
# [[8, 9, 10, 11], [12, 1, 1, 0]]

# 2d decode
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices)
# [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]

# 3d encode
tokens = [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']],
    [['A04A', 'B035', 'C129']]]
indices = tokenizer.batch_encode_3d(tokens)
# [[[8, 9, 10, 11], [24, 25, 0, 0]], [[12, 1, 1, 0], [0, 0, 0, 0]]]

# 3d decode
indices = [[[8, 9, 10, 11], [24, 25, 0, 0]],
    [[12, 1, 1, 0], [0, 0, 0, 0]]]
tokens = tokenizer.batch_decode_3d(indices)
# [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], [['A04A', '<unk>', '<unk>']]]
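The mechanics of 2D encoding (index lookup, an unknown-token fallback, and right-padding) can be sketched in a few lines of plain Python. `ToyTokenizer` is a hypothetical stand-in, not the PyHealth `Tokenizer`; it assumes special tokens take the lowest indices (`<pad>` = 0, `<unk>` = 1) and that rows are padded to the longest row.

```python
class ToyTokenizer:
    """Map tokens to integer indices, with special tokens at the lowest indices."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        self.vocab = list(special_tokens) + list(tokens)
        self.index = {tok: i for i, tok in enumerate(self.vocab)}
        self.pad = self.index["<pad>"]
        self.unk = self.index["<unk>"]

    def encode_2d(self, batch):
        """Encode a list of token lists, right-padding rows to equal length."""
        width = max(len(row) for row in batch)
        return [
            [self.index.get(tok, self.unk) for tok in row]
            + [self.pad] * (width - len(row))
            for row in batch
        ]

    def decode_2d(self, batch):
        """Map indices back to tokens, dropping padding."""
        return [[self.vocab[i] for i in row if i != self.pad] for row in batch]


token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
               'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B',
               'A07C', 'A12B', 'A12C', 'A13A', 'A14A', 'A14B', 'A16A']
tok = ToyTokenizer(token_space)
encoded = tok.encode_2d([['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']])
print(encoded)                 # [[8, 9, 1, 10], [11, 1, 1, 0]]
print(tok.decode_2d(encoded))  # [['A03C', 'A03D', '<unk>', 'A03F'], ['A04A', '<unk>', '<unk>']]
```

In this sketch 'A03E' is not in the 23-code token space, so it maps to `<unk>`; decoding therefore cannot recover the original token.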

6. Tutorials 🧑‍🏫

We provide the following tutorials to help users get started with pyhealth. Please bear with us as we update the documentation for PyHealth 2.0.

Tutorial 0: Introduction to pyhealth.data [Video]

Tutorial 1: Introduction to pyhealth.datasets [Video (PyHealth 1.6)]

Tutorial 2: Introduction to pyhealth.tasks [Video (PyHealth 1.6)]

Tutorial 3: Introduction to pyhealth.models [Video]

Tutorial 4: Introduction to pyhealth.trainer [Video]

Tutorial 5: Introduction to pyhealth.metrics [Video]

Tutorial 6: Introduction to pyhealth.tokenizer [Video]

Tutorial 7: Introduction to pyhealth.medcode [Video]

The following tutorials will help users build their own task pipelines.

Pipeline 1: Chest X-ray Classification [Video]

Pipeline 2: Medical Coding

Pipeline 3: Medical Transcription Classification

Pipeline 4: Mortality Prediction

Pipeline 5: Readmission Prediction

We provide advanced tutorials for supporting various needs.

Advanced Tutorial 1: Fit your dataset into our pipeline [Video]

Advanced Tutorial 2: Define your own healthcare task

Advanced Tutorial 3: Adopt customized model into pyhealth [Video]

Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models [Video]

7. Datasets 🏔️

We provide the processing files for the following open EHR datasets:

Dataset	Module	Year	Description
MIMIC-III	`pyhealth.datasets.MIMIC3Dataset`	2016	MIMIC-III Clinical Database
MIMIC-IV	`pyhealth.datasets.MIMIC4Dataset`	2020	MIMIC-IV Clinical Database
eICU	`pyhealth.datasets.eICUDataset`	2018	eICU Collaborative Research Database
OMOP	`pyhealth.datasets.OMOPDataset`		OMOP-CDM schema based dataset
EHRShot	`pyhealth.datasets.EHRShotDataset`	2023	Few-shot EHR benchmarking dataset
COVID19-CXR	`pyhealth.datasets.COVID19CXRDataset`	2020	COVID-19 chest X-ray image dataset
SleepEDF	`pyhealth.datasets.SleepEDFDataset`	2018	Sleep-EDF dataset
SHHS	`pyhealth.datasets.SHHSDataset`	2016	Sleep Heart Health Study dataset
ISRUC	`pyhealth.datasets.ISRUCDataset`	2016	ISRUC-SLEEP dataset

8. Machine/Deep Learning Models ✈️

Deep Learning Models

Model	Year	Key Innovation
RETAIN	2016	Interpretable attention for clinical decisions
GAMENet	2019	Memory networks for drug recommendation
SafeDrug	2021	Molecular graphs for safe drug combinations
MoleRec	2023	Substructure-aware drug recommendation
AdaCare	2020	Scale-adaptive feature extraction
ConCare	2020	Transformer-based patient modeling
StageNet	2020	Disease progression stage modeling
GRASP	2021	Graph neural networks for patient clustering
MICRON	2021	Medication change prediction with recurrent residual networks

Foundation Models

Model	Year	Description
Transformer	2017	Attention-based sequence modeling
RNN/LSTM/GRU	2011	Recurrent neural networks for sequences
CNN	1989	Convolutional networks for structured data
TCN	2018	Temporal convolutional networks
MLP	1986	Multi-layer perceptrons for tabular data

Specialized Models

Model	Year	Specialization
ContraWR	2021	Biosignal analysis (EEG, ECG)
SparcNet	2023	Seizure detection and sleep staging
Deepr	2017	Electronic health records
Dr. Agent	2020	Reinforcement learning for clinical decisions

Check the interactive map on benchmark EHR predictive tasks.

9. Research Initiative 🔬

The PyHealth Research Initiative is a year-round, open research program that brings together talented individuals from diverse backgrounds to conduct cutting-edge research in healthcare AI.

How to participate:

Join our Discord server
Submit a high-quality PR to the PyHealth repository
Check the documentation for more details

Recent research from the initiative has been published at venues including ML4H 2025 and other top conferences.

10. About Us 👥

We are the SunLab healthcare research team at UIUC.

Current Maintainers:

Zhenbang Wu (Ph.D. Student @ UIUC)
John Wu (Ph.D. Student @ UIUC)
Junyi Gao (Ph.D. Student @ University of Edinburgh)
Jimeng Sun (Professor @ UIUC)

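Get in Touch: join our Discord community or subscribe to our mailing list (see the links at the top of this README).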
