@Logiquo Logiquo commented Dec 18, 2025

Contributor: Yongda Fan (yongdaf2@illinois.edu)

Contribution Type: Dataset

Description
Support configuring n_workers for the Dask cluster.
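As a rough illustration of what a configurable worker count could look like, here is a minimal sketch. The helper name `local_cluster_kwargs` and the CPU-count fallback are assumptions made for this example, not code from `base_dataset.py`. Only the three keyword names (`n_workers`, `threads_per_worker`, `processes`) come from the diff in this PR.

```python
import os
from typing import Any, Dict, Optional


def local_cluster_kwargs(
    num_workers: Optional[int] = None,
    use_processes: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for a Dask local cluster (hypothetical helper)."""
    if num_workers is None:
        # Assumption for this sketch: fall back to the CPU count.
        num_workers = os.cpu_count() or 1
    if num_workers < 1:
        raise ValueError("num_workers must be a positive integer")
    return {
        "n_workers": num_workers,     # number of Dask workers to start
        "threads_per_worker": 1,      # one thread per worker, as in the diff
        "processes": use_processes,   # process-based vs. thread-based workers
    }


if __name__ == "__main__":
    kwargs = local_cluster_kwargs(num_workers=4)
    print(kwargs)
    # With dask.distributed installed, these could be forwarded as:
    #   from dask.distributed import LocalCluster
    #   cluster = LocalCluster(**kwargs)
```

Keeping the keyword construction separate from cluster creation makes the configuration easy to check without starting any workers.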

Files to Review
pyhealth/datasets/base_dataset.py

@Logiquo Logiquo added the labels component: dataset (Contribute a new dataset to PyHealth) and infra (Infrastructure: data loading, caching, pipelines) Dec 18, 2025
n_workers=self.num_workers,
threads_per_worker=1,
processes=False,
processes=not in_notebook(),

Just curious, why do we need to check whether we're in a Jupyter notebook?

@Logiquo Logiquo Dec 18, 2025

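Jupyter notebooks are a bit tricky when dealing with multiprocessing (due to the nature of Python), so we typically prefer not to use multiprocessing there when possible.

E.g. the `if __name__ == "__main__":` main guard that multiprocessing code normally relies on is awkward to apply inside a cell.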
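To make the check concrete, here is a minimal sketch of one way a notebook check could work. This is an assumption for illustration; the actual `in_notebook()` helper used in the PR may be implemented differently. It relies on the fact that a Jupyter kernel has already imported IPython and runs a shell whose class is named `ZMQInteractiveShell`.

```python
import sys


def in_notebook() -> bool:
    """Best-effort check for a Jupyter kernel (sketch, not the PR's helper)."""
    # IPython is only in sys.modules if something already imported it,
    # which is always the case inside a running Jupyter kernel.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return False
    get_ipython = getattr(ipython, "get_ipython", None)
    if get_ipython is None:
        return False
    shell = get_ipython()
    # Jupyter kernels use ZMQInteractiveShell; the terminal IPython REPL
    # uses TerminalInteractiveShell and is not treated as a notebook here.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


# Mirrors the changed line in the diff: use worker processes only
# when not running inside a notebook.
use_processes = not in_notebook()
```

Run as a plain script, `in_notebook()` returns `False`, so process-based workers are used; inside a notebook the cluster falls back to threads, which sidesteps the main-guard problem.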

@jhnwu3 jhnwu3 left a comment

lgtm

@Logiquo Logiquo merged commit e3ba18c into sunlabuiuc:master Dec 18, 2025
1 check passed
@Logiquo Logiquo deleted the dataset-n-worker branch December 19, 2025 06:58