Add num_workers to BaseDataset #743

Logiquo · 2025-12-18T00:54:47Z

Contributor: Yongda Fan (yongdaf2@illinois.edu)

Contribution Type: Dataset

Description
Support configuring n_workers for the Dask cluster.

Files to Review
pyhealth/datasets/base_dataset.py

jhnwu3 · 2025-12-18T23:08:58Z

pyhealth/datasets/base_dataset.py

+                    n_workers=self.num_workers,
                    threads_per_worker=1,
-                    processes=False,
+                    processes=not in_notebook(),


Just curious, why do we need to check if we're in a jupyter notebook or not?

Jupyter notebooks are a bit tricky when it comes to multiprocessing (due to the nature of Python), so we typically prefer not to use multiple processes there when possible.

e.g. the need for a main guard in the cell
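For readers following along, here is a minimal sketch of the idea in the diff above. The body of `in_notebook()` is not shown in this PR, so the version below is an assumption (a common way to detect a Jupyter kernel), and `cluster_kwargs` is a hypothetical helper used only for illustration. It does not start a cluster; it only builds the keyword arguments one might pass to `dask.distributed.LocalCluster`.

```python
def in_notebook() -> bool:
    """Best-effort check for a Jupyter kernel (assumed implementation)."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook.
        return False
    shell = get_ipython()
    # Jupyter kernels run a ZMQInteractiveShell; a plain script gets None.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def cluster_kwargs(num_workers: int) -> dict:
    """Hypothetical helper mirroring the arguments changed in the diff."""
    return {
        "n_workers": num_workers,
        "threads_per_worker": 1,
        # Use separate processes only outside notebooks, where spawning
        # workers from a cell is awkward without a main guard.
        "processes": not in_notebook(),
    }


if __name__ == "__main__":
    # The main guard keeps this from re-running in spawned worker processes.
    print(cluster_kwargs(4))
```

In a plain script the `processes` flag comes out as `True`, while inside a notebook kernel it falls back to thread-based workers.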

jhnwu3

lgtm

add num_workers to BaseDataset

59c2318

Logiquo requested a review from jhnwu3 December 18, 2025 00:54

Logiquo mentioned this pull request Dec 18, 2025

[Tracking] Tracking issue for the new memory efficient dataset. #740

Open

11 tasks

Logiquo added 3 commits December 17, 2025 20:15

Fix test

1fbd504

Fix MIMIC4

69f2323

use multi-process mode when not in notebook to speed up the process

7f90406

Logiquo added the labels "component: dataset" (Contribute a new dataset to PyHealth) and "infra" (Infrastructure: data loading, caching, pipelines) Dec 18, 2025

Merge remote-tracking branch 'upstream/master' into dataset-n-worker

1b8a086

jhnwu3 reviewed Dec 18, 2025

jhnwu3 approved these changes Dec 18, 2025

Logiquo merged commit e3ba18c into sunlabuiuc:master Dec 18, 2025
1 check passed

Logiquo deleted the dataset-n-worker branch December 19, 2025 06:58
