
[HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing #39002

Merged
IlyasMoutawwakil merged 5 commits into huggingface:main from dsmertin:dsmertin/fix/hpu-multiprocess-segfault
Jun 24, 2025

@dsmertin
Contributor

Gaudi (HPU) had a problem with multiprocess pre-processing, which was patched in #38790.
There are indeed limitations when several processes try to use a single HPU device.

So I replaced Pool, which creates new processes, with ThreadPool, which runs threads inside the same process.
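
To illustrate the difference described above, here is a minimal sketch (not code from this PR) showing that every ThreadPool task runs inside the calling process. The helper names `worker_identity` and `thread_pool_identities` are hypothetical, introduced only for this illustration. A process-based Pool would instead report worker PIDs that differ from the parent's, which is the situation where several processes end up touching one device.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def worker_identity(_):
    """Report which process and thread executed this task."""
    return os.getpid(), threading.get_ident()


def thread_pool_identities(num_workers=4, num_tasks=8):
    """Run trivial tasks on a ThreadPool and collect (pid, thread id) pairs."""
    with ThreadPool(num_workers) as pool:
        return pool.map(worker_identity, range(num_tasks))


if __name__ == "__main__":
    identities = thread_pool_identities()
    pids = {pid for pid, _ in identities}
    # All tasks share the parent's PID because threads live in one process.
    print("parent pid:", os.getpid(), "worker pids:", pids)
```

Since the threads share one process, only that process ever opens the device, which is presumably why this avoids the multi-process limitation mentioned above.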

@dsmertin force-pushed the dsmertin/fix/hpu-multiprocess-segfault branch from 9361ffe to dbc9328 on June 24, 2025 13:44
@dsmertin
Contributor Author

Regarding the issue you created.
@IlyasMoutawwakil please review.

@IlyasMoutawwakil
Member

Hi @dsmertin, I don't think this solves the issue; rather, it propagates it to non-HPU devices.
Correct me if I'm wrong, but ThreadPool is subject to the GIL, and the task we're parallelizing here is CPU-bound (not I/O-bound), so using threads adds almost no parallelism?
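
A rough way to check the GIL concern is to time a pure-Python, CPU-bound task run sequentially and through a ThreadPool. The sketch below is an illustration under assumptions, not the pre-processing code from the PR: `count_primes` is a hypothetical stand-in workload, and `timed` and `compare` are hypothetical helpers. On a standard (GIL-enabled) CPython build the two timings are expected to be close; free-threaded builds or tasks that release the GIL in native code may behave differently.

```python
import time
from multiprocessing.pool import ThreadPool


def count_primes(limit: int) -> int:
    """Pure-Python, CPU-bound stand-in for a pre-processing task."""
    count = 0
    for n in range(2, limit):
        # Trial division: n is prime if no d in [2, sqrt(n)] divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(workloads, num_threads=4):
    """Run the same workloads sequentially and on a ThreadPool."""
    sequential, t_seq = timed(lambda: [count_primes(w) for w in workloads])
    with ThreadPool(num_threads) as pool:
        threaded, t_thr = timed(lambda: pool.map(count_primes, workloads))
    return sequential, threaded, t_seq, t_thr


if __name__ == "__main__":
    seq, thr, t_seq, t_thr = compare([20_000] * 4)
    print(f"sequential: {t_seq:.3f}s, ThreadPool: {t_thr:.3f}s, same results: {seq == thr}")
```

The timings depend on the machine and interpreter, so no expected numbers are given here.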

@IlyasMoutawwakil
Member

We can accept ThreadPool, but only on HPU; on other devices we still want to make full use of multiple processes. It could be something like:

pool_cls = ThreadPool if is_torch_hpu_available() else Pool
with pool_cls(...
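
The suggestion above can be sketched as a small self-contained example. This is an assumption-laden illustration, not the merged change: `select_pool_cls` and `preprocess_all` are hypothetical helpers, and the boolean `on_hpu` stands in for the result of `is_torch_hpu_available()` so the sketch runs without an HPU or any extra dependency.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def select_pool_cls(on_hpu: bool):
    """Return the pool factory to use (hypothetical helper, not from the PR).

    `on_hpu` stands in for the result of is_torch_hpu_available().
    """
    # Threads stay inside one process, so only one process touches the HPU.
    # Elsewhere, separate processes sidestep the GIL for CPU-bound work.
    return ThreadPool if on_hpu else Pool


def preprocess_all(items, preprocess, on_hpu: bool, num_workers: int = 2):
    """Map `preprocess` over `items` using the selected pool type."""
    pool_cls = select_pool_cls(on_hpu)
    with pool_cls(processes=num_workers) as pool:
        return pool.map(preprocess, items)


if __name__ == "__main__":
    # Thread path only, so the demo does not depend on the process start method.
    print(preprocess_all(["a", "bb", "ccc"], len, on_hpu=True))
```

Both pool types expose the same `map` and context-manager interface, so the calling code does not need to change with the device.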

@dsmertin
Contributor Author

dsmertin commented Jun 24, 2025

> We can accept ThreadPool, but only on HPU; on other devices we still want to make full use of multiple processes. It could be something like:
>
> pool_cls = ThreadPool if is_torch_hpu_available() else Pool
> with pool_cls(...

Let me prepare the change.

Update: it's ready.

@IlyasMoutawwakil IlyasMoutawwakil requested a review from ydshieh June 24, 2025 14:23
Collaborator

@ydshieh ydshieh left a comment

LGTM, but I'll leave it to @IlyasMoutawwakil to give the final ✅ and merge.

Thanks!

@IlyasMoutawwakil
Member

@dsmertin no need to update the branch if the tests are passing (or waiting for approval) 😁

@IlyasMoutawwakil IlyasMoutawwakil merged commit ea9a309 into huggingface:main Jun 24, 2025
20 checks passed