Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    self.model_device = model_device
    self.model_dtype = model_dtype
    self.scheduler = scheduler
    self._deliver_output = deliver_output
A bit ugly, maybe there is a cleaner way to do this.
Maybe we can create a dedicated object to handle delivering the output, and pass it to the processor at creation time? It would have its own lock, its own method, and just a reference to the output queue. It would also clean up the manager class.
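The suggestion above could look something like this minimal sketch. The class and method names (`OutputDeliverer`, `deliver`) are hypothetical, not from the PR; only the idea of owning the queue, the lock, and a single delivery method comes from the comment.

```python
import queue
import threading


class OutputDeliverer:
    """Hypothetical helper owning the output queue and its lock.

    Instead of handing the processor a raw queue plus a delivery
    callable, the manager could pass one of these at creation time.
    """

    def __init__(self) -> None:
        self.output_queue: queue.Queue = queue.Queue()
        self._lock = threading.Lock()

    def deliver(self, request_id: str, result: object) -> None:
        # Single place where results leave the generation thread.
        with self._lock:
            self.output_queue.put((request_id, result))
```

The processor would then take a `deliverer` argument and call `deliverer.deliver(...)`, which keeps the locking details out of the manager class.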
@bot /style
Style fix runs successfully without any files modified.
ArthurZucker left a comment
Nice! We could add tests in tests/cli/test_serve.py?
    if self.log_prob_generation:
        raise NotImplementedError("log_prob_generation is not supported yet")

    def _register_handler(self, request_id: str, callback: callable, loop: asyncio.AbstractEventLoop) -> None:
Seems like this function and _unregister_handler could be removed: they are two lines each and called only once, so we might as well inline them.
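Inlining the helpers could look like the sketch below. The `Manager`, `add_request`, and `finish_request` names are assumptions for illustration; only `_request_callbacks` and the register/unregister helpers are from the PR.

```python
import asyncio
from typing import Callable


class Manager:
    """Sketch of inlining two-line register/unregister helpers."""

    def __init__(self) -> None:
        self._request_callbacks: dict[
            str, tuple[Callable, asyncio.AbstractEventLoop]
        ] = {}

    def add_request(
        self,
        request_id: str,
        callback: Callable,
        loop: asyncio.AbstractEventLoop,
    ) -> None:
        # Inlined body of _register_handler: a plain dict assignment.
        self._request_callbacks[request_id] = (callback, loop)

    def finish_request(self, request_id: str) -> None:
        # Inlined body of _unregister_handler: a plain dict pop.
        self._request_callbacks.pop(request_id, None)
```

Since each helper was only a dict assignment or pop used in one place, inlining removes indirection without losing readability.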
    for request in requests_in_batch:
        state = request.state
my test was failing without this fix, feels correct to me
Yes, sorry, this is fixed in an un-merged PR. Good fix!
| """ | ||
|
|
||
| def __init__(self) -> None: | ||
| self.output_queue = queue.Queue() |
moved the output queue here
We already have plenty of these tests in serve, as this is basically the default path there once the PR over there is merged. I will still add a few tests here.
* merge
* update
* fix
* style
* simpler
* style
* review !
* style
* batch output
* style
* type
What does this PR do?
This PR adds some features that make serving more efficient. It shouldn't impact `generate_batch` at all:

Per-request result delivery via callbacks (replaces shared-queue contention). Added a `_request_callbacks` dict and `register_result_handler(request_id, callback)`, a unified API for async result delivery. The generation thread delivers results directly to registered callbacks instead of routing everything through the shared `output_queue`. This eliminates the O(n²) requeue contention that `get_result` with `request_id` filtering had at high concurrency.

The generation loop waits on an Event instead of busy-spinning when there are no requests. `add_request` signals it via `.set()` to wake the loop immediately. Zero CPU when idle, instant wakeup on a new request. In our server, the issue was that the busy-spin was holding the GIL when idle, which slowed down tokenization on the event loop thread.
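A minimal sketch of the two mechanisms described above. The `_request_callbacks`, `register_result_handler`, and `add_request` names come from the description; everything else (class name, `_step`, the internal request list) is an assumption for illustration, not the PR's actual implementation.

```python
import threading
from typing import Callable


class ContinuousBatchingSketch:
    """Per-request callback delivery plus an Event-driven wakeup."""

    def __init__(self) -> None:
        self._request_callbacks: dict[str, Callable[[object], None]] = {}
        self._requests: list[tuple[str, object]] = []
        self._has_work = threading.Event()
        self._lock = threading.Lock()

    def register_result_handler(
        self, request_id: str, callback: Callable[[object], None]
    ) -> None:
        self._request_callbacks[request_id] = callback

    def add_request(self, request_id: str, prompt: object) -> None:
        with self._lock:
            self._requests.append((request_id, prompt))
        self._has_work.set()  # wake the generation loop immediately

    def _step(self) -> None:
        # Block (releasing the GIL) until work arrives, instead of spinning.
        self._has_work.wait()
        with self._lock:
            batch, self._requests = self._requests, []
            if not self._requests:
                self._has_work.clear()
        for request_id, prompt in batch:
            result = f"generated for {prompt}"  # placeholder for model output
            cb = self._request_callbacks.pop(request_id, None)
            if cb is not None:
                cb(result)  # O(1) delivery, no shared-queue filtering
```

Each result goes straight to its request's callback, so no consumer has to scan and requeue other requests' outputs, and an idle loop parked in `Event.wait()` costs no CPU.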