[serve] Update tool call to switch to parse_response#45485
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: cli
This comment contains models: ["cli"]
LysandreJik
left a comment
Nice, looks good! Let's indeed try with some tool-enabled frontends to see how it performs
# 5. Tool calls are parsed after generation completes (not during streaming),
# because the full token sequence is needed for reliable parsing.
I'm not sure this is a satisfactory workaround: it does mean that we'll need to wait for the generation to complete before handling tool calls, right? Let's see how others do it in the ecosystem, but from memory they do on-the-fly tool-call parsing
can be revisited after the PR maybe
I think that if there is one tool call, it shouldn't matter much. Note that usually the generation ends just after the tool_call request, so we have to wait until the end in any case. But if we have multiple tool calls, it might be nice to have on-the-fly tool-call parsing. However, I'm not sure we'd see much speed gain. In any case, this should be quite easy to fix, but I'll check whether it's worth adding the extra logic when testing with a real use case.
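To illustrate the on-the-fly alternative discussed above, here is a minimal sketch of an incremental tool-call parser that scans the growing streamed text and emits each call as soon as its closing tag arrives, instead of waiting for generation to finish. The `<tool_call>` tag format and JSON payload shape are assumptions for illustration; real chat templates vary per model, and this is not the parser used in the PR.

```python
import json
import re

# Assumed tag format for this sketch; actual templates differ per model.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

class IncrementalToolCallParser:
    """Scan a growing text buffer and yield each tool call as soon as
    its closing tag has been streamed, rather than at end of generation."""

    def __init__(self):
        self.buffer = ""
        self.consumed = 0  # index just past the last fully-parsed call

    def feed(self, delta: str):
        """Append newly streamed text; yield any newly completed tool calls."""
        self.buffer += delta
        for match in TOOL_CALL_RE.finditer(self.buffer, self.consumed):
            self.consumed = match.end()
            try:
                yield json.loads(match.group(1))
            except json.JSONDecodeError:
                # Malformed payload: skip it rather than break the stream.
                continue

# Simulate a token stream containing two tool calls split across chunks.
parser = IncrementalToolCallParser()
calls = []
for chunk in [
    '<tool_call>{"name": "get_w',
    'eather", "arguments": {"city": "Paris"}}',
    '</tool_call> text <tool_call>{"name": "get_time", "arguments": {}}</tool_call>',
]:
    calls.extend(parser.feed(chunk))
print([c["name"] for c in calls])  # -> ['get_weather', 'get_time']
```

With a single tool call this buys little, since generation typically ends right after it; with multiple calls, each one becomes available before the stream closes.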
This is on me - the schema parser assumes a complete message. If incremental processing is a hard requirement, I might need to go back and revisit the spec, and how exactly we handle parsing.
Parsing is significantly harder than just chat formatting because of things like this - it might be a sign that we need arbitrary code to do it correctly, which is what other frameworks have, and we can't just have user-authorable schemas embedded in config files. If so, we'd need to maintain a list of message parser functions either in Transformers or some other HF repo!
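The "list of message parser functions" idea above could look something like a small registry mapping a model family to an arbitrary parser callable. This is purely a hypothetical sketch of the design being floated, not existing Transformers code; the registry name, decorator, and stub parser are all invented for illustration.

```python
from typing import Callable

# Hypothetical registry: model family -> parser function. Arbitrary code
# per model, instead of user-authorable schemas embedded in config files.
MESSAGE_PARSERS: dict[str, Callable[[str], dict]] = {}

def register_parser(model_type: str):
    """Decorator registering a parser for a given model family."""
    def decorator(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        MESSAGE_PARSERS[model_type] = fn
        return fn
    return decorator

@register_parser("qwen")
def parse_qwen(text: str) -> dict:
    # A real parser would extract tool-call blocks here; stub for illustration.
    return {"content": text, "tool_calls": []}

parser = MESSAGE_PARSERS["qwen"]
print(parser("hello")["content"])  # -> hello
```

Such a registry could live in Transformers itself or in a dedicated HF repo, as suggested.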
Co-authored-by: Lysandre Debut <hi@lysand.re>
run-slow: cli
This comment contains models: ["cli"]
LysandreJik
left a comment
Alright, let's do it! Thanks @SunMarc
…5485) * fix tool call to support parse response * simplify parser * simplification ! * Update src/transformers/cli/serving/response.py Co-authored-by: Lysandre Debut <hi@lysand.re> --------- Co-authored-by: Lysandre Debut <hi@lysand.re>
What does this PR do?
This PR updates the tool calling support in serve to switch to
parse_response. I've updated Qwen support to use the same template as parse_response so that we only keep one implementation. With this, gemma4 tool calling is supported. I've also made other changes around apply_chat_template: it was causing issues with gemma4, as the content was put after the tool call and the tool response. But I think in general it should be fine to not show the content. We can revisit this if needed, as I'm not entirely sure about this choice. For example, if there is a thinking phase, it could make sense to have it in the context.
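The "not show the content" behavior described above can be sketched as a small helper that drops the free-text content of an assistant message whenever tool calls are present, with an opt-out for cases like a thinking phase. The function name, flag, and message shape are assumptions for illustration, not the actual implementation in this PR.

```python
def format_assistant_message(parsed: dict, keep_content: bool = False) -> dict:
    """Hypothetical sketch: when a parsed response contains tool calls,
    omit the free-text content unless explicitly kept (e.g. to preserve
    a thinking phase in the conversation context)."""
    msg = {"role": "assistant", "tool_calls": parsed.get("tool_calls", [])}
    if keep_content or not msg["tool_calls"]:
        msg["content"] = parsed.get("content", "")
    return msg

parsed = {"content": "thinking...", "tool_calls": [{"name": "get_time"}]}
print(format_assistant_message(parsed))
# -> {'role': 'assistant', 'tool_calls': [{'name': 'get_time'}]}
```

Keeping the content after the tool call was what broke the gemma4 template, which is why dropping it by default is the simpler choice here.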