[serve] Update tool call to switch to parse_response#45485

Merged
SunMarc merged 4 commits into main from tool-call-serve on Apr 20, 2026

@SunMarc
Member

@SunMarc SunMarc commented Apr 16, 2026

What does this PR do?

This PR updates the tool-calling support in serve to use parse_response. I've updated the Qwen support to use the same template as parse_response, so that we only keep one implementation. With this change, gemma4 tool calling is supported. I've also made some other changes:

  • Support for the Responses API tool-calling structure, based on what I've seen in their docs (see the tests). However, we won't be able to test this properly until some clients adopt it.
  • If an assistant message contains content alongside a tool call, we now drop that content when passing the message to apply_chat_template. Keeping it caused issues with gemma4, because the content was placed after the tool call and the tool response. In general, I think it should be fine not to show the content, but we can revisit this if needed since I'm not fully sure about it. For example, if there is a thinking phase, it could make sense to keep it in the context.
  • More tests for multi-turn tool calling. I also did a minor refactor of the tool-calling tests and grouped them all together.
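A minimal sketch of the content-dropping behavior described in the second bullet, assuming OpenAI-style message dicts where an assistant turn carries a `tool_calls` list. The helper name `drop_content_alongside_tool_calls` is hypothetical and is not the code added in this PR. It blanks the text to an empty string instead of deleting the key, on the assumption that some chat templates read `content` unconditionally.

```python
from copy import deepcopy
from typing import Any


def drop_content_alongside_tool_calls(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: return a copy of `messages` in which assistant
    turns that contain tool calls have their free-text content blanked."""
    cleaned = []
    for message in messages:
        message = deepcopy(message)  # never mutate the caller's history
        if message.get("role") == "assistant" and message.get("tool_calls"):
            # Blank the text so the template only renders the tool call itself
            # (an empty string is assumed to be safer than removing the key).
            message["content"] = ""
        cleaned.append(message)
    return cleaned
```

The cleaned list would then be passed to `tokenizer.apply_chat_template(...)` as usual; user and tool turns are returned unchanged.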

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc
Member Author

SunMarc commented Apr 16, 2026

run-slow: cli

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["cli"]
quantizations: []

Comment thread src/transformers/cli/serving/utils.py Outdated
@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context | Commit | Description
RUN | 3488f7c8 | workflow commit (merge commit)
PR | ce495c22 | branch commit (from PR)
main | 9b1a47c6 | base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

Comment thread src/transformers/cli/serving/utils.py Outdated
Member

@LysandreJik LysandreJik left a comment

Nice, looks good! Let's indeed try it with some tool-enabled frontends to see how it performs.

Comment thread src/transformers/cli/serving/chat_completion.py
Comment thread src/transformers/cli/serving/response.py
Comment on lines +449 to +450
# 5. Tool calls are parsed after generation completes (not during streaming),
# because the full token sequence is needed for reliable parsing.
Member

I'm not sure this is a satisfactory workaround: it means we'll need to wait for generation to complete before handling tool calls, right? Let's see how others in the ecosystem do it, but from memory they do on-the-fly tool-call parsing.

Member

This can maybe be revisited after the PR.

Member Author

@SunMarc SunMarc Apr 17, 2026

I think that with a single tool call it shouldn't matter much. Generation usually ends right after the tool_call request, so we have to wait until the end in any case. With multiple tool calls, on-the-fly tool-call parsing might be nice, although I'm not sure we would see much of a speed gain. In any case, this should be fairly easy to add, and I'll check whether the extra logic is worth it when testing with real use cases.
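To illustrate the on-the-fly alternative discussed in this thread, here is a minimal sketch of an incremental parser that emits each tool call as soon as its closing marker has been streamed, instead of parsing the full sequence after generation. It assumes Qwen-style `<tool_call>...</tool_call>` blocks containing JSON; the class name and the markers are illustrative assumptions, and this is not the schema-based parse_response path used in this PR.

```python
import json
from typing import Any, Iterator

# Assumption: Qwen-style markers; other model families use different formats.
TOOL_CALL_START = "<tool_call>"
TOOL_CALL_END = "</tool_call>"


class IncrementalToolCallParser:
    """Hypothetical streaming parser: yields each tool call once its closing
    marker has arrived, rather than after generation finishes."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> Iterator[dict[str, Any]]:
        self._buffer += chunk
        while True:
            start = self._buffer.find(TOOL_CALL_START)
            if start == -1:
                return  # no tool call has started yet
            end = self._buffer.find(TOOL_CALL_END, start + len(TOOL_CALL_START))
            if end == -1:
                return  # the call is still being generated; wait for more chunks
            payload = self._buffer[start + len(TOOL_CALL_START):end]
            # Discard everything up to and including the closing marker.
            self._buffer = self._buffer[end + len(TOOL_CALL_END):]
            yield json.loads(payload)
```

With a single tool call at the very end of generation, this yields the call at essentially the same moment as post-generation parsing, which matches the point above; the benefit only shows up when several calls are generated back to back. Plain text outside the markers is simply skipped in this sketch.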

Member

This is on me - the schema parser assumes a complete message. If incremental processing is a hard requirement, I might need to go back and revisit the spec, and how exactly we handle parsing.

Member

Parsing is significantly harder than just chat formatting because of things like this - it might be a sign that we need arbitrary code to do it correctly, which is what other frameworks have, and we can't just have user-authorable schemas embedded in config files. If so, we'd need to maintain a list of message parser functions either in Transformers or some other HF repo!
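A minimal sketch of what a per-model registry of parser functions could look like. The registry, the `register_parser` decorator, the `qwen2` and `plain_json` keys, and both output formats are hypothetical illustrations, not an existing Transformers API.

```python
import json
import re
from typing import Any, Callable

# Hypothetical registry: model family -> function that turns the raw generated
# text into a list of tool calls. Keys and formats are illustrative assumptions.
ToolCallParser = Callable[[str], list[dict[str, Any]]]
PARSERS: dict[str, ToolCallParser] = {}


def register_parser(model_type: str) -> Callable[[ToolCallParser], ToolCallParser]:
    def decorator(fn: ToolCallParser) -> ToolCallParser:
        PARSERS[model_type] = fn
        return fn
    return decorator


@register_parser("qwen2")
def parse_tagged_json(text: str) -> list[dict[str, Any]]:
    # Assumes <tool_call>{...}</tool_call> blocks that each contain one JSON object.
    blocks = re.findall(r"<tool_call>(.*?)</tool_call>", text, flags=re.DOTALL)
    return [json.loads(block) for block in blocks]


@register_parser("plain_json")
def parse_bare_json(text: str) -> list[dict[str, Any]]:
    # Assumes the whole response is either a single JSON tool call or plain text.
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return []
    return [obj] if isinstance(obj, dict) and "name" in obj else []


def parse_tool_calls(model_type: str, text: str) -> list[dict[str, Any]]:
    parser = PARSERS.get(model_type)
    if parser is None:
        raise ValueError(f"No tool-call parser registered for {model_type!r}")
    return parser(text)
```

The trade-off raised above is visible here: each entry is arbitrary code that has to be maintained somewhere, whereas a schema embedded in a config file would be user-authorable but less expressive.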

Comment thread src/transformers/cli/serving/utils.py
Comment thread src/transformers/cli/serving/utils.py Outdated
SunMarc and others added 2 commits April 17, 2026 13:49
@SunMarc
Member Author

SunMarc commented Apr 17, 2026

run-slow: cli

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["cli"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context | Commit | Description
RUN | 6cdee456 | workflow commit (merge commit)
PR | 44c17718 | branch commit (from PR)
main | 5399876c | base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@SunMarc SunMarc requested a review from LysandreJik April 17, 2026 16:34
Member

@LysandreJik LysandreJik left a comment

Alright, let's do it! Thanks @SunMarc

@SunMarc SunMarc added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit e15297e Apr 20, 2026
19 checks passed
@SunMarc SunMarc deleted the tool-call-serve branch April 20, 2026 15:38
lvliang-intel pushed a commit to lvliang-intel/transformers that referenced this pull request Apr 21, 2026
…5485)

* fix tool call to support parse response

* simplify parser

* simplification !

* Update src/transformers/cli/serving/response.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
…5485)

* fix tool call to support parse response

* simplify parser

* simplification !

* Update src/transformers/cli/serving/response.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>

Labels

None yet

Projects

None yet

4 participants