
Autoparser: True streaming #20177

Merged
pwilkin merged 3 commits into ggml-org:master from pwilkin:true-streaming
Mar 7, 2026

Conversation

@pwilkin
Member

@pwilkin pwilkin commented Mar 6, 2026

In the final changes to the autoparser, I made the atomicity constraint fairly restrictive to handle models such as GLM 4.7-Flash, which put arguments directly after the function name with no markers in between. I'm now relaxing that constraint so people can watch their favorite assistant building the tool call live ;)

@pwilkin pwilkin changed the title True streaming Autoparser: True streaming Mar 6, 2026
@aldehir
Contributor

aldehir commented Mar 6, 2026

These already wrap in atomic:

    common_peg_parser tool_open(const common_peg_parser & p) { return atomic(tag(TOOL_OPEN, p)); }
    common_peg_parser tool_close(const common_peg_parser & p) { return atomic(tag(TOOL_CLOSE, p)); }
    common_peg_parser tool_id(const common_peg_parser & p) { return atomic(tag(TOOL_ID, p)); }
    common_peg_parser tool_name(const common_peg_parser & p) { return atomic(tag(TOOL_NAME, p)); }
    common_peg_parser tool_arg_open(const common_peg_parser & p) { return atomic(tag(TOOL_ARG_OPEN, p)); }
    common_peg_parser tool_arg_close(const common_peg_parser & p) { return atomic(tag(TOOL_ARG_CLOSE, p)); }
    common_peg_parser tool_arg_name(const common_peg_parser & p) { return atomic(tag(TOOL_ARG_NAME, p)); }

Contributor

@aldehir aldehir left a comment


I trust you tested this on multiple models? Again, the consequence of a generalized approach: the impact surface is way larger.

@pwilkin
Member Author

pwilkin commented Mar 7, 2026

@aldehir yeah, in most cases this wasn't even a problem to begin with; of the real-life models, the only one affected was GLM 4.7-Flash, since it doesn't wrap its function name in anything. Pure JSON models are unaffected because their function names sit atomically inside "", so this just covers the fringe cases.

@pwilkin pwilkin merged commit c024d85 into ggml-org:master Mar 7, 2026
73 of 75 checks passed
@bchtrue

bchtrue commented Mar 8, 2026

@pwilkin You did it, congratulations and thank you from all of us!

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing

* Whitespace

* Remove redundant atomics
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing

* Whitespace

* Remove redundant atomics
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing

* Whitespace

* Remove redundant atomics
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* Relax atomicity constraint for nicer, more pleasant True Streaming parsing

* Whitespace

* Remove redundant atomics
3 participants