
Fix structured outputs #20223

Merged

pwilkin merged 2 commits into ggml-org:master from pwilkin:structured-output on Mar 8, 2026
Conversation

pwilkin (Member) commented on Mar 8, 2026

Fixes support for structured outputs in autoparser.

pwilkin requested a review from aldehir on March 8, 2026 at 01:26
pwilkin (Member, Author) commented on Mar 8, 2026

Fixes #20221

tarruda commented on Mar 8, 2026

@pwilkin thanks for this.

I'm not familiar with the testing infrastructure, but could we add the example request as a regression test for llama-server structured output?

pwilkin (Member, Author) commented on Mar 8, 2026

I can try to add a server test for it and see how the tiny-stories model handles structured output ;)
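For reference, such a regression request could be sketched roughly as follows. This is a hedged Python sketch that only constructs the request body; the model name, the example schema, and the exact `response_format` shape accepted by llama-server are assumptions here (the field names follow the OpenAI-style convention and may differ by server version):

```python
import json

# Hypothetical example schema for the regression test.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# OpenAI-style chat completion payload with a json_schema response format.
# Sent to the server's OpenAI-compatible endpoint in a real test.
payload = {
    "model": "tiny-stories",  # hypothetical model name for the test harness
    "messages": [{"role": "user", "content": "Give me a person as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema},
    },
}

body = json.dumps(payload)
```

A server test would POST this body and assert that the response content parses as JSON conforming to the schema.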

Comment thread on common/chat-auto-parser-generator.cpp (outdated):

        if (has_response_format) {
    -       return ctx.reasoning_parser + p.space() +
    -              p.content(p.schema(p.json(), "response-format", inputs.json_schema)) + p.end();
    +       return ctx.reasoning_parser + p.space() + p.optional(p.literal("```json") + p.space()) +

This comment was marked as outdated.

Comment thread on common/chat-auto-parser-generator.cpp (outdated):

Co-authored-by: Aldehir Rojas <hello@alde.dev>

    -       p.content(p.schema(p.json(), "response-format", inputs.json_schema)) + p.end();
    +       auto response_format = p.rule("response-format", p.content(p.schema(p.json(), "response-format-schema", inputs.json_schema)));
    +       return ctx.reasoning_parser + p.space() + p.choice({
    +           p.literal("```json") + p.space() + response_format + p.space() + p.literal("```"),
I might not be interpreting this correctly, but should the model be able to output fenced Markdown blocks in a structured response? The main guarantee of structured JSON output is that it contains only valid JSON, with no need to extract it from Markdown blocks.

aldehir (Contributor) commented on Mar 8, 2026
The parser will extract the JSON within the code fences, so you won't see them in the response (see the definition wrapped in p.content()).

This is good for models whose training sets contain a lot of JSON examples in code fences, such as Gemma 3.
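To illustrate the idea, here is a hedged Python sketch (not the C++ parser itself) of what that extraction amounts to: the same JSON is accepted either bare or wrapped in a ```json fence, and only the inner JSON reaches the caller. The helper name `extract_structured_json` is hypothetical, chosen for illustration:

```python
import json
import re

def extract_structured_json(text: str):
    """Accept the JSON payload bare or inside a ```json code fence,
    and surface only the inner JSON to the caller (the role played by
    p.content() in the generated grammar)."""
    fenced = re.fullmatch(r"```json\s*(.*?)\s*```", text.strip(), re.DOTALL)
    inner = fenced.group(1) if fenced else text.strip()
    return json.loads(inner)

# Both forms yield the same parsed object:
bare = extract_structured_json('{"a": 1}')
fenced = extract_structured_json('```json\n{"a": 1}\n```')
```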


I've always assumed that llama.cpp manipulated the model's token predictions so that only tokens keeping the output valid according to the JSON schema could be sampled, and that emitting fence tokens would not even be possible.

Contributor:

Grammar-constrained decoding works for any context-free language. JSON is one, but so is JSON wrapped in fences. We need to support more than just JSON, especially for reasoning models that need to reason before generating a structured response.
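As a hedged sketch of that point, the language being constrained to can be written down as an alternation: bare JSON, or JSON inside a code fence. The Python recognizer below (`accepted_by_grammar` is a hypothetical name) only illustrates the shape of that language; real grammar-constrained decoding masks invalid tokens during sampling rather than checking the output after the fact:

```python
import json

def accepted_by_grammar(text: str) -> bool:
    """Return True if text matches the alternation used by the grammar:
    bare JSON, or JSON wrapped in a ```json code fence. A post-hoc
    illustration of the context-free language, not the sampler itself."""
    s = text.strip()
    candidates = [s]
    if s.startswith("```json") and s.endswith("```"):
        # Strip the fence and try the inner content as well.
        candidates.append(s[len("```json"):-3].strip())
    for candidate in candidates:
        try:
            json.loads(candidate)
            return True
        except json.JSONDecodeError:
            pass
    return False
```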

@pwilkin pwilkin merged commit 62b8143 into ggml-org:master Mar 8, 2026
77 of 78 checks passed
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* Fix structured outputs

* Update common/chat-auto-parser-generator.cpp

Co-authored-by: Aldehir Rojas <hello@alde.dev>

---------

Co-authored-by: Aldehir Rojas <hello@alde.dev>
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues.

3 participants