Fix structured outputs #20223
Conversation
Fixes #20221
@pwilkin thanks for this. I'm not familiar with the testing infrastructure, but could we add the example request as a regression test for llama-server structured output?
Can try to add a server test for it, see how the tiny-stories model handles structured output ;)
````diff
 if (has_response_format) {
-    return ctx.reasoning_parser + p.space() +
-        p.content(p.schema(p.json(), "response-format", inputs.json_schema)) + p.end();
+    return ctx.reasoning_parser + p.space() + p.optional(p.literal("```json") + p.space()) +
````
Co-authored-by: Aldehir Rojas <hello@alde.dev>
````diff
-        p.content(p.schema(p.json(), "response-format", inputs.json_schema)) + p.end();
+    auto response_format = p.rule("response-format", p.content(p.schema(p.json(), "response-format-schema", inputs.json_schema)));
+    return ctx.reasoning_parser + p.space() + p.choice({
+        p.literal("```json") + p.space() + response_format + p.space() + p.literal("```"),
````
I might not be interpreting this correctly, but should the model be able to output fenced markdown blocks in a structured response? The main guarantee of structured JSON output is that it contains only valid JSON, with no need to extract it from markdown blocks.
The parser extracts the JSON from within the code fences, so you won't see them in the response (the definition is wrapped in p.content()).
This is good for models whose training set contains a lot of JSON examples in code fences, such as Gemma 3.
I've always assumed that llama.cpp manipulated the model's token prediction so that only tokens keeping the output valid according to the JSON schema could be sampled, and that outputting fence tokens would not even be possible.
Grammar-constrained decoding works for any context-free language. JSON is one, but so is JSON wrapped in fences. We need to support more than just JSON, especially for reasoning models that need to reason before generating a structured response.
* Fix structured outputs

* Update common/chat-auto-parser-generator.cpp

Co-authored-by: Aldehir Rojas <hello@alde.dev>

---------

Co-authored-by: Aldehir Rojas <hello@alde.dev>
Fixes support for structured outputs in autoparser.