chat : Avoid partial reasoning tags in response content #15149
Closed
p1-0tr wants to merge 1 commit into ggml-org:master from
If a model uses a multi-part reasoning tag, part of the tag can end up in the message content when using streaming mode. For example:
```
$ curl -N http://localhost:8080/v1/chat/completions -d '{
  "model": "hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "stream": true
}' -H "Content-Type: application/json"
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<|channel|>"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"analysis"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"The"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
...
```
This happens because the chat parser cannot fully match the reasoning tag while only its first parts have arrived, so those parts are emitted as content. To fix this, modify `try_consume_literal()` to speculatively consume a partially matching string when the parser is constructed with `partial` set to true.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
p1-0tr (Author): No longer needed with #15181.