chat : add MiniMax M2 specialized tool-call handler #22106
Open
doctorjei wants to merge 1 commit into ggml-org:master
The autoparser (peg-native) infers a grammar from the MiniMax-M2
template that handles a single <invoke> element cleanly but
mis-specifies the repetition rule for multiple <invoke> elements
inside one <minimax:tool_call> wrapper. Parallel tool calls with
the generic path trip the streaming parser's self-consistency check
("Invalid diff: now finding less tool calls!"), which is the
test-harness analogue of the production GGML_ABORT at
llama-grammar.cpp:1435 on real MiniMax M2.7 output.
Add a specialized handler following the Kimi K2 pattern: XML
invoke/parameter parsing, lazy grammar gated by <minimax:tool_call>
trigger, reasoning extraction via <think>/</think>. Dispatch
requires three MiniMax-specific literals in the template source
(<minimax:tool_call>, <invoke name=, <parameter name=) so any
future variant that drops the XML idiom falls through to the
autoparser.
Include five test fixtures in tests/test-chat.cpp: parallel calls
with different tools, parallel calls with the same tool (both repro
the gap), string parameter with embedded XML-ish content, multi-line
string value, and two-integer-parameter invocation. The three
passing-on-master cases document that the autoparser's gap is
specifically repetition, not content shape.
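As a rough illustration of the dispatch rule described in the commit message, the gate amounts to a substring test over the raw template source. The sketch below is not the PR's code: the function name `looks_like_minimax_m2` is hypothetical, and only the three literals are taken from the text above.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not the PR's function):
// returns true only when all three MiniMax-specific literals appear in the
// raw template source. If any one is missing, the caller would fall through
// to the generic autoparser path.
static bool looks_like_minimax_m2(const std::string & tmpl_src) {
    static const char * const required[] = {
        "<minimax:tool_call>",
        "<invoke name=",
        "<parameter name=",
    };
    for (const char * lit : required) {
        if (tmpl_src.find(lit) == std::string::npos) {
            return false; // a variant that drops the XML idiom is not matched
        }
    }
    return true;
}
```

Requiring all three literals, rather than the wrapper tag alone, keeps the gate narrow: a template that reuses `<minimax:tool_call>` around a different payload format would not be claimed by the specialized handler.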
Hi @doctorjei, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Fixes the autoparser's failure/abort in response to the MiniMax-M2.7 system prompt.
Overview
Adds a specialized tool-call handler for the MiniMax M2.7 template (and probably later versions). Without it, M2.7 output with tools crashes llama-server (GGML_ABORT) at src/llama-grammar.cpp:1435 (EOG with non-empty stack) when <invoke> is emitted.
Why? (Reproducing the Issue)
Reproducible in-tree via tests/test-chat.cpp on current master (b8840). Parallel <invoke> elements inside a <minimax:tool_call> wrapper confuse the peg_tester.
The autoparser (peg-native) infers grammar structure from the template via differential rendering. MiniMax's template uses XML with repeatable invoke elements for parallel calls. The parser correctly infers per-invoke structure but mis-specifies the repetition rule, so any second invoke is lost.
This is a regression: an earlier working version was in the mainline (#16932, 1920345), via a generalized XML tool-call parser, but the autoparser refactoring (#18675) replaced it. This PR restores specialized handling for MiniMax M2.7 (and likely other M2 versions) without reverting the broader refactor.
Implementation
This implementation follows the Kimi K2 / DeepSeek V3.2 pattern for templates the autoparser cannot handle.
- common_chat_params_init_minimax prepares PEG for wrapper/invoke/param grammar (parallel calls).
- Reasoning is extracted (<think>…</think> blocks) ahead of tool calls.
- String parameter values are kept verbatim (tool_arg_string_value) to preserve embedded XML-style content; non-strings are reconstructed through JSON.
- common_chat_try_specialized_template requires three MiniMax-specific literals in template source (<minimax:tool_call>, <invoke name=, <parameter name=).
Testing
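Extends the existing MiniMax block in tests/test-chat.cpp with five test cases.
- Parallel <invoke> elements; two different tools: fails on master (Invalid diff: now finding less tool calls!)
- Parallel <invoke> elements; same tool twice: also reproduces the gap on master
- String parameter with embedded XML-ish content (<div><script>…</script></div>): passes on master; checks that tool_arg_string_value is verbatim
- Multi-line string value (\n): passes on master; checks the until("</parameter>") boundary on multi-line content
- Two-integer-parameter <invoke>: passes on master; checks zero_or_more over parameter list + non-string JSON reconstruction
The three passing test cases show that the gap is repetition rather than content shape, and they provide additional regression coverage.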
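To make the shape of these fixtures concrete, here is a minimal standalone sketch of the parsing idea: repeat over `<invoke>` elements inside one wrapper, and take each parameter value verbatim up to `</parameter>`. It uses plain `std::string::find` rather than the PEG machinery in the PR. The names `tool_call_sketch`, `take_between`, and `parse_minimax_calls` are hypothetical, and the quoted `name="..."` attribute form is an assumption about the model's output.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch; parameter values are kept as raw text.
struct tool_call_sketch {
    std::string name;
    std::vector<std::pair<std::string, std::string>> params;
};

// Finds `open` at or after `pos`, then the first `close` after it. On success,
// stores the text in between in `out`, moves `pos` past `close`, returns true.
static bool take_between(const std::string & s, std::size_t & pos,
                         const std::string & open, const std::string & close,
                         std::string & out) {
    std::size_t b = s.find(open, pos);
    if (b == std::string::npos) { return false; }
    b += open.size();
    const std::size_t e = s.find(close, b);
    if (e == std::string::npos) { return false; }
    out = s.substr(b, e - b);
    pos = e + close.size();
    return true;
}

// Parses every <invoke> inside the first <minimax:tool_call> wrapper.
// Limitation of this sketch: a value containing "</invoke>" or "</parameter>"
// would be cut short.
static std::vector<tool_call_sketch> parse_minimax_calls(const std::string & text) {
    std::vector<tool_call_sketch> calls;
    std::size_t pos = 0;
    std::string wrapper;
    if (!take_between(text, pos, "<minimax:tool_call>", "</minimax:tool_call>", wrapper)) {
        return calls;
    }
    std::size_t ipos = 0;
    std::string inv;
    // Repetition over <invoke> elements: the part the PR says is mis-specified.
    while (take_between(wrapper, ipos, "<invoke name=\"", "</invoke>", inv)) {
        const std::size_t q = inv.find("\">");
        if (q == std::string::npos) { break; }
        tool_call_sketch call;
        call.name = inv.substr(0, q);
        std::size_t ppos = q + 2;
        std::string par;
        // Zero or more parameters; the value runs verbatim up to </parameter>.
        while (take_between(inv, ppos, "<parameter name=\"", "</parameter>", par)) {
            const std::size_t pq = par.find("\">");
            if (pq == std::string::npos) { break; }
            call.params.emplace_back(par.substr(0, pq), par.substr(pq + 2));
        }
        calls.push_back(std::move(call));
    }
    return calls;
}
```

Because the value boundary is the literal `</parameter>`, embedded markup such as `<div><script>…</script></div>` and multi-line text pass through untouched. Converting non-string values through JSON, as the PR describes, is left out of this sketch.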
Additional information
Crash site: src/llama-grammar.cpp:1435 (GGML_ABORT("fatal error") when EOG token is accepted with non-empty grammar stacks).
Requirements
AI was used to identify the appropriate strategy, draft a harness, and draft initial code snippets. Every line was reviewed, edited as appropriate, and included manually in commits.