
chat : add MiniMax M2 specialized tool-call handler#22106

Open
doctorjei wants to merge 1 commit into ggml-org:master from doctorjei:minimax-pr

@doctorjei

Fixes the autoparser's failure (abort) on the MiniMax-M2.7 chat template.

Overview

Adds a specialized tool-call handler for the MiniMax M2.7 template (and probably later versions, though that is unverified). Without it, M2.7 output that uses tools crashes llama-server with GGML_ABORT at src/llama-grammar.cpp:1435 (EOG reached with a non-empty grammar stack) when <invoke> is emitted.

Why? (Reproducing the Issue)

The issue is reproducible in-tree via tests/test-chat.cpp on current master (b8840): parallel <invoke> elements inside a single <minimax:tool_call> wrapper trip the peg_tester's streaming self-consistency check.
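
The harness failure can be pictured as a monotonicity rule on streamed parses: the output is re-parsed after each new chunk, and the number of recognized tool calls may only grow. This is an assumed model of the check, written for illustration only; it is not the code in tests/test-chat.cpp, and tool_call_count_is_monotonic is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a streaming self-consistency rule (assumed, not the
// actual harness): counts[i] is the number of tool calls recognized after
// parsing the first i + 1 chunks. The count must never decrease.
bool tool_call_count_is_monotonic(const std::vector<std::size_t>& counts) {
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] < counts[i - 1]) {
            // This is the situation reported as
            // "Invalid diff: now finding less tool calls!"
            return false;
        }
    }
    return true;
}
```

One plausible sequence under this model: a second <invoke> is recognized mid-stream and then lost once the wrapper closes, giving counts like {0, 1, 2, 1}, which violates the rule; {0, 1, 2, 2} satisfies it.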

The autoparser (peg-native) infers grammar structure from the template via differential rendering. MiniMax's template uses XML with repeatable invoke elements for parallel calls. The parser correctly infers per-invoke structure but mis-specifies the repetition rule, so any second invoke is lost.
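
To make the repetition rule concrete, here is a minimal hand-written parser for the shape this PR targets: one <minimax:tool_call> wrapper holding one or more <invoke> elements, each holding zero or more <parameter> elements whose values are taken verbatim up to </parameter>. It is a sketch under stated assumptions, not the PEG code in common_chat_params_init_minimax: parse_minimax_block, Invoke, and Param are hypothetical names, and whitespace between tags is assumed to be insignificant.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result types for this sketch (not llama.cpp's own structs).
struct Param {
    std::string name;
    std::string value;  // raw text between the tags, kept verbatim
};

struct Invoke {
    std::string name;
    std::vector<Param> params;
};

namespace {

// Skips whitespace between tags (model output usually puts newlines there).
void skip_ws(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
}

// Consumes the literal `lit` at `pos` if it is there.
bool eat(const std::string& s, std::size_t& pos, const std::string& lit) {
    if (s.compare(pos, lit.size(), lit) != 0) {
        return false;
    }
    pos += lit.size();
    return true;
}

// Copies text up to the first `stop` into `out`, then consumes `stop`.
bool until(const std::string& s, std::size_t& pos, const std::string& stop, std::string& out) {
    const std::size_t end = s.find(stop, pos);
    if (end == std::string::npos) {
        return false;
    }
    out = s.substr(pos, end - pos);
    pos = end + stop.size();
    return true;
}

}  // namespace

// Grammar recognized (sketch):
//   block  := "<minimax:tool_call>" invoke+ "</minimax:tool_call>"
//   invoke := "<invoke name=\"" NAME "\">" param* "</invoke>"
//   param  := "<parameter name=\"" NAME "\">" VALUE "</parameter>"
bool parse_minimax_block(const std::string& s, std::vector<Invoke>& calls) {
    const std::string wrap_open = "<minimax:tool_call>";
    std::size_t pos = s.find(wrap_open);
    if (pos == std::string::npos) {
        return false;
    }
    pos += wrap_open.size();
    calls.clear();

    skip_ws(s, pos);
    // Repetition over <invoke>: keep going while another invoke starts here.
    while (eat(s, pos, "<invoke name=\"")) {
        Invoke inv;
        if (!until(s, pos, "\">", inv.name)) {
            return false;
        }
        skip_ws(s, pos);
        // Repetition over <parameter> inside one invoke.
        while (eat(s, pos, "<parameter name=\"")) {
            Param p;
            if (!until(s, pos, "\">", p.name) || !until(s, pos, "</parameter>", p.value)) {
                return false;
            }
            inv.params.push_back(p);
            skip_ws(s, pos);
        }
        if (!eat(s, pos, "</invoke>")) {
            return false;
        }
        calls.push_back(inv);
        skip_ws(s, pos);
    }
    return !calls.empty() && eat(s, pos, "</minimax:tool_call>");
}
```

The outer while loop is what lets a second <invoke> be recognized; a rule that matched the wrapper plus exactly one invoke would stop at the second <invoke name=" and reject or truncate the rest.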

This is a regression: an earlier working version was in mainline (#16932, 1920345), implemented through a generalized XML tool-call parser, but the autoparser refactor (#18675) replaced it. This PR restores specialized handling for MiniMax M2.7 (and likely other M2 versions) without reverting the broader refactor.

Implementation

This implementation follows the Kimi K2 / DeepSeek V3.2 pattern for templates the autoparser cannot handle.

  • common_chat_params_init_minimax builds the PEG grammar for the wrapper/invoke/parameter structure, including parallel calls.
  • Reasoning is extracted (<think>…</think> blocks) ahead of tool calls.
  • String-typed parameters are captured verbatim (tool_arg_string_value) to preserve embedded XML-style content; non-strings are reconstructed through JSON.
  • Dispatch in common_chat_try_specialized_template requires three MiniMax-specific literals in the template source (<minimax:tool_call>, <invoke name=, <parameter name=).
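
The reasoning and argument bullets above can be sketched as two small steps: peel off a <think>…</think> block before looking for tool calls, then assemble the JSON arguments object, quoting string-typed values verbatim and splicing other values in as JSON literals. This is a hypothetical sketch: split_reasoning, RawParam, and arguments_json are illustrative names, the is_string flag stands in for a lookup in the tool's parameter schema, and non-string values are assumed to already be valid JSON once surrounding whitespace is trimmed.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Result of separating a <think>...</think> block from the rest of the output.
struct SplitOutput {
    std::string reasoning;
    std::string rest;  // text outside the think block, searched for tool calls
};

SplitOutput split_reasoning(const std::string& text) {
    const std::string open = "<think>";
    const std::string close = "</think>";
    const std::size_t start = text.find(open);
    if (start == std::string::npos) {
        return {"", text};
    }
    const std::size_t body = start + open.size();
    const std::size_t end = text.find(close, body);
    if (end == std::string::npos) {
        // Still streaming: everything after <think> counts as reasoning so far.
        return {text.substr(body), text.substr(0, start)};
    }
    return {text.substr(body, end - body), text.substr(0, start) + text.substr(end + close.size())};
}

// Encodes raw text as a JSON string literal.
std::string json_quote(const std::string& s) {
    std::string out = "\"";
    for (const char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(static_cast<unsigned char>(c)));
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out + "\"";
}

// Strips leading/trailing whitespace around a non-string value.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Hypothetical parameter record; is_string stands in for a schema lookup.
struct RawParam {
    std::string name;
    std::string value;
    bool is_string;
};

// String values are JSON-quoted verbatim; other values are assumed to be
// JSON literals already (42, true, [1, 2]) and are spliced in after trimming.
std::string arguments_json(const std::vector<RawParam>& params) {
    std::string out = "{";
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i > 0) {
            out += ",";
        }
        out += json_quote(params[i].name) + ":";
        out += params[i].is_string ? json_quote(params[i].value) : trim(params[i].value);
    }
    return out + "}";
}
```

Keeping string values verbatim until the final quoting step is what preserves embedded markup such as <div><script>…</script></div>: nothing inside the value is interpreted as structure.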

Testing

Extends the existing MiniMax block in tests/test-chat.cpp with five test cases.

| Test case | Purpose | Master (no fix) |
|---|---|---|
| Parallel <invoke> elements; two different tools | Reproduces the crash pattern | Crashes (Invalid diff: now finding less tool calls!) |
| Parallel <invoke> elements; same tool twice | Additional variant of the crash | Crashes (same) |
| String parameter with embedded <div><script>…</script></div> | Verifies tool_arg_string_value is verbatim | Passes |
| Multi-line string parameter (Python code with \n) | Verifies the until("</parameter>") boundary on multi-line content | Passes |
| Two integer parameters in one <invoke> | Verifies zero_or_more over the parameter list and non-string JSON reconstruction | Passes |

The three cases that pass on master add regression coverage and show that the autoparser's gap is specifically repetition of <invoke> elements, not content shape.

Additional information

Requirements

AI was used to identify the appropriate strategy, draft a harness, and draft initial code snippets. Every line was reviewed, edited as appropriate, and included manually in commits.

The autoparser (peg-native) infers a grammar from the MiniMax-M2
template that handles a single <invoke> element cleanly but
mis-specifies the repetition rule for multiple <invoke> elements
inside one <minimax:tool_call> wrapper. Parallel tool calls with
the generic path trip the streaming parser's self-consistency check
("Invalid diff: now finding less tool calls!"), which is the
test-harness analogue of the production GGML_ABORT at
llama-grammar.cpp:1435 on real MiniMax M2.7 output.

Add a specialized handler following the Kimi K2 pattern: XML
invoke/parameter parsing, lazy grammar gated by <minimax:tool_call>
trigger, reasoning extraction via <think>/</think>. Dispatch
requires three MiniMax-specific literals in the template source
(<minimax:tool_call>, <invoke name=, <parameter name=) so any
future variant that drops the XML idiom falls through to the
autoparser.

Include five test fixtures in tests/test-chat.cpp: parallel calls
with different tools, parallel calls with the same tool (both repro
the gap), string parameter with embedded XML-ish content, multi-line
string value, and two-integer-parameter invocation. The three
passing-on-master cases document that the autoparser's gap is
specifically repetition, not content shape.
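
The dispatch gate and the lazy trigger described in the commit message can be sketched as follows. Both pieces are hypothetical illustrations, not llama.cpp's actual API: is_minimax_template mirrors the three-literal rule stated above, and LazyTrigger models a word-triggered lazy grammar by buffering streamed text until <minimax:tool_call> appears, including when the trigger is split across chunks.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Dispatch gate: all three MiniMax-specific literals must appear in the
// template source; otherwise the template is left to the autoparser.
bool is_minimax_template(const std::string& tmpl_src) {
    static const char* const required[] = {
        "<minimax:tool_call>",
        "<invoke name=",
        "<parameter name=",
    };
    for (const char* lit : required) {
        if (tmpl_src.find(lit) == std::string::npos) {
            return false;
        }
    }
    return true;
}

// Word-triggered lazy grammar (sketch): decoding stays unconstrained until the
// trigger text has been generated; after that the tool-call grammar would apply.
class LazyTrigger {
public:
    explicit LazyTrigger(std::string trigger) : trigger_(std::move(trigger)) {}

    // Feeds one streamed chunk; returns true once the trigger has been seen.
    bool feed(const std::string& chunk) {
        if (active_) {
            return true;
        }
        tail_ += chunk;
        if (tail_.find(trigger_) != std::string::npos) {
            active_ = true;
            return true;
        }
        // Keep only the last trigger_.size() - 1 characters: enough to match a
        // trigger split across this chunk and the next one.
        const std::size_t keep = trigger_.size() - 1;
        if (tail_.size() > keep) {
            tail_.erase(0, tail_.size() - keep);
        }
        return false;
    }

private:
    std::string trigger_;
    std::string tail_;
    bool active_ = false;
};
```

Requiring all three literals keeps the gate narrow: a future template that keeps <minimax:tool_call> but drops the <invoke>/<parameter> idiom would fail the check and fall through to the autoparser, as the commit message intends.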
doctorjei requested review from a team and pwilkin as code owners on April 19, 2026 04:43
github-actions (bot) added the testing label (Everything test related) on Apr 19, 2026
ggml-gh-bot Bot commented Apr 19, 2026

Hi @doctorjei, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

