chat : add MiniMax M2 specialized tool-call handler by doctorjei · Pull Request #22106 · ggml-org/llama.cpp

doctorjei · 2026-04-19T04:43:10Z

Fixes the autoparser failure (abort) triggered by the MiniMax-M2.7 system prompt.

Overview

Adds a specialized tool-call handler for the MiniMax M2.7 template (and probably later versions). Without it, M2.7 output with tools currently crashes llama-server with a GGML_ABORT at src/llama-grammar.cpp:1435 (EOG with non-empty stack) when <invoke> is emitted.

Why? (Reproducing the Issue)

Reproducible in-tree via tests/test-chat.cpp on current master (b8840). Parallel <invoke> elements inside a <minimax:tool_call> wrapper make the peg_tester fail with "Invalid diff: now finding less tool calls!".

The autoparser (peg-native) infers grammar structure from the template via differential rendering. MiniMax's template uses XML with repeatable invoke elements for parallel calls. The parser correctly infers per-invoke structure but mis-specifies the repetition rule, so any second invoke is lost.
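To make the repetition requirement concrete, here is a minimal standalone sketch (not the llama.cpp PEG code) of what a correct rule has to accept: every <invoke> inside a single wrapper, not just the first. The function name `invoke_names`, the double-quoted name attribute, and the closing tag `</minimax:tool_call>` are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of llama.cpp: collects the name of every
// <invoke name="..."> element inside one <minimax:tool_call> wrapper.
// Assumes double-quoted names and a </minimax:tool_call> closing tag.
std::vector<std::string> invoke_names(const std::string & text) {
    static const std::string wrapper_open  = "<minimax:tool_call>";
    static const std::string wrapper_close = "</minimax:tool_call>";
    static const std::string invoke_open   = "<invoke name=\"";

    std::vector<std::string> names;
    const size_t begin = text.find(wrapper_open);
    if (begin == std::string::npos) {
        return names; // no tool-call wrapper at all
    }
    // A partial (streamed) message may not have the closing tag yet.
    size_t end = text.find(wrapper_close, begin);
    if (end == std::string::npos) {
        end = text.size();
    }

    size_t pos = begin + wrapper_open.size();
    while (true) {
        pos = text.find(invoke_open, pos);
        if (pos == std::string::npos || pos >= end) {
            break; // no further invoke inside the wrapper
        }
        pos += invoke_open.size();
        const size_t quote = text.find('"', pos);
        if (quote == std::string::npos || quote > end) {
            break; // name attribute not complete yet
        }
        names.push_back(text.substr(pos, quote - pos));
        pos = quote + 1; // keep scanning: this loop is the repetition
    }
    return names;
}
```

A wrapper holding two invokes should yield two names in order. Being a plain substring scan, this toy would also match the literal `<invoke name="` inside a string parameter value, which a real grammar has to rule out.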

This is a regression: an earlier working version, based on a generalized XML tool-call parser, was in mainline (#16932, 1920345), but the autoparser refactoring (#18675) replaced it. This PR restores specialized handling for MiniMax M2.7 (and likely other M2 versions) without reverting the broader refactor.

Implementation

This implementation follows the Kimi K2 / DeepSeek V3.2 pattern for templates the autoparser cannot handle.

- common_chat_params_init_minimax builds the PEG grammar for the wrapper/invoke/parameter structure, including parallel calls.
- Reasoning (<think>…</think> blocks) is extracted ahead of tool calls.
- String-typed parameters are captured verbatim (tool_arg_string_value) to preserve embedded XML-style content; non-string parameters are reconstructed through JSON.
- Dispatch in common_chat_try_specialized_template requires three MiniMax-specific literals in the template source (<minimax:tool_call>, <invoke name=, <parameter name=).
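The dispatch condition in the last item can be sketched as a plain substring test. This is only an illustration of the rule as described: the function name `template_uses_minimax_xml` is hypothetical, and the actual check lives inside common_chat_try_specialized_template.

```cpp
#include <string>

// Hypothetical name, for illustration only. Returns true when the template
// source contains all three MiniMax-specific literals named in this PR.
bool template_uses_minimax_xml(const std::string & tmpl_src) {
    return tmpl_src.find("<minimax:tool_call>") != std::string::npos
        && tmpl_src.find("<invoke name=")       != std::string::npos
        && tmpl_src.find("<parameter name=")    != std::string::npos;
}
```

Requiring all three literals keeps the match narrow: a template that mentions only the wrapper tag, or that uses a different parameter syntax, would not be routed to the specialized handler.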

Testing

Extends the existing MiniMax block in tests/test-chat.cpp with five test cases.

| Test Case | Purpose | Master (no fix) |
| --- | --- | --- |
| Parallel `<invoke>` elements; two different tools | Reproduces crash pattern | Crashes (`Invalid diff: now finding less tool calls!`) |
| Parallel `<invoke>` elements; same tool twice | Additional variant of crash | Crashes (same) |
| String parameter with embedded `<div><script>…</script></div>` | Verifies `tool_arg_string_value` is verbatim | Passes |
| Multi-line string parameter (Python code with `\n`) | Verifies `until("</parameter>")` boundary on multi-line content | Passes |
| Two integer parameters in one `<invoke>` | Verifies `zero_or_more` over parameter list + non-string JSON reconstruction | Passes |

The three test cases that already pass on master show that the autoparser's gap is specifically repetition rather than content shape, and they provide additional regression coverage.
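The verbatim string capture exercised by the third and fourth test cases can be illustrated with a small standalone sketch. It is not the PEG implementation: `raw_parameter` is a hypothetical helper, and the double-quoted name attribute is an assumption. The point is that everything up to the first `</parameter>` is returned untouched, including embedded tags and newlines.

```cpp
#include <string>

// Hypothetical helper, for illustration only. Copies the raw text between
// <parameter name="NAME"> and the first following </parameter> into `out`.
// Returns false if the parameter is absent or not yet closed.
bool raw_parameter(const std::string & invoke_body,
                   const std::string & name,
                   std::string & out) {
    const std::string open  = "<parameter name=\"" + name + "\">";
    const std::string close = "</parameter>";

    const size_t start = invoke_body.find(open);
    if (start == std::string::npos) {
        return false;
    }
    const size_t value_begin = start + open.size();
    const size_t value_end   = invoke_body.find(close, value_begin);
    if (value_end == std::string::npos) {
        return false; // value still streaming, no closing tag yet
    }
    // No unescaping and no trimming: the value is taken verbatim.
    out = invoke_body.substr(value_begin, value_end - value_begin);
    return true;
}
```

With this approach a value such as `<div><script>…</script></div>` comes back unchanged. The boundary is the first `</parameter>`, so in this sketch a value containing that literal itself would be cut short.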

Additional information

- Precedent: #21785, chat: dedicated DeepSeek v3.2 parser + "official" template (DeepSeek V3.2 dedicated parser)
- Earlier working implementation: #16932, common: Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo); superseded by #18675, Autoparser - complete refactoring of parser architecture
- Crash assertion: src/llama-grammar.cpp:1435 (GGML_ABORT("fatal error") when the EOG token is accepted with non-empty grammar stacks)

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES

AI was used to identify the appropriate strategy, draft a harness, and draft initial code snippets. Every line was reviewed, edited as appropriate, and included manually in commits.

The autoparser (peg-native) infers a grammar from the MiniMax-M2 template that handles a single <invoke> element cleanly but mis-specifies the repetition rule for multiple <invoke> elements inside one <minimax:tool_call> wrapper. Parallel tool calls with the generic path trip the streaming parser's self-consistency check ("Invalid diff: now finding less tool calls!"), which is the test-harness analogue of the production GGML_ABORT at llama-grammar.cpp:1435 on real MiniMax M2.7 output.

Add a specialized handler following the Kimi K2 pattern: XML invoke/parameter parsing, lazy grammar gated by <minimax:tool_call> trigger, reasoning extraction via <think>/</think>. Dispatch requires three MiniMax-specific literals in the template source (<minimax:tool_call>, <invoke name=, <parameter name=) so any future variant that drops the XML idiom falls through to the autoparser.

Include five test fixtures in tests/test-chat.cpp: parallel calls with different tools, parallel calls with the same tool (both repro the gap), string parameter with embedded XML-ish content, multi-line string value, and two-integer-parameter invocation. The three passing-on-master cases document that the autoparser's gap is specifically repetition, not content shape.
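The error text suggests the self-consistency check compares successive partial parses of a streamed message and rejects any step where fewer tool calls are found than before. The sketch below shows that invariant in isolation. The function name `check_tool_call_counts` and the idea of passing in a list of counts are assumptions for illustration, not the llama.cpp API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration of the invariant: as more of the output is
// streamed in, the number of tool calls found by each partial parse must
// never go down.
void check_tool_call_counts(const std::vector<size_t> & counts) {
    for (size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] < counts[i - 1]) {
            throw std::runtime_error("Invalid diff: now finding less tool calls!");
        }
    }
}
```

A grammar that loses the second <invoke> can produce exactly such a drop: an intermediate parse sees the second call begin, and a later parse no longer reports it.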

ggml-gh-bot · 2026-04-19T04:46:51Z

Hi @doctorjei, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

doctorjei requested review from a team and pwilkin as code owners April 19, 2026 04:43

github-actions Bot added the testing Everything test related label Apr 19, 2026

doctorjei mentioned this pull request Apr 26, 2026

chat : add MiniMax M2 specialized tool-call handler domvox/llama.cpp-turboquant-hip#8

Open

doctorjei wants to merge 1 commit into ggml-org:master from doctorjei:minimax-pr
