Summary
When using a local vLLM instance as the inference provider, tool calls are never executed. The model generates tool call XML in the response content as raw text instead of populating the structured tool_calls array.
Root Cause
Two issues combine to break tool calling:
1. vLLM's --tool-call-parser only applies to /v1/chat/completions
vLLM v0.17.1 implements both /v1/responses and /v1/chat/completions, but the --tool-call-parser flag is only applied on the chat completions path. Requests hitting /v1/responses are internally translated, but tool call parsing is skipped during that translation.
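The failure mode is easy to check programmatically. A minimal sketch, assuming the Qwen-style <tool_call> tag convention that the qwen3_coder parser targets (the helper name and sample messages are hypothetical, for illustration only):

```javascript
// Hypothetical helper: detect the failure mode where the model emitted a
// Qwen-style <tool_call> block as plain text instead of a structured entry
// in the message's tool_calls array.
function hasUnparsedToolCall(message) {
  const structured = Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
  const rawXml = typeof message.content === "string" && message.content.includes("<tool_call>");
  return rawXml && !structured;
}

// Shape of a /v1/chat/completions message when --tool-call-parser works:
const parsed = {
  content: null,
  tool_calls: [{
    id: "call_1",
    type: "function",
    function: { name: "read_file", arguments: '{"path":"a.txt"}' },
  }],
};

// Shape of what comes back via /v1/responses, where parsing is skipped:
const unparsed = {
  content: "<tool_call>\n<function=read_file>\n<parameter=path>a.txt</parameter>\n</function>\n</tool_call>",
  tool_calls: [],
};

console.log(hasUnparsedToolCall(parsed));   // false
console.log(hasUnparsedToolCall(unparsed)); // true
```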
2. Onboarding probe selects openai-responses for vLLM
probeOpenAiLikeEndpoint() in bin/lib/onboard.js tries /v1/responses first. Since vLLM accepts it, preferredInferenceApi is set to "openai-responses" and baked into the sandbox's openclaw.json:
{
  "models": {
    "providers": {
      "inference": {
        "api": "openai-responses"
      }
    }
  }
}
OpenClaw then sends all inference requests via /v1/responses, where tool calls pass through unparsed.
Reproduction
- Start vLLM with --enable-auto-tool-choice --tool-call-parser qwen3_coder
- Run nemoclaw onboard and select "Local vLLM"
- Connect to the sandbox and ask the agent to do anything requiring a tool call
- Tool calls appear as raw XML text in the chat instead of being executed
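A quick way to confirm which endpoint is at fault is to query /v1/chat/completions directly with a tool definition and inspect the message. This is an illustrative command, assuming vLLM is listening on localhost:8000; the model name and tool schema are placeholders:

```shell
# If --tool-call-parser is working, tool_calls should be a populated array,
# not null, and content should not contain raw <tool_call> XML.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "MODEL",
    "messages": [{"role": "user", "content": "List the files in /tmp"}],
    "tools": [{"type": "function", "function": {
      "name": "list_files",
      "parameters": {"type": "object",
                     "properties": {"path": {"type": "string"}}}}}]
  }' \
  | python3 -c 'import json,sys; m=json.load(sys.stdin)["choices"][0]["message"]; print(m.get("tool_calls"))'
```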
Proposed Fix
For the vllm and nim-local provider paths in bin/lib/onboard.js, override the probe result to force openai-completions:
// After validateOpenAiLikeSelection() for vLLM (~line 1899):
preferredInferenceApi = "openai-completions";
Same for the NIM-local path (~line 1783), since NIM uses vLLM internally.
This ensures requests go through /v1/chat/completions where the tool-call-parser works.
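The override rule can be sketched as a small function. This is an illustrative sketch, not the actual onboard.js code; the function and set names are hypothetical, and only the provider IDs and API strings come from the issue:

```javascript
// Hypothetical wrapper around the probe result: for providers backed by
// vLLM, force the chat completions API regardless of what the probe found,
// since /v1/responses skips tool call parsing.
const VLLM_BACKED_PROVIDERS = new Set(["vllm", "nim-local"]);

function resolveInferenceApi(provider, probedApi) {
  if (VLLM_BACKED_PROVIDERS.has(provider)) {
    return "openai-completions";
  }
  return probedApi;
}

console.log(resolveInferenceApi("vllm", "openai-responses"));      // "openai-completions"
console.log(resolveInferenceApi("nim-local", "openai-responses")); // "openai-completions"
console.log(resolveInferenceApi("other", "openai-responses"));     // "openai-responses"
```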
Additional Note — vLLM flags for Nemotron-3-Super
The correct vLLM flags for tool calling with Nemotron-3-Super (per NVIDIA's cookbook):
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--reasoning-parser-plugin ./super_v3_reasoning_parser.py
--reasoning-parser super_v3
NemoClaw's docs and onboarding flow don't mention that these flags are required. Surfacing them during the vLLM onboard step would prevent confusion.
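Put together, the flags above form a launch command along these lines. This is a sketch only: the model ID is a placeholder, and the plugin path assumes the parser file sits in the current directory:

```shell
vllm serve <model-id> \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser-plugin ./super_v3_reasoning_parser.py \
  --reasoning-parser super_v3
```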
Environment
- NemoClaw: current main
- vLLM: v0.17.1
- OpenShell: 0.0.16
- OpenClaw: 2026.3.11
- Platform: DGX Spark (GB10)