Tool calls broken with local vLLM provider — openai-responses bypasses tool-call-parser #976

@thadreber-web

Summary

When using a local vLLM instance as the inference provider, tool calls are never executed. The model's tool-call XML appears as raw text in the response content instead of being parsed into the structured tool_calls array.

Root Cause

Two issues combine to break tool calling:

1. vLLM's --tool-call-parser only applies to /v1/chat/completions

vLLM v0.17.1 implements both /v1/responses and /v1/chat/completions, but --tool-call-parser post-processing runs only on the chat completions path. Requests hitting /v1/responses are translated internally, and tool call parsing is skipped during that translation.
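The symptom is easy to spot in a response payload. A minimal sketch of a client-side check (the helper name and the <tool_call> tag are illustrative assumptions, not NemoClaw or vLLM API):

```javascript
// Classify an OpenAI-style assistant message by where its tool call landed.
function classifyToolCall(message) {
  if (Array.isArray(message.tool_calls) && message.tool_calls.length > 0) {
    return "structured"; // parser ran (chat completions path)
  }
  if (typeof message.content === "string" && message.content.includes("<tool_call>")) {
    return "raw-text"; // parser skipped (responses path symptom)
  }
  return "none";
}

// Shape returned by /v1/chat/completions with --tool-call-parser active:
const parsed = {
  content: null,
  tool_calls: [{
    id: "call_1",
    type: "function",
    function: { name: "read_file", arguments: "{\"path\":\"a.txt\"}" },
  }],
};

// What the /v1/responses path effectively yields: XML left in the text.
const unparsed = {
  content: "<tool_call>{\"name\":\"read_file\"}</tool_call>",
};

console.log(classifyToolCall(parsed));   // "structured"
console.log(classifyToolCall(unparsed)); // "raw-text"
```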

2. Onboarding probe selects openai-responses for vLLM

probeOpenAiLikeEndpoint() in bin/lib/onboard.js tries /v1/responses first. Since vLLM accepts it, preferredInferenceApi is set to "openai-responses" and baked into the sandbox's openclaw.json:

{
  "models": {
    "providers": {
      "inference": {
        "api": "openai-responses"
      }
    }
  }
}

OpenClaw then sends all inference requests via /v1/responses, where tool calls pass through unparsed.
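Until the probe is fixed, editing the baked config by hand works around the issue. A sketch of the changed openclaw.json (other fields elided):

```json
{
  "models": {
    "providers": {
      "inference": {
        "api": "openai-completions"
      }
    }
  }
}
```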

Reproduction

  1. Start vLLM with --enable-auto-tool-choice --tool-call-parser qwen3_coder
  2. Run nemoclaw onboard, select "Local vLLM"
  3. Connect to the sandbox and ask the agent to do anything requiring a tool call
  4. Tool calls appear as raw XML text in the chat instead of being executed

Proposed Fix

For the vllm and nim-local provider paths in bin/lib/onboard.js, override the probe result to force openai-completions:

// After validateOpenAiLikeSelection() for vLLM (~line 1899):
preferredInferenceApi = "openai-completions";

Same for the NIM-local path (~line 1783), since NIM uses vLLM internally.

This ensures requests go through /v1/chat/completions where the tool-call-parser works.
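The override could also be factored into a small helper so both provider paths share it. A sketch under stated assumptions (helper name, the set of provider IDs, and the probe's return value are assumptions; onboard.js internals may differ):

```javascript
// vLLM accepts /v1/responses but skips tool-call parsing there, so pin
// these providers to /v1/chat/completions regardless of the probe result.
const FORCE_CHAT_COMPLETIONS = new Set(["vllm", "nim-local"]);

function resolveInferenceApi(provider, probedApi) {
  if (FORCE_CHAT_COMPLETIONS.has(provider)) {
    return "openai-completions";
  }
  return probedApi;
}

console.log(resolveInferenceApi("vllm", "openai-responses"));      // "openai-completions"
console.log(resolveInferenceApi("nim-local", "openai-responses")); // "openai-completions"
console.log(resolveInferenceApi("openai", "openai-responses"));    // "openai-responses"
```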

Additional Note — vLLM flags for Nemotron-3-Super

The correct vLLM flags for tool calling with Nemotron-3-Super (per NVIDIA's cookbook):

--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--reasoning-parser-plugin ./super_v3_reasoning_parser.py
--reasoning-parser super_v3

NemoClaw's docs and onboarding flow don't mention that these flags are required. Surfacing them during the vLLM onboard flow would prevent confusion.
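Put together, a launch command might look like the following sketch (the model path is a placeholder; verify flag names against the vLLM version in use):

```shell
vllm serve <model-path> \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser-plugin ./super_v3_reasoning_parser.py \
  --reasoning-parser super_v3
```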

Environment

  • NemoClaw: current main
  • vLLM: v0.17.1
  • OpenShell: 0.0.16
  • OpenClaw: 2026.3.11
  • Platform: DGX Spark (GB10)
