
Silent agent crash and subsequent 400 Bad Request with local vLLM (Qwen 3 XML Tools) #16488

@ashishkrishnan

Description


Environment / System Details:

  • Hardware: Local NVIDIA DGX Spark (GB10 / SM 12.1)
  • Backend: vLLM 0.16.0rc2
  • Models Tested: nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4 and Qwen3-Coder-Next-FP8
  • vLLM Flags: --enable-auto-tool-choice, --tool-call-parser qwen3_xml or qwen3_coder
  • OpenCode Config: @ai-sdk/openai provider (to bypass strict schema crashes on the vllm provider route).

The Core Issue:
When vLLM's XML parser encounters a <tool_call> tag, it streams an initial tool-call delta before the function name has been fully parsed.
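The failure mode can be pictured with a minimal, self-contained sketch. The chunk shapes below are hypothetical (modeled on the OpenAI streaming format, not taken from OpenCode's actual code): the first delta arrives before the parser has read the function name, so a client that finalizes the call on that first delta sees an empty name.

```python
# Hypothetical sketch of OpenAI-style streamed tool-call deltas. The first
# delta carries an empty function name because vLLM's XML parser emits it
# before the name inside <tool_call> has been fully parsed.

def consume(chunks):
    """Accumulate streamed tool-call deltas into one complete call dict."""
    call = {"name": "", "arguments": ""}
    for delta in chunks:
        tool_calls = delta.get("tool_calls")
        if not tool_calls:
            continue
        fn = tool_calls[0].get("function", {})
        # A strict client that finalizes on the FIRST delta would see name == ""
        call["name"] += fn.get("name") or ""
        call["arguments"] += fn.get("arguments") or ""
    return call

# Simulated stream: name is empty in the first delta, filled in later.
chunks = [
    {"tool_calls": [{"index": 0, "function": {"name": "", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"name": "glob", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"pattern": "**"}'}}]},
]
print(consume(chunks))  # {'name': 'glob', 'arguments': '{"pattern": "**"}'}
```

A patient accumulator like the one above recovers the full call; a client that aborts on the empty-name first delta never does, which matches the silent halt observed here.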

Observed Behavior:
When OpenCode triggers the tool call in Step 3, the model successfully generates valid XML on the server side:

<tool_call>
{"name": "glob", "arguments": {"pattern": "**"}}
</tool_call>

However, OpenCode fails in the following sequence:

  • Run 1 (Initial Prompt): The OpenCode agent silently halts/aborts the loop. Nothing happens on the client side, and the tool is never executed.
    (Note: Setting "stream": false in opencode.json does not prevent this; the background agent loop appears to force streaming regardless).

  • Run 2 (Retrying Prompt): Instantly fails with a 400 Bad Request. The aborted tool state from Run 1 corrupts the context history, causing the vLLM server to reject the malformed conversation payload on the next request.
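For illustration, the corrupted history presumably looks something like the following (field names follow the OpenAI chat format; the exact payload OpenCode sends is an assumption): an assistant turn announcing a tool call with no matching tool-result message, which a strict server then rejects on the next request.

```json
[
  {"role": "user", "content": "Read the codebase to understand its structure"},
  {"role": "assistant", "tool_calls": [
    {"id": "call_0", "type": "function",
     "function": {"name": "glob", "arguments": "{\"pattern\": \"**\"}"}}
  ]},
  {"role": "user", "content": "Read the codebase to understand its structure"}
]
```

Note that no message with "role": "tool" answers call_0, leaving the conversation in a state the server's validation treats as malformed.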

(Additional Context: If the OpenCode provider is explicitly named "vllm", it triggers a separate issue entirely. It attempts to hit the /v1/responses endpoint, resulting in an immediate 400 Bad Request because vLLM's strict Pydantic validation rejects OpenCode's custom LocalShellCall and ApplyPatchCall schemas).

Plugins

No response

OpenCode version

1.20.0

Steps to reproduce


  1. Start a local vLLM server running a Qwen 3 model with XML parsing enabled:
    vllm serve Qwen/Qwen3-Coder-Next-FP8 --enable-auto-tool-choice --tool-call-parser qwen3_xml
  2. Configure opencode.json to point to the local server using the @ai-sdk/openai provider.
  3. In OpenCode, issue a prompt that triggers a tool call (e.g., "Read the codebase to understand its structure").
  4. Observe the silent failure, then issue the exact same prompt a second time.
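A roughly equivalent opencode.json for step 2 is sketched below. The key names are from memory of OpenCode's provider config and may need adjusting for your version; note the provider key is deliberately not "vllm", since that name routes to /v1/responses and triggers the separate 400 described above.

```json
{
  "provider": {
    "local-qwen": {
      "npm": "@ai-sdk/openai",
      "options": { "baseURL": "http://localhost:8000/v1" },
      "models": { "Qwen/Qwen3-Coder-Next-FP8": {} }
    }
  }
}
```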

Screenshot and/or share link


Operating System

macos

Terminal

iterm

Labels

bug (Something isn't working), core (Anything pertaining to core functionality of the application (opencode server stuff))
