Environment / System Details:
- Hardware: Local NVIDIA DGX Spark (GB10 / SM 12.1)
- Backend: vLLM 0.16.0rc2
- Models Tested: nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4 and Qwen3-Coder-Next-FP8
- vLLM Flags: --enable-auto-tool-choice, --tool-call-parser qwen3_xml or qwen3_coder
- OpenCode Config: @ai-sdk/openai provider (to bypass strict schema crashes on the vllm provider route)
The Core Issue:
When vLLM's XML parser encounters a <tool_call> tag, it streams an initial tool delta before the function name is fully parsed.
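To illustrate the race, here is a minimal, hypothetical sketch of a streaming parser that emits its first tool delta as soon as it sees the opening tag, before the function name from the JSON body has arrived. This is not vLLM's actual code, just a reproduction of the behavior pattern:

```python
def stream_deltas(chunks):
    """Yield tool-call deltas from streamed text chunks.

    Mimics a parser that fires an initial delta on seeing <tool_call>,
    before the function name is available.
    """
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        if not started and "<tool_call>" in buffer:
            started = True
            # Initial delta is emitted immediately: the name is still empty.
            yield {"name": "", "arguments": ""}
        elif started:
            # Subsequent deltas carry raw argument fragments.
            yield {"name": None, "arguments": chunk}

chunks = [
    "<tool_call>",
    '{"name": "glob",',
    ' "arguments": {"pattern": "**"}}',
    "</tool_call>",
]
deltas = list(stream_deltas(chunks))
print(deltas[0])  # first delta has an empty function name
```

A client that treats the first delta's empty name as the final function name (or aborts on it) will never execute the tool, even though the full XML on the server side is valid.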
Observed Behavior:
When OpenCode triggers the tool call in Step 3, the model successfully generates valid XML on the server side:
<tool_call>
{"name": "glob", "arguments": {"pattern": "**"}}
</tool_call>
However, OpenCode fails in the following sequence:
- Run 1 (Initial Prompt): The OpenCode agent silently halts/aborts the loop. Nothing happens on the client side, and the tool is never executed.
  (Note: Setting "stream": false in opencode.json does not prevent this; the background agent loop appears to force streaming regardless.)
- Run 2 (Retrying Prompt): Fails instantly with a 400 Bad Request. The aborted tool state from Run 1 corrupts the context history, so the vLLM server rejects the malformed conversation payload on the next request.
  (Additional Context: If the OpenCode provider is explicitly named "vllm", it triggers a separate issue entirely: OpenCode attempts to hit the /v1/responses endpoint, which returns an immediate 400 Bad Request because vLLM's strict Pydantic validation rejects OpenCode's custom LocalShellCall and ApplyPatchCall schemas.)
Plugins
No response
OpenCode version
1.20.0
Steps to reproduce
- Start a local vLLM server running a Qwen 3 model with XML parsing enabled:
vllm serve Qwen/Qwen3-Coder-Next-FP8 --enable-auto-tool-choice --tool-call-parser qwen3_xml
- Configure opencode.json to point to the local server using the @ai-sdk/openai provider.
- In OpenCode, issue a prompt that triggers a tool call (e.g., "Read the codebase to understand its structure").
- Observe the silent failure, then issue the exact same prompt a second time.
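For reference, a minimal opencode.json sketch for step 2. The provider key and field names here are illustrative assumptions based on my local setup, not canonical OpenCode documentation:

```json
{
  "provider": {
    "local-vllm": {
      "npm": "@ai-sdk/openai",
      "options": {
        "baseURL": "http://localhost:8000/v1"
      },
      "models": {
        "Qwen/Qwen3-Coder-Next-FP8": {}
      }
    }
  }
}
```

Note the provider key is deliberately not "vllm", to avoid the separate /v1/responses routing issue described under Additional Context.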
Screenshot and/or share link
Operating System
macos
Terminal
iterm