Environment / System Details:
- Hardware: Local NVIDIA DGX Spark (GB10 / SM 12.1)
- Backend: vLLM 0.16.0rc2
- Models Tested: nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4 and Qwen3-Coder-Next-FP8
- vLLM Flags: --enable-auto-tool-choice, --tool-call-parser qwen3_xml or qwen3_coder
- OpenCode Config: @ai-sdk/openai provider (to bypass strict schema crashes on the vllm provider route)
The Core Issue:
When vLLM's XML parser encounters a <tool_call> tag, it streams an initial tool delta before the function name is fully parsed.
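To illustrate the race, here is a minimal, hypothetical sketch of a streaming parser that emits its first tool delta as soon as it sees the opening tag, before the function name from the JSON body has arrived. This is not vLLM's actual code, just a reproduction of the behavior pattern:

```python
def stream_deltas(chunks):
    """Yield tool-call deltas from streamed text chunks.

    Mimics a parser that fires an initial delta on seeing <tool_call>,
    before the function name is available.
    """
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        if not started and "<tool_call>" in buffer:
            started = True
            # Initial delta is emitted immediately: the name is still empty.
            yield {"name": "", "arguments": ""}
        elif started:
            # Subsequent deltas carry raw argument fragments.
            yield {"name": None, "arguments": chunk}

chunks = [
    "<tool_call>",
    '{"name": "glob",',
    ' "arguments": {"pattern": "**"}}',
    "</tool_call>",
]
deltas = list(stream_deltas(chunks))
print(deltas[0])  # first delta has an empty function name
```

A client that treats the first delta's empty name as the final function name (or aborts on it) will never execute the tool, even though the full XML on the server side is valid.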
Observed Behavior:
When OpenCode triggers the tool call in Step 3, the model successfully generates valid XML on the server side:
<tool_call>
{"name": "glob", "arguments": {"pattern": "**"}}
</tool_call>
However, OpenCode fails in the following sequence:
- Run 1 (Initial Prompt): The OpenCode agent silently halts/aborts the loop. Nothing happens on the client side, and the tool is never executed.
  (Note: Setting "stream": false in opencode.json does not prevent this; the background agent loop appears to force streaming regardless.)
- Run 2 (Retrying Prompt): Fails instantly with a 400 Bad Request. The aborted tool state from Run 1 corrupts the context history, so the vLLM server rejects the malformed conversation payload on the next request.
  (Additional Context: If the OpenCode provider is explicitly named "vllm", it triggers a separate issue entirely: OpenCode attempts to hit the /v1/responses endpoint, which returns an immediate 400 Bad Request because vLLM's strict Pydantic validation rejects OpenCode's custom LocalShellCall and ApplyPatchCall schemas.)
Plugins
No response
OpenCode version
1.20.0
Steps to reproduce
- Start a local vLLM server running a Qwen 3 model with XML parsing enabled:
vllm serve Qwen/Qwen3-Coder-Next-FP8 --enable-auto-tool-choice --tool-call-parser qwen3_xml
- Configure opencode.json to point to the local server using the @ai-sdk/openai provider.
- In OpenCode, issue a prompt that triggers a tool call (e.g., "Read the codebase to understand its structure").
- Observe the silent failure, then issue the exact same prompt a second time.
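For reference, a minimal opencode.json sketch for step 2. The provider key and field names here are illustrative assumptions based on my local setup, not canonical OpenCode documentation:

```json
{
  "provider": {
    "local-vllm": {
      "npm": "@ai-sdk/openai",
      "options": {
        "baseURL": "http://localhost:8000/v1"
      },
      "models": {
        "Qwen/Qwen3-Coder-Next-FP8": {}
      }
    }
  }
}
```

Note the provider key is deliberately not "vllm", to avoid the separate /v1/responses routing issue described under Additional Context.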
Screenshot and/or share link
Operating System
macos
Terminal
iterm