feat(provider): add openai-compatible custom tool compat#16531

Open
Delqhi wants to merge 1 commit into anomalyco:dev from Other-Open-Source-Projects:feat/custom-provider-compat

Conversation


@Delqhi Delqhi commented Mar 7, 2026

Issue for this PR

Closes #234
Closes #15756

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

This adds an opt-in compatibility layer for custom @ai-sdk/openai-compatible providers that do not emit modern tool_calls reliably.

The changes do three things:

  • rewrite outgoing tools / tool_choice payloads into legacy functions / function_call when the raw-function-call parser is enabled
  • repair incoming legacy function_call responses into structured tool calls before OpenCode processes them
  • optionally recover structured tool intent from JSON or single-tool text replies for providers that return text instead of native tool calls

The goal is to keep OpenCode sessions moving for custom providers that can express tool intent, but do not fully match the modern OpenAI tool-calling shape.
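The first bullet above can be sketched roughly as follows. All names here (`ModernBody`, `toLegacyBody`) are illustrative, not the PR's actual API; the real helper lives in `openai-compatible-compat.ts` and may differ in shape.

```typescript
// Hedged sketch of the modern -> legacy request rewrite described above.
type ModernTool = { type: "function"; function: { name: string; description?: string; parameters?: unknown } }
type ModernBody = { tools?: ModernTool[]; tool_choice?: unknown; [key: string]: unknown }

function toLegacyBody(body: ModernBody): Record<string, unknown> {
  const { tools, tool_choice, ...rest } = body
  if (!tools || tools.length === 0) return body
  const legacy: Record<string, unknown> = {
    ...rest,
    // legacy OpenAI shape: a flat `functions` array of bare function specs
    functions: tools.map((t) => t.function),
  }
  if (tool_choice && typeof tool_choice === "object") {
    // modern { type: "function", function: { name } } -> legacy { name }
    legacy.function_call = { name: (tool_choice as any).function?.name }
  } else if (tool_choice === "auto" || tool_choice === "none") {
    legacy.function_call = tool_choice
  }
  return legacy
}
```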

How did you verify your code works?

  • bun test test/provider/openai-compatible-compat.test.ts
  • bun test test/provider/copilot/copilot-chat-model.test.ts
  • bun test test/session/llm.test.ts
  • bun test test/provider/provider.test.ts
  • bun run typecheck

Screenshots / recordings

Not a UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Copilot AI review requested due to automatic review settings March 7, 2026 23:15
@github-actions github-actions bot added the needs:compliance This means the issue will auto-close after 2 hours. label Mar 7, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR adds an opt-in compatibility layer for custom @ai-sdk/openai-compatible providers that don't support modern tool_calls reliably. It introduces a toolParser option that can be configured per provider to rewrite tool-related request/response payloads between legacy (function_call/functions) and modern (tool_calls/tools) formats, and to recover tool call intent from structured text or JSON content.

Changes:

  • New openai-compatible-compat.ts module implementing request body rewriting (modern→legacy), JSON response rewriting (legacy→modern), and SSE stream rewriting with three parser types: raw-function-call, json, and single-tool-text
  • Integration into the existing fetch wrapper in provider.ts to intercept and transform requests/responses when toolParser is configured
  • New test file with unit tests covering the four main rewrite paths

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
packages/opencode/src/provider/openai-compatible-compat.ts New compat layer with functions to parse tool parsers config, rewrite request bodies to legacy format, and rewrite JSON/streaming responses to modern tool_calls format
packages/opencode/src/provider/provider.ts Imports compat functions and integrates them into the custom fetch wrapper: rewrites outgoing POST bodies and intercepts responses for transformation
packages/opencode/test/provider/openai-compatible-compat.test.ts Unit tests for request body rewriting, JSON response rewriting, SSE stream single-tool-text rewriting, and SSE stream JSON tag rewriting


}
}

const SYNTHETIC_TOOL_CALL_ID = "call_opencode_compat_0"

Copilot AI Mar 7, 2026


The synthetic tool call ID is a static constant "call_opencode_compat_0", meaning every synthetic tool call across all response cycles will have the same ID. In multi-turn conversations, the conversation history will contain multiple assistant messages with identical tool_calls[0].id and matching tool result tool_call_id values. This can confuse the model when it sees the conversation history, as it cannot distinguish which tool result corresponds to which tool call.

Consider generating a unique ID per call, e.g., using crypto.randomUUID() or a counter-based approach like call_opencode_compat_${Date.now()}_${Math.random().toString(36).slice(2, 8)}.

Suggested change
const SYNTHETIC_TOOL_CALL_ID = "call_opencode_compat_0"
const SYNTHETIC_TOOL_CALL_ID = `call_opencode_compat_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`

Comment on lines +1157 to +1163
if (contentType.includes("text/event-stream")) {
const text = await response.text()
return new Response(rewriteOpenAICompatibleStreamResponse(text, toolParsers), {
status: response.status,
statusText: response.statusText,
headers,
})

Copilot AI Mar 7, 2026


When toolParsers.length > 0, every streaming response (text/event-stream) is fully buffered via response.text() before being re-serialized and returned. This effectively disables incremental streaming for all requests when any tool parser is configured, meaning users won't see any intermediate output (like reasoning text) until the entire response completes. For long completions, this could appear as a significant delay/hang.

For the raw-function-call parser alone, incremental transformation would be possible since transformChunkEvent already operates per-event. The full buffering is only strictly necessary when json or single-tool-text parsers are active (since they need the complete content to decide). Consider short-circuiting to an incremental transform when only raw-function-call is active.
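The incremental path could look roughly like this. This is a sketch, not the PR's code; it assumes a per-event transform callback in the spirit of the `transformChunkEvent` mentioned above, and splits SSE events on blank lines.

```typescript
// Sketch: incrementally rewrite an SSE stream event-by-event instead of
// buffering the whole response. transformEvent is an assumed callback.
function incrementalSSE(
  upstream: ReadableStream<Uint8Array>,
  transformEvent: (json: any) => any,
): ReadableStream<Uint8Array> {
  const decoder = new TextDecoder()
  const encoder = new TextEncoder()
  let buffer = ""
  return upstream.pipeThrough(
    new TransformStream<Uint8Array, Uint8Array>({
      transform(chunk, controller) {
        buffer += decoder.decode(chunk, { stream: true })
        // SSE events are delimited by a blank line; keep the trailing partial
        const events = buffer.split("\n\n")
        buffer = events.pop() ?? ""
        for (const event of events) {
          const data = event.replace(/^data: /m, "")
          if (data.trim() === "[DONE]") {
            controller.enqueue(encoder.encode(event + "\n\n"))
            continue
          }
          try {
            const rewritten = transformEvent(JSON.parse(data))
            controller.enqueue(encoder.encode("data: " + JSON.stringify(rewritten) + "\n\n"))
          } catch {
            // pass through anything that is not a JSON data event
            controller.enqueue(encoder.encode(event + "\n\n"))
          }
        }
      },
      flush(controller) {
        if (buffer) controller.enqueue(encoder.encode(buffer))
      },
    }),
  )
}
```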

Comment on lines +1150 to +1183
if (toolParsers.length === 0) {
return response
}

const headers = new Headers(response.headers)
headers.delete("content-length")
const contentType = headers.get("content-type") ?? ""
if (contentType.includes("text/event-stream")) {
const text = await response.text()
return new Response(rewriteOpenAICompatibleStreamResponse(text, toolParsers), {
status: response.status,
statusText: response.statusText,
headers,
})
}

if (contentType.includes("application/json")) {
const text = await response.text()
try {
return new Response(JSON.stringify(rewriteOpenAICompatibleJsonResponse(JSON.parse(text), toolParsers)), {
status: response.status,
statusText: response.statusText,
headers,
})
} catch {
return new Response(text, {
status: response.status,
statusText: response.statusText,
headers,
})
}
}

return response

Copilot AI Mar 7, 2026


The response rewriting logic runs for every response when toolParsers.length > 0, including non-chat-completions endpoints (e.g., model listing). While rewriteOpenAICompatibleJsonResponse is a no-op when choices[0] doesn't exist, the response body is still fully consumed (via response.text()) and re-wrapped in a new Response object unnecessarily. Consider guarding the response rewriting with a check on the request URL or method to only intercept chat completions responses, similar to how the request body rewriting checks opts.method === "POST".
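A guard along these lines would suffice; the function name and the exact integration point in `provider.ts` are assumptions for illustration.

```typescript
// Sketch: only intercept responses for POSTed chat-completions requests,
// mirroring how the request-body rewrite already checks opts.method === "POST".
function shouldRewriteResponse(input: string | URL | Request, init?: RequestInit): boolean {
  const url = typeof input === "string" ? input : input instanceof URL ? input.href : input.url
  const method = init?.method ?? (input instanceof Request ? input.method : "GET")
  return method.toUpperCase() === "POST" && url.includes("/chat/completions")
}
```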

Author

Delqhi commented Mar 7, 2026

Repro / before-after summary:

Before

  • custom @ai-sdk/openai-compatible provider sends plain text or legacy function_call instead of modern tool_calls
  • OpenCode sees a normal stop and the session ends after the first assistant reply

After with opt-in config

{
  "provider": {
    "custom-provider": {
      "npm": "@ai-sdk/openai-compatible",
      "api": "https://api.example.com/v1",
      "options": {
        "toolParser": [
          { "type": "raw-function-call" },
          { "type": "json" },
          { "type": "single-tool-text", "tool": "bash", "argument": "command" }
        ]
      }
    }
  }
}
  • outgoing tools/tool_choice can be rewritten into legacy functions/function_call
  • incoming legacy function_call gets normalized into tool_calls
  • text-only or tagged JSON tool intents can be recovered into a structured tool call before OpenCode consumes the response
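The single-tool-text case from the config above can be sketched like this; the helper name and the synthetic ID scheme are assumptions, not the PR's actual implementation.

```typescript
// Sketch: with { "type": "single-tool-text", "tool": "bash", "argument": "command" },
// a plain-text reply is wrapped into one structured tool call.
function singleToolText(text: string, tool: string, argument: string) {
  const trimmed = text.trim()
  if (!trimmed) return undefined
  return {
    id: "call_compat_" + Math.random().toString(36).slice(2, 10),
    type: "function" as const,
    // the whole text reply becomes the single configured argument
    function: { name: tool, arguments: JSON.stringify({ [argument]: trimmed }) },
  }
}
```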

That means broken custom providers can keep multi-turn sessions alive without changing behavior for normal providers, because the whole layer is opt-in through provider.options.toolParser.

Verified locally with:

  • bun test test/provider/openai-compatible-compat.test.ts
  • bun test test/provider/copilot/copilot-chat-model.test.ts
  • bun test test/session/llm.test.ts
  • bun test test/provider/provider.test.ts
  • bun run typecheck

@github-actions github-actions bot removed the needs:compliance This means the issue will auto-close after 2 hours. label Mar 8, 2026
Contributor

github-actions bot commented Mar 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@daniel-farina

I tested this and it helped me get Gemma working with OpenCode, documented here:

https://gist.github.com/daniel-farina/87dc1c394b94e45bb700d27e9ea03193

@brunostc

brunostc commented Apr 8, 2026

Can we get this prioritized? -.-

@fenneclabs

Interested in that fix as well.

@BrutchsamaJeanLouis

looking forward to this merge

@deadbaed

deadbaed commented Apr 9, 2026

It would be good to get some traction on this PR so it can be merged. Would it be possible to rebase onto the latest dev branch, @Delqhi?

@AceCodePt

Same. I would reeeally like to try Gemma working with opencode.

@BrutchsamaJeanLouis

BrutchsamaJeanLouis commented Apr 9, 2026

Findings from testing Gemma 4 (31B) + llama.cpp + thinking enabled

Spent some time debugging Gemma 4 tool calling with latest llama.cpp (b8736, PRs #21326 + #21418 merged) and wanted to share what I found — some of these may be relevant to this PR's scope.

1. reasoning_content in streaming deltas breaks tool_call detection

This is the biggest issue. When Gemma 4 has thinking enabled, llama.cpp sends streaming deltas with a reasoning_content field:

{"delta": {"reasoning_content": "The user wants to..."}}

The AI SDK sees ~30 chunks with an unknown field and empty content, then misses the tool_calls that arrive in the final chunks. By the time the toolParser layer runs, the SDK has already dropped them.

This probably needs to be handled in the fetch wrapper (where the response is intercepted), not in the parser layer — strip or drop reasoning_content from deltas before the SDK processes them.
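A sketch of that fetch-wrapper fix (assumed shape, not this PR's code): drop `reasoning_content` from each streamed delta so the SDK only sees fields it knows, leaving the rest of the chunk untouched.

```typescript
// Sketch: strip reasoning_content from one parsed streaming chunk.
function stripReasoning(chunk: any): any {
  const choices = chunk?.choices
  if (!Array.isArray(choices)) return chunk
  return {
    ...chunk,
    choices: choices.map((c: any) => {
      if (!c?.delta || !("reasoning_content" in c.delta)) return c
      // remove only reasoning_content; keep content, tool_calls, etc.
      const { reasoning_content, ...delta } = c.delta
      return { ...c, delta }
    }),
  }
}
```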

2. <tool_use> tag + array format

Some plugins (e.g. oh-my-openagent) instruct models to emit tool calls as:

<tool_use>
[
  {"tool": "read", "parameters": {"filePath": "README.md"}},
  {"tool": "bash", "parameters": {"command": "ls -R"}}
]
</tool_use>

Three things the current json parser wouldn't catch:

  • <tool_use> tag (PR only extracts <tool_call>)
  • Array of tool calls in a single block (PR parses single objects)
  • parameters key (toToolCall checks arguments/input/args but not parameters)
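A more permissive extractor covering those three gaps could look like this; the function name and return shape are illustrative, not the PR's actual `toToolCall` API.

```typescript
// Sketch: accept <tool_use> or <tool_call> tags, arrays of calls, and a
// parameters key alongside arguments/input/args.
function extractToolCalls(text: string): { name: string; arguments: string }[] {
  const match = text.match(/<(tool_use|tool_call)>([\s\S]*?)<\/\1>/)
  if (!match) return []
  let parsed: unknown
  try {
    parsed = JSON.parse(match[2].trim())
  } catch {
    return []
  }
  // a single object and an array of calls are both accepted
  const items = Array.isArray(parsed) ? parsed : [parsed]
  return items.flatMap((item: any) => {
    const name = item?.tool ?? item?.name
    const args = item?.parameters ?? item?.arguments ?? item?.input ?? item?.args
    if (typeof name !== "string" || args === undefined) return []
    return [{ name, arguments: JSON.stringify(args) }]
  })
}
```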

3. KV cache invalidation from response rewriting

If the compat layer converts reasoning_content into content in responses, the conversation history OpenCode sends back won't match what llama.cpp originally generated. The Jinja template formats these differently (<think> tags vs plain text), so the token prefix diverges → KV cache miss every turn → progressive slowdown on longer conversations.

Stripping reasoning_content entirely from responses (rather than converting) avoids this — the reasoning still happens server-side and improves output quality, it just isn't displayed.

Test setup

  • llama.cpp b8736 with --jinja, router mode, Gemma 4 31B Q4_K_M
  • chat-template-kwargs = {"enable_thinking": true}, reasoning-budget = 140
  • OpenCode v1.4.0 on Windows, @ai-sdk/openai-compatible provider
  • Verified llama.cpp returns correct tool_calls + finish_reason: "tool_calls" at the API level (both streaming and non-streaming via curl)



Development

Successfully merging this pull request may close these issues.

  • SDK session.prompt() returns empty responses with custom OpenAI-compatible provider
  • Tool Calling Issues with Open Source Models in OpenCode

8 participants