Skip to content

feat: add OpenAI-compatible LLM provider#240

Closed
fatinghenji wants to merge 1 commit into
rohitg00:mainfrom
fatinghenji:feat/openai-llm-provider
Closed

feat: add OpenAI-compatible LLM provider#240
fatinghenji wants to merge 1 commit into
rohitg00:mainfrom
fatinghenji:feat/openai-llm-provider

Conversation

@fatinghenji
Copy link
Copy Markdown
Contributor

@fatinghenji fatinghenji commented May 8, 2026

Summary

Adds a new openai LLM provider that uses raw fetch to call any OpenAI-compatible /v1/chat/completions endpoint.

Motivation

Currently, AgentMemory only supports Anthropic, Gemini, OpenRouter, MiniMax, and the agent-sdk fallback for LLM-backed compression and summarization. Users with OpenAI API keys (or keys from OpenAI-compatible services like DeepSeek, SiliconFlow, Azure OpenAI, vLLM, LM Studio) cannot use their existing credentials for the LLM layer, even though the embedding layer already supports OPENAI_API_KEY via OpenAIEmbeddingProvider.

Changes

  • src/types.ts: Add openai to ProviderType union
  • src/providers/openai.ts: New OpenAIProvider class using raw fetch (no SDK dependency)
    • Supports any /v1/chat/completions endpoint
    • Respects OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL env vars
  • src/providers/index.ts: Wire openai case into createBaseProvider()
  • src/config.ts:
    • Add OPENAI_API_KEY detection to detectProvider() (with OPENAI_API_KEY_FOR_LLM opt-out)
    • Add OPENAI_API_KEY to detectLlmProviderKind()
    • Add openai to VALID_PROVIDERS set
    • Update no-key warning message

Supported Endpoints

Service OPENAI_BASE_URL Notes
OpenAI official https://api.openai.com (default)
DeepSeek https://api.deepseek.com/v1
SiliconFlow https://api.siliconflow.cn/v1
Azure OpenAI https://{resource}.openai.azure.com/openai/deployments/{deployment}
vLLM / LM Studio http://localhost:8000/v1 or http://localhost:1234/v1
Ollama http://localhost:11434/v1 With --enable-openai

Configuration Example

# Embedding + LLM both via SiliconFlow
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.siliconflow.cn/v1
OPENAI_MODEL=deepseek-chat
OPENAI_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B
OPENAI_EMBEDDING_DIMENSIONS=4096
EMBEDDING_PROVIDER=openai

Backwards Compatibility

  • Default behavior unchanged: OPENAI_API_KEY is now checked first in detectProvider(), but only activates when the key is present
  • Users who only use OPENAI_API_KEY for embedding and prefer another LLM provider can set OPENAI_API_KEY_FOR_LLM=false to skip auto-detection
  • No breaking changes to existing provider configurations

Testing

  • npm run build passes
  • npm test passes: 838 tests, 75 test files

Checklist

  • Build passes
  • Tests pass
  • No new dependencies
  • Backwards compatible
  • Follows existing provider patterns (raw fetch, env var config)

Summary by CodeRabbit

  • New Features

    • Added OpenAI LLM provider support with compatibility for OpenAI-compatible APIs, including configurable base URL and model parameters.
  • Bug Fixes

    • Increased network request timeouts for session management, memory consolidation, and summarization operations to improve reliability.

Adds a new 'openai' LLM provider that uses raw fetch to call any
OpenAI-compatible /v1/chat/completions endpoint.

Supported endpoints:
- OpenAI official
- DeepSeek
- SiliconFlow (硅基流动)
- Azure OpenAI
- vLLM / LM Studio / Ollama

Shares OPENAI_API_KEY with the existing OpenAI embedding provider.
Respects OPENAI_BASE_URL and OPENAI_MODEL env vars.

Includes OPENAI_API_KEY in detectProvider() and VALID_PROVIDERS.

Closes: LLM provider gap for OpenAI-compatible APIs
@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

@fatinghenji is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds OpenAI-compatible LLM provider support to the memory system. It introduces a new OpenAIProvider class, updates provider detection and configuration logic to recognize OpenAI credentials, extends the provider factory to instantiate OpenAI providers, and increases network request timeouts for session lifecycle operations.

Changes

OpenAI Provider Integration

Layer / File(s) Summary
Type Definitions
src/types.ts
ProviderType union is expanded to include "openai" as a supported provider identifier.
OpenAI Provider Class
src/providers/openai.ts
New OpenAIProvider class implements MemoryProvider with compress and summarize methods that POST to /v1/chat/completions, with error handling and response validation.
Provider Detection
src/config.ts
detectProvider() recognizes OpenAI config when OPENAI_API_KEY is set; detectLlmProviderKind() treats OPENAI_API_KEY as an llm signal; loadFallbackConfig() adds "openai" to VALID_PROVIDERS; warning text includes OPENAI_API_KEY.
Provider Factory
src/providers/index.ts
createBaseProvider() imports OpenAIProvider and adds an "openai" switch case that constructs OpenAIProvider with model, maxTokens, and baseURL from config.
Network Timeouts
plugin/scripts/session-end.mjs, plugin/scripts/stop.mjs
Session lifecycle operations increase AbortSignal.timeout values for /agentmemory/session/end, /crystals/auto, /consolidate-pipeline, /claude-bridge/sync, and /summarize endpoints.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rohitg00/agentmemory#213: Both PRs increase abort/timeout durations for the same memory-related endpoints (session end, consolidation/crystallization, and summarize) used during session shutdown.
  • rohitg00/agentmemory#188: Both PRs modify detectProvider in src/config.ts to change provider-detection and fallback behavior.
  • rohitg00/agentmemory#186: Both PRs add OpenAI-compatible endpoint configuration for OpenAI providers including OPENAI_BASE_URL and related OpenAI env handling.

Poem

🐰 A new provider hops into the fold,
OpenAI's messages bright and bold,
With timeouts extended, no more rushing,
The sessions complete, no more hushing.
Type-safe and wired, the config knows,
Where the autumn rabbit's wisdom flows. 🍂

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: adding OpenAI-compatible LLM provider support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/scripts/session-end.mjs (1)

34-61: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sequential timeout stacking can stall session-end for minutes.

These fetches run one after another, so under failures/timeouts this path can block for the sum of all timeouts. With the increased values, teardown latency can become very high.

Suggested change (parallelize optional calls, keep per-call timeout)
- if (process.env["CONSOLIDATION_ENABLED"] === "true") {
-   try {
-     await fetch(`${REST_URL}/agentmemory/crystals/auto`, {
-       method: "POST",
-       headers: authHeaders(),
-       body: JSON.stringify({ olderThanDays: 0 }),
-       signal: AbortSignal.timeout(6e4)
-     });
-   } catch {}
-   try {
-     await fetch(`${REST_URL}/agentmemory/consolidate-pipeline`, {
-       method: "POST",
-       headers: authHeaders(),
-       body: JSON.stringify({
-         tier: "all",
-         force: true
-       }),
-       signal: AbortSignal.timeout(12e4)
-     });
-   } catch {}
- }
- if (process.env["CLAUDE_MEMORY_BRIDGE"] === "true") try {
-   await fetch(`${REST_URL}/agentmemory/claude-bridge/sync`, {
-     method: "POST",
-     headers: authHeaders(),
-     signal: AbortSignal.timeout(3e4)
-   });
- } catch {}
+ const backgroundTasks = [];
+ if (process.env["CONSOLIDATION_ENABLED"] === "true") {
+   backgroundTasks.push(
+     fetch(`${REST_URL}/agentmemory/crystals/auto`, {
+       method: "POST",
+       headers: authHeaders(),
+       body: JSON.stringify({ olderThanDays: 0 }),
+       signal: AbortSignal.timeout(6e4)
+     }).catch(() => {})
+   );
+   backgroundTasks.push(
+     fetch(`${REST_URL}/agentmemory/consolidate-pipeline`, {
+       method: "POST",
+       headers: authHeaders(),
+       body: JSON.stringify({ tier: "all", force: true }),
+       signal: AbortSignal.timeout(12e4)
+     }).catch(() => {})
+   );
+ }
+ if (process.env["CLAUDE_MEMORY_BRIDGE"] === "true") {
+   backgroundTasks.push(
+     fetch(`${REST_URL}/agentmemory/claude-bridge/sync`, {
+       method: "POST",
+       headers: authHeaders(),
+       signal: AbortSignal.timeout(3e4)
+     }).catch(() => {})
+   );
+ }
+ await Promise.allSettled(backgroundTasks);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/scripts/session-end.mjs` around lines 34 - 61, The sequential fetch
calls in session-end.mjs to `${REST_URL}/agentmemory/crystals/auto`,
`${REST_URL}/agentmemory/consolidate-pipeline`, and
`${REST_URL}/agentmemory/claude-bridge/sync` can add up their AbortSignal
timeouts and stall teardown; change to fire these optional calls in parallel
(e.g., collect the individual fetch promises and use Promise.allSettled) while
preserving each call’s AbortSignal.timeout and authHeaders(), and handle/log
failures per-request rather than awaiting them serially so the total delay is
bounded by the longest single timeout instead of the sum.
🧹 Nitpick comments (2)
src/config.ts (1)

53-53: ⚡ Quick win

Prefer removing this WHAT-style provider list comment.

This comment describes behavior directly visible in code; keep only rationale comments when needed.

As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/config.ts` at line 53, Remove the WHAT-style comment containing
"OpenAI-compatible: supports OpenAI, DeepSeek, SiliconFlow, Azure, vLLM, LM
Studio" from src/config.ts; this line duplicates information already expressed
in code and violates the guideline to avoid WHAT comments—delete that comment
and, if necessary, replace it with a brief rationale comment only (e.g., why
compatibility matters) near the relevant config symbol or constant to preserve
intent.
src/providers/openai.ts (1)

7-25: ⚡ Quick win

Trim WHAT-style comments and keep rationale-only docs.

The large provider-description block is mostly implementation/feature listing. Prefer concise naming + minimal rationale comments.

As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/openai.ts` around lines 7 - 25, Replace the large
"OpenAI-compatible LLM provider" header comment with a concise rationale-only
doc: keep a short one-line description ("OpenAI-compatible LLM provider") and a
minimal required/optional env var list (retain OPENAI_API_KEY and optionally
OPENAI_BASE_URL, OPENAI_MODEL, MAX_TOKENS) but remove the WHAT-style
feature/implementation bullets and example providers; update the top-of-file
comment in src/providers/openai.ts (the block currently starting with
"OpenAI-compatible LLM provider.") so it only explains purpose and essential
configuration variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/config.ts`:
- Around line 169-170: detectLlmProviderKind currently treats any present
OPENAI_API_KEY as selecting "llm" even when users set
OPENAI_API_KEY_FOR_LLM=false; update detectLlmProviderKind to honor the opt-out
by checking that OPENAI_API_KEY exists AND OPENAI_API_KEY_FOR_LLM is not the
string "false" (or equivalent falsy flag) before returning "llm" (e.g., replace
the hasRealValue(env["OPENAI_API_KEY"]) check with
hasRealValue(env["OPENAI_API_KEY"]) && env["OPENAI_API_KEY_FOR_LLM"] !==
"false"); keep the existing MINIMAX_API_KEY logic and ensure detectProvider
remains consistent with this change.

In `@src/providers/openai.ts`:
- Around line 48-64: The fetch in the private async call(systemPrompt: string,
userPrompt: string) method has no timeout/abort, so add an AbortController,
start a timer (e.g. using setTimeout) that calls controller.abort() after a
configurable timeout (use an existing property or add this.requestTimeout), pass
controller.signal to fetch, clear the timer on success, and handle the abort
case (detect DOMException/AbortError and throw or return a descriptive error) so
stalled upstream requests don't hang compression/summarization; reference the
call method, this.baseUrl/this.apiKey, and this.model/maxTokens when applying
the change.

---

Outside diff comments:
In `@plugin/scripts/session-end.mjs`:
- Around line 34-61: The sequential fetch calls in session-end.mjs to
`${REST_URL}/agentmemory/crystals/auto`,
`${REST_URL}/agentmemory/consolidate-pipeline`, and
`${REST_URL}/agentmemory/claude-bridge/sync` can add up their AbortSignal
timeouts and stall teardown; change to fire these optional calls in parallel
(e.g., collect the individual fetch promises and use Promise.allSettled) while
preserving each call’s AbortSignal.timeout and authHeaders(), and handle/log
failures per-request rather than awaiting them serially so the total delay is
bounded by the longest single timeout instead of the sum.

---

Nitpick comments:
In `@src/config.ts`:
- Line 53: Remove the WHAT-style comment containing "OpenAI-compatible: supports
OpenAI, DeepSeek, SiliconFlow, Azure, vLLM, LM Studio" from src/config.ts; this
line duplicates information already expressed in code and violates the guideline
to avoid WHAT comments—delete that comment and, if necessary, replace it with a
brief rationale comment only (e.g., why compatibility matters) near the relevant
config symbol or constant to preserve intent.

In `@src/providers/openai.ts`:
- Around line 7-25: Replace the large "OpenAI-compatible LLM provider" header
comment with a concise rationale-only doc: keep a short one-line description
("OpenAI-compatible LLM provider") and a minimal required/optional env var list
(retain OPENAI_API_KEY and optionally OPENAI_BASE_URL, OPENAI_MODEL, MAX_TOKENS)
but remove the WHAT-style feature/implementation bullets and example providers;
update the top-of-file comment in src/providers/openai.ts (the block currently
starting with "OpenAI-compatible LLM provider.") so it only explains purpose and
essential configuration variables.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08a6b7a3-cec0-4cba-800b-bf86735a2970

📥 Commits

Reviewing files that changed from the base of the PR and between 1effa2c and 6879064.

📒 Files selected for processing (6)
  • plugin/scripts/session-end.mjs
  • plugin/scripts/stop.mjs
  • src/config.ts
  • src/providers/index.ts
  • src/providers/openai.ts
  • src/types.ts

Comment thread src/config.ts
Comment on lines +169 to +170
hasRealValue(env["MINIMAX_API_KEY"]) ||
hasRealValue(env["OPENAI_API_KEY"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor OPENAI_API_KEY_FOR_LLM=false in provider-kind detection.

detectProvider() supports opt-out, but detectLlmProviderKind() currently reports "llm" whenever OPENAI_API_KEY exists. That creates inconsistent runtime behavior when users intentionally disable OpenAI for LLM.

Suggested patch
   if (
     hasRealValue(env["ANTHROPIC_API_KEY"]) ||
     hasRealValue(env["GEMINI_API_KEY"]) ||
     hasRealValue(env["GOOGLE_API_KEY"]) ||
     hasRealValue(env["OPENROUTER_API_KEY"]) ||
     hasRealValue(env["MINIMAX_API_KEY"]) ||
-    hasRealValue(env["OPENAI_API_KEY"])
+    (hasRealValue(env["OPENAI_API_KEY"]) &&
+      env["OPENAI_API_KEY_FOR_LLM"] !== "false")
   ) {
     return "llm";
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
hasRealValue(env["MINIMAX_API_KEY"]) ||
hasRealValue(env["OPENAI_API_KEY"])
if (
hasRealValue(env["ANTHROPIC_API_KEY"]) ||
hasRealValue(env["GEMINI_API_KEY"]) ||
hasRealValue(env["GOOGLE_API_KEY"]) ||
hasRealValue(env["OPENROUTER_API_KEY"]) ||
hasRealValue(env["MINIMAX_API_KEY"]) ||
(hasRealValue(env["OPENAI_API_KEY"]) &&
env["OPENAI_API_KEY_FOR_LLM"] !== "false")
) {
return "llm";
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/config.ts` around lines 169 - 170, detectLlmProviderKind currently treats
any present OPENAI_API_KEY as selecting "llm" even when users set
OPENAI_API_KEY_FOR_LLM=false; update detectLlmProviderKind to honor the opt-out
by checking that OPENAI_API_KEY exists AND OPENAI_API_KEY_FOR_LLM is not the
string "false" (or equivalent falsy flag) before returning "llm" (e.g., replace
the hasRealValue(env["OPENAI_API_KEY"]) check with
hasRealValue(env["OPENAI_API_KEY"]) && env["OPENAI_API_KEY_FOR_LLM"] !==
"false"); keep the existing MINIMAX_API_KEY logic and ensure detectProvider
remains consistent with this change.

Comment thread src/providers/openai.ts
Comment on lines +48 to +64
private async call(systemPrompt: string, userPrompt: string): Promise<string> {
const url = `${this.baseUrl}/v1/chat/completions`;
const response = await fetch(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${this.apiKey}`,
},
body: JSON.stringify({
model: this.model,
max_tokens: this.maxTokens,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: userPrompt },
],
}),
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a request timeout to the OpenAI fetch path.

This call currently has no abort/timeout guard, so a stalled upstream can hang compression/summarization indefinitely.

Suggested patch
   private async call(systemPrompt: string, userPrompt: string): Promise<string> {
     const url = `${this.baseUrl}/v1/chat/completions`;
-    const response = await fetch(url, {
-      method: "POST",
-      headers: {
-        "Content-Type": "application/json",
-        Authorization: `Bearer ${this.apiKey}`,
-      },
-      body: JSON.stringify({
-        model: this.model,
-        max_tokens: this.maxTokens,
-        messages: [
-          { role: "system", content: systemPrompt },
-          { role: "user", content: userPrompt },
-        ],
-      }),
-    });
+    const controller = new AbortController();
+    const timeout = setTimeout(() => controller.abort(), 30_000);
+    let response: Response;
+    try {
+      response = await fetch(url, {
+        method: "POST",
+        signal: controller.signal,
+        headers: {
+          "Content-Type": "application/json",
+          Authorization: `Bearer ${this.apiKey}`,
+        },
+        body: JSON.stringify({
+          model: this.model,
+          max_tokens: this.maxTokens,
+          messages: [
+            { role: "system", content: systemPrompt },
+            { role: "user", content: userPrompt },
+          ],
+        }),
+      });
+    } catch (error) {
+      if ((error as Error).name === "AbortError") {
+        throw new Error("OpenAI API request timed out after 30s");
+      }
+      throw error;
+    } finally {
+      clearTimeout(timeout);
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/openai.ts` around lines 48 - 64, The fetch in the private async
call(systemPrompt: string, userPrompt: string) method has no timeout/abort, so
add an AbortController, start a timer (e.g. using setTimeout) that calls
controller.abort() after a configurable timeout (use an existing property or add
this.requestTimeout), pass controller.signal to fetch, clear the timer on
success, and handle the abort case (detect DOMException/AbortError and throw or
return a descriptive error) so stalled upstream requests don't hang
compression/summarization; reference the call method, this.baseUrl/this.apiKey,
and this.model/maxTokens when applying the change.

@rohitg00
Copy link
Copy Markdown
Owner

rohitg00 commented May 8, 2026

Reviewed — the OpenAI-compatible provider itself is clean and well-scoped. Like the raw-fetch approach (avoids SDK stainless headers, works against any /v1/chat/completions endpoint including DeepSeek, SiliconFlow, vLLM, LM Studio). The OPENAI_API_KEY_FOR_LLM=false opt-out is a thoughtful escape hatch for users who already use OPENAI_API_KEY for embeddings only.

Two things before merge:

1. Scope creep on the timeout bumps.

The hook script timeout changes (5s→30s, 15s→60s, 30s→120s, 30s→120s) in plugin/scripts/session-end.mjs and plugin/scripts/stop.mjs are unrelated to the OpenAI provider and don't fit this PR's title. Also they're going the opposite direction of #222, which just landed in spirit (5s → 1.5s ceiling on session-start.ts) to fix engine starvation under parallel claude -p fan-out.

Two options:

2. Update README env block + provider table.

The OpenAI entry is missing from the LLM-provider table in README.md and from the env block. Once those land it's a clean v1 of OpenAI support and #185 closes cleanly with this as the successor.

Otherwise approving the provider code itself. Will hold the merge for @rohitg00 once the scope question is resolved.

@flamerged
Copy link
Copy Markdown
Contributor

I tested the Ollama side of this PR path against current Ollama Cloud/local behavior and I think this PR can close the Ollama use case if it adds one small passthrough.

Findings from 2026-05-12:

  • Direct Ollama Cloud exposes OpenAI-compatible discovery: GET https://ollama.com/v1/models returns 200 with cloud models including deepseek-v4-flash, kimi-k2.6, gpt-oss:120b, etc.
  • Direct native discovery also works: GET https://ollama.com/api/tags returns 200.
  • Local signed-in Ollama daemon exposes cloud models through the OpenAI-compatible endpoint at http://127.0.0.1:11434/v1/chat/completions.

The important edge case: for a thinking model (kimi-k2.6:cloud), this request returned reasoning but empty final content:

{
  "model": "kimi-k2.6:cloud",
  "messages": [{"role":"user","content":"Reply exactly: ok"}],
  "max_tokens": 20,
  "stream": false
}

The response had message.reasoning populated and message.content empty. The same request worked when I added either:

{"reasoning_effort":"none"}

or:

{"reasoning":{"effort":"none"}}

Native Ollama /api/chat also worked with think:false.

So for Ollama Cloud via this OpenAI-compatible provider, I think we need one optional env passthrough such as:

OPENAI_REASONING_EFFORT=none

and then include reasoning_effort in the /v1/chat/completions body when set. That would let users configure:

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://127.0.0.1:11434/v1
OPENAI_MODEL=kimi-k2.6:cloud
OPENAI_REASONING_EFFORT=none

or direct cloud:

OPENAI_API_KEY=<ollama cloud key>
OPENAI_BASE_URL=https://ollama.com/v1
OPENAI_MODEL=kimi-k2.6
OPENAI_REASONING_EFFORT=none

Without that, this PR may work for non-thinking models but fail AgentMemory's compression/summarization path on some Ollama Cloud thinking models because the provider treats empty message.content as an unexpected response.

@fatinghenji
Copy link
Copy Markdown
Contributor Author

Closing in favor of a clean rebased PR that addresses all review feedback.

rohitg00 added a commit that referenced this pull request May 16, 2026
…lish (#432)

Patch bump per the established rule: additive surface only. OpenAI
provider is a new optional surface that activates only when
OPENAI_API_KEY is set, gated by OPENAI_API_KEY_FOR_LLM. Telemetry
project_name pin is pure observability metadata. Compare polish is
docs/website only.

PRs included since v0.9.16:
  #307 — OpenAI-compatible LLM provider (universal adapter for
         OpenAI, Azure OpenAI auto-detected by hostname, DeepSeek,
         SiliconFlow, vLLM, LM Studio, Ollama). Plus the maintainer-
         pushed Azure detection + fetch timeout + README scope hint
         follow-ups. Closes #185, #232, #312, supersedes #240.
  #426 — pin worker telemetry project_name
  #427 — Compare section polish (title + native plugins cell + grid)

Files bumped (9):
  package.json, packages/mcp/package.json, plugin/.claude-plugin/plugin.json,
  plugin/.codex-plugin/plugin.json, src/version.ts, src/types.ts,
  src/functions/export-import.ts, test/export-import.test.ts,
  CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants