fix(embed): allow separate OPENAI_EMBEDDING_BASE_URL + OPENAI_EMBEDDING_API_KEY#503
fix(embed): allow separate OPENAI_EMBEDDING_BASE_URL + OPENAI_EMBEDDING_API_KEY#503efenex wants to merge 2 commits into
Conversation
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughOpenAIEmbeddingProvider now supports embedding-specific env vars: it documents and prefers ChangesEmbedding-specific configuration and credentials
Sequence Diagram(s): sequenceDiagram
participant Caller
participant OpenAIEmbeddingProvider
participant EnvVars
participant OpenAIAPI
Caller->>OpenAIEmbeddingProvider: construct or call embed()
OpenAIEmbeddingProvider->>EnvVars: read constructor apiKey, OPENAI_EMBEDDING_API_KEY, OPENAI_API_KEY
OpenAIEmbeddingProvider->>EnvVars: read OPENAI_EMBEDDING_BASE_URL, OPENAI_BASE_URL via normalizeBaseUrl
OpenAIEmbeddingProvider->>OpenAIAPI: send embedding request to resolved base URL with resolved apiKey
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/providers/embedding/openai.ts`:
- Line 79: Update the thrown error in the check that uses this.apiKey (the line
`if (!this.apiKey) throw new Error("OPENAI_API_KEY is required");`) to mention
all accepted API key sources — the constructor parameter,
OPENAI_EMBEDDING_API_KEY, or OPENAI_API_KEY — so users who set
OPENAI_EMBEDDING_API_KEY aren’t misled; locate the guard around this.apiKey (in
the OpenAI embedding provider code) and replace the message with a concise one
listing those three sources.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3482749c-dcca-4d4a-bd97-b780e42f8abb
📒 Files selected for processing (1)
src/providers/embedding/openai.ts
…rpora) rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and per observation. Each call is one HTTP round-trip to the embedding provider for a single input. On a 500k-observation imported corpus against an embedding endpoint with even modest latency, that's serial 100-200ms per call = 14-28 hours of wallclock. The new non-blocking rebuild path (rohitg00#500) made this no longer block boot, but the rebuild itself still takes the same wallclock. Add `vectorIndexAddBatchGuarded()` next to the existing per-item helper, accepting an array of items and calling `provider.embedBatch()` once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings` all accept an `input` array), latency for N items is roughly the latency of a single embed because network + GPU setup amortize. Refactor `rebuildIndex` to accumulate items into a buffer and flush every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays per-item-synchronous; only the vector path is batched. Validated against a vLLM Qwen3-Embedding-8B endpoint: - single embed: 175ms - batch-of-32: 737ms (= 23ms/item amortized, ~7.6× speedup) - projected backfill time for 500k obs: 25h → 3h Per-item failure shape is preserved: - whole-batch network/provider error → all skipped, single warn line (vs N warns previously when the same error hit every item) - per-item dimension mismatch → that item skipped, others continue - rebuildIndex return value unchanged (count of attempted items) Override knob: - REBUILD_EMBED_BATCH_SIZE (default 32) — set lower for endpoints with small per-request input limits, higher for endpoints that prefer larger batches. Set to 1 to fall back to the per-item path. 39/39 existing tests in search-index/vector-index/remember-bm25-index pass unchanged. Related: rohitg00#500 (non-blocking rebuildIndex), rohitg00#503 (separate embedding base URL).
3f24800 to
45f8e19
Compare
|
Rebased onto current
Tests pass locally with 17/17 in |
|
Hey @efenex — sitting on this to confirm scope. The PR adds two new env vars ( To gate the addition: which concrete user is blocked today by the single-pair shape? E.g. "LLM on DeepSeek + embeddings on local Ollama" is a real ask, and #503 does unblock it. But if there's no tracked issue documenting a real user hitting this, the override might fit better as a downstream wrapper than as in-tree config. Could you (a) link to a user issue describing the blocker, or (b) describe the deployment shape that needs the split? Once that's clear I'll either merge as-is or close in favor of a downstream pattern. (Holding here, not blocking — the diff itself is clean and uses the existing |
…rpora) (#504) * fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora) rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and per observation. Each call is one HTTP round-trip to the embedding provider for a single input. On a 500k-observation imported corpus against an embedding endpoint with even modest latency, that's serial 100-200ms per call = 14-28 hours of wallclock. The new non-blocking rebuild path (#500) made this no longer block boot, but the rebuild itself still takes the same wallclock. Add `vectorIndexAddBatchGuarded()` next to the existing per-item helper, accepting an array of items and calling `provider.embedBatch()` once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings` all accept an `input` array), latency for N items is roughly the latency of a single embed because network + GPU setup amortize. Refactor `rebuildIndex` to accumulate items into a buffer and flush every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays per-item-synchronous; only the vector path is batched. Validated against a vLLM Qwen3-Embedding-8B endpoint: - single embed: 175ms - batch-of-32: 737ms (= 23ms/item amortized, ~7.6× speedup) - projected backfill time for 500k obs: 25h → 3h Per-item failure shape is preserved: - whole-batch network/provider error → all skipped, single warn line (vs N warns previously when the same error hit every item) - per-item dimension mismatch → that item skipped, others continue - rebuildIndex return value unchanged (count of attempted items) Override knob: - REBUILD_EMBED_BATCH_SIZE (default 32) — set lower for endpoints with small per-request input limits, higher for endpoints that prefer larger batches. Set to 1 to fall back to the per-item path. 39/39 existing tests in search-index/vector-index/remember-bm25-index pass unchanged. Related: #500 (non-blocking rebuildIndex), #503 (separate embedding base URL). * fix(rebuild): per-item vi.add try/catch to preserve soft-fail Restores the pre-batch soft-fail behavior — a single failing vi.add() no longer aborts the entire rebuild batch. Failures are logged and counted toward fail, just like dimension mismatches above.
…NG_API_KEY The OpenAI-compat embedding provider previously read OPENAI_BASE_URL and OPENAI_API_KEY only, which the chat-LLM path (src/providers/openai.ts) also reads. That couples both calls to the same endpoint — so operators who want, say, chat completions on a fast hosted provider (Novita / DeepInfra) and embeddings on a self-hosted vLLM cluster (Qwen3-Embedding-8B on a dedicated GPU) had to either move both to the same endpoint or run agentmemory against a single provider with whatever embedding model it happens to expose. Add two embedding-scoped overrides with fallback to the existing vars: OPENAI_EMBEDDING_BASE_URL → falls back to OPENAI_BASE_URL → default OPENAI_EMBEDDING_API_KEY → falls back to OPENAI_API_KEY → required The fallback chain keeps existing setups working without any .env changes. New setups can mix and match — common patterns: # vLLM (self-hosted GPU, free, batchable) for embeddings + Novita (DeepSeek V4 Flash) for chat OPENAI_BASE_URL=https://api.novita.ai/v3/openai OPENAI_API_KEY=sk-novita-... OPENAI_EMBEDDING_BASE_URL=https://embed.your.lan OPENAI_EMBEDDING_API_KEY=local-no-auth # endpoints that ignore Authorization OPENAI_EMBEDDING_MODEL=Qwen3-Embedding-8B OPENAI_EMBEDDING_DIMENSIONS=4096 # Local Ollama for embeddings + remote for chat OPENAI_BASE_URL=https://api.openai.com OPENAI_API_KEY=sk-... OPENAI_EMBEDDING_BASE_URL=http://localhost:11434 OPENAI_EMBEDDING_API_KEY=ollama OPENAI_EMBEDDING_MODEL=nomic-embed-text The separate API key matters because most local endpoints (Ollama / LM Studio / llama.cpp / vLLM) ignore Authorization entirely but Node fetch still requires a non-empty Bearer token. Setting OPENAI_EMBEDDING_API_KEY=anything-truthy unblocks that case without revealing the real OPENAI_API_KEY to whatever's on localhost. No code-paths other than the embedding provider are touched. Reviewed against the 17 existing test cases in test/embedding-provider.test.ts; no regression (the 6 pre-existing failures on main are env-pollution when ~/.agentmemory/.env has API keys set, unrelated to this change).
45f8e19 to
3980a74
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/embedding-provider.test.ts (1)
58-63: 💤 Low valueConsider clearing the new embedding env vars in beforeEach for test isolation.
The
beforeEachblock clearsOPENAI_BASE_URL,OPENAI_EMBEDDING_MODEL, andOPENAI_EMBEDDING_DIMENSIONS, but doesn't clear the newOPENAI_EMBEDDING_BASE_URLorOPENAI_EMBEDDING_API_KEY. If a developer's environment has these set, tests like "respects OPENAI_BASE_URL env var" could unexpectedly fail since the embedding-specific URL takes precedence.♻️ Suggested addition to beforeEach
beforeEach(() => { process.env = { ...originalEnv }; delete process.env["OPENAI_BASE_URL"]; + delete process.env["OPENAI_EMBEDDING_BASE_URL"]; + delete process.env["OPENAI_EMBEDDING_API_KEY"]; delete process.env["OPENAI_EMBEDDING_MODEL"]; delete process.env["OPENAI_EMBEDDING_DIMENSIONS"]; });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/embedding-provider.test.ts` around lines 58 - 63, The beforeEach in embedding-provider.test.ts resets process.env but misses the new embedding-specific vars; update the beforeEach setup (the same block that currently deletes OPENAI_BASE_URL, OPENAI_EMBEDDING_MODEL, OPENAI_EMBEDDING_DIMENSIONS) to also delete OPENAI_EMBEDDING_BASE_URL and OPENAI_EMBEDDING_API_KEY so embedding tests run in isolation and the test "respects OPENAI_BASE_URL env var" cannot be influenced by developer environment variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/embedding-provider.test.ts`:
- Around line 58-63: The beforeEach in embedding-provider.test.ts resets
process.env but misses the new embedding-specific vars; update the beforeEach
setup (the same block that currently deletes OPENAI_BASE_URL,
OPENAI_EMBEDDING_MODEL, OPENAI_EMBEDDING_DIMENSIONS) to also delete
OPENAI_EMBEDDING_BASE_URL and OPENAI_EMBEDDING_API_KEY so embedding tests run in
isolation and the test "respects OPENAI_BASE_URL env var" cannot be influenced
by developer environment variables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4acd71b7-0b49-4fe5-bc1e-31d357717342
📒 Files selected for processing (2)
src/providers/embedding/openai.tstest/embedding-provider.test.ts
|
@rohitg00 thanks for sitting on this — the gate question is fair. Concrete deployment that this unblocks: The exact split I'm running locally — agentmemory daemon driving a serverless OpenAI-compat chat completions endpoint (novita.ai's hosted Qwen3.6) while embeddings stay on a separate vLLM at It surfaced when I wrote Pattern is recurring upstream too:
If "downstream wrapper" is the policy, I'm fine closing: The whole thing can be achieved with a 5-line shell wrapper that twiddles env vars before spawning agentmemory, which is what I'd do otherwise. Just wanted to flag that this exact split has come up in two open PRs from two contributors in a few weeks — feels like the kind of operator knob that wants to live in-tree. Either way, the PR's diff is independent of that policy call. Tell me which way you'd like it and I'll move accordingly. |
CodeRabbit nitpick: the embedding-specific env vars take precedence over OPENAI_BASE_URL/OPENAI_API_KEY in provider config, so a developer with these set in their shell could see test/host config interfere. Match what the other ENV cleanup lines already do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/embedding-provider.test.ts (1)
55-158: ⚡ Quick winConsider adding explicit test coverage for the new embedding-specific environment variables.
While the existing tests pass and the PR author has performed live verification, the new
OPENAI_EMBEDDING_BASE_URLandOPENAI_EMBEDDING_API_KEYfunctionality lacks explicit test coverage. Adding tests would improve regression protection and documentation of the fallback behavior.Suggested test cases:
OPENAI_EMBEDDING_BASE_URL takes precedence - Similar to the existing test at lines 83-98, but setting
OPENAI_EMBEDDING_BASE_URLto verify it's used instead ofOPENAI_BASE_URL.OPENAI_EMBEDDING_API_KEY is used when set - Verify that setting only
OPENAI_EMBEDDING_API_KEY(withoutOPENAI_API_KEY) successfully constructs the provider.Fallback from embedding-specific to general vars - Verify that when
OPENAI_EMBEDDING_BASE_URLis not set, it falls back toOPENAI_BASE_URL, and whenOPENAI_EMBEDDING_API_KEYis not set, it falls back toOPENAI_API_KEY.Example test structure
it("respects OPENAI_EMBEDDING_BASE_URL with precedence over OPENAI_BASE_URL", async () => { process.env["OPENAI_BASE_URL"] = "https://generic.example.com"; process.env["OPENAI_EMBEDDING_BASE_URL"] = "https://embedding.example.com"; const provider = new OpenAIEmbeddingProvider("test-key"); const fetchSpy = vi.spyOn(globalThis, "fetch").mockResolvedValue( new Response(JSON.stringify({ data: [{ embedding: [0.1, 0.2, 0.3] }] }), { status: 200 }), ); await provider.embed("hello"); expect(fetchSpy).toHaveBeenCalledWith( "https://embedding.example.com/v1/embeddings", expect.any(Object), ); fetchSpy.mockRestore(); }); it("uses OPENAI_EMBEDDING_API_KEY when set", () => { process.env["OPENAI_EMBEDDING_API_KEY"] = "embedding-key"; // OPENAI_API_KEY not set const provider = new OpenAIEmbeddingProvider(); expect(provider.name).toBe("openai"); }); it("falls back from OPENAI_EMBEDDING_API_KEY to OPENAI_API_KEY", () => { process.env["OPENAI_API_KEY"] = "fallback-key"; // OPENAI_EMBEDDING_API_KEY not set const provider = new OpenAIEmbeddingProvider(); expect(provider.name).toBe("openai"); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/embedding-provider.test.ts` around lines 55 - 158, Add explicit tests for the embedding-specific env vars to cover precedence and fallback: create tests in embedding-provider.test.ts that instantiate OpenAIEmbeddingProvider and spy on global fetch to assert that OPENAI_EMBEDDING_BASE_URL takes precedence over OPENAI_BASE_URL (call to OpenAIEmbeddingProvider.embed should use "https://embedding.../v1/embeddings"), that OPENAI_EMBEDDING_API_KEY alone allows successful construction of OpenAIEmbeddingProvider (when OPENAI_API_KEY is absent), and that when the embedding-specific vars are absent the provider falls back to OPENAI_BASE_URL and OPENAI_API_KEY; reference the OpenAIEmbeddingProvider constructor and the embed method and reuse the existing fetch spy pattern from other tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/embedding-provider.test.ts`:
- Around line 55-158: Add explicit tests for the embedding-specific env vars to
cover precedence and fallback: create tests in embedding-provider.test.ts that
instantiate OpenAIEmbeddingProvider and spy on global fetch to assert that
OPENAI_EMBEDDING_BASE_URL takes precedence over OPENAI_BASE_URL (call to
OpenAIEmbeddingProvider.embed should use "https://embedding.../v1/embeddings"),
that OPENAI_EMBEDDING_API_KEY alone allows successful construction of
OpenAIEmbeddingProvider (when OPENAI_API_KEY is absent), and that when the
embedding-specific vars are absent the provider falls back to OPENAI_BASE_URL
and OPENAI_API_KEY; reference the OpenAIEmbeddingProvider constructor and the
embed method and reuse the existing fetch spy pattern from other tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a2b129e4-0767-43f6-a8d0-0edaad8cbb17
📒 Files selected for processing (1)
test/embedding-provider.test.ts
Summary
The OpenAI-compat embedding provider read
OPENAI_BASE_URLandOPENAI_API_KEYonly, which the chat-LLM path also reads — coupling both calls to the same endpoint. Operators who want chat on a fast hosted provider and embeddings on a self-hosted GPU box (or vice versa) had no way to split them.Add two embedding-scoped overrides with fallback to the existing vars:
Existing setups continue working without changes (both fall back to the old vars).
Why
Personal motivating example:
Previously these had to live on one endpoint. Now they don't:
Other common patterns this enables:
OPENAI_EMBEDDING_BASE_URL=http://localhost:11434 OPENAI_EMBEDDING_API_KEY=ollamaOPENAI_EMBEDDING_BASE_URL=https://api.openai.comand put the proxy URL inOPENAI_BASE_URLWhy the separate API key
Most local endpoints (Ollama / LM Studio / llama.cpp / vLLM) ignore Authorization entirely, but Node fetch still requires a non-empty Bearer token. Setting
OPENAI_EMBEDDING_API_KEY=anything-truthyunblocks that case without revealing the realOPENAI_API_KEYto whatever's on localhost.Test plan
Reviewed against the 17 existing test cases in
test/embedding-provider.test.ts; no regression. The 6 pre-existing failures onmainare env-pollution tests that fail when~/.agentmemory/.envhas API keys set — verified unrelated to this PR by re-running onmain.Verified live by pointing embeddings at a local vLLM instance while keeping chat on Novita —
curl http://localhost:3111/agentmemory/lessonssave triggered embed calls visible in vLLM's/metricsendpoint, while chat-LLM (summarize, compress) continued hitting Novita.Files
src/providers/embedding/openai.ts— 36 insertions, 7 deletions (mostly docstring expansion)Summary by CodeRabbit