feat: opt-in fail-loud when AGENTMEMORY_URL is unreachable#273
feat: opt-in fail-loud when AGENTMEMORY_URL is unreachable#273TotalZack wants to merge 2 commits into
Conversation
|
@TotalZack is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds two env vars and a new error to control remote /agentmemory/livez probe behavior: ChangesRemote Backend Resilience Configuration
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
7ff8b5e to
16ab04e
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
test/mcp-standalone-proxy.test.ts (1)
200-212: ⚡ Quick winStrengthen the warning-contract assertion.
This test currently checks only
url. Since the warning contract now includesprobeTimeoutMsand env-var guidance, asserting those fields would better lock behavior.Suggested patch
const [msg, fields] = warn.mock.calls[0]; expect(msg).toMatch(/agentmemory backend unreachable/i); expect(fields).toMatchObject({ url: BASE }); + expect(fields).toHaveProperty("probeTimeoutMs"); + expect((fields as { hint?: string }).hint).toContain("AGENTMEMORY_REMOTE_REQUIRED"); + expect((fields as { hint?: string }).hint).toContain("AGENTMEMORY_LIVEZ_TIMEOUT_MS");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mcp-standalone-proxy.test.ts` around lines 200 - 212, The test "emits a warn on silent local fallback when REMOTE_REQUIRED is unset" should assert the full warning contract: after spying on logger.warn and resolving handle (resolveHandle) verify warn was called once and expand the fields assertion (currently checking { url: BASE }) to also include probeTimeoutMs with the expected timeout value and the env-var guidance string (e.g. mention of REMOTE_REQUIRED or related env var); update the test's destructured fields assertion to expect these keys so the spec locks the warning payload shape returned by the code that emits the "agentmemory backend unreachable" message.src/mcp/rest-proxy.ts (1)
60-66: ⚡ Quick winSnapshot probe settings once per resolve attempt.
probeTimeoutMs()/remoteRequired()are read multiple times in one path. Capturing once keeps probe behavior, thrown error, and warning metadata consistent for that attempt.Suggested patch
-async function probe(url: string): Promise<boolean> { +async function probe(url: string, timeoutMs: number): Promise<boolean> { try { const res = await fetch(`${url}/agentmemory/livez`, { method: "GET", headers: authHeader(), - signal: AbortSignal.timeout(probeTimeoutMs()), + signal: AbortSignal.timeout(timeoutMs), }); return res.ok; } catch { return false; } } @@ if (probeInFlight) return probeInFlight; const url = baseUrl(); + const timeoutMs = probeTimeoutMs(); + const requireRemote = remoteRequired(); probeInFlight = (async () => { - const up = await probe(url); + const up = await probe(url, timeoutMs); @@ - if (remoteRequired()) { - throw new RemoteUnreachableError(url, probeTimeoutMs()); + if (requireRemote) { + throw new RemoteUnreachableError(url, timeoutMs); } logger.warn( "agentmemory backend unreachable; falling back to in-memory local mode", { url, - probeTimeoutMs: probeTimeoutMs(), + probeTimeoutMs: timeoutMs, hint: "set AGENTMEMORY_REMOTE_REQUIRED=1 to fail loud, or AGENTMEMORY_LIVEZ_TIMEOUT_MS=N to allow more probe time on high-latency networks", }, );Also applies to: 89-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mcp/rest-proxy.ts` around lines 60 - 66, Probe reads probeTimeoutMs() and remoteRequired() multiple times causing inconsistent behavior; snapshot these values once at the start of the async function probe (e.g., const timeout = probeTimeoutMs(); const required = remoteRequired()) and use timeout and required everywhere in probe (for AbortSignal.timeout, error messages, and warning metadata) so the timeout and requirement decision are consistent for that resolve attempt; apply the same snapshotting pattern to the related logic in the other block referenced (lines 89-127) where probeTimeoutMs()/remoteRequired() are currently read multiple times.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 13: The changelog text uses the ambiguous env var name REMOTE_REQUIRED;
update that occurrence to the full, explicit name AGENTMEMORY_REMOTE_REQUIRED in
the sentence describing the stderr warn line so the entry reads "...when
`AGENTMEMORY_URL` points at a remote backend but the `/livez` probe fails and
`AGENTMEMORY_REMOTE_REQUIRED` is unset..." to avoid operator confusion (edit the
string in CHANGELOG.md where the env var is referenced).
In `@README.md`:
- Around line 908-911: Update the README wording to reflect that fallback now
emits a warning: in the paragraph containing "fallback can land memories in the
wrong store without warning" replace that phrase with wording such as "fallback
may land memories in the wrong store after emitting a warning" (or similar) so
the docs match runtime behavior; locate this text block in the README and change
only the phrase to indicate that a warning is emitted rather than "without
warning."
In `@src/mcp/rest-proxy.ts`:
- Around line 26-27: The RemoteUnreachableError message in rest-proxy.ts is
misleading because probe() treats both timeouts/network errors and non-2xx
responses as failures; update the message constructed for RemoteUnreachableError
to be failure-mode neutral (e.g., "did not return a successful response within
{probeTimeoutMs}ms" or "did not return a 2xx response") and include the url and
probeTimeoutMs so it covers both slow/non-responsive and immediate 4xx/5xx
cases; change the string used where RemoteUnreachableError is thrown/created
(refer to RemoteUnreachableError and probe() in this file) and keep the guidance
about AGENTMEMORY_LIVEZ_TIMEOUT_MS, backend connectivity, and
AGENTMEMORY_REMOTE_REQUIRED unchanged.
---
Nitpick comments:
In `@src/mcp/rest-proxy.ts`:
- Around line 60-66: Probe reads probeTimeoutMs() and remoteRequired() multiple
times causing inconsistent behavior; snapshot these values once at the start of
the async function probe (e.g., const timeout = probeTimeoutMs(); const required
= remoteRequired()) and use timeout and required everywhere in probe (for
AbortSignal.timeout, error messages, and warning metadata) so the timeout and
requirement decision are consistent for that resolve attempt; apply the same
snapshotting pattern to the related logic in the other block referenced (lines
89-127) where probeTimeoutMs()/remoteRequired() are currently read multiple
times.
In `@test/mcp-standalone-proxy.test.ts`:
- Around line 200-212: The test "emits a warn on silent local fallback when
REMOTE_REQUIRED is unset" should assert the full warning contract: after spying
on logger.warn and resolving handle (resolveHandle) verify warn was called once
and expand the fields assertion (currently checking { url: BASE }) to also
include probeTimeoutMs with the expected timeout value and the env-var guidance
string (e.g. mention of REMOTE_REQUIRED or related env var); update the test's
destructured fields assertion to expect these keys so the spec locks the warning
payload shape returned by the code that emits the "agentmemory backend
unreachable" message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc091e64-c9fd-4f04-b0b7-8b2e6d060db2
📒 Files selected for processing (4)
CHANGELOG.mdREADME.mdsrc/mcp/rest-proxy.tstest/mcp-standalone-proxy.test.ts
Three small follow-ups from CodeRabbit's pre-merge review — none blocking,
all minor wording or accuracy fixes:
- CHANGELOG.md: use full env var name `AGENTMEMORY_REMOTE_REQUIRED` instead
of the ambiguous bare `REMOTE_REQUIRED` in the warn-line bullet.
- README.md: align the inline comment about silent fallback with the
warning-line behavior introduced in this PR ("if the warning is missed"
rather than "without warning", since the warn line now exists).
- src/mcp/rest-proxy.ts: make `RemoteUnreachableError`'s message
failure-mode neutral. The previous "did not respond within Xms" wording
was inaccurate for fast non-2xx responses; the new wording covers
timeout, network error, and non-2xx all together.
No semantic change. 12/12 vitest cases still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: TotalZack <199677237+TotalZack@users.noreply.github.com>
|
Just to clarify the Real-world validation section in the PR body — the Hetzner / Coolify / Tailscale specifics there describe one concrete dev environment that was used to reproduce both the silent-fallback bug and verify the fix end-to-end. The bug + fix apply to any self-hosted setup where the agentmemory backend sits behind a private overlay network (Tailscale, WireGuard, ZeroTier, in-cluster service mesh, etc.), where 500 ms is sometimes not enough probe headroom. The fix is platform-neutral. Folgecommit |
The standalone MCP shim probes /agentmemory/livez with a hardcoded 500 ms
timeout and silently falls back to local in-memory mode if the probe
fails. On high-latency networks (Tailscale across regions, slow Coolify
container boots, packet loss), users who set AGENTMEMORY_URL=
http://your-server:3111/agentmemory have no signal that their memories
are landing in a local store instead of the configured backend.
Two new env vars, both default off (no behavior change for existing
users), plus a visible warn line on the previously-silent fallback path:
AGENTMEMORY_REMOTE_REQUIRED=1 fail loud on probe failure (throws
RemoteUnreachableError) instead of
falling back. Recommended for prod
multi-machine deployments.
AGENTMEMORY_LIVEZ_TIMEOUT_MS=N raise the probe timeout (ms) when
the 500 ms default trips on a
healthy but slow backend.
(warn on fallback) emit "[agentmemory] warn agentmemory
backend unreachable; falling back to
in-memory local mode" on stderr
when the probe fails and
REMOTE_REQUIRED is unset, so the
failure mode stops being invisible.
Tests: three new vitest cases in test/mcp-standalone-proxy.test.ts
covering each path. Existing 7 cases untouched.
- respects AGENTMEMORY_LIVEZ_TIMEOUT_MS — slow probe aborts at the
configured timeout, falls back to local
- emits a warn on silent local fallback when REMOTE_REQUIRED is unset
- throws RemoteUnreachableError when REMOTE_REQUIRED=1 and probe fails
Docs: env-var section in root README updated with both new vars in the
existing comment-block style. CHANGELOG entry under [Unreleased] /
Added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: TotalZack <199677237+TotalZack@users.noreply.github.com>
Three small follow-ups from CodeRabbit's pre-merge review — none blocking,
all minor wording or accuracy fixes:
- CHANGELOG.md: use full env var name `AGENTMEMORY_REMOTE_REQUIRED` instead
of the ambiguous bare `REMOTE_REQUIRED` in the warn-line bullet.
- README.md: align the inline comment about silent fallback with the
warning-line behavior introduced in this PR ("if the warning is missed"
rather than "without warning", since the warn line now exists).
- src/mcp/rest-proxy.ts: make `RemoteUnreachableError`'s message
failure-mode neutral. The previous "did not respond within Xms" wording
was inaccurate for fast non-2xx responses; the new wording covers
timeout, network error, and non-2xx all together.
No semantic change. 12/12 vitest cases still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: TotalZack <199677237+TotalZack@users.noreply.github.com>
882037e to
5dd1cd4
Compare
Problem
@agentmemory/mcp's standalone shim probes/agentmemory/livezand silently falls back to local in-memory mode on probe failure. Users on multi-machine self-hosted setups (Tailscale, Coolify) have no signal that their memories are landing in a local store rather than the configured shared backend.Fix
Two new env vars, both default-off (no behavior change for existing users), plus a visible stderr line on the previously-silent fallback path:
AGENTMEMORY_REMOTE_REQUIRED1/true/yes/on: probe failure throws a structuredRemoteUnreachableErrorinstead of falling back to local. Recommended for production multi-machine deployments.AGENTMEMORY_LIVEZ_TIMEOUT_MSAGENTMEMORY_PROBE_TIMEOUT_MSsince v0.9.7)AGENTMEMORY_PROBE_TIMEOUT_MSso existing deployments that already set this env var continue to work unchanged.Plus: when
REMOTE_REQUIREDis unset and probe fails, the shim now writes a single line tostderr— withurl,probeTimeoutMs, and a hint at both env vars.Three operator-friendly modes
AGENTMEMORY_REMOTE_REQUIRED=1→ throwRemoteUnreachableErrorif probe failsAGENTMEMORY_URLunset or unreachable, noREMOTE_REQUIRED→ local-only (existing behavior)Changes vs v0.9.9
src/mcp/rest-proxy.ts:remoteRequired()helper +RemoteUnreachableErrorclass + mergedprobeTimeoutMs()that checksAGENTMEMORY_LIVEZ_TIMEOUT_MSfirst, thenAGENTMEMORY_PROBE_TIMEOUT_MS;resolveHandle()throws on probe failure whenREMOTE_REQUIREDis set; fallback path emitsprocess.stderr.write(matching upstream's v0.9.7 style) instead oflogger.warntest/mcp-standalone-proxy.test.ts: 3 vitest cases covering each operator modeCompatibility
AGENTMEMORY_LIVEZ_TIMEOUT_MSis a transparent alias — deployments using the old name keep workingSummary by CodeRabbit
Documentation
New Features
Tests