
fix(health+cli): heap_tight goes to notes, demo probe checks agentmemory ready#176

Merged
rohitg00 merged 1 commit into main from fix/health-notes-demo-probe
Apr 20, 2026

@rohitg00 rohitg00 commented Apr 20, 2026

Summary

Two bugs hit in a single v0.9.1 testing session on a fresh install:

  1. The dashboard showed a red Alerts (1) card with memory_heap_tight_90%_rss108mb on a just-started process. v0.9.0's #160 (fix(health): gate memory severity on RSS floor) intended the sub-floor heap-tight marker to be a quiet note, but the code still pushed it into the same alerts[] array as real critical conditions. As a result the viewer rendered it under the red Alerts card, and the heap gauge was red at 23 MB total.

  2. npx @agentmemory/agentmemory demo failed with POST /agentmemory/session/start → 404 when iii-engine was up but the agentmemory worker hadn't registered. isEngineRunning probed / and treated any response as up, including iii's own 404s for unregistered paths.

Changes

Health: heap_tight is a note, not an alert

  • src/health/thresholds.ts → evaluateHealth now returns { status, alerts, notes }. heap_tight goes to notes; alerts is reserved for critical/degraded conditions.
  • src/health/monitor.ts → persists notes on the snapshot.
  • src/types.ts → HealthSnapshot.notes?: string[].
  • src/viewer/index.html → new "Notes (N)" card in neutral color below Alerts. Heap gauge only reddens when RSS is above the 512 MB floor — below the floor, heap ratio is informational.
  • test/health-thresholds.test.ts → updated to assert heap_tight lives in notes.
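
The routing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/health/thresholds.ts: the function name `evaluateHealthSketch`, the 0.8 heap ratio, the status values, and the `memory_heap_high_*` alert name are hypothetical. Only the { status, alerts, notes } shape, the 512 MB RSS floor, and the `memory_heap_tight_*` note format come from the description.

```typescript
// Hypothetical sketch of routing heap pressure to notes below the RSS floor.
type HealthStatus = "healthy" | "degraded" | "critical";

interface MemorySample {
  heapUsed: number; // bytes
  heapTotal: number; // bytes
  rss: number; // bytes
}

interface HealthEvaluation {
  status: HealthStatus;
  alerts: string[]; // critical/degraded conditions only
  notes: string[]; // informational markers, never rendered as alerts
}

const RSS_FLOOR_BYTES = 512 * 1024 * 1024; // floor named in the PR
const HEAP_TIGHT_RATIO = 0.8; // assumed threshold, not from the PR

function evaluateHealthSketch(mem: MemorySample): HealthEvaluation {
  const alerts: string[] = [];
  const notes: string[] = [];
  let status: HealthStatus = "healthy";

  const ratio = mem.heapTotal > 0 ? mem.heapUsed / mem.heapTotal : 0;
  const pct = Math.round(ratio * 100);
  const rssMb = Math.round(mem.rss / 1024 / 1024);

  if (ratio >= HEAP_TIGHT_RATIO) {
    if (mem.rss >= RSS_FLOOR_BYTES) {
      // Above the floor the pressure is treated as real (alert name assumed).
      alerts.push(`memory_heap_high_${pct}%_rss${rssMb}mb`);
      status = "degraded";
    } else {
      // Below the floor a tight heap is only worth a quiet note.
      notes.push(`memory_heap_tight_${pct}%_rss${rssMb}mb`);
    }
  }
  return { status, alerts, notes };
}
```

With a 23 MB heap at 90% and 108 MB RSS, this sketch leaves alerts empty and status healthy, and records the marker only in notes.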

CLI: demo probe checks agentmemory, not just any listener

  • Added isAgentmemoryReady() that probes /agentmemory/livez with res.ok. Keeps the original isEngineRunning() (root probe) for the main start flow, which needs to detect iii-without-agentmemory to decide whether to spawn iii.
  • runDemo uses the stricter probe with a specific error: "agentmemory worker not reachable on port N (livez probe failed). Something may be on the port but it isn't serving /agentmemory/*."
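
A minimal sketch of the difference between the two probes, under stated assumptions. The host, the 2-second timeout, the helper `probe`, and the injectable `fetchImpl` parameter are hypothetical and exist only to keep the example self-contained; the /agentmemory/livez path, the root probe, and the res.ok check are taken from the description.

```typescript
// Hypothetical sketch contrasting the loose root probe with the strict
// livez probe. Not the code in src/cli.ts.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean }>;

async function probe(
  url: string,
  fetchImpl: FetchLike,
  requireOk: boolean,
  timeoutMs = 2000, // assumed timeout
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    // Loose mode accepts any HTTP response; strict mode requires 2xx.
    return requireOk ? res.ok : true;
  } catch {
    return false; // refused connection, timeout, or other network failure
  } finally {
    clearTimeout(timer);
  }
}

// Loose: "is anything answering on this port?" (a 404 from iii counts).
function isEngineRunningSketch(port: number, fetchImpl: FetchLike): Promise<boolean> {
  return probe(`http://localhost:${port}/`, fetchImpl, false);
}

// Strict: "is the agentmemory worker registered and serving livez?"
function isAgentmemoryReadySketch(port: number, fetchImpl: FetchLike): Promise<boolean> {
  return probe(`http://localhost:${port}/agentmemory/livez`, fetchImpl, true);
}
```

In a real CLI the global `fetch` could be passed as `fetchImpl`. When iii answers 404 for an unregistered path, the loose probe reports true while the strict probe reports false, which is the case the demo previously mishandled.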

Follow-up

Viewer WS "RECONNECTING..." badge — split into #177.

## Health: heap_tight note vs alert

CHANGELOG v0.9.0 (#160) said the sub-floor heap_tight marker is a
"non-alerting note attached to the snapshot — visibility without the
false positive", but the implementation still pushed it into the same
alerts[] array as real critical/degraded conditions. Viewer rendered
it under a red "Alerts (1)" card on fresh installs (~108 MB RSS is
below the 512 MB floor).

- thresholds.ts now returns { status, alerts, notes }, with heap_tight
  routed to notes[].
- HealthSnapshot gains optional notes: string[].
- Viewer: new "Notes (N)" card in neutral color next to Alerts. Heap
  gauge only reddens when RSS is above the 512 MB floor — below the
  floor, heap percent is informational.
- Test suite updated to assert heap_tight lives in notes, not alerts.

## Demo readiness probe

`npx @agentmemory/agentmemory demo` 404'd on /agentmemory/session/start
when iii-engine was up but the agentmemory worker hadn't registered
(e.g., right after a shutdown that left iii lingering). isEngineRunning
probed `/` and accepted any response, including iii's own 404.

Split the probe: isEngineRunning still probes root (used by main/start
flow to detect whether to spawn iii), and a new isAgentmemoryReady
probes /agentmemory/livez with res.ok. runDemo now uses the stricter
probe with a specific error explaining the distinction.
vercel Bot commented Apr 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentmemory | Ready | Preview, Comment | Apr 20, 2026 11:40pm |

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough


The changes introduce a "notes" field throughout the health monitoring pipeline, classifying non-critical memory pressure conditions as informational notes rather than alerts, gated by RSS thresholds. Additionally, the CLI now probes /agentmemory/livez for agent readiness checks, and the viewer displays notes alongside alerts in the dashboard.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Health Type Model: src/types.ts, src/health/thresholds.ts | Added notes?: string[] field to HealthSnapshot interface and updated evaluateHealth() return signature to include notes array. Memory heap pressure conditions reclassified from alerts to notes. |
| Health Monitoring & Evaluation: src/health/monitor.ts, test/health-thresholds.test.ts | Monitor now persists notes from evaluated health to snapshot. Test assertions updated to expect memory_heap_tight_* entries in notes instead of alerts. |
| CLI & Viewer: src/cli.ts, src/viewer/index.html | Added isAgentmemoryReady() function probing /agentmemory/livez endpoint; demo startup now uses this probe instead of isEngineRunning(). Viewer heap gauge color logic now requires both high heap usage and sufficient RSS (≥512 MB) to display warning/critical colors; added UI card to render notes when present. |

Sequence Diagram

sequenceDiagram
    participant Evaluator as Health Evaluator
    participant Monitor as Health Monitor
    participant KV as KV Store
    participant Viewer as Dashboard Viewer
    
    Note over Evaluator: evaluateHealth()
    Evaluator->>Evaluator: Check memory conditions<br/>Classify heap pressure<br/>as notes (not alerts)
    Evaluator-->>Monitor: return { status, alerts, notes }
    
    Monitor->>Monitor: snapshot.notes =<br/>evaluated.notes
    Monitor->>KV: Write snapshot with notes<br/>to KV.health/latest
    KV-->>Monitor: Persisted
    
    Viewer->>KV: Read latest health snapshot
    KV-->>Viewer: Return snapshot with notes
    Viewer->>Viewer: Render alerts card<br/>Render notes card<br/>Apply RSS-gated<br/>heap color logic
    Viewer-->>Viewer: Display dashboard

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #160: Introduces identical changes to evaluateHealth() return type, memory condition reclassification to notes, and notes field threading through monitor and types.
  • #165: Related health severity gating logic and /agentmemory/livez readiness probe changes for MCP/server initialization.
  • #102: Overlapping CLI startup health probing behavior and viewer UI modifications for readiness checks.

Poem

🐰 A note, a note, through health we pass,
Memory whispers soft and glass!
The RSS gates heap's amber glow,
While our readiness probe hops to and fro. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: rerouting heap_tight from alerts to notes, and updating the demo probe to check agentmemory readiness instead of just root endpoint. |

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/viewer/index.html (1)

1077-1082: ⚠️ Potential issue | 🟡 Minor

Compare the raw RSS bytes for the floor check.

Line 1081 uses the rounded display value, so an actual RSS like 511.6 MiB rounds to 512 and can still turn the heap gauge red/yellow despite being below the floor. Compare the raw byte value to match evaluateHealth.

🐛 Proposed fix
-          var rss = Math.round((snap.memory.rss || 0) / 1024 / 1024);
+          var rssBytes = snap.memory.rss || 0;
+          var rss = Math.round(rssBytes / 1024 / 1024);
           var heapPct = heapTotal > 0 ? Math.round((heapUsed / heapTotal) * 100) : 0;
-          var rssAboveFloor = rss >= 512;
+          var rssAboveFloor = rssBytes >= 512 * 1024 * 1024;
           var heapColor = (heapPct > 80 && rssAboveFloor) ? 'var(--red)' : (heapPct > 60 && rssAboveFloor) ? 'var(--yellow)' : 'var(--green)';
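
The rounding issue can be reproduced in isolation. The sketch below is illustrative only: the function names are hypothetical, while the 512 MiB floor, the 80/60 percent cutoffs, and the color values follow the diff above.

```typescript
// Hypothetical standalone reproduction of the rounded vs raw floor check.
const RSS_FLOOR_BYTES = 512 * 1024 * 1024;

// Original behavior: compare the rounded display value.
function rssAboveFloorRounded(rssBytes: number): boolean {
  const rssMiB = Math.round(rssBytes / 1024 / 1024);
  return rssMiB >= 512;
}

// Proposed behavior: compare raw bytes against the floor.
function rssAboveFloorRaw(rssBytes: number): boolean {
  return rssBytes >= RSS_FLOOR_BYTES;
}

// Gauge color gated on the raw-byte check, mirroring the proposed fix.
function heapColor(heapPct: number, rssBytes: number): string {
  const above = rssAboveFloorRaw(rssBytes);
  if (heapPct > 80 && above) return "var(--red)";
  if (heapPct > 60 && above) return "var(--yellow)";
  return "var(--green)";
}

// About 511.6 MiB: just under the floor, but rounds up to 512 for display.
const justBelowFloor = Math.floor(511.6 * 1024 * 1024);
console.log(rssAboveFloorRounded(justBelowFloor)); // true (misclassified)
console.log(rssAboveFloorRaw(justBelowFloor)); // false
```

With the raw-byte comparison, a 90% heap at roughly 511.6 MiB RSS stays green, consistent with the floor being applied to the unrounded value.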
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/viewer/index.html` around lines 1077 - 1082, The health-color logic is
using the rounded MiB value for rssAboveFloor which can misclassify values
(e.g., 511.6 MiB rounds to 512); update the check to compare the raw byte value
from snap.memory.rss (or a dedicated rssBytes variable) against 512 * 1024 *
1024 so rssAboveFloor is computed from raw bytes (keep heapPct calculation based
on rounded MiB as-is and ensure you still reference snap.memory.rss and heapPct
in the condition that computes heapColor to match evaluateHealth).
🧹 Nitpick comments (1)
src/cli.ts (1)

84-94: LGTM — readiness probe matches existing pattern.

isAgentmemoryReady() mirrors the probe in src/mcp/rest-proxy.ts and correctly gates the demo on the /agentmemory/livez endpoint registered by api::liveness.

One optional cleanup: runImportJsonl at Line 831-843 inlines essentially the same probe (plus a detail string for the error message). Consider having isAgentmemoryReady() optionally return a reason, or extract a small shared helper, so both callers stay in sync if the probe path/timeout ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli.ts` around lines 84 - 94, The readiness probe in isAgentmemoryReady()
is duplicated in runImportJsonl; refactor so both callers share the same logic
by either (A) changing isAgentmemoryReady() to return a tuple or object with
{ready: boolean, reason?: string} (so runImportJsonl can include the detail
string) or (B) extract a small helper like probeAgentMemoryLive(url, timeout)
that returns {ok, reason} and have both isAgentmemoryReady() and runImportJsonl
call that helper; update references to isAgentmemoryReady and runImportJsonl
accordingly so the probe path and timeout are defined in one place.
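
Option (B) might look roughly like the sketch below. It is a guess at the shape, not the project's code: the review only names `probeAgentMemoryLive(url, timeout)` and an {ok, reason} result, so the injectable `fetchImpl` parameter, the default timeout, and the reason strings are assumptions added for illustration.

```typescript
// Hypothetical shared helper so the probe path and timeout live in one place.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

interface ProbeResult {
  ok: boolean;
  reason?: string; // populated only on failure
}

const LIVEZ_PATH = "/agentmemory/livez";

async function probeAgentMemoryLive(
  baseUrl: string,
  fetchImpl: FetchLike,
  timeoutMs = 2000, // assumed default
): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}${LIVEZ_PATH}`, { signal: controller.signal });
    return res.ok
      ? { ok: true }
      : { ok: false, reason: `livez returned HTTP ${res.status}` };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { ok: false, reason: `livez request failed: ${detail}` };
  } finally {
    clearTimeout(timer);
  }
}

// Boolean wrapper for callers that only need a yes/no answer.
async function isAgentmemoryReadySketch(baseUrl: string, fetchImpl: FetchLike): Promise<boolean> {
  return (await probeAgentMemoryLive(baseUrl, fetchImpl)).ok;
}
```

A caller such as runImportJsonl could then use `reason` in its error message while the demo path keeps using the boolean wrapper.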

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e51bc653-fce6-4115-a6b7-36b9cbc4281c

📥 Commits

Reviewing files that changed from the base of the PR and between 66b153a and 7013722.

📒 Files selected for processing (6)
  • src/cli.ts
  • src/health/monitor.ts
  • src/health/thresholds.ts
  • src/types.ts
  • src/viewer/index.html
  • test/health-thresholds.test.ts

@rohitg00 rohitg00 merged commit e57a108 into main Apr 20, 2026
5 checks passed
@rohitg00 rohitg00 deleted the fix/health-notes-demo-probe branch April 20, 2026 23:44