Skip to content

fix: system audit -- 10 bugs fixed across hooks, triggers, and core#1

Merged
rohitg00 merged 2 commits into
mainfrom
audit/system-review
Feb 25, 2026
Merged

fix: system audit -- 10 bugs fixed across hooks, triggers, and core#1
rohitg00 merged 2 commits into
mainfrom
audit/system-review

Conversation

@rohitg00
Copy link
Copy Markdown
Owner

Summary

Full system audit of agentmemory. Read every source file, identified 10 bugs across 17 files, fixed all of them.

Critical

  • Event triggers bypassed auth -- events.ts was calling api:: functions that require ApiRequest shape + auth headers. When AGENTMEMORY_SECRET is set, every event trigger got 401. Rewrote to call core KV/functions directly.
  • All 5 hooks missing auth -- None of the hook scripts sent Authorization: Bearer header. Every hook POST would fail silently when secret is configured.

High

  • observe.ts JSON.parse crash -- stripPrivateData() on a JSON string could break JSON structure (e.g., removing <private> inside a string value). Added try/catch with fallback.
  • post-tool-use.ts invalid truncation -- JSON.parse(str.slice(0, max-1) + '}') produces invalid JSON for nested objects. Returns truncated string instead.

Medium

  • compress.ts unbounded importance -- LLM could return importance outside 1-10 range or NaN. Added Math.max(1, Math.min(10, ...)) with NaN fallback.
  • compress.ts invalid observation types -- LLM could return types not in ObservationType. Added VALID_TYPES set validation with "other" fallback.

Low

  • context.ts token undercount -- Token estimate missed the header text of observation blocks, causing budget overshoot.
  • viewer/index.html hardcoded WS port -- WebSocket port now configurable via ?wsPort= query param.

Code simplification (second commit)

  • Removed redundant getContext()/logger from event triggers (targets already log)
  • Collapsed trivial delegate functions to one-liners
  • Removed unnecessary as const casts
  • Extracted shared viewer response headers

Test plan

  • npx tsc --noEmit -- zero type errors
  • npm test -- 45/45 tests pass (242ms)
  • Integration test with AGENTMEMORY_SECRET set to verify hooks authenticate
  • Verify viewer loads with custom ?wsPort= param

1. events.ts: Event triggers were calling api:: functions which
   require ApiRequest shape and auth headers. Rewrote to call
   core functions (kv.set, sdk.trigger) directly, bypassing auth.

2. All 5 hooks: Missing AGENTMEMORY_SECRET auth header. If secret
   was set, every hook would get 401 from the API. Now all hooks
   read AGENTMEMORY_SECRET and send Bearer token.

3. observe.ts: stripPrivateData on JSON string could break JSON
   structure when replacement text differs in length. Added
   try/catch fallback to string coercion.

4. post-tool-use.ts: truncate() for objects did
   JSON.parse(str.slice(0, max-1) + '}') which produces invalid
   JSON in nearly all cases. Changed to return truncated string.

5. compress.ts: LLM-returned importance was not clamped to 1-10
   range. Added Math.max(1, Math.min(10, ...)) with NaN fallback.

6. compress.ts: LLM-returned observation type was not validated
   against ObservationType union. Invalid types now fall back to
   "other".

7. context.ts: Token estimate for observation blocks only counted
   inner content, not the "## Session..." header. Fixed to estimate
   the full block text.

8. viewer: WebSocket port now configurable via ?wsPort= query param
   for non-default III_STREAMS_PORT configurations.

9. plugin/scripts: Rebuilt with auth header support matching the
   updated hook source files.
- Remove redundant getContext/logger calls from event triggers
  (target functions already log)
- Collapse event::observation and event::session::stopped to
  one-line delegates
- Remove unnecessary `as const` casts on string literals
- Nest hook-type checks under single object guard in observe.ts
- Extract shared viewer headers to avoid duplication in api.ts
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 25, 2026

Warning

Rate limit exceeded

@rohitg00 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 29 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b5ea36e and 77b161f.

📒 Files selected for processing (17)
  • plugin/scripts/post-tool-use.mjs
  • plugin/scripts/prompt-submit.mjs
  • plugin/scripts/session-end.mjs
  • plugin/scripts/session-start.mjs
  • plugin/scripts/stop.mjs
  • src/functions/compress.ts
  • src/functions/context.ts
  • src/functions/observe.ts
  • src/hooks/post-tool-use.ts
  • src/hooks/prompt-submit.ts
  • src/hooks/session-end.ts
  • src/hooks/session-start.ts
  • src/hooks/stop.ts
  • src/index.ts
  • src/triggers/api.ts
  • src/triggers/events.ts
  • src/viewer/index.html
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch audit/system-review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rohitg00 rohitg00 merged commit cde447b into main Feb 25, 2026
1 check passed
@rohitg00 rohitg00 deleted the audit/system-review branch February 25, 2026 08:22
rohitg00 added a commit that referenced this pull request May 13, 2026
…#295, supersedes #296) (#327)

* fix #2: BM25 tokenizer now preserves Cyrillic/Unicode via \p{L}\p{N} regex

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

* fix #1: add getVectorIndex/setVectorIndex + vector embedding in rebuildIndex

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

* fix #1+#3: vectorIndex.add in remember/observe/compress + setVectorIndex in index.ts

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

* fix: add migration script, Cyrillic BM25 tests, vector-populate tests, rebuildIndex vector test

Adds migrateVectorIndex utility for re-embedding when embedding provider dimensions change. Adds comprehensive test coverage for Cyrillic tokenization, vector index population on remember/rebuildIndex, and migration edge cases.

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

* chore: remove internal research spec from PR branch

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

* refactor(search): centralize vector-index writes via vectorIndexAddGuarded

Pre-merge hardening on top of @nik1t7n's PR #296.

Original PR had vectorIndex.add() inlined at four sites (remember,
observe, compress, rebuildIndex) with the same try/catch shape repeated
each time. That open-codes three sharp edges per site:

1. Missing dimension guard at the write boundary. Per #248, persistence
   load already refuses to start when persisted vectors mismatch the
   active provider's dimensions — but the live-write path had no
   symmetric check. A mis-configured embedding provider returning a
   different-length Float32Array would silently corrupt the in-memory
   index until next restart cleared it.

2. No input clipping. Memory.content can be arbitrarily large (the
   MemorySlot path even allows 20KB content per slot). Most embedding
   providers cap input around 8k tokens and 400 on overflow. A single
   oversize memory could throw at the provider and lose the vector
   entry; worse, on local-Ollama-style providers it can cost real
   inference time.

3. Inconsistent error logging (rebuildIndex had `catch {}` swallowing
   the error entirely, the other three sites logged via logger.warn
   with slightly different shape).

Consolidates into a single exported `vectorIndexAddGuarded()` in
src/functions/search.ts:

  - Reads vectorIndex + currentEmbeddingProvider from module-level
    state (same singletons the PR added).
  - Clips embed input to 16k chars (EMBED_MAX_CHARS) — well under
    every provider's documented limit, leaves headroom for
    chars-per-token variance across languages.
  - Validates embedding.length === provider.dimensions before
    calling vi.add(); logs structured warning and skips on mismatch.
  - Single logger.warn on embed failure with provider name + error
    message + indexed-item kind ("memory" | "observation" |
    "synthetic") + id.
  - Returns boolean (true on success) so callers can branch if they
    want, but every existing call site fire-and-forgets — the contract
    is "soft-fail: index entry skipped, upstream save still succeeds".

Rewrites the four call sites (remember.ts, observe.ts synthetic
branch, compress.ts post-LLM, search.ts rebuildIndex memories + obs
loops) to call vectorIndexAddGuarded() instead of the inline try/catch.

Side effect: cleans up the lingering `catch {}` in rebuildIndex that
was swallowing rebuild-time embed failures silently — every failure
now surfaces in logs at a consistent shape.

Tests:
- test/multimodal.test.ts mocked `../src/functions/search.js` and
  exposed only getSearchIndex; needed vectorIndexAddGuarded added to
  the mock surface or test fails with "No vectorIndexAddGuarded export
  is defined on the search.js mock". Added the export to the mock.
- All other tests (search-index Cyrillic, search rebuildIndex vector,
  vector-index-populate, migrate-vector-index) green on the helper
  refactor — no behavioural change beyond the three guards above.

886 / 886 tests pass. Build clean.

Out-of-scope (deferred to follow-ups):
- Blocking await on every save. The embed call is on the request hot
  path. For local Ollama / on-device providers this is fine; for
  network providers it adds 200ms-2s latency per remember/observe.
  Should be async / fire-and-forget with a write-behind queue —
  separate perf PR after this lands.
- Periodic persistence flush of the vector index. Today the index is
  serialized at process exit (per IndexPersistence). Runtime adds
  between flushes survive in memory but die on crash. Acceptable for
  this PR; future PR can add a debounced flush.
- Batching in rebuildIndex. Today the rebuild path embeds one doc at
  a time. embedBatch() exists on every provider and would be ~5-10x
  faster for cold restarts on large corpora. Defer.

* fix(search): tighter migrate guard + rebuild vector clear + test hygiene

Five reviewer findings applied. Five skipped with reason:

APPLIED:

1. migrate-vector-index.ts filter now requires m.content (and rejects
   empty/whitespace). Without this, undefined m.content would concat
   to "title undefined" and poison the embed input. Memory.content is
   declared string but #277-era data shapes had cases where it could
   be missing — defensive filter.

2. migrate-vector-index.ts success expression now derives from the
   failed counter (success: failed === 0). Previously hardcoded true
   so callers couldn't rely on the boolean alone.

3. search.ts rebuildIndex now calls vectorIndex?.clear() before the
   repopulation loops. Symmetric to the BM25 idx.clear() that was
   already there. Without this, memories deleted between runs leave
   orphan embeddings in the vector store after every rebuild.

4. test/vector-index-dimensions.test.ts assertion tightened from
   toBeGreaterThan(0) to toBe(1) for both totalProcessed and
   vectorSize. The seeded fixture has exactly one observation; loose
   assertion would hide regressions where the migration produced
   extra/duplicate vectors.

5. test/vector-index-populate.test.ts switched to beforeEach/afterEach
   for the setVectorIndex / setEmbeddingProvider lifecycle. Manual
   cleanup in each test body was easy to drift on additions.

PLUS:

6. compress.ts BM25 add now wrapped in try/catch matching the
   remember.ts pattern for parity. Helps cross-file symmetry even
   though SearchIndex.add doesn't throw on valid input.

SKIPPED (with reason):

- Wrapping vectorIndexAddGuarded calls in additional try/catch at
  remember.ts / observe.ts / compress.ts: the helper has internal
  try/catch (src/functions/search.ts:67-89). It NEVER throws. Adding
  outer try/catch is dead code. The reviewer missed the helper's
  contract.
- vector-index-dimensions.test.ts mockKV.delete -> real deletion: the
  tests in that file don't exercise deletion semantics, and
  migrateVectorIndex itself never calls kv.delete. Touching it would
  add noise without effect on any assertion.
- recordAudit + Promise.allSettled in remember.ts: out of scope.
  recordAudit is already called earlier in the function. Parallelizing
  cascade-update with vector-add changes failure semantics (cascade-
  update is currently fire-and-forget) — separate perf PR if wanted.
- validateDimensions(N) assertion in vector-index-dimensions.test.ts:
  vectorSize is the COUNT of stored vectors, not their dimensions.
  Adding a validateDimensions check on the result would require
  exposing the new index. Out of scope for assertion tightening.

886 / 886 tests pass. Build clean.

* fix(migrate): per-session isolation + dim guard + drop unused oldIndex param

Four reviewer findings applied. One skipped with reason:

APPLIED:

1. Per-session isolation in migrateVectorIndex observations phase. The
   previous shape wrapped the entire sessions for-loop in one try/catch
   — a single bad session (kv.list throws, embedBatch rejects, etc.)
   aborted every subsequent session and silently truncated the
   migration. The loop now has per-session try/catch; failures
   increment `failed`, append the session id to a new `failedSessions`
   array on the result, and the loop continues. The kv.list<sessions>
   call itself is now guarded separately — if THAT fails the whole
   migration aborts (no sessions to iterate), but with `failed`
   recorded and `failedSessions` empty so the caller can distinguish.

2. Drop `oldIndex` parameter from migrateVectorIndex signature. It was
   dead — no read path inside the function. Three test call sites
   updated to match the new signature
   migrateVectorIndex(kv, newProvider). Confused the API contract; if
   we ever DO need a previous-index argument it can come back with a
   real use case.

3. Dim guard on every embedding result inside migrateVectorIndex, in
   both the memories phase and the per-session observations phase.
   Extracted into a local isValidEmbedding() helper that mirrors the
   shape of search.ts::vectorIndexAddGuarded's guard: log structured
   warn + skip on mismatch. Without this, a misconfigured provider
   returning the wrong-length Float32Array would corrupt the rebuilt
   index, defeating the #248 persistence-load guard.

4. Rename `vi` -> `vectorIndex` in test/vector-index-populate.test.ts
   so the let binding doesn't shadow the vitest `vi` import. Practical
   impact: future tests in that file can use `vi.fn()` / `vi.spyOn()`
   without renaming this binding first. Cosmetic but cheap.

SKIPPED (with reason):

- Use embedBatch in rebuildIndex (search.ts): perf optimization, not
  correctness. Already called out as deferred in the previous review's
  commit message. Cold-restart speed matters but the fix here is
  bigger refactor (collect-into-batches with metadata, validate each,
  add only successful) and worth its own PR. Will pick up when we
  also do the write-behind queue for live-write batching.

Result envelope:
  Added `failedSessions: string[]` to MigrateVectorIndexResult so the
  caller can decide whether to retry per-session or just abort.
  Existing fields (success, totalProcessed, failed, vectorSize)
  unchanged in shape.

886 / 886 tests pass. Build clean.

* fix(migrate): attribute batch-level failures to correct count + distinguish list-fail

Two new reviewer findings on migrate-vector-index. Six earlier findings
re-flagged in the same review block had already been addressed in prior
commits on this branch (verified inline):

  - compress.ts BM25 try/catch (commit d2098a5)
  - migrate filter requires m.content (commit d2098a5)
  - migrate success: failed === 0 (commit d2098a5)
  - rebuildIndex vector clear (commit d2098a5)
  - observe.ts / remember.ts outer try/catch — SKIPPED with reason
    (vectorIndexAddGuarded has internal try/catch, NEVER throws)
  - mockKV.delete no-op — SKIPPED with reason (no test exercises it)

NEW FINDINGS APPLIED:

1. Catch block in memories phase counted +1 for the entire batch
   regardless of batch size. If kv.list returned 250 memories and
   embedBatch threw, `failed` reflected 1 missed embedding when
   reality is 250.

   textMems is now declared outside the try so the catch can read its
   length. If kv.list itself threw before textMems was populated
   (length 0), we fall back to +1 (something failed but the batch
   size is unknown). If embedBatch threw, `failed += textMems.length`
   so the counter is honest.

2. kv.list<sessions> failure path returned `failed: 1,
   failedSessions: []`. Indistinguishable from "0 sessions, all OK"
   from the caller's perspective — they can't tell whether the
   enumeration itself blew up or simply found no sessions.

   The path now pushes a marker string "<sessions-list-failed>" into
   failedSessions before returning. Marker keeps the existing schema
   (failedSessions: string[]) — no boolean flag added — and gives
   callers a sentinel they can grep for. Per-session entries continue
   to use real session ids, so the marker is unambiguous.

886 / 886 tests pass. Build clean.

* test: swap Cyrillic non-ASCII tokenizer fixtures for Greek

The two regression tests added in this branch for the BM25 non-ASCII
tokenizer fix used Russian-language fixtures
("Проверка памяти", "Тестируем поиск по кириллице", etc). Switching
to Greek-language equivalents ("Προβολή μνήμης",
"Δοκιμάζουμε αναζήτηση σε ελληνικά", etc).

Reasoning: the test fixtures need to exercise the same code path —
non-ASCII Unicode letters that the old ASCII-only \w regex stripped.
Greek code points (U+0370..U+03FF) hit \p{L} identically to Cyrillic
(U+0400..U+04FF), so the regression coverage is preserved. Greek is
also less likely to be misread as politically loaded content in a
public repo.

Test ids renamed from obs_cyrillic to obs_greek; test names from
"indexes and finds Cyrillic text" to "indexes and finds non-ASCII
(Greek) text" (and the mixed-script variant likewise). Branch name
stays as-is to keep PR continuity but the shipped test fixtures are
language-neutral going forward.

11/11 search-index tests pass with the swap. Full suite 886/886.

---------

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>
Co-authored-by: Nikita Nosov <20nik.nosov21@gmail.com>
rohitg00 pushed a commit that referenced this pull request May 15, 2026
…#391)

The README implied the `@agentmemory/mcp` shim exposes "all 51 tools when
AGENTMEMORY_TOOLS=all".  In reality:

- `AGENTMEMORY_TOOLS` is a server-side flag (read by iii-engine), not by
  the shim — setting it in the shim's MCP `env` block does nothing.
- The shim has two modes: proxy (when `AGENTMEMORY_URL` reaches a running
  agentmemory server, all 51 tools are forwarded) and local fallback
  (no server reachable → 7 hardcoded tools from
  `src/mcp/standalone.ts`'s `IMPLEMENTED_TOOLS`).

This caused users (Cursor / OpenCode / Gemini CLI) following the README
to assume they had misconfigured something when they only saw 7 tools.

Two changes:
- Codex plugin bullet: replace the misleading `AGENTMEMORY_TOOLS=all`
  qualifier with an accurate proxy/local mode description.
- MCP Server section header: add a callout that explicitly enumerates the
  7-tool local fallback set and tells users how to unlock the full 51.

Addresses gap #1 in #233.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant