agentmemory stop disconnects worker but leaves iii engine running with stale function registrations #474

@efenex

Symptom

When iterating on agentmemory locally (rebuild → swap dist → restart), the new code on disk is NOT picked up by the running engine even after agentmemory stop && agentmemory. The runtime continues to serve the OLD function definitions until the iii engine process is explicitly killed.

Repro

  1. Run agentmemory normally
  2. Modify any function source, e.g. src/functions/diagnostics.ts — add a new category to ALL_CATEGORIES
  3. npm run build
  4. Copy the new dist over the deployed location, e.g.:
    rm -rf /opt/homebrew/lib/node_modules/@agentmemory/agentmemory/dist
    cp -R dist /opt/homebrew/lib/node_modules/@agentmemory/agentmemory/dist
  5. agentmemory stop && nohup agentmemory > /tmp/am.log 2>&1 < /dev/null & disown
  6. Verify livez: curl -s http://localhost:3111/agentmemory/livez returns 200 OK
  7. Probe the changed function: it returns the OLD behavior. Either the new category is absent from the response, or a request that names it has it silently dropped by categories.filter((c) => ALL_CATEGORIES.includes(c)), as if the old ALL_CATEGORIES were still in effect.
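
To make the symptom in step 7 concrete, here is a minimal sketch of how that filter behaves when the running process still holds the old list. The category names ("memory", "sessions", "chunking") and the helper selectCategories are hypothetical and only for illustration; the real ALL_CATEGORIES lives in src/functions/diagnostics.ts.

```typescript
// Hypothetical category lists, for illustration only.
const OLD_ALL_CATEGORIES: readonly string[] = ["memory", "sessions"];
const NEW_ALL_CATEGORIES: readonly string[] = [...OLD_ALL_CATEGORIES, "chunking"];

// Same shape as the filter quoted in step 7: unknown names are dropped, not rejected.
function selectCategories(
  requested: readonly string[],
  allowed: readonly string[],
): string[] {
  return requested.filter((c) => allowed.includes(c));
}

const requested = ["memory", "chunking"];

// A stale engine still filters against the old list, so "chunking" vanishes silently.
const servedByStaleEngine = selectCategories(requested, OLD_ALL_CATEGORIES);

// A process running the new bundle keeps both entries.
const servedByNewBundle = selectCategories(requested, NEW_ALL_CATEGORIES);

console.log(servedByStaleEngine);
console.log(servedByNewBundle);
```

Because the filter drops unknown names instead of raising an error, a stale registration looks like a successful call with a smaller result, which is why the problem is easy to misread as "my change didn't build".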

I confirmed in my own repro that the deployed dist/index.mjs SHA matches the locally-built one and the new code IS present in the bundle — but the running process behaves as if it's still on the old code.
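
For anyone who wants to repeat that check, a rough sketch of the bundle comparison using Node's built-in crypto module follows. SHA-256 is an assumption here (the report above only says "SHA"), and the helper names sha256Hex, sha256OfFile, and bundlesMatch are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of a string or buffer.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Digest of a file's full contents.
function sha256OfFile(path: string): string {
  return sha256Hex(readFileSync(path));
}

// True when the locally built bundle and the deployed bundle are byte-identical.
function bundlesMatch(localPath: string, deployedPath: string): boolean {
  return sha256OfFile(localPath) === sha256OfFile(deployedPath);
}
```

With the paths from the repro, the call would be bundlesMatch("dist/index.mjs", "/opt/homebrew/lib/node_modules/@agentmemory/agentmemory/dist/index.mjs"). A true result rules out a bad copy and points at the running process instead.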

Workaround that DOES work

pkill -9 -f "iii\|node dist/index"
nohup agentmemory > /tmp/am.log 2>&1 < /dev/null & disown

After the hard kill, the next start picks up the new bundle immediately.

Likely cause

agentmemory stop appears to disconnect the agentmemory node worker from the iii engine, but the iii engine process keeps running. When agentmemory starts again, it reconnects to the existing iii engine, which presumably retains the prior worker's registered function definitions (or otherwise serves them from a cache that the new worker can't override).

A pid check during my repro confirmed the iii Rust binary (~/.local/bin/iii) keeps running across agentmemory stop invocations.

Impact

  • Anyone doing local iteration on agentmemory has to know to pkill -9 for changes to take effect — the normal "stop && start" loop silently keeps old code live, which is extremely confusing
  • Related: the iii websocket reconnect chatter ([OTel] Disconnected from engine, will reconnect…) in the log makes it harder to tell whether a restart actually happened, because the same reconnect messages appear during regular long-running operations too

Suggested fix

agentmemory stop should signal the iii engine process to exit, not just disconnect the worker. Verify the iii pid is gone before returning. If iii has a graceful-shutdown API, prefer that; otherwise SIGTERM with a short timeout and SIGKILL fallback.
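
A minimal sketch of the SIGTERM-then-SIGKILL fallback described above, written against Node's process API. This is not agentmemory's actual stop implementation: the function names (isAlive, waitForExit, stopEngine), the timeouts, and the assumption that the stop command already knows the iii engine's pid are all hypothetical.

```typescript
// True if a process with this pid exists. Signal 0 only probes; nothing is delivered.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the pid disappears or the timeout elapses.
async function waitForExit(pid: number, timeoutMs: number, pollMs = 100): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isAlive(pid)) return true;
    await sleep(pollMs);
  }
  return !isAlive(pid);
}

// Ask the engine to exit, escalate to SIGKILL after graceMs,
// and report whether the pid is actually gone before returning.
async function stopEngine(pid: number, graceMs = 5000): Promise<boolean> {
  if (!isAlive(pid)) return true;
  try {
    process.kill(pid, "SIGTERM");
  } catch {
    // The process may have exited between the probe and the signal.
  }
  if (await waitForExit(pid, graceMs)) return true;
  try {
    process.kill(pid, "SIGKILL");
  } catch {
    // Same race as above.
  }
  return waitForExit(pid, 1000);
}
```

Returning a boolean lets the stop command fail loudly when the engine survives, instead of reporting success while the old function registrations stay live. If iii exposes a graceful-shutdown API, that call would replace the SIGTERM step.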

Context

Surfaced while iterating on #472 (chunking) and #473 (lesson visibility). Every rebuild+restart cycle produced misleading "the new code isn't deployed" symptoms until I switched to the pkill -9 workaround.
