Add GitHub Copilot CLI support #1

Open
Random-Word wants to merge 39 commits into main from feature/copilot-cli-support

Random-Word (Owner) commented May 19, 2026

Summary

  • adds a GitHub Copilot CLI plugin manifest, MCP config, and hook config
  • adds agentmemory connect copilot-cli for MCP-only setup, including COPILOT_HOME and Windows-safe command handling
  • normalizes Copilot hook payload shapes and adds focused tests for plugin/install behavior

Validation

This PR is against the fork only for review before deciding whether to open the upstream PR.

Ross Story and others added 6 commits May 18, 2026 19:45
- plugin/.plugin/plugin.json: Copilot manifest with name/version/skills/mcpServers/hooks refs
- plugin/.mcp.copilot.json: MCP server config with type:local, npx, env passthrough, tools:[*]
- plugin/hooks/hooks.copilot.json: Copilot hooks (version:1) with 11 supported events and PreToolUse matcher
- test/copilot-plugin.test.ts: 11 tests covering manifest, MCP config, and hooks validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds Copilot CLI support through a root plugin manifest, Copilot-specific MCP and hook configuration, and a connect adapter for MCP-only setup.

Includes Windows-safe Copilot MCP command generation, COPILOT_HOME handling, Copilot hook payload normalization, generated hook scripts, and targeted tests for plugin shape, hook execution, and connect behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Random-Word (Owner, Author)

Reopening to trigger CI after workflows were registered on the fork.

@Random-Word (Owner, Author)

CI probe complete; closing fork-local PR.

Random-Word reopened this May 19, 2026
Random-Word changed the title from "Test Copilot CLI support CI" to "Add GitHub Copilot CLI support" May 19, 2026
Random-Word requested a review from Copilot May 19, 2026 08:37

Copilot AI left a comment

Pull request overview

Adds GitHub Copilot CLI as a supported agent. Introduces a new plugin/plugin.json manifest plus Copilot-specific MCP and hooks config, a new copilot-cli connect adapter (with COPILOT_HOME support and Windows-safe command handling), and normalizes hook payload handling to also accept Copilot's camelCase field names (e.g. sessionId, toolName, toolArgs, userPrompt, errorMessage, notificationType, tool_result/toolResult). Adds focused tests for the plugin manifest, hooks config, hook scripts, and the new adapter.

Changes:

  • New Copilot plugin manifest + .mcp.copilot.json + hooks/hooks.copilot.json and a new copilot-cli connect adapter (allowed on Windows).
  • All Claude/Codex hook scripts and their TS sources now read both snake_case and camelCase payload fields; pre-tool-use switches to lowercase tool matching with create/view added; post-tool-use falls back to tool_result/toolResult.text_result_for_llm.
  • README/AGENTS/CLI help updated; new test/copilot-plugin.test.ts and Copilot-adapter tests in test/cli-connect.test.ts.
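
A minimal sketch of the dual-shape payload handling described above. The camelCase names (`sessionId`, `toolName`, `toolArgs`, `toolResult`) and `tool_result` come from the overview; `session_id`, `tool_name`, and `tool_args` are assumed snake_case counterparts, and `normalizeHookPayload` and the helper names are hypothetical, not the hook scripts' actual functions.

```typescript
type RawPayload = Record<string, unknown>;

// Hypothetical normalized shape used only for this illustration.
interface NormalizedHookPayload {
  sessionId?: string;
  toolName?: string;
  toolArgs?: unknown;
  toolResult?: unknown;
}

// Return the first listed key whose value is a string.
function pickString(raw: RawPayload, ...keys: string[]): string | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "string") return value;
  }
  return undefined;
}

// Return the first listed key whose value is present (not undefined or null).
function pickAny(raw: RawPayload, ...keys: string[]): unknown {
  for (const key of keys) {
    const value = raw[key];
    if (value !== undefined && value !== null) return value;
  }
  return undefined;
}

function normalizeHookPayload(raw: RawPayload): NormalizedHookPayload {
  // Lowercasing mirrors the lowercase tool matching mentioned in the overview.
  const toolName = pickString(raw, "tool_name", "toolName");
  return {
    sessionId: pickString(raw, "session_id", "sessionId"),
    toolName: toolName === undefined ? undefined : toolName.toLowerCase(),
    toolArgs: pickAny(raw, "tool_args", "toolArgs"), // snake_case key is an assumption
    toolResult: pickAny(raw, "tool_result", "toolResult"),
  };
}
```

Checking the snake_case key first keeps existing Claude/Codex payloads on their current path; the camelCase key is only a fallback.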

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| plugin/plugin.json | New top-level Copilot plugin manifest. |
| plugin/.mcp.copilot.json | Copilot MCP server block (npx -y @agentmemory/mcp). |
| plugin/hooks/hooks.copilot.json | Copilot hook event registrations. |
| plugin/scripts/*.mjs | Built hook scripts updated to accept Copilot camelCase fields. |
| src/hooks/*.ts | Source hooks updated in parallel for camelCase compatibility. |
| src/cli/connect/copilot-cli.ts | New connect adapter for ~/.copilot/mcp-config.json. |
| src/cli/connect/util.ts | Adds AGENTMEMORY_COPILOT_MCP_BLOCK with Windows-safe command. |
| src/cli/connect/index.ts | Registers adapter; allows copilot-cli on Windows. |
| src/cli.ts | Updates help text to list copilot-cli. |
| README.md / AGENTS.md | Document new agent and update maintenance checklists. |
| test/cli-connect.test.ts | Tests for new adapter and adapter count. |
| test/copilot-plugin.test.ts | New tests for plugin manifest, MCP config, hooks, and hook scripts. |

Comment thread plugin/plugin.json
{
"name": "agentmemory",
"version": "0.9.20",
"description": "Persistent memory for AI coding agents -- captures tool usage, compresses via LLM, injects context into future sessions. 12 hooks, 53 MCP tools, 4 skills, real-time viewer.",
Ross Story and others added 13 commits May 19, 2026 02:04
Addresses upstream AI review suggestions by aligning the Copilot preToolUse matcher with the hook allowlist, narrowing hook payload fields at runtime, normalizing subagent fallbacks, and tightening hook config validation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Includes GitHub Copilot CLI in the first-run agent picker and adds a regression test so the Copilot setup path remains discoverable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect Copilot CLI environment markers during first-run setup so pressing Enter wires the current agent instead of the historical Claude Code default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Accept Content-Length framed JSON-RPC messages in addition to the existing newline-delimited transport so Copilot CLI can initialize the standalone MCP server.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
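
For illustration, a reader that accepts both transports might look like the sketch below. `parseFrames` and `ParsedFrames` are hypothetical names, not the server's actual code, and the sketch assumes ASCII payloads so that the `Content-Length` byte count equals the string length; a real implementation would count bytes on a buffer.

```typescript
interface ParsedFrames {
  messages: unknown[]; // fully received JSON-RPC messages, in arrival order
  rest: string;        // trailing input that does not yet form a complete message
}

// Matches a Content-Length header block terminated by an empty line.
const HEADER = /^Content-Length: *(\d+)\r\n(?:[^\r\n]+\r\n)*\r\n/i;

function parseFrames(buffer: string): ParsedFrames {
  const messages: unknown[] = [];
  let rest = buffer;

  while (rest.length > 0) {
    const header = HEADER.exec(rest);
    if (header) {
      const bodyStart = header[0].length;
      const bodyEnd = bodyStart + Number(header[1]);
      if (rest.length < bodyEnd) break; // body not fully received yet
      messages.push(JSON.parse(rest.slice(bodyStart, bodyEnd)));
      rest = rest.slice(bodyEnd);
      continue;
    }
    // A header has started but is incomplete: wait for more input.
    if (/^content-length/i.test(rest)) break;

    // Otherwise fall back to the newline-delimited transport.
    const newline = rest.indexOf("\n");
    if (newline === -1) break; // partial line, wait for more input
    const line = rest.slice(0, newline).trim();
    rest = rest.slice(newline + 1);
    if (line.length > 0) messages.push(JSON.parse(line));
  }
  return { messages, rest };
}
```

The caller would append each incoming chunk to `rest` and call `parseFrames` again, so partial headers and partial lines are carried over rather than dropped.
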
…ohitg00#517)

The viewer's five search inputs (graph, memories, lessons, actions,
crystals) destroy and recreate their input DOM via innerHTML on every
keystroke, which interrupts active IME composition sessions and makes
non-Latin input (Chinese, Japanese, Korean) unusable.

Additionally, the viewer's CSP includes script-src-attr 'none', which
silently blocks the inline oninput=/onchange= handlers on the lessons,
actions, and crystals panels. Those three search/filter controls have
been non-functional under the strict CSP.

This patch:

1. Adds a bindImeSafeSearch helper that guards on both an explicit
   compositionstart/compositionend flag and event.isComposing.
   compositionend triggers an immediate commit and sets a
   justCommitted one-shot flag to suppress the redundant trailing
   input event that browsers dispatch after compositionend.

2. Adds captureSearchFocus/restoreSearchFocus helpers to preserve
   focus and cursor position across innerHTML rebuilds, so multi-word
   IME input doesn't require clicking back into the search box after
   each commit.

3. Migrates all five search inputs to addEventListener via the new
   helpers, removing the CSP-blocked inline handlers on lessons,
   actions, and crystals. The actions panel's status filter <select>
   is also migrated for the same reason.

4. Unifies debounce delay to 200ms across all five panels.
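
The guard logic in steps 1 and 4 can be sketched as a small state machine, shown here without DOM wiring so it stays testable. `ImeSafeSearch` and its method names are hypothetical; the actual helper is `bindImeSafeSearch`, whose signature is not shown in this commit message. A caller would forward `compositionstart`, `compositionend`, and `input` events to the three methods.

```typescript
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

class ImeSafeSearch {
  private composing = false;
  private justCommitted = false;
  private timer: unknown = undefined;
  private readonly commit: (query: string) => void;
  private readonly delayMs: number;
  private readonly schedule: Schedule;
  private readonly cancel: Cancel;

  constructor(
    commit: (query: string) => void,
    delayMs: number = 200, // unified debounce delay from step 4
    schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
    cancel: Cancel = (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
  ) {
    this.commit = commit;
    this.delayMs = delayMs;
    this.schedule = schedule;
    this.cancel = cancel;
  }

  onCompositionStart(): void {
    this.composing = true;
    this.clearPending();
  }

  // Commit immediately and arm the one-shot flag for the trailing input event.
  onCompositionEnd(value: string): void {
    this.composing = false;
    this.justCommitted = true;
    this.clearPending();
    this.commit(value);
  }

  onInput(value: string, isComposing: boolean): void {
    if (this.composing || isComposing) return; // IME session in progress
    if (this.justCommitted) {
      this.justCommitted = false; // swallow the redundant post-composition input
      return;
    }
    this.clearPending();
    this.timer = this.schedule(() => {
      this.timer = undefined;
      this.commit(value);
    }, this.delayMs);
  }

  private clearPending(): void {
    if (this.timer !== undefined) {
      this.cancel(this.timer);
      this.timer = undefined;
    }
  }
}
```

Injecting `schedule` and `cancel` is a choice made for this sketch so the debounce can be driven by a fake timer in tests.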

Verified via:
- 8/8 jsdom + synthetic CompositionEvent regression cases (ASCII
  debounce, IME composing suppresses input, compositionend immediate
  commit + justCommitted suppression, post-IME ASCII resumes, fast
  typing coalesces, composition cancel returns to idle).
- 8/8 manual browser cases on Windows + Chrome (Chinese pinyin commit,
  multi-word IME without re-focusing, English regression, all five
  panels, zero CSP violations in DevTools, cursor position retained).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: 이민재 <19909783+honor2030@users.noreply.github.com>
Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>
…oken_budget (rohitg00#507) (rohitg00#516)

* fix(mcp): route memory_recall to /agentmemory/search and forward format/token_budget

memory_recall and memory_smart_search were sharing the smart-search
endpoint, which always returns compact mode and silently drops the
format and token_budget parameters that the tool schema advertises.
Split the cases so memory_recall hits /agentmemory/search (which
honors format) while memory_smart_search keeps its own endpoint.
Default format to "full" for memory_recall so the documented behavior
matches the wire call.

Signed-off-by: serhiizghama <zmrser@gmail.com>
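
A sketch of the split routing described above, for illustration only. `routeSearchTool` and `ProxyCall` are hypothetical names, and `/agentmemory/smart-search` is an assumed path for the smart-search endpoint; only `/agentmemory/search`, `format`, `token_budget`, and the `"full"` default come from the commit message.

```typescript
interface ProxyCall {
  path: string;
  body: Record<string, unknown>;
}

function routeSearchTool(tool: string, args: Record<string, unknown>): ProxyCall {
  switch (tool) {
    case "memory_recall": {
      // Forward every argument (including token_budget) and default format to "full".
      const format = args["format"] === undefined ? "full" : args["format"];
      return { path: "/agentmemory/search", body: { ...args, format } };
    }
    case "memory_smart_search":
      // Assumed endpoint path; the commit only says this tool keeps its own endpoint.
      return { path: "/agentmemory/smart-search", body: { ...args } };
    default:
      throw new Error(`unsupported tool: ${tool}`);
  }
}
```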

* test(mcp): cover memory_recall endpoint, format forwarding, and defaults

Two new proxy tests for issue rohitg00#507: one asserts memory_recall calls
POST /agentmemory/search with the format and token_budget fields,
and never falls through to smart-search; the other pins the default
format to "full" when the caller omits it.

Signed-off-by: serhiizghama <zmrser@gmail.com>

---------

Signed-off-by: serhiizghama <zmrser@gmail.com>
…660 shipped) (rohitg00#546)

v0.9.19 (rohitg00#460 / commit bb259ac) routed the first-run iii-console
install through `bash -s -- --next` to dodge the upstream tag-prefix
bug at iii-hq/iii#1652. Upstream PR iii-hq/iii#1660 fixed the bug on
2026-05-19 — installer's jq filter now accepts both `iii/v...` and
bare `v...` tags, and `-v X.Y.Z` falls back gracefully.

`install.iii.dev/console/main/install.sh` is a thin proxy serving
`raw.githubusercontent.com/iii-hq/iii/main/console/install.sh` with
a 5-minute CDN cache — verified byte-for-byte that the live URL
already serves the post-#1660 fix. No iii release tag needed.

Switch agentmemory back to the canonical bare invocation:

  curl -fsSL https://install.iii.dev/console/main/install.sh | sh

Drops the workaround comment block (10 lines) explaining the prior
detour. v0.9.19/v0.9.20 users on the `--next` path will still
resolve a valid release (next-release lookup also handles
`iii/v...-next.*` correctly post-#1660), so this isn't a forced
upgrade.

1038/1038 tests pass.
…ce (rohitg00#545)

* feat(repo): add Sponsor button + GH Packages mirror for sidebar surface

Three additions that make the repo page surface clearer + give users
a single place to fund the project:

1. `.github/FUNDING.yml` — `github: [rohitg00]` renders the "Sponsor"
   button at the top of the repo + the Sponsor widget in the right
   sidebar. Requires GitHub Sponsors to be enabled at
   github.com/sponsors/accounts on the rohitg00 profile before the
   link resolves (currently 404s — enable before merging this PR).

2. `.github/workflows/publish.yml` — new `publish-github-packages`
   job runs after the existing public-npm publish completes.
   Republishes the main package as `@rohitg00/agentmemory` to
   `npm.pkg.github.com`. The repo's right-sidebar "Packages" widget
   only surfaces packages on GitHub Packages, not packages on the
   public npm registry, so this is what makes the sidebar widget
   non-empty. Public npm remains the canonical install source;
   GH Packages is purely a discovery surface.

   - Uses built-in GITHUB_TOKEN, no new secrets needed.
   - Rewrites package.json `name` + `publishConfig` in-runner via a
     small node one-liner, publishes, then restores the original so
     main isn't permanently scope-changed.
   - Skip-on-already-published guard mirrors the existing public
     publish steps.
   - Marked `|| echo "non-fatal"` so a GH Packages hiccup never blocks
     the canonical npm release.
   - `permissions: packages: write` added at workflow level.

3. README badge row — added `npm downloads`, `GitHub Packages mirror`,
   and `Sponsor rohitg00 on GitHub Sponsors` badges alongside the
   existing `npm version` / `CI` / `License` / `Stars` row. The
   sponsor badge is the same link the FUNDING.yml sidebar widget
   uses; surfacing it in-README means readers who don't notice the
   sidebar still see it.

Out of scope (asked, declined):
- Docker Hub / ghcr.io publish workflow. Not in this PR.

* ci(publish): scope write perms per-job + persist-credentials false

Inline review on rohitg00#545 flagged that the workflow-level permissions
block granted `id-token: write` + `packages: write` to every job,
including ones that don't need them. Tightened to least-privilege:

- Workflow-level: only `contents: read`.
- `publish` job: adds `id-token: write` (required for `npm publish
  --provenance` to mint a Sigstore OIDC token). The GH Packages
  job doesn't inherit this.
- `publish-github-packages` job: adds `packages: write` (required
  to push to npm.pkg.github.com). The public-npm publish job
  doesn't inherit this.

Both `actions/checkout@v6` calls also pick up `persist-credentials:
false`. The publish steps never push back to the repo, so the
GITHUB_TOKEN doesn't need to land in `.git/config` after checkout.
Same posture both jobs.

Skipped from the same review pass:

- **Pin actions to commit SHAs.** Industry rule but introduces real
  maintenance friction — Renovate/Dependabot don't auto-bump
  SHA-pinned actions to new minors, so SHA pinning trades easy
  semver tracking for stale-action drift. We stay on `@v6` major-tag
  pins (GitHub publishes those via verified moving refs).
- **Disable setup-node cache.** `actions/setup-node@v6` defaults to
  cache-off (the `cache:` input is opt-in). `package-manager-cache`
  only auto-enables when `package.json` has a `packageManager` field
  — agentmemory's doesn't (verified via `grep`). The fix is a no-op
  on this workflow.
…tg00#547)

Sponsor button still missing from the repo page despite rohitg00#545 merging.
The committed FUNDING.yml started with 4 lines of `#` comments before
the canonical `github: [rohitg00]` directive. GitHub's FUNDING parser
documents only the canonical key-value form; leading comments
shouldn't break it but some users have reported indexer lag when the
file starts with non-data lines. Strip to the bare single-line form
to match the documented schema and remove any ambiguity.

Sponsor profile is enabled (github.com/sponsors/rohitg00 returns 200
+ 'Sponsor @rohitg00' button), so the only remaining gap is GitHub's
side-bar indexing. Tightening the file forces a re-parse.
…ohitg00#548)

Reverting the GH Packages publish from rohitg00#545. GH Packages is a
separate registry from npmjs.com — anyone installing
`@rohitg00/agentmemory` from `npm.pkg.github.com` needs to point
their registry there and authenticate, which is friction users
don't have on the canonical `@agentmemory/agentmemory` install
from public npm.

The right-sidebar Packages widget on the repo page was the only
motivation for the mirror. Acceptable to leave it empty — the
single canonical install path is the better DX.

- Drop `publish-github-packages` job from `.github/workflows/publish.yml`
- Drop `packages: write` perm wording from the workflow comment block
- Remove "GitHub Packages mirror" badge from README

Manual follow-up (post-merge): delete the already-published
`@rohitg00/agentmemory@0.9.20` from GH Packages registry via
github.com/users/rohitg00/packages/npm/agentmemory/settings → Delete.
…#549)

GitHub auto-renders the "Sponsor this project" widget in the
right sidebar from .github/FUNDING.yml (Sponsor button + heart icon
+ "Learn more about GitHub Sponsors" link). The README badge was
redundant noise on the top badge row.

Sidebar widget is the canonical surface — one path, one click.
honor2030 and others added 20 commits May 19, 2026 19:18
Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>
* fix(hermes): declare all plugin hooks

* test(hermes): compare manifest hooks to provider

---------

Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>
…s run (rohitg00#500)

mem::observe's boot flow had this sequence in main():

  1. registerSearchFunction / registerContextFunction / ...
     (sync — completes immediately)
  2. restore persisted vector index from disk
  3. await rebuildIndex(kv)        ← blocks here
  4. bootLog "Ready" / "REST API" / "MCP surface"
  5. startViewerServer(...)
  6. setInterval auto-forget / lesson decay / consolidation

rebuildIndex iterates every observation across every session and AWAITS
an embedding-provider call per record. On a large corpus + a rate-limited
embedding endpoint (e.g. 100 RPM), step 3 takes hours to days.
Everything that runs AFTER it — including startViewerServer — is
silently delayed for the same duration.

Symptoms in the wild:
- http://localhost:3113/ unreachable (no listening socket on the viewer
  port) even on a freshly-started server
- `agentmemory doctor` reports "viewer-unreachable"
- log floods with `vector-index add: embed failed — skipping {429: ...}`
  from the still-running rebuild burning rate-limit budget
- no error message — the worker stays alive serving HTTP because
  sdk.registerFunction had already completed synchronously in step 1

Fix: detach rebuildIndex with `void` + .then/.catch instead of awaiting.
The index lazily fills in over time, search degrades gracefully (BM25
keeps working immediately, vector results fill in as the embed queue
drains), and the viewer comes up in seconds.
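
The detach pattern can be sketched as follows. `boot`, `startViewer`, and `log` are stand-ins for the real boot sequence in `main()`, which this sketch does not reproduce.

```typescript
type Log = (line: string) => void;

function boot(
  rebuildIndex: () => Promise<number>,
  startViewer: () => void,
  log: Log,
): void {
  // Detached: failures are logged, but nothing below waits on the rebuild.
  void rebuildIndex()
    .then((count) => log(`index rebuilt: ${count} items`))
    .catch((err: unknown) => log(`index rebuild failed: ${String(err)}`));

  log("Ready");
  startViewer(); // reached immediately, even if the rebuild takes hours
}
```

The `.catch` matters here: without it, a rejected rebuild would surface as an unhandled rejection instead of a log line.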

Repro on the operator side:
1. import a sizeable jsonl corpus (`mem::replay::import-jsonl`)
2. clear the persisted vector index so rebuildIndex runs on next boot
3. restart agentmemory with EMBEDDING_PROVIDER pointed at a rate-limited
   endpoint (any OpenAI-compat with low RPM)
4. observe: REST API responds on :3111, but :3113 is never bound, and
   the doctor's "viewer-unreachable" check fires until the rebuild
   finishes (hours-to-days for a 300+ session corpus)

The 5-second non-fix workaround was a hard kill + restart; that just
re-entered the same hang.

No tests added — main() isn't unit-tested today and wiring up a fake
slow rebuildIndex + asserting the post-rebuild boot lines run early
would need the full worker mock harness. The change is one line and
the failure mode is dramatic; visual review + integration smoke covers
the regression risk.
…rpora) (rohitg00#504)

* fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora)

rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and
per observation. Each call is one HTTP round-trip to the embedding
provider for a single input. On a 500k-observation imported corpus
against an embedding endpoint with even modest latency, that's
serial 100-200ms per call = 14-28 hours of wallclock. The new
non-blocking rebuild path (rohitg00#500) made this no longer block boot, but
the rebuild itself still takes the same wallclock.

Add `vectorIndexAddBatchGuarded()` next to the existing per-item
helper, accepting an array of items and calling `provider.embedBatch()`
once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings`
all accept an `input` array), latency for N items is roughly the
latency of a single embed because network + GPU setup amortize.

Refactor `rebuildIndex` to accumulate items into a buffer and flush
every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays
per-item-synchronous; only the vector path is batched.
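
A rough sketch of the buffer-and-flush shape, assuming an injected `embedBatch` that returns one vector per input. `toBatches`, `rebuildVectors`, and the `Map` store are hypothetical stand-ins for `vectorIndexAddBatchGuarded()` and the real vector index.

```typescript
interface EmbedItem {
  id: string;
  text: string;
}

type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Split items into consecutive batches of at most `size` entries.
function toBatches<T>(items: T[], size: number): T[][] {
  const step = Math.max(1, Math.floor(size));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += step) {
    batches.push(items.slice(i, i + step));
  }
  return batches;
}

async function rebuildVectors(
  items: EmbedItem[],
  embedBatch: EmbedBatch,
  store: Map<string, number[]>,
  dims: number,
  batchSize: number = 32, // mirrors the REBUILD_EMBED_BATCH_SIZE default
): Promise<{ added: number; failed: number }> {
  let added = 0;
  let failed = 0;
  for (const batch of toBatches(items, batchSize)) {
    let vectors: number[][];
    try {
      vectors = await embedBatch(batch.map((item) => item.text)); // one round-trip per batch
    } catch {
      failed += batch.length; // whole-batch provider error: skip all, keep going
      continue;
    }
    batch.forEach((item, index) => {
      const vector = vectors[index];
      if (vector === undefined || vector.length !== dims) {
        failed += 1; // per-item dimension mismatch: skip only this item
        return;
      }
      store.set(item.id, vector);
      added += 1;
    });
  }
  return { added, failed };
}
```

With a batch size of 1 this degenerates to one call per item, which matches the fallback described under the override knob.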

Validated against a vLLM Qwen3-Embedding-8B endpoint:
  - single embed: 175ms
  - batch-of-32:  737ms (= 23ms/item amortized, ~7.6× speedup)
  - projected backfill time for 500k obs: 25h → 3h

Per-item failure shape is preserved:
  - whole-batch network/provider error → all skipped, single warn line
    (vs N warns previously when the same error hit every item)
  - per-item dimension mismatch → that item skipped, others continue
  - rebuildIndex return value unchanged (count of attempted items)

Override knob:
  - REBUILD_EMBED_BATCH_SIZE (default 32) — set lower for endpoints
    with small per-request input limits, higher for endpoints that
    prefer larger batches. Set to 1 to fall back to the per-item path.

39/39 existing tests in search-index/vector-index/remember-bm25-index
pass unchanged.

Related: rohitg00#500 (non-blocking rebuildIndex), rohitg00#503 (separate embedding
base URL).

* fix(rebuild): per-item vi.add try/catch to preserve soft-fail

Restores the pre-batch soft-fail behavior — a single failing
vi.add() no longer aborts the entire rebuild batch. Failures
are logged and counted toward fail, just like dimension
mismatches above.
…g00#472)

* fix(summarize): chunk large sessions to fit LLM context window

JSONL-imported sessions can have far more observations than the
500-cap MAX_OBS_PER_SESSION that constrains native sessions.
mem::summarize previously built one prompt containing every
observation and shipped it as a single LLM call, which exceeded the
provider's context window for sessions >~7,000 observations and
returned an unhelpful 400 from upstream — silently leaving large
bulk-imported sessions out of the semantic tier.

Approach: map-reduce inside mem::summarize.
- Sessions ≤ SUMMARIZE_CHUNK_SIZE (default 400) take the legacy
  single-call path with no overhead
- Larger sessions are split into chunks, each summarized with the
  existing per-session prompt in parallel batches of
  SUMMARIZE_CHUNK_CONCURRENCY (default 6), and partial summaries
  merged via a new REDUCE_SYSTEM prompt
- Per-chunk retry-once on transient parse / provider errors
- Persistently-failing chunks are skipped (not propagated) so a
  flaky chunk doesn't waste 30+ already-completed LLM calls on
  the same session
- Bail with too_many_chunks_skipped only if >50% of chunks fail
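
The approach above, sketched under simplifying assumptions: `summarize` and `reduce` are injected stand-ins for the LLM calls, every error is treated as retryable (the real code distinguishes transient parse and provider errors), and all function names are hypothetical.

```typescript
type Summarize = (observations: string[]) => Promise<string>;
type Reduce = (partials: string[]) => Promise<string>;

function splitIntoChunks<T>(items: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Run fn, retry once on failure, and return undefined if both attempts fail.
async function withOneRetry<T>(fn: () => Promise<T>): Promise<T | undefined> {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      return await fn();
    } catch {
      // fall through to the retry, then give up
    }
  }
  return undefined;
}

async function summarizeSession(
  observations: string[],
  summarize: Summarize,
  reduce: Reduce,
  chunkSize: number = 400, // SUMMARIZE_CHUNK_SIZE default
  concurrency: number = 6, // SUMMARIZE_CHUNK_CONCURRENCY default
): Promise<string> {
  // Small sessions keep the legacy single-call path.
  if (observations.length <= chunkSize) return summarize(observations);

  const chunks = splitIntoChunks(observations, chunkSize);
  const partials: string[] = [];
  let skipped = 0;
  for (let i = 0; i < chunks.length; i += concurrency) {
    const wave = chunks.slice(i, i + concurrency);
    const results = await Promise.all(
      wave.map((chunk) => withOneRetry(() => summarize(chunk))),
    );
    for (const result of results) {
      if (result === undefined) skipped += 1; // persistently failing chunk
      else partials.push(result);
    }
  }
  if (skipped > chunks.length / 2) throw new Error("too_many_chunks_skipped");
  return reduce(partials);
}
```

At the default chunk size, a 5,392-observation session splits into 14 chunks and a 10,704-observation session into 27, matching the chunk counts reported in the validation notes.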

Companion operator tool: scripts/backfill-imported-sessions.sh
walks jsonl-imported sessions and POSTs mem::summarize per session,
with project / agent / obs-count filters, cost estimation, and
per-failure payload dumping for debugging provider rejections.

Validated locally against a real corpus:
- 5,392-obs session (14 chunks, c=6): 39s
- 10,704-obs session (27 chunks, c=6): 34s
- 105,966-obs session (265 chunks, c=50): handler completes
  server-side and persists
- 52-session bulk backfill → 25 new semantic facts + 6 new reflect
  insights produced by consolidate-pipeline

Known limit: iii-engine has a hardcoded 180s function-invocation
timeout. Sessions large enough that chunked summarize wallclock
exceeds that will return a timeout/500 to the HTTP client even
though the handler completes and persists server-side. High-RPM
providers (Novita / DeepInfra / DeepSeek typically allow 100+
concurrent) can raise SUMMARIZE_CHUNK_CONCURRENCY to push the
cliff well past any realistic session size. True fix is an
async-job pattern; left as follow-up.

- src/prompts/summary.ts: add REDUCE_SYSTEM + buildReducePrompt
- src/functions/summarize.ts: chunking, retry, skip, parallelism
- test/summarize.test.ts: 9 cases covering single-call path,
  chunking, env-override, retry-then-success, persistent skip,
  too-many-skipped bail, provider error after retry, concurrency
- .env.example: document SUMMARIZE_CHUNK_SIZE / _CONCURRENCY
- .gitignore: agentmemory-debug/, data-*/  (operator artefacts)
- scripts/backfill-imported-sessions.sh: bulk-import backfill tool

9/9 new tests pass; existing tests untouched.

* fix(summarize): address CodeRabbit review on rohitg00#472

Five nits flagged by the automated reviewer, all worth fixing:

- scripts/backfill: add curl --connect-timeout + --max-time profiles
  (META_CURL_OPTS vs WORK_CURL_OPTS). Metadata reads fail fast and
  retry on transient blips; LLM-backed work calls get a wide 30-min
  cap and no retry (retrying a half-finished LLM job double-spends).
- scripts/backfill: sanitize sessionId before joining with DEBUG_DIR
  in dump_failure() (otherwise a session id containing `/` or `..`
  could escape the debug dir). UUIDs in practice, but the server
  doesn't enforce that.
- scripts/backfill: switch the observations query to
  `--get --data-urlencode "sessionId=$id"` so special characters
  can't corrupt the query string.
- scripts/backfill: guard `jq` on summarize + consolidate responses
  with `jq -e . >/dev/null 2>&1` first. iii's HTTP layer occasionally
  returns non-JSON (HTML 5xx, empty body on timeout). Without the
  guard, `set -e` aborts the whole backfill loop on a single bad
  response — now it logs `invalid_json_response` and moves on.
- test/summarize.test.ts: fix `vi.mock("./audit.js", ...)` path to
  `"../src/functions/audit.js"`. The old path resolved to
  `test/audit.js` (nonexistent), so the mock was a silent no-op.
  Tests passed anyway because `safeAudit` writes to a mocked KV.

9/9 tests still pass; backfill dry-run still resolves the corpus
cleanly.
… diagnose (rohitg00#473)

* fix(visibility): surface lessons in smart-search + tally per-store in diagnose

Two related UX gaps in the memory layer's reflection surfaces. A consumer
that calls `memory_lesson_save` and gets `success:true` reasonably expects
to find the lesson via `memory_smart_search` ("did my save land?") and
to see it counted in `memory_diagnose` ("what's in the store?"). Neither
was true: lessons live in their own KV store (`KV.lessons`), and both
diagnostic surfaces only looked at `KV.observations` / `KV.memories`.
A 4,350-lesson store could read as "memories: 0" on diagnose and return
zero hits on smart_search — the trust-shock that prompted this fix.

A) mem::smart-search: also return lessons in the compact response.
   - New optional `project` and `includeLessons` (default true) params.
   - Delegates lesson scoring to the existing mem::lesson-recall via
     sdk.trigger, so confidence + recency weighting stays consistent
     with mem::lesson-recall (no duplicate scoring logic).
   - Lessons come back in a separate `lessons` field on the response,
     not merged into `results`. Existing consumers reading `results`
     are unaffected; new consumers can read `result.lessons` too.
   - Content truncated to 240 chars in compact mode (full content
     remains available via mem::lesson-recall directly).
   - Lesson-recall failures are soft: log + return empty lessons,
     observation results still flow through.

B) mem::diagnose: add per-store tally categories for lessons,
   summaries, semantic, procedural, crystals, insights. Mirrors the
   existing `memories` pattern: count + light consistency check
   (confidence range for scored memories; non-empty title/narrative/
   steps for the rest). Each new category is in ALL_CATEGORIES so
   `--categories lessons` filtering works as expected.
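
Part A can be sketched as below. The field names on `Lesson` and `LessonPreview` are assumptions (the real type is `CompactLessonResult` in `src/types.ts`, whose fields are not shown here), and `recallLessons` stands in for the `sdk.trigger` call to `mem::lesson-recall`.

```typescript
interface Lesson {
  id: string;
  content: string;
  confidence: number;
}

// Hypothetical compact shape for this illustration.
interface LessonPreview {
  id: string;
  content: string;
  confidence: number;
}

const PREVIEW_CHARS = 240; // compact-mode truncation length from the commit message

function toPreview(lesson: Lesson): LessonPreview {
  return {
    id: lesson.id,
    confidence: lesson.confidence,
    content: lesson.content.slice(0, PREVIEW_CHARS),
  };
}

async function searchWithLessons<R>(
  searchObservations: () => Promise<R[]>,
  recallLessons: () => Promise<Lesson[]>,
  includeLessons: boolean = true,
): Promise<{ results: R[]; lessons: LessonPreview[] }> {
  const results = await searchObservations();
  let lessons: LessonPreview[] = [];
  if (includeLessons) {
    try {
      lessons = (await recallLessons()).map(toPreview);
    } catch (err) {
      // Soft failure: observation results still flow through.
      console.warn("lesson recall failed; returning observations only", err);
    }
  }
  // Lessons stay in their own field so consumers reading `results` are unaffected.
  return { results, lessons };
}
```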

The empty-system pass count goes from 8 to 14 (8 original + 6 new
stores). Test updated accordingly.

- src/types.ts: add CompactLessonResult
- src/functions/smart-search.ts: lesson recall + merge (single-call
  path unchanged, expand mode unchanged)
- src/functions/diagnostics.ts: six new category blocks before mesh
- test/smart-search.test.ts: 6 new cases (lesson inclusion, content
  preview truncation, includeLessons=false opt-out, project filter
  passthrough, soft-fail on recall error / non-success response)
- test/diagnostics.test.ts: 7 new pass/warn cases for each new
  category + filter check; empty-system pass count bumped 8→14

43/43 tests pass.

* fix(diagnostics): defensive guards on new validators (CodeRabbit rohitg00#473 review)

CodeRabbit flagged two patterns in the per-store validators added in
the parent commit:

1. .trim() on .title / .narrative was unconditional — a corrupted row
   with title=null or title=42 would throw, abort the whole diagnose
   run, and silently skip every later category. Add typeof guards.

2. confidence range checks were `< 0 || > 1` which silently passes
   NaN and Infinity (NaN < 0 is false, NaN > 1 is false → "healthy").
   Add Number.isFinite(...) prefix so corrupted scored rows surface
   as warnings instead.
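
Both guards can be sketched in a few lines. The helper names are hypothetical; the point is only the `typeof` check before `.trim()` and the `Number.isFinite` check before the range comparison.

```typescript
// Naive range check from the parent commit: NaN slips through as "in range".
function naiveOutOfRange(confidence: number): boolean {
  return confidence < 0 || confidence > 1;
}

// Guarded check: rejects non-numbers, NaN, and Infinity before the range test.
function isValidConfidence(value: unknown): boolean {
  return (
    typeof value === "number" &&
    Number.isFinite(value) &&
    value >= 0 &&
    value <= 1
  );
}

// Guarded text check: never calls .trim() on a non-string.
function hasText(value: unknown): boolean {
  return typeof value === "string" && value.trim().length > 0;
}
```

A validator would report a warning for the row when either guard returns `false`, then continue to the next category.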

Applied across all 6 new validators: lesson confidence, summary title,
semantic confidence, crystal narrative, insight confidence.

Tests added in test/diagnostics.test.ts under "defensive row-shape
handling": NaN confidence on a lesson, null summary title (verifies
diagnose still completes and later categories still execute),
undefined crystal narrative, Infinity / NaN on insight + semantic.

34/34 tests pass.
Quality + integration wave. Bundles 12 PRs since v0.9.20:

Contributor feature:
- rohitg00#237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer)

Bug fixes (10):
- rohitg00#516 memory_recall endpoint + format/token_budget (@serhiizghama, closes rohitg00#507/rohitg00#440)
- rohitg00#461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes rohitg00#456)
- rohitg00#487 Windows hook path quoting (@honor2030, closes rohitg00#477)
- rohitg00#517 viewer IME composition guard (@jonathanzhan1975)
- rohitg00#472 chunk large sessions for LLM context window (@efenex)
- rohitg00#473 surface lessons in smart-search + diagnose tally (@efenex)
- rohitg00#486 declare all Hermes plugin hooks (@honor2030)
- rohitg00#500 rebuildIndex non-blocking on boot (@efenex)
- rohitg00#504 batched embed in rebuildIndex (25h -> 3h) (@efenex)
- rohitg00#491 cli skip onboarding without tty (@honor2030)

Upstream-installer revert:
- rohitg00#546 drop --next workaround now that iii-hq/iii#1660 shipped

1067/1067 tests pass across 95 files.
* ci: cross-platform matrix + paths-ignore + concurrency

1. **OS matrix** — Linux + Windows + macOS, both Node 20 + 22. 6 cells,
   ~3min each, ~18min wall time. Direct test against the class of
   bug rohitg00#487 caught: hooks crashing on Windows usernames with spaces.
   Pre-merge Linux-only CI meant that bug landed in main + a release.
   fail-fast: false so a flake on one cell doesn't mask whether the
   same failure reproduces elsewhere.

2. **paths-ignore** — skip CI runs on README / CHANGELOG / docs /
   website / assets / .md / .mdx pushes. ~half the runner minutes
   back on doc-only churn. Source / config / workflow changes
   always run.

3. **concurrency + cancel-in-progress** — PR force-pushes cancel
   in-flight runs instead of piling them up. Push to main protected
   (concurrency group still scoped to ref, no cancel for main pushes).

Plus minor hardening: persist-credentials: false on the checkout
step so the GITHUB_TOKEN doesn't land in .git/config.

What was NOT lifted (rationale per plan):
- Per-package reusable workflows (Rust/Python/Homebrew — non-TS).
- License-header check (no per-file Apache banners in agentmemory).
- CLA bot (defer until external PR volume justifies friction).
- tsc --noEmit lint job (codebase has ~10 pre-existing type errors
  tsdown skips; gating CI on those would block every PR until
  fixed; tracked as separate cleanup).
- Smoke test (`agentmemory demo + livez`) — defer to its own PR
  with its own validation cycle.
- Codecov badge — defer until baseline is set.

* ci(windows): force bash shell so build script's POSIX idioms work

Windows runners default to cmd.exe for npm run scripts; the build
script uses POSIX idioms (`cp ... 2>/dev/null || true`, `mkdir -p`)
that cmd doesn't parse. ubuntu + macos already use bash by default,
so this is a Windows-only behaviour change.

Alternative: rewrite the build script in Node. Bigger lift, not
minimal.

* ci(windows): point npm script-shell at git-bash before build

`shell: bash` on the step only sets the shell for the step's own
runner; `npm run` still spawns its inner script via npm's
`script-shell` config, which defaults to cmd.exe on Windows.

Configure npm to use Git-Bash (preinstalled on GitHub-hosted
Windows runners) so `npm run build` and `npm run test` execute
the build script the same way ubuntu + macos do.

Step is gated on `runner.os == 'Windows'` so it's a no-op on the
other matrix cells.

* ci: drop windows-latest from matrix (obsidian-export hardcoded POSIX paths)

Windows runners fail on test/obsidian-export.test.ts because the
test + src hardcode `/tmp/...` POSIX paths that don't resolve on
the D:\ drive Windows uses. Fixing it cleanly requires reworking
src/functions/obsidian-export.ts to use os.tmpdir() + path.join,
which is a separate scope.

Drop windows from the matrix for now. Ship ubuntu + macos coverage
(real darwin/linux divergence catch) and file a follow-up to make
obsidian-export cross-platform so Windows can be added back.
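
A minimal sketch of the `os.tmpdir()` + `path.join` approach mentioned above, assuming a Node runtime. The helper name `exportScratchDir` and the `agentmemory-obsidian` directory name are hypothetical and are not taken from src/functions/obsidian-export.ts.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: builds a scratch path under the platform temp
// directory instead of a hardcoded "/tmp/..." string.
function exportScratchDir(label: string): string {
  // os.tmpdir() returns the OS temp directory on every platform, and
  // path.join uses the correct separator for that platform.
  return path.join(os.tmpdir(), "agentmemory-obsidian", label);
}

const scratch = exportScratchDir("vault-test");
console.log(scratch);
```

The same helper could be shared by the test and the source file so both resolve the same location on Windows, macOS, and Linux.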

* test(fs-watcher): bump waits to 1500ms + describe retry for macos fsevents flake
…rpus (rohitg00#562)

* feat(eval): pluggable benchmark harness with in-house coding-agent corpus

Adds eval/ tree (outside files field so npm tarball stays thin) with Adapter
interface, three reference adapters (grep / vector / agentmemory-hybrid),
two benchmarks (LongMemEval _s public, coding-agent-life-v1 in-house 15
sessions), scoring (P@K, R@K, hit, top-gold-rank), NDJSON output,
sandbox script.

coding-agent-life-v1 published scorecard at
docs/benchmarks/2026-05-20-coding-agent-life-v1.md:
agentmemory-hybrid R@5=0.967 P@5=0.578 (100% hit) vs grep R@5=0.967 P@5=0.267.
2.2x better precision on identical input, sandbox-reproducible.

Adapter contract: init(sessions, config) -> State; query(q, state, k) -> RankedDoc[]
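
The contract above could be typed roughly as follows. This is a sketch under assumptions: only the `init` and `query` signatures come from the commit message, while the `EvalSession` and `RankedDoc` field names, the synchronous return types, and the toy adapter are illustrative guesses rather than the harness's real definitions.

```typescript
// Assumed shapes for illustration only.
interface EvalSession { id: string; text: string }
interface RankedDoc { sessionId: string; score: number }

interface Adapter<State, Config> {
  init(sessions: EvalSession[], config: Config): State;
  query(q: string, state: State, k: number): RankedDoc[];
}

// Toy grep-style adapter: scores a session by how many query terms it contains.
const toyGrepAdapter: Adapter<EvalSession[], Record<string, never>> = {
  init: (sessions) => sessions,
  query: (q, state, k) => {
    const terms = q.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
    return state
      .map((s) => ({
        sessionId: s.id,
        score: terms.filter((t) => s.text.toLowerCase().includes(t)).length,
      }))
      .filter((d) => d.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  },
};

const demoState = toyGrepAdapter.init(
  [
    { id: "s1", text: "Pinned iii-sdk to an exact version" },
    { id: "s2", text: "Fixed the PostToolUse hook payload field" },
  ],
  {},
);
const demoHits = toyGrepAdapter.query("hook payload", demoState, 5);
console.log(demoHits);
```

Keeping `State` generic lets each adapter hold whatever index it needs without the harness knowing its structure.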

npm scripts:
  npm run eval:coding-life   (no download, no API key for grep)
  npm run eval:longmemeval   (needs OPENAI key + 278MB download)

eval/scripts/sandbox.sh boots clean agentmemory + iii-engine on ports
3411/3412 with isolated data dir; tears down on exit.

README headline updated. 1072/1072 tests pass + 5 new eval tests.

* fix(eval): address review findings on benchmark harness

- agentmemory adapter: prefer row.sessionId before observationToSession lookup
- vector adapter: validate embedBatch response (length, indexes, non-empty rows)
- coding-life: positive-int guard on --k; wrap query loop in try/finally so teardown runs
- longmemeval: positive-int guards on --k/--limit/--stratify; per-question try/finally
- load: throw on haystack_session_ids vs haystack_sessions length mismatch
- score: P@K denominator is k (requested cutoff) not topK.length
- sandbox.sh: guard rm -rf with non-empty + /tmp/ prefix check
- README: drop unsafe rm "$(which iii)"; instruct ~/.local/bin + PATH instead; add language tag to repo-layout fenced block
- sessions.json: fix "two-phase" -> "three-phase" wording mismatch
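
The P@K fix in the list above (denominator is the requested cutoff `k`, not the number of results returned) can be sketched as below. The function names are hypothetical, the positive-integer guard mirrors the `--k` guard described above, and the sketch assumes ranked IDs are unique.

```typescript
// Hypothetical scoring helpers; the harness's real scoring code is not shown here.
function assertPositiveInt(k: number): void {
  if (!Number.isInteger(k) || k <= 0) {
    throw new RangeError("k must be a positive integer");
  }
}

function precisionAtK(ranked: string[], gold: Set<string>, k: number): number {
  assertPositiveInt(k);
  const hits = ranked.slice(0, k).filter((id) => gold.has(id)).length;
  // Divide by k even when fewer than k results came back, so a short
  // result list cannot inflate precision.
  return hits / k;
}

function recallAtK(ranked: string[], gold: Set<string>, k: number): number {
  assertPositiveInt(k);
  if (gold.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => gold.has(id)).length;
  return hits / gold.size;
}

const goldIds = new Set(["a", "c"]);
console.log(precisionAtK(["a", "b"], goldIds, 5)); // 1 hit / k=5
console.log(recallAtK(["a", "b"], goldIds, 5)); // 1 hit / 2 gold
```

With `topK.length` as the denominator, the first call would have reported 0.5 instead of 0.2 for an adapter that returned only two documents.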
…hitg00#509) (rohitg00#564)

Codex Desktop currently does not dispatch plugin-local hooks.json even
though both CodexHooks and PluginHooks feature flags are stable +
default-enabled in codex-rs/features/src/lib.rs (openai/codex#16430).
MCP tools still work; lifecycle observations are silently missing.

Adds `agentmemory connect codex --with-hooks` which mirrors the bundled
hooks.codex.json into the user-scope ~/.codex/hooks.json:

- Resolves ${CLAUDE_PLUGIN_ROOT} to the absolute bundled plugin/ path
  (user-scope hooks don't get plugin-root injection)
- Idempotent merge: previous agentmemory entries are stripped on
  reinstall via the resolved scripts/ path prefix; unrelated user
  hooks are preserved untouched
- Preserves matcher fields from the bundled manifest so PreToolUse
  routing still works
- findPluginRoot walks up from import.meta.url to locate the plugin/
  dir; works for both dist/cli.mjs (bundled) and src/ (dev) layouts
- Dry-run path previews both TOML and hooks.json changes
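
The idempotent merge described above can be sketched as follows. The `HooksFile` shape, the `Bash` matcher, the script name, and the plugin path are all assumptions for illustration; the real hooks.json schema and the `connect codex` implementation may differ. The sketch also assumes forward-slash paths.

```typescript
// Assumed, simplified hooks shape: event name -> list of hook entries.
interface HookEntry { matcher?: string; command: string }
type HooksFile = Record<string, HookEntry[]>;

function mergeAgentmemoryHooks(
  userHooks: HooksFile,
  bundled: HooksFile,
  pluginRoot: string,
): HooksFile {
  const scriptsPrefix = `${pluginRoot}/scripts/`;
  const merged: HooksFile = {};
  // Keep unrelated user hooks; strip entries left by a previous install.
  for (const [event, entries] of Object.entries(userHooks)) {
    merged[event] = entries.filter((e) => !e.command.includes(scriptsPrefix));
  }
  // Append bundled entries with the placeholder resolved to an absolute
  // path. The spread keeps any matcher field from the bundled manifest.
  for (const [event, entries] of Object.entries(bundled)) {
    const resolved = entries.map((e) => ({
      ...e,
      command: e.command.split("${CLAUDE_PLUGIN_ROOT}").join(pluginRoot),
    }));
    merged[event] = [...(merged[event] ?? []), ...resolved];
  }
  return merged;
}

const bundledHooks: HooksFile = {
  PreToolUse: [
    { matcher: "Bash", command: "node ${CLAUDE_PLUGIN_ROOT}/scripts/pre-tool-use.mjs" },
  ],
};
const userHooksBefore: HooksFile = { PreToolUse: [{ command: "echo mine" }] };
const root = "/opt/agentmemory/plugin";
const onceMerged = mergeAgentmemoryHooks(userHooksBefore, bundledHooks, root);
const twiceMerged = mergeAgentmemoryHooks(onceMerged, bundledHooks, root);
console.log(JSON.stringify(twiceMerged, null, 2));
```

Running the merge a second time on its own output yields the same file, which is the property that makes reinstalling safe.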

Closes rohitg00#509.
* fix(deps): pin iii-sdk to 0.11.2 to avoid routing regression in 0.11.6

iii-sdk@0.11.6 changes routing behavior so that all /agentmemory/* routes
return 404 against the iii-engine, even though both versions still
satisfy the previous "^0.11.2" semver range. npm picked up the new
version on `npm install -g @agentmemory/agentmemory` after 0.9.21
shipped, silently breaking installs.

Two pin sites:

1. package.json — caret -> exact "0.11.2" so npm cannot drift forward
   on patch releases until the upstream regression is sorted.

2. src/cli.ts — `agentmemory setup` previously ran
   `pnpm up iii-sdk@latest` / `npm install iii-sdk@latest`, which would
   re-pull 0.11.6+ even after a freshly-pinned install. Both call sites
   now pin to 0.11.2 with a label referencing this issue.
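
To illustrate why the caret range admitted the regressed release while an exact pin does not, here is a deliberately small sketch of caret matching for `^0.Y.Z` ranges with `Y > 0` (which semver treats as `>=0.Y.Z <0.(Y+1).0`). It is not the node-semver implementation, and the function names are hypothetical.

```typescript
function parseVersion(v: string): [number, number, number] {
  const [major = NaN, minor = NaN, patch = NaN] = v.split(".").map(Number);
  if (![major, minor, patch].every((n) => Number.isInteger(n) && n >= 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [major, minor, patch];
}

// Sketch only: handles ^0.Y.Z with Y > 0, where only the patch may move.
function satisfiesZeroCaret(version: string, base: string): boolean {
  const [vMaj, vMin, vPat] = parseVersion(version);
  const [bMaj, bMin, bPat] = parseVersion(base);
  if (bMaj !== 0 || bMin === 0) {
    throw new Error("sketch only covers ^0.Y.Z with Y > 0");
  }
  return vMaj === 0 && vMin === bMin && vPat >= bPat;
}

// An exact pin accepts one version and nothing else.
function satisfiesExact(version: string, pinned: string): boolean {
  return version === pinned;
}

console.log(satisfiesZeroCaret("0.11.6", "0.11.2")); // true: caret lets 0.11.6 in
console.log(satisfiesExact("0.11.6", "0.11.2")); // false: exact pin blocks it
```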

Tests (1081) + build pass against iii-sdk@0.11.2.

Closes rohitg00#555.

* fix(cli): drop issue ref from iii-sdk pin label

Source labels should describe what the code does, not point at issues
that rot as the codebase evolves. Issue context lives in the PR body.
…ohitg00#561)

* fix: read tool_response instead of tool_output in PostToolUse hook

Claude Code's PostToolUse payload sends the field as `tool_response`,
not `tool_output`. The hook was reading `data.tool_output` which is
always undefined, so `cleanOutput` was undefined, the observe request
contained no `tool_output` value, and mem::compress consistently failed
its XML schema validation (requires narrative >= 10 chars + facts >= 1).

Fix: read `data.tool_response` with `data.tool_output` as a fallback
so older integrations that emit the legacy field name keep working.
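
The fallback read described above amounts to something like the following sketch. The payload interface and the helper name are assumptions; only the two field names come from the commit message.

```typescript
// Assumed payload shape for illustration.
interface PostToolUsePayload {
  tool_response?: unknown;
  tool_output?: unknown;
}

function readToolOutput(data: PostToolUsePayload): unknown {
  // Prefer the current field name; fall back to the legacy one.
  // `??` only falls back on null/undefined, so an empty-string
  // tool_response is still used as-is.
  return data.tool_response ?? data.tool_output;
}

console.log(readToolOutput({ tool_response: "new" }));
console.log(readToolOutput({ tool_output: "legacy" }));
```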

Fixes rohitg00#539

* style: remove explanatory comment per repo guidelines
…body (rohitg00#526)

OpenAI API spec defines `stream` as defaulting to false when absent, so
the current code (which omits it) should yield JSON. Some OpenAI-compatible
proxies disagree and default to text/event-stream, which crashes the
`response.json()` parser below with:

  Unexpected token 'd', "data: {"id"... is not valid JSON

After a few of these in a row, the resilient wrapper's circuit breaker
trips and all subsequent compression calls fail with `circuit_breaker_open`,
silently disabling LLM-backed compression / summarisation / reflection.

Reproduced upstream in decolua/9router#1260: 9Router's `handleChatCore`
returns SSE unless `stream: false` is explicit. PR
decolua/9router#1272 fixes the proxy side, but
sending the field explicitly here is defensive — other OpenAI-compatible
endpoints (older self-hosted proxies, vLLM compat shims, …) hit the same
spec gap.

No behavior change for spec-compliant endpoints (openai.com, Azure
OpenAI, well-behaved proxies): they already default to non-streaming
when `stream` is absent, so making it explicit is a no-op there.
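
A minimal sketch of sending the field explicitly, assuming a plain chat-completions request body. The helper names and the model string are hypothetical, and the content-type guard is an extra defensive idea that is not part of this commit.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: always states stream: false rather than relying
// on the endpoint's default.
function buildChatCompletionBody(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false as const };
}

// Optional guard (not in the commit): spot an SSE reply before calling
// response.json(), so the failure is reported clearly.
function looksLikeEventStream(contentType: string | null): boolean {
  return (contentType ?? "").toLowerCase().includes("text/event-stream");
}

const requestBody = buildChatCompletionBody("example-model", [
  { role: "user", content: "Summarise this session." },
]);
console.log(JSON.stringify(requestBody));
```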

Co-authored-by: Ptah-CT <221234802+Ptah-CT@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…tg00#560)

* fix(cli): accurately display bound viewer port on splash screen
- Expose viewerPort and viewerSkipped state in /agentmemory/livez endpoint.
- Update CLI readiness check to poll until the viewer port is bound or explicitly skipped.
- Prevents misleading default port (3113) display on splash screen when the viewer falls back to another port.
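
The readiness decision in the list above could look roughly like this. The `viewerPort` and `viewerSkipped` names come from the commit, but their types, the `ViewerState` union, and the function name are assumptions.

```typescript
// Assumed livez fields; exact types in the real endpoint may differ.
interface LivezStatus {
  viewerPort?: number | null;
  viewerSkipped?: boolean;
}

type ViewerState =
  | { kind: "pending" }
  | { kind: "skipped" }
  | { kind: "bound"; port: number };

function viewerStateFrom(status: LivezStatus): ViewerState {
  if (status.viewerSkipped === true) return { kind: "skipped" };
  if (typeof status.viewerPort === "number") {
    return { kind: "bound", port: status.viewerPort };
  }
  // Neither bound nor skipped yet: the caller keeps polling instead of
  // printing the default port.
  return { kind: "pending" };
}

console.log(viewerStateFrom({ viewerPort: 3114 }));
```

The splash screen would then print the port only in the `bound` case, which avoids showing 3113 when the viewer fell back to another port.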

* fix(viewer): address CodeRabbitAI review
Signed-off-by: aqilaziz <gonzes7@gmail.com>
Ensures pre-tool-use only forwards string session IDs and falls back to unknown for invalid Copilot payload values, with regression coverage for the generated plugin script.
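
A minimal sketch of that guard, assuming the fallback literal is the string "unknown" as stated above. The helper name is hypothetical, and this version passes empty strings through unchanged.

```typescript
// Hypothetical helper: forwards only string session IDs.
function normalizeSessionId(value: unknown): string {
  return typeof value === "string" ? value : "unknown";
}

console.log(normalizeSessionId("abc-123"));
console.log(normalizeSessionId({ id: 7 }));
```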

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves conflicts with current main, keeps Copilot CLI support intact, and preserves the Codex hook idempotency fix from main.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add sudo for global installation command

* Update README.md

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update README with EACCES retry instructions

Added installation instructions for macOS/Linux users.

* docs: drop backticks around package name inside bash code fence

Backticks inside a ```bash fenced block are still copy-pasted literally
by users, and bash interprets them as command substitution. The package
name in the install line had decorative backticks that turn into a shell
syntax error when pasted as-is.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rohit Ghumare <ghumare64@gmail.com>
Keeps upstream install guidance and retains Copilot CLI in the supported agent list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>