feat: reasoning traces (System 2 Memory) — ADR-003#126
Phase 1: Data Model
- Migration 065: reasoning_traces table (steps JSONB, quality_score, task_context)
- GORM model ReasoningTrace with BeforeCreate hook
- ReasoningTraceStore with Create/GetBySession/SearchByProject
Phase 2: Extraction
- reasoning_detector.go: DetectReasoning() requires 3+ pattern matches in 200+ char text
- Extraction + quality evaluation LLM prompts
- Async extraction in ProcessObservation (non-blocking goroutine)
- Quality threshold ≥ 0.5 to store
Phase 3: MCP Integration
- recall(action="reasoning"): searches traces by project, formats with step types
- "reasoning" added to recall tool action enum
- Wired into worker service (processor + MCP server)
ADR-003 implemented. Inspired by Cipher's System 2 dual memory architecture.
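The detection rule and storage gate from Phase 2 can be sketched as follows. This is a minimal illustration, not the code in reasoning_detector.go: the pattern list is hypothetical, and it assumes "3+ pattern matches" means three distinct patterns and "200+ char" means at least 200 characters.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasoningPatterns is a hypothetical marker list; the PR does not show the real one.
var reasoningPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)\bbecause\b`),
	regexp.MustCompile(`(?i)\btherefore\b`),
	regexp.MustCompile(`(?i)\bhowever\b`),
	regexp.MustCompile(`(?i)\bfirst\b`),
	regexp.MustCompile(`(?i)\balternatively\b`),
}

const (
	minChars   = 200 // "200+ char text"
	minMatches = 3   // "3+ pattern matches"
	minQuality = 0.5 // "Quality threshold ≥ 0.5 to store"
)

// DetectReasoning reports whether text is long enough and matches at least
// minMatches distinct patterns (assumed interpretation).
func DetectReasoning(text string) bool {
	if len([]rune(text)) < minChars {
		return false
	}
	matched := 0
	for _, p := range reasoningPatterns {
		if p.MatchString(text) {
			matched++
		}
	}
	return matched >= minMatches
}

// ShouldStore applies the quality gate to an LLM-assigned score in [0, 1].
func ShouldStore(quality float64) bool {
	return quality >= minQuality
}

func main() {
	fmt.Println(DetectReasoning("too short"), ShouldStore(0.5))
}
```

Running the cheap regex check before any LLM call keeps the extraction goroutine from being spawned for responses that are unlikely to contain a reasoning chain.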
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Caution: Review failed. The pull request is closed.
Walkthrough
Adds support for extracting, scoring, and storing reasoning chains: a new DB table and GORM model, a store for reasoning traces, asynchronous extraction and scoring logic in the SDK processor with LLM calls, and a recall search action in the MCP server that displays stored chains.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Processor as SDK Processor
participant Detector as Reasoning Detector
participant LLM
participant DB as ReasoningTraceStore
Client->>Processor: ProcessObservation(response)
Processor->>Detector: DetectReasoning(response)
Detector-->>Processor: bool (detected?)
alt Detected && reasoningStore configured && storedCount>0
Processor->>Processor: spawn goroutine extractAndStoreReasoning
Processor->>LLM: Extract chain (reasoningExtractionPrompt + response)
LLM-->>Processor: JSON trace
Processor->>LLM: Score trace (reasoningQualityPrompt + trace)
LLM-->>Processor: quality (0..1)
alt quality >= 0.5
Processor->>DB: Create(ReasoningTrace)
DB-->>Processor: id / error
end
end
sequenceDiagram
participant User
participant MCP as MCP Server
participant Store as ReasoningTraceStore
participant DB
User->>MCP: recall(action="reasoning", project, limit)
MCP->>Store: SearchByProject(project, limit)
Store->>DB: SELECT .. WHERE project=? AND quality_score>=0.5 ORDER BY ...
DB-->>Store: []ReasoningTrace
Store-->>MCP: traces
MCP->>MCP: unmarshal steps JSON, format output
MCP-->>User: formatted reasoning chains
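The "unmarshal steps JSON, format output" step in the diagram above might look like the sketch below. The Step shape (type and content fields) and the output layout are assumptions; the PR only says that traces are formatted with step types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Step is an assumed shape for one element of the steps JSONB column.
type Step struct {
	Type    string `json:"type"`
	Content string `json:"content"`
}

// FormatSteps unmarshals a steps JSON array and renders one numbered line per step.
func FormatSteps(raw []byte) (string, error) {
	var steps []Step
	if err := json.Unmarshal(raw, &steps); err != nil {
		return "", fmt.Errorf("unmarshal steps: %w", err)
	}
	var b strings.Builder
	for i, s := range steps {
		fmt.Fprintf(&b, "%d. [%s] %s\n", i+1, s.Type, s.Content)
	}
	return b.String(), nil
}

func main() {
	raw := []byte(`[{"type":"hypothesis","content":"cache is stale"},{"type":"conclusion","content":"invalidate on write"}]`)
	out, err := FormatSteps(raw)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```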
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.11.4)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
@coderabbitai review
@gemini-code-assist review
✅ Actions performed: Review triggered.
@codex review
* refactor: move max_tokens from hardcoded 4096 to ENGRAM_LLM_MAX_TOKENS (#49)
Configurable via env var ENGRAM_LLM_MAX_TOKENS (default: 4096).
Stored in config.Config.LLMMaxTokens and OpenAIConfig.MaxTokens.
Removes magic number from LLM client.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* docs: add pre-commit guardrails + re-benchmark tech debt items
* fix: scoring formulas — guidance type weight/decay + meaningful total_results (#50)
- Add type=guidance to typeWeights (1.8, highest) and typeHalfLife (365 days)
- Behavioral rules no longer decay in 7 days or get default weight 1.0
- sourceBoost 1.3 for LLM-extracted guidance (live user_behavior detection)
- total_results now counts observations with composite score > 0.05
(was raw DB count — in high-dim space all observations passed threshold,
showing "33 matches" for every query regardless of relevance)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: exclude behavioral rules from contradiction detection (#51)
Imported feedback rules (type=decision, concept=user-preference, title
starts with "Rule:") were all classified as contradicting each other
because classifyRelation marks any two decisions with different titles
and similarity > 0.7 as contradicts. 57 rules × 56 peers = 3,192 false
contradiction edges in the knowledge graph.
Added hasGuidanceConcept() check: skips contradiction detection for
observations that are behavioral rules (type=guidance, or concept
user-preference, or title prefix "Rule:").
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
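The three skip conditions can be sketched as a single predicate. The Observation struct and the function signature are assumptions for illustration; the conditions themselves (type=guidance, concept user-preference, title prefix "Rule:") are taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Observation is a reduced, hypothetical view of the real model.
type Observation struct {
	Type     string
	Title    string
	Concepts []string
}

// hasGuidanceConcept reports whether an observation is a behavioral rule
// and should therefore be skipped by contradiction detection.
func hasGuidanceConcept(o Observation) bool {
	if o.Type == "guidance" || strings.HasPrefix(o.Title, "Rule:") {
		return true
	}
	for _, c := range o.Concepts {
		if c == "user-preference" {
			return true
		}
	}
	return false
}

func main() {
	rule := Observation{Type: "decision", Title: "Rule: always run tests"}
	fmt.Println(hasGuidanceConcept(rule))
}
```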
* chore: update marketplace for v1.6.4
* chore: update marketplace for v1.6.5
* fix: filter heartbeat and Telegram metadata from user prompts (#52)
Skip HEARTBEAT.md polling (openclaw every 30min) and Telegram
conversation/sender metadata from being stored as user prompts.
These are system-generated, not real user interactions.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: register PreCompact hook and add discovery logging
PreCompact hook was created but never registered in hooks.json.
Now registered with 10s timeout. Hook writes discovery data to
.agent/pre-compact-discovery.json for empirical testing of
available input fields (transcript_path verification for FR-2).
* feat: always-inject tier for behavioral rules (FR-1, FR-6)
Three-tier injection system: observations tagged with concept
"always-inject" are now fetched independently of similarity
matching and included in every session (session-start) and
every prompt (user-prompt) context.
Server changes:
- GetAlwaysInjectObservations query (concepts @> GIN index)
- GIN indexes on concepts, files_modified, files_read columns
- Migration 048 for all new indexes
- handleContextInject + handleSearchByPrompt return always_inject array
- AlwaysInjectLimit (default 20) and ProjectInjectLimit (default 15) config
Hook changes:
- session-start.js renders <user-behavior-rules> block before <engram-context>
- user-prompt.js merges always-inject + similarity-matched rules with dedup
- Plugin version bumped to 0.6.0
Also adds GetObservationsByFile and GetPreviousObservationInSession
queries for Phase 3 and Phase 4 (no callers yet).
* feat: PreCompact hook sends full transcript to backfill (FR-2)
Reads transcript JSONL at compaction time, parses all user/assistant
messages, and sends to /api/backfill/session in chunks of 50 messages.
Fire-and-forget with 5s timeout per chunk (Constitution Principle 3).
Fallback: if input.transcript_path is missing, derives path by
searching ~/.claude/projects/<hash>/<session>.jsonl.
Also writes discovery report to .agent/pre-compact-discovery.json
for empirical verification of available hook input fields.
* feat: PreToolUse file-context injection (FR-3)
New hook and endpoint for automatic file-specific knowledge
injection before Edit/Write operations.
Server:
- GET /api/context/by-file endpoint (handlers_context_file.go)
- Returns observations matching files_modified/files_read
- Graceful degradation: empty response on error (NFR-3)
Hook:
- pre-tool-use.js matches Edit/Write tools only
- Extracts file_path, queries /api/context/by-file
- Returns <file-context> XML block as systemMessage
- 200ms timeout with empty fallback
- Registered in hooks.json with "Edit|Write" matcher
* feat: causal chain linking — follows + prompted_by relations (FR-4, FR-5)
Observations within the same session are now automatically linked:
- "follows" relation: connects consecutive observations by prompt_number
- "prompted_by" relation: links observation to the user prompt that triggered it
Both relations are created via pure DB queries (< 10ms overhead per
observation, NFR-4) during the existing relation detection pipeline.
Changes:
- relation/detector.go: add temporal + prompt linking before similarity search
- prompt_store.go: add GetPromptForObservation query
- service.go: pass promptStore to NewDetector constructor
* refactor: extract shared normalizeEngramContent helper and normalize write-tool check
- Create plugin/openclaw-engram/src/hooks/content.ts with normalizeEngramContent()
centralizing stripEngramContext + CONTENT_MAX_CHARS truncation used by both
before-compaction and session-end hooks (eliminates duplicate implementations)
- Update before-compaction.ts and session-end.ts to import and use the shared helper
- Simplify WRITE_TOOLS Set to lowercase-only entries and normalize via
toolName.toLowerCase() in isWriteOrEdit() for reliable case-insensitive matching
* fix: convert text columns to jsonb before GIN index creation
Migration 048 failed because concepts, files_modified, files_read
were stored as text type. The GIN indexes backing @> containment queries require jsonb.
Fix: ALTER COLUMN TYPE jsonb USING COALESCE(col::jsonb, '[]'::jsonb)
before CREATE INDEX. Also update GORM model tags from type:text to
type:jsonb for consistency.
* fix: Phase 1 — Security & Reliability (P0) (#57)
* fix: security and reliability improvements (Phase 1 T001-T005)
- T001/T002: Apply privacy.RedactSecrets to LLM extraction output
before parsing observations (Constitution P9 fix). Both live
extraction (processor.go) and backfill (handlers_backfill.go).
- T003: Expand CSP headers from `default-src 'self'` to full
directive set with script/style/connect/img/font/frame rules.
- T004: Add truncated args (200 chars) to MCP tool call error log.
- T005: Add diagnostic state (llmClient configured status) to
callLLM error messages for debugging.
* feat: MCP health monitoring, bounded semaphore, fire-and-forget vault (T006-T010)
- T006: New internal/mcp/health.go — atomic request/error counters
with 5-minute sliding window for MCP endpoint monitoring
- T007: GET /api/mcp/health public endpoint registered
- T008: Streamable HTTP handler wired to health counters
- T009: Removed nil-semaphore unbounded goroutine fallback — always
use bounded semaphore, drop on overflow with warning
- T010: vaultStoreDetectedSecrets now fire-and-forget with 3s timeout
goroutine (Constitution P3 compliance)
* fix: address PR review findings — CSP hardening, args redaction, race fix
- CSP: add object-src 'none' + base-uri 'self' per Gemini review
- Redact args in error log before logging (prevent secret leakage)
- Fix TOCTOU race in MCPHealth.rotateWindowIfNeeded with CompareAndSwap
- TODO: migrate unsafe-inline to nonce/hash-based CSP
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
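The "always use bounded semaphore, drop on overflow" behaviour from T009 can be sketched with a buffered channel. The Limiter type and its methods are hypothetical names, not the project's API; the real code also logs a warning when it drops work.

```go
package main

import (
	"fmt"
	"sync"
)

// Limiter bounds the number of concurrently running background tasks.
type Limiter struct {
	slots chan struct{}
	wg    sync.WaitGroup
}

func NewLimiter(n int) *Limiter {
	return &Limiter{slots: make(chan struct{}, n)}
}

// TryGo runs fn in a goroutine if a slot is free. When all slots are
// taken it drops the task and returns false instead of blocking.
func (l *Limiter) TryGo(fn func()) bool {
	select {
	case l.slots <- struct{}{}:
		l.wg.Add(1)
		go func() {
			defer l.wg.Done()
			defer func() { <-l.slots }() // free the slot before signalling Done
			fn()
		}()
		return true
	default:
		return false
	}
}

// Wait blocks until all accepted tasks have finished.
func (l *Limiter) Wait() { l.wg.Wait() }

func main() {
	lim := NewLimiter(2)
	for i := 0; i < 5; i++ {
		i := i
		if !lim.TryGo(func() { fmt.Println("task", i) }) {
			fmt.Println("dropped task", i, "(limiter full)")
		}
	}
	lim.Wait()
}
```

Dropping instead of queueing keeps the hot path non-blocking, at the cost of losing work under sustained overload.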
* feat: dashboard REST endpoints + MCP tool aliases (Phase 2+4) (#58)
Dashboard backend (Phase 2):
- POST /api/observations/batch-tag — bulk tag add/remove
- DELETE /api/observations/bulk — bulk delete observations
- PATCH /api/observations/bulk-scope — bulk scope change
- GET /api/observations/tag-cloud — top tags with counts
- GET /api/auth/tokens/:id/stats — per-token usage stats
- auth_disabled field in /api/auth/me response
MCP tools (Phase 4):
- find_by_file_context — wraps GetObservationsByFile
- include_all parameter for tools/list (+ cursor: "all" compat)
- Vault aliases: vault_store, vault_get, vault_list, vault_delete
- Document aliases: doc_list_collections, doc_get, doc_ingest, etc.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: Phases 2-5 + 8a/8b/8c/8d — dashboard, self-learning, consistency, documents (#59)
* feat: dedup threshold, manual search signal, install.sh client-only, docs fix
- T029: Raise DedupSimilarityThreshold from 0.55 to 0.7 (pre-test confirmed safe)
- T031: Add manual search feedback signal in stop.js — detects engram
tool usage during session, sends insufficient_injection signal
- T038: install.sh defaults to --client-only (skips engram-server binary)
- T039: Fix cmplus-server naming in DEPLOYMENT.md to engram-server
* feat: intentional links + file→observation graph edges (FR-36, FR-37)
- T055: Parse [[obs:1234]] syntax in narratives → create bidirectional
references/referenced_by graph edges
- T056: files_modified/files_read entries → modifies/reads graph edges
using FNV-1a hash of file path as stable node ID
- Both integrated into existing Detect() pipeline (event-driven async)
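A sketch of the two building blocks described above: extracting [[obs:N]] links from a narrative and deriving a stable node ID from a file path with FNV-1a. The function names and the 64-bit hash width are assumptions; the commit only names the link syntax and the hash family.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strconv"
)

var obsLink = regexp.MustCompile(`\[\[obs:(\d+)\]\]`)

// parseObsLinks returns the observation IDs referenced as [[obs:N]].
func parseObsLinks(narrative string) []int64 {
	var ids []int64
	for _, m := range obsLink.FindAllStringSubmatch(narrative, -1) {
		id, err := strconv.ParseInt(m[1], 10, 64)
		if err != nil {
			continue // skip IDs that overflow int64
		}
		ids = append(ids, id)
	}
	return ids
}

// fileNodeID hashes a file path with 64-bit FNV-1a so the same path
// always maps to the same graph node ID.
func fileNodeID(path string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(path)) // hash.Hash.Write never returns an error
	return h.Sum64()
}

func main() {
	fmt.Println(parseObsLinks("Supersedes [[obs:1234]], see also [[obs:7]]."))
	fmt.Println(fileNodeID("internal/relation/detector.go"))
}
```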
* feat: add GetCluster to GraphStore interface (FR-38)
- GraphStore interface: GetCluster(nodeID, maxNodes) returns cluster IDs
- FalkorDB: BFS traversal up to 3 hops with LIMIT
- NoopGraphStore: returns empty slice
* feat: LLM causal classifier for error→fix and correction linking (FR-44/45)
- New causal_classifier.go: LLM prompt classifies observation pairs as
fixed_by, corrects, or unrelated
- Wired into Detect() pipeline: triggers for bugfix/guidance types on
top-3 similarity candidates only (~1 LLM call per 5 observations)
- SetCausalClassifier() method on Detector (opt-in, nil = disabled)
- ShouldClassify() filter: only bugfix and guidance types
* feat: migration 051 — documents + document_comments tables (FR-46)
Foundation for AI agent collaboration platform:
- documents: versioned, typed (markdown/task/review/decision),
JSONB metadata (assignee/status/priority), author attribution
- document_comments: inline and general comments with line ranges
- Indexes: project+path+version, doc_type, document_id
* feat: Phase 2 frontend + Phase 3 self-learning + Phase 8a consistency + document store
Phase 2 Frontend (T017-T021):
- Bulk action dropdown (delete/scope/tag) in ObservationsView
- Tag cloud sidebar with clickable filters
- Per-token stats (request count, last used) in TokensView
- Auth-disabled warning badge in AppSidebar
- Vault encryption setup helper in SystemView
Phase 3 Self-Learning (T023-T028):
- Injection floor: always inject at least N observations (default 3)
- Cross-session priming: 1.3x boost for recent sessions
- Adaptive per-project relevance threshold (project_settings table)
- Feedback-driven threshold adjustment (used→lower, ignored→raise)
Phase 8a Consistency Engine (T050-T054):
- Orphan vector cleanup (vectors without observations)
- Missing vector detection (observations without embeddings)
- Stale relation cleanup (broken source/target references)
- FalkorDB↔PostgreSQL drift detection + auto re-sync
- Embedding model change detection via system_config table
Phase 8d Document Store (T061):
- VersionedDocumentStore with Create/ReadLatest/ReadVersion/List/
GetHistory/AddComment/GetComments GORM methods
- SHA-256 content hashing, version tracking, DISTINCT ON for latest
* fix: address 12 PR review findings on Phase 2-8 implementation
- project_settings: explicit error on DB failure (not silent 0.3)
- falkordb GetCluster: ParameterizedQuery instead of string interpolation
- migration 051: renamed to versioned_documents (avoid conflict with m017)
- versioned_document_store: time.Time fields, transaction on Create, table names
- detector: negative file node IDs, corrected causal classification pair order
- maintenance: fix SQL column names for stale relation cleanup
- install.sh: proper flag parsing (--full doesn't corrupt version arg)
- SystemView: vault copy error feedback
- AppSidebar: deduplicate auth/me fetch via useAuth composable
- ObservationsView: project-aware tag cloud + refresh after mutations
- observation_store: deduplicate GetTopImportanceObservations
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: document MCP tools + OpenClaw message classification (T062-T064, T047-T049) (#60)
Document MCP tools (T062):
- 6 new tools: doc_create, doc_read, doc_update, doc_list, doc_history, doc_comment
- VersionedDocumentStore wired into MCP server and service
- T063 skipped (memory_get not an MCP tool)
- T064: embedding integration point marked with TODO
OpenClaw message classification (T047-T049):
- New message-classifier.ts with allowlist approach for heartbeat/system detection
- before-prompt-build.ts + after-tool-call.ts updated to use classifier
- source: "openclaw" added to observation storage calls
- always_inject rendering verified in context injection
Bump openclaw-engram version 1.3.1 → 1.4.0
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.7.0
* fix: write pre-compact discovery JSON to project dir (#61)
* fix: write pre-compact discovery JSON to project dir, not plugin dir
The PreCompact hook used __dirname-relative path to write
pre-compact-discovery.json, which resolved to the plugin install
cache instead of the project .agent/ directory. Use ctx.CWD instead.
* fix: simplify projectDir fallback in pre-compact hook
ctx.CWD is already derived from input.cwd in lib.js with type safety,
making the intermediate input.cwd check redundant and potentially unsafe
(truthy non-string values would bypass ctx.CWD's type guarantee).
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: dashboard type filter, tag cloud, SSE auth (#62)
* fix: server-side type filter, add guidance type, fix tag cloud SQL
- Add `type` query param to GET /api/observations for server-side filtering
- Add obsType param to GetAllRecentObservationsPaginated and
GetObservationsByProjectStrictPaginated with optional WHERE clause
- Frontend: pass type to API, remove client-side filteredObservations filter
- Add `guidance` to ObservationType union, OBSERVATION_TYPES, TYPE_CONFIG
- Fix tag cloud SQL: COALESCE(is_superseded, 0) = 0 (bigint, not boolean)
* fix: support query param token auth for SSE EventSource
EventSource API cannot set custom headers. Add ?token= query param
fallback in auth middleware so SSE /api/events can authenticate.
* fix: address review findings — DRY query builder, restrict token query param
- Refactor observation store: extract buildBaseQuery helper to reduce
duplication between paginated functions
- Restrict query param token auth to SSE-only endpoints (/api/events,
/sse, /api/logs) instead of all routes
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: sidebar metrics use snake_case to match API response (#63)
RetrievalStats interface used PascalCase (TotalRequests) but API
returns snake_case (total_requests). Sidebar showed 0 for all
retrieval metrics. Fixed in api.ts, Sidebar.vue, AppSidebar.vue.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* docs: update TECHNICAL_DEBT.md with dashboard findings (#64)
Add entries for type filter (resolved), SDK extraction types,
and dashboard memories view feature request.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.7.1
* fix: reuse existing session in bulk import instead of creating phantom (#65)
- Add optional session_id to BulkImportRequest — if provided, uses
CreateSDKSession with that ID (idempotent: INSERT OR IGNORE + fetch)
- If not provided, falls back to bulk-import-{timestamp} (backward compat)
- OpenClaw engram-remember tool now passes ctx.sessionId to bulkImport
- Fixes 403+ phantom bulk-import-* sessions in openclaw project
- Bump openclaw-engram 1.4.0 → 1.4.1
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.7.2
* fix: migration 052 — delete phantom bulk-import sessions (#66)
Cleanup 403+ phantom sdk_sessions created by bulk-import before PR #65.
Deletes sessions matching 'bulk-import-%' with prompt_counter=0.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.7.3
* fix: LLM extraction now produces diverse observation types (#67)
Previously all extracted learnings were hardcoded to type=guidance.
Now:
- Prompt asks LLM to classify as guidance/decision/bugfix/discovery/
feature/refactor/change
- learningToObsType() maps LLM type to observation type
- Legacy signal field still supported as fallback
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: migration 053 — delete vault credentials with lost encryption key (#68)
All 15 credentials encrypted with auto-generated key that was lost
when Docker container was recreated. AES-256-GCM = unrecoverable.
Vault status confirmed mismatch_count=15 = credential_count=15.
Users will re-create credentials with current ENGRAM_VAULT_KEY.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.7.4
* feat: observation status lifecycle (v1.8 Phase 1) (#69)
* feat: observation status lifecycle — migration, model, API, MCP (Phase 1 backend)
- Migration 054: status TEXT DEFAULT 'active' + status_reason TEXT + index
- Observation model: Status + StatusReason fields in GORM and shared models
- ObservationUpdate: Status + StatusReason fields for edit_observation
- Paginated queries: status filter param (backward compat, "" = all)
- Context injection: COALESCE(status, 'active') = 'active' on all query paths
- handleGetObservations: ?status= query param
- edit_observation MCP tool: status (enum active/resolved) + status_reason
* feat: observation status lifecycle UI (Phase 1 frontend)
- TypeScript: status + status_reason fields on Observation interface
- API client: status param in fetchObservationsPaginated, updateObservationStatus()
- ObservationsView: status pill toggle (All/Active/Resolved), resolve button
with reason modal, resolved card styling (opacity-50 + line-through),
reopen button (green), bulk resolve, status_reason tooltip
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: dashboard Memories view — filter, card variant, inline edit, delete (Phase 2) (#70)
- Backend: memory_type filter in paginated queries ("any" = all memories, specific value = exact match)
- handleGetObservations: ?memory_type= query param
- Frontend: All/Memories toggle, memory cards with purple accent + brain icon,
scope badges (project/global), inline edit (title + narrative), delete with confirm
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: pattern insight — LLM summary + source observations (Phase 3) (#71)
- New: GET /api/patterns/{id}/observations — resolve observation_ids
- New: POST /api/patterns/{id}/insight — LLM summary with cache
- New: internal/learning/pattern_insight.go — GeneratePatternInsight
- Frontend: inline expand on pattern card (replaces useless modal),
skeleton loading, LLM summary + source observation list,
cache indicator, unavailable fallback with retry
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: pattern cleanup — orphan detection, confidence recalc, bulk archive (Phase 4) (#72)
- Orphan pattern detection: verify observation_ids against existing observations
- Batch confidence recalculation using existing formula
- POST /api/maintenance/patterns/cleanup with dry_run + threshold params
- Frontend: cleanup section with preview (dry_run) + confirm button
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark v1.8 items resolved in TECHNICAL_DEBT.md (#73)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: remove unused TypeScript imports in usePatterns (CI build fix) (#74)
fetchPatternInsight and legacyInsights were declared but never read.
vue-tsc strict mode treats these as errors.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.8.0
* fix: 3 post-v1.8.0 bugs — memories empty, insight timeout, filter mess (#75)
1. store_memory now sets memory_type via ClassifyMemoryType() — was never
populated, causing Memories tab to show "No observations found"
2. Pattern insight: 5s context timeout + nil LLM guard — was hanging
indefinitely when LLM model loading was slow
3. ObservationsView filter bar restructured: 2-row layout with project +
view mode on top, type filters + status pills below with divider.
Type filters hidden in Memories view.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.8.0
* fix: migration 055 — backfill memory_type for existing store_memory observations (#76)
Existing observations from store_memory (source_type='manual') had empty
memory_type. Classifies using same logic as ClassifyMemoryType():
type=guidance→guidance, concepts keywords→decision/pattern/preference/etc,
default→context. Enables Memories tab to show historical memories.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: increase pattern insight LLM timeout from 5s to 30s (#77)
Ollama cold start can take 10-30s for model loading. 5s was too
aggressive for interactive (non-hot-path) insight generation.
Extraction pipeline works because it has no timeout constraint.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: backfill NULL status to 'active', COALESCE in paginated queries (#78)
A1 anomaly: migration 054 ADD COLUMN DEFAULT only sets value for new rows.
708 existing observations had status=NULL. Dashboard "Active" filter
matched 0 because WHERE status='active' skips NULLs.
- Migration 055: UPDATE SET status='active' WHERE NULL
- Paginated queries: COALESCE(status, 'active') = ? for safety
- Renumbered memory_type backfill to migration 056
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: LLM API key falls back to embedding key when not set (#79)
ENGRAM_LLM_API_KEY was empty while ENGRAM_EMBEDDING_API_KEY was set.
Both point to same LiteLLM proxy but LLM chat completions sent
requests without auth → 401 → context deadline exceeded.
Now DefaultOpenAIConfig falls back to ENGRAM_EMBEDDING_API_KEY,
matching existing URL fallback pattern.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: pattern insight timeout 30s→120s for Ollama cold start (#80)
Ollama loads 9B model from disk in 30-60s on cold start.
qwen3-8b took 58s to respond with 20 tokens.
30s was not enough — increased to 120s for interactive insight.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.8.0
* fix: increase OpenClaw inject/search timeout 5s→15s to prevent cooldown (#81)
Root cause: OpenClaw client default timeout = 5s. Inject endpoint returns
80KB+ with vector search, taking 0.7-2s normally but longer under load.
3 consecutive AbortController timeouts → AvailabilityTracker cooldown 60s →
ALL engram tools disabled (search, decisions, store_memory, recall).
Fix: explicit 15s timeout for getContextInject + searchContext.
Other endpoints (health=3s, selfcheck=5s, mark-injected=3s) keep shorter timeouts.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: add elapsed time + abort reason to OpenClaw client error logs (#82)
Inject/search requests abort with "This operation was aborted" but no
timing data — impossible to tell if it's timeout (5s), connection
refused (immediate), or slow response (2-4s).
Now logs: "[engram] POST /api/context/inject failed after 5003ms (timeout=5000ms)"
Also includes elapsed time for HTTP errors and success path.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: closed-loop learning Phase 1 — outcome tracking + injection binding (#83)
* feat: closed-loop learning Phase 1 — outcome tracking + injection binding
Foundations for closed-loop self-learning (Agent Lightning integration):
Schema:
- Migration 057: sdk_sessions outcome/outcome_reason/outcome_recorded_at/injection_strategy
- Migration 058: observation_injections junction table with session + observation indexes
Backend:
- InjectionStore: batch record, query by session, TTL cleanup
- DetermineSessionOutcome heuristic: success (bugfix/feature obs), partial, abandoned
- POST /api/sessions/{id}/outcome endpoint
- set_session_outcome MCP tool
- handleContextInject records injections to junction table (fire-and-forget)
- handleSessionMarkInjected also writes to junction table
New files: injection_store.go, outcome.go, handlers_learning.go, tools_learning.go
* feat: stop hook records session outcome for closed-loop learning (T010)
Heuristic: bugfix/feature observations = success, any obs = partial,
none = abandoned. Calls POST /api/sessions/{id}/outcome.
No transcript content parsing (NFR-4 compliant).
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
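The outcome heuristic described above (bugfix/feature observations = success, any observation = partial, none = abandoned) is small enough to sketch directly. The signature, which takes only the observation types of a session, is an assumption.

```go
package main

import "fmt"

type Outcome string

const (
	OutcomeSuccess   Outcome = "success"
	OutcomePartial   Outcome = "partial"
	OutcomeAbandoned Outcome = "abandoned"
)

// DetermineSessionOutcome classifies a session from the types of the
// observations it produced, without reading any transcript content.
func DetermineSessionOutcome(obsTypes []string) Outcome {
	if len(obsTypes) == 0 {
		return OutcomeAbandoned
	}
	for _, t := range obsTypes {
		if t == "bugfix" || t == "feature" {
			return OutcomeSuccess
		}
	}
	return OutcomePartial
}

func main() {
	fmt.Println(DetermineSessionOutcome([]string{"discovery", "bugfix"}))
}
```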
* feat: closed-loop learning Phase 2 — score propagation + effectiveness (#84)
Closes the feedback loop: session outcomes flow back to observation scores.
Schema:
- Migration 059: effectiveness_score, effectiveness_injections, effectiveness_successes on observations
Backend:
- PropagateOutcome: position-weighted utility_score adjustment (always_inject=1.0x,
recent=0.8x, relevant=0.5x), ±0.05 per-session cap, [0,1] clamp
- ComputeEffectiveness: successes/injections with min_data threshold (10 sessions)
- GET /api/observations/{id}/effectiveness endpoint
- Scoring calculator: EffectivenessContrib blended into ImportanceScore (weight 0.3)
- Maintenance: periodic effectiveness recalc from junction table + 90-day TTL cleanup
New files: propagator.go, effectiveness.go
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v1.9.0
* feat: closed-loop learning Phase 3 — injection strategy A/B testing (#85)
4 strategies: baseline, effectiveness-weighted, recency-boosted, diverse.
Round-robin assignment per session (configurable: fixed mode available).
- Config: InjectionStrategies, InjectionStrategyMode, DefaultStrategy
- StrategySelector: atomic round-robin for thread safety
- applyStrategy(): re-sorts/filters observations per strategy
- Strategy recorded on sdk_sessions.injection_strategy
- GET /api/learning/strategies: per-strategy outcome rate comparison
- session-start.js: logs assigned strategy
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
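An atomic round-robin selector along the lines described above might look like this. The strategy names are from the commit message; the type layout and constructor are assumptions, and atomic.Uint64 requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// StrategySelector hands out strategies in round-robin order and is safe
// for concurrent use because the counter is advanced atomically.
type StrategySelector struct {
	strategies []string
	next       atomic.Uint64
}

func NewStrategySelector(strategies ...string) *StrategySelector {
	return &StrategySelector{strategies: strategies}
}

// Next returns the strategy for the next session, or "" if none are configured.
func (s *StrategySelector) Next() string {
	if len(s.strategies) == 0 {
		return ""
	}
	n := s.next.Add(1) - 1 // Add returns the new value; subtract 1 to start at index 0
	return s.strategies[n%uint64(len(s.strategies))]
}

func main() {
	sel := NewStrategySelector("baseline", "effectiveness-weighted", "recency-boosted", "diverse")
	for i := 0; i < 5; i++ {
		fmt.Println(sel.Next())
	}
}
```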
* feat: closed-loop learning Phase 4 — agent-specific learning (#86)
Per-agent effectiveness tracking: each agent gets its own effectiveness
scores for observations, enabling personalized injection.
- Migration 060: agent_observation_stats table (agent_id, observation_id PK)
- AgentStatsStore: upsert (atomic ON CONFLICT), batch lookup, single lookup
- PropagateAgentStats: updates per-agent counters alongside global
- handleContextInject: uses agent-specific effectiveness when agent has 10+ injections
- Effectiveness API: ?agent_id=X returns agent-specific stats
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: closed-loop learning Phase 5 — APO-lite (automatic prompt optimization) (#87)
Low-effectiveness guidance auto-rewritten by LLM, A/B tested, condensed.
Schema:
- Migration 061: observation_versions table (versioned narratives)
Backend:
- VersionStore: create/get/set active version
- RewriteGuidance: LLM-based APO with effectiveness-aware prompt
- POST /api/maintenance/apo/rewrite endpoint (dry_run + apply modes)
- Maintenance: detect APO candidates (effectiveness < 0.4 after 15+ injections)
- applyActiveVersions: inject uses versioned narrative when available
- 3 format variants: bullet-only, concise, structured
- CondenseObservation: standalone utility for future auto-condensation
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: closed-loop learning Phase 6 — auto signals + learning dashboard (#88)
Final implementation phase: hook-based reward signals + frontend visualization.
Hooks:
- post-tool-use.js: detect git commits, PRs, error streaks from tool metadata (NFR-4)
- stop.js: enhanced outcome with signal counts, upgrade partial→success on commits
- lib.js: cross-process signal store via temp files
Backend:
- Signal weights config (git_commit=1.0, pr_created=2.0, etc.)
- GET /api/learning/curve: daily outcome rates for learning curve chart
Frontend:
- Effectiveness badge on observation cards (green/yellow/red/gray dot)
- LearningView.vue: effectiveness distribution, learning curve, strategy comparison
- API: fetchLearningCurve, fetchStrategies, fetchEffectiveness
- Sidebar: Learning nav item + router route
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.0
* fix: split multi-statement migrations 058, 061 for PostgreSQL (#89)
PostgreSQL prepared statements reject multiple commands in a single
Exec(). Migrations 058 (observation_injections) and 061
(observation_versions) had CREATE TABLE + CREATE INDEX in one call.
Split into separate Exec() calls per statement.
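
The split can be sketched as one Exec per statement. This is an illustration under assumptions: the `execer` interface is a simplified stand-in (the real database handle has a different Exec signature), `runStatements` and `recorder` are hypothetical names, and the column definitions are invented for the example.

```go
package main

import "fmt"

// execer is a simplified, hypothetical stand-in for a database handle.
type execer interface {
	Exec(query string) error
}

// runStatements runs each statement in its own Exec call and stops at the
// first failure, reporting which statement failed.
func runStatements(db execer, stmts []string) error {
	for i, s := range stmts {
		if err := db.Exec(s); err != nil {
			return fmt.Errorf("statement %d failed: %w", i+1, err)
		}
	}
	return nil
}

// recorder is a fake execer that records the queries it receives.
type recorder struct{ calls []string }

func (r *recorder) Exec(q string) error {
	r.calls = append(r.calls, q)
	return nil
}

func main() {
	r := &recorder{}
	err := runStatements(r, []string{
		// Column lists are illustrative only.
		"CREATE TABLE IF NOT EXISTS observation_versions (id BIGSERIAL PRIMARY KEY)",
		"CREATE INDEX IF NOT EXISTS idx_observation_versions_id ON observation_versions (id)",
	})
	fmt.Println(len(r.calls), err)
}
```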
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.0
* chore: update marketplace for v2.0.0 (#90)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: sync plugin versions to server v2.0.0 (#91)
All plugin versions now match server version:
- engram (Claude Code): 0.6.0 → 2.0.0
- openclaw-engram (npm): 1.4.3 → 2.0.0
- marketplace.json: already 2.0.0
Going forward: plugins bump to server version on every release (Constitution #15).
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: escape Windows backslashes in file-context JSONB query (#92)
GetObservationsByFile used fmt.Sprintf to build the JSON array for the @>
operator. Windows paths like D:\Dev\... contain backslashes, which were
copied in unescaped and formed invalid JSON escape sequences → SQLSTATE 22P02.
Fix: use json.Marshal([]string{filePath}) for proper escaping.
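
The difference between the two approaches can be sketched as follows. The helper names `naiveArray` and `safeArray` and the full example path are hypothetical; only the use of json.Marshal([]string{filePath}) comes from the fix itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// naiveArray builds the JSON array by string formatting, which leaves
// backslashes unescaped.
func naiveArray(path string) string {
	return fmt.Sprintf(`["%s"]`, path)
}

// safeArray lets encoding/json escape the path.
func safeArray(path string) (string, error) {
	b, err := json.Marshal([]string{path})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	path := `D:\Dev\project\main.go` // illustrative path

	naive := naiveArray(path)
	fmt.Println(naive, "valid JSON:", json.Valid([]byte(naive)))

	safe, err := safeArray(path)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(safe, "valid JSON:", json.Valid([]byte(safe)))
}
```

The marshaled form doubles each backslash, so the array parses as JSON before it is compared with @>.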
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugin versions to 2.0.1 (#93)
Sync with server v2.0.1 (Constitution #15).
- engram plugin: 2.0.0 → 2.0.1
- openclaw-engram: 2.0.0 → 2.0.1
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.1
* test: fix 6 test failures — USERPROFILE, CSP, trivial filter, obs types (#94)
- config: set USERPROFILE alongside HOME for Windows (os.UserHomeDir reads USERPROFILE)
- worker: update CSP assertion to match stricter security headers
- sdk: change test tool names from Bash to Edit to bypass trivial filter
- sdk: add "guidance" to valid observation types map
- sdk: update Read/Grep expected skip behavior (whitelist approach)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: periodic outcome recording — server-side closed-loop trigger (#95)
Users don't close sessions (always continue) and PreCompact is rare
with 1M context. Stop hook never fires → outcome never recorded →
closed loop never closes.
Fix: server-side periodic job (every 15 minutes, configurable) finds
sessions with injection records but no outcome, determines outcome
from observation types, records + propagates.
- GetSessionsWithPendingOutcome: sessions with injections >10min old, no outcome
- runOutcomeRecorder: separate goroutine from heavy maintenance
- Config: ENGRAM_OUTCOME_RECORDER_INTERVAL_MINUTES (default 15)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.2 (periodic outcome recorder) (#96)
Constitution #15: plugin versions track server.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.2
* fix: add diagnostic logging to stop hook for investigation (#97)
Stop hook has zero traces in server logs — unclear if it's:
1. Not called by CC harness
2. Called but silently failing (catch returns '')
3. Called but session lookup fails
Added: health check marker (proves hook ran), session lookup error logging,
invalid session ID logging. Will reveal root cause on next session exit.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: store_memory accepts always_inject param for behavioral rules (#98)
When always_inject=true, adds "always-inject" concept to observation.
Observations with this concept are injected into every agent context
regardless of query relevance (user-prompt.js hook filters on it).
Closes gap: store_memory previously couldn't create behavioral rules
because it didn't set the always-inject concept marker.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* fix: migration 062 — cleanup remaining phantom bulk-import sessions (#99)
6 remaining bulk-import-* sessions from before PR #65 fix.
Migration 052 cleaned most; this catches the rest.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark 2 tech debt items resolved (#100)
- Phantom bulk-import sessions: cleaned by migration 062 (PR #99)
- T027 post-deploy verification: composite scoring active in v2.0.2
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.3 + stop hook diagnostics (#101)
- engram plugin: 2.0.2 → 2.0.3
- openclaw-engram: 2.0.2 → 2.0.3
- stop.js: diagnostic file marker + error logging (PR #97 changes)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.3
* fix: dedup skips suppressed observations + edit_observation always_inject (#102)
Two fixes for behavioral rules workflow:
1. store_memory dedup: suppressed/archived observations no longer block
re-creation. Vector index doesn't exclude suppressed obs, so dedup
now checks DB for is_suppressed/is_archived before rejecting.
2. edit_observation: accepts always_inject boolean. When true, adds
"always-inject" concept to existing concepts. When false, removes it.
Enables converting existing observations to behavioral rules.
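
The concept toggle described in item 2 could look roughly like this. `setAlwaysInject` is a hypothetical helper name; the "always-inject" marker string is the one named above.

```go
package main

import "fmt"

const alwaysInject = "always-inject"

// setAlwaysInject returns a copy of concepts with the marker added (on=true)
// or removed (on=false). The marker is never duplicated.
func setAlwaysInject(concepts []string, on bool) []string {
	out := make([]string, 0, len(concepts)+1)
	for _, c := range concepts {
		if c != alwaysInject {
			out = append(out, c)
		}
	}
	if on {
		out = append(out, alwaysInject)
	}
	return out
}

func main() {
	rule := setAlwaysInject([]string{"workflow"}, true)
	fmt.Println(rule)
	fmt.Println(setAlwaysInject(rule, false))
}
```

Rebuilding the slice keeps the call idempotent: enabling twice still leaves a single marker.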
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.4 (dedup fix + always_inject edit) (#103)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.4
* fix: add effectiveness + status fields to ObservationJSON serialization (#104)
ObservationJSON struct was missing effectiveness_score,
effectiveness_injections, effectiveness_successes, status,
and status_reason fields. Observations list API returned these
as undefined → Learning Dashboard showed 100% "Insufficient data".
Fields existed on Observation struct but were never copied to
ObservationJSON in MarshalJSON.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.5 (effectiveness JSON fix) (#105)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.5
* fix: server-side effectiveness distribution for Learning Dashboard (#106)
Replaced client-side counting (fetch 500 obs, count tiers) with
server-side SQL aggregation endpoint.
- GET /api/learning/effectiveness-distribution: COUNT FILTER by tier
- GetEffectivenessDistribution: single SQL query, excludes archived/suppressed
- LearningView: uses server endpoint, no more fetchObservationsPaginated
- Removes 500-observation limit and 80KB+ unnecessary payload
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.6 (server-side effectiveness) (#107)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.6
* fix: session outcome/strategy fields in API + session_id in inject (#108)
Three root causes for Learning Dashboard empty data:
1. GORM SDKSession model missing outcome/strategy fields — DB has data
but GORM never reads it. Added 4 fields to both GORM and shared models.
2. session-start.js inject call missing session_id param — inject handler
fell back to empty string → UpdateInjectionStrategy matched 0 rows.
Now passes ctx.SessionID in inject URL.
3. toModelSDKSession mapping missing outcome/strategy fields.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: bump plugins to v2.0.7 (session fields + inject session_id) (#109)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.7
* fix: effectiveness distribution excludes never-injected observations (#110)
"Insufficient data" showed 797 observations including those never
injected. Now only counts observations with effectiveness_injections > 0:
- Participated but <10 sessions → "Insufficient data" (will resolve)
- Never injected → excluded (dead weight, not actionable)
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* feat: session injection retrospective API (#111)
GET /api/sessions/{id}/injections — returns all observations injected
into a session with effectiveness metrics and summary stats.
Enables retrospective analysis: what was injected, noise vs signal,
effectiveness breakdown per section (always_inject/recent/relevant).
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.0.8
* chore: plugin-tool-consolidation spec + gap audit
- Gap audit report: plugin vs API analysis (68 MCP tools, 130 REST endpoints)
- New spec: plugin-tool-consolidation (6 FR, 4 NFR, 6 US)
- Plan: 5 phases, 34 tasks, analyze remediation applied
- Closed old mcp-tools-refactoring spec (FR7/FR8 → TECHNICAL_DEBT)
* chore: update task progress (Phases 1-5 complete)
* feat: plugin tool consolidation — all phases (FR-1 through FR-6) (#112)
* chore: plugin-tool-consolidation spec + gap audit
- Gap audit report: plugin vs API analysis (68 MCP tools, 130 REST endpoints)
- New spec: plugin-tool-consolidation (6 FR, 4 NFR, 6 US)
- Plan: 5 phases, 34 tasks, analyze remediation applied
- Closed old mcp-tools-refactoring spec (FR7/FR8 → TECHNICAL_DEBT)
* chore: remove 7 redundant MCP tool registrations (FR-1)
Remove from tools/list: get_context_timeline, get_timeline_by_query,
get_recent_context, find_by_file_context, get_observation_relationships,
get_graph_neighbors, doc_update.
All 7 tools remain callable via dispatch aliases in handleCallTool
for backward compatibility. Reduces tool count from 68 to 61.
Updates tests to match new tool set.
* fix: openclaw decisions endpoint + memory_forget suppress default (FR-2)
- engram_decisions now uses /api/decisions/search instead of searchContext
+ client-side type filter (B1 from audit)
- memory_forget defaults to suppress (reversible) instead of archive.
Add permanent=true parameter for permanent archival (B2 from audit)
- Add suppressObservation() client method using bulk-status suppress action
- Add "suppress" action to server bulk-status handler
- Bump openclaw-engram to 2.0.9
* feat: expand openclaw tools — rate, suppress, outcome, file, timeline, vault (FR-3)
Add 9 new tools to openclaw-engram:
- engram_rate: rate observations as useful/not useful
- engram_suppress: reversible soft-hide from search
- engram_outcome: record session outcome for closed-loop learning
- engram_find_by_file: check what engram knows BEFORE modifying a file
- engram_timeline: fetch temporal observation context
- engram_changes: search preset for recent code changes
- engram_how_it_works: search preset for architecture/design
- engram_vault_store: securely store encrypted credentials
- engram_vault_get: retrieve and decrypt credentials
All tool descriptions include trigger conditions (NFR-3).
Add client methods: rateObservation, setSessionOutcome, getFileContext,
getTimeline, storeCredential, getCredential.
Add preset param to searchContext type.
Bump openclaw-engram to 2.0.10.
Total tools: 17 (was 8).
* feat: cc stop hook retrospective API + statusline learning metrics (FR-5, FR-6)
- stop.js: Replace /api/sessions/{id}/injected-observations + individual
utility calls with single /api/sessions/{id}/injections (retrospective API).
Fewer HTTP calls, enriched response with effectiveness data.
- statusline.js: Add learning effectiveness indicator with 60s client cache.
Shows "eff:72%" (high tier percentage) or "eff:--" when no data.
Fetches /api/learning/effectiveness-distribution in parallel with stats.
* feat: openclaw lifecycle hooks — outcome, utility, file context (FR-4)
- session_end: detect session outcome (success/partial/failure/abandoned)
from conversation signals, record via /api/sessions/{id}/outcome.
Handles gracefully when no DB session ID exists.
- before_tool_call: inject file-context observations before Write/Edit
tools using /api/context/by-file. 200ms timeout, non-blocking.
- Register before_tool_call hook in index.ts.
- Bump openclaw-engram to 2.0.11.
* fix: address CodeRabbit review findings in spec/plan docs
- Fix edge case: memory_forget has only permanent param, not suppress
- Fix edge case: before_tool_call not after_tool_call
- Fix plan: version tracking says 2.0.x not 2.1.0
* fix: address CodeRabbit review — suppress cache + ID validation
- Suppress action now checks RowsAffected (not found = failed)
- Cache invalidation extended to suppress action (was archive-only)
- Unified ID validation in memory_forget: validate before branching
* fix: address CodeRabbit re-review findings (round 2)
CRIT: engram_outcome uses sessionDbId (not .id) from initSession response
MAJOR:
- stop.js: read injectionsResp.injections (wrapped response, not root array)
- before-tool-call: 500ms timeout (was 3s — too slow for pre-tool hook)
- session-end: use sessionDbId, soften heuristic (multi-word patterns),
conservative default (partial, not abandoned)
- client.ts: timeline uses 15s timeout (matches searchContext),
getFileContext accepts configurable timeoutMs
* fix: session outcome uses claude session ID string, not numeric DB ID
Sonnet lite review found: server UpdateSessionOutcome takes
claude_session_id string, not numeric DB ID. All outcome calls
(engram_outcome tool + session_end hook) now pass ctx.sessionId
directly — no initSession lookup needed.
- client.ts: setSessionOutcome accepts string, URL-encodes it
- engram-outcome.ts: removed initSession, pass claudeSessionId directly
- session-end.ts: simplified — no DB ID resolution needed
* fix: address all remaining CodeRabbit findings (round 3)
MAJOR:
- session_store.go: UpdateSessionOutcome only sets if outcome IS NULL —
explicit engram_outcome tool takes priority over heuristic
- memory-forget.ts: strict integer regex + parseInt + isSafeInteger validation
MINOR:
- vault.ts: descriptive error messages for store/get failures
- vault.ts: comment about credential value in tool output
- before-tool-call.ts: doc says 500ms (matches code)
- TECHNICAL_DEBT.md: full spec path
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: resolve stash conflicts
* chore: update marketplace for v2.0.9
* chore: mcp-tool-api-consolidation spec (61→7 tools)
Full SpecKit pipeline: specify → clarify → plan → tasks → analyze.
Consolidates 61 MCP tools into 6 primary (recall/store/feedback/vault/docs/admin)
+ check_system_health. Backward-compatible dispatch aliases for all old names.
Target: >80% context window reduction (~6100 → ~900 tokens).
Also: 3 new dashboard bugs recorded in inbox.
* feat: MCP tool API consolidation — 61 tools → 7 primary (#113)
* chore: mcp-tool-api-consolidation spec (61→7 tools)
Full SpecKit pipeline: specify → clarify → plan → tasks → analyze.
Consolidates 61 MCP tools into 6 primary (recall/store/feedback/vault/docs/admin)
+ check_system_health. Backward-compatible dispatch aliases for all old names.
Target: >80% context window reduction (~6100 → ~900 tokens).
Also: 3 new dashboard bugs recorded in inbox.
* feat: create 6 primary tool routers (Phase 1 — FR-1 through FR-6)
New handler files that route consolidated tool actions to existing handlers:
- tools_recall.go: 12 actions (search, preset, by_file, by_concept, etc.)
- tools_store_consolidated.go: 4 actions (create, edit, merge, import)
- tools_feedback.go: 3 actions (rate, suppress, outcome)
- tools_vault_consolidated.go: 5 actions (store, get, list, delete, status)
- tools_docs_consolidated.go: 11 actions (create, read, list, history, etc.)
- tools_admin.go: 21 actions (bulk ops, tags, graph, maintenance, etc.)
Each is a thin routing layer — NO new business logic. All delegate
to existing handler functions via action parameter dispatch.
* feat: register 7 primary tools + alias dispatch (Phase 2 — FR-7, FR-8)
- Add primaryTools() returning 6 consolidated tools with flat schemas
- Default tools/list returns 7 tools (6 primary + check_system_health)
- cursor=all returns primary + 61 legacy alias tools
- callTool dispatch: primary names → consolidated handlers first,
then fallthrough to legacy alias handlers
- All 61 original tool names continue to work via alias dispatch
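
The primary-first dispatch with alias fallthrough can be sketched as below. This is an assumed shape, not the actual callTool: the `handler` signature, the `dispatcher` type, and `newDispatcher` are hypothetical, and only one primary tool and one legacy alias are wired.

```go
package main

import (
	"errors"
	"fmt"
)

// handler is a hypothetical tool handler signature.
type handler func(args map[string]string) (string, error)

// dispatcher keeps primary tools and legacy aliases in separate tables.
type dispatcher struct {
	primary map[string]handler
	aliases map[string]handler
}

// callTool tries primary names first, then falls through to legacy aliases.
func (d *dispatcher) callTool(name string, args map[string]string) (string, error) {
	if h, ok := d.primary[name]; ok {
		return h(args)
	}
	if h, ok := d.aliases[name]; ok {
		return h(args)
	}
	return "", errors.New("unknown tool: " + name)
}

// newDispatcher wires one primary tool and one legacy alias for illustration.
func newDispatcher() *dispatcher {
	search := func(args map[string]string) (string, error) {
		return "search:" + args["query"], nil
	}
	recall := func(args map[string]string) (string, error) {
		if args["action"] == "search" {
			return search(args)
		}
		return "", errors.New("unsupported action: " + args["action"])
	}
	return &dispatcher{
		primary: map[string]handler{"recall": recall},
		aliases: map[string]handler{"search": search},
	}
}

func main() {
	d := newDispatcher()
	out, err := d.callTool("recall", map[string]string{"action": "search", "query": "auth"})
	fmt.Println(out, err)
	out, err = d.callTool("search", map[string]string{"query": "auth"})
	fmt.Println(out, err)
}
```

Both calls reach the same underlying handler, which is why old tool names keep working without being listed.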
* test: update MCP tests for 7 primary tools (Phase 3)
- TestHandleToolsList: expect 7 primary tools by default, legacy in cursor=all
- TestCallTool_ToolNameRecognition: verify primary + alias names in cursor=all
- Account for conditional tools (store_memory etc.) not present with nil stores
* docs: update engramInstructions for 7 consolidated tools (T018)
Replace 61 legacy tool references with 7 primary tools in the MCP
server instructions string. Shows action-based API: recall(action=...),
store(action=...), feedback(action=...), vault(action=...), docs(action=...),
admin(action=...), check_system_health(). Includes backward compat note.
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark all mcp-tool-api-consolidation tasks complete
* chore: update marketplace for v2.1.0
* chore: dashboard-bugfixes-v2 spec + TD cleanup
- New spec: 4 dashboard bugs (concept filter, type filter, 50/50 counts, summaries)
- Marked 3 TD items resolved (phantom sessions, vault lost key, MCP stubs)
* fix: dashboard concept filter, type filter, count display (#114)
* fix: dashboard concept filter, type filter, and count display
FR-1: Concept filter — add server-side concept param to handleGetObservations
and both paginated store methods. LIKE query on concepts JSON column.
Frontend passes concept from FilterTabs to fetchObservations.
FR-2: Type filter on HomeView — fetchObservations now passes type param.
Server already supported type filtering (obsType), was just not wired on home.
FR-3: Real counts — fetchObservations returns { observations, total }.
useTimeline tracks observationTotal from API response instead of array length.
"50 obs / 50 prompts" replaced with real counts.
Backend: handlers_data.go, observation_store.go (concept WHERE clause)
Frontend: api.ts (fetchObservations params), useTimeline.ts (server filter + totals)
Callers updated: handlers_import_export.go, detector.go (pass "" for concept)
* fix: use JSONB @> operator for concept filter instead of LIKE
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.1.1
* fix: SDK extraction prompt bias + mark dashboard type filter resolved
- Reorder extraction prompt types: specific first (decision, feature, bugfix),
general last (guidance). Add explicit note: "prefer specific over general"
- Mark "Dashboard Type Filter" TD as resolved v2.1.1 (PR #114)
* chore: mark 3 more TD items resolved (extraction types, type filter, namespace prefixes)
* chore: behavioral rules created (3 always_inject observations), mark TD resolved
* chore: mark 2 inbox bugs fixed (concept filter, counts) from PR #114
* chore: triage TD + inbox — mark DEFERRED/IMPLEMENTED items
TD: GPU contention and re-benchmark marked DEFERRED (external/infra)
Inbox: 5 ideas marked DEFERRED (future FR), 1 bug DEFERRED (external),
user commands marked IMPLEMENTED (PR #115), 2 bugs marked FIXED (PR #114)
Spec: engram-user-commands pipeline artifacts
* feat: add 4 engram user commands (retro, stats, cleanup, export) (#115)
- /engram:retro — session retrospective (injection analysis, effectiveness, recommendations)
- /engram:stats — memory health dashboard (counts, types, effectiveness, learning curve, search analytics)
- /engram:cleanup — interactive observation curation (review, suppress, edit, merge low-quality items)
- /engram:export — export observations as markdown/JSON/JSONL with project/type filters
All commands use consolidated tool API (recall/store/feedback/admin).
Commands are markdown instruction files — no compilation needed.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.1.2
* feat: pre-edit guardrails + session summarization on start (#116)
Wave 2: Pre-edit guardrails — pre-tool-use.js now separates warnings
(bugfix, guidance, anti-pattern, gotcha, security) from general context.
Warnings appear first with clear header so agent reviews them before editing.
Wave 3: Session summarization — session-start.js triggers summarization
of the most recent unsummarized session (fire-and-forget, 1 per start).
Workaround for CC bug #19225 (stop hook doesn't fire) so summaries
accumulate and appear on the Dashboard Summaries tab.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark pre-commit guardrails TD resolved v2.1.3
* chore: update marketplace for v2.1.3
* feat: config hot-reload without process restart (#117)
* feat: config hot-reload without process restart
Replace os.Exit(0) in reloadConfig with atomic config swap via
config.Reload(). Services calling config.Get() per-request pick up
new values automatically.
- config.go: add Reload() function — re-reads from disk, swaps global,
returns list of changed fields
- service.go: reloadConfig() uses Reload() instead of os.Exit(0),
broadcasts changed fields to dashboard via SSE
Port/token changes log a warning (still need manual restart).
All other config changes (model, embedding, context limits, reranking,
HyDE, maintenance) take effect immediately.
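
An atomic swap of this kind can be sketched with sync/atomic. The sketch is simplified and assumes hypothetical names: `Config` has only two illustrative fields, and `Reload` takes the new config as an argument instead of re-reading it from disk.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a trimmed-down, hypothetical config with two fields.
type Config struct {
	Model      string
	WorkerPort int
}

// current holds the active config; readers never see a half-updated value.
var current atomic.Pointer[Config]

// Get returns the active config. Callers that call Get per request pick up
// new values after a reload.
func Get() *Config { return current.Load() }

// Reload swaps in next and returns the names of the fields that changed.
func Reload(next *Config) []string {
	prev := current.Swap(next)
	var changed []string
	if prev == nil {
		return changed
	}
	if prev.Model != next.Model {
		changed = append(changed, "Model")
	}
	if prev.WorkerPort != next.WorkerPort {
		changed = append(changed, "WorkerPort")
	}
	return changed
}

func main() {
	current.Store(&Config{Model: "model-a", WorkerPort: 8080})
	changed := Reload(&Config{Model: "model-b", WorkerPort: 8080})
	fmt.Println(changed, Get().Model)
}
```

The returned list is what a caller could inspect to warn about fields, such as the port, that still need a restart.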
* fix: detect WorkerToken changes in hot-reload (requires restart)
---------
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark config reload TD resolved v2.1.4
* chore: update marketplace for v2.1.4
* chore: mark GPU contention TD resolved (transient queue issue)
* feat: inbox features — session counter, consistency check, memory import (#118)
1. Dashboard: "Sessions Today" instead of "Active Sessions" (was always 0)
— uses sessionsToday from stats API, not in-memory count
2. Consistency check endpoint: GET /api/maintenance/consistency
— read-only orphan detection (vectors, relations, observations)
— returns { orphan_vectors, observations_without_vectors, stale_relations, healthy }
3. memory_get store flag: memory_get(path="file.md", store=true)
— reads .md file AND imports content into engram as observation
— bridges local markdown files → engram persistent memory
4. Version bump: openclaw-engram 2.1.5
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark 4 inbox items resolved/implemented (notes, consistency, indexes, bridge)
* chore: mark session tracking, CC bug, summaries as resolved/mitigated
* chore: audit specs (4 marked Implemented), close audit inbox item
* chore: mark OpenClaw architecture as external dependency, audit complete
* chore: update marketplace for v2.1.5
* feat: graph UX polish — local mode, search, visual styling (#119)
Phase 1: Local graph mode
- Route /graph/:observationId? with optional param
- Fetches /api/observations/{id}/graph?depth=N
- Anchor node: larger (25px), green border (#10b981)
- Depth selector (1/2/3) in toolbar
- "View in Graph" link on ObservationCard
Phase 2: Node search
- Search input in toolbar with match count
- Matching nodes: yellow border highlight
- Non-matching: dimmed (0.3 opacity)
- Enter key: focus camera on first match
Phase 3: Visual styling
- Node shadows, hover glow (white border)
- Edge colors mapped to relation types
- Curved edges (curvedCCW), dashed for low confidence
- Dot grid background
- Edge color legend sidebar
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark graph UX polish implemented, all inbox items complete
* chore: update marketplace for v2.1.6
* chore: update benchmark to max_tokens:4096, 13 current models
* chore: mark re-benchmark TD resolved (script updated, ready to run)
* fix: benchmark parallel=1 default (avoid multi-model GPU overload)
* chore: bump openclaw-engram to 2.1.6 (match server version per Constitution #15)
* chore: remove legacy alias tools from tools/list entirely
Legacy tool names (search, store_memory, find_by_file, etc.) no longer
appear in tools/list at all — not even with cursor=all. Only 7 primary
tools shown. Dispatch aliases still work in callTool for backward compat
(zero runtime cost, zero context cost).
* fix: summaries — build content from session observations when no transcript (#120)
ProcessSummary now fetches session observations from DB when called
without lastAssistantMsg (e.g., from session-start summarizer).
Previously: empty msg → hasMeaningfulContent=false → skip always.
Now: empty msg → query observations by sdk_session_id → build summary
input from observation titles+narratives → generate LLM summary.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: dashboard-quality-v3 spec + 3 inbox bugs
* feat: dashboard quality v3 — search misses, sessions, pattern insights (#121)
Phase 1: Fix search misses display — unwrap miss_stats envelope, map miss_count→frequency
Phase 2: Sessions backend — add min_prompts, from, to filters to ListSDKSessions
Phase 3: Sessions frontend — pass min_prompts=1 (hide empty), wire date filters,
clickable sessions with detail view (SessionDetailView.vue: metadata, injections, outcome)
Phase 4: Pattern insight background — maintenance Task 18 generates LLM insights
for patterns with generic descriptions (5 per cycle), caches in pattern.description
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.1.7
* chore: save session state (v2.1.7, 10 PRs, all TD resolved)
* feat: dashboard UX polish — tooltips, cursor-pointer, hover transitions, color coding (#122)
- Tooltips: all action buttons have descriptive title attributes explaining
what they do and whether reversible (Resolve, Suppress, Archive, Rate, Graph)
- Cursor-pointer: 32 additions across 3 files — all interactive elements
- Hover transitions: 27 duration-200 additions for consistent 200ms timing
- Color coding: destructive=red, resolve=green, reopen=blue, info=gray
- Existing ConfirmDialog already handles destructive action confirmation
3 files: ObservationsView.vue, ObservationCard.vue, ObservationDetailView.vue
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: update marketplace for v2.1.8
* docs: summaries + concepts pipeline audit report with root causes and fixes
* chore: summaries-concepts-fix spec + updated inbox
* fix: summaries + concepts pipeline — 3 root causes from audit (#123)
FR-2: Add valid concept list to extraction systemPrompt (processor.go).
LLM now knows which concepts to use instead of inventing random ones.
Fixed example: user-preference → workflow.
FR-1: Add userPrompt fallback in ProcessSummary (processor.go).
When both lastAssistantMsg and observations are empty, use the
session's initial user prompt as summary input.
FR-3: Migration 055 — keyword-based concept backfill for 1047 existing
observations. Assigns architecture, security, debugging, api, database,
etc. based on title/narrative keyword matching. No LLM needed.
Audit: .agent/reports/summaries-concepts-audit-2026-03-28.md
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark summaries-concepts tasks complete
* chore: mark dashboard-quality-v3 tasks complete (PR #121)
* chore: update marketplace for v2.1.9
* chore: save session state (v2.1.9, 11 PRs, session compacted)
* docs: investigate report — 13 findings (4 P1, 7 P2, 2 P3) across 12 areas
* docs: summaries pipeline investigation — root cause is trigger architecture, not code
* chore: server-summarizer-and-fixes spec + tasks
* fix: summaries + concepts pipeline — 3 root causes from audit (#123) (#124)
FR-1: Server-side periodic summarizer (maintenance Task 19)
Scans sessions with prompts > 0 and no summary, older than 30min.
Builds content from observations, calls LLM, stores in session_summaries.
Cap: 3 per cycle. No client hook dependency.
FR-2: Pre-edit guardrails — remove guidance from warnings
Only bugfix + concept-based (anti-pattern, gotcha, security) are warnings.
Global behavioral rules no longer show as "WARNINGS" before every file edit.
FR-3: Remove client-side summarizer from session-start.js
Replaced by server-side Task 19. Client workaround had bugs (sess.summary
field doesn't exist, would re-summarize repeatedly).
FR-4: Circuit breaker recovery logging
Logs "entering half-open state" and "recovered — LLM calls re-enabled"
for diagnosing LLM availability from server logs.
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark server-summarizer tasks complete (PR #124)
* chore: update marketplace for v2.2.0
* chore: mark graph-ux-polish tasks complete (PR #119)
* chore: mark stale tasks complete (dashboard-bugfixes-v2 PR#114, user-commands PR#115)
* chore: transfer investigate P1/P2 findings to inbox as actionable tasks
* chore: audit-bugfixes spec + tasks (P1+P2 from investigate)
* fix: audit bugfixes — P1+P2 findings from investigate report (#125)
T001: Summary dedup verified — NOT EXISTS check already correct
T002: OpenClaw before_tool_call — added BeforeToolCallResult to HookResult type
T003: Store content validation — error message clarified
T004: Summary userPrompt threshold lowered (50→10 chars)
T005: Migration 064 — backfill 5 missing concepts (why-it-exists, what-changed,
anti-pattern, gotcha, trade-off) with keyword matching
T006: go build + tsc --noEmit verified clean
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark audit-bugfixes complete, update inbox (v2.2.1)
* chore: update marketplace for v2.2.1
* docs: 3 ADRs from Cipher competitive analysis + investigation report
ADR-003: Reasoning Traces (System 2 Memory) — store HOW agent reasons, not just WHAT it decided
ADR-004: Dedicated Embedding Resilience — separate CB, health check, 4 states, auto-recovery
ADR-005: LLM-Driven Memory Extraction — extract_and_operate for autonomous observation creation
Investigation: 10 findings across 10 areas comparing Cipher vs Engram architecture
* chore: reasoning-traces spec (System 2 Memory from ADR-003)
* chore: reasoning-traces full SpecKit pipeline (clarify+plan+tasks+analyze)
* feat: reasoning traces (System 2 Memory) — Phases 1-3 (#126)
Phase 1: Data Model
- Migration 065: reasoning_traces table (steps JSONB, quality_score, task_context)
- GORM model ReasoningTrace with BeforeCreate hook
- ReasoningTraceStore with Create/GetBySession/SearchByProject
Phase 2: Extraction
- reasoning_detector.go: DetectReasoning() — 3+ pattern matches in 200+ char text
- Extraction + quality evaluation LLM prompts
- Async extraction in ProcessObservation (non-blocking goroutine)
- Quality threshold ≥ 0.5 to store
Phase 3: MCP Integration
- recall(action="reasoning") — searches traces by project, formats with step types
- "reasoning" added to recall tool action enum
- Wired into worker service (processor + MCP server)
ADR-003 implemented. Inspired by Cipher's System 2 dual memory architecture.
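
The Phase 2 detection gate can be sketched as below. The marker list is invented for illustration and is not the project's pattern set. The sketch also assumes that "3+ pattern matches" means three distinct patterns and that the 200-character minimum is counted in runes. `detectReasoning` and `shouldStore` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// patterns are illustrative reasoning markers, not the real list.
var patterns = []string{"because", "therefore", "however", "first,", "alternatively", "so that"}

const (
	minChars         = 200 // "200+ char text"
	minMatches       = 3   // "3+ pattern matches"
	qualityThreshold = 0.5 // "Quality threshold ≥ 0.5 to store"
)

// detectReasoning reports whether text is long enough and contains at least
// minMatches distinct markers (case-insensitive).
func detectReasoning(text string) bool {
	if utf8.RuneCountInString(text) < minChars {
		return false
	}
	lower := strings.ToLower(text)
	matches := 0
	for _, p := range patterns {
		if strings.Contains(lower, p) {
			matches++
		}
	}
	return matches >= minMatches
}

// shouldStore applies the quality threshold to an evaluated trace.
func shouldStore(quality float64) bool {
	return quality >= qualityThreshold
}

func main() {
	text := "Because the cache was stale, the test failed. However, the fix is small. Therefore we patch it. " +
		strings.Repeat("More detail follows. ", 6)
	fmt.Println(detectReasoning(text), shouldStore(0.62))
}
```

The cheap string check runs before any LLM call, so only plausible reasoning text pays for extraction.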
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark reasoning-traces tasks complete (PR #126, v2.3.0)
* chore: embedding-resilience spec pipeline (ADR-004)
* chore: close 2 remaining P2 inbox items (metric documented, visual API-verified)
* chore: update marketplace for v2.3.0
* feat: dedicated embedding resilience layer (ADR-004) (#127)
- ResilientEmbedder wraps EmbeddingModel with 4-state circuit breaker:
HEALTHY → DEGRADED (1+ failures) → DISABLED (5+ failures) → RECOVERING
- Health check goroutine probes every 30s when not HEALTHY
- Automatic recovery: probe succeeds → RECOVERING → next real request → HEALTHY
- Independent from LLM circuit breaker (embedding failures ≠ LLM failures)
- Thread-safe via sync/atomic
- Wired into worker service (init + reinit + shutdown)
- selfcheck handler reports embedding status with failure counts
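
The state transitions above can be sketched as a small state machine. This is not ResilientEmbedder itself: the `breaker` type and its method names are hypothetical, and the state and failure counter are separate atomics, which is a simplification (they are not updated as one unit).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type State int32

const (
	Healthy State = iota
	Degraded
	Disabled
	Recovering
)

func (s State) String() string {
	return [...]string{"HEALTHY", "DEGRADED", "DISABLED", "RECOVERING"}[s]
}

// disableAfter mirrors the "5+ failures" threshold.
const disableAfter = 5

type breaker struct {
	state    atomic.Int32
	failures atomic.Int32
}

func (b *breaker) State() State { return State(b.state.Load()) }

// onFailure records a failed real request: 1+ failures degrade, 5+ disable.
func (b *breaker) onFailure() {
	if b.failures.Add(1) >= disableAfter {
		b.state.Store(int32(Disabled))
	} else {
		b.state.Store(int32(Degraded))
	}
}

// onProbeSuccess moves any non-healthy state to RECOVERING.
func (b *breaker) onProbeSuccess() {
	for _, from := range []State{Degraded, Disabled} {
		if b.state.CompareAndSwap(int32(from), int32(Recovering)) {
			return
		}
	}
}

// onSuccess records a successful real request and restores HEALTHY.
func (b *breaker) onSuccess() {
	b.failures.Store(0)
	b.state.Store(int32(Healthy))
}

func main() {
	b := &breaker{}
	for i := 0; i < 5; i++ {
		b.onFailure()
	}
	fmt.Println(b.State())
	b.onProbeSuccess()
	fmt.Println(b.State())
	b.onSuccess()
	fmt.Println(b.State())
}
```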
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
* chore: mark embedding-resilience tasks complete (PR #127)
* chore: extract-and-operate spec pipeline (ADR-005)
* chore: update marketplace for v2.3.1
* feat: store(action="extract") — LLM-driven memory extraction (ADR-005) (#128)
- New action on store tool: accepts raw content, uses LLM to extract observations
- Extraction prompt generates structured observations (type, title, narrative, concepts)
- Privacy: content redacted via RedactSecrets before LLM call
- Validation: min 50 chars, truncate at 32k, type validation with fallback
- Returns summary: {extracted, stored, duplicates, titles}
- Added "extract" to store tool action enum
Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
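The validation step (min 50 chars, truncate at 32k, type validation with fallback) might be sketched as follows. Everything named here is hypothetical: the function names, the set of valid types, the fallback value, and the reading of "32k" as 32768 characters are assumptions, and the RedactSecrets call and the LLM request are deliberately omitted.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minContentChars = 50        // "min 50 chars"
	maxContentChars = 32 * 1024 // assumption: "32k" read as 32768 characters
)

// ErrTooShort is a hypothetical error for content below the minimum length.
var ErrTooShort = errors.New("content too short to extract from")

// validTypes and fallbackType are illustrative; the real type list is not in this PR text.
var validTypes = map[string]bool{"decision": true, "discovery": true, "bugfix": true}

const fallbackType = "discovery"

// PrepareContent rejects short input and truncates long input on a rune boundary,
// so a multi-byte character is never split.
func PrepareContent(content string) (string, error) {
	runes := []rune(content)
	if len(runes) < minContentChars {
		return "", ErrTooShort
	}
	if len(runes) > maxContentChars {
		runes = runes[:maxContentChars]
	}
	return string(runes), nil
}

// NormalizeType keeps a known observation type and falls back otherwise.
func NormalizeType(t string) string {
	if validTypes[t] {
		return t
	}
	return fallbackType
}

func main() {
	_, err := PrepareContent("too short")
	fmt.Println(err)
	fmt.Println(NormalizeType("decision"), NormalizeType("unknown-type"))
}
```

Falling back to a default type instead of rejecting the observation keeps one malformed LLM field from discarding an otherwise usable extraction.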
* chore: mark extract-and-operate tasks complete (PR #128, v2.4.0)
* chore: save session state (v2.4.0, 16 PRs, all complete)
* chore: update marketplace for v2.4.0
* docs: complete documentation rewrite for v2.4.0
- README: 48 legacy tools → 7 consolidated primary tools, marketing intro,
architecture diagram with dashboard/LLM/embedding, What's New, Use Cases,
Upgrading, Troubleshooting, MCP Tools reference with all actions
- CHANGELOG: 17 new entries (v2.0.7 through v2.4.0) with comparison links
- README.ru.md: full Russian translation synced to v2.4.0
- README.zh.md: full Chinese translation synced to v2.4.0
* chore: remove agent working state and test artifacts from tracking
- .agent/ was in .gitignore but files were committed before the rule
- .playwright-mcp/ screenshots are test artifacts, not source
- TECHNICAL_DEBT.md is agent-local state
- Updated .gitignore to cover all three
* perf: narrow PostToolUse hook matcher from * to Write|Edit|Bash|Agent|mcp__aimux
Eliminates ~50+ unnecessary node process spawns per research session.
Previously matcher * fired post-tool-use.js on every tool call including
Read, Grep, Glob, ToolSearch — which then hit skipTools early exit and
returned empty. Now Claude Code filters at matcher level, avoiding
process spawn entirely for read-only tools.
Removed skipTools map from post-tool-use.js (redundant with matcher).
Bumped plugin version 2.0.7 → 2.0.8.
* feat: strengthen MCP server instructions to assert memory exclusivity
- "Your ONLY Persistent Memory" — exclusivity claim over competing tools
- AFTER workflow mandatory: store decisions/discoveries after every task
- "Steps 4-6 are NOT optional" — directive-level store instruction
- "What to Store" section with concrete examples
- Workflow patterns now end with store, not just recall
- Counters Nia context manager competing for agent attention
* chore: update marketplace for v2.4.1
* chore: update marketplace for v2.4.1
* perf: stop re-injecting behavioral rules on every user prompt
Behavioral rules (user-preference concept + always-inject) are already
injected once by session-start.js via /api/context/inject. Re-injecting
them on every UserPromptSubmit via /api/context/search wasted ~4K tokens
per prompt (~17KB duplicated behavioral rules block).
Changes:
- Removed behavioral rules assembly from user-prompt.js
- Removed footer reminder (redundant with MCP server instructions)
- Only technical observations injected in <relevant-memory>
- Bumped plugin 2.4.1 → 2.4.2
* feat: minimum viable learning loop — close feedback loop + stop scope leak
Phase 1 (narrow scope):
- Remove includeGlobal=true from 3 vector search call sites in context
handlers (search, file-context, inject). Observations from other projects
no longer pollute context injection.
- Add project filter to GetAlwaysInjectObservations — only returns
observations from current project or global scope (was: all projects).
- Client-side min similarity filter (>0.10) in user-prompt.js — observations
with 0.00 relevance no longer injected.
Phase 2a (close the loop):
- Add Bayesian effectiveness multiplier to ApplyCompositeScoring.
Formula: (successes + 1) / (injections + 2) with neutra…
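The quoted formula, (successes + 1) / (injections + 2), is easy to check in isolation. The sketch below only evaluates that expression; the function name is hypothetical, and how ApplyCompositeScoring combines the result with other score components is not shown in the commit message and is not assumed here.

```go
package main

import "fmt"

// EffectivenessMultiplier evaluates (successes + 1) / (injections + 2).
// The name and signature are illustrative, not taken from the codebase.
func EffectivenessMultiplier(successes, injections int) float64 {
	return float64(successes+1) / float64(injections+2)
}

func main() {
	// No history, all injections successful, no injections successful.
	fmt.Println(EffectivenessMultiplier(0, 0))
	fmt.Println(EffectivenessMultiplier(8, 8))
	fmt.Println(EffectivenessMultiplier(0, 8))
}
```

With no injection history the expression evaluates to 1/2, and it moves toward 1 or 0 only as successes or failures accumulate, so a single early outcome cannot push an observation to either extreme.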
Summary
Implements Reasoning Traces — a second memory layer that captures HOW the agent reasons, not just WHAT it decided. Based on ADR-003 and Cipher competitive analysis.
Phase 1: Data Model
- reasoning_traces table with steps JSONB, quality_score, task_context
Phase 2: Reasoning Detection + Extraction
- DetectReasoning(): detects multi-step reasoning patterns (3+ indicators in 200+ chars)
Phase 3: MCP Tool Integration
- recall(action="reasoning"): searches and formats reasoning traces
Not in this PR (Phase 4)
Test plan
- go build ./... passes
- go vet ./... passes
- recall(action="reasoning", project="engram") returns empty (no traces yet)
Summary by CodeRabbit
Release notes