feat: unify and tune existing systems (roadmap Phase 2)#130
Conversation
Capitalize expandViaGraph → ExpandViaGraph so it can be called from handlers_context.go (context injection path). Part of unify-tune-phase2 T001.
ExpandViaGraph now called after clustering and before composite scoring in handleSearchByPrompt. Enriches context injection with FalkorDB neighbors (top-5, 2-hop, decay 0.7) — same parameters as MCP hybridSearch. New observations fetched via GetObservationsByIDs and merged into results with decayed similarity scores. Part of unify-tune-phase2 T002 (FR-1).
ThemeSimilarity: 0.7 → 0.55 (matches ClusteringThreshold) ParallelMaxDays: 7 → 14 (capture 2-week sprint window) ParallelMaxSim: 0.4 → 0.5 (widen temporal co-occurrence pairs) These types had 0 edges despite rules existing — thresholds were too restrictive for the observation pairs being sampled. Updated test: similarity 0.6 now correctly triggers shares_theme. Part of unify-tune-phase2 T003 (FR-2).
ForgetEnabled now defaults to true (was false). Protection list expanded: decision, discovery, guidance, entity, wiki are never archived by the forgetting cycle. Threshold remains conservative at 0.01 — only truly irrelevant observations with near-zero importance get archived. Part of unify-tune-phase2 T004 (FR-3).
Allows runtime control of forgetting cycle via environment variable. ENGRAM_FORGET_ENABLED=false disables, unset/true = enabled (default). Part of unify-tune-phase2 T005 (FR-3).
New internal/dedup/checker.go with CheckDuplicate function: - cosine >= 0.92: NOOP (near-duplicate, skip) - cosine 0.75-0.92: UPDATE (supersede existing) - cosine < 0.75: ADD (new observation) Includes suppression/archive check to allow re-creation. Shared between MCP store and SDK extraction paths. Part of unify-tune-phase2 T006 (FR-5).
Replace ~45 lines of inline dedup logic with call to shared dedup.CheckDuplicate. Same behavior preserved: NOOP/UPDATE/ADD responses identical, suppression check included. Part of unify-tune-phase2 T007 (FR-5).
SDK observation extraction now checks for near-duplicates before storing. NOOP (cosine >= 0.92) skips storage. UPDATE (0.75-0.92) supersedes existing with EVOLVES_FROM relation. Uses shared dedup.CheckDuplicate — same logic as MCP store path. duplicates counter now tracks actual skipped duplicates. Part of unify-tune-phase2 T008 (FR-5).
handleIngestEvent now checks vector similarity before storing. NOOP (cosine >= 0.92) returns early with semantic_duplicate status. Complements existing hash-based TTL dedup with semantic detection. Uses shared dedup.CheckDuplicate — same as MCP store and SDK extract. Part of unify-tune-phase2 T008b (FR-5).
Part of unify-tune-phase2 T010.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughВведена централизованная семантическая дедупликация ( Changes
Sequence Diagram(s)sequenceDiagram
participant Ingest as Ingest Handler
participant Dedup as dedup.CheckDuplicate
participant Vector as Vector Index
participant DB as Postgres DB
rect rgba(200,200,255,0.5)
Ingest->>Dedup: CheckDuplicate(ctx, vectorClient, db, project, content, noopThreshold)
Dedup->>Vector: Query top-K vectors scoped by doc type + project
alt Vector returns matches
Vector-->>Dedup: Matches with metadata (doc_id, sim)
Dedup->>DB: Optional lookup matched doc_id for suppressed/archived
alt matched is suppressed/archived
DB-->>Dedup: suppressed/archived -> treat as ADD
Dedup-->>Ingest: Result{Action: ADD}
else not suppressed/archived
DB-->>Dedup: OK
Dedup-->>Ingest: Result{Action: NOOP/UPDATE/ADD, ExistingID, Similarity}
end
else Vector error or no matches
Vector-->>Dedup: error/empty -> ADD
Dedup-->>Ingest: Result{Action: ADD}
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
There was a problem hiding this comment.
Code Review
This pull request centralizes semantic deduplication logic into a new internal/dedup package, implementing the Mem0 Algorithm 1 to manage near-duplicates and contradictions within MCP tools and ingestion handlers. Key updates include enabling the forgetting mechanism by default with environment variable override support, expanding protected observation types, exporting graph expansion functionality for search handlers, and bumping the version to 3.1.1. Review feedback recommends using constants for new observation types to avoid hardcoded strings, logging errors when marking observations as superseded, and maintaining code cleanliness by moving inline threshold documentation to external descriptions.
| ThemeSimilarity: 0.55, // Lowered from 0.7 — matches ClusteringThreshold (if similar enough to cluster, similar enough to share theme) | ||
| ExplainSimilarity: 0.5, | ||
| ParallelMaxDays: 7, | ||
| ParallelMaxSim: 0.4, | ||
| ParallelMaxDays: 14, // Widened from 7 — capture observations from same 2-week sprint | ||
| ParallelMaxSim: 0.5, // Widened from 0.4 — more temporal co-occurrence pairs |
There was a problem hiding this comment.
| switch obs.Type { | ||
| case models.ObsTypeDecision, models.ObsTypeDiscovery, models.ObsTypeGuidance, | ||
| "entity", "wiki": |
|
|
||
| // If UPDATE: supersede existing observation. | ||
| if dedupResult != nil && dedupResult.Action == dedup.ActionUpdate && dedupResult.ExistingID > 0 { | ||
| _ = s.observationStore.MarkAsSuperseded(ctx, dedupResult.ExistingID) |
There was a problem hiding this comment.
The error from MarkAsSuperseded is ignored. Even if this is a non-critical operation, it should be logged to ensure visibility into potential database issues.
| _ = s.observationStore.MarkAsSuperseded(ctx, dedupResult.ExistingID) | |
| if err := s.observationStore.MarkAsSuperseded(ctx, dedupResult.ExistingID); err != nil { log.Warn().Err(err).Int64("id", dedupResult.ExistingID).Msg("failed to mark observation as superseded") } |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
- Add ObsTypeEntity and ObsTypeWiki constants to models package - Use constants instead of hardcoded strings in forgetting protection - Log MarkAsSuperseded error in SDK extract dedup path
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/consolidation/scheduler.go (1)
41-42:⚠️ Potential issue | 🟡 MinorКомментарий не соответствует новому значению по умолчанию.
Комментарий утверждает
(default false), но значение по умолчанию изменено наtrueв строке 55.📝 Предлагаемое исправление
- // ForgetEnabled controls whether the forgetting cycle runs (default false). + // ForgetEnabled controls whether the forgetting cycle runs (default true). ForgetEnabled bool `json:"forget_enabled"`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consolidation/scheduler.go` around lines 41 - 42, The struct field comment for ForgetEnabled is outdated (says "default false") but the default is now true; update the comment for the ForgetEnabled bool in the scheduler struct to reflect the new default (e.g., "(default true)") and ensure any nearby docstring or constructor/initialization for the scheduler that sets the default value is consistent with the ForgetEnabled field.internal/search/manager.go (1)
1486-1511:⚠️ Potential issue | 🟡 MinorДедупликация только по
IDможет отбрасывать валидных соседей.На Line 1486–1489 в
seenпопадают ID всех типов (observation/session/prompt). При совпадении числового ID сосед-observation на Line 1509 может быть ошибочно пропущен.💡 Предлагаемое исправление
- seen := make(map[int64]bool, len(fused)) + seen := make(map[int64]bool, len(fused)) for _, f := range fused { - seen[f.ID] = true + if f.DocType == "observation" { + seen[f.ID] = true + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/search/manager.go` around lines 1486 - 1511, The dedup map currently keys only on numeric ID (seen map[int64]bool) which causes neighbors of a different DocType (e.g., observation vs session vs prompt) to be incorrectly skipped; change the dedup key to include DocType plus ID (e.g., use map[string]bool or map[struct{typ string; id int64}]bool) and populate it using the fused item's DocType and ID (ScoredID.DocType and ScoredID.ID) and check neighbors using their DocType and ObsID (from m.graphStore.GetNeighbors results) so you only treat an item as seen when both type and id match. Ensure all references to seen are updated to use the new composite key when marking initial fused items and when skipping neighbors.
🧹 Nitpick comments (2)
internal/consolidation/associations_test.go (1)
56-57: Добавьте граничные кейсы для новых порогов, чтобы зафиксировать strict-сравнения.На Line 56-57 покрыты «внутренние» значения, но не сами границы. Рекомендую добавить кейсы ровно на
similarity=0.55,similarity=0.50иageDiffDays=14, чтобы явно закрепить поведение>/</<=.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consolidation/associations_test.go` around lines 56 - 57, Add three boundary test rows to the table in associations_test.go to lock down strict comparisons: (1) a case with similarity: 0.55 (same aType/bType pattern as existing SharesTheme case) and assert it yields models.RelationSharesTheme, (2) a case with similarity: 0.50 and assert it yields nil/expectNil true, and (3) a case with ageDiffDays: 14 and assert it is treated as within the age cutoff (i.e. not expectNil) so the same relationship as the corresponding non-boundary case; place these entries alongside the existing test cases using the same field names (name, aType, bType, ageDiffDays, similarity, expectedType, expectNil) so the table-driven tests cover the <= / < / > boundary behavior.internal/consolidation/scheduler.go (1)
380-385: Создайте константыObsTypeEntityиObsTypeWikiвpkg/models/observation.go.Строковые литералы
"entity"и"wiki"используются в нескольких местах кода (scheduler.go, search/manager.go, synthesis/entity_store.go, maintenance/wiki_generation.go), тогда как другие типы наблюдений определены как константы (ObsTypeDecision,ObsTypeDiscovery,ObsTypeGuidance). Это создаёт риск расхождения при рефакторинге и снижает типизацию кода.Добавьте в
pkg/models/observation.go(строки 16-25):ObsTypeEntity ObservationType = "entity" ObsTypeWiki ObservationType = "wiki"Затем замените строковые литералы на константы во всех местах использования.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consolidation/scheduler.go` around lines 380 - 385, Добавьте в pkg/models/observation.go две новые константы типа ObservationType: ObsTypeEntity и ObsTypeWiki (matching the pattern used for ObsTypeDecision/ObsTypeDiscovery/ObsTypeGuidance) and then replace all string literals "entity" and "wiki" with these constants in code locations that use observation types (e.g., internal/consolidation/scheduler.go, search/manager.go, synthesis/entity_store.go, maintenance/wiki_generation.go) so comparisons use ObsTypeEntity and ObsTypeWiki instead of raw strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/consolidation/associations.go`:
- Around line 52-55: Обновите комментарии для полей
AssociationConfig.ThemeSimilarity, AssociationConfig.ParallelMaxDays и
AssociationConfig.ParallelMaxSim так, чтобы они отражали новые значения:
ThemeSimilarity = 0.55 (ранее 0.7) — добавьте про соответствие
ClusteringThreshold, ParallelMaxDays = 14 (ранее 7) — укажите, что это покрывает
двухнедельный спринт, и ParallelMaxSim = 0.5 (ранее 0.4) — отметьте увеличение
для большей временной со-оккупации; внесите эти правки рядом с объявлениями
полей, чтобы комментарии не вводили в заблуждение.
In `@internal/dedup/checker.go`:
- Around line 79-106: The current logic uses similar[0] unconditionally which
can yield NOOP/UPDATE for hits with ExtractRowID(...) == 0 or
suppressed/archived; instead iterate over the similar slice, call
vector.ExtractRowID for each hit and skip entries with id == 0, and for each
non-zero id check the DB suppressed/archived condition (reuse the existing Raw
query) and skip suppressed/archived rows; take the first remaining valid hit as
the match (set topSim and existingID from that hit) and only then decide
ActionNoop/ActionUpdate; if no valid hits remain return ActionAdd; update
references to noopThreshold, DefaultUpdateThreshold, Result, and actions
accordingly.
In `@internal/mcp/tools_store_consolidated.go`:
- Around line 186-194: The update path currently drops the evolves_from relation
and ignores failures: when dedupResult indicates an update, populate the
ObservationRelation.DetectionSource (use the new observation's detection source
field or the appropriate source from the observation being stored) before
calling relationStore.StoreRelation, and stop ignoring errors from both
s.observationStore.MarkAsSuperseded and s.relationStore.StoreRelation — check
returned errors and return or handle them (log and abort the update) so you
never leave a superseded state or a missing relation due to a silently swallowed
error; relevant symbols: dedupResult, MarkAsSuperseded, StoreRelation,
ObservationRelation, DetectionSource, relationStore, observationStore.
In `@internal/worker/handlers_context.go`:
- Around line 430-447: After expanding IDs via s.searchMgr.ExpandViaGraph and
fetching graphObs with s.observationStore.GetObservationsByIDs, filter graphObs
by observation.Scope (allow only "project" and "global" or as appropriate) and
ensure observation.Project matches the current project context before appending
into clusteredObservations; update any places using newIDs/similarityScores
accordingly (e.g., in the block around ExpandViaGraph, clusteredObservations,
GetObservationsByIDs) so only observations with the correct Scope/Project are
merged to prevent cross-project injection.
In `@internal/worker/handlers_ingest.go`:
- Around line 187-201: The semantic dedup branch currently only handles
dedupPkg.ActionNoop and ignores dedupPkg.ActionUpdate, causing the handler to
fall through to StoreObservation and lose supersede/link behavior; update the
branch that checks dedupResult from dedupPkg.CheckDuplicate(r.Context(),
s.vectorClient, s.observationStore.GetDB(), req.Project, dedupContent, 0) to
also detect dedupPkg.ActionUpdate and perform the same supersede/link workflow
used elsewhere (use dedupResult.ExistingID and dedupResult.Similarity to create
a supersede or link entry via the same observation-store helper used by
tools_memory.go/tools_store_consolidated.go) instead of calling
StoreObservation, keeping ActionNoop handling as-is.
- Around line 192-199: The early return when dedupResult != nil &&
dedupResult.Action == dedupPkg.ActionNoop skips calling MarkProcessed (which is
only invoked after StoreObservation), leaving the created raw_event in pending;
before writing the 202 response for semantic duplicates in the handler, call the
same MarkProcessed code path used after StoreObservation (e.g., invoke
MarkProcessed(ctx, rawEventID) or the handler's mark-processed helper) to
transition the raw_event out of pending, then proceed to write the JSON
response—ensure you reference and use the existing MarkProcessed function and
the raw event ID produced earlier so you don't duplicate logic.
In `@internal/worker/service.go`:
- Around line 892-895: The environment check for ENGRAM_FORGET_ENABLED in the
scheduler setup is case-sensitive (v == "false" || v == "0"), so values like
"FALSE" or "False" are ignored; update the check in the service initialization
(around consolidation.DefaultSchedulerConfig() and schedCfg) to perform a
case-insensitive comparison (e.g., using strings.EqualFold or normalizing v with
strings.ToLower) and treat any "false" or "0" variant as disabling by setting
schedCfg.ForgetEnabled = false.
- Around line 892-901: Добавьте документирование переменной окружения
ENGRAM_FORGET_ENABLED: вставьте в .env.example новую строку с именем
ENGRAM_FORGET_ENABLED, кратким комментарием о назначении (включает/выключает
автоматическое архивирование наблюдений с низкой релевантностью) и значением по
умолчанию (пусто/true), и обновите docs/arch/CONFIGURATION.md, добавив запись о
том, что ENGRAM_FORGET_ENABLED — environment-only переменная (аналогично
ENGRAM_API_TOKEN и ENGRAM_WORKER_HOST) с описанием поведения (по умолчанию
включено, можно отключить установкой "false" или "0"); также обновите CHANGELOG,
указав, что поведение по умолчанию изменилось на opt-out (или, если это было
намеренно, задокументируйте это решение) и предложите пользователям проверить
конфигурации перед обновлением.
---
Outside diff comments:
In `@internal/consolidation/scheduler.go`:
- Around line 41-42: The struct field comment for ForgetEnabled is outdated
(says "default false") but the default is now true; update the comment for the
ForgetEnabled bool in the scheduler struct to reflect the new default (e.g.,
"(default true)") and ensure any nearby docstring or constructor/initialization
for the scheduler that sets the default value is consistent with the
ForgetEnabled field.
In `@internal/search/manager.go`:
- Around line 1486-1511: The dedup map currently keys only on numeric ID (seen
map[int64]bool) which causes neighbors of a different DocType (e.g., observation
vs session vs prompt) to be incorrectly skipped; change the dedup key to include
DocType plus ID (e.g., use map[string]bool or map[struct{typ string; id
int64}]bool) and populate it using the fused item's DocType and ID
(ScoredID.DocType and ScoredID.ID) and check neighbors using their DocType and
ObsID (from m.graphStore.GetNeighbors results) so you only treat an item as seen
when both type and id match. Ensure all references to seen are updated to use
the new composite key when marking initial fused items and when skipping
neighbors.
---
Nitpick comments:
In `@internal/consolidation/associations_test.go`:
- Around line 56-57: Add three boundary test rows to the table in
associations_test.go to lock down strict comparisons: (1) a case with
similarity: 0.55 (same aType/bType pattern as existing SharesTheme case) and
assert it yields models.RelationSharesTheme, (2) a case with similarity: 0.50
and assert it yields nil/expectNil true, and (3) a case with ageDiffDays: 14 and
assert it is treated as within the age cutoff (i.e. not expectNil) so the same
relationship as the corresponding non-boundary case; place these entries
alongside the existing test cases using the same field names (name, aType,
bType, ageDiffDays, similarity, expectedType, expectNil) so the table-driven
tests cover the <= / < / > boundary behavior.
In `@internal/consolidation/scheduler.go`:
- Around line 380-385: Добавьте в pkg/models/observation.go две новые константы
типа ObservationType: ObsTypeEntity и ObsTypeWiki (matching the pattern used for
ObsTypeDecision/ObsTypeDiscovery/ObsTypeGuidance) and then replace all string
literals "entity" and "wiki" with these constants in code locations that use
observation types (e.g., internal/consolidation/scheduler.go, search/manager.go,
synthesis/entity_store.go, maintenance/wiki_generation.go) so comparisons use
ObsTypeEntity and ObsTypeWiki instead of raw strings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c6633f9e-35f4-4606-a2ef-adcfa9b63c54
📒 Files selected for processing (13)
internal/consolidation/associations.gointernal/consolidation/associations_test.gointernal/consolidation/scheduler.gointernal/consolidation/scheduler_test.gointernal/dedup/checker.gointernal/mcp/tools_memory.gointernal/mcp/tools_store_consolidated.gointernal/search/manager.gointernal/worker/handlers_context.gointernal/worker/handlers_ingest.gointernal/worker/service.goplugin/engram/.claude-plugin/plugin.jsonplugin/openclaw-engram/package.json
| topSim := similar[0].Similarity | ||
| existingID := vector.ExtractRowID(similar[0].Metadata) | ||
|
|
||
| // Check if the match is suppressed/archived — allow re-creation if so. | ||
| if db != nil && existingID > 0 { | ||
| var checkResult struct{ Count int64 } | ||
| if checkErr := db.WithContext(ctx). | ||
| Raw("SELECT COUNT(*) as count FROM observations WHERE id = ? AND (is_suppressed = TRUE OR COALESCE(is_archived, 0) != 0)", existingID). | ||
| Scan(&checkResult).Error; checkErr == nil && checkResult.Count > 0 { | ||
| // Existing match is suppressed/archived — treat as ADD | ||
| return &Result{Action: ActionAdd}, nil | ||
| } | ||
| } | ||
|
|
||
| if topSim >= noopThreshold { | ||
| return &Result{ | ||
| Action: ActionNoop, | ||
| ExistingID: existingID, | ||
| Similarity: topSim, | ||
| }, nil | ||
| } | ||
|
|
||
| if topSim >= DefaultUpdateThreshold { | ||
| return &Result{ | ||
| Action: ActionUpdate, | ||
| ExistingID: existingID, | ||
| Similarity: topSim, | ||
| }, nil |
There was a problem hiding this comment.
Не принимайте similar[0] за валидный dedup-матч без дополнительной фильтрации.
Сейчас решение строится только по первому hit: если ExtractRowID(...) вернёт 0, функция всё равно может отдать NOOP/UPDATE; если верхний hit suppressed/archived, код сразу возвращает ADD и не смотрит на остальные top-k результаты. Это даёт ложные skip и пропускает реальные активные дубликаты. Лучше пройтись по similar, отбросить hits без row id и неактивные записи, а решение принимать по первому валидному совпадению.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/dedup/checker.go` around lines 79 - 106, The current logic uses
similar[0] unconditionally which can yield NOOP/UPDATE for hits with
ExtractRowID(...) == 0 or suppressed/archived; instead iterate over the similar
slice, call vector.ExtractRowID for each hit and skip entries with id == 0, and
for each non-zero id check the DB suppressed/archived condition (reuse the
existing Raw query) and skip suppressed/archived rows; take the first remaining
valid hit as the match (set topSim and existingID from that hit) and only then
decide ActionNoop/ActionUpdate; if no valid hits remain return ActionAdd; update
references to noopThreshold, DefaultUpdateThreshold, Result, and actions
accordingly.
| if dedupResult != nil && dedupResult.Action == dedup.ActionUpdate && dedupResult.ExistingID > 0 { | ||
| _ = s.observationStore.MarkAsSuperseded(ctx, dedupResult.ExistingID) | ||
| if s.relationStore != nil { | ||
| _, _ = s.relationStore.StoreRelation(ctx, &models.ObservationRelation{ | ||
| SourceID: obsID, | ||
| TargetID: dedupResult.ExistingID, | ||
| RelationType: models.RelationEvolvesFrom, | ||
| Confidence: dedupResult.Similarity, | ||
| }) |
There was a problem hiding this comment.
UPDATE-путь сейчас тихо теряет evolves_from relation.
В StoreRelation передаётся ObservationRelation без DetectionSource; по текущей модели relation этот insert не пройдёт CHECK и quietly провалится, потому что ошибка тоже глушится. Плюс MarkAsSuperseded здесь тоже вызывается через _ =, так что можно получить новый observation без relation, а при ошибке supersede — ещё и с прежним observation, оставшимся активным.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/mcp/tools_store_consolidated.go` around lines 186 - 194, The update
path currently drops the evolves_from relation and ignores failures: when
dedupResult indicates an update, populate the
ObservationRelation.DetectionSource (use the new observation's detection source
field or the appropriate source from the observation being stored) before
calling relationStore.StoreRelation, and stop ignoring errors from both
s.observationStore.MarkAsSuperseded and s.relationStore.StoreRelation — check
returned errors and return or handle them (log and abort the update) so you
never leave a superseded state or a missing relation due to a silently swallowed
error; relevant symbols: dedupResult, MarkAsSuperseded, StoreRelation,
ObservationRelation, DetectionSource, relationStore, observationStore.
| expanded := s.searchMgr.ExpandViaGraph(r.Context(), scoredIDs, limit) | ||
| // Merge new IDs back: fetch any expanded observations not already in clusteredObservations | ||
| existingIDs := make(map[int64]bool, len(clusteredObservations)) | ||
| for _, obs := range clusteredObservations { | ||
| existingIDs[obs.ID] = true | ||
| } | ||
| var newIDs []int64 | ||
| for _, sid := range expanded { | ||
| if !existingIDs[sid.ID] && sid.DocType == "observation" { | ||
| newIDs = append(newIDs, sid.ID) | ||
| similarityScores[sid.ID] = sid.Score // propagate decayed score | ||
| } | ||
| } | ||
| if len(newIDs) > 0 { | ||
| graphObs, err := s.observationStore.GetObservationsByIDs(r.Context(), newIDs, "", 0) | ||
| if err == nil && len(graphObs) > 0 { | ||
| clusteredObservations = append(clusteredObservations, graphObs...) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверка: есть ли project/scope-фильтрация в реализациях, вызываемых этим блоком.
# Ожидаемый результат: либо фильтрация уже есть в нижнем слое, либо её нужно оставить в handler как в патче выше.
rg -n -C3 '\btype\s+GraphStore\b|\bGetNeighbors\s*\(' --type=go
rg -n -C4 '\bfunc\s*\(.*\)\s*GetNeighbors\s*\(' --type=go
rg -n -C4 '\bfunc\s*\(.*\)\s*GetObservationsByIDs\s*\(' --type=goRepository: thebtf/engram
Length of output: 5820
🏁 Script executed:
# Read the full implementation of GetObservationsByIDs
sed -n '316,350p' internal/db/gorm/observation_store.goRepository: thebtf/engram
Length of output: 1017
🏁 Script executed:
# Check broader context around lines 430-447 in handlers_context.go
sed -n '400,460p' internal/worker/handlers_context.goRepository: thebtf/engram
Length of output: 2457
🏁 Script executed:
# Search for how GetObservationsByIDs is called elsewhere to understand if it filters by project
rg -n -B2 -A2 'GetObservationsByIDs' --type=go | head -60Repository: thebtf/engram
Length of output: 4322
🏁 Script executed:
# Check function signature and broader context to find where 'project' variable comes from
sed -n '200,450p' internal/worker/handlers_context.go | head -100Repository: thebtf/engram
Length of output: 4168
🏁 Script executed:
# Search for project variable declaration/usage in this handler function
rg -n 'project\s*:=' internal/worker/handlers_context.go | grep -A2 -B2 '40[0-9]'Repository: thebtf/engram
Length of output: 39
🏁 Script executed:
# Look at the beginning of the handler function containing line 444
sed -n '1,450p' internal/worker/handlers_context.go | tail -250 | head -100Repository: thebtf/engram
Length of output: 4195
🏁 Script executed:
# Find the function name and signature containing line 444
sed -n '100,450p' internal/worker/handlers_context.go | grep -n 'func\|^}'Repository: thebtf/engram
Length of output: 145
🏁 Script executed:
# Get the function signature for handleSearchByPrompt
sed -n '241,260p' internal/worker/handlers_context.goRepository: thebtf/engram
Length of output: 814
🏁 Script executed:
# Find the exact function signature and where project is defined
sed -n '180,250p' internal/worker/handlers_context.goRepository: thebtf/engram
Length of output: 2879
🏁 Script executed:
# Verify one more detail: check if graph expansion could return observations from any project
sed -n '425,460p' internal/worker/handlers_context.goRepository: thebtf/engram
Length of output: 1482
🏁 Script executed:
# Check the Observation model to see if it has Scope and Project fields
rg -n 'type Observation struct' --type=go -A 20 | head -40Repository: thebtf/engram
Length of output: 4192
Добавьте явную фильтрацию project/scope для результатов graph expansion на строке 444.
GetObservationsByIDs не выполняет фильтрацию по проекту—она только фильтрует по ID. После graph expansion (строка 444) результаты добавляются в clusteredObservations без проверки, что наблюдения принадлежат текущему проекту. Это открывает путь для межпроектной инъекции контекста. Проверьте поле Scope (доступные значения: 'project', 'global', 'agent') и Project для каждого наблюдения перед добавлением.
💡 Предлагаемое исправление
if len(newIDs) > 0 {
graphObs, err := s.observationStore.GetObservationsByIDs(r.Context(), newIDs, "", 0)
- if err == nil && len(graphObs) > 0 {
- clusteredObservations = append(clusteredObservations, graphObs...)
+ if err != nil {
+ log.Warn().Err(err).Msg("graph expansion: failed to fetch expanded observations")
+ } else if len(graphObs) > 0 {
+ for _, obs := range graphObs {
+ if obs.Scope == models.ScopeGlobal || (obs.Scope == models.ScopeProject && obs.Project == project) {
+ if !existingIDs[obs.ID] {
+ existingIDs[obs.ID] = true
+ clusteredObservations = append(clusteredObservations, obs)
+ }
+ }
+ }
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/worker/handlers_context.go` around lines 430 - 447, After expanding
IDs via s.searchMgr.ExpandViaGraph and fetching graphObs with
s.observationStore.GetObservationsByIDs, filter graphObs by observation.Scope
(allow only "project" and "global" or as appropriate) and ensure
observation.Project matches the current project context before appending into
clusteredObservations; update any places using newIDs/similarityScores
accordingly (e.g., in the block around ExpandViaGraph, clusteredObservations,
GetObservationsByIDs) so only observations with the correct Scope/Project are
merged to prevent cross-project injection.
| schedCfg := consolidation.DefaultSchedulerConfig() | ||
| if v := strings.TrimSpace(os.Getenv("ENGRAM_FORGET_ENABLED")); v == "false" || v == "0" { | ||
| schedCfg.ForgetEnabled = false | ||
| } | ||
| consolidationScheduler := consolidation.NewScheduler( | ||
| relevanceCalc, | ||
| assocEngine, | ||
| observationStore, | ||
| relationStore, | ||
| consolidation.DefaultSchedulerConfig(), | ||
| schedCfg, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверяем, документирована ли переменная ENGRAM_FORGET_ENABLED
rg -l "ENGRAM_FORGET_ENABLED" --type-not=go
# Ищем .env.example или аналогичные файлы конфигурации
fd -i ".env" -t fRepository: thebtf/engram
Length of output: 77
🏁 Script executed:
# Search for all mentions of ENGRAM_FORGET_ENABLED in the entire repository
rg "ENGRAM_FORGET_ENABLED" -i
# Check for configuration files with various patterns
fd -i "\.env" -type f
# List potential configuration/documentation files
fd -i "(README|config|docs|CONTRIBUTING)" -type f | head -20Repository: thebtf/engram
Length of output: 533
🏁 Script executed:
# Look specifically at the service.go file context around the variable
cat -n internal/worker/service.go | sed -n '885,910p'Repository: thebtf/engram
Length of output: 1018
🏁 Script executed:
# Search for DefaultSchedulerConfig definition
rg "DefaultSchedulerConfig" -A 10
# Search for documentation files
find . -maxdepth 2 -iname "README*" -o -iname "CHANGELOG*" -o -iname "*.md"
# Check for all .env* files anywhere in repository
find . -name ".env*" -o -name "*.env"Repository: thebtf/engram
Length of output: 8474
🏁 Script executed:
# Search for consolidation package and DefaultSchedulerConfig
rg "ForgetEnabled"
# Look for any documentation about environment variables
rg -i "environment|config" -l | grep -E "(README|docs|\.md|\.txt)"Repository: thebtf/engram
Length of output: 1598
🏁 Script executed:
# Check the actual content of .env.example
cat .env.example
# Check if ENGRAM_FORGET_ENABLED is mentioned in CHANGELOG
rg "ENGRAM_FORGET_ENABLED|ForgetEnabled" CHANGELOG.mdRepository: thebtf/engram
Length of output: 1097
🏁 Script executed:
# Check documentation files for environment variable documentation
rg "ENGRAM_FORGET_ENABLED" docs/ README* CONTRIBUTING.mdRepository: thebtf/engram
Length of output: 39
🏁 Script executed:
# Check CHANGELOG for any mentions related to forgetting/consolidation
rg -i "forget|consolidation" CHANGELOG.md | head -30
# Check if there's any documentation about environment variables
rg -i "environment.*variable|ENGRAM_" docs/arch/CONFIGURATION.mdRepository: thebtf/engram
Length of output: 4507
🏁 Script executed:
# Check when this code was added/changed - look at git history if available
git log --oneline -n 20 -- internal/worker/service.go internal/consolidation/scheduler.go 2>/dev/null | head -10Repository: thebtf/engram
Length of output: 124
Документируйте переменную окружения ENGRAM_FORGET_ENABLED.
Переменная ENGRAM_FORGET_ENABLED отсутствует в .env.example и не описана в docs/arch/CONFIGURATION.md. Код устанавливает ForgetEnabled: true по умолчанию, что означает автоматическое архивирование наблюдений с низкой релевантностью при запуске планировщика. Существующие развёртывания начнут выполнять эту операцию автоматически после обновления, если не установить ENGRAM_FORGET_ENABLED=false.
Рекомендуется:
- Добавить переменную в
.env.exampleс комментарием о назначении и поведении по умолчанию - Документировать в
CONFIGURATION.mdкак environment-only переменную (аналогичноENGRAM_API_TOKENиENGRAM_WORKER_HOST) - Уточнить CHANGELOG: определить, было ли это изменение умышленным (смена с opt-in на opt-out) или документация устарела
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/worker/service.go` around lines 892 - 901, Добавьте документирование
переменной окружения ENGRAM_FORGET_ENABLED: вставьте в .env.example новую строку
с именем ENGRAM_FORGET_ENABLED, кратким комментарием о назначении
(включает/выключает автоматическое архивирование наблюдений с низкой
релевантностью) и значением по умолчанию (пусто/true), и обновите
docs/arch/CONFIGURATION.md, добавив запись о том, что ENGRAM_FORGET_ENABLED —
environment-only переменная (аналогично ENGRAM_API_TOKEN и ENGRAM_WORKER_HOST) с
описанием поведения (по умолчанию включено, можно отключить установкой "false"
или "0"); также обновите CHANGELOG, указав, что поведение по умолчанию
изменилось на opt-out (или, если это было намеренно, задокументируйте это
решение) и предложите пользователям проверить конфигурации перед обновлением.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/mcp/tools_store_consolidated.go`:
- Around line 166-174: The deduplication threshold is hardcoded as 0 in the call
to dedup.CheckDuplicate inside handleExtractAndOperate; instead obtain the
configured threshold via cfg := config.Get() and pass
cfg.StoreMemoryDedupThreshold to dedup.CheckDuplicate so the behavior matches
handleStoreMemory and the configurable StoreMemoryDedupThreshold is respected.
Ensure you import/use the same config.Get() and the StoreMemoryDedupThreshold
symbol and replace the literal 0 with cfg.StoreMemoryDedupThreshold in the call
to dedup.CheckDuplicate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: de2fb203-002f-4c63-b1cc-c469427293a0
📒 Files selected for processing (3)
internal/consolidation/scheduler.gointernal/mcp/tools_store_consolidated.gopkg/models/observation.go
✅ Files skipped from review due to trivial changes (1)
- pkg/models/observation.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/consolidation/scheduler.go
| // Dedup check: skip near-duplicates, supersede contradictions (shared Mem0 Algorithm 1). | ||
| dedupResult, dedupErr := dedup.CheckDuplicate(ctx, s.vectorClient, s.observationStore.GetDB(), project, obs.Narrative, 0) | ||
| if dedupErr != nil { | ||
| log.Debug().Err(dedupErr).Str("title", obs.Title).Msg("extract: dedup check failed, proceeding with ADD") | ||
| } | ||
| if dedupResult != nil && dedupResult.Action == dedup.ActionNoop { | ||
| duplicates++ | ||
| continue | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверяем сигнатуру CheckDuplicate и значение dedupThreshold
ast-grep --pattern 'func CheckDuplicate($$$) ($$$)'
# Ищем использование dedupThreshold в tools_memory.go
rg -n 'dedupThreshold' --type=go -C3Repository: thebtf/engram
Length of output: 7691
🏁 Script executed:
# Ищем DefaultNoopThreshold
rg -n 'DefaultNoopThreshold' --type=go -C2
# Проверяем полный контекст tools_store_consolidated.go вокруг строк 166-174
rg -n 'CheckDuplicate.*0\)' --type=go -C5 internal/mcp/tools_store_consolidated.go
# Смотрим есть ли другие пороги в tools_store_consolidated.go
rg -n 'dedupThreshold\|noopThreshold' --type=go internal/mcp/tools_store_consolidated.goRepository: thebtf/engram
Length of output: 1657
🏁 Script executed:
# Проверяем доступ к конфигу в tools_store_consolidated.go
rg -n 'func.*store' --type=go -A5 internal/mcp/tools_store_consolidated.go | head -50
# Смотрим импорты и общую структуру файла
head -100 internal/mcp/tools_store_consolidated.go
# Проверяем есть ли cfg в методе где используется CheckDuplicate
rg -B20 'CheckDuplicate.*0\)' --type=go internal/mcp/tools_store_consolidated.go | head -50Repository: thebtf/engram
Length of output: 4108
🏁 Script executed:
# Проверяем структуру Server и доступ к cfg
rg -n 'type Server struct' --type=go -A20 internal/mcp/
# Смотрим handleStoreMemory и как он получает threshold
rg -n 'func.*handleStoreMemory' --type=go -A50 internal/mcp/tools_memory.go | head -80
# Ищем где используется StoreMemoryDedupThreshold в конфиге
rg -n 'StoreMemoryDedupThreshold' --type=go -C3Repository: thebtf/engram
Length of output: 5138
🏁 Script executed:
# Проверяем функцию handleExtractAndOperate полностью
sed -n '40,250p' internal/mcp/tools_store_consolidated.go
# Проверяем есть ли другие вызовы CheckDuplicate в файле
rg -n 'CheckDuplicate' --type=go internal/mcp/tools_store_consolidated.go
# Смотрим коммиты или комментарии около этой строки
rg -n 'extract' --type=go -i -C3 internal/mcp/tools_store_consolidated.go | grep -A5 -B5 'CheckDuplicate'Repository: thebtf/engram
Length of output: 6278
Использование порога дедупликации должно быть согласовано с tools_memory.go.
В handleExtractAndOperate передаётся литерал 0 к CheckDuplicate, что срабатывает fallback на DefaultNoopThreshold (0.92). Однако handleStoreMemory явно получает StoreMemoryDedupThreshold из конфигурации (lines 78-80 tools_memory.go), позволяя настраивать порог.
Необходимо применить тот же подход: добавить cfg := config.Get() и использовать cfg.StoreMemoryDedupThreshold вместо 0, чтобы дедупликация была согласована и конфигурируема.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/mcp/tools_store_consolidated.go` around lines 166 - 174, The
deduplication threshold is hardcoded as 0 in the call to dedup.CheckDuplicate
inside handleExtractAndOperate; instead obtain the configured threshold via cfg
:= config.Get() and pass cfg.StoreMemoryDedupThreshold to dedup.CheckDuplicate
so the behavior matches handleStoreMemory and the configurable
StoreMemoryDedupThreshold is respected. Ensure you import/use the same
config.Get() and the StoreMemoryDedupThreshold symbol and replace the literal 0
with cfg.StoreMemoryDedupThreshold in the call to dedup.CheckDuplicate.
- dedup/checker.go: return ADD when ExtractRowID returns 0 - tools_store_consolidated.go: add DetectionSource to StoreRelation, log MarkAsSuperseded errors - handlers_ingest.go: handle UPDATE branch (supersede), mark raw_event processed on NOOP (was left pending) - service.go: case-insensitive ENGRAM_FORGET_ENABLED check - associations.go: update default comments to match new values
Summary
Roadmap Phase 2: wire existing components into paths where they're missing, tune configuration, validate systems.
Graph Expansion in Context Injection (FR-1):
ExpandViaGraphfor cross-package usehandlers_context.goafter reranking, before composite scoringAssociation Thresholds (FR-2):
ThemeSimilarity: 0.7 → 0.55 (matches ClusteringThreshold)ParallelMaxDays: 7 → 14 (2-week sprint window)ParallelMaxSim: 0.4 → 0.5 (widen temporal pairs)shares_themeandparallel_contextedges (were 0 before)Forgetting Cycle (FR-3):
ForgetEnablednow defaults totrue(wasfalse)ENGRAM_FORGET_ENABLED=falseenv var for runtime overrideSDK Dedup (FR-5):
internal/dedup/checker.go(Mem0 Algorithm 1: NOOP/UPDATE/ADD)tools_store_consolidated.go) now deduplicateshandlers_ingest.go) now deduplicatesCitation Validation (FR-4):
Test plan
go build ./...passesgo test ./internal/consolidation/passes (updated test for new thresholds)shares_themeedge count > 0parallel_contextedge count > 0ENGRAM_FORGET_ENABLED=falsedisables forgettingSummary by CodeRabbit
Примечания к релизу
Новые возможности
Улучшения
Служебные операции