feat: entity extraction + wiki layer (synthesize-wiki-layer)#129
Conversation
Config fields: EntityExtractionEnabled (default true),
EntityExtractionLimit (20), WikiGenerationLimit (10),
WikiMinSources (5), WikiDataDir ({DataDir}/wiki).
All configurable via ENGRAM_* env vars.
Part of synthesize-wiki-layer Phase 1 (T001).
New package internal/synthesis/ with: - types.go: Entity, Relation, ExtractionResult, WikiResult, EntityMetadata, WikiMetadata structs - prompts.go: LLM prompt templates for entity extraction and wiki generation (JSON schema output) - json_parser.go: ParseEntityExtractionResponse with think-block stripping, markdown fence removal, regex JSON fallback, entity/relation validation and type normalization Part of synthesize-wiki-layer Phase 1 (T002-T004).
…ions Extracts structured entities (name, type, description) and relations from observation batches via LLM. Caps at 10 observations per call, truncates narrative to 200 chars. Uses existing learning.LLMClient. Part of synthesize-wiki-layer Phase 2 (T005).
StoreEntities deduplicates by (lower(title), project, entity_type), creates or updates entity observations with structured metadata in narrative field, and creates FalkorDB edges (entity→source, entity→entity). Uses direct SQL for dedup queries and GORM DB for insert/update. No new tables or migrations needed — entities stored as observations with type=entity. Part of synthesize-wiki-layer Phase 2 (T006).
New extractEntities method queries observations from last 24h, batches by project (5 per LLM call), extracts entities+relations, stores as type=entity observations with FalkorDB edges. Registered as Task 21 in maintenance cycle, gated on llmClient != nil && EntityExtractionEnabled. Uses subtitle field as entity_extracted flag (no schema change). Part of synthesize-wiki-layer Phase 2 (T007-T008).
wiki_generator.go: LLM-based wiki summary generation from entity's source observations. Caps at 10 sources, ~4000 token budget. wiki_disk.go: WriteWikiToDisk writes markdown with frontmatter (type, sources, date). UpdateWikiIndex generates index.md table. EntitySlug for filesystem-safe names. Part of synthesize-wiki-layer Phase 3 (T010-T011).
generateWikiPages finds entities with 5+ source observations, generates LLM wiki summaries, stores as type=wiki observations, writes markdown backup to disk, and maintains index.md. Wiki regenerated only when source count changes by 3+. Old wiki superseded with EVOLVES_FROM pattern. Registered as Task 22, runs after Task 21 (entity extraction). Part of synthesize-wiki-layer Phase 3 (T012-T014).
wiki type_weight=2.0 (highest), entity type_weight=1.0 (neutral). Wiki observations have recency=1.0 (no decay, like manual saves). Entity observations have 90-day half-life. Part of synthesize-wiki-layer Phase 4 (T015-T016).
Wiki observations rendered in <wiki-knowledge> section before
<relevant-memory>. Capped at 3 wiki results.
Entity observations as one-liner references:
[ENTITY] {name} — {description}
Wiki/entity separated from regular observations before type grouping.
IDs tracked for mark-injected.
Part of synthesize-wiki-layer Phase 4 (T017-T018).
Part of synthesize-wiki-layer Phase 5 (T020).
|
Caution Review failedPull request was closed or merged during review WalkthroughДобавлены механизмы извлечения сущностей и генерации вики-страниц: новые конфигурации, планировщик обслуживания вызывает задачи извлечения и генерации, реализованы LLM-интеграция, парсинг результатов, сохранение сущностей/вики в БД и на диск, обновлён скоринг поиска и плагин инъекции подсказок. Changes
Sequence Diagram(s)sequenceDiagram
participant Service as Service.runMaintenance()
participant Config as Config (Entity/Wiki)
participant DB as observations DB
participant Extractor as EntityExtractor
participant LLM as LLMClient
participant Parser as ParseEntityExtractionResponse()
participant Store as StoreEntities()
participant GraphDB as GraphStore
Service->>Config: Проверка EntityExtractionEnabled
alt enabled
Service->>DB: Query recent observations (24h, exclude types, not superseded)
DB-->>Service: observation batches (grouped by project)
loop per batch
Service->>Extractor: Extract(ctx, llm, observations)
Extractor->>LLM: Complete(system+user prompts)
LLM-->>Extractor: raw response
Extractor->>Parser: ParseEntityExtractionResponse(raw)
Parser-->>Extractor: ExtractionResult
Extractor-->>Service: result
alt result non-empty
Service->>Store: StoreEntities(db, graph, project, result, sourceIDs)
Store->>DB: insert/update observations
Store->>GraphDB: write edges (if available)
Store-->>Service: counts
else
Service->>DB: markEntityExtracted(ids)
end
end
end
sequenceDiagram
participant Service as Service.runMaintenance()
participant Config as Config (WikiGeneration)
participant DB as observations DB
participant Helper as getSourceObservations()
participant WikiGen as WikiGenerator
participant LLM as LLMClient
participant Disk as WriteWikiToDisk / UpdateWikiIndex
Service->>Config: Проверка WikiGenerationLimit, WikiMinSources
alt enabled
Service->>DB: Query recent entity observations (limit×2)
DB-->>Service: candidate entities
loop until generated >= limit
Service->>Service: parse entity narrative → EntityMetadata
Service->>DB: lookup existing wiki by title/project
alt existing wiki and not enough new sources
Service->>Service: append index entry, skip regeneration
else
Service->>Helper: getSourceObservations(sourceIDs)
Helper->>DB: GetObservationsByIDs(...)
DB-->>Helper: source observations
Helper-->>Service: sources
Service->>WikiGen: Generate(ctx, llm, entity, sources)
WikiGen->>LLM: Complete(system+user prompts)
LLM-->>WikiGen: content
WikiGen-->>Service: WikiResult
Service->>DB: supersede old wiki (if any)
Service->>DB: insert new wiki observation
alt WikiDataDir set
Service->>Disk: WriteWikiToDisk(dataDir, ...)
Disk-->>Service: success/fail
Service->>Service: append index entry
end
end
end
alt index entries exist and WikiDataDir set
Service->>Disk: UpdateWikiIndex(dataDir, entries)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new synthesis layer for entity extraction and wiki generation. It adds maintenance tasks that use an LLM to extract structured entities and relations from observations, store them in the database, and generate markdown wiki summaries for entities with sufficient source material. The search and injection logic has also been updated to prioritize these synthesized wiki entries. Feedback focuses on improving the robustness and idiomatic nature of the Go implementation, specifically regarding database operations and string manipulation.
| if err := db.WithContext(ctx). | ||
| Table("observations"). | ||
| Where("id IN ?", ids). | ||
| Update("subtitle", fmt.Sprintf("entity_extracted:%d", time.Now().Unix())).Error; err != nil { |
There was a problem hiding this comment.
The Update call replaces the entire subtitle column value. The comment on line 134 indicates an intent to append to the subtitle. If subtitle is used to store other metadata flags (e.g., from other maintenance tasks), those will be lost. Consider using a database-level concatenation (e.g., subtitle || ',' || ...) or fetching the existing value first if preservation is required.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/search/manager.go (1)
130-141:⚠️ Potential issue | 🔴 CriticalКритическая ошибка: типы наблюдений "wiki" и "entity" не существуют в схеме БД.
Строки 131, 137, 147, 166 используют bare string literals
"wiki"и"entity", которые не определены в CHECK constraint таблицы observations. Согласноinternal/db/gorm/models.go:61, разрешены только типы:'decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change', 'guidance'.Попытка сохранить наблюдение с типом
"wiki"или"entity"приведёт к нарушению constraint в PostgreSQL. Кроме того, вpkg/models/observation.goотсутствуют константыObsTypeWikiиObsTypeEntity.Необходимо либо:
- Добавить эти типы в CHECK constraint и определить константы, либо
- Удалить эти типы из weightsMap и убрать обработку для них.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/search/manager.go` around lines 130 - 141, The typeWeights map in internal/search/manager.go contains invalid observation types "wiki" and "entity" that are not allowed by the DB CHECK constraint; remove those keys from the map and any codepaths that reference ObsTypeWiki or ObsTypeEntity (and any handling at the locations around the current map usage and lines noted in the comment) so only the permitted ObservationType values (decision, bugfix, feature, refactor, discovery, change, guidance) are used; if there are any leftover constants or imports for ObsTypeWiki/ObsTypeEntity, delete them and update related tests or places that expect those types to use one of the allowed constants instead.
🧹 Nitpick comments (8)
internal/synthesis/types.go (2)
56-62: Отсутствует JSON-тег уSourceCountвWikiIndexEntry.Поле
SourceCountне имеет JSON-тега, в то время как другие подобные поля в файле используют snake_case теги (source_countвWikiMetadata). Это создаёт несогласованность.♻️ Предлагаемое исправление
// WikiIndexEntry represents one entry in the wiki index.md file. type WikiIndexEntry struct { - EntityName string - EntityType string - Slug string - SourceCount int + EntityName string `json:"entity_name"` + EntityType string `json:"entity_type"` + Slug string `json:"slug"` + SourceCount int `json:"source_count"` }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/synthesis/types.go` around lines 56 - 62, Поле SourceCount в структуре WikiIndexEntry не имеет JSON-тега, что нарушает согласованность с другими структурами (например, WikiMetadata использует `source_count`); добавьте JSON-тег `source_count` к полю SourceCount в типе WikiIndexEntry (структура WikiIndexEntry, поле SourceCount) чтобы сериализация использовала snake_case и соответствовала остальным полям.
26-30: Отсутствуют JSON-теги уWikiResult.В отличие от других структур в этом файле (
Entity,Relation,ExtractionResult,EntityMetadata,WikiMetadata), структураWikiResultне имеет JSON-тегов для полейEntityNameиContent. Если эта структура будет сериализована в JSON (например, для логирования или API), имена полей будут в PascalCase вместо snake_case.♻️ Предлагаемое исправление
// WikiResult holds the LLM wiki generation output. type WikiResult struct { - EntityName string - Content string + EntityName string `json:"entity_name"` + Content string `json:"content"` }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/synthesis/types.go` around lines 26 - 30, The WikiResult struct lacks JSON tags, causing PascalCase keys when serialized; update the WikiResult definition to add json tags for EntityName and Content (e.g., `entity_name` and `content`) so serialization matches other structs' snake_case convention and consistent API/log output, ensuring tags are placed on the EntityName and Content fields of the WikiResult type.internal/synthesis/wiki_disk.go (1)
31-50: Нет валидацииdataDirперед файловыми операциями.Согласно контексту из
internal/config/config.go:672-680,WikiDataDirвсегда получает значение по умолчанию. ОднакоWriteWikiToDiskиUpdateWikiIndexне проверяют, чтоdataDirне пустой. Если по какой-то причине будет передана пустая строка,filepath.Join("", "wiki")вернёт"wiki", и файлы будут записаны в относительный путь от текущей директории процесса.🛡️ Предлагаемое добавление валидации
func WriteWikiToDisk(dataDir, entityName, entityType, content string, sourceCount int) error { + if dataDir == "" { + return fmt.Errorf("dataDir cannot be empty") + } wikiDir := filepath.Join(dataDir, "wiki")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/synthesis/wiki_disk.go` around lines 31 - 50, The function WriteWikiToDisk (and similarly UpdateWikiIndex) lacks validation of the dataDir parameter and can write to a relative path when an empty string is passed; add a guard at the start of WriteWikiToDisk that checks dataDir is non-empty (and optionally check filepath.IsAbs or call filepath.Clean/Abs if you prefer enforcing an absolute path) and return a clear error (e.g., "invalid dataDir") before any filesystem operations (MkdirAll, filepath.Join, os.WriteFile); mirror the same validation logic in UpdateWikiIndex to prevent accidental writes to the process working directory.internal/maintenance/service.go (1)
474-495: Wiki generation привязана к флагуEntityExtractionEnabled.Обе задачи (Task 21 и Task 22) контролируются одним флагом
EntityExtractionEnabled. Это означает, что невозможно:
- Включить entity extraction без wiki generation
- Включить wiki generation без entity extraction
Если это намеренное поведение (wiki зависит от entities), рекомендуется добавить комментарий, объясняющий эту связь. Если независимое управление желательно, нужен отдельный флаг
WikiGenerationEnabled.📝 Вариант A: Добавить пояснительный комментарий
// Task 21: Entity extraction from recent observations (synthesize-wiki-layer FR-1) // LLM extracts structured entities + relations from observations, stores as type=entity. + // Note: Wiki generation (Task 22) is gated by the same flag since wikis depend on entities. if s.llmClient != nil && s.config.EntityExtractionEnabled {📝 Вариант B: Добавить отдельный флаг для wiki generation
- if s.llmClient != nil && s.config.EntityExtractionEnabled { + if s.llmClient != nil && s.config.WikiGenerationEnabled { wikiGenerated, err := s.generateWikiPages(ctx)С соответствующим добавлением
WikiGenerationEnabledвconfig.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/maintenance/service.go` around lines 474 - 495, Both Task 21 (entity extraction) and Task 22 (wiki generation) are gated by the same flag s.config.EntityExtractionEnabled which prevents enabling wiki generation independently; either document the dependency or add a separate config flag. Update the maintenance flow to check a new s.config.WikiGenerationEnabled (or similar) for invoking s.generateWikiPages(ctx) while keeping entity extraction controlled by s.config.EntityExtractionEnabled, and add the new WikiGenerationEnabled field to the config struct (and its loading/validation) so wiki generation can be toggled independently; if the wiki truly depends on newly extracted entities instead, add a clear comment above the Task 22 block explaining that dependency instead of changing flags.internal/synthesis/entity_extractor.go (1)
26-29: Жёстко заданный лимит 10 наблюдений vs конфигурируемыйEntityExtractionLimit.Лимит в 10 наблюдений захардкожен, в то время как в конфигурации есть
EntityExtractionLimit(по умолчанию 20). Это может вызвать путаницу:
EntityExtractionLimit= лимит наблюдений на цикл maintenance- Здесь 10 = лимит на один LLM-вызов
Рекомендуется либо вынести это значение в константу с пояснением, либо принимать его как параметр.
♻️ Предлагаемое улучшение
+// maxObservationsPerBatch is the maximum observations processed per LLM call. +// This is separate from config.EntityExtractionLimit which controls observations per maintenance cycle. +const maxObservationsPerBatch = 10 + func (e *EntityExtractor) Extract(ctx context.Context, llm learning.LLMClient, observations []*models.Observation) (*ExtractionResult, error) { // ... limit := len(observations) - if limit > 10 { - limit = 10 + if limit > maxObservationsPerBatch { + limit = maxObservationsPerBatch }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/synthesis/entity_extractor.go` around lines 26 - 29, The code currently hardcodes a per-LLM-call cap by setting limit := len(observations); if limit > 10 { limit = 10 } — replace that magic number with the configured EntityExtractionLimit (or a clearly named constant) so the per-call cap is consistent with configuration; update the logic that computes limit (the variable handling the observations slice in entity_extractor.go) to read the cap from the config value EntityExtractionLimit (falling back to a defined default constant if nil/unset) and add a brief comment explaining this is the per-LLM-call limit rather than the maintenance-cycle limit.internal/maintenance/entity_extraction.go (2)
140-146: Потенциальная потеря данных в поле subtitle.Метод
Update("subtitle", ...)полностью перезаписывает существующее значениеsubtitle. Если у наблюдения было значимое содержимое в этом поле, оно будет утеряно.Рассмотрите использование конкатенации или отдельного поля для флагов обработки.
♻️ Альтернатива с сохранением существующих данных
- Update("subtitle", fmt.Sprintf("entity_extracted:%d", time.Now().Unix())).Error; err != nil { + Update("subtitle", gorm.Expr("CASE WHEN subtitle IS NULL OR subtitle = '' THEN ? ELSE subtitle || ' ' || ? END", + fmt.Sprintf("entity_extracted:%d", time.Now().Unix()), + fmt.Sprintf("entity_extracted:%d", time.Now().Unix()))).Error; err != nil {Или используйте отдельное поле
processed_flagsдля хранения состояния обработки.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/maintenance/entity_extraction.go` around lines 140 - 146, The current Update("subtitle", fmt.Sprintf("entity_extracted:%d", ...)) on the observations table overwrites any existing subtitle content; change the update to preserve existing subtitle by appending the flag instead of replacing it (e.g. use a SQL expression/DB function or GORM's Expr to set subtitle = CONCAT_WS(',', COALESCE(subtitle, ''), 'entity_extracted:<ts>')) or switch to writing the flag into a dedicated field like processed_flags; update the DB call that uses db.WithContext(ctx).Table("observations").Update(...) so it either uses a gorm.Expr concatenation or targets a new processed_flags column to avoid losing existing subtitle data.
86-98: ПолеConceptsзапрашивается, но не используется.В запросе на строке 44 выбирается поле
concepts, но при преобразовании вmodels.Observation(строки 90-96) это поле игнорируется. ЕслиEntityExtractorне использует концепты, запрос можно упростить.♻️ Предлагаемое упрощение
Если поле
conceptsне используется вEntityExtractor, удалите его из запроса и структуры:var observations []struct { ID int64 Project string Type string Title string Narrative string - Concepts string CreatedAtEpoch int64 } err := db.WithContext(ctx). Table("observations"). - Select("id, project, type, title, COALESCE(narrative, '') as narrative, COALESCE(concepts, '[]') as concepts, created_at_epoch"). + Select("id, project, type, title, COALESCE(narrative, '') as narrative, created_at_epoch").🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/maintenance/entity_extraction.go` around lines 86 - 98, The SQL query is selecting `concepts` but the loop that builds `modelObs` (used by EntityExtractor) ignores it; either stop selecting `concepts` or map it into models.Observation. Update the code around the conversion loop that builds modelObs (and the query that produces batch) to do one of: 1) remove `concepts` from the select/struct that fills `batch` if EntityExtractor doesn't need it; or 2) populate the Concepts field on models.Observation from the batch item (e.g., set Models.Observation.Concepts = sql.NullString or the appropriate type) so the extracted concepts are preserved for EntityExtractor. Ensure to update any source struct used to hold query results and the conversion in the loop where modelObs is built.plugin/engram/hooks/user-prompt.js (1)
292-298: Избыточная проверка условия.Проверка
if (entityObs.length > 0)на строке 297 избыточна, так как код уже находится внутри блока с таким же условием на строке 292.♻️ Предлагаемое исправление
if (entityObs.length > 0) { for (const e of entityObs.slice(0, 5)) { wikiBlock += `[ENTITY] ${escapeXmlTags(asString(e.title))} — ${escapeXmlTags(asString(e.subtitle || e.narrative))}\n`; if (e && typeof e.id === 'number' && e.id > 0) searchIds.push(e.id); } - if (entityObs.length > 0) wikiBlock += '\n'; + wikiBlock += '\n'; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/engram/hooks/user-prompt.js` around lines 292 - 298, The inner redundant check "if (entityObs.length > 0)" inside the outer block should be removed: inside the block that already guards on entityObs.length > 0 (where wikiBlock is built and searchIds appended), simply append the trailing newline via "wikiBlock += '\n';" without re-checking entityObs; locate the code by the symbols entityObs, wikiBlock, and searchIds and remove the inner conditional so the newline is added unconditionally within the outer guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/maintenance/wiki_generation.go`:
- Around line 191-202: The custom lower(s string) function only handles ASCII
and should be removed and replaced with the standard strings.ToLower; import
"strings" if missing and update call sites that reference lower(...) to
strings.ToLower(...), ensuring Unicode-aware lowercasing is used everywhere (or
add a small wrapper that calls strings.ToLower if you prefer a single symbol to
replace lower).
- Around line 75-80: The DB lookup uses "wiki: %s" but elsewhere the inserted
title uses "Wiki: %s", so existing wiki rows are never matched; update the WHERE
parameter generation in the db.WithContext(...).Table("observations").Where(...)
call to produce the same prefix and casing used on insert (e.g. use "Wiki: %s"
instead of "wiki: %s") and ensure you still pass a lowercased title for the
LOWER(title) comparison (e.g. replace lower(entity.Title) with
strings.ToLower(entity.Title) or the equivalent project helper) so the
LOWER(title) = ? check works correctly with the prefix.
In `@internal/synthesis/entity_store.go`:
- Around line 105-117: The current code retrieves the inserted observation ID by
re-querying via
p.DB.WithContext(...).Table("observations").Select("id").Where(...).Row().Scan(&newID),
which is racey and can leave newID as 0; change the insert so the DB returns the
new ID directly (e.g. perform the INSERT with RETURNING id and scan that result)
or switch to GORM's Create/Save on a model struct so the inserted record's ID is
auto-populated, then use that ID to populate entityIDsByName (and remove
reliance on re-querying by title/project/created_at_epoch); keep references to
the same variables/maps (entityIDsByName, newID, counts.Created,
p.DB.WithContext) when making the change.
- Around line 66-67: The json.Unmarshal call that decodes existingNarrative into
existingMeta currently ignores errors, which may allow corrupted metadata to be
treated as zero-values and cause data loss; change the code to capture the error
from json.Unmarshal (e.g., err := json.Unmarshal(...)) and handle it explicitly:
if err != nil, log or return the error with context including the entity ID and
existingNarrative, and skip the merge or preserve the original metadata instead
of proceeding with zero-values; update any callers (the merge/update routine
that references existingMeta and existingNarrative) to respect this error path
so corrupted metadata is not silently overwritten.
In `@internal/synthesis/json_parser.go`:
- Around line 84-100: The validation currently skips relations only when both
endpoints are unknown due to the condition using && on entityNames checks, which
contradicts the comment; update the logic in the loop over result.Relations
depending on desired behavior: if you must require both endpoints to be known,
change the condition to use || (i.e., if !entityNames[strings.ToLower(from)] ||
!entityNames[strings.ToLower(to)] then continue) so relations with any unknown
endpoint are dropped; if current behavior (allow one unknown endpoint) is
intended, update the comment that mentions "skip relations where neither
endpoint is a known entity" to clearly state that relations are kept when at
least one endpoint is a known entity; refer to entityNames, result.Relations,
Relation and normalizeRelationType to locate the code.
- Around line 13-14: Паттерн jsonObjectRE использует жадный квантификатор
`(?s)\{.*\}` и может захватить от первой `{` до последней `}`, возвращая
некорректный фрагмент при нескольких объектов или тексте со скобками; замените
его на ленивый квантификатор (например `(?s)\{.*?\}`) чтобы избежать чрезмерного
захвата в fallback-извлечении и, по возможности, добавьте заметку/рефакторинг в
код вокруг функции, которая пытается парсить напрямую и затем использует
jsonObjectRE, чтобы в будущем заменить regex на более надёжный алгоритм (счётчик
фигурных скобок или специализированная библиотека) для корректной обработки
вложенных объектов.
In `@internal/synthesis/wiki_disk.go`:
- Around line 16-27: The EntitySlug function currently truncates the slug with a
byte-slice (slug = slug[:60]) which can split multibyte UTF-8 characters; change
the truncation to operate on runes instead: convert slug to a []rune, if its
length > 60 take the first 60 runes and reassign slug from that rune slice, so
EntitySlug returns valid UTF-8 even when trimming. Ensure you update only the
truncation logic in EntitySlug and keep the rest (lowercasing, regex
replacement, trimming, empty fallback) unchanged.
- Around line 66-70: The table row builder in the loop over entries writes raw
EntityName and link text which can break Markdown if EntityName contains '|' or
'['/']'; add a small helper (e.g., escapeMarkdownCell or escapeMarkdownText) and
use it when composing the row so that '|' becomes '\|' and '[' / ']' become '\['
/ '\]'; apply the escape to e.EntityName (for the first cell) and to the link
text portion (the visible "[%s.md]" text) while still using the safe filename
produced by EntitySlug(e.EntityName) for the actual file reference, and update
the loop where sb.WriteString(fmt.Sprintf(...)) is called to use the escaped
values.
In `@internal/synthesis/wiki_generator.go`:
- Around line 30-34: The check uses entity.Narrative.Valid but then reads
entity.Subtitle.String, causing a mismatch; update the condition in the
wiki_generator logic so you validate the same field you read — either change the
if to check entity.Subtitle.Valid before assigning entity.Subtitle.String, or if
the intent was to use Narrative, assign entity.Narrative.String instead; ensure
the variable entityDesc is populated only after the matching Valid check
(references: entityDesc, entity.Narrative.Valid, entity.Subtitle.String).
---
Outside diff comments:
In `@internal/search/manager.go`:
- Around line 130-141: The typeWeights map in internal/search/manager.go
contains invalid observation types "wiki" and "entity" that are not allowed by
the DB CHECK constraint; remove those keys from the map and any codepaths that
reference ObsTypeWiki or ObsTypeEntity (and any handling at the locations around
the current map usage and lines noted in the comment) so only the permitted
ObservationType values (decision, bugfix, feature, refactor, discovery, change,
guidance) are used; if there are any leftover constants or imports for
ObsTypeWiki/ObsTypeEntity, delete them and update related tests or places that
expect those types to use one of the allowed constants instead.
---
Nitpick comments:
In `@internal/maintenance/entity_extraction.go`:
- Around line 140-146: The current Update("subtitle",
fmt.Sprintf("entity_extracted:%d", ...)) on the observations table overwrites
any existing subtitle content; change the update to preserve existing subtitle
by appending the flag instead of replacing it (e.g. use a SQL expression/DB
function or GORM's Expr to set subtitle = CONCAT_WS(',', COALESCE(subtitle, ''),
'entity_extracted:<ts>')) or switch to writing the flag into a dedicated field
like processed_flags; update the DB call that uses
db.WithContext(ctx).Table("observations").Update(...) so it either uses a
gorm.Expr concatenation or targets a new processed_flags column to avoid losing
existing subtitle data.
- Around line 86-98: The SQL query is selecting `concepts` but the loop that
builds `modelObs` (used by EntityExtractor) ignores it; either stop selecting
`concepts` or map it into models.Observation. Update the code around the
conversion loop that builds modelObs (and the query that produces batch) to do
one of: 1) remove `concepts` from the select/struct that fills `batch` if
EntityExtractor doesn't need it; or 2) populate the Concepts field on
models.Observation from the batch item (e.g., set Models.Observation.Concepts =
sql.NullString or the appropriate type) so the extracted concepts are preserved
for EntityExtractor. Ensure to update any source struct used to hold query
results and the conversion in the loop where modelObs is built.
In `@internal/maintenance/service.go`:
- Around line 474-495: Both Task 21 (entity extraction) and Task 22 (wiki
generation) are gated by the same flag s.config.EntityExtractionEnabled which
prevents enabling wiki generation independently; either document the dependency
or add a separate config flag. Update the maintenance flow to check a new
s.config.WikiGenerationEnabled (or similar) for invoking
s.generateWikiPages(ctx) while keeping entity extraction controlled by
s.config.EntityExtractionEnabled, and add the new WikiGenerationEnabled field to
the config struct (and its loading/validation) so wiki generation can be toggled
independently; if the wiki truly depends on newly extracted entities instead,
add a clear comment above the Task 22 block explaining that dependency instead
of changing flags.
In `@internal/synthesis/entity_extractor.go`:
- Around line 26-29: The code currently hardcodes a per-LLM-call cap by setting
limit := len(observations); if limit > 10 { limit = 10 } — replace that magic
number with the configured EntityExtractionLimit (or a clearly named constant)
so the per-call cap is consistent with configuration; update the logic that
computes limit (the variable handling the observations slice in
entity_extractor.go) to read the cap from the config value EntityExtractionLimit
(falling back to a defined default constant if nil/unset) and add a brief
comment explaining this is the per-LLM-call limit rather than the
maintenance-cycle limit.
In `@internal/synthesis/types.go`:
- Around line 56-62: Поле SourceCount в структуре WikiIndexEntry не имеет
JSON-тега, что нарушает согласованность с другими структурами (например,
WikiMetadata использует `source_count`); добавьте JSON-тег `source_count` к полю
SourceCount в типе WikiIndexEntry (структура WikiIndexEntry, поле SourceCount)
чтобы сериализация использовала snake_case и соответствовала остальным полям.
- Around line 26-30: The WikiResult struct lacks JSON tags, causing PascalCase
keys when serialized; update the WikiResult definition to add json tags for
EntityName and Content (e.g., `entity_name` and `content`) so serialization
matches other structs' snake_case convention and consistent API/log output,
ensuring tags are placed on the EntityName and Content fields of the WikiResult
type.
In `@internal/synthesis/wiki_disk.go`:
- Around line 31-50: The function WriteWikiToDisk (and similarly
UpdateWikiIndex) lacks validation of the dataDir parameter and can write to a
relative path when an empty string is passed; add a guard at the start of
WriteWikiToDisk that checks dataDir is non-empty (and optionally check
filepath.IsAbs or call filepath.Clean/Abs if you prefer enforcing an absolute
path) and return a clear error (e.g., "invalid dataDir") before any filesystem
operations (MkdirAll, filepath.Join, os.WriteFile); mirror the same validation
logic in UpdateWikiIndex to prevent accidental writes to the process working
directory.
In `@plugin/engram/hooks/user-prompt.js`:
- Around line 292-298: The inner redundant check "if (entityObs.length > 0)"
inside the outer block should be removed: inside the block that already guards
on entityObs.length > 0 (where wikiBlock is built and searchIds appended),
simply append the trailing newline via "wikiBlock += '\n';" without re-checking
entityObs; locate the code by the symbols entityObs, wikiBlock, and searchIds
and remove the inner conditional so the newline is added unconditionally within
the outer guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d53516f-f216-4ab1-a2d6-5f6a6d48f57e
📒 Files selected for processing (16)
.socraticodeignoreinternal/config/config.gointernal/maintenance/entity_extraction.gointernal/maintenance/service.gointernal/maintenance/wiki_generation.gointernal/search/manager.gointernal/synthesis/entity_extractor.gointernal/synthesis/entity_store.gointernal/synthesis/json_parser.gointernal/synthesis/prompts.gointernal/synthesis/types.gointernal/synthesis/wiki_disk.gointernal/synthesis/wiki_generator.goplugin/engram/.claude-plugin/plugin.jsonplugin/engram/hooks/user-prompt.jsplugin/openclaw-engram/package.json
| // jsonObjectRE extracts the first JSON object from a string. | ||
| var jsonObjectRE = regexp.MustCompile(`(?s)\{.*\}`) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверяем, есть ли тесты для json_parser.go с краевыми случаями
rg -l "jsonObjectRE|ParseEntityExtractionResponse" --glob '*_test.go'Repository: thebtf/engram
Length of output: 39
🏁 Script executed:
cat -n internal/synthesis/json_parser.go | head -50Repository: thebtf/engram
Length of output: 1967
🏁 Script executed:
# Проверим, есть ли файлы, связанные с json_parser
fd "json_parser"Repository: thebtf/engram
Length of output: 90
🏁 Script executed:
# Поищем использование jsonObjectRE в коде
rg "jsonObjectRE" --context 5Repository: thebtf/engram
Length of output: 1767
🏁 Script executed:
wc -l internal/synthesis/json_parser.goRepository: thebtf/engram
Length of output: 94
🏁 Script executed:
tail -100 internal/synthesis/json_parser.goRepository: thebtf/engram
Length of output: 2896
🏁 Script executed:
# Ищем файлы тестов, которые могут содержать примеры использования
fd ".*_test\.go$" | xargs grep -l "ParseEntityExtractionResponse" 2>/dev/null || echo "No tests found"Repository: thebtf/engram
Length of output: 71
Жадный regex может захватить лишний контент при использовании в качестве fallback.
Паттерн (?s)\{.*\} является жадным и захватит от первой { до последней } в строке. Если очищенный LLM-ответ содержит несколько объектов или текст с фигурными скобками, regex вернёт некорректный результат.
Однако практический риск снижен, так как функция сначала пытается распарсить вывод напрямую как JSON (строка 23), и только при неудаче обращается к regex-извлечению (строка 28). Рекомендуется:
- Заменить на ленивый квантификатор:
(?s)\{.*?\}— это уменьшит захват в типичных случаях. - Рассмотреть более надёжный подход для nested объектов (счётчик скобок или специализированная библиотека), если это потребуется в будущем.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/synthesis/json_parser.go` around lines 13 - 14, Паттерн jsonObjectRE
использует жадный квантификатор `(?s)\{.*\}` и может захватить от первой `{` до
последней `}`, возвращая некорректный фрагмент при нескольких объектов или
тексте со скобками; замените его на ленивый квантификатор (например
`(?s)\{.*?\}`) чтобы избежать чрезмерного захвата в fallback-извлечении и, по
возможности, добавьте заметку/рефакторинг в код вокруг функции, которая пытается
парсить напрямую и затем использует jsonObjectRE, чтобы в будущем заменить regex
на более надёжный алгоритм (счётчик фигурных скобок или специализированная
библиотека) для корректной обработки вложенных объектов.
| validRels := make([]Relation, 0, len(result.Relations)) | ||
| for _, r := range result.Relations { | ||
| from := strings.TrimSpace(r.From) | ||
| to := strings.TrimSpace(r.To) | ||
| if from == "" || to == "" { | ||
| continue | ||
| } | ||
| if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] { | ||
| continue // skip relations where neither endpoint is a known entity | ||
| } | ||
| validRels = append(validRels, Relation{ | ||
| From: from, | ||
| To: to, | ||
| RelType: normalizeRelationType(strings.TrimSpace(r.RelType)), | ||
| }) | ||
| } | ||
| result.Relations = validRels |
There was a problem hiding this comment.
Логика валидации связей пропускает связи с одним неизвестным endpoint.
Комментарий на строке 92 говорит "skip relations where neither endpoint is a known entity", но условие !entityNames[from] && !entityNames[to] пропускает только когда оба endpoint неизвестны. Связи, где только один endpoint известен, будут сохранены.
Если это намеренное поведение (для связей с внешними сущностями), комментарий следует исправить. Если нет — нужно изменить условие.
🔧 Если нужно требовать оба известных endpoint
- if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] {
- continue // skip relations where neither endpoint is a known entity
+ if !entityNames[strings.ToLower(from)] || !entityNames[strings.ToLower(to)] {
+ continue // skip relations where either endpoint is not a known entity
}🔧 Если текущее поведение корректно — исправить комментарий
if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] {
- continue // skip relations where neither endpoint is a known entity
+ continue // skip relations where both endpoints are unknown entities
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| validRels := make([]Relation, 0, len(result.Relations)) | |
| for _, r := range result.Relations { | |
| from := strings.TrimSpace(r.From) | |
| to := strings.TrimSpace(r.To) | |
| if from == "" || to == "" { | |
| continue | |
| } | |
| if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] { | |
| continue // skip relations where neither endpoint is a known entity | |
| } | |
| validRels = append(validRels, Relation{ | |
| From: from, | |
| To: to, | |
| RelType: normalizeRelationType(strings.TrimSpace(r.RelType)), | |
| }) | |
| } | |
| result.Relations = validRels | |
| validRels := make([]Relation, 0, len(result.Relations)) | |
| for _, r := range result.Relations { | |
| from := strings.TrimSpace(r.From) | |
| to := strings.TrimSpace(r.To) | |
| if from == "" || to == "" { | |
| continue | |
| } | |
| if !entityNames[strings.ToLower(from)] || !entityNames[strings.ToLower(to)] { | |
| continue // skip relations where either endpoint is not a known entity | |
| } | |
| validRels = append(validRels, Relation{ | |
| From: from, | |
| To: to, | |
| RelType: normalizeRelationType(strings.TrimSpace(r.RelType)), | |
| }) | |
| } | |
| result.Relations = validRels |
| validRels := make([]Relation, 0, len(result.Relations)) | |
| for _, r := range result.Relations { | |
| from := strings.TrimSpace(r.From) | |
| to := strings.TrimSpace(r.To) | |
| if from == "" || to == "" { | |
| continue | |
| } | |
| if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] { | |
| continue // skip relations where neither endpoint is a known entity | |
| } | |
| validRels = append(validRels, Relation{ | |
| From: from, | |
| To: to, | |
| RelType: normalizeRelationType(strings.TrimSpace(r.RelType)), | |
| }) | |
| } | |
| result.Relations = validRels | |
| validRels := make([]Relation, 0, len(result.Relations)) | |
| for _, r := range result.Relations { | |
| from := strings.TrimSpace(r.From) | |
| to := strings.TrimSpace(r.To) | |
| if from == "" || to == "" { | |
| continue | |
| } | |
| if !entityNames[strings.ToLower(from)] && !entityNames[strings.ToLower(to)] { | |
| continue // skip relations where both endpoints are unknown entities | |
| } | |
| validRels = append(validRels, Relation{ | |
| From: from, | |
| To: to, | |
| RelType: normalizeRelationType(strings.TrimSpace(r.RelType)), | |
| }) | |
| } | |
| result.Relations = validRels |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/synthesis/json_parser.go` around lines 84 - 100, The validation
currently skips relations only when both endpoints are unknown due to the
condition using && on entityNames checks, which contradicts the comment; update
the logic in the loop over result.Relations depending on desired behavior: if
you must require both endpoints to be known, change the condition to use ||
(i.e., if !entityNames[strings.ToLower(from)] ||
!entityNames[strings.ToLower(to)] then continue) so relations with any unknown
endpoint are dropped; if current behavior (allow one unknown endpoint) is
intended, update the comment that mentions "skip relations where neither
endpoint is a known entity" to clearly state that relations are kept when at
least one endpoint is a known entity; refer to entityNames, result.Relations,
Relation and normalizeRelationType to locate the code.
|
@codex review |
Review fixes for PR #129: - Use JSONB concepts append instead of subtitle overwrite for entity_extracted flag (entity_extraction.go) - Fix wiki_generator.go: check Subtitle.Valid not Narrative.Valid - Remove custom lower() function, use strings.ToLower (wiki_generation.go) - Use RETURNING id instead of separate SELECT (entity_store.go) - Fix slug truncation to use rune-safe slicing (wiki_disk.go) - Add markdown table cell escaping (wiki_disk.go) - Fix relation validation comment accuracy (json_parser.go) - Handle corrupted entity metadata gracefully (entity_store.go)
Summary
Adds a synthesis layer to engram: LLM-based entity extraction from observations during maintenance, wiki summary generation for mature entities, and wiki scoring boost in retrieval.
New package:
internal/synthesis/— entity extractor, entity store, wiki generator, wiki disk writer, JSON parser, LLM prompts.Maintenance Tasks:
Retrieval changes:
wikitype_weight = 2.0 (highest, above guidance=1.8)<wiki-knowledge>section in context injection before<relevant-memory>[ENTITY] {name} — {description}Config: 5 new env vars (ENGRAM_ENTITY_EXTRACTION_ENABLED/LIMIT, ENGRAM_WIKI_GENERATION_LIMIT, ENGRAM_WIKI_MIN_SOURCES, ENGRAM_WIKI_DATA_DIR)
Architecture: No new tables, no new MCP tools, no new migrations. Entities and wiki stored as observations with type=entity/wiki. Metadata in narrative field as JSON. FalkorDB edges for entity→source and entity→entity relations.
Inspired by: qwe-qwe knowledge-graph-design (three-layer: raw→entity→wiki in single collection).
Test plan
go build ./...passes (verified){DataDir}/wiki/with index.mdrecall(type="wiki")returns synthesized summaries<wiki-knowledge>section appears in UserPromptSubmit hook outputSummary by CodeRabbit
Новые функции
Chores