feat: extraction quality v2 — category prompts + behavioral rules#43
Conversation
- ChunkExtractionSystemPrompt: 6 categories (DECISION, CORRECTION, DEBUGGING, GOTCHA, PATTERN, USER_BEHAVIOR) with detection patterns and DO NOT EXTRACT list - RetrospectiveSystemPrompt: session-level holistic summary - FeedbackImportSystemPrompt: converts feedback_*.md to TRIGGER/RULE/REASON - XMLObservation: added Category field - ParseRetrospective: robust XML parser for session retrospective
- Replace vague "be generous" extraction prompt with 6-category taxonomy (DECISION, CORRECTION, DEBUGGING, GOTCHA, PATTERN, USER_BEHAVIOR) adapted for single tool-output analysis - Map categories to observation types: user_behavior → guidance, debugging → bugfix, correction/gotcha/pattern → discovery - Category field parsed from XML, takes precedence over <type>
Query engram for type=guidance observations (user behavior rules) and inject as <user-behavior-rules> block BEFORE <relevant-memory>. Rules are not subject to diversity penalty or LLM filter — always injected if matched. Max 10 rules, global scope prioritized.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughДобавлены категория в XML-наблюдения, новые подсказки и сборочные функции для чанков, парсинг одиночных наблюдений и ретроспективы, интеграция ретроспективных наблюдений в backfill, импорт правил обратной связи из файлов, приоритетная инъекция правил поведения в клиентский prompt и новый endpoint для purge-rebuild. Changes
Sequence Diagram(s)sequenceDiagram
participant Plugin as Plugin (user-prompt.js)
participant API as Context API
participant Backfill as Backfill Runner
participant LLM as LLM
participant Parser as Parser
participant Store as Storage
Plugin->>API: GET /api/context/search (obs_type: guidance)
API-->>Plugin: guidance observations
Plugin->>Plugin: фильтрация + форматирование <user-behavior-rules>
Plugin-->>Backfill: prompt (behaviorRulesBlock + context)
Backfill->>LLM: ChunkExtractionSystemPrompt + chunk user prompt
LLM-->>Backfill: XML observations (включая <category>)
Backfill->>Parser: unmarshal XML
Parser->>Parser: categoryTypeMap -> determine Type/SourceType
Parser-->>Store: StoreObservation(..., project, ...)
Backfill->>LLM: RetrospectiveSystemPrompt (session snippets)
LLM-->>Backfill: <session_retrospective> XML
Backfill->>Parser: ParseRetrospective -> SessionObservations
Backfill-->>Store: StoreSummary + deduped session observations
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the quality and structure of extracted observations by moving from a generic extraction method to a precise, category-based system. It introduces new prompts and parsing logic to categorize observations into specific types, such as decisions, corrections, and user behaviors. A key outcome is the ability to inject user-defined behavioral rules directly into the AI's context, ensuring that the system consistently adheres to user preferences and past corrections, thereby improving the AI's adaptability and responsiveness. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant improvement to observation extraction by moving from a vague system to a structured 6-category model. This includes new, more precise prompts, updated data structures, and refined parsing logic. A key addition is the ability to inject high-priority user behavior rules into the session context. My review focused on the implementation of these new features. I've identified a couple of areas for improvement: one regarding performance and maintainability in retrospective.go by defining a map at the package level, and another concerning data consistency in parser.go to ensure all LLM-derived observations are correctly marked with the SourceLLMDerived source type. Overall, this is a well-executed feature that promises to enhance the quality and relevance of extracted knowledge.
| validOutcomes := map[string]bool{ | ||
| "shipped": true, "partial": true, "blocked": true, "investigation_only": true, | ||
| } | ||
| if !validOutcomes[retro.Summary.Outcome] { | ||
| retro.Summary.Outcome = "partial" // default | ||
| } |
There was a problem hiding this comment.
For better performance and consistency with other validation maps in the project (like ValidTypes), the validOutcomes map should be defined as a package-level variable instead of being re-initialized on every function call.
I'd recommend moving the map definition to the package level, for example:
var validRetrospectiveOutcomes = map[string]bool{
"shipped": true, "partial": true, "blocked": true, "investigation_only": true,
}Then, you can simplify this part of the function.
| validOutcomes := map[string]bool{ | |
| "shipped": true, "partial": true, "blocked": true, "investigation_only": true, | |
| } | |
| if !validOutcomes[retro.Summary.Outcome] { | |
| retro.Summary.Outcome = "partial" // default | |
| } | |
| if !validRetrospectiveOutcomes[retro.Summary.Outcome] { | |
| retro.Summary.Outcome = "partial" // default | |
| } |
| if category == "user_behavior" { | ||
| finalSourceType = models.SourceLLMDerived | ||
| } |
There was a problem hiding this comment.
All observations extracted via this LLM-based parsing logic should likely have their SourceType set to models.SourceLLMDerived, not just those with the user_behavior category. Since the observations for all categories are derived from an LLM analyzing tool output, it would be more consistent to apply this source type to any observation that has a valid category.
finalSourceType = models.SourceLLMDerivedThere was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
plugin/engram/hooks/user-prompt.js (2)
302-306: Логирование можно улучшить: добавить количество инжектируемых правил.Текущее логирование указывает, что правила поведения инжектируются, но не показывает их количество, что затрудняет диагностику.
✏️ Предлагаемое улучшение логирования
Для реализации потребуется сохранить количество правил. Добавьте переменную при построении блока:
+ let behaviorRulesCount = 0; // ... if (rules.length > 0) { + behaviorRulesCount = Math.min(rules.length, 10); // ... }Затем обновите логирование:
if (output) { - console.error(`[engram] Injecting: ${behaviorRulesBlock ? 'behavior rules + ' : ''}${observationCount} observations`); + console.error(`[engram] Injecting: ${behaviorRulesCount > 0 ? `${behaviorRulesCount} behavior rules + ` : ''}${observationCount} observations`); return output; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/engram/hooks/user-prompt.js` around lines 302 - 306, The console log currently shows only that behavior rules are injected and the observationCount; capture and log the number of behavior rules as well by computing a rules count when assembling the behaviorRulesBlock (e.g., derive behaviorRulesCount while building behaviorRulesBlock) and include that value in the console.error message that references behaviorRulesBlock, observationCount and output; update the injection assembly logic around behaviorRulesBlock + contextToInject so the new behaviorRulesCount variable is maintained and logged together with the existing message emitted via console.error.
46-54: Рассмотрите сортировку правил по релевантности перед ограничением до 10.В отличие от технических наблюдений (строка 90), правила поведения не сортируются по
similarityперед выборкой первых 10. Если API возвращает результаты не по убыванию релевантности, более важные правила могут быть исключены.♻️ Предлагаемое улучшение
if (rules.length > 0) { + // Sort by similarity to prioritize most relevant rules + rules.sort((a, b) => (b.similarity || 0) - (a.similarity || 0)); behaviorRulesBlock = '<user-behavior-rules>\n';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/engram/hooks/user-prompt.js` around lines 46 - 54, The current generation of behaviorRulesBlock iterates rules and takes rules.slice(0, 10) without sorting, which can drop more relevant user rules; before slicing, sort the rules array by their relevance score (e.g., rule.similarity or rule.score) in descending order so the top 10 most relevant are kept. Update the logic around the symbols rules, rules.slice(0, 10), and the block that builds behaviorRulesBlock (while still using escapeXmlTags for title and narrative) to sort by the appropriate similarity field before truncation.internal/backfill/extract/extract.go (1)
84-97: ПолеCategoryдобавлено, но не валидируется вValidateXML.Добавлено поле
Categoryи картаValidCategories, однако функцияValidateXMLне проверяет значение категории против допустимого списка. Если это преднамеренно (например, для совместимости с устаревшими данными или потому что валидация происходит вparser.go), рекомендую добавить комментарий, объясняющий это решение.Если валидация необходима, можно добавить проверку:
♻️ Опциональная валидация категории
if len(o.Concepts.Concepts) == 0 { result.Errors = append(result.Errors, fmt.Sprintf("%s: no concepts", prefix)) valid = false } + // Optional: validate category if present + if o.Category != "" && !ValidCategories[o.Category] { + result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid category %q", prefix, o.Category)) + // Note: not marking invalid=false to keep backwards compatibility + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/extract/extract.go` around lines 84 - 97, ValidateXML currently omits checking the new Category field against ValidCategories; either add a validation step in ValidateXML to verify that the Observation.Category (or struct field named Category) exists in ValidCategories and return a descriptive error if not, or if omission is intentional, add a clarifying comment in the same file explaining why category is not validated here (e.g., legacy compatibility or validation in parser.go). Reference the Category struct field and the ValidCategories map and update the ValidateXML function accordingly.internal/worker/sdk/processor.go (1)
1175-1178: Рекомендуется рассмотреть унификацию промптов.Системный промпт здесь (
processor.go) почти идентиченChunkExtractionSystemPromptвprompts.go. Основное отличие — "single tool execution" vs "multi-exchange chunks". Рассмотрите возможность вынесения общей базы в константу с параметризацией различий, чтобы избежать расхождения при будущих изменениях.♻️ Пример унификации
// В prompts.go: const categoryDefinitions = `CATEGORY 1 — DECISION: ... ... DO NOT EXTRACT: ...` // В processor.go: const systemPrompt = `You are a coding session analyst. Analyze this single tool execution... ` + categoryDefinitions + ` RULES: - Maximum 1 observation per tool execution ...` // В prompts.go: const ChunkExtractionSystemPrompt = `You are a coding session analyst. Read the transcript segment... ` + categoryDefinitions + ` RULES: - Maximum 2 observations per chunk ...`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/sdk/processor.go` around lines 1175 - 1178, Extract the shared taxonomy text into a single constant (e.g., categoryDefinitions) in prompts.go and replace the duplicated body in both processor.go (systemPrompt) and prompts.go (ChunkExtractionSystemPrompt) by concatenating or formatting that shared constant with the file-specific prefixes/suffixes (e.g., the single-tool vs multi-exchange intro and their RULES blocks); update the symbols systemPrompt and ChunkExtractionSystemPrompt to reference the shared categoryDefinitions and keep only the differing sentences (like "single tool execution" vs "multi-exchange chunks") and rule lines in each location so future changes to the taxonomy are made in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/backfill/extract/retrospective.go`:
- Around line 31-42: The current extraction assumes start < end; update the
validation around start and end (used to build xmlStr from raw) to also check
ordering and avoid invalid slices by returning nil when start == -1 || end == -1
|| start >= end; ensure this change is applied where start, end, raw, and xmlStr
are used (the XML extraction block that feeds xml.Unmarshal into
SessionRetrospective).
In `@internal/worker/sdk/parser.go`:
- Around line 147-149: The ParsedObservation being appended uses finalSourceType
which can be empty for non-user_behavior categories; change the logic around the
construction of models.ParsedObservation (where observations =
append(observations, &models.ParsedObservation{...}) uses finalType and
finalSourceType) to ensure SourceType is not an empty string: if the
category/type is not user_behavior, set SourceType to models.SourceUnknown (or
derive it deterministically from the category to match the behavior in
handlers_ingest.go:183, handlers_backfill.go:434 and handlers_vault.go:205) so
downstream code (smart_gc.go protections, search/manager.go boosts, and
scoring/calculator.go penalties) receives a valid SourceType instead of "".
In `@plugin/engram/hooks/user-prompt.js`:
- Around line 43-45: The filter on rulesResult.observations can match
observations with empty/undefined project when project is set to '' (from
ctx.Project not a string); update the filter in the rules assignment to only
compare observation project to project when project is a non-empty string —
i.e., keep the global scope check as-is but change the second branch to require
project truthiness before comparing via asString(obs.project) === project so
observations with missing/empty project are not accidentally treated as
matching; reference symbols: rulesResult, rulesResult.observations, asString,
project, obs.
---
Nitpick comments:
In `@internal/backfill/extract/extract.go`:
- Around line 84-97: ValidateXML currently omits checking the new Category field
against ValidCategories; either add a validation step in ValidateXML to verify
that the Observation.Category (or struct field named Category) exists in
ValidCategories and return a descriptive error if not, or if omission is
intentional, add a clarifying comment in the same file explaining why category
is not validated here (e.g., legacy compatibility or validation in parser.go).
Reference the Category struct field and the ValidCategories map and update the
ValidateXML function accordingly.
In `@internal/worker/sdk/processor.go`:
- Around line 1175-1178: Extract the shared taxonomy text into a single constant
(e.g., categoryDefinitions) in prompts.go and replace the duplicated body in
both processor.go (systemPrompt) and prompts.go (ChunkExtractionSystemPrompt) by
concatenating or formatting that shared constant with the file-specific
prefixes/suffixes (e.g., the single-tool vs multi-exchange intro and their RULES
blocks); update the symbols systemPrompt and ChunkExtractionSystemPrompt to
reference the shared categoryDefinitions and keep only the differing sentences
(like "single tool execution" vs "multi-exchange chunks") and rule lines in each
location so future changes to the taxonomy are made in one place.
In `@plugin/engram/hooks/user-prompt.js`:
- Around line 302-306: The console log currently shows only that behavior rules
are injected and the observationCount; capture and log the number of behavior
rules as well by computing a rules count when assembling the behaviorRulesBlock
(e.g., derive behaviorRulesCount while building behaviorRulesBlock) and include
that value in the console.error message that references behaviorRulesBlock,
observationCount and output; update the injection assembly logic around
behaviorRulesBlock + contextToInject so the new behaviorRulesCount variable is
maintained and logged together with the existing message emitted via
console.error.
- Around line 46-54: The current generation of behaviorRulesBlock iterates rules
and takes rules.slice(0, 10) without sorting, which can drop more relevant user
rules; before slicing, sort the rules array by their relevance score (e.g.,
rule.similarity or rule.score) in descending order so the top 10 most relevant
are kept. Update the logic around the symbols rules, rules.slice(0, 10), and the
block that builds behaviorRulesBlock (while still using escapeXmlTags for title
and narrative) to sort by the appropriate similarity field before truncation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 826d3aad-f1db-452b-89e7-3d528cb94d64
📒 Files selected for processing (7)
internal/backfill/extract/extract.gointernal/backfill/extract/prompts.gointernal/backfill/extract/retrospective.gointernal/worker/sdk/parser.gointernal/worker/sdk/processor.gointernal/worker/sdk/processor_test.goplugin/engram/hooks/user-prompt.js
| // Find the XML in the response (LLM may include preamble) | ||
| start := strings.Index(raw, "<session_retrospective>") | ||
| end := strings.Index(raw, "</session_retrospective>") | ||
| if start == -1 || end == -1 { | ||
| return nil | ||
| } | ||
| xmlStr := raw[start : end+len("</session_retrospective>")] | ||
|
|
||
| var retro SessionRetrospective | ||
| if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Добавьте проверку порядка start и end для корректной обработки.
Текущая логика предполагает, что start < end, но в случае искажённого XML (например, теги в неправильном порядке) это может привести к некорректному срезу. Рекомендуется добавить проверку:
🛡️ Предлагаемое исправление
start := strings.Index(raw, "<session_retrospective>")
end := strings.Index(raw, "</session_retrospective>")
- if start == -1 || end == -1 {
+ if start == -1 || end == -1 || end <= start {
return nil
}
xmlStr := raw[start : end+len("</session_retrospective>")]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Find the XML in the response (LLM may include preamble) | |
| start := strings.Index(raw, "<session_retrospective>") | |
| end := strings.Index(raw, "</session_retrospective>") | |
| if start == -1 || end == -1 { | |
| return nil | |
| } | |
| xmlStr := raw[start : end+len("</session_retrospective>")] | |
| var retro SessionRetrospective | |
| if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil { | |
| return nil | |
| } | |
| // Find the XML in the response (LLM may include preamble) | |
| start := strings.Index(raw, "<session_retrospective>") | |
| end := strings.Index(raw, "</session_retrospective>") | |
| if start == -1 || end == -1 || end <= start { | |
| return nil | |
| } | |
| xmlStr := raw[start : end+len("</session_retrospective>")] | |
| var retro SessionRetrospective | |
| if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil { | |
| return nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/backfill/extract/retrospective.go` around lines 31 - 42, The current
extraction assumes start < end; update the validation around start and end (used
to build xmlStr from raw) to also check ordering and avoid invalid slices by
returning nil when start == -1 || end == -1 || start >= end; ensure this change
is applied where start, end, raw, and xmlStr are used (the XML extraction block
that feeds xml.Unmarshal into SessionRetrospective).
| observations = append(observations, &models.ParsedObservation{ | ||
| Type: finalType, | ||
| SourceType: finalSourceType, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Поиск использования SourceType в моделях и хранилищах
rg -n -C3 'SourceType' --type=go -g '!*_test.go' | head -80Repository: thebtf/engram
Length of output: 5402
🏁 Script executed:
# Find SourceType definition in models
fd . -type f -name "*.go" -path "*/models/*" | xargs grep -l "SourceType" | head -5Repository: thebtf/engram
Length of output: 227
🏁 Script executed:
# Check the SourceType type definition and its values
rg -n "type SourceType" --type=go -A 15Repository: thebtf/engram
Length of output: 1112
🏁 Script executed:
# Check how StoreObservation handles SourceType
rg -n "StoreObservation" --type=go -B 2 -A 20 | grep -A 20 "func.*StoreObservation"Repository: thebtf/engram
Length of output: 9332
🏁 Script executed:
# Look for validations or default assignments related to SourceType
rg -n "SourceType.*=|if.*SourceType|switch.*SourceType" --type=go -C 2 | head -100Repository: thebtf/engram
Length of output: 5052
🏁 Script executed:
# Find database migrations and schema files
fd . -type f \( -name "*migration*" -o -name "*schema*" \) | grep -E "\.(sql|go)$"Repository: thebtf/engram
Length of output: 227
🏁 Script executed:
# Search for database schema or table definitions related to observations
rg -n "CREATE TABLE|observations|source_type" -i --type=sqlRepository: thebtf/engram
Length of output: 205
🏁 Script executed:
# Check if there's a database model for Observation that defines column constraints
fd . -type f -name "*.go" -path "*db*" | xargs grep -l "type Observation struct" | head -3Repository: thebtf/engram
Length of output: 227
🏁 Script executed:
# Look at the Observation DB model structure
rg -n "type Observation struct" -A 30 --type=go | head -80Repository: thebtf/engram
Length of output: 6694
🏁 Script executed:
# Search for parser tests to see if empty SourceType is tested
fd parser_test.go -type fRepository: thebtf/engram
Length of output: 227
🏁 Script executed:
# Check if there's any middleware or post-processing that fills empty SourceType
rg -n "SourceType\s*=" . --type=go -B 3 -A 3 | grep -v "test.go" | head -120Repository: thebtf/engram
Length of output: 4074
🏁 Script executed:
# Look for any validation or default assignment for SourceType
rg -n "if.*SourceType.*==.*\"\"" . --type=goRepository: thebtf/engram
Length of output: 39
Установите SourceType для категорий, отличных от user_behavior, во избежание нарушения логики обработки.
Пустой SourceType (empty string) для не-user_behavior категорий обходит критическую логику в нескольких подсистемах:
smart_gc.go:97— защитаtool_verifiedисточников (2x порог) не применяетсяsearch/manager.go:159, :189— предотвращение распада дляmanualсохранений и 1.5x усиление не применяютсяscoring/calculator.go:62— штраф дляbackfillисточников не применяется
Вместо пустой строки используйте models.SourceUnknown или классифицируйте источник на основе категории, чтобы соответствовать поведению handlers_ingest.go:183, handlers_backfill.go:434 и handlers_vault.go:205.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/worker/sdk/parser.go` around lines 147 - 149, The ParsedObservation
being appended uses finalSourceType which can be empty for non-user_behavior
categories; change the logic around the construction of models.ParsedObservation
(where observations = append(observations, &models.ParsedObservation{...}) uses
finalType and finalSourceType) to ensure SourceType is not an empty string: if
the category/type is not user_behavior, set SourceType to models.SourceUnknown
(or derive it deterministically from the category to match the behavior in
handlers_ingest.go:183, handlers_backfill.go:434 and handlers_vault.go:205) so
downstream code (smart_gc.go protections, search/manager.go boosts, and
scoring/calculator.go penalties) receives a valid SourceType instead of "".
| const rules = Array.isArray(rulesResult.observations) | ||
| ? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || asString(obs.project) === project) | ||
| : []; |
There was a problem hiding this comment.
Потенциальная проблема: пустой project может неожиданно сопоставляться с наблюдениями без проекта.
Когда ctx.Project не является строкой, project устанавливается в '' (строка 19). В этом случае условие asString(obs.project) === project будет истинным для любых наблюдений с пустым или undefined значением project, что может привести к инъекции правил, не предназначенных для данного контекста.
🛡️ Предлагаемое исправление
const rules = Array.isArray(rulesResult.observations)
- ? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || asString(obs.project) === project)
+ ? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || (project && asString(obs.project) === project))
: [];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/engram/hooks/user-prompt.js` around lines 43 - 45, The filter on
rulesResult.observations can match observations with empty/undefined project
when project is set to '' (from ctx.Project not a string); update the filter in
the rules assignment to only compare observation project to project when project
is a non-empty string — i.e., keep the global scope check as-is but change the
second branch to require project truthiness before comparing via
asString(obs.project) === project so observations with missing/empty project are
not accidentally treated as matching; reference symbols: rulesResult,
rulesResult.observations, asString, project, obs.
- Backfill uses ChunkExtractionSystemPrompt (6 categories) - Session retrospective: 2nd LLM pass after chunk extraction produces summary (request/completed/learned/next) + 0-2 session-level observations - SDK session records created for backfilled sessions (visible on Sessions dashboard page) - Summaries stored via SummaryStore (visible in timeline) - 30s timeout for retrospective LLM call
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/backfill/extract/extract.go (1)
83-97:⚠️ Potential issue | 🟡 MinorДобавлено поле Category, но ValidateXML его не проверяет.
Поле
Categoryдобавлено вXMLObservationи создана картаValidCategories, однако функцияValidateXML(строки 154-177) проверяет толькоTypeиOutcome, но неCategory. Это может привести к сохранению наблюдений с невалидными категориями.Согласно AI-summary, валидация и маппинг категорий происходит в
internal/worker/sdk/parser.go, но для backfill-пайплайна это не применяется.🛡️ Предлагаемое исправление для ValidateXML
if !ValidTypes[o.Type] { result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid type %q", prefix, o.Type)) valid = false } + if o.Category != "" && !ValidCategories[o.Category] { + result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid category %q", prefix, o.Category)) + valid = false + } if !ValidOutcomes[o.Outcome] {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/extract/extract.go` around lines 83 - 97, The ValidateXML function currently checks XMLObservation.Type and .Outcome but not .Category; update ValidateXML to verify that obs.Category is non-empty and exists in the ValidCategories map (using the XMLObservation and ValidCategories symbols) and return a clear validation error when the category is missing or not allowed (mirroring how Type/Outcome errors are handled), so backfill rejects observations with invalid categories before persisting.internal/backfill/backfill.go (1)
112-128:⚠️ Potential issue | 🟠 MajorДвойной подсчёт
total_sessionsв метриках.Метрика
total_sessionsувеличивается дважды: сначала вprocessFile(строка 114), затем снова вprocessSession(строка 128). Это приводит к удвоению счётчика при обработке файлов.🐛 Предлагаемое исправление
// processFile processes a single session file and returns extracted observations. func (r *Runner) processFile(ctx context.Context, file string, m *metrics.Metrics) []ExtractedObservation { - m.Add("total_sessions", 1) sess, err := sessions.ParseSession(file) if err != nil { + m.Add("parse_errors", 1) return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/backfill.go` around lines 112 - 128, The metric "total_sessions" is being incremented twice: in processFile and again in processSession via m.Add("total_sessions", 1), causing double counting; remove the redundant increment in processFile and let processSession own the session counter (i.e., delete the m.Add("total_sessions", 1) call from processFile so only processSession calls m.Add("total_sessions", 1) when processing a session).
🧹 Nitpick comments (3)
internal/backfill/extract/extract.go (1)
197-205: Дублирование логики санитизации транскрипта.Функции
BuildUserPromptиBuildChunkUserPromptсодержат идентичную логику санитизации XML-тегов. Рекомендуется извлечь её в отдельную вспомогательную функцию.♻️ Предлагаемый рефакторинг
+// sanitizeTranscript replaces XML tags that could break prompt boundaries. +func sanitizeTranscript(transcript string) string { + return xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string { + return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]") + }) +} + // BuildUserPrompt constructs the user prompt for LLM extraction. // Transcript content is sanitized to prevent XML tag injection (AG06-005). func BuildUserPrompt(projectPath, gitBranch string, durationMin, exchangeCount int, chunkInfo, alreadyExtracted, transcript string) string { - // Replace XML tags that could break prompt boundaries with bracketed equivalents - sanitized := xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string { - return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]") - }) + sanitized := sanitizeTranscript(transcript) return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) } // BuildChunkUserPrompt constructs the user prompt using ChunkExtractionUserTemplate. // Transcript content is sanitized to prevent XML tag injection (AG06-005). func BuildChunkUserPrompt(projectPath string, exchangeCount int, chunkInfo, transcript string) string { - // Replace XML tags that could break prompt boundaries with bracketed equivalents - sanitized := xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string { - return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]") - }) + sanitized := sanitizeTranscript(transcript) return fmt.Sprintf(ChunkExtractionUserTemplate, projectPath, exchangeCount, chunkInfo, sanitized) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/extract/extract.go` around lines 197 - 205, Функции BuildUserPrompt и BuildChunkUserPrompt дублируют логику санитизации XML-тегов через xmlTagRe.ReplaceAllStringFunc; вынеси эту логику в одну вспомогательную функцию (например SanitizeTranscriptTags or sanitizeTranscript) и замени внутренние реализации BuildUserPrompt и BuildChunkUserPrompt на вызов этой новой функции с теми же входными данными, сохранив текущее поведение замены "<" на "[" и ">" на "]" и форматирование через ChunkExtractionUserTemplate / соответствующий шаблон.internal/worker/handlers_backfill.go (1)
337-346: НеиспользуемыйsessionDBID.Переменная
sessionDBIDприсваивается, но сразу отбрасывается (_ = sessionDBID). Если значение не нужно, лучше использовать_напрямую в присваивании. Если нужно для отладки — можно залогировать.♻️ Предлагаемое исправление
// T017: Create SDK session record so backfilled sessions appear in Sessions page. if s.sessionStore != nil { - sessionDBID, err := s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill") + _, err := s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill") if err != nil { log.Warn().Err(err).Str("session_id", req.SessionID).Msg("backfill: failed to create SDK session") } - _ = sessionDBID }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_backfill.go` around lines 337 - 346, The local variable sessionDBID returned from s.sessionStore.CreateSDKSession is assigned then immediately discarded (_ = sessionDBID); remove the unused variable by assigning the call to the blank identifier (e.g., _, err := s.sessionStore.CreateSDKSession(...)) or, if you need the ID for debugging/observability, keep sessionDBID and add a log line (e.g., log.Debug().Str("session_db_id", sessionDBID)... ) instead of the discard; update the call site that currently uses sessionDBID and s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill") accordingly.internal/backfill/backfill.go (1)
308-311: Ошибки LLM-вызова в ретроспективе игнорируются молча.При ошибке LLM-вызова функция возвращает
nilбез логирования. Это затрудняет диагностику проблем в production.📝 Предлагаемое улучшение
xmlOutput, err := r.llm.Complete(retroCtx, extract.RetrospectiveSystemPrompt, retroPrompt) if err != nil { + // Log but don't fail — retrospective is supplementary. + log.Debug().Err(err).Str("project", sess.ProjectPath).Msg("retrospective LLM call failed") return nil }Потребуется добавить импорт
"github.com/rs/zerolog/log".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/backfill.go` around lines 308 - 311, The LLM error returned by r.llm.Complete (called with extract.RetrospectiveSystemPrompt and retroPrompt) is currently ignored; change the error handling to log the error before returning nil by adding an import for "github.com/rs/zerolog/log" and calling log.Error().Err(err).Msg("LLM Complete failed for retrospective prompt") (include any relevant identifiers like retrospective ID or context if available) in the if err != nil block so failures are recorded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/backfill/backfill.go`:
- Around line 112-128: The metric "total_sessions" is being incremented twice:
in processFile and again in processSession via m.Add("total_sessions", 1),
causing double counting; remove the redundant increment in processFile and let
processSession own the session counter (i.e., delete the m.Add("total_sessions",
1) call from processFile so only processSession calls m.Add("total_sessions", 1)
when processing a session).
In `@internal/backfill/extract/extract.go`:
- Around line 83-97: The ValidateXML function currently checks
XMLObservation.Type and .Outcome but not .Category; update ValidateXML to verify
that obs.Category is non-empty and exists in the ValidCategories map (using the
XMLObservation and ValidCategories symbols) and return a clear validation error
when the category is missing or not allowed (mirroring how Type/Outcome errors
are handled), so backfill rejects observations with invalid categories before
persisting.
---
Nitpick comments:
In `@internal/backfill/backfill.go`:
- Around line 308-311: The LLM error returned by r.llm.Complete (called with
extract.RetrospectiveSystemPrompt and retroPrompt) is currently ignored; change
the error handling to log the error before returning nil by adding an import for
"github.com/rs/zerolog/log" and calling log.Error().Err(err).Msg("LLM Complete
failed for retrospective prompt") (include any relevant identifiers like
retrospective ID or context if available) in the if err != nil block so failures
are recorded.
In `@internal/backfill/extract/extract.go`:
- Around line 197-205: Функции BuildUserPrompt и BuildChunkUserPrompt дублируют
логику санитизации XML-тегов через xmlTagRe.ReplaceAllStringFunc; вынеси эту
логику в одну вспомогательную функцию (например SanitizeTranscriptTags or
sanitizeTranscript) и замени внутренние реализации BuildUserPrompt и
BuildChunkUserPrompt на вызов этой новой функции с теми же входными данными,
сохранив текущее поведение замены "<" на "[" и ">" на "]" и форматирование через
ChunkExtractionUserTemplate / соответствующий шаблон.
In `@internal/worker/handlers_backfill.go`:
- Around line 337-346: The local variable sessionDBID returned from
s.sessionStore.CreateSDKSession is assigned then immediately discarded (_ =
sessionDBID); remove the unused variable by assigning the call to the blank
identifier (e.g., _, err := s.sessionStore.CreateSDKSession(...)) or, if you
need the ID for debugging/observability, keep sessionDBID and add a log line
(e.g., log.Debug().Str("session_db_id", sessionDBID)... ) instead of the
discard; update the call site that currently uses sessionDBID and
s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill")
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 40af082a-ea3e-465d-b759-211542b49007
📒 Files selected for processing (3)
internal/backfill/backfill.gointernal/backfill/extract/extract.gointernal/worker/handlers_backfill.go
- ImportFeedbackFiles: scans feedback_*.md, LLM re-processes each into TRIGGER→RULE→REASON structured observations - ParseSingleObservation: parses bare <observation> XML from LLM - cmd/engram-import: CLI with import-feedback (--dry-run) and purge-rebuild subcommands - POST /api/maintenance/purge-rebuild: selective purge (truncate derived tables, delete non-manual/non-credential observations) with dry_run=true preview mode
Binary naming consistency: engram (local client) + engram-server (remote server). Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh. internal/worker/ package NOT renamed (Go embed path, separate concern). Closes #43.
Binary naming consistency: engram (local client) + engram-server (remote server). Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh. internal/worker/ package NOT renamed (Go embed path, separate concern). Closes #43.
Binary naming consistency: engram (local client) + engram-server (remote server). Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh. internal/worker/ package NOT renamed (Go embed path, separate concern). Closes #43. Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
Summary
Redesigns observation extraction from vague "be generous" to structured 6-category system with user behavior detection.
Phase 1: Prompts + Parsers
XMLObservation.Categoryfield +ValidCategoriesmapParseRetrospective()— robust XML parser for session retrospectivePhase 2: Live Extraction Upgrade
<type>when both presentPhase 3: Behavioral Rule Injection
user-prompt.jsqueries engram for type=guidance observations<user-behavior-rules>block BEFORE<relevant-memory>Test plan
go build ./...passes<user-behavior-rules>block appears in session context when guidance observations existSummary by CodeRabbit
New Features
Improvements