Skip to content

feat: extraction quality v2 — category prompts + behavioral rules#43

Merged
thebtf merged 5 commits into
mainfrom
worktree-extraction-quality-v2
Mar 23, 2026
Merged

feat: extraction quality v2 — category prompts + behavioral rules#43
thebtf merged 5 commits into
mainfrom
worktree-extraction-quality-v2

Conversation

@thebtf
Copy link
Copy Markdown
Owner

@thebtf thebtf commented Mar 23, 2026

Summary

Redesigns observation extraction from vague "be generous" to structured 6-category system with user behavior detection.

Phase 1: Prompts + Parsers

  • ChunkExtractionSystemPrompt: 6 categories (DECISION, CORRECTION, DEBUGGING, GOTCHA, PATTERN, USER_BEHAVIOR) with "look for" detection patterns + DO NOT EXTRACT list
  • RetrospectiveSystemPrompt: session-level holistic summary (for backfill Phase 4)
  • FeedbackImportSystemPrompt: converts feedback_*.md to TRIGGER→RULE→REASON
  • XMLObservation.Category field + ValidCategories map
  • ParseRetrospective() — robust XML parser for session retrospective

Phase 2: Live Extraction Upgrade

  • SDK processor uses category-based prompt (adapted for single-exchange)
  • Category→type mapping: user_behavior→guidance, debugging→bugfix, correction/gotcha/pattern→discovery
  • Category takes precedence over <type> when both present

Phase 3: Behavioral Rule Injection

  • user-prompt.js queries engram for type=guidance observations
  • Injects <user-behavior-rules> block BEFORE <relevant-memory>
  • Rules not subject to diversity penalty or LLM filter — always injected

Test plan

  • go build ./... passes
  • Deploy, run live session — verify category-tagged observations appear
  • Verify <user-behavior-rules> block appears in session context when guidance observations exist
  • Run extraction on 3 sessions with 8B model — user reviews quality

Summary by CodeRabbit

  • New Features

    • Категоризация наблюдений и приоритетное включение правил поведения в пользовательский контекст
    • Сессионный ретроспективный анализ с итоговой сводкой и дополнительными наблюдениями
    • Инструмент импорта правил/отзывов из файлов и команда CLI для импорта/очистки
  • Improvements

    • Ужесточен формат и правила извлечения наблюдений (строгий XML-вывод, новые поля)
    • Улучшена классификация и парсинг наблюдений с учётом категории
    • Добавен HTTP-эндпоинт для очистки/пересборки данных (dry-run поддержан)

thebtf added 3 commits March 23, 2026 13:55
- ChunkExtractionSystemPrompt: 6 categories (DECISION, CORRECTION,
  DEBUGGING, GOTCHA, PATTERN, USER_BEHAVIOR) with detection patterns
  and DO NOT EXTRACT list
- RetrospectiveSystemPrompt: session-level holistic summary
- FeedbackImportSystemPrompt: converts feedback_*.md to TRIGGER/RULE/REASON
- XMLObservation: added Category field
- ParseRetrospective: robust XML parser for session retrospective
- Replace vague "be generous" extraction prompt with 6-category
  taxonomy (DECISION, CORRECTION, DEBUGGING, GOTCHA, PATTERN,
  USER_BEHAVIOR) adapted for single tool-output analysis
- Map categories to observation types: user_behavior → guidance,
  debugging → bugfix, correction/gotcha/pattern → discovery
- Category field parsed from XML, takes precedence over <type>
Query engram for type=guidance observations (user behavior rules)
and inject as <user-behavior-rules> block BEFORE <relevant-memory>.
Rules are not subject to diversity penalty or LLM filter — always
injected if matched. Max 10 rules, global scope prioritized.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78b06331-edb2-47b9-9d22-80323526ba38

📥 Commits

Reviewing files that changed from the base of the PR and between 03c85cc and 3cee4ed.

📒 Files selected for processing (5)
  • cmd/engram-import/main.go
  • internal/backfill/extract/extract.go
  • internal/backfill/import_feedback.go
  • internal/worker/handlers_maintenance.go
  • internal/worker/service.go

Walkthrough

Добавлены категория в XML-наблюдения, новые подсказки и сборочные функции для чанков, парсинг одиночных наблюдений и ретроспективы, интеграция ретроспективных наблюдений в backfill, импорт правил обратной связи из файлов, приоритетная инъекция правил поведения в клиентский prompt и новый endpoint для purge-rebuild.

Changes

Cohort / File(s) Summary
Извлечение: структуры, парсинг, утилиты
internal/backfill/extract/extract.go, internal/backfill/extract/retrospective.go
Добавлено поле Category в XMLObservation, ValidCategories, BuildChunkUserPrompt, ParseSingleObservation; добавлены SessionRetrospective/SessionSummary и ParseRetrospective с валидацией Outcome.
LLM-подсказки
internal/backfill/extract/prompts.go
Новая файл с константами подсказок: ChunkExtractionSystemPrompt, ChunkExtractionUserTemplate, RetrospectiveSystemPrompt, RetrospectiveUserTemplate, FeedbackImportSystemPrompt (новая XML-схема и правила извлечения).
Backfill и импорт обратной связи
internal/backfill/backfill.go, internal/backfill/import_feedback.go, cmd/engram-import/main.go
Добавлено поле Result.Summary; processSession возвращает ретроспективу и вызывает ретроспективный проход LLM; новый импорт feedback_*.md файлов (ImportFeedbackFiles) и CLI engram-import с командой import-feedback.
Парсер/процессор SDK и тесты
internal/worker/sdk/parser.go, internal/worker/sdk/processor.go, internal/worker/sdk/processor_test.go
Добавлен categoryTypeMap с приоритетом category над <type> при разборе; обновлён systemPrompt и тесты под новую схему.
Хендлеры и хранение
internal/worker/handlers_backfill.go, internal/worker/handlers_maintenance.go, internal/worker/service.go
Всегда создаётся SDK session запись; добавлено условное сохранение summary; переработан выбор проекта при сохранении наблюдений; новый HTTP-эндпоинт POST /api/maintenance/purge-rebuild реализован с dry-run и truncate/delete шагами; маршрут зарегистрирован.
Клиентский плагин: инъекция правил поведения
plugin/engram/hooks/user-prompt.js
Prefetch /api/context/search для obs_type: 'guidance', фильтрация scope/project, форматирование до 10 правил в <user-behavior-rules>, поведение блок вставляется перед техническим контекстом; ошибки логируются.
Мелкие: тесты/прочее
internal/worker/sdk/processor_test.go
Адаптация тестов к обновлённому systemPrompt (ожидает CATEGORY и user_behavior).

Sequence Diagram(s)

sequenceDiagram
    participant Plugin as Plugin (user-prompt.js)
    participant API as Context API
    participant Backfill as Backfill Runner
    participant LLM as LLM
    participant Parser as Parser
    participant Store as Storage

    Plugin->>API: GET /api/context/search (obs_type: guidance)
    API-->>Plugin: guidance observations
    Plugin->>Plugin: фильтрация + форматирование <user-behavior-rules>
    Plugin-->>Backfill: prompt (behaviorRulesBlock + context)

    Backfill->>LLM: ChunkExtractionSystemPrompt + chunk user prompt
    LLM-->>Backfill: XML observations (включая <category>)
    Backfill->>Parser: unmarshal XML
    Parser->>Parser: categoryTypeMap -> determine Type/SourceType
    Parser-->>Store: StoreObservation(..., project, ...)
    Backfill->>LLM: RetrospectiveSystemPrompt (session snippets)
    LLM-->>Backfill: <session_retrospective> XML
    Backfill->>Parser: ParseRetrospective -> SessionObservations
    Backfill-->>Store: StoreSummary + deduped session observations
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 Я — кролик, шёрстка в коде миг,
Категорий прибавил я вид.
Правила вперёд — в prompt их несу,
Ретро подытожит суть и путь.
Хвостиком помахаю — и всё в лад!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main changes: introducing a six-category extraction system and behavioral rule injection for improved extraction quality.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-extraction-quality-v2

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the quality and structure of extracted observations by moving from a generic extraction method to a precise, category-based system. It introduces new prompts and parsing logic to categorize observations into specific types, such as decisions, corrections, and user behaviors. A key outcome is the ability to inject user-defined behavioral rules directly into the AI's context, ensuring that the system consistently adheres to user preferences and past corrections, thereby improving the AI's adaptability and responsiveness.

Highlights

  • Structured Observation Extraction: The observation extraction process has been redesigned from a vague 'be generous' approach to a structured 6-category system, including DECISION, CORRECTION, DEBUGGING, GOTCHA, PATTERN, and USER_BEHAVIOR.
  • New Prompt Definitions: New system prompts (ChunkExtractionSystemPrompt, RetrospectiveSystemPrompt, FeedbackImportSystemPrompt) and user templates (ChunkExtractionUserTemplate, RetrospectiveUserTemplate) have been introduced to guide the LLM in structured extraction and session summarization.
  • XML Observation Enhancements: The XMLObservation struct now includes a Category field, and a ValidCategories map has been added to enforce allowed observation categories.
  • Retrospective Parsing: A new ParseRetrospective function was added to robustly parse XML output for session-level retrospectives, including summary and session-level observations.
  • Live Extraction Upgrade: The SDK processor now utilizes the new category-based prompt for single-exchange analysis, with category-to-type mapping (e.g., user_behavior to guidance, debugging to bugfix). The category field takes precedence over the type field when both are present.
  • Behavioral Rule Injection: User behavior rules (observations of type guidance) are now fetched from engram and injected into the user-prompt.js context as a <user-behavior-rules> block, ensuring they are always considered by the LLM and are not subject to diversity penalties or LLM filters.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant improvement to observation extraction by moving from a vague system to a structured 6-category model. This includes new, more precise prompts, updated data structures, and refined parsing logic. A key addition is the ability to inject high-priority user behavior rules into the session context. My review focused on the implementation of these new features. I've identified a couple of areas for improvement: one regarding performance and maintainability in retrospective.go by defining a map at the package level, and another concerning data consistency in parser.go to ensure all LLM-derived observations are correctly marked with the SourceLLMDerived source type. Overall, this is a well-executed feature that promises to enhance the quality and relevance of extracted knowledge.

Comment on lines +45 to +50
validOutcomes := map[string]bool{
"shipped": true, "partial": true, "blocked": true, "investigation_only": true,
}
if !validOutcomes[retro.Summary.Outcome] {
retro.Summary.Outcome = "partial" // default
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better performance and consistency with other validation maps in the project (like ValidTypes), the validOutcomes map should be defined as a package-level variable instead of being re-initialized on every function call.

I'd recommend moving the map definition to the package level, for example:

var validRetrospectiveOutcomes = map[string]bool{
	"shipped": true, "partial": true, "blocked": true, "investigation_only": true,
}

Then, you can simplify this part of the function.

Suggested change
validOutcomes := map[string]bool{
"shipped": true, "partial": true, "blocked": true, "investigation_only": true,
}
if !validOutcomes[retro.Summary.Outcome] {
retro.Summary.Outcome = "partial" // default
}
if !validRetrospectiveOutcomes[retro.Summary.Outcome] {
retro.Summary.Outcome = "partial" // default
}

Comment on lines +104 to +106
if category == "user_behavior" {
finalSourceType = models.SourceLLMDerived
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

All observations extracted via this LLM-based parsing logic should likely have their SourceType set to models.SourceLLMDerived, not just those with the user_behavior category. Since the observations for all categories are derived from an LLM analyzing tool output, it would be more consistent to apply this source type to any observation that has a valid category.

			finalSourceType = models.SourceLLMDerived

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
plugin/engram/hooks/user-prompt.js (2)

302-306: Логирование можно улучшить: добавить количество инжектируемых правил.

Текущее логирование указывает, что правила поведения инжектируются, но не показывает их количество, что затрудняет диагностику.

✏️ Предлагаемое улучшение логирования

Для реализации потребуется сохранить количество правил. Добавьте переменную при построении блока:

+  let behaviorRulesCount = 0;
   // ...
   if (rules.length > 0) {
+    behaviorRulesCount = Math.min(rules.length, 10);
     // ...
   }

Затем обновите логирование:

   if (output) {
-    console.error(`[engram] Injecting: ${behaviorRulesBlock ? 'behavior rules + ' : ''}${observationCount} observations`);
+    console.error(`[engram] Injecting: ${behaviorRulesCount > 0 ? `${behaviorRulesCount} behavior rules + ` : ''}${observationCount} observations`);
     return output;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/engram/hooks/user-prompt.js` around lines 302 - 306, The console log
currently shows only that behavior rules are injected and the observationCount;
capture and log the number of behavior rules as well by computing a rules count
when assembling the behaviorRulesBlock (e.g., derive behaviorRulesCount while
building behaviorRulesBlock) and include that value in the console.error message
that references behaviorRulesBlock, observationCount and output; update the
injection assembly logic around behaviorRulesBlock + contextToInject so the new
behaviorRulesCount variable is maintained and logged together with the existing
message emitted via console.error.

46-54: Рассмотрите сортировку правил по релевантности перед ограничением до 10.

В отличие от технических наблюдений (строка 90), правила поведения не сортируются по similarity перед выборкой первых 10. Если API возвращает результаты не по убыванию релевантности, более важные правила могут быть исключены.

♻️ Предлагаемое улучшение
     if (rules.length > 0) {
+      // Sort by similarity to prioritize most relevant rules
+      rules.sort((a, b) => (b.similarity || 0) - (a.similarity || 0));
       behaviorRulesBlock = '<user-behavior-rules>\n';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/engram/hooks/user-prompt.js` around lines 46 - 54, The current
generation of behaviorRulesBlock iterates rules and takes rules.slice(0, 10)
without sorting, which can drop more relevant user rules; before slicing, sort
the rules array by their relevance score (e.g., rule.similarity or rule.score)
in descending order so the top 10 most relevant are kept. Update the logic
around the symbols rules, rules.slice(0, 10), and the block that builds
behaviorRulesBlock (while still using escapeXmlTags for title and narrative) to
sort by the appropriate similarity field before truncation.
internal/backfill/extract/extract.go (1)

84-97: Поле Category добавлено, но не валидируется в ValidateXML.

Добавлено поле Category и карта ValidCategories, однако функция ValidateXML не проверяет значение категории против допустимого списка. Если это преднамеренно (например, для совместимости с устаревшими данными или потому что валидация происходит в parser.go), рекомендую добавить комментарий, объясняющий это решение.

Если валидация необходима, можно добавить проверку:

♻️ Опциональная валидация категории
 		if len(o.Concepts.Concepts) == 0 {
 			result.Errors = append(result.Errors, fmt.Sprintf("%s: no concepts", prefix))
 			valid = false
 		}
+		// Optional: validate category if present
+		if o.Category != "" && !ValidCategories[o.Category] {
+			result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid category %q", prefix, o.Category))
+			// Note: not marking invalid=false to keep backwards compatibility
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/extract/extract.go` around lines 84 - 97, ValidateXML
currently omits checking the new Category field against ValidCategories; either
add a validation step in ValidateXML to verify that the Observation.Category (or
struct field named Category) exists in ValidCategories and return a descriptive
error if not, or if omission is intentional, add a clarifying comment in the
same file explaining why category is not validated here (e.g., legacy
compatibility or validation in parser.go). Reference the Category struct field
and the ValidCategories map and update the ValidateXML function accordingly.
internal/worker/sdk/processor.go (1)

1175-1178: Рекомендуется рассмотреть унификацию промптов.

Системный промпт здесь (processor.go) почти идентичен ChunkExtractionSystemPrompt в prompts.go. Основное отличие — "single tool execution" vs "multi-exchange chunks". Рассмотрите возможность вынесения общей базы в константу с параметризацией различий, чтобы избежать расхождения при будущих изменениях.

♻️ Пример унификации
// В prompts.go:
const categoryDefinitions = `CATEGORY 1 — DECISION: ...
...
DO NOT EXTRACT:
...`

// В processor.go:
const systemPrompt = `You are a coding session analyst. Analyze this single tool execution...
` + categoryDefinitions + `
RULES:
- Maximum 1 observation per tool execution
...`

// В prompts.go:
const ChunkExtractionSystemPrompt = `You are a coding session analyst. Read the transcript segment...
` + categoryDefinitions + `
RULES:
- Maximum 2 observations per chunk
...`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/sdk/processor.go` around lines 1175 - 1178, Extract the
shared taxonomy text into a single constant (e.g., categoryDefinitions) in
prompts.go and replace the duplicated body in both processor.go (systemPrompt)
and prompts.go (ChunkExtractionSystemPrompt) by concatenating or formatting that
shared constant with the file-specific prefixes/suffixes (e.g., the single-tool
vs multi-exchange intro and their RULES blocks); update the symbols systemPrompt
and ChunkExtractionSystemPrompt to reference the shared categoryDefinitions and
keep only the differing sentences (like "single tool execution" vs
"multi-exchange chunks") and rule lines in each location so future changes to
the taxonomy are made in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backfill/extract/retrospective.go`:
- Around line 31-42: The current extraction assumes start < end; update the
validation around start and end (used to build xmlStr from raw) to also check
ordering and avoid invalid slices by returning nil when start == -1 || end == -1
|| start >= end; ensure this change is applied where start, end, raw, and xmlStr
are used (the XML extraction block that feeds xml.Unmarshal into
SessionRetrospective).

In `@internal/worker/sdk/parser.go`:
- Around line 147-149: The ParsedObservation being appended uses finalSourceType
which can be empty for non-user_behavior categories; change the logic around the
construction of models.ParsedObservation (where observations =
append(observations, &models.ParsedObservation{...}) uses finalType and
finalSourceType) to ensure SourceType is not an empty string: if the
category/type is not user_behavior, set SourceType to models.SourceUnknown (or
derive it deterministically from the category to match the behavior in
handlers_ingest.go:183, handlers_backfill.go:434 and handlers_vault.go:205) so
downstream code (smart_gc.go protections, search/manager.go boosts, and
scoring/calculator.go penalties) receives a valid SourceType instead of "".

In `@plugin/engram/hooks/user-prompt.js`:
- Around line 43-45: The filter on rulesResult.observations can match
observations with empty/undefined project when project is set to '' (from
ctx.Project not a string); update the filter in the rules assignment to only
compare observation project to project when project is a non-empty string —
i.e., keep the global scope check as-is but change the second branch to require
project truthiness before comparing via asString(obs.project) === project so
observations with missing/empty project are not accidentally treated as
matching; reference symbols: rulesResult, rulesResult.observations, asString,
project, obs.

---

Nitpick comments:
In `@internal/backfill/extract/extract.go`:
- Around line 84-97: ValidateXML currently omits checking the new Category field
against ValidCategories; either add a validation step in ValidateXML to verify
that the Observation.Category (or struct field named Category) exists in
ValidCategories and return a descriptive error if not, or if omission is
intentional, add a clarifying comment in the same file explaining why category
is not validated here (e.g., legacy compatibility or validation in parser.go).
Reference the Category struct field and the ValidCategories map and update the
ValidateXML function accordingly.

In `@internal/worker/sdk/processor.go`:
- Around line 1175-1178: Extract the shared taxonomy text into a single constant
(e.g., categoryDefinitions) in prompts.go and replace the duplicated body in
both processor.go (systemPrompt) and prompts.go (ChunkExtractionSystemPrompt) by
concatenating or formatting that shared constant with the file-specific
prefixes/suffixes (e.g., the single-tool vs multi-exchange intro and their RULES
blocks); update the symbols systemPrompt and ChunkExtractionSystemPrompt to
reference the shared categoryDefinitions and keep only the differing sentences
(like "single tool execution" vs "multi-exchange chunks") and rule lines in each
location so future changes to the taxonomy are made in one place.

In `@plugin/engram/hooks/user-prompt.js`:
- Around line 302-306: The console log currently shows only that behavior rules
are injected and the observationCount; capture and log the number of behavior
rules as well by computing a rules count when assembling the behaviorRulesBlock
(e.g., derive behaviorRulesCount while building behaviorRulesBlock) and include
that value in the console.error message that references behaviorRulesBlock,
observationCount and output; update the injection assembly logic around
behaviorRulesBlock + contextToInject so the new behaviorRulesCount variable is
maintained and logged together with the existing message emitted via
console.error.
- Around line 46-54: The current generation of behaviorRulesBlock iterates rules
and takes rules.slice(0, 10) without sorting, which can drop more relevant user
rules; before slicing, sort the rules array by their relevance score (e.g.,
rule.similarity or rule.score) in descending order so the top 10 most relevant
are kept. Update the logic around the symbols rules, rules.slice(0, 10), and the
block that builds behaviorRulesBlock (while still using escapeXmlTags for title
and narrative) to sort by the appropriate similarity field before truncation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 826d3aad-f1db-452b-89e7-3d528cb94d64

📥 Commits

Reviewing files that changed from the base of the PR and between fa99e46 and 87dbbf8.

📒 Files selected for processing (7)
  • internal/backfill/extract/extract.go
  • internal/backfill/extract/prompts.go
  • internal/backfill/extract/retrospective.go
  • internal/worker/sdk/parser.go
  • internal/worker/sdk/processor.go
  • internal/worker/sdk/processor_test.go
  • plugin/engram/hooks/user-prompt.js

Comment on lines +31 to +42
// Find the XML in the response (LLM may include preamble)
start := strings.Index(raw, "<session_retrospective>")
end := strings.Index(raw, "</session_retrospective>")
if start == -1 || end == -1 {
return nil
}
xmlStr := raw[start : end+len("</session_retrospective>")]

var retro SessionRetrospective
if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil {
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Добавьте проверку порядка start и end для корректной обработки.

Текущая логика предполагает, что start < end, но в случае искажённого XML (например, теги в неправильном порядке) это может привести к некорректному срезу. Рекомендуется добавить проверку:

🛡️ Предлагаемое исправление
 	start := strings.Index(raw, "<session_retrospective>")
 	end := strings.Index(raw, "</session_retrospective>")
-	if start == -1 || end == -1 {
+	if start == -1 || end == -1 || end <= start {
 		return nil
 	}
 	xmlStr := raw[start : end+len("</session_retrospective>")]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Find the XML in the response (LLM may include preamble)
start := strings.Index(raw, "<session_retrospective>")
end := strings.Index(raw, "</session_retrospective>")
if start == -1 || end == -1 {
return nil
}
xmlStr := raw[start : end+len("</session_retrospective>")]
var retro SessionRetrospective
if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil {
return nil
}
// Find the XML in the response (LLM may include preamble)
start := strings.Index(raw, "<session_retrospective>")
end := strings.Index(raw, "</session_retrospective>")
if start == -1 || end == -1 || end <= start {
return nil
}
xmlStr := raw[start : end+len("</session_retrospective>")]
var retro SessionRetrospective
if err := xml.Unmarshal([]byte(xmlStr), &retro); err != nil {
return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/extract/retrospective.go` around lines 31 - 42, The current
extraction assumes start < end; update the validation around start and end (used
to build xmlStr from raw) to also check ordering and avoid invalid slices by
returning nil when start == -1 || end == -1 || start >= end; ensure this change
is applied where start, end, raw, and xmlStr are used (the XML extraction block
that feeds xml.Unmarshal into SessionRetrospective).

Comment on lines 147 to +149
observations = append(observations, &models.ParsedObservation{
Type: finalType,
SourceType: finalSourceType,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Поиск использования SourceType в моделях и хранилищах
rg -n -C3 'SourceType' --type=go -g '!*_test.go' | head -80

Repository: thebtf/engram

Length of output: 5402


🏁 Script executed:

# Find SourceType definition in models
fd . -type f -name "*.go" -path "*/models/*" | xargs grep -l "SourceType" | head -5

Repository: thebtf/engram

Length of output: 227


🏁 Script executed:

# Check the SourceType type definition and its values
rg -n "type SourceType" --type=go -A 15

Repository: thebtf/engram

Length of output: 1112


🏁 Script executed:

# Check how StoreObservation handles SourceType
rg -n "StoreObservation" --type=go -B 2 -A 20 | grep -A 20 "func.*StoreObservation"

Repository: thebtf/engram

Length of output: 9332


🏁 Script executed:

# Look for validations or default assignments related to SourceType
rg -n "SourceType.*=|if.*SourceType|switch.*SourceType" --type=go -C 2 | head -100

Repository: thebtf/engram

Length of output: 5052


🏁 Script executed:

# Find database migrations and schema files
fd . -type f \( -name "*migration*" -o -name "*schema*" \) | grep -E "\.(sql|go)$"

Repository: thebtf/engram

Length of output: 227


🏁 Script executed:

# Search for database schema or table definitions related to observations
rg -n "CREATE TABLE|observations|source_type" -i --type=sql

Repository: thebtf/engram

Length of output: 205


🏁 Script executed:

# Check if there's a database model for Observation that defines column constraints
fd . -type f -name "*.go" -path "*db*" | xargs grep -l "type Observation struct" | head -3

Repository: thebtf/engram

Length of output: 227


🏁 Script executed:

# Look at the Observation DB model structure
rg -n "type Observation struct" -A 30 --type=go | head -80

Repository: thebtf/engram

Length of output: 6694


🏁 Script executed:

# Search for parser tests to see if empty SourceType is tested
fd parser_test.go -type f

Repository: thebtf/engram

Length of output: 227


🏁 Script executed:

# Check if there's any middleware or post-processing that fills empty SourceType
rg -n "SourceType\s*=" . --type=go -B 3 -A 3 | grep -v "test.go" | head -120

Repository: thebtf/engram

Length of output: 4074


🏁 Script executed:

# Look for any validation or default assignment for SourceType
rg -n "if.*SourceType.*==.*\"\"" . --type=go

Repository: thebtf/engram

Length of output: 39


Установите SourceType для категорий, отличных от user_behavior, во избежание нарушения логики обработки.

Пустой SourceType (empty string) для не-user_behavior категорий обходит критическую логику в нескольких подсистемах:

  • smart_gc.go:97 — защита tool_verified источников (2x порог) не применяется
  • search/manager.go:159, :189 — предотвращение распада для manual сохранений и 1.5x усиление не применяются
  • scoring/calculator.go:62 — штраф для backfill источников не применяется

Вместо пустой строки используйте models.SourceUnknown или классифицируйте источник на основе категории, чтобы соответствовать поведению handlers_ingest.go:183, handlers_backfill.go:434 и handlers_vault.go:205.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/sdk/parser.go` around lines 147 - 149, The ParsedObservation
being appended uses finalSourceType which can be empty for non-user_behavior
categories; change the logic around the construction of models.ParsedObservation
(where observations = append(observations, &models.ParsedObservation{...}) uses
finalType and finalSourceType) to ensure SourceType is not an empty string: if
the category/type is not user_behavior, set SourceType to models.SourceUnknown
(or derive it deterministically from the category to match the behavior in
handlers_ingest.go:183, handlers_backfill.go:434 and handlers_vault.go:205) so
downstream code (smart_gc.go protections, search/manager.go boosts, and
scoring/calculator.go penalties) receives a valid SourceType instead of "".

Comment on lines +43 to +45
const rules = Array.isArray(rulesResult.observations)
? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || asString(obs.project) === project)
: [];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Потенциальная проблема: пустой project может неожиданно сопоставляться с наблюдениями без проекта.

Когда ctx.Project не является строкой, project устанавливается в '' (строка 19). В этом случае условие asString(obs.project) === project будет истинным для любых наблюдений с пустым или undefined значением project, что может привести к инъекции правил, не предназначенных для данного контекста.

🛡️ Предлагаемое исправление
     const rules = Array.isArray(rulesResult.observations)
-      ? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || asString(obs.project) === project)
+      ? rulesResult.observations.filter(obs => asString(obs.scope) === 'global' || (project && asString(obs.project) === project))
       : [];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/engram/hooks/user-prompt.js` around lines 43 - 45, The filter on
rulesResult.observations can match observations with empty/undefined project
when project is set to '' (from ctx.Project not a string); update the filter in
the rules assignment to only compare observation project to project when project
is a non-empty string — i.e., keep the global scope check as-is but change the
second branch to require project truthiness before comparing via
asString(obs.project) === project so observations with missing/empty project are
not accidentally treated as matching; reference symbols: rulesResult,
rulesResult.observations, asString, project, obs.

- Backfill uses ChunkExtractionSystemPrompt (6 categories)
- Session retrospective: 2nd LLM pass after chunk extraction
  produces summary (request/completed/learned/next) + 0-2
  session-level observations
- SDK session records created for backfilled sessions
  (visible on Sessions dashboard page)
- Summaries stored via SummaryStore (visible in timeline)
- 30s timeout for retrospective LLM call
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/backfill/extract/extract.go (1)

83-97: ⚠️ Potential issue | 🟡 Minor

Добавлено поле Category, но ValidateXML его не проверяет.

Поле Category добавлено в XMLObservation и создана карта ValidCategories, однако функция ValidateXML (строки 154-177) проверяет только Type и Outcome, но не Category. Это может привести к сохранению наблюдений с невалидными категориями.

Согласно AI-summary, валидация и маппинг категорий происходит в internal/worker/sdk/parser.go, но для backfill-пайплайна это не применяется.

🛡️ Предлагаемое исправление для ValidateXML
 		if !ValidTypes[o.Type] {
 			result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid type %q", prefix, o.Type))
 			valid = false
 		}
+		if o.Category != "" && !ValidCategories[o.Category] {
+			result.Errors = append(result.Errors, fmt.Sprintf("%s: invalid category %q", prefix, o.Category))
+			valid = false
+		}
 		if !ValidOutcomes[o.Outcome] {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/extract/extract.go` around lines 83 - 97, The ValidateXML
function currently checks XMLObservation.Type and .Outcome but not .Category;
update ValidateXML to verify that obs.Category is non-empty and exists in the
ValidCategories map (using the XMLObservation and ValidCategories symbols) and
return a clear validation error when the category is missing or not allowed
(mirroring how Type/Outcome errors are handled), so backfill rejects
observations with invalid categories before persisting.
internal/backfill/backfill.go (1)

112-128: ⚠️ Potential issue | 🟠 Major

Двойной подсчёт total_sessions в метриках.

Метрика total_sessions увеличивается дважды: сначала в processFile (строка 114), затем снова в processSession (строка 128). Это приводит к удвоению счётчика при обработке файлов.

🐛 Предлагаемое исправление
 // processFile processes a single session file and returns extracted observations.
 func (r *Runner) processFile(ctx context.Context, file string, m *metrics.Metrics) []ExtractedObservation {
-	m.Add("total_sessions", 1)
 
 	sess, err := sessions.ParseSession(file)
 	if err != nil {
+		m.Add("parse_errors", 1)
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/backfill.go` around lines 112 - 128, The metric
"total_sessions" is being incremented twice: in processFile and again in
processSession via m.Add("total_sessions", 1), causing double counting; remove
the redundant increment in processFile and let processSession own the session
counter (i.e., delete the m.Add("total_sessions", 1) call from processFile so
only processSession calls m.Add("total_sessions", 1) when processing a session).
🧹 Nitpick comments (3)
internal/backfill/extract/extract.go (1)

197-205: Дублирование логики санитизации транскрипта.

Функции BuildUserPrompt и BuildChunkUserPrompt содержат идентичную логику санитизации XML-тегов. Рекомендуется извлечь её в отдельную вспомогательную функцию.

♻️ Предлагаемый рефакторинг
+// sanitizeTranscript replaces XML tags that could break prompt boundaries.
+func sanitizeTranscript(transcript string) string {
+	return xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string {
+		return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]")
+	})
+}
+
 // BuildUserPrompt constructs the user prompt for LLM extraction.
 // Transcript content is sanitized to prevent XML tag injection (AG06-005).
 func BuildUserPrompt(projectPath, gitBranch string, durationMin, exchangeCount int, chunkInfo, alreadyExtracted, transcript string) string {
-	// Replace XML tags that could break prompt boundaries with bracketed equivalents
-	sanitized := xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string {
-		return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]")
-	})
+	sanitized := sanitizeTranscript(transcript)
 	return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized)
 }

 // BuildChunkUserPrompt constructs the user prompt using ChunkExtractionUserTemplate.
 // Transcript content is sanitized to prevent XML tag injection (AG06-005).
 func BuildChunkUserPrompt(projectPath string, exchangeCount int, chunkInfo, transcript string) string {
-	// Replace XML tags that could break prompt boundaries with bracketed equivalents
-	sanitized := xmlTagRe.ReplaceAllStringFunc(transcript, func(tag string) string {
-		return strings.ReplaceAll(strings.ReplaceAll(tag, "<", "["), ">", "]")
-	})
+	sanitized := sanitizeTranscript(transcript)
 	return fmt.Sprintf(ChunkExtractionUserTemplate, projectPath, exchangeCount, chunkInfo, sanitized)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/extract/extract.go` around lines 197 - 205, Функции
BuildUserPrompt и BuildChunkUserPrompt дублируют логику санитизации XML-тегов
через xmlTagRe.ReplaceAllStringFunc; вынеси эту логику в одну вспомогательную
функцию (например SanitizeTranscriptTags or sanitizeTranscript) и замени
внутренние реализации BuildUserPrompt и BuildChunkUserPrompt на вызов этой новой
функции с теми же входными данными, сохранив текущее поведение замены "<" на "["
и ">" на "]" и форматирование через ChunkExtractionUserTemplate /
соответствующий шаблон.
internal/worker/handlers_backfill.go (1)

337-346: Неиспользуемый sessionDBID.

Переменная sessionDBID присваивается, но сразу отбрасывается (_ = sessionDBID). Если значение не нужно, лучше использовать _ напрямую в присваивании. Если нужно для отладки — можно залогировать.

♻️ Предлагаемое исправление
 	// T017: Create SDK session record so backfilled sessions appear in Sessions page.
 	if s.sessionStore != nil {
-		sessionDBID, err := s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill")
+		_, err := s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill")
 		if err != nil {
 			log.Warn().Err(err).Str("session_id", req.SessionID).Msg("backfill: failed to create SDK session")
 		}
-		_ = sessionDBID
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/handlers_backfill.go` around lines 337 - 346, The local
variable sessionDBID returned from s.sessionStore.CreateSDKSession is assigned
then immediately discarded (_ = sessionDBID); remove the unused variable by
assigning the call to the blank identifier (e.g., _, err :=
s.sessionStore.CreateSDKSession(...)) or, if you need the ID for
debugging/observability, keep sessionDBID and add a log line (e.g.,
log.Debug().Str("session_db_id", sessionDBID)... ) instead of the discard;
update the call site that currently uses sessionDBID and
s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill")
accordingly.
internal/backfill/backfill.go (1)

308-311: Ошибки LLM-вызова в ретроспективе игнорируются молча.

При ошибке LLM-вызова функция возвращает nil без логирования. Это затрудняет диагностику проблем в production.

📝 Предлагаемое улучшение
 	xmlOutput, err := r.llm.Complete(retroCtx, extract.RetrospectiveSystemPrompt, retroPrompt)
 	if err != nil {
+		// Log but don't fail — retrospective is supplementary.
+		log.Debug().Err(err).Str("project", sess.ProjectPath).Msg("retrospective LLM call failed")
 		return nil
 	}

Потребуется добавить импорт "github.com/rs/zerolog/log".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backfill/backfill.go` around lines 308 - 311, The LLM error returned
by r.llm.Complete (called with extract.RetrospectiveSystemPrompt and
retroPrompt) is currently ignored; change the error handling to log the error
before returning nil by adding an import for "github.com/rs/zerolog/log" and
calling log.Error().Err(err).Msg("LLM Complete failed for retrospective prompt")
(include any relevant identifiers like retrospective ID or context if available)
in the if err != nil block so failures are recorded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@internal/backfill/backfill.go`:
- Around line 112-128: The metric "total_sessions" is being incremented twice:
in processFile and again in processSession via m.Add("total_sessions", 1),
causing double counting; remove the redundant increment in processFile and let
processSession own the session counter (i.e., delete the m.Add("total_sessions",
1) call from processFile so only processSession calls m.Add("total_sessions", 1)
when processing a session).

In `@internal/backfill/extract/extract.go`:
- Around line 83-97: The ValidateXML function currently checks
XMLObservation.Type and .Outcome but not .Category; update ValidateXML to verify
that obs.Category is non-empty and exists in the ValidCategories map (using the
XMLObservation and ValidCategories symbols) and return a clear validation error
when the category is missing or not allowed (mirroring how Type/Outcome errors
are handled), so backfill rejects observations with invalid categories before
persisting.

---

Nitpick comments:
In `@internal/backfill/backfill.go`:
- Around line 308-311: The LLM error returned by r.llm.Complete (called with
extract.RetrospectiveSystemPrompt and retroPrompt) is currently ignored; change
the error handling to log the error before returning nil by adding an import for
"github.com/rs/zerolog/log" and calling log.Error().Err(err).Msg("LLM Complete
failed for retrospective prompt") (include any relevant identifiers like
retrospective ID or context if available) in the if err != nil block so failures
are recorded.

In `@internal/backfill/extract/extract.go`:
- Around line 197-205: Функции BuildUserPrompt и BuildChunkUserPrompt дублируют
логику санитизации XML-тегов через xmlTagRe.ReplaceAllStringFunc; вынеси эту
логику в одну вспомогательную функцию (например SanitizeTranscriptTags or
sanitizeTranscript) и замени внутренние реализации BuildUserPrompt и
BuildChunkUserPrompt на вызов этой новой функции с теми же входными данными,
сохранив текущее поведение замены "<" на "[" и ">" на "]" и форматирование через
ChunkExtractionUserTemplate / соответствующий шаблон.

In `@internal/worker/handlers_backfill.go`:
- Around line 337-346: The local variable sessionDBID returned from
s.sessionStore.CreateSDKSession is assigned then immediately discarded (_ =
sessionDBID); remove the unused variable by assigning the call to the blank
identifier (e.g., _, err := s.sessionStore.CreateSDKSession(...)) or, if you
need the ID for debugging/observability, keep sessionDBID and add a log line
(e.g., log.Debug().Str("session_db_id", sessionDBID)... ) instead of the
discard; update the call site that currently uses sessionDBID and
s.sessionStore.CreateSDKSession(r.Context(), sdkSessionID, project, "backfill")
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 40af082a-ea3e-465d-b759-211542b49007

📥 Commits

Reviewing files that changed from the base of the PR and between 87dbbf8 and 03c85cc.

📒 Files selected for processing (3)
  • internal/backfill/backfill.go
  • internal/backfill/extract/extract.go
  • internal/worker/handlers_backfill.go

- ImportFeedbackFiles: scans feedback_*.md, LLM re-processes each
  into TRIGGER→RULE→REASON structured observations
- ParseSingleObservation: parses bare <observation> XML from LLM
- cmd/engram-import: CLI with import-feedback (--dry-run) and
  purge-rebuild subcommands
- POST /api/maintenance/purge-rebuild: selective purge (truncate
  derived tables, delete non-manual/non-credential observations)
  with dry_run=true preview mode
@thebtf thebtf merged commit 156081b into main Mar 23, 2026
1 of 2 checks passed
thebtf added a commit that referenced this pull request Apr 12, 2026
Binary naming consistency: engram (local client) + engram-server (remote server).

Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh.
internal/worker/ package NOT renamed (Go embed path, separate concern).

Closes #43.
thebtf added a commit that referenced this pull request Apr 12, 2026
Binary naming consistency: engram (local client) + engram-server (remote server).

Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh.
internal/worker/ package NOT renamed (Go embed path, separate concern).

Closes #43.
thebtf added a commit that referenced this pull request Apr 12, 2026
Binary naming consistency: engram (local client) + engram-server (remote server).

Updated: AGENTS.md, .goreleaser.yaml, Dockerfile, docs/arch/*, scripts/uninstall.sh.
internal/worker/ package NOT renamed (Go embed path, separate concern).

Closes #43.

Co-authored-by: Kirill Turanskiy <thebtf@users.noreply.github.com>
@thebtf thebtf deleted the worktree-extraction-quality-v2 branch May 7, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant