feat: causal chain linking — follows + prompted_by (FR-4, FR-5)#56
Conversation
PreCompact hook was created but never registered in hooks.json. Now registered with 10s timeout. Hook writes discovery data to .agent/pre-compact-discovery.json for empirical testing of available input fields (transcript_path verification for FR-2).
Three-tier injection system: observations tagged with concept "always-inject" are now fetched independently of similarity matching and included in every session (session-start) and every prompt (user-prompt) context. Server changes: - GetAlwaysInjectObservations query (concepts @> GIN index) - GIN indexes on concepts, files_modified, files_read columns - Migration 048 for all new indexes - handleContextInject + handleSearchByPrompt return always_inject array - AlwaysInjectLimit (default 20) and ProjectInjectLimit (default 15) config Hook changes: - session-start.js renders <user-behavior-rules> block before <engram-context> - user-prompt.js merges always-inject + similarity-matched rules with dedup - Plugin version bumped to 0.6.0 Also adds GetObservationsByFile and GetPreviousObservationInSession queries for Phase 3 and Phase 4 (no callers yet).
Reads transcript JSONL at compaction time, parses all user/assistant messages, and sends to /api/backfill/session in chunks of 50 messages. Fire-and-forget with 5s timeout per chunk (Constitution Principle 3). Fallback: if input.transcript_path is missing, derives path by searching ~/.claude/projects/<hash>/<session>.jsonl. Also writes discovery report to .agent/pre-compact-discovery.json for empirical verification of available hook input fields.
New hook and endpoint for automatic file-specific knowledge injection before Edit/Write operations. Server: - GET /api/context/by-file endpoint (handlers_context_file.go) - Returns observations matching files_modified/files_read - Graceful degradation: empty response on error (NFR-3) Hook: - pre-tool-use.js matches Edit/Write tools only - Extracts file_path, queries /api/context/by-file - Returns <file-context> XML block as systemMessage - 200ms timeout with empty fallback - Registered in hooks.json with "Edit|Write" matcher
…R-5) Observations within the same session are now automatically linked: - "follows" relation: connects consecutive observations by prompt_number - "prompted_by" relation: links observation to the user prompt that triggered it Both relations are created via pure DB queries (< 10ms overhead per observation, NFR-4) during the existing relation detection pipeline. Changes: - relation/detector.go: add temporal + prompt linking before similarity search - prompt_store.go: add GetPromptForObservation query - service.go: pass promptStore to NewDetector constructor
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (14)
WalkthroughДобавлены конфигурируемые пределы инъекций наблюдений, включая новые методы хранилища для выборки наблюдений по тегам и файлам, интеграция с отслеживанием сессий через отношения, миграция базы данных с GIN-индексами и обновление обработчиков контекста с новыми конечными точками для фильтрации наблюдений. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Handler as Handler:<br/>handleContextInject
participant ObsStore as ObservationStore
participant DB as PostgreSQL
participant Response
Client->>Handler: GET /api/context/inject?session=X
Handler->>ObsStore: GetRecentObservations(limit)
ObsStore->>DB: Query observations
DB-->>ObsStore: Recent observations
ObsStore-->>Handler: Recent obs results
Handler->>ObsStore: GetAlwaysInjectObservations(limit)
ObsStore->>DB: Query WHERE concepts @> ["always-inject"]
DB-->>ObsStore: Always-inject observations
ObsStore-->>Handler: Always-inject obs results
Handler->>Handler: Deduplicate against recent
Handler->>Response: JSON with recent + always_inject
Response-->>Client: observations, always_inject array
sequenceDiagram
participant Client
participant Handler as Handler:<br/>handleContextByFile
participant ObsStore as ObservationStore
participant DB as PostgreSQL
participant Response
Client->>Handler: GET /api/context/by-file?path=/src/main.js&limit=10
Handler->>Handler: Validate path parameter
Handler->>ObsStore: GetObservationsByFile(path, limit)
ObsStore->>DB: Query files_modified @> OR files_read @>
DB-->>ObsStore: Filtered observations
ObsStore-->>Handler: Observations list
Handler->>Response: JSON observations + total count
Response-->>Client: observations, total
sequenceDiagram
participant Detector as Detector.Detect()
participant ObsStore as ObservationStore
participant PromptStore as PromptStore
participant RelStore as RelationStore
participant DB as PostgreSQL
Detector->>ObsStore: Load observation
ObsStore->>DB: Query observation by ID
DB-->>ObsStore: Observation + sdk_session_id
ObsStore-->>Detector: Observation object
Detector->>Detector: Check if in SDK session
Detector->>ObsStore: GetPreviousObservationInSession()
ObsStore->>DB: Query prev obs in session
DB-->>ObsStore: Previous observation
ObsStore-->>Detector: Previous obs result
Detector->>RelStore: Store "follows" relation
RelStore->>DB: INSERT relation
Detector->>PromptStore: GetPromptForObservation()
PromptStore->>DB: Query prompt by session+number
DB-->>PromptStore: Prompt ID
PromptStore-->>Detector: Prompt ID result
Detector->>RelStore: Store "prompted_by" relation
RelStore->>DB: INSERT relation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's ability to understand and utilize contextual information by introducing causal chain linking and more dynamic context injection mechanisms. It establishes temporal relationships between observations within a session and links observations directly to their originating user prompts. Furthermore, it improves context relevance by allowing 'always-inject' observations and providing file-specific context during tool usage, all while maintaining performance through optimized database indexing and transcript backfill. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces new configuration options for observation injection limits, database migrations for improved indexing, and new data store methods to retrieve "always-inject" and file-specific observations, as well as temporal session data. These features are integrated into the relation detector for linking observations and prompts, and into the worker service to expose a new /api/context/by-file endpoint and enhance context injection. New plugin hooks are added to leverage these capabilities for tool-specific context and transcript backfilling. The review identified several areas for improvement, including addressing an unsafe JSON string construction for database queries, refactoring duplicated logic for environment variable parsing and conditional checks, ensuring consistent error handling in database migrations and record lookups, replacing hardcoded "magic numbers" with constants for limits and timeouts, completing an XML escaping utility, making ignored file paths configurable, and improving type safety for empty slice returns.
| fileJSON := fmt.Sprintf(`["%s"]`, filePath) | ||
| err := s.db.WithContext(ctx). | ||
| Scopes(activeObservationFilter(), importanceOrdering()). | ||
| Where("files_modified @> ? OR files_read @> ?", fileJSON, fileJSON). |
There was a problem hiding this comment.
Manually constructing a JSON string using fmt.Sprintf is unsafe. If filePath contains characters like a double quote ("), it will result in invalid JSON, causing a database error. This could be a vector for injection attacks, although the risk is lower since it's a parameter. To ensure safety and correctness, you should use json.Marshal to create the JSON string.
import "encoding/json"
// ... inside GetObservationsByFile
fileList := []string{filePath}
fileJSON, err := json.Marshal(fileList)
if err != nil {
return nil, err
}
err = s.db.WithContext(ctx).
Scopes(activeObservationFilter(), importanceOrdering()).
Where("files_modified @> ? OR files_read @> ?", string(fileJSON), string(fileJSON)).| if v := strings.TrimSpace(os.Getenv("ENGRAM_ALWAYS_INJECT_LIMIT")); v != "" { | ||
| if n, err := strconv.Atoi(v); err == nil && n > 0 { | ||
| cfg.AlwaysInjectLimit = n | ||
| } | ||
| } | ||
| if v := strings.TrimSpace(os.Getenv("ENGRAM_PROJECT_INJECT_LIMIT")); v != "" { | ||
| if n, err := strconv.Atoi(v); err == nil && n > 0 { | ||
| cfg.ProjectInjectLimit = n | ||
| } | ||
| } |
There was a problem hiding this comment.
There's duplicated logic for parsing integer values from environment variables. This pattern is repeated for ENGRAM_ALWAYS_INJECT_LIMIT and ENGRAM_PROJECT_INJECT_LIMIT. To improve maintainability and reduce code duplication, consider extracting this logic into a helper function.
func loadIntFromEnv(key string, dest *int) {
if v := strings.TrimSpace(os.Getenv(key)); v != "" {
if n, err := strconv.Atoi(v); err == nil && n > 0 {
*dest = n
}
}
}
// In Load():
// ...
loadIntFromEnv("ENGRAM_ALWAYS_INJECT_LIMIT", &cfg.AlwaysInjectLimit)
loadIntFromEnv("ENGRAM_PROJECT_INJECT_LIMIT", &cfg.ProjectInjectLimit)| Rollback: func(tx *gorm.DB) error { | ||
| tx.Exec(`DROP INDEX IF EXISTS idx_observations_concepts_gin`) | ||
| tx.Exec(`DROP INDEX IF EXISTS idx_observations_files_modified_gin`) | ||
| tx.Exec(`DROP INDEX IF EXISTS idx_observations_files_read_gin`) | ||
| tx.Exec(`DROP INDEX IF EXISTS idx_observations_session_prompt`) | ||
| return nil |
There was a problem hiding this comment.
The Rollback function executes DROP INDEX commands but doesn't check for errors. While DROP INDEX IF EXISTS is generally safe, it's best practice to handle errors consistently. If an error occurs (e.g., due to permissions), it will be silently ignored. This is inconsistent with the Migrate function, which does check for errors.
Rollback: func(tx *gorm.DB) error {
if err := tx.Exec(`DROP INDEX IF EXISTS idx_observations_concepts_gin`).Error; err != nil {
return err
}
if err := tx.Exec(`DROP INDEX IF EXISTS idx_observations_files_modified_gin`).Error; err != nil {
return err
}
if err := tx.Exec(`DROP INDEX IF EXISTS idx_observations_files_read_gin`).Error; err != nil {
return err
}
return tx.Exec(`DROP INDEX IF EXISTS idx_observations_session_prompt`).Error
},| if err != nil { | ||
| return 0, err | ||
| } |
There was a problem hiding this comment.
The function returns gorm.ErrRecordNotFound when no prompt is found. The caller then logs this as an error, which could create unnecessary noise if it's a common scenario. It's better to handle this specific error within the function and return 0, nil to indicate that no record was found, which is not an exceptional state.
if err != nil {
if err == gorm.ErrRecordNotFound {
return 0, nil
}
return 0, err
}| if obs.SDKSessionID != "" && obs.PromptNumber.Valid && obs.PromptNumber.Int64 > 0 { | ||
| promptNum := int(obs.PromptNumber.Int64) | ||
| prevObs, prevErr := d.observationStore.GetPreviousObservationInSession(ctx, obs.SDKSessionID, promptNum) | ||
| if prevErr != nil { | ||
| log.Debug().Err(prevErr).Int64("obs_id", obsID).Msg("Temporal chain lookup failed") | ||
| } else if prevObs != nil { | ||
| rel := &models.ObservationRelation{ | ||
| SourceID: prevObs.ID, | ||
| TargetID: obsID, | ||
| RelationType: "follows", | ||
| Confidence: 1.0, | ||
| } | ||
| if _, storeErr := d.relationStore.StoreRelation(ctx, rel); storeErr != nil { | ||
| log.Debug().Err(storeErr).Int64("from", prevObs.ID).Int64("to", obsID).Msg("Failed to store follows relation") | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // 1b. Prompt-observation linking: create "prompted_by" relation to triggering user prompt (FR-5) | ||
| if obs.SDKSessionID != "" && obs.PromptNumber.Valid && obs.PromptNumber.Int64 > 0 && d.promptStore != nil { |
There was a problem hiding this comment.
The condition obs.SDKSessionID != "" && obs.PromptNumber.Valid && obs.PromptNumber.Int64 > 0 and the variable promptNum are duplicated in the blocks for temporal chain linking and prompt-observation linking. This can be refactored to reduce duplication and improve readability.
if obs.SDKSessionID != "" && obs.PromptNumber.Valid && obs.PromptNumber.Int64 > 0 {
promptNum := int(obs.PromptNumber.Int64)
// 1a. Temporal chain linking: create "follows" relation to previous observation in same session (FR-4)
prevObs, prevErr := d.observationStore.GetPreviousObservationInSession(ctx, obs.SDKSessionID, promptNum)
if prevErr != nil {
log.Debug().Err(prevErr).Int64("obs_id", obsID).Msg("Temporal chain lookup failed")
} else if prevObs != nil {
rel := &models.ObservationRelation{
SourceID: prevObs.ID,
TargetID: obsID,
RelationType: "follows",
Confidence: 1.0,
}
if _, storeErr := d.relationStore.StoreRelation(ctx, rel); storeErr != nil {
log.Debug().Err(storeErr).Int64("from", prevObs.ID).Int64("to", obsID).Msg("Failed to store follows relation")
}
}
// 1b. Prompt-observation linking: create "prompted_by" relation to triggering user prompt (FR-5)
if d.promptStore != nil {| writeJSON(w, map[string]any{ | ||
| "observations": []*struct{}{}, | ||
| "total": 0, | ||
| }) |
There was a problem hiding this comment.
When an error occurs, you return []*struct{}{}. While this marshals to an empty JSON array, it's better for type safety and code clarity to use the correct slice type, which is []*models.Observation{}.
| writeJSON(w, map[string]any{ | |
| "observations": []*struct{}{}, | |
| "total": 0, | |
| }) | |
| writeJSON(w, map[string]any{ | |
| "observations": []*models.Observation{}, | |
| "total": 0, | |
| }) |
| function escapeXmlTags(text) { | ||
| if (typeof text !== 'string') return ''; | ||
| return text.replace(/</g, '<').replace(/>/g, '>'); | ||
| } |
There was a problem hiding this comment.
The escapeXmlTags function is incomplete as it doesn't handle ampersands (&). Text containing & will be passed through as-is, which can cause issues if the consumer expects fully escaped XML/HTML entities. You should also replace & with &.
Additionally, this function seems like a utility that could be used by other hooks. Consider moving it to lib.js to promote reuse and avoid duplication.
| function escapeXmlTags(text) { | |
| if (typeof text !== 'string') return ''; | |
| return text.replace(/</g, '<').replace(/>/g, '>'); | |
| } | |
| function escapeXmlTags(text) { | |
| if (typeof text !== 'string') return ''; | |
| return text.replace(/&/g, '&').replace(/</g, '<').replace(/>/g, '>'); | |
| } |
| if (filePath.includes('/tmp/') || filePath.includes('\\Temp\\') || filePath.includes('node_modules')) { | ||
| return ''; | ||
| } |
There was a problem hiding this comment.
The list of ignored paths (/tmp/, \Temp\, node_modules) is hardcoded and may be incomplete. This approach is brittle and might require frequent updates as new patterns for temporary or dependency folders emerge. Consider making this list configurable or using a more robust method to identify non-project files.
| const params = new URLSearchParams({ path: filePath, limit: '10' }); | ||
| if (project) params.set('project', project); | ||
| const result = await lib.requestGet(`/api/context/by-file?${params.toString()}`, 200); |
There was a problem hiding this comment.
The limit (10) and request timeout (200) are hardcoded. These magic numbers should be defined as named constants at the top of the file to improve readability and maintainability.
const FILE_CONTEXT_LIMIT = 10;
const FILE_CONTEXT_TIMEOUT_MS = 200;
// ...
async function handlePreToolUse(ctx, input) {
// ...
const params = new URLSearchParams({ path: filePath, limit: FILE_CONTEXT_LIMIT.toString() });
if (project) params.set('project', project);
const result = await lib.requestGet(`/api/context/by-file?${params.toString()}`, FILE_CONTEXT_TIMEOUT_MS);| const narrative = escapeXmlTags(rule.narrative); | ||
| behaviorRulesBlock += '# Behavioral Rules (Always Active)\n'; | ||
| behaviorRulesBlock += 'These rules are injected unconditionally. Follow them.\n\n'; | ||
| for (const rule of uniqueRules.slice(0, 20)) { |
There was a problem hiding this comment.
The slice limit 20 is a magic number. It should be defined as a constant at the top of the file to improve readability and make it easier to modify in the future.
| for (const rule of uniqueRules.slice(0, 20)) { | |
| const MAX_BEHAVIOR_RULES = 20; | |
| // ... | |
| for (const rule of uniqueRules.slice(0, MAX_BEHAVIOR_RULES)) { |
Summary
followsrelationprompted_byTest plan
followsrelationprompted_bylinks to correct user promptSummary by CodeRabbit
Новые возможности
Служебные