security: auth consolidation, vault singleton, cwd hardening, LLM prompt sanitization#10
Conversation
…9-003) S5-004: Add vault singleton to Service struct with sync.Once initialization. Replace per-call crypto.NewVault(config.Get()) in vaultStoreDetectedSecrets with s.getVault(), eliminating vault fragmentation and redundant key reads. S9-003: Override client-provided cwd to empty string in both HTTP handlers (handleSearchByPrompt, handleContextInject) before any filesystem operations. The server has no access to client filesystems; cwd-based mtime checks are client-side concerns only. Prevents filesystem probing via cwd parameter.
Strip XML boundary tags from transcript content before embedding in LLM prompts to prevent prompt injection via crafted session content.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughДобавлены санация XML-подобных тегов в сборке промпта для backfill, ленивый singleton-доступ к Vault через Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Middleware
participant Service
participant Handlers
participant Vault
Client->>Middleware: HTTP запрос (с токеном/без)
Middleware->>Service: проверка токена
alt токен отсутствует/невалиден
Middleware-->>Client: 401 / warning лог
else токен валиден или аутентификация отключена
Middleware->>Service: передать запрос
Service->>Handlers: маршрутизация (admin/общий)
Handlers->>Service: запрошен доступ к Vault
Service->>Vault: s.getVault() (ленивая инициализация)
alt Vault инициализирован
Vault-->>Service: возвращает доступ
Service-->>Handlers: отдаёт Vault/секреты
Handlers-->>Client: ответ (с учётом секретов)
else Vault недоступен
Service-->>Handlers: возвращает ошибку/nil
Handlers-->>Client: ответ без Vault (лог/предупреждение)
end
end
Estimated code review effort🎯 3 (Умеренная) | ⏱️ ~25 минут Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's security posture by addressing multiple findings from a recent security audit. The changes focus on centralizing authentication for sensitive API endpoints, improving the secure handling of cryptographic operations through a vault singleton, mitigating filesystem probing risks, and preventing prompt injection vulnerabilities in LLM interactions. Additionally, it introduces clearer warnings and logging for security-related events. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
@coderabbitai review focus:security |
|
@gemini-code-assist review focus:security |
|
@codex review |
|
Conduct a thorough security-focused review. Pay special attention to: (1) whether all admin/sensitive endpoints are properly protected behind authentication middleware, (2) correctness of the vault singleton implementation using ✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Code Review
This pull request introduces several important security enhancements and refactorings. The consolidation of admin endpoints under a single authentication middleware is a significant improvement, as is the introduction of a singleton pattern for the vault service, which improves both performance and resource management. The hardening of HTTP handlers by ignoring client-provided cwd and the sanitization of LLM prompts are also valuable security fixes. My review includes one suggestion to make the prompt sanitization logic more robust using a regular expression.
| // Strip XML-like tags that could break the prompt boundary | ||
| sanitized := strings.ReplaceAll(transcript, "</session_transcript>", "[/session_transcript]") | ||
| sanitized = strings.ReplaceAll(sanitized, "</observations>", "[/observations]") | ||
| sanitized = strings.ReplaceAll(sanitized, "<observation>", "[observation]") | ||
| return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) |
There was a problem hiding this comment.
The current sanitization logic is a good step towards preventing prompt injection, but it's a bit brittle as it only handles three specific hardcoded tags. A more robust approach would be to use a regular expression to replace any XML-like tags. This would handle variations in spacing, attributes, and other tags that might be present in the transcript, providing stronger protection against prompt boundary confusion.
For example, you could use a single regex to handle all cases. Note that you will need to add regexp to your imports.
// It's good practice to compile regexes once at the package level.
var xmlTagRegex = regexp.MustCompile(`<([^>]+)>`)
// ... in BuildUserPrompt ...
sanitized := xmlTagRegex.ReplaceAllString(transcript, "[$1]")This would replace <tag> with [tag] and </tag> with [/tag], which is more comprehensive and maintainable.
| // Strip XML-like tags that could break the prompt boundary | |
| sanitized := strings.ReplaceAll(transcript, "</session_transcript>", "[/session_transcript]") | |
| sanitized = strings.ReplaceAll(sanitized, "</observations>", "[/observations]") | |
| sanitized = strings.ReplaceAll(sanitized, "<observation>", "[observation]") | |
| return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) | |
| // Strip XML-like tags that could break the prompt boundary. | |
| // A regex is more robust than replacing specific hardcoded tags. | |
| re := regexp.MustCompile(`<([^>]+)>`) | |
| sanitized := re.ReplaceAllString(transcript, "[$1]") | |
| return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/backfill/extract/extract.go`:
- Around line 179-183: Sanitization currently only replaces two closing tags and
misses critical tags and case variations; update the sanitization that assigns
sanitized from transcript (the sanitized variable used when formatting
UserPromptTemplate) to replace all of these tags: <session_transcript>,
</session_transcript>, <observations>, </observations>, <observation>, and
</observation>, and perform the replacements in a case-insensitive manner (e.g.,
use regexp with the (?i) flag or equivalent) so tags like </SESSION_TRANSCRIPT>
are handled consistently; ensure the code replaces the original tags with safe
bracketed forms before calling fmt.Sprintf(UserPromptTemplate, ...) and remove
the old incomplete ReplaceAll calls.
In `@internal/worker/service.go`:
- Around line 1568-1606: Remove the redundant middleware application inside the
admin routes group: delete the r.Use(s.tokenAuth.Middleware) call in the
s.router.Group block so the group no longer re-applies authentication already
set up globally in setupMiddleware(); leave the route registrations (e.g.,
s.handleRebuildStatus, s.handleTriggerVectorRebuild, s.handleVectorHealth,
s.handleVectorMetrics, s.handleGraphStats, s.handleUpdateCheck,
s.handleUpdateApply, s.handleUpdateStatus, s.handleUpdateRestart,
s.handleRestart, s.handleSelfCheck, s.sseBroadcaster.HandleSSE, s.handleGetLogs,
s.handleInstinctsImport, s.handleBackfillIngest, s.handleBackfillSession,
s.handleBackfillStatus) intact and rely on the global tokenAuth.Middleware and
its ExemptPaths mechanism (defined in middleware.go) for correct authentication
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1ded6ade-db32-4c14-bb35-3ea7df2ed639
📒 Files selected for processing (5)
internal/backfill/extract/extract.gointernal/worker/handlers_backfill.gointernal/worker/handlers_context.gointernal/worker/middleware.gointernal/worker/service.go
| // Strip XML-like tags that could break the prompt boundary | ||
| sanitized := strings.ReplaceAll(transcript, "</session_transcript>", "[/session_transcript]") | ||
| sanitized = strings.ReplaceAll(sanitized, "</observations>", "[/observations]") | ||
| sanitized = strings.ReplaceAll(sanitized, "<observation>", "[observation]") | ||
| return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What XML tag injection techniques exist for LLM prompt injection attacks?
💡 Result:
Common XML tag–based prompt-injection techniques (seen when apps “structure” prompts with XML-like delimiters and then concatenate untrusted text into those sections) include:
- Tag boundary break-out (early close + new role/tag): attacker input includes a closing tag for the “user/content” region and then opens an “assistant/system/instructions” region so the model treats attacker text as higher-priority instructions (the classic XML-tag boundary exploitation pattern). [1]
- Tag spoofing / role impersonation: attacker wraps their content in tags that look like the application’s control tags (e.g., “instructions”, “system”, “assistant”), betting the model will follow them even if the app didn’t intend to grant that authority. This risk exists specifically because XML tags are recommended/used as separators in some LLM prompt formats. [2][3]
- Parser/model mismatch: the application may parse
<message>(or similar) tags into chat messages; if the app’s parsing/validation is lax, attacker-controlled tags can be interpreted as additional messages/roles (message-tag injection). [4] - Encoding & normalization bypasses around
</>: use HTML entities, Unicode lookalikes, or zero-width characters so input evades naïve “block angle brackets” filters, but is later decoded/normalized (or still “understood” by the model as tags). [1] - “Hidden instruction” zones inside XML constructs: put instructions in places humans/tools may overlook but models may still heed, such as comments (
<!-- ... -->), CDATA sections, long attribute values, or deeply nested structures (often combined with “ignore previous instructions”–style directives). (General prompt-injection pattern; XML is just the carrier.) [5] - Tag collision with orchestration templates: when platforms/frameworks use fixed XML sections like
<examples>,<instructions>, etc., attacker input that collides with those boundaries can cause the model to re-interpret where examples/instructions begin/end. [2][3]
A common mitigation specifically discussed for XML-structured prompts is “salted tags” (session-specific tag names) to reduce tag spoofing/collision. [6]
Sources: [1] dataset write-up of “Anthropic XML Tag Boundary Exploitation” pattern; [2] Anthropic XML-tag prompting guidance; [3] AWS Bedrock advanced prompt templates mentioning XML tags; [4] Microsoft Semantic Kernel note on parsing <message> tags and controlling injection; [5] OWASP Prompt Injection overview; [6] AWS prescriptive guidance PDF describing salted tags. [1][2][3][4][5][6]
🏁 Script executed:
# Find and examine the extract.go file
find . -name "extract.go" -path "*/internal/backfill/extract/*" -type fRepository: thebtf/engram
Length of output: 95
🏁 Script executed:
# Once located, examine the full BuildUserPrompt function and UserPromptTemplate
cat -n internal/backfill/extract/extract.go | head -250 | tail -100Repository: thebtf/engram
Length of output: 4261
🏁 Script executed:
# Search for UserPromptTemplate definition
rg "UserPromptTemplate\s*=" internal/backfill/extract/Repository: thebtf/engram
Length of output: 203
🏁 Script executed:
# Check if there are opening tags that need sanitization
rg "<session_transcript>|<observations>|<observation>" internal/backfill/extract/extract.goRepository: thebtf/engram
Length of output: 185
🏁 Script executed:
# Get the full UserPromptTemplate definition
rg "const UserPromptTemplate" -A 100 internal/backfill/extract/extract.goRepository: thebtf/engram
Length of output: 2962
🏁 Script executed:
# Check where and how the tags are used in the template
rg "<session_transcript>|</session_transcript>|<observations>|</observations>|<observation>|</observation>" -n internal/backfill/extract/extract.goRepository: thebtf/engram
Length of output: 447
🏁 Script executed:
# Search for tests related to BuildUserPrompt or sanitization
fd "test" internal/backfill/extract/ --type fRepository: thebtf/engram
Length of output: 39
🏁 Script executed:
# Search for AG06-005 reference in code or docs
rg "AG06-005" -B 3 -A 3Repository: thebtf/engram
Length of output: 757
🏁 Script executed:
# Check if there are any examples or comments about the threat model
rg "prompt injection|sanitiz|transcript" internal/backfill/extract/extract.go -B 2 -A 2Repository: thebtf/engram
Length of output: 1595
Неполная санитизация XML-тегов: отсутствуют критические открывающие и закрывающие теги.
Текущая реализация защищает от разрыва границ </session_transcript> и </observations>, но пропускает важные векторы атаки:
- Отсутствует
<session_transcript>— инъекция могла бы создать вложенную структуру и запутать парсер LLM - Отсутствует
</observation>— инъекция могла бы преждевременно закрыть наблюдение - Регистронезависимость не реализована —
</SESSION_TRANSCRIPT>или</Session_Transcript>не будут заменены
Кроме того, предложенный в коде подход с циклом неэффективен и может привести к проблемам с Юникодом.
Рекомендуется использовать простой и надёжный подход, например регулярные выражения с флагом регистронезависимости или тщательную обработку без циклов:
// Более надёжный вариант с использованием простых замен
var tagsToReplace = [][2]string{
{"<session_transcript>", "[session_transcript]"},
{"</session_transcript>", "[/session_transcript]"},
{"<observations>", "[observations]"},
{"</observations>", "[/observations]"},
{"<observation>", "[observation]"},
{"</observation>", "[/observation]"},
}
sanitized := transcript
for _, pair := range tagsToReplace {
sanitized = strings.ReplaceAll(sanitized, pair[0], pair[1])
}Для обработки разных регистров требуется более тщательное решение (например, через регулярные выражения с флагом (?i)).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/backfill/extract/extract.go` around lines 179 - 183, Sanitization
currently only replaces two closing tags and misses critical tags and case
variations; update the sanitization that assigns sanitized from transcript (the
sanitized variable used when formatting UserPromptTemplate) to replace all of
these tags: <session_transcript>, </session_transcript>, <observations>,
</observations>, <observation>, and </observation>, and perform the replacements
in a case-insensitive manner (e.g., use regexp with the (?i) flag or equivalent)
so tags like </SESSION_TRANSCRIPT> are handled consistently; ensure the code
replaces the original tags with safe bracketed forms before calling
fmt.Sprintf(UserPromptTemplate, ...) and remove the old incomplete ReplaceAll
calls.
There was a problem hiding this comment.
Code Review
This pull request introduces several significant security enhancements. The consolidation of administrative endpoints under a new authenticated route group is a great step forward. The introduction of a vault singleton improves performance and resource management. Additionally, the hardening of CWD handling and LLM prompt sanitization effectively address potential vulnerabilities. My review includes a suggestion to improve the robustness of the prompt sanitization and points out a potential redundancy in the application of authentication middleware.
| sanitized := strings.ReplaceAll(transcript, "</session_transcript>", "[/session_transcript]") | ||
| sanitized = strings.ReplaceAll(sanitized, "</observations>", "[/observations]") | ||
| sanitized = strings.ReplaceAll(sanitized, "<observation>", "[observation]") |
There was a problem hiding this comment.
The current sanitization logic uses a denylist of specific tags, which is brittle and incomplete. For example, it misses the opening <observations> tag and the closing </observation> tag, which could still allow for prompt injection.
Using strings.Replacer is also more efficient and maintainable than chaining strings.ReplaceAll calls.
sanitized := strings.NewReplacer(
"</session_transcript>", "[/session_transcript]",
"<observations>", "[observations]",
"</observations>", "[/observations]",
"<observation>", "[observation]",
"</observation>", "[/observation]",
).Replace(transcript)| s.router.Post("/api/update/restart", s.handleUpdateRestart) | ||
| // Admin/management routes — require authentication, work before DB is ready | ||
| s.router.Group(func(r chi.Router) { | ||
| r.Use(s.tokenAuth.Middleware) |
There was a problem hiding this comment.
The s.tokenAuth.Middleware appears to be already applied globally to the router in the setupMiddleware function. Applying it again within this group is redundant. If the intention is to only authenticate these admin routes, the global middleware should be removed. If the global middleware is intended, then this line is unnecessary.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/backfill/extract/extract.go (1)
177-183:⚠️ Potential issue | 🟠 MajorНеполная санация boundary-тегов оставляет обход prompt-инъекции.
Сейчас фильтруются только
</session_transcript>,</observations>и<observation>. Остаются<session_transcript>,<observations>,</observation>и варианты регистра (например,</SESSION_TRANSCRIPT>), поэтому разрыв XML-границ всё ещё возможен.💡 Предложение правки
import ( "encoding/xml" "fmt" + "regexp" "strings" @@ ) + +var promptBoundaryTagRE = regexp.MustCompile(`(?i)</?(session_transcript|observations|observation)>`) @@ func BuildUserPrompt(projectPath, gitBranch string, durationMin, exchangeCount int, chunkInfo, alreadyExtracted, transcript string) string { - // Strip XML-like tags that could break the prompt boundary - sanitized := strings.ReplaceAll(transcript, "</session_transcript>", "[/session_transcript]") - sanitized = strings.ReplaceAll(sanitized, "</observations>", "[/observations]") - sanitized = strings.ReplaceAll(sanitized, "<observation>", "[observation]") + // Strip XML-like tags that could break the prompt boundary + sanitized := promptBoundaryTagRE.ReplaceAllStringFunc(transcript, func(tag string) string { + switch strings.ToLower(tag) { + case "<session_transcript>": + return "[session_transcript]" + case "</session_transcript>": + return "[/session_transcript]" + case "<observations>": + return "[observations]" + case "</observations>": + return "[/observations]" + case "<observation>": + return "[observation]" + case "</observation>": + return "[/observation]" + default: + return tag + } + }) return fmt.Sprintf(UserPromptTemplate, projectPath, gitBranch, durationMin, exchangeCount, chunkInfo, alreadyExtracted, sanitized) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/extract/extract.go` around lines 177 - 183, BuildUserPrompt's current sanitization only replaces a few exact-tag strings and misses opening tags, closing </observation>, and different casing, leaving a prompt-injection vector; update the sanitization of the transcript variable in BuildUserPrompt to normalize/escape all relevant XML-like tags case-insensitively (at minimum both opening and closing variants of session_transcript, observations, and observation) — e.g., perform case-insensitive replacements or use a regex that matches (?i)</?session_transcript>, (?i)</?observations>, (?i)</?observation> and replace them with safe bracketed or escaped equivalents so no angle-bracket boundaries remain in the final formatted prompt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/worker/handlers_context.go`:
- Around line 780-784: После присвоения cwd = "" функция handleContextInject всё
ещё вызывает sdk.GetFileMtimes(...) и выполняет проверки mtime на сервере;
оградите все вызовы sdk.GetFileMtimes в handleContextInject условием, например
if cwd != "" { ... }, чтобы эти блоки выполнялись только на клиенте; обновите
связанные ветви (вызовы sdk.GetFileMtimes и последующую логику обработки
результатов) чтобы полностью пропускать mtime-проверки и обработку
ошибок/логирования на сервере; проверьте все вхождения sdk.GetFileMtimes в
handleContextInject и внесите одинаковую защиту.
---
Duplicate comments:
In `@internal/backfill/extract/extract.go`:
- Around line 177-183: BuildUserPrompt's current sanitization only replaces a
few exact-tag strings and misses opening tags, closing </observation>, and
different casing, leaving a prompt-injection vector; update the sanitization of
the transcript variable in BuildUserPrompt to normalize/escape all relevant
XML-like tags case-insensitively (at minimum both opening and closing variants
of session_transcript, observations, and observation) — e.g., perform
case-insensitive replacements or use a regex that matches
(?i)</?session_transcript>, (?i)</?observations>, (?i)</?observation> and
replace them with safe bracketed or escaped equivalents so no angle-bracket
boundaries remain in the final formatted prompt.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7426233b-bbab-4d63-bcfb-07275411e746
📒 Files selected for processing (5)
internal/backfill/extract/extract.gointernal/worker/handlers_backfill.gointernal/worker/handlers_context.gointernal/worker/middleware.gointernal/worker/service.go
| // Server-side: ignore client-provided cwd to prevent filesystem probing (S9-003). | ||
| // File mtime staleness checks are only meaningful on the client; the server has no | ||
| // access to client filesystems. | ||
| cwd = "" | ||
|
|
There was a problem hiding this comment.
После cwd = "" сервер всё ещё выполняет mtime-проверки файлов.
Несмотря на новый комментарий, в handleContextInject вызовы sdk.GetFileMtimes(...) остаются активными (Line 828, Line 916, Line 947) без условия cwd != "". Это противоречит заявленной модели и оставляет нежелательный server-side probing путь.
💡 Предложение правки
- if len(obs.FileMtimes) > 0 {
+ if len(obs.FileMtimes) > 0 && cwd != "" {
@@
- if len(obs.FileMtimes) > 0 {
+ if len(obs.FileMtimes) > 0 && cwd != "" {
@@
- if len(obs.FileMtimes) > 0 {
+ if len(obs.FileMtimes) > 0 && cwd != "" {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/worker/handlers_context.go` around lines 780 - 784, После присвоения
cwd = "" функция handleContextInject всё ещё вызывает sdk.GetFileMtimes(...) и
выполняет проверки mtime на сервере; оградите все вызовы sdk.GetFileMtimes в
handleContextInject условием, например if cwd != "" { ... }, чтобы эти блоки
выполнялись только на клиенте; обновите связанные ветви (вызовы
sdk.GetFileMtimes и последующую логику обработки результатов) чтобы полностью
пропускать mtime-проверки и обработку ошибок/логирования на сервере; проверьте
все вхождения sdk.GetFileMtimes в handleContextInject и внесите одинаковую
защиту.
…ndant auth middleware - AG06-005: replace brittle denylist with regex matching all prompt-boundary XML tags - Remove duplicate tokenAuth.Middleware in admin route group (already applied globally)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/backfill/extract/extract.go (1)
15-15:⚠️ Potential issue | 🟠 MajorСанитизация остаётся обходной из-за чувствительности regex к регистру.
На Line 15 паттерн не включает
(?i), поэтому</SESSION_TRANSCRIPT>/<Observations>не будут заменены и обойдут защиту при сборке prompt (Line 184-187).Предлагаемое исправление
-var xmlTagRe = regexp.MustCompile(`</?(?:session_transcript|observations?|metadata|no_observations_found)[^>]*>`) +var xmlTagRe = regexp.MustCompile(`(?i)</?(?:session_transcript|observations?|metadata|no_observations_found)\b[^>]*>`)#!/bin/bash set -euo pipefail echo "1) Текущий regex:" rg -n 'xmlTagRe\s*=\s*regexp\.MustCompile' internal/backfill/extract/extract.go -A1 echo echo "2) Проверка наличия case-insensitive флага (?i):" rg -n '\(\?i\)' internal/backfill/extract/extract.go || true echo echo "3) Проверка тестов на mixed/upper case теги в backfill/extract:" rg -n 'SESSION_TRANSCRIPT|Observations|BuildUserPrompt|xmlTagRe' internal/backfill/extract -g '*_test.go' || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/backfill/extract/extract.go` at line 15, The regex assigned to xmlTagRe (regexp.MustCompile) is case-sensitive and misses mixed/uppercase XML tags; update the pattern to be case-insensitive by adding the (?i) inline flag (e.g., regexp.MustCompile(`(?i)</?(?:session_transcript|observations?|metadata|no_observations_found)[^>]*>`)) so tags like </SESSION_TRANSCRIPT> or <Observations> are matched and sanitized before prompt assembly (see BuildUserPrompt usage around prompt construction). Ensure xmlTagRe is the only change and run/adjust existing tests that assert tag stripping to include mixed/upper-case examples.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/worker/service.go`:
- Around line 2007-2008: The warning condition in Start() can panic if
s.tokenAuth is nil and misclassifies IPv6 loopback (::1) as non-loopback; update
the check around s.tokenAuth.IsEnabled() to first guard against nil (e.g., if
s.tokenAuth == nil || !s.tokenAuth.IsEnabled()) and replace the simple host
string checks with proper IP loopback detection using net.ParseIP(host) and
ip.IsLoopback() so that log.Warn().Str(...).Msg(...) only runs when
authentication is disabled (or tokenAuth nil) and the parsed host is not a
loopback address.
---
Duplicate comments:
In `@internal/backfill/extract/extract.go`:
- Line 15: The regex assigned to xmlTagRe (regexp.MustCompile) is case-sensitive
and misses mixed/uppercase XML tags; update the pattern to be case-insensitive
by adding the (?i) inline flag (e.g.,
regexp.MustCompile(`(?i)</?(?:session_transcript|observations?|metadata|no_observations_found)[^>]*>`))
so tags like </SESSION_TRANSCRIPT> or <Observations> are matched and sanitized
before prompt assembly (see BuildUserPrompt usage around prompt construction).
Ensure xmlTagRe is the only change and run/adjust existing tests that assert tag
stripping to include mixed/upper-case examples.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd4c8573-95bd-4076-99a1-9988fb77cca6
📒 Files selected for processing (2)
internal/backfill/extract/extract.gointernal/worker/service.go
| if !s.tokenAuth.IsEnabled() && host != "127.0.0.1" && host != "localhost" { | ||
| log.Warn().Str("bind", fmt.Sprintf("%s:%d", host, port)).Msg("auth: server listening without authentication on non-loopback address — set ENGRAM_API_TOKEN to secure") |
There was a problem hiding this comment.
Условие предупреждения в Start() стоит усилить (nil + IPv6 loopback).
На Line 2007 возможен panic при s.tokenAuth == nil, а ::1 сейчас ошибочно считается non-loopback. Лучше добавить nil-guard и нормальную проверку loopback-адреса.
💡 Предлагаемое исправление
import (
"context"
"fmt"
+ "net"
"net/http"
"os"
"sync"
@@
- // Warn if auth is disabled and server is reachable beyond loopback
- if !s.tokenAuth.IsEnabled() && host != "127.0.0.1" && host != "localhost" {
+ // Warn if auth is disabled and server is reachable beyond loopback
+ isLoopback := host == "localhost"
+ if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
+ isLoopback = true
+ }
+ if s.tokenAuth != nil && !s.tokenAuth.IsEnabled() && !isLoopback {
log.Warn().Str("bind", fmt.Sprintf("%s:%d", host, port)).Msg("auth: server listening without authentication on non-loopback address — set ENGRAM_API_TOKEN to secure")
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if !s.tokenAuth.IsEnabled() && host != "127.0.0.1" && host != "localhost" { | |
| log.Warn().Str("bind", fmt.Sprintf("%s:%d", host, port)).Msg("auth: server listening without authentication on non-loopback address — set ENGRAM_API_TOKEN to secure") | |
| import ( | |
| "context" | |
| "fmt" | |
| "net" | |
| "net/http" | |
| "os" | |
| "sync" | |
| // ... rest of imports | |
| ) | |
| // ... other code ... | |
| // Warn if auth is disabled and server is reachable beyond loopback | |
| isLoopback := host == "localhost" | |
| if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() { | |
| isLoopback = true | |
| } | |
| if s.tokenAuth != nil && !s.tokenAuth.IsEnabled() && !isLoopback { | |
| log.Warn().Str("bind", fmt.Sprintf("%s:%d", host, port)).Msg("auth: server listening without authentication on non-loopback address — set ENGRAM_API_TOKEN to secure") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/worker/service.go` around lines 2007 - 2008, The warning condition
in Start() can panic if s.tokenAuth is nil and misclassifies IPv6 loopback (::1)
as non-loopback; update the check around s.tokenAuth.IsEnabled() to first guard
against nil (e.g., if s.tokenAuth == nil || !s.tokenAuth.IsEnabled()) and
replace the simple host string checks with proper IP loopback detection using
net.ParseIP(host) and ip.IsLoopback() so that log.Warn().Str(...).Msg(...) only
runs when authentication is disabled (or tokenAuth nil) and the parsed host is
not a loopback address.
Chokidar (native file watcher) was imported at module top level, causing jiti to resolve the entire native dependency tree during plugin discovery. Dynamic import defers this to service.start(), after gateway port is already listening.
Summary
Addresses all findings from security audit
.agent/reports/security-audit-report.md.HIGH fixes
/api/update/apply,/api/restart,/api/logs,/api/backfill/session, etc.)ENGRAM_API_TOKENunset on non-loopback bindMEDIUM fixes
sync.Onceon Service struct (no more per-callcrypto.NewVault)cwd=""in HTTP handlers to prevent filesystem probingKept outside auth (public, read-only)
/health,/api/ready,/api/version, MCP endpoints, static filesTest plan
go build ./cmd/worker/go test ./internal/worker/... ./internal/privacy/... ./internal/crypto/...Summary by CodeRabbit
Примечания к выпуску
Bug Fixes
Security Improvements
Chores