feat(loom): Phase 3 — 4 MCP tools + CLI worker with allowlist#169
Conversation
WalkthroughДобавлен полноценный loom tenant: документация, четыре MCP-инструмента (loom_submit, loom_get, loom_list, loom_cancel), обработчики инструментов, CLI-worker с allowlist/валидацией и наборы тестов; Module инициализирует workers после восстановления упавших задач и реализует ToolProvider. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as MCP Client
participant Tools as Module\n(ToolProvider)
participant Engine as loomEngine
participant Worker as CLI Worker
participant Process as Subprocess
Client->>Tools: HandleTool("loom_submit", {worker_type, prompt})
Tools->>Tools: Validate input & project (p.ID)
Tools->>Engine: Submit(TaskRequest{ProjectID, CLI, Prompt, Env, CWD, Timeout})
Engine->>Engine: Persist task (SQLite WAL) → status: dispatched
Engine-->>Tools: {task_id}
Tools-->>Client: {task_id, status:"dispatched"}
Worker->>Engine: Poll / fetch next task
Engine-->>Worker: Task(CLI, Prompt, Env, CWD, Timeout)
Worker->>Worker: Validate allowlist & env keys
Worker->>Process: exec.CommandContext(CLI, args), stdin=Prompt
Process-->>Worker: stdout, stderr, exit_code
Worker->>Engine: Update task (completed/failed/retry)
Engine->>EventBus: Emit task.* events
EventBus->>Notifier: Forward → JSON-RPC pushes to MCP sessions
sequenceDiagram
participant Client as MCP Client
participant Tools as Module\n(ToolProvider)
participant Engine as loomEngine
Client->>Tools: HandleTool("loom_get"/"loom_list"/"loom_cancel", ...)
Tools->>Tools: Unmarshal + validate + enforce ProjectID
Tools->>Engine: Get/List/Cancel(...)
Engine-->>Tools: Task(s) / success / error
Tools-->>Client: Normalized response or mapped error (tool_input_invalid, not_found, internal_error)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Поэма
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the functional core of the loom module, exposing four MCP tools (loom_submit, loom_get, loom_list, loom_cancel) for background task management. It introduces a cliWorker that executes allowlisted binaries by piping prompts to stdin. Feedback highlights a hardcoded local path in the documentation and suggests making the cli field required in the loom_submit schema. Additionally, there are concerns regarding error handling in loom_cancel potentially masking internal errors and the lack of memory limits when reading subprocess output in the cliWorker.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
internal/handlers/loom/tools_test.go (2)
394-396: Проверка на пустой массив может быть хрупкой.Проверка
strings.Contains(string(raw),"tasks":[])чувствительна к форматированию JSON. Если маршалинг добавит пробелы ("tasks": []), тест провалится.♻️ Более надёжная проверка
- // Must be {"tasks": []} not {"tasks": null} - if !strings.Contains(string(raw), `"tasks":[]`) { - t.Errorf("expected empty array in output, got: %s", raw) - } + // Must be {"tasks": []} not {"tasks": null} + var out struct { + Tasks []*loomlib.Task `json:"tasks"` + } + if err := json.Unmarshal(raw, &out); err != nil { + t.Fatalf("unmarshal: %v", err) + } + if out.Tasks == nil { + t.Error("expected non-nil tasks slice, got nil (JSON null)") + } + if len(out.Tasks) != 0 { + t.Errorf("expected 0 tasks, got %d", len(out.Tasks)) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/loom/tools_test.go` around lines 394 - 396, The test currently checks formatting-sensitive output using strings.Contains(string(raw), `"tasks":[]`); instead, parse raw as JSON and assert that the "tasks" field is present and has length 0 to avoid whitespace/formatting brittleness—unmarshal raw into a map or struct and check map["tasks"] (or the struct field) is an empty slice/has zero length (reference the raw variable and the existing assertion that calls t.Errorf).
579-587: Тест compile-time assertion избыточен.Compile-time assertion уже существует в
module.go:46. Дублирование в тесте не добавляет ценности — если assertion в module.go провалится, код не скомпилируется, и тесты не запустятся.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/loom/tools_test.go` around lines 579 - 587, Remove the redundant compile-time assertion test function TestToolProvider_CompileTimeAssertion from internal/handlers/loom/tools_test.go; the same assertion is already enforced in module.go via the package-level line var _ module.ToolProvider = (*Module)(nil), so delete the TestToolProvider_CompileTimeAssertion function (or replace it with a brief comment referencing the module.go assertion) to avoid duplication.internal/handlers/loom/module.go (1)
159-164: Комментарий частично неточен относительно порядка операций.Комментарий на строке 161 утверждает "Order: NewEngine → RecoverCrashed → registerWorkers → Events().Subscribe", но фактический порядок в коде:
- NewEngine (строки 132-138)
- Subscribe (строка 145)
- RecoverCrashed (строка 149)
- registerWorkers (строка 164)
Строка 162-163 корректно уточняет это, но строка 161 вводит в заблуждение.
♻️ Предлагаемое исправление комментария
// Register built-in workers after crash recovery so any worker registered // here cannot accidentally dispatch into a half-recovered state. - // Order: NewEngine → RecoverCrashed → registerWorkers → Events().Subscribe - // Note: Subscribe is called above (before RecoverCrashed) to forward - // crash-recovery events. Worker registration is intentionally AFTER. + // Order: NewEngine → Subscribe → RecoverCrashed → registerWorkers + // Subscribe is called before RecoverCrashed to forward crash-recovery + // events. Worker registration is intentionally AFTER recovery completes. registerWorkers(eng, deps)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/loom/module.go` around lines 159 - 164, Update the misleading comment block so the operation order matches the actual code: NewEngine → Events().Subscribe → RecoverCrashed → registerWorkers; specifically edit the line that currently reads "Order: NewEngine → RecoverCrashed → registerWorkers → Events().Subscribe" to the correct sequence and keep the existing note that Subscribe is called before RecoverCrashed to forward crash-recovery events, while clarifying that worker registration (registerWorkers) happens after recovery; reference NewEngine, Events().Subscribe, RecoverCrashed, and registerWorkers when making the change.internal/handlers/loom/tools.go (2)
373-381: Любая ошибка Cancel интерпретируется как "task already finished".Ошибка engine.Cancel может означать не только "task already finished", но и другие проблемы (ошибка БД, task not found). Текущая логика всегда возвращает
cancelled: false, reason: "task already finished", что может быть некорректно.♻️ Более точная обработка ошибок
cancelErr := m.engine.Cancel(a.TaskID) if cancelErr != nil { - // Loom's Cancel returns an error when the task is not running (already - // terminal or pending). Translate to a soft success response. - out := map[string]any{ - "cancelled": false, - "reason": "task already finished", + // Check if it's a "not cancellable" error vs other errors + if strings.Contains(cancelErr.Error(), "not cancellable") || + strings.Contains(cancelErr.Error(), "not running") { + out := map[string]any{ + "cancelled": false, + "reason": "task not cancellable (already terminal or pending)", + } + return json.Marshal(out) } - return json.Marshal(out) + // Unexpected error - return as internal_error + return nil, &module.ModuleError{ + Code: "internal_error", + Message: "loom_cancel: engine.Cancel failed: " + cancelErr.Error(), + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/loom/tools.go` around lines 373 - 381, The current handling in the Cancel path treats any error from m.engine.Cancel (cancelErr) as "task already finished"; update the logic in the cancel handler (where m.engine.Cancel is called and cancelErr is checked) to distinguish expected "task already finished"/not-running errors from real failures: inspect cancelErr (type/assert or error message/ sentinel like ErrTaskNotRunning/ErrNotFound if available from the engine) and only return {"cancelled":false,"reason":"task already finished"} for that specific case, otherwise return or wrap the actual error (or a JSON indicating failure with the error message) so DB/other errors are not masked; ensure you reference m.engine.Cancel and the cancelErr variable when making the change.
266-273: Все ошибки engine.Get преобразуются в "not_found".Если
m.engine.Getвозвращает ошибку (например, ошибку БД или сети), она будет замаскирована какnot_found. Это может усложнить диагностику проблем.♻️ Предлагаемое разделение ошибок
task, err := m.engine.Get(a.TaskID) if err != nil { + // Log the actual error for debugging + m.deps.Logger.WarnContext(ctx, "loom_get: engine.Get failed", + "task_id", a.TaskID, + "error", err, + ) return nil, &module.ModuleError{ - Code: "not_found", - Message: fmt.Sprintf("loom_get: task %q not found", a.TaskID), - Details: map[string]any{"task_id": a.TaskID}, + Code: "internal_error", + Message: fmt.Sprintf("loom_get: failed to retrieve task %q", a.TaskID), + Details: map[string]any{"task_id": a.TaskID, "error": err.Error()}, } }Примечание:
handleLoomGetне принимаетctx, поэтому для логирования потребуется добавить его в сигнатуру.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/loom/tools.go` around lines 266 - 273, The current code masks every error from m.engine.Get as a "not_found" ModuleError; update handleLoomGet to accept a context.Context (add ctx to its signature) and change the error handling around m.engine.Get to distinguish a genuine "not found" from other failures: detect the engine's not-found sentinel (e.g., errors.Is(err, engine.ErrNotFound) or a typed NotFound error) and return the existing ModuleError with Code "not_found" only in that case; for any other error, log the full error using the provided ctx (or a logger obtained from ctx) and return a ModuleError with a more appropriate Code like "internal_error" or "engine_error" and include the original error details in Details to aid debugging (do not swallow the original error).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/modules/loom.md`:
- Around line 16-17: В файле docs/modules/loom.md замените абсолютный
Windows-путь "D:\Dev\aimux\loom\CONTRACT.md" на относительную ссылку или
репозиторный URL (например "loom/CONTRACT.md" или полную ссылку на GitHub) —
отредактируйте текст строки после "Loom library:
`github.com/thebtf/aimux/loom@v0.1.0`" так, чтобы вместо локального пути
использовался относительный путь или HTTP(S) URL репозитория.
In `@internal/handlers/loom/workers_test.go`:
- Around line 283-322: TestCliWorker_ContextCancellation fails because the test
currently relies on feeding "sleep 30" via stdin which some shells buffer;
update the test to invoke the shell with "-c" so the command is executed as an
argument rather than from stdin: modify the Task passed to
loomhandler.CLIWorkerWithAllowlist/Execute in TestCliWorker_ContextCancellation
so it runs "sh -c 'sleep 30'" (i.e., set the task to use CLI "sh" and supply
"-c" and "sleep 30" as the command arguments or equivalent fields on Task) so
cancellation triggers reliably.
- Around line 162-218: The test TestCliWorker_Timeout fails because spawning
"sh" without the -c flag leaves the shell waiting on stdin; update the Unix
branch to invoke the shell with -c so the command runs immediately: when
creating the worker via loomhandler.NewCLIWorkerWithAllowlist and the task
(task.CLI and task.Prompt used by Execute), set task.CLI = "sh" and set
task.Prompt to "-c sleep 10" (or otherwise ensure the shell is invoked as "sh -c
'sleep 10'") so the shell executes the sleep command instead of waiting on
stdin.
- Around line 43-71: Test uses "echo" which ignores stdin so the CLI worker
receives no content; update TestCliWorker_HappyPath to use a CLI that reads
stdin (e.g., change NewCLIWorkerWithAllowlist([]string{"echo"}) to
NewCLIWorkerWithAllowlist([]string{"cat"}) and set task.CLI = "cat") so the
Prompt passed via stdin is echoed back by CLIWorker.Execute; ensure the
allowlist and the Task.CLI value are both updated and the exec.LookPath check
matches the chosen binary (cat).
---
Nitpick comments:
In `@internal/handlers/loom/module.go`:
- Around line 159-164: Update the misleading comment block so the operation
order matches the actual code: NewEngine → Events().Subscribe → RecoverCrashed →
registerWorkers; specifically edit the line that currently reads "Order:
NewEngine → RecoverCrashed → registerWorkers → Events().Subscribe" to the
correct sequence and keep the existing note that Subscribe is called before
RecoverCrashed to forward crash-recovery events, while clarifying that worker
registration (registerWorkers) happens after recovery; reference NewEngine,
Events().Subscribe, RecoverCrashed, and registerWorkers when making the change.
In `@internal/handlers/loom/tools_test.go`:
- Around line 394-396: The test currently checks formatting-sensitive output
using strings.Contains(string(raw), `"tasks":[]`); instead, parse raw as JSON
and assert that the "tasks" field is present and has length 0 to avoid
whitespace/formatting brittleness—unmarshal raw into a map or struct and check
map["tasks"] (or the struct field) is an empty slice/has zero length (reference
the raw variable and the existing assertion that calls t.Errorf).
- Around line 579-587: Remove the redundant compile-time assertion test function
TestToolProvider_CompileTimeAssertion from internal/handlers/loom/tools_test.go;
the same assertion is already enforced in module.go via the package-level line
var _ module.ToolProvider = (*Module)(nil), so delete the
TestToolProvider_CompileTimeAssertion function (or replace it with a brief
comment referencing the module.go assertion) to avoid duplication.
In `@internal/handlers/loom/tools.go`:
- Around line 373-381: The current handling in the Cancel path treats any error
from m.engine.Cancel (cancelErr) as "task already finished"; update the logic in
the cancel handler (where m.engine.Cancel is called and cancelErr is checked) to
distinguish expected "task already finished"/not-running errors from real
failures: inspect cancelErr (type/assert or error message/ sentinel like
ErrTaskNotRunning/ErrNotFound if available from the engine) and only return
{"cancelled":false,"reason":"task already finished"} for that specific case,
otherwise return or wrap the actual error (or a JSON indicating failure with the
error message) so DB/other errors are not masked; ensure you reference
m.engine.Cancel and the cancelErr variable when making the change.
- Around line 266-273: The current code masks every error from m.engine.Get as a
"not_found" ModuleError; update handleLoomGet to accept a context.Context (add
ctx to its signature) and change the error handling around m.engine.Get to
distinguish a genuine "not found" from other failures: detect the engine's
not-found sentinel (e.g., errors.Is(err, engine.ErrNotFound) or a typed NotFound
error) and return the existing ModuleError with Code "not_found" only in that
case; for any other error, log the full error using the provided ctx (or a
logger obtained from ctx) and return a ModuleError with a more appropriate Code
like "internal_error" or "engine_error" and include the original error details
in Details to aid debugging (do not swallow the original error).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bfa71194-47a1-4af9-a2d1-421b598ead46
📒 Files selected for processing (6)
docs/modules/loom.mdinternal/handlers/loom/module.gointernal/handlers/loom/tools.gointernal/handlers/loom/tools_test.gointernal/handlers/loom/workers.gointernal/handlers/loom/workers_test.go
Review of PR #169 (feat/loom-tools) surfaced 13 findings across CodeRabbit and Gemini. 8 resolved via pr_resolve API, 5 are CodeRabbit nitpick threads with canResolve:false that will auto-close on the next CodeRabbit pass after this push. Changes applied in this commit: MAJOR — workers_test.go test correctness (CodeRabbit): - HappyPath: switched `echo` → `cat`. `echo` does NOT read stdin on any POSIX shell — it prints its command-line args. The test was passing task.Prompt via stdin and expecting it back in stdout, which would have failed on Linux/macOS CI. `cat` is the correct primitive: reads stdin, writes to stdout. - Timeout + ContextCancellation: switched `sh -c "sleep 10"` → `sh -c "exec sleep 10"`. Without `exec`, the shell process wraps sleep and exec.CommandContext's ctx cancellation kills only the shell, leaving sleep running as a zombie child. `exec` replaces the shell with sleep so the ctx-cancel signal reaches the real process. MAJOR — workers.go DoS protection (Gemini): - Replaced unbounded `cmd.Output()` with a `limitedWriter` stdout capture (10 MiB hard cap) + separate stderr buffer. A rogue subprocess producing gigabytes of output would have OOM'd the daemon. The `maxOutputBytes` constant (10 * 1024 * 1024) is inlined with a clear limit. Bytes beyond the cap are silently discarded — a follow-up PR can add a log line noting truncation if operators want visibility. - Added `bytes` and `io` imports to support the new capture path. MINOR — tools.go JSON schema (Gemini): - Added `"cli"` to the `required[]` array in the `loom_submit` input schema. Previously only `worker_type` and `prompt` were required; the handler validated `cli` at runtime but the schema didn't advertise the requirement to clients. MINOR — docs/modules/loom.md URL (both reviewers): - Replaced local Windows path `D:\Dev\aimux\loom\CONTRACT.md` with the canonical GitHub URL. The doc is a public module author guide; absolute local paths are unusable by readers outside this checkout. MINOR — tools_test.go robustness (CodeRabbit): - `TestLoomList_EmptyProjectReturnsEmptyArray`: replaced brittle `strings.Contains(raw, "tasks":[]")` check with a proper JSON unmarshal that validates the shape. - Removed redundant `TestToolProvider_CompileTimeAssertion` function — the compile-time assertion in module.go already catches the same regression at build time, the test function was a no-op decoration. - Removed now-unused `strings` import. MINOR — module.go comment (CodeRabbit): - Corrected operation order in package comment. The code does `NewEngine → Subscribe → RecoverCrashed → registerWorkers` (Subscribe before RecoverCrashed so crash-recovery events reach connected sessions), but the comment said the reverse. Comment now matches code. DEFERRED — `loom_get` error masking (CodeRabbit nitpick): The `loom_get` handler returns `not_found` for every error from `engine.Get(taskID)`, masking potential DB failures. Splitting the error codes for real DB errors vs missing tasks would improve observability but requires loom to export a sentinel like `ErrNotFound`. Checked `aimux/loom@v0.1.0` — no such sentinel is exported. A follow-up will add proper error classification once loom ships sentinel errors. DEFERRED — `loom_cancel` error masking (intentional per spec): Cancel error is mapped to soft success `{cancelled: false}` by design (spec FR-4 + C3 explicit): agents routinely try to cancel completed tasks and a hard error would be noisy. Not a bug. No change.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/handlers/loom/workers.go`:
- Around line 118-122: stderr is not limited while stdout uses limitedWriter, so
a noisy subprocess can exhaust memory; change the stderr assignment to use the
same limiting mechanism by wrapping stderrBuf with limitedWriter (same
maxOutputBytes) and set cmd.Stderr = &limitedWriter{w: &stderrBuf, n:
maxOutputBytes} (mirror how stdout is set). Ensure you reuse the existing
limitedWriter type and the maxOutputBytes constant so both stdoutBuf and
stderrBuf are capped.
- Around line 181-190: The limitedWriter.Write implementation violates io.Writer
by returning the truncated slice length instead of the original buffer length;
in limitedWriter.Write, capture the original length (origLen := len(p)), then if
you must truncate p to l.n call n, err := l.w.Write(p), decrement l.n by
int64(n), and finally: if err == nil return origLen, nil else return n, err.
Ensure the early-return path when l.n <= 0 also returns the original length and
nil error, and reference the limitedWriter.Write method and l.n and l.w fields
when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f824cf84-e032-40d4-958c-0627a3ae316c
📒 Files selected for processing (6)
docs/modules/loom.mdinternal/handlers/loom/module.gointernal/handlers/loom/tools.gointernal/handlers/loom/tools_test.gointernal/handlers/loom/workers.gointernal/handlers/loom/workers_test.go
✅ Files skipped from review due to trivial changes (1)
- docs/modules/loom.md
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/handlers/loom/tools_test.go
- internal/handlers/loom/module.go
| if strings.TrimSpace(a.Prompt) == "" { | ||
| return nil, &module.ModuleError{ | ||
| Code: "tool_input_invalid", | ||
| Message: "loom_submit: prompt must not be empty", | ||
| } | ||
| } | ||
|
|
||
| // Validate worker_type against v4.4.0 supported set. | ||
| wt := loom.WorkerType(a.WorkerType) | ||
| if wt != loom.WorkerTypeCLI { | ||
| return nil, &module.ModuleError{ | ||
| Code: "tool_input_invalid", | ||
| Message: fmt.Sprintf("loom_submit: unsupported worker_type %q; v4.4.0 supports: cli", a.WorkerType), | ||
| Details: map[string]any{"worker_type": a.WorkerType}, | ||
| } | ||
| } | ||
|
|
||
| req := loom.TaskRequest{ | ||
| WorkerType: wt, | ||
| ProjectID: p.ID, // ALWAYS scoped to the authenticated project | ||
| Prompt: a.Prompt, | ||
| CLI: a.CLI, | ||
| CWD: a.CWD, | ||
| Env: a.Env, | ||
| Model: a.Model, | ||
| Role: a.Role, | ||
| Effort: a.Effort, | ||
| Timeout: a.TimeoutSec, | ||
| Metadata: a.Metadata, | ||
| } |
There was a problem hiding this comment.
Не отправляйте заведомо неисполняемые задачи.
loom_submit сейчас не валидирует cli и ключи env, хотя worker потом это жёстко отбрасывает. В итоге инструмент отвечает "dispatched" для задачи, которая гарантированно упадёт уже на исполнении, вместо синхронного tool_input_invalid.
🔧 Вариант исправления
if strings.TrimSpace(a.Prompt) == "" {
return nil, &module.ModuleError{
Code: "tool_input_invalid",
Message: "loom_submit: prompt must not be empty",
}
}
+ if strings.TrimSpace(a.CLI) == "" {
+ return nil, &module.ModuleError{
+ Code: "tool_input_invalid",
+ Message: "loom_submit: cli must not be empty",
+ }
+ }
+ if strings.ContainsAny(a.CLI, "/\\:") {
+ return nil, &module.ModuleError{
+ Code: "tool_input_invalid",
+ Message: fmt.Sprintf("loom_submit: invalid cli %q", a.CLI),
+ Details: map[string]any{"cli": a.CLI},
+ }
+ }
+ allowed := false
+ for _, name := range defaultAllowlist {
+ if a.CLI == name {
+ allowed = true
+ break
+ }
+ }
+ if !allowed {
+ return nil, &module.ModuleError{
+ Code: "tool_input_invalid",
+ Message: fmt.Sprintf("loom_submit: cli %q is not allowlisted", a.CLI),
+ Details: map[string]any{"cli": a.CLI},
+ }
+ }
+ for k := range a.Env {
+ if !envKeyRe.MatchString(k) {
+ return nil, &module.ModuleError{
+ Code: "tool_input_invalid",
+ Message: fmt.Sprintf("loom_submit: invalid env key %q", k),
+ Details: map[string]any{"env_key": k},
+ }
+ }
+ }
// Validate worker_type against v4.4.0 supported set.
wt := loom.WorkerType(a.WorkerType)| // Capture stdout up to maxOutputBytes to prevent memory exhaustion from | ||
| // a runaway subprocess. Stderr is captured separately for error reporting. | ||
| var stdoutBuf, stderrBuf bytes.Buffer | ||
| cmd.Stdout = &limitedWriter{w: &stdoutBuf, n: maxOutputBytes} | ||
| cmd.Stderr = &stderrBuf |
There was a problem hiding this comment.
Ограничьте stderr так же, как stdout.
stderrBuf здесь без лимита, поэтому “болтливый” subprocess всё ещё может выбить демон по памяти, хотя stdout уже capped. Для shell-out worker это оставляет DoS по объёму вывода.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/handlers/loom/workers.go` around lines 118 - 122, stderr is not
limited while stdout uses limitedWriter, so a noisy subprocess can exhaust
memory; change the stderr assignment to use the same limiting mechanism by
wrapping stderrBuf with limitedWriter (same maxOutputBytes) and set cmd.Stderr =
&limitedWriter{w: &stderrBuf, n: maxOutputBytes} (mirror how stdout is set).
Ensure you reuse the existing limitedWriter type and the maxOutputBytes constant
so both stdoutBuf and stderrBuf are capped.
| func (l *limitedWriter) Write(p []byte) (int, error) { | ||
| if l.n <= 0 { | ||
| return len(p), nil // discard | ||
| } | ||
| if int64(len(p)) > l.n { | ||
| p = p[:l.n] | ||
| } | ||
| n, err := l.w.Write(p) | ||
| l.n -= int64(n) | ||
| return len(p), err // report full len to avoid short-write errors |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go, if an io.Writer.Write method returns n < len(p) with a nil error, does io.Copy / os/exec treat that as ErrShortWrite? How should a truncating writer report n?
💡 Result:
No, a conforming io.Writer.Write implementation must not return n < len(p) with a nil error. The official Go documentation states: "Write must return a non-nil error if it returns n < len(p)." Returning partial n with nil error violates the io.Writer contract. io.Copy explicitly checks for this case in its implementation and returns io.ErrShortWrite if the writer returns nw != nr (where nr is bytes read to write) after a successful write (ew == nil). From io.CopyBuffer source: if ew != nil { err = ew break } if nr != nw { err = ErrShortWrite break } Thus, io.Copy treats a partial write with nil error as io.ErrShortWrite. os/exec uses io.Copy in its internal goroutines (via writerDescriptor) to copy from child process pipes to the user-provided io.Writer (cmd.Stdout/Stderr). Therefore, os/exec will also propagate io.ErrShortWrite if the writer violates the contract. For a truncating writer that intentionally writes fewer bytes, return the actual number written (n < len(p)) paired with a non-nil error, such as fmt.Errorf("truncated: wrote %d of %d bytes", n, len(p)) or io.ErrShortWrite itself. This signals to callers like io.Copy that the write stopped early intentionally. Example of a correct truncating writer (similar to testing/iotest.TruncateWriter): type truncateWriter struct { w io.Writer n int64 } func (t *truncateWriter) Write(p []byte) (int, error) { if t.n <= 0 { return len(p), nil // or return 0, io.EOF if preferred } nn := len(p) if int64(nn) > t.n { nn = int(t.n) } n, err := t.w.Write(p[:nn]) t.n -= int64(n) if err == nil && n < len(p) { err = io.ErrShortWrite // Key: return error for partial write } return n, err } Note: Some non-conforming writers (like certain serial libraries) do return partial n with nil err, causing io.Copy to return ErrShortWrite, which is the correct enforcement of the contract.
Citations:
- 1: https://pkg.go.dev/io@go1.25.4
- 2: https://pkg.go.dev/io@go1.25.5
- 3: https://pkg.go.dev/io
- 4: https://golang.org/src/io/io.go
- 5: https://go.dev/src/io/io.go
- 6: https://go.dev/src/os/exec/exec.go
- 7: https://golang.org/src/os/exec/exec.go
- 8: https://github.com/golang/go/blob/master/src/os/exec/exec.go
- 9: https://github.com/golang/go/blob/master/src/testing/iotest/writer.go
- 10: https://go.dev/src/testing/iotest/writer.go
limitedWriter нарушает контракт io.Writer и вызывает ошибку на больших объёмах данных.
На строке 190 возвращается длина усечённого слайса вместо исходной длины. Когда очередной write больше остатка лимита, функция возвращает len(p) (усечённое значение) вместо оригинальной длины буфера. Это нарушает контракт io.Writer: когда Write возвращает n < len(p) с nil ошибкой, io.Copy (который используется в os/exec для чтения stdout/stderr подпроцесса) трактует это как ErrShortWrite и команда падает.
Исправление
func (l *limitedWriter) Write(p []byte) (int, error) {
+ origLen := len(p)
if l.n <= 0 {
- return len(p), nil // discard
+ return origLen, nil // discard
}
- if int64(len(p)) > l.n {
+ if int64(origLen) > l.n {
p = p[:l.n]
}
n, err := l.w.Write(p)
l.n -= int64(n)
- return len(p), err // report full len to avoid short-write errors
+ return origLen, err // report full len to avoid short-write errors
}Для усекающего writer необходимо либо возвращать исходную длину с nil ошибкой (сигнализируя что все данные "потреблены"), либо возвращать фактическое количество написанных байт с ненулевой ошибкой.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (l *limitedWriter) Write(p []byte) (int, error) { | |
| if l.n <= 0 { | |
| return len(p), nil // discard | |
| } | |
| if int64(len(p)) > l.n { | |
| p = p[:l.n] | |
| } | |
| n, err := l.w.Write(p) | |
| l.n -= int64(n) | |
| return len(p), err // report full len to avoid short-write errors | |
| func (l *limitedWriter) Write(p []byte) (int, error) { | |
| origLen := len(p) | |
| if l.n <= 0 { | |
| return origLen, nil // discard | |
| } | |
| if int64(origLen) > l.n { | |
| p = p[:l.n] | |
| } | |
| n, err := l.w.Write(p) | |
| l.n -= int64(n) | |
| return origLen, err // report full len to avoid short-write errors | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/handlers/loom/workers.go` around lines 181 - 190, The
limitedWriter.Write implementation violates io.Writer by returning the truncated
slice length instead of the original buffer length; in limitedWriter.Write,
capture the original length (origLen := len(p)), then if you must truncate p to
l.n call n, err := l.w.Write(p), decrement l.n by int64(n), and finally: if err
== nil return origLen, nil else return n, err. Ensure the early-return path when
l.n <= 0 also returns the original length and nil error, and reference the
limitedWriter.Write method and l.n and l.w fields when making the change.
Phase 6 release ceremony for the v4.4.0 train. Bumps the unified engram-server + plugin version after the 5-PR train completed: #167 — Phase B-1 plumbing tenant (loom integration) #168 — Phase 2 obs.MeterFor helper + loom OTel wiring #169 — Phase 3 4 loom_* tools + CLI worker with allowlist #170 — Phase 4 server-side gRPC proto extensions + soft-delete reaper #171 — Phase 5 daemon serverevents bridge for real OnProjectRemoved e2e Per Constitution §15 the daemon version and plugin version must move together, so this commit bumps: - cmd/engram/main.go daemonVersion "v4.3.0" → "v4.4.0" Reported to gRPC Initialize + logged in structured output. - plugin/engram/.claude-plugin/plugin.json version "4.3.0" → "4.4.0" Drives Claude Code plugin cache invalidation on /reload-plugins. Without this bump the marketplace update would not be detected. No behaviour change, no test change. Release notes + git tag + gh release + marketplace sync land in subsequent steps.
Summary
Phase 3 of the engram v4.4.0 loom integration train. Turns the plumbing tenant from PR #167 into a useful tenant by exposing 4 MCP tools backed by an inline CLI shell-out worker.
Implements spec FR-1 through FR-5 (4 tools + CLI worker) + FR-12 (wiring.go package doc update) + US1 (Agent submits background task) + US6 (Future workers pluggable).
What ships
internal/handlers/loom/workers.go(160 LOC) —cliWorkerstruct implementingloom.Worker:codex,claude,aimux(deny-by-default)/,\,:)exec.CommandContextwith task-scoped ctx for cancellation + timeouttask.Promptdelivered via stdin (NOT command-line args — NFR-10)task.Envmerged overos.Environ()withKEY=VALUEshape validationtask.CWDapplied to subprocessWorkerResult{Content: ""}so loom's quality gate retriesinternal/handlers/loom/workers_test.go(355 LOC) — 8 table-driven tests covering happy path, allowlist deny, path-separator reject, env merge, timeout, stderr capture, empty stdout retry, context cancellation. 6 POSIX-shell-dependent cases skip on Windows (sh/echo/sleep not in PATH) — covered on Linux/macOS CI.internal/handlers/loom/tools.go(386 LOC) —ToolProviderimplementation with 4 MCP tools:loom_submit— validatesworker_typeagainst registered set +cliagainst allowlist + env key shape; returns{task_id, status: "dispatched"}loom_get— cross-project safety: iftask.ProjectID != p.IDreturnsnot_found-class errorloom_list— scope-implicit (never accepts client-suppliedproject_id); optionalstatuses[]filterloom_cancel— translates loom's "task not running" error into soft success{cancelled: false, reason: "task already finished"}internal/handlers/loom/tools_test.go(588 LOC) — 12+ tests covering schema validation, project scoping, cross-project isolation, soft success on Cancel-after-finished, Submit→Get e2e with fake worker, ToolProvider compile-time assertion.internal/handlers/loom/module.go(+31/-10) — addsregisterWorkers(engine, deps)call inInitafterRecoverCrashed,var _ module.ToolProvider = (*Module)(nil)compile-time assertion, package doc comment rewrite removing the "plumbing-only" wording.docs/modules/loom.md(105 LOC) — module author guide with Overview, Architecture, RegisterWorker extension example (10-line Go snippet showing how to add a futureWorkerTypeThinker), Tool reference, Operator notes.6 atomic commits
311b991feat(loom): add cliWorker with allowlist and env merge (T013)a5453bctest(loom): add 8 cliWorker tests covering happy path + edge cases (T014)5b9e33cfeat(loom): register workers during Init after RecoverCrashed (T015 + T018 + T019 — all touch module.go)e056bdafeat(loom): add ToolProvider with 4 loom_* tools and JSON schemas (T016)9333da0test(loom): add 12+ tool handler tests covering schemas and scoping (T017)6d1ff6edocs(modules): add loom module author guide with extension example (T020)Note: T015 + T018 + T019 are combined in one commit because all three edit
module.goexclusively. Splitting them would have required history rewrites for no meaningful review benefit — the three concerns (Init registerWorkers call, ToolProvider compile-time assertion, package doc update) are visible in the single diff.Verification
go build ./...— cleango vet ./...— cleango test ./internal/handlers/loom/... -count=1(×2 runs to detect flakes) — 30+ tests PASScliWorker.Executebody with a trivial return would failTestCliWorker_TimeoutandTestCliWorker_StderrCapture.Security (NFR-10)
CLI worker hardening implemented per spec:
exec.LookPath../etc/passwd-style attacks[A-Za-z_][A-Za-z0-9_]*) prevents injection via malformed keysCompatibility
tools/listper the existing dispatcher contract)Follow-up
After merge, a tiny chore PR will raise the
internal/handlers/loom/coverage threshold in.github/workflows/test.ymlfrom 70% → 80% (the new test surface lifts the average above 80%, noted in spec G003 RESOLVE step).Related
.agent/specs/loom-integration/spec.mdFR-1..5, FR-12, US1, US6 (gitignored).agent/specs/loom-integration/plan.md§Phase 3 (gitignored).agent/specs/loom-integration/tasks.mdT013..T020 (gitignored)Summary by CodeRabbit
Новые функции
Тесты
Документация