feat(cascade-tools): spec 014 — agent ergonomics (truthful prompts, structured envelope, --comment alias) #1190
Merged
zbigniewsobiecki merged 5 commits into dev on Apr 25, 2026
Adds docs/specs/014-cascade-tools-agent-ergonomics.md plus two plans covering shared-infra and create-pr-review adoption. Prompted by prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elope
Ships the root-cause fix for prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b
plus the shared infrastructure every future gadget inherits:
- System-prompt renderer (src/backends/shared/nativeToolPrompts.ts) stops
stripping trailing 's' from array param names and claiming '<string>
(repeatable)' for every array. Array-of-object params now render as
`--<flag> '<json>'` with aliases appended via `|` and a one-line runnable
example from the tool definition.
- Factory (src/gadgets/shared/cliCommandFactory.ts) gains oclif flag aliases,
JSON parsing for array-of-object flags, file-input JSON parsing, `examples`
wired into oclif `--help`, and Levenshtein-based 'did you mean' suggestions
for mistyped flags (via fastest-levenshtein).
- New shared error envelope (src/gadgets/shared/errorEnvelope.ts) — every
CLI failure emits `{"success":false,"error":{type,flag?,message,got?,
expected?,hint?,example?}}` on stdout plus a one-line prose summary on
stderr. All prior `this.error()` / flat `{success:false,error:"<string>"}`
call sites migrated.
- Contracts widened: ParameterDefinition gains `cliAliases`,
FileInputAlternative gains `parseAs`, ToolManifest parameters carry `items`,
`aliases`, `example`.
- Manifest generator threads the new fields through.
- bin/cascade-tools.js wraps `run()` to swallow oclif ExitError cleanly so
the envelope isn't obscured by Node's default stack dump.
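
A minimal sketch of the envelope described above, for readers who have not opened `errorEnvelope.ts`. The field names come from the commit message; everything else (`ErrorDetail`, `buildEnvelope`, `renderProse`, `parseJsonFlag`, `emitFailure`, the 40-character `got` truncation) is hypothetical and only illustrates the shape, not the repository's actual code.

```typescript
// Hypothetical sketch; names and helpers are illustrative assumptions.
interface ErrorDetail {
  type: string; // the PR shows "runtime" and "json-parse"; other values are assumed
  flag?: string;
  message: string;
  got?: string;
  expected?: string;
  hint?: string;
  example?: string;
}

interface ErrorEnvelope {
  success: false;
  error: ErrorDetail;
}

function buildEnvelope(detail: ErrorDetail): ErrorEnvelope {
  return { success: false, error: detail };
}

// One-line prose summary for stderr: "<type> \u2014 <message>", whitespace collapsed.
function renderProse(detail: ErrorDetail): string {
  const oneLine = detail.message.replace(/\s+/g, " ").trim();
  return `${detail.type} \u2014 ${oneLine}`;
}

type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; envelope: ErrorEnvelope };

// Parse a JSON-valued flag; on failure return an envelope instead of throwing.
function parseJsonFlag(flag: string, raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return {
      ok: false,
      envelope: buildEnvelope({
        type: "json-parse",
        flag,
        message: `--${flag} must be valid JSON`,
        got: raw.slice(0, 40), // truncation length is an assumption
        hint: `Use double-quoted JSON keys or pass --${flag}-file <path>`,
      }),
    };
  }
}

// Structured envelope on stdout, prose summary on stderr.
function emitFailure(detail: ErrorDetail): void {
  console.log(JSON.stringify(buildEnvelope(detail)));
  console.error(renderProse(detail));
}
```

Returning the envelope from the parser rather than throwing keeps the failure machine-readable all the way to the point where it is printed, which is the property an agent needs in order to retry.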
Plan-1 ACs #1–#17 all delivered. 8438/8438 unit tests passing.
Test surface delta: 57 new unit tests across errorEnvelope.test.ts,
shared-nativeToolPrompts.test.ts, and factories.test.ts. Seven legacy
assertions encoding the pre-014 error surface updated in
cli/cli-command-factory, cli/file-input-flags, cli/scm/create-pr-sidecar,
cli/scm/create-pr-review-sidecar, backends/claude-code.
Plan 2 adopts the pattern on createPRReviewDef — zero shared-file edits —
proving the declarative-metadata invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applies the spec-014 declarative-metadata pattern to createPRReviewDef:
- --comment alias for --comments (the exact muscle-memory mistake from prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b).
- --comments-file <path> (and - for stdin) JSON-parsed escape hatch for long payloads that don't survive shell quoting.
- Two declarative fields on createPRReviewDef.parameters.comments.cliAliases + createPRReviewDef.cli.fileInputAlternatives.
Zero edits to shared infrastructure (cliCommandFactory, manifestGenerator, nativeToolPrompts, errorEnvelope), which proves spec 014's single-entrypoint invariant.
Per-plan ACs #1, #2, #3, #5, #6, #7, #8, #9, #11, #12 auto-verified (unit tests + build + lint + typecheck). AC #4 (binary-level smoke) tagged [manual] because vitest fork-pool workers fail to capture stdout/stderr from spawned binaries that do top-level await import(); the six scenarios were verified manually against the built binary and the trace is recorded in the plan. AC #10 n/a: integration test path abandoned for the same reason.
All plans done. Spec 014 marked .done (docs/specs/014-*.md → .done). CHANGELOG Unreleased updated with a per-plan entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
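
To illustrate how a declared alias could surface in the agent-facing prompt, here is a small sketch of a flag renderer. `ParamSketch`, its `kind` field, and `renderFlag` are hypothetical; the real `ParameterDefinition` and `nativeToolPrompts.ts` may be shaped quite differently. The sketch also assumes plain string arrays keep the `<string> (repeatable)` form.

```typescript
// Hypothetical shapes; not the repository's ParameterDefinition.
interface ParamSketch {
  name: string; // flag name, used verbatim (no trailing-"s" stripping)
  kind: "string" | "string-array" | "object-array";
  cliAliases?: string[];
  required?: boolean;
}

function renderFlag(p: ParamSketch): string {
  // Canonical name first, then aliases, joined with "|".
  const names = [p.name, ...(p.cliAliases ?? [])]
    .map((n) => `--${n}`)
    .join("|");
  // Array-of-object params take one JSON argument; only string arrays repeat.
  const placeholder =
    p.kind === "object-array"
      ? "'<json>'"
      : p.kind === "string-array"
        ? "<string> (repeatable)"
        : "<string>";
  const body = `${names} ${placeholder}`;
  // Optional flags are wrapped in square brackets.
  return p.required ? body : `[${body}]`;
}

const rendered = renderFlag({
  name: "comments",
  kind: "object-array",
  cliAliases: ["comment"],
});
console.log(rendered);
```

With the alias declared on the parameter, the rendered fragment is `[--comments|--comment '<json>']`, so the prompt and the CLI parser are driven by the same metadata.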
Summary
- Prompted by prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b, where a review agent wasted ~2½ min of a 7m42s run fighting `cascade-tools scm create-pr-review` and ultimately dropped an inline PR comment. The agent-facing system prompt literally told it to use `--comment <string> (repeatable)` when reality is `--comments '<json array>'`; error messages and `--help` gave nothing to self-correct from.
- The system-prompt renderer stops `s`-stripping array names and stops claiming "<string> (repeatable)" for every array; array-of-object params now render `--<flag> '<json>'` with aliases via `|` and a one-line runnable example inlined from the tool definition. Every CLI failure (flag parse, JSON parse, missing-required, enum-mismatch, unknown-flag, auth, runtime) emits a single structured envelope on stdout (`{success:false, error:{type, flag?, message, got?, expected?, hint?, example?}}`) plus a readable prose summary on stderr. Mistyped flags get a "did you mean" suggestion via `fastest-levenshtein`. `--help` renders `def.examples` as copy-pasteable invocations under an `EXAMPLES` section.
- `createPRReviewDef` gains `cliAliases: ['comment']` + `fileInputAlternatives` for `--comments-file <path>` (and `-` for stdin). Two declarative fields. Zero edits to shared infrastructure, which proves the single-entrypoint invariant that a new gadget should never need to touch shared machinery.
- `src/gadgets/README.md` (new). Full spec at `docs/specs/014-cascade-tools-agent-ergonomics.md.done`.

Highlights
Before: agent system prompt declared `[--comment <string> (repeatable)]`. Agent sent `--comment` (singular). Got `Nonexistent flag: comment`. Tried `--comments` + single-quoted-keys JSON. Got `--comments must be valid JSON`. Reverse-engineered source. Gave up. Posted body-only review. 2½ min lost.

After: agent prompt declares `[--comments|--comment '<json>']` with an example line showing `--comments '[{"path":"src/x.ts","line":1,"body":"nit"}]'`. If the agent still mistypes, the envelope hint says `did you mean --comments?`. If JSON is malformed, the envelope surfaces `got: "[{'path'..."`, `expected: "[{\"path\":...}]"`, `hint: "Use double-quoted JSON keys... or pass --comments-file <path>"`. Self-correction is one retry away.

Test plan
- `npm run build`: clean
- `npm run lint`: clean (biome)
- `npm run typecheck`: clean
- `npm test`: 8440 passing, 0 failing, 23 skipped (458 files)
- New unit tests across `errorEnvelope.test.ts`, `shared-nativeToolPrompts.test.ts`, `factories.test.ts`, `definitions.test.ts`
- Legacy assertions encoding the pre-014 error surface updated (`this.error()` / flat envelope / `s`-stripped array names)
- [manual] 6 binary-level scenarios run against `./bin/cascade-tools.js` after `npm run build`:
  - `--comments '[...]'` → exit 1, `type:"runtime"` (parse succeeded)
  - `--comment '[...]'` alias → exit 1, `type:"runtime"` (alias resolved)
  - `--comment "[{'path':..}]"` (single-quoted JSON) → exit 1, `type:"json-parse"`, `flag:"comments"`, hint mentions `--comments-file`
  - `--comments-file /tmp/x.json` → exit 1, `type:"runtime"` (file parsed)
  - `--comments-file -` (stdin) → exit 1, `type:"runtime"` (stdin parsed)
  - `--help` → exit 0, contains `EXAMPLES` + `"path":"src/utils.ts"`
- [manual] stderr prose readable: `runtime — No GitHub client in scope...` (123 chars, single line, no ANSI, no Node stack dump)

Notes
- `vitest` fork-pool workers deterministically fail to capture stdout/stderr from spawned binaries that do top-level `await import()`. This was reproduced with `spawn` / `execa` / `sh -c` / `execSync`: exit 2 with zero output. Unit tests cover every code path at the Command class level; the binary-level scenarios are documented as a manual-verification protocol in plan 2. Chasing the vitest/tinypool root cause is a separate concern and out of scope here.
- Plans marked `.done`; spec `.done`. `CLAUDE.md` untouched (audit clean).

🤖 Generated with Claude Code
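
The "did you mean" hint from the Highlights can be sketched as a nearest-neighbour lookup over known flag names. The PR uses `fastest-levenshtein` for the distance; the sketch below substitutes a hand-rolled Levenshtein function so it runs without dependencies. `suggestFlag` and the `maxDistance` cutoff of 2 are assumptions, not the factory's actual behaviour.

```typescript
// Stand-in for the library's distance function: classic two-row Levenshtein.
function levenshtein(a: string, b: string): number {
  // prev[j] = edit distance between the processed prefix of a and b[0..j)
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: closest known flag within maxDistance, else undefined.
function suggestFlag(
  typed: string,
  known: string[],
  maxDistance = 2, // cutoff is an assumption
): string | undefined {
  let best: string | undefined;
  let bestDist = maxDistance + 1;
  for (const candidate of known) {
    const d = levenshtein(typed, candidate);
    if (d < bestDist) {
      best = candidate;
      bestDist = d;
    }
  }
  return best;
}

const knownFlags = ["comments", "body", "event"]; // illustrative flag list
const suggestion = suggestFlag("coments", knownFlags);
const hint = suggestion ? `did you mean --${suggestion}?` : undefined;
console.log(hint);
```

A distance cutoff keeps the hint from suggesting an unrelated flag when the typed name is nowhere near any known one; in that case the envelope would simply omit the hint.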