Skip to content

feat(cascade-tools): spec 014 — agent ergonomics (truthful prompts, structured envelope, --comment alias)#1190

Merged
zbigniewsobiecki merged 5 commits intodevfrom
feature/014-cascade-tools-agent-ergonomics
Apr 25, 2026
Merged

feat(cascade-tools): spec 014 — agent ergonomics (truthful prompts, structured envelope, --comment alias)#1190
zbigniewsobiecki merged 5 commits intodevfrom
feature/014-cascade-tools-agent-ergonomics

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

  • Root cause fix for prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b where a review agent wasted ~2½ min of a 7m42s run fighting cascade-tools scm create-pr-review and ultimately dropped an inline PR comment. The agent-facing system prompt literally told it to use --comment <string> (repeatable) when reality is --comments '<json array>'; error messages and --help gave nothing to self-correct from.
  • Shared-infra fix (plan 1): prompt renderer stops s-stripping array names and stops claiming "<string> (repeatable)" for every array; array-of-object params now render --<flag> '<json>' with aliases via | and a one-line runnable example inlined from the tool definition. Every CLI failure — flag parse, JSON parse, missing-required, enum-mismatch, unknown-flag, auth, runtime — emits a single structured envelope on stdout ({success:false, error:{type, flag?, message, got?, expected?, hint?, example?}}) plus a readable prose summary on stderr. Mistyped flags get a "did you mean" suggestion via fastest-levenshtein. --help renders def.examples as copy-pasteable invocations under an EXAMPLES section.
  • Per-gadget opt-in (plan 2): createPRReviewDef gains cliAliases: ['comment'] + fileInputAlternatives for --comments-file <path> (and - for stdin). Two declarative fields. Zero edits to shared infrastructure — proves the single-entrypoint invariant that a new gadget should never need to touch shared machinery.
  • Authoring guide lives at src/gadgets/README.md (new). Full spec at docs/specs/014-cascade-tools-agent-ergonomics.md.done.

Highlights

Before: agent system prompt declared [--comment <string> (repeatable)]. Agent sent --comment (singular). Got Nonexistent flag: comment. Tried --comments + single-quoted-keys JSON. Got --comments must be valid JSON. Reverse-engineered source. Gave up. Posted body-only review. 2½ min lost.

After: agent prompt declares [--comments|--comment '<json>'] with an example line showing --comments '[{"path":"src/x.ts","line":1,"body":"nit"}]'. If the agent still mistypes, the envelope hint says did you mean --comments?. If JSON is malformed, the envelope surfaces got: "[{'path'...", expected: "[{\"path\":...}]", hint: "Use double-quoted JSON keys... or pass --comments-file <path>". Self-correction is one retry away.

Test plan

  • npm run build — clean
  • npm run lint — clean (biome)
  • npm run typecheck — clean
  • npm test — 8440 passing, 0 failing, 23 skipped (458 files)
  • 57 new unit tests across errorEnvelope.test.ts, shared-nativeToolPrompts.test.ts, factories.test.ts, definitions.test.ts
  • 7 legacy pre-014 test assertions updated in-place (prose this.error() / flat envelope / s-stripped array names)
  • [manual] 6 binary-level scenarios run against ./bin/cascade-tools.js after npm run build:
    1. canonical --comments '[...]' → exit 1, type:"runtime" (parse succeeded)
    2. --comment '[...]' alias → exit 1, type:"runtime" (alias resolved)
    3. --comment "[{'path':..}]" (single-quoted JSON) → exit 1, type:"json-parse", flag:"comments", hint mentions --comments-file
    4. --comments-file /tmp/x.json → exit 1, type:"runtime" (file parsed)
    5. --comments-file - (stdin) → exit 1, type:"runtime" (stdin parsed)
    6. --help → exit 0, contains EXAMPLES + "path":"src/utils.ts"
  • [manual] stderr prose readable: runtime — No GitHub client in scope... (123 chars, single line, no ANSI, no Node stack dump)

Notes

  • vitest fork-pool workers deterministically fail to capture stdout/stderr from spawned binaries that do top-level await import() — reproduced with spawn/execa/sh -c/execSync, exit 2 with zero output. Unit tests cover every code path at the Command class level; the binary-level scenarios are documented as a manual-verification protocol in plan 2. Chasing the vitest/tinypool root cause is a separate concern and out of scope here.
  • This PR closes out spec 014. Both plans .done; spec .done. CLAUDE.md untouched (audit clean).

🤖 Generated with Claude Code

zbigniewsobiecki and others added 5 commits April 24, 2026 21:26
Adds docs/specs/014-cascade-tools-agent-ergonomics.md plus two plans
covering shared-infra and create-pr-review adoption. Prompted by prod
run 5d993b04-6e05-4ae1-b7de-8c274cf3496b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elope

Ships the root-cause fix for prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b
plus the shared infrastructure every future gadget inherits:

- System-prompt renderer (src/backends/shared/nativeToolPrompts.ts) stops
  stripping trailing 's' from array param names and claiming '<string>
  (repeatable)' for every array. Array-of-object params now render as
  `--<flag> '<json>'` with aliases appended via `|` and a one-line runnable
  example from the tool definition.
- Factory (src/gadgets/shared/cliCommandFactory.ts) gains oclif flag aliases,
  JSON parsing for array-of-object flags, file-input JSON parsing, `examples`
  wired into oclif `--help`, and Levenshtein-based 'did you mean' suggestions
  for mistyped flags (via fastest-levenshtein).
- New shared error envelope (src/gadgets/shared/errorEnvelope.ts) — every
  CLI failure emits `{"success":false,"error":{type,flag?,message,got?,
  expected?,hint?,example?}}` on stdout plus a one-line prose summary on
  stderr. All prior `this.error()` / flat `{success:false,error:"<string>"}`
  call sites migrated.
- Contracts widened: ParameterDefinition gains `cliAliases`, FileInput-
  Alternative gains `parseAs`, ToolManifest parameters carry `items`,
  `aliases`, `example`.
- Manifest generator threads the new fields through.
- bin/cascade-tools.js wraps `run()` to swallow oclif ExitError cleanly so
  the envelope isn't obscured by Node's default stack dump.

Plan-1 ACs #1#17 all delivered. 8438/8438 unit tests passing.

Test surface delta: 57 new unit tests across errorEnvelope.test.ts,
shared-nativeToolPrompts.test.ts, and factories.test.ts. Seven legacy
assertions encoding the pre-014 error surface updated in cli/cli-command-
factory, cli/file-input-flags, cli/scm/create-pr-sidecar, cli/scm/create-
pr-review-sidecar, backends/claude-code.

Plan 2 adopts the pattern on createPRReviewDef — zero shared-file edits —
proving the declarative-metadata invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applies the spec-014 declarative-metadata pattern to createPRReviewDef:

- --comment alias for --comments (the exact muscle-memory mistake from
  prod run 5d993b04-6e05-4ae1-b7de-8c274cf3496b).
- --comments-file <path> (and - for stdin) JSON-parsed escape hatch for
  long payloads that don't survive shell quoting.
- Two declarative fields on createPRReviewDef.parameters.comments.cliAliases
  + createPRReviewDef.cli.fileInputAlternatives. Zero edits to shared
  infrastructure (cliCommandFactory, manifestGenerator, nativeToolPrompts,
  errorEnvelope) — proves spec 014's single-entrypoint invariant.

Per-plan ACs #1, #2, #3, #5, #6, #7, #8, #9, #11, #12 auto-verified
(unit tests + build + lint + typecheck). AC #4 (binary-level smoke)
tagged [manual] because vitest fork-pool workers fail to capture
stdout/stderr from spawned binaries that do top-level await import();
the six scenarios were verified manually against the built binary and
the trace is recorded in the plan. AC #10 n/a — integration test path
abandoned for the same reason.

All plans done. Spec 014 marked .done (docs/specs/014-*.md → .done).
CHANGELOG Unreleased updated with a per-plan entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 87.46269% with 42 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/gadgets/shared/cliCommandFactory.ts 87.05% 29 Missing and 4 partials ⚠️
src/gadgets/shared/errorEnvelope.ts 81.48% 5 Missing ⚠️
src/backends/shared/nativeToolPrompts.ts 77.77% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@zbigniewsobiecki zbigniewsobiecki merged commit 3fe7398 into dev Apr 25, 2026
9 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the feature/014-cascade-tools-agent-ergonomics branch April 25, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant