feat(cli): normalized usage + error_kind + --json-schema-version (issue #62 items 3, 4, 9)#66
Conversation
e0dfafe to
a0b43e1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0b43e1c42
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "cache_write_tokens": ("cache_creation_input_tokens", "cache_write_tokens"), | ||
| "total_cost_usd": ("estimated_cost_usd", "total_cost_usd", "total_cost"), |
There was a problem hiding this comment.
Include OpenCode usage keys in normalization
When the configured SDK is OpenCode, OpenCodeEngineer records write-cache tokens and cost as cache_creation_tokens and cost (src/reverse_api/opencode_engineer.py lines 633-634), but this new candidate list only recognizes cache_creation_input_tokens/cache_write_tokens and estimated_cost_usd/total_cost_usd/total_cost. In that environment, agent --json / engineer --json will omit the promised stable cache_write_tokens and total_cost_usd top-level fields even though the values are present under raw, so wrappers using the stable subset lose cost/cache data after switching to OpenCode.
Useful? React with 👍 / 👎.
| "network", # DNS / TCP / TLS / timeout | ||
| "engine_failure", # SDK or capture engine crashed mid-run | ||
| "interrupted", # KeyboardInterrupt / SIGINT | ||
| "unknown", # default fallback | ||
| ) | ||
|
|
||
|
|
||
| def _format_error_message(error: str | BaseException | None) -> str | None: | ||
| """Render an exception or string into a human-readable error message. |
There was a problem hiding this comment.
config_invalid is documented but unreachable
ERROR_KINDS declares "config_invalid" and the PR description promises it for "config file or env var malformed" situations, but _classify_error has no isinstance branch for config-related exceptions (e.g., ValueError, yaml.YAMLError, custom config exceptions) and no substring patterns for config-related messages. Any config error will fall through to "unknown" unless a caller explicitly passes error_kind_hint="config_invalid" — which no current call-site does. A wrapper that branches on error_kind == "config_invalid" will silently never fire.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/reverse_api/cli.py
Line: 96-104
Comment:
**`config_invalid` is documented but unreachable**
`ERROR_KINDS` declares `"config_invalid"` and the PR description promises it for "config file or env var malformed" situations, but `_classify_error` has no isinstance branch for config-related exceptions (e.g., `ValueError`, `yaml.YAMLError`, custom config exceptions) and no substring patterns for config-related messages. Any config error will fall through to `"unknown"` unless a caller explicitly passes `error_kind_hint="config_invalid"` — which no current call-site does. A wrapper that branches on `error_kind == "config_invalid"` will silently never fire.
How can I resolve this? If you propose a fix, please make it concise.| """ | ||
| if not raw or not isinstance(raw, dict): | ||
| return {} | ||
| out: dict = {} | ||
| for stable_key, candidates in _STABLE_USAGE_KEYS.items(): | ||
| for c in candidates: | ||
| if c in raw: | ||
| out[stable_key] = raw[c] | ||
| break | ||
| out["raw"] = raw | ||
| return out | ||
|
|
||
|
|
||
| # Machine-readable error categories. Wrappers can react differently to each | ||
| # without pattern-matching on the human-readable `error` string. |
There was a problem hiding this comment.
raw key absent vs. present depending on input shape
_normalize_usage returns {} (no raw key) when the input is None or {}, but returns {"input_tokens": …, "raw": {…}} for any non-empty dict. Consumers that always destructure payload["usage"]["raw"] (e.g., to log the full SDK dict) will get a KeyError on empty usage. A consistent shape — returning {"raw": {}} even for empty input — would let callers safely do payload["usage"].get("raw", {}) without special-casing the empty path.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/reverse_api/cli.py
Line: 77-91
Comment:
**`raw` key absent vs. present depending on input shape**
`_normalize_usage` returns `{}` (no `raw` key) when the input is `None` or `{}`, but returns `{"input_tokens": …, "raw": {…}}` for any non-empty dict. Consumers that always destructure `payload["usage"]["raw"]` (e.g., to log the full SDK dict) will get a `KeyError` on empty usage. A consistent shape — returning `{"raw": {}}` even for empty input — would let callers safely do `payload["usage"].get("raw", {})` without special-casing the empty path.
How can I resolve this? If you propose a fix, please make it concise.
Test Results🛡️ 5/6 Results
Issues FoundFull regression suite: the overall suite still has 5 failing tests unrelated to the touched JSON-contract behavior, including at least Engineer parse-boundary JSON caveat: SummaryThe PR’s targeted behavior looks good: the affected tests all passed, the new View full run details · Tested by Kind I tested |
…-version Knocks out items #3, #4, #9 of the agent-friendliness backlog (#62). Schema version stays at 1 since the contract has not shipped to prod — fields are added in place rather than bumping versions. Item #3 — Stable `usage` subset Different SDKs (Claude / OpenCode / Copilot) emit different keys for the same concepts (cache_creation_input_tokens vs cache_write_tokens, estimated_cost_usd vs total_cost vs total_cost_usd, etc). New helper `_normalize_usage()` maps SDK-native keys into a stable subset {input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, total_cost_usd} and parks the SDK-native dict under `usage.raw` for power users. Wrappers can rely on the top-level keys without breaking when the user switches SDK. Item #4 — Machine-readable `error_kind` Previously agents had to pattern-match on prose like "[Errno 13] Permission denied: '/x'" to decide whether to retry / abort / surface to the user. New `error_kind` field on every `agent --json` and `engineer --json` payload, with a fixed enum: misuse | config_invalid | permission_denied | network | engine_failure | interrupted | unknown Inferred via `_classify_error()` which uses isinstance checks on exceptions (KeyboardInterrupt, PermissionError, ConnectionError, TimeoutError) and substring fallback on plain messages. Misuse paths now pass `error_kind_hint="misuse"` explicitly. Item #9 — `--json-schema-version` Top-level `reverse-api-engineer --json-schema-version` prints the schema version (currently `1`) and exits 0. Lets a wrapper query the contract version without having to invoke a real run. Bonus: KeyboardInterrupt now formats as the literal "interrupted" in the `error` field (str(KeyboardInterrupt()) is empty); empty exception messages fall back to the class name. Tests: 9 new tests in test_cli_followups.py (TestSchemaV2Normalization, TestJsonSchemaVersionFlag) + updated existing tests to assert on the normalized usage shape and the new error_kind field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…item 5)
Lets agents and CI wrappers sanity-check inputs (prompt, url, config,
env vars, deps, output dir writability) without launching a browser
or burning LLM tokens. Implies --json since --dry-run is fundamentally
about machine-parseable validation.
Output shape: same top-level fields as `agent --json` (schema_version,
status, run_id=null, prompt, url, mode="dry-run", error, error_kind)
plus:
- `would_run` sub-object: agent_provider, sdk, model, output_dir,
headless — what an actual `agent` invocation would do
- `checks` array: one entry per validation step with name, status
(ok | warn | error), and a human message
Validations:
1. prompt non-empty
2. url is http(s):// if provided
3. agent_provider in {auto, chrome-mcp}
4. SDK env var present (warn — SDK may resolve auth elsewhere)
5. node binary in PATH (required by both MCP servers) + version
6. chrome-mcp without --headless: warn that auto-connect needs Chrome
146+ with remote-debugging enabled (not auto-checkable)
7. output_dir writable (probe-write-and-delete)
Aggregate status is `error` if any check is `error`, otherwise `ok`.
Error kinds are `misuse` for prompt/url issues, `config_invalid` for
env/deps/output_dir issues — matching the schema-v1 error_kind enum.
Tests: 6 new in TestAgentDryRun covering: ok path, missing prompt
(misuse), bad url (misuse), unwritable output_dir (config_invalid),
that run_agent_capture is NOT called under --dry-run, and --help
mentions the flag with "Implies --json".
Closes the last medium-priority blocker on issue #62. Item #8
(`run --json` wrapped) intentionally deferred — orthogonal surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three P2 issues flagged by cubic-dev-ai:
1. Check npx availability separately from node
MCP servers shell out to `npx <package>` (not just node), so a
minimal Docker image with node-but-no-npx would pass dry-run and
then fail the real run. Added a dedicated `npx` check.
2. Use a unique probe filename for output_dir writability
The fixed `.dry_run_write_probe` could legitimately exist in a
user's output dir and would be deleted by the probe. Now uses
`.rae_dry_run_probe_{pid}_{8hex}` and refuses to touch any path
that already exists, guaranteeing we never clobber user data.
3. Resolve `would_run.model` per-SDK
Previously always read `claude_code_model` regardless of which
SDK was configured, so an opencode/copilot session would have a
misleading manifest. Now branches on `sdk` to pick
`opencode_model` / `copilot_model` / `claude_code_model` with
the matching default — mirroring the live capture path.
Adds 4 regression tests in TestAgentDryRun:
- test_dry_run_checks_npx_separately_from_node (mocks shutil.which)
- test_dry_run_probe_does_not_clobber_existing_files (creates a
canary file at the OLD probe path, asserts it survives)
- test_dry_run_resolves_correct_model_per_sdk (opencode case)
- test_dry_run_copilot_model_resolution (copilot case)
10/10 in TestAgentDryRun, 706 passing on full suite (5 pre-existing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kind-agent on PR #66 noted that `engineer --json` without RUN_ID fell through to Click's plain-text "Missing argument" error on stderr — wrappers expecting JSON for every malformed invocation would choke. Now `run_id` is declared optional at the click level. The function re-validates inline: - --json + missing RUN_ID → JSON misuse payload on stdout, exit 2 - plain + missing RUN_ID → standard Click "Usage / Error" on stderr, exit 2 (preserves the familiar UX) Adds 2 regression tests in TestEngineerJsonMissingRunId covering both paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9cc9bf6 to
70d5b57
Compare
Sync + comments addressedSync: rebased onto current @kind-agent — partial Ack
@cubic-dev-aiNo issues found — noted, thanks. @chatgpt-codex-connector / @greptile-appsEmpty marker reviews — no actionable suggestions. |
|
Re-running tests because the PR changed after the previous run. Triggered by Kind |
|
To use Codex here, create an environment for this repo. |
@kalil0321 Thanks for the thorough update, Kalil. Everything checks out:
Neither is a blocker, but if you want to knock them out while you're in this stack, happy to help with the patches. Otherwise the PR looks ready to merge from my side. |
|
Thanks for the detailed sync and the engineer boundary fix — the Status of the two open inline comments: P1 — P2 — This keeps the shape consistent so callers can always do Third-party P2 — OpenCode candidate keys: There's also a reviewer comment pointing out that The engineer boundary fix and the dry-run work look good to merge as-is; the two remaining items are self-contained and low-risk to address in a follow-up if you'd rather keep the stack moving. Tip: You can customize Greptile's behavior for this repo with |
This is likely a transient issue. You can re-trigger a run from the dashboard. |
Stacked on #65 (
feat/agent-friendly-followups).Knocks out 3 medium-priority items of the agent-friendliness backlog (#62) that all touch the JSON contract. Schema version stays 1 since the contract hasn't shipped to prod yet — fields are added in place.
Item #3 — Stable `usage` subset
Different SDKs emit different keys for the same things. Now
_normalize_usage()maps them into a stable subset and parks the SDK-native dict under.raw:```json
"usage": {
"input_tokens": 43,
"output_tokens": 13407,
"cache_read_tokens": 1038915,
"cache_write_tokens": 52825,
"total_cost_usd": 0.71100225,
"raw": { /* the full SDK-emitted dict, preserved as-is */ }
}
```
A wrapper can rely on the 5 top-level keys without breaking when the user switches SDK; power users still have full SDK info under
raw.Item #4 — Machine-readable `error_kind`
Agents no longer need to grep for
"[Errno 13]"inerror:error_kindmisuseconfig_invalidpermission_deniednetworkengine_failureinterruptedunknownClassification via isinstance checks on exceptions + substring fallback on plain messages. Misuse paths pass
error_kind_hint="misuse"explicitly.Item #9 — `--json-schema-version`
```bash
$ reverse-api-engineer --json-schema-version
1
```
Wrapper-friendly: query the contract version without invoking a real run.
Test plan
What's next on #62
Stack 2 will add `agent --dry-run` (item #5). Item #8 (`run --json`) deferred — orthogonal.
🤖 Generated with Claude Code
Summary by cubic
Adds stable usage normalization, machine-readable error_kind, a
--json-schema-versionflag, and a safeagent --dry-runmode with preflight checks; also fixesengineer --jsonto always emit JSON on missing RUN_ID. Addresses #62 items 3, 4, 5, and 9; schema_version stays 1 (additive).New Features
usage.raw.error_kindto all agent/engineer JSON payloads: misuse | config_invalid | permission_denied | network | engine_failure | interrupted | unknown. Correctly classifiesKeyboardInterruptand no-run cases.--json-schema-versionto print the current schema version and exit.agent --dry-runfor preflight validation. Implies--json. Emitsmode: "dry-run", awould_runmanifest (agent_provider, sdk, model, output_dir, headless), and achecks[]list. Verifies prompt/url, SDK env var presence,nodeandnpx, provider, and output-dir writability with a unique probe. Classifies prompt/url asmisuseand env/deps/output_dir asconfig_invalid. Exits 0 on ok, 1 on error.Bug Fixes
engineer --jsonwithout RUN_ID now returns a JSON misuse payload (not a plain Click error), keeping wrappers script-friendly.Written for commit 70d5b57. Summary will update on new commits.
Greptile Summary
This PR adds three agent-friendliness improvements to the JSON output contract: a
_normalize_usage()helper that maps SDK-specific token/cost keys to a stable subset (with the original dict preserved under.raw), a_classify_error()helper that maps exceptions and error strings to a machine-readableerror_kindfield, and a--json-schema-versionflag that lets wrappers query the contract version without running a capture._normalize_usage): handles key aliases across Claude/OpenCode/Copilot SDKs via a candidate-list approach;rawis always appended for power users._classify_error,_format_error_message): coversPermissionError,ConnectionError,TimeoutError,KeyboardInterrupt, and a set of substring patterns;error_kind_hintlets call-sites override the fallback.--json-schema-versionflag: added to themaingroup, echoesAGENT_JSON_SCHEMA_VERSIONand exits cleanly without invoking a subcommand.Confidence Score: 3/5
Safe to merge with the caveat that config_invalid will never be emitted despite being advertised in ERROR_KINDS and the PR description.
The config_invalid kind is documented in ERROR_KINDS and the PR description table, promising wrappers a dedicated signal for malformed config files or env vars, but _classify_error has no isinstance branch for any config-related exception and no call-site passes error_kind_hint='config_invalid'. Any config error silently falls through to 'unknown', and wrappers that branch on 'config_invalid' will never see it fire.
src/reverse_api/cli.py — specifically the _classify_error function and ERROR_KINDS declaration.
Important Files Changed
Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "feat(cli): stable usage normalization, e..." | Re-trigger Greptile