Skip to content

feat(cli): normalized usage + error_kind + --json-schema-version (issue #62 items 3, 4, 9)#66

Merged
kalil0321 merged 4 commits into
mainfrom
feat/agent-friendly-schema-v2
May 6, 2026
Merged

feat(cli): normalized usage + error_kind + --json-schema-version (issue #62 items 3, 4, 9)#66
kalil0321 merged 4 commits into
mainfrom
feat/agent-friendly-schema-v2

Conversation

@kalil0321
Copy link
Copy Markdown
Owner

@kalil0321 kalil0321 commented May 5, 2026

Stacked on #65 (feat/agent-friendly-followups).

Knocks out 3 medium-priority items of the agent-friendliness backlog (#62) that all touch the JSON contract. Schema version stays 1 since the contract hasn't shipped to prod yet — fields are added in place.

Item #3 — Stable `usage` subset

Different SDKs emit different keys for the same things. Now _normalize_usage() maps them into a stable subset and parks the SDK-native dict under .raw:

```json
"usage": {
"input_tokens": 43,
"output_tokens": 13407,
"cache_read_tokens": 1038915,
"cache_write_tokens": 52825,
"total_cost_usd": 0.71100225,
"raw": { /* the full SDK-emitted dict, preserved as-is */ }
}
```

A wrapper can rely on the 5 top-level keys without breaking when the user switches SDK; power users still have full SDK info under raw.

Item #4 — Machine-readable `error_kind`

Agents no longer need to grep for "[Errno 13]" in error:

error_kind When
misuse required arg missing, invalid combination
config_invalid config file or env var malformed
permission_denied filesystem / API permission denied
network DNS / connection / timeout / SSL
engine_failure SDK or capture engine produced no result
interrupted KeyboardInterrupt / SIGINT
unknown default fallback

Classification via isinstance checks on exceptions + substring fallback on plain messages. Misuse paths pass error_kind_hint="misuse" explicitly.

Item #9 — `--json-schema-version`

```bash
$ reverse-api-engineer --json-schema-version
1
```

Wrapper-friendly: query the contract version without invoking a real run.

Test plan

  • `uv run pytest tests/test_cli_followups.py tests/test_cli_agent_json.py tests/test_cli_engineer_command.py` — 61/61 pass
  • `uv run pytest` full suite — 696 pass; same 5 pre-existing failures
  • Manual smoke: `agent --json` misuse → `error_kind: "misuse"`; `--json-schema-version` → `1`; `--help` mentions the flag
  • End-to-end live with a real capture (the previous live test already validated the JSON contract; this is just additive)

What's next on #62

Stack 2 will add `agent --dry-run` (item #5). Item #8 (`run --json`) deferred — orthogonal.

🤖 Generated with Claude Code


Summary by cubic

Adds stable usage normalization, machine-readable error_kind, a --json-schema-version flag, and a safe agent --dry-run mode with preflight checks; also fixes engineer --json to always emit JSON on missing RUN_ID. Addresses #62 items 3, 4, 5, and 9; schema_version stays 1 (additive).

  • New Features

    • Normalize usage across SDKs into a stable subset for agent/engineer outputs: input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, total_cost_usd. Preserve full SDK data under usage.raw.
    • Add error_kind to all agent/engineer JSON payloads: misuse | config_invalid | permission_denied | network | engine_failure | interrupted | unknown. Correctly classifies KeyboardInterrupt and no-run cases.
    • Add root flag --json-schema-version to print the current schema version and exit.
    • Add agent --dry-run for preflight validation. Implies --json. Emits mode: "dry-run", a would_run manifest (agent_provider, sdk, model, output_dir, headless), and a checks[] list. Verifies prompt/url, SDK env var presence, node and npx, provider, and output-dir writability with a unique probe. Classifies prompt/url as misuse and env/deps/output_dir as config_invalid. Exits 0 on ok, 1 on error.
  • Bug Fixes

    • engineer --json without RUN_ID now returns a JSON misuse payload (not a plain Click error), keeping wrappers script-friendly.

Written for commit 70d5b57. Summary will update on new commits.

Greptile Summary

This PR adds three agent-friendliness improvements to the JSON output contract: a _normalize_usage() helper that maps SDK-specific token/cost keys to a stable subset (with the original dict preserved under .raw), a _classify_error() helper that maps exceptions and error strings to a machine-readable error_kind field, and a --json-schema-version flag that lets wrappers query the contract version without running a capture.

  • Usage normalization (_normalize_usage): handles key aliases across Claude/OpenCode/Copilot SDKs via a candidate-list approach; raw is always appended for power users.
  • Error classification (_classify_error, _format_error_message): covers PermissionError, ConnectionError, TimeoutError, KeyboardInterrupt, and a set of substring patterns; error_kind_hint lets call-sites override the fallback.
  • --json-schema-version flag: added to the main group, echoes AGENT_JSON_SCHEMA_VERSION and exits cleanly without invoking a subcommand.

Confidence Score: 3/5

Safe to merge with the caveat that config_invalid will never be emitted despite being advertised in ERROR_KINDS and the PR description.

The config_invalid kind is documented in ERROR_KINDS and the PR description table, promising wrappers a dedicated signal for malformed config files or env vars, but _classify_error has no isinstance branch for any config-related exception and no call-site passes error_kind_hint='config_invalid'. Any config error silently falls through to 'unknown', and wrappers that branch on 'config_invalid' will never see it fire.

src/reverse_api/cli.py — specifically the _classify_error function and ERROR_KINDS declaration.

Important Files Changed

Filename Overview
src/reverse_api/cli.py Adds _normalize_usage, _classify_error, _format_error_message helpers and wires them into both payload builders; adds --json-schema-version flag. config_invalid is documented in ERROR_KINDS but has no code path to produce it.
tests/test_cli_agent_json.py Extends payload key assertions to include error_kind, adds SDK key normalization inputs, and verifies engine_failure classification for the no-run case. Coverage looks solid for the changed paths.
tests/test_cli_followups.py Adds TestSchemaV2Normalization and TestJsonSchemaVersionFlag test classes covering _normalize_usage edge cases, all error-kind classification branches, and the --json-schema-version CLI flag. Well-structured and comprehensive.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
src/reverse_api/cli.py:96-104
**`config_invalid` is documented but unreachable**

`ERROR_KINDS` declares `"config_invalid"` and the PR description promises it for "config file or env var malformed" situations, but `_classify_error` has no isinstance branch for config-related exceptions (e.g., `ValueError`, `yaml.YAMLError`, custom config exceptions) and no substring patterns for config-related messages. Any config error will fall through to `"unknown"` unless a caller explicitly passes `error_kind_hint="config_invalid"` — which no current call-site does. A wrapper that branches on `error_kind == "config_invalid"` will silently never fire.

### Issue 2 of 2
src/reverse_api/cli.py:77-91
**`raw` key absent vs. present depending on input shape**

`_normalize_usage` returns `{}` (no `raw` key) when the input is `None` or `{}`, but returns `{"input_tokens": …, "raw": {…}}` for any non-empty dict. Consumers that always destructure `payload["usage"]["raw"]` (e.g., to log the full SDK dict) will get a `KeyError` on empty usage. A consistent shape — returning `{"raw": {}}` even for empty input — would let callers safely do `payload["usage"].get("raw", {})` without special-casing the empty path.

Reviews (1): Last reviewed commit: "feat(cli): stable usage normalization, e..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

@kalil0321 kalil0321 force-pushed the feat/agent-friendly-schema-v2 branch from e0dfafe to a0b43e1 Compare May 5, 2026 23:58
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0b43e1c42

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/reverse_api/cli.py
Comment on lines +68 to +69
"cache_write_tokens": ("cache_creation_input_tokens", "cache_write_tokens"),
"total_cost_usd": ("estimated_cost_usd", "total_cost_usd", "total_cost"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include OpenCode usage keys in normalization

When the configured SDK is OpenCode, OpenCodeEngineer records write-cache tokens and cost as cache_creation_tokens and cost (src/reverse_api/opencode_engineer.py lines 633-634), but this new candidate list only recognizes cache_creation_input_tokens/cache_write_tokens and estimated_cost_usd/total_cost_usd/total_cost. In that environment, agent --json / engineer --json will omit the promised stable cache_write_tokens and total_cost_usd top-level fields even though the values are present under raw, so wrappers using the stable subset lose cost/cache data after switching to OpenCode.

Useful? React with 👍 / 👎.

Comment thread src/reverse_api/cli.py
Comment on lines +96 to +104
"network", # DNS / TCP / TLS / timeout
"engine_failure", # SDK or capture engine crashed mid-run
"interrupted", # KeyboardInterrupt / SIGINT
"unknown", # default fallback
)


def _format_error_message(error: str | BaseException | None) -> str | None:
"""Render an exception or string into a human-readable error message.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 config_invalid is documented but unreachable

ERROR_KINDS declares "config_invalid" and the PR description promises it for "config file or env var malformed" situations, but _classify_error has no isinstance branch for config-related exceptions (e.g., ValueError, yaml.YAMLError, custom config exceptions) and no substring patterns for config-related messages. Any config error will fall through to "unknown" unless a caller explicitly passes error_kind_hint="config_invalid" — which no current call-site does. A wrapper that branches on error_kind == "config_invalid" will silently never fire.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/reverse_api/cli.py
Line: 96-104

Comment:
**`config_invalid` is documented but unreachable**

`ERROR_KINDS` declares `"config_invalid"` and the PR description promises it for "config file or env var malformed" situations, but `_classify_error` has no isinstance branch for config-related exceptions (e.g., `ValueError`, `yaml.YAMLError`, custom config exceptions) and no substring patterns for config-related messages. Any config error will fall through to `"unknown"` unless a caller explicitly passes `error_kind_hint="config_invalid"` — which no current call-site does. A wrapper that branches on `error_kind == "config_invalid"` will silently never fire.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread src/reverse_api/cli.py
Comment on lines +77 to +91
"""
if not raw or not isinstance(raw, dict):
return {}
out: dict = {}
for stable_key, candidates in _STABLE_USAGE_KEYS.items():
for c in candidates:
if c in raw:
out[stable_key] = raw[c]
break
out["raw"] = raw
return out


# Machine-readable error categories. Wrappers can react differently to each
# without pattern-matching on the human-readable `error` string.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 raw key absent vs. present depending on input shape

_normalize_usage returns {} (no raw key) when the input is None or {}, but returns {"input_tokens": …, "raw": {…}} for any non-empty dict. Consumers that always destructure payload["usage"]["raw"] (e.g., to log the full SDK dict) will get a KeyError on empty usage. A consistent shape — returning {"raw": {}} even for empty input — would let callers safely do payload["usage"].get("raw", {}) without special-casing the empty path.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/reverse_api/cli.py
Line: 77-91

Comment:
**`raw` key absent vs. present depending on input shape**

`_normalize_usage` returns `{}` (no `raw` key) when the input is `None` or `{}`, but returns `{"input_tokens": …, "raw": {…}}` for any non-empty dict. Consumers that always destructure `payload["usage"]["raw"]` (e.g., to log the full SDK dict) will get a `KeyError` on empty usage. A consistent shape — returning `{"raw": {}}` even for empty input — would let callers safely do `payload["usage"].get("raw", {})` without special-casing the empty path.

How can I resolve this? If you propose a fix, please make it concise.

@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

Test Results

🛡️ 5/6

Results

# Test Status Details
1 Targeted pytest for changed CLI JSON surfaces ✅ passed Ran uv run pytest tests/test_cli_followups.py tests/test_cli_agent_json.py tests/test_cli_engineer_command.py -q; all 61 tests passed, covering usage normalization, error_kind, engineer JSON parity, and the schema-version work.
2 --json-schema-version manual CLI verification ✅ passed uv run reverse-api-engineer --json-schema-version printed exactly 1 with exit code 0 and no stderr. uv run reverse-api-engineer --help also documents the new flag.
3 Manual JSON misuse/error-kind smoke ✅ passed uv run reverse-api-engineer agent --json without required prompt returned JSON with status: "error", error_kind: "misuse", schema version 1, and a sensible human-readable error string.
4 Non-JSON CLI smoke ✅ passed Non-JSON behavior remained sensible: top-level invocation without a TTY exited 2 with help plus the stdin-is-not-a-TTY message, and engineer without args exited 2 with standard Click usage output.
5 Full regression suite ⚠️ inconclusive uv run pytest -q reported 5 failing tests outside the changed CLI JSON area, so the branch is not fully green even though the PR-targeted coverage passed.

Issues Found

Full regression suite: the overall suite still has 5 failing tests unrelated to the touched JSON-contract behavior, including at least tests.test_base_engineer.TestBaseEngineerInit.test_existing_client_language_falls_back_to_newest_file. I did not see evidence that the new CLI JSON changes caused these failures, but the branch is not fully clean end-to-end.

Engineer parse-boundary JSON caveat: uv run reverse-api-engineer engineer --json without RUN_ID still fails at Click argument parsing and emits plain stderr/help rather than a JSON payload. This appears to be outside the new runtime error-classification path, but it is worth keeping in mind for wrappers expecting JSON from every malformed invocation.

Summary

The PR’s targeted behavior looks good: the affected tests all passed, the new --json-schema-version flag works as intended, and manual misuse probing showed the new machine-readable error_kind behavior for JSON mode. Confidence is good on the changed CLI contract, though the branch still has unrelated full-suite failures and I only manually exercised a subset of runtime error-classification paths beyond the dedicated tests.


View full run details · Tested by Kind

I tested e0dfafe (feat(cli): stable usage normalization, error_kind enum, --json-schema-version)

📹 View browser recording

kalil0321 and others added 4 commits May 6, 2026 00:37
…-version

Knocks out items #3, #4, #9 of the agent-friendliness backlog (#62).
Schema version stays at 1 since the contract has not shipped to prod —
fields are added in place rather than bumping versions.

Item #3 — Stable `usage` subset
  Different SDKs (Claude / OpenCode / Copilot) emit different keys for
  the same concepts (cache_creation_input_tokens vs cache_write_tokens,
  estimated_cost_usd vs total_cost vs total_cost_usd, etc). New helper
  `_normalize_usage()` maps SDK-native keys into a stable subset
  {input_tokens, output_tokens, cache_read_tokens, cache_write_tokens,
  total_cost_usd} and parks the SDK-native dict under `usage.raw` for
  power users. Wrappers can rely on the top-level keys without breaking
  when the user switches SDK.

Item #4 — Machine-readable `error_kind`
  Previously agents had to pattern-match on prose like "[Errno 13]
  Permission denied: '/x'" to decide whether to retry / abort / surface
  to the user. New `error_kind` field on every `agent --json` and
  `engineer --json` payload, with a fixed enum:
    misuse | config_invalid | permission_denied | network |
    engine_failure | interrupted | unknown
  Inferred via `_classify_error()` which uses isinstance checks on
  exceptions (KeyboardInterrupt, PermissionError, ConnectionError,
  TimeoutError) and substring fallback on plain messages. Misuse paths
  now pass `error_kind_hint="misuse"` explicitly.

Item #9 — `--json-schema-version`
  Top-level `reverse-api-engineer --json-schema-version` prints the
  schema version (currently `1`) and exits 0. Lets a wrapper query the
  contract version without having to invoke a real run.

Bonus: KeyboardInterrupt now formats as the literal "interrupted" in
the `error` field (str(KeyboardInterrupt()) is empty); empty exception
messages fall back to the class name.

Tests: 9 new tests in test_cli_followups.py (TestSchemaV2Normalization,
TestJsonSchemaVersionFlag) + updated existing tests to assert on the
normalized usage shape and the new error_kind field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…item 5)

Lets agents and CI wrappers sanity-check inputs (prompt, url, config,
env vars, deps, output dir writability) without launching a browser
or burning LLM tokens. Implies --json since --dry-run is fundamentally
about machine-parseable validation.

Output shape: same top-level fields as `agent --json` (schema_version,
status, run_id=null, prompt, url, mode="dry-run", error, error_kind)
plus:
  - `would_run` sub-object: agent_provider, sdk, model, output_dir,
    headless — what an actual `agent` invocation would do
  - `checks` array: one entry per validation step with name, status
    (ok | warn | error), and a human message

Validations:
  1. prompt non-empty
  2. url is http(s):// if provided
  3. agent_provider in {auto, chrome-mcp}
  4. SDK env var present (warn — SDK may resolve auth elsewhere)
  5. node binary in PATH (required by both MCP servers) + version
  6. chrome-mcp without --headless: warn that auto-connect needs Chrome
     146+ with remote-debugging enabled (not auto-checkable)
  7. output_dir writable (probe-write-and-delete)

Aggregate status is `error` if any check is `error`, otherwise `ok`.
Error kinds are `misuse` for prompt/url issues, `config_invalid` for
env/deps/output_dir issues — matching the schema-v1 error_kind enum.

Tests: 6 new in TestAgentDryRun covering: ok path, missing prompt
(misuse), bad url (misuse), unwritable output_dir (config_invalid),
that run_agent_capture is NOT called under --dry-run, and --help
mentions the flag with "Implies --json".

Closes the last medium-priority blocker on issue #62. Item #8
(`run --json` wrapped) intentionally deferred — orthogonal surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three P2 issues flagged by cubic-dev-ai:

1. Check npx availability separately from node
   MCP servers shell out to `npx <package>` (not just node), so a
   minimal Docker image with node-but-no-npx would pass dry-run and
   then fail the real run. Added a dedicated `npx` check.

2. Use a unique probe filename for output_dir writability
   The fixed `.dry_run_write_probe` could legitimately exist in a
   user's output dir and would be deleted by the probe. Now uses
   `.rae_dry_run_probe_{pid}_{8hex}` and refuses to touch any path
   that already exists, guaranteeing we never clobber user data.

3. Resolve `would_run.model` per-SDK
   Previously always read `claude_code_model` regardless of which
   SDK was configured, so an opencode/copilot session would have a
   misleading manifest. Now branches on `sdk` to pick
   `opencode_model` / `copilot_model` / `claude_code_model` with
   the matching default — mirroring the live capture path.

Adds 4 regression tests in TestAgentDryRun:
  - test_dry_run_checks_npx_separately_from_node (mocks shutil.which)
  - test_dry_run_probe_does_not_clobber_existing_files (creates a
    canary file at the OLD probe path, asserts it survives)
  - test_dry_run_resolves_correct_model_per_sdk (opencode case)
  - test_dry_run_copilot_model_resolution (copilot case)

10/10 in TestAgentDryRun, 706 passing on full suite (5 pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kind-agent on PR #66 noted that `engineer --json` without RUN_ID
fell through to Click's plain-text "Missing argument" error on
stderr — wrappers expecting JSON for every malformed invocation
would choke.

Now `run_id` is declared optional at the click level. The function
re-validates inline:
  - --json + missing RUN_ID → JSON misuse payload on stdout, exit 2
  - plain + missing RUN_ID → standard Click "Usage / Error" on stderr,
    exit 2 (preserves the familiar UX)

Adds 2 regression tests in TestEngineerJsonMissingRunId covering both
paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kalil0321 kalil0321 force-pushed the feat/agent-friendly-schema-v2 branch from 9cc9bf6 to 70d5b57 Compare May 6, 2026 00:39
@kalil0321 kalil0321 changed the base branch from feat/agent-friendly-followups to main May 6, 2026 00:39
@kalil0321
Copy link
Copy Markdown
Owner Author

Sync + comments addressed

Sync: rebased onto current main (which already includes #61 and #65), and changed PR base from feat/agent-friendly-followupsmain. The branch now contains the schema-v2 work + the dry-run work that was merged in via #67. 4 unique commits on top of main; 708/713 in the suite (5 pre-existing).

@kind-agent — partial Ack

  • Tests 1, 2, 3, 4 passing: noted, thanks.
  • Test 5 (full suite inconclusive): No ACK as a regression. The 5 failing tests (test_base_engineer::test_existing_client_language_falls_back_to_newest_file, test_opencode_ui::test_sync_flash, test_sync::test_is_temporary_file*, test_tui::test_thinking_truncation) all reproduce on main (verified via stash + run on previous PRs in this stack); they're tracked separately and not introduced by any PR in this stack.
  • "Engineer parse-boundary JSON caveat": Ack — fixed in 70d5b57. You were right: engineer --json without RUN_ID fell through to Click's plain-text "Missing argument" error, which wrappers expecting JSON couldn't parse. run_id is now declared optional at the click level; the function validates inline:
    • --json + missing RUN_ID → JSON misuse payload on stdout, exit 2
    • plain + missing RUN_ID → standard Click Usage / Error on stderr, exit 2 (familiar UX preserved)
    • 2 regression tests in TestEngineerJsonMissingRunId cover both paths.

@cubic-dev-ai

No issues found — noted, thanks.

@chatgpt-codex-connector / @greptile-apps

Empty marker reviews — no actionable suggestions.

@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

Re-running tests because the PR changed after the previous run.


Triggered by Kind

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented May 6, 2026

Sync + comments addressed

Sync: rebased onto current main (which already includes #61 and #65), and changed PR base from feat/agent-friendly-followupsmain. The branch now contains the schema-v2 work + the dry-run work that was merged in via #67. 4 unique commits on top of main; 708/713 in the suite (5 pre-existing).

@kind-agent — partial Ack

...

@kalil0321 Thanks for the thorough update, Kalil. Everything checks out:

  • Rebase onto main — clean, and the 5 pre-existing failures being non-blocking is well-documented.
  • engineer --json without RUN_ID — the fix looks solid: optional at the Click layer, re-validated inline, JSON payload vs. Click error depending on --json. The two regression tests in TestEngineerJsonMissingRunId cover both branches well.
  • No response on the Greptile comments from the PR description — just a heads up in case you missed them, they flagged two things worth a look:
    1. config_invalid unreachable — declared in ERROR_KINDS and in the PR description table, but no code path produces it (no isinstance branch for config-related exceptions, no error_kind_hint call-site passes it). A wrapper that branches on error_kind == "config_invalid" will silently never fire.
    2. raw key shape inconsistency_normalize_usage returns {} for None/empty input (no raw key), but returns {"input_tokens": …, "raw": {…}} for non-empty dicts. Callers that always destructure payload["usage"]["raw"] hit a KeyError on the empty path.

Neither is a blocker, but if you want to knock them out while you're in this stack, happy to help with the patches. Otherwise the PR looks ready to merge from my side.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Thanks for the detailed sync and the engineer boundary fix — the run_id=required=False + inline revalidation approach is clean, and the two-path test coverage (TestEngineerJsonMissingRunId) is exactly right.

Status of the two open inline comments:

P1 — config_invalid reachability: Partially addressed. _build_dry_run_payload now directly sets error_kind = "config_invalid" for non-prompt/url check failures (node, npx, API key, output_dir), bypassing _classify_error entirely. This covers the dry-run path. The original concern — that _classify_error itself has no code path to emit it for live agent/engineer runs — is still true. If a live run throws a ValueError from a malformed config, it'll still land on "unknown". Whether that matters depends on whether config errors can surface during live capture; if they can only surface during pre-flight validation (which now has dry-run), consider closing this as won't-fix and removing "config_invalid" from ERROR_KINDS documentation, or add a note that it's dry-run only.

P2 — raw key inconsistent shape: Still open. _normalize_usage(None) and _normalize_usage({}) return {} with no raw key, while any non-empty input returns {"input_tokens": …, "raw": {…}}. The fix is one line:

    if not raw or not isinstance(raw, dict):
        return {"raw": {}}

This keeps the shape consistent so callers can always do payload["usage"]["raw"] without special-casing the empty path.

Third-party P2 — OpenCode candidate keys: There's also a reviewer comment pointing out that OpenCodeEngineer emits cache_creation_tokens and cost (not cache_creation_input_tokens/estimated_cost_usd), so switching to OpenCode will silently drop cache_write_tokens and total_cost_usd from the stable subset despite them being present in raw. Worth adding those aliases to _STABLE_USAGE_KEYS if you have the source to confirm the exact key names.

The engineer boundary fix and the dry-run work look good to merge as-is; the two remaining items are self-contained and low-risk to address in a follow-up if you'd rather keep the stack moving.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

@kalil0321 kalil0321 merged commit c596386 into main May 6, 2026
1 of 2 checks passed
@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

⚠️ Error — The test run failed unexpectedly.

Bad escaped character in JSON at position 1109 (line 3 column 309)

This is likely a transient issue. You can re-trigger a run from the dashboard.

@kalil0321 kalil0321 mentioned this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant