Skip to content

feat(cli): agent --dry-run for pre-flight validation (issue #62 item 5)#67

Merged
kalil0321 merged 2 commits into
feat/agent-friendly-schema-v2from
feat/agent-dry-run
May 6, 2026
Merged

feat(cli): agent --dry-run for pre-flight validation (issue #62 item 5)#67
kalil0321 merged 2 commits into
feat/agent-friendly-schema-v2from
feat/agent-dry-run

Conversation

@kalil0321
Copy link
Copy Markdown
Owner

@kalil0321 kalil0321 commented May 6, 2026

Stacked on #66 (feat/agent-friendly-schema-v2).

Last medium-priority item from the agent-friendliness backlog (#62). Lets agents and CI wrappers sanity-check inputs without launching a browser or burning LLM tokens.

Example

```bash
$ reverse-api-engineer agent --dry-run -p "fetch jobs" -u https://jobs.example.com | jq
{
"schema_version": 1,
"status": "ok",
"mode": "dry-run",
"prompt": "fetch jobs",
"url": "https://jobs.example.com",
"would_run": {
"agent_provider": "auto",
"sdk": "claude",
"model": "claude-sonnet-4-6",
"output_dir": "/home/kalil/.reverse-api/runs",
"headless": false
},
"checks": [
{ "name": "prompt", "status": "ok", "message": "10 chars" },
{ "name": "url", "status": "ok", "message": "https://jobs.example.com" },
{ "name": "agent_provider", "status": "ok", "message": "auto" },
{ "name": "sdk:claude", "status": "warn", "message": "ANTHROPIC_API_KEY not set in env (the SDK may still resolve auth via a config file)" },
{ "name": "node", "status": "ok", "message": "v22.22.2" },
{ "name": "output_dir", "status": "ok", "message": "/home/kalil/.reverse-api/runs" }
],
"error": null,
"error_kind": null
}
```

Validations

Check Status semantics
prompt non-empty error if missing
url is http(s):// if given error if invalid scheme
agent_provider in {auto, chrome-mcp} error if unknown
SDK env var present warn (SDK may resolve auth elsewhere)
node binary + version error if missing, ok with version, warn if can't query
chrome-mcp without --headless warn about auto-connect requirements
output_dir writable (probe-write) error if not writable

Aggregate status is error if any check is error, else ok. error_kind is misuse for prompt/url issues, config_invalid for env/deps/output_dir issues.

Implies --json

--dry-run is fundamentally about machine-parseable validation, so it implies --json. Logs go to stderr, JSON to stdout.

Test plan

  • 6 new tests (TestAgentDryRun): ok path, missing prompt → misuse, bad url → misuse, unwritable output_dir → config_invalid, does NOT call run_agent_capture (no browser, no LLM, no cost), --help advertises the flag
  • All 702 tests pass on full suite (5 failures are pre-existing on main)
  • Live smoke: `agent --dry-run` ok → exit 0; `agent --dry-run` (no prompt) → exit 1 with error_kind: "misuse"

What's next on #62

Only item #8 (`run --json` wrapped, ~25 lines) remains, intentionally deferred — orthogonal surface that wraps a script's stdout/stderr/exit_code rather than the agent contract.

🤖 Generated with Claude Code


Summary by cubic

Add agent --dry-run to validate prompts, URLs, config, env, and deps without launching a browser or spending LLM tokens. Adds npx checks, a unique output-dir probe, and per-SDK model resolution; addresses #62 item 5.

  • New Features
    • New --dry-run for agent validates: prompt, URL scheme, agent_provider (auto | chrome-mcp), SDK env var, node and npx presence, headed chrome-mcp warning, and output-dir writability via a unique, non-clobbering probe.
    • Implies --json: JSON to stdout, logs to stderr.
    • Payload matches agent --json with mode: "dry-run", plus would_run (agent_provider, sdk, model resolved per SDK, output_dir, headless) and checks.
    • Status is error if any check fails; error_kind is misuse (prompt/url) or config_invalid (env/deps/output_dir). Exits 0/1.
    • Skips run_agent_capture entirely (no browser, no LLM).

Written for commit 060e527. Summary will update on new commits.

Greptile Summary

This PR adds --dry-run to the agent command, letting agents and CI pipelines validate prompt/URL/config/env without launching a browser or consuming LLM tokens. It introduces _build_dry_run_payload which runs seven sequential checks and returns a structured JSON manifest, and wires it into the agent command so it short-circuits before any real work begins.

  • _build_dry_run_payload covers prompt presence, URL scheme, agent_provider validity, SDK env-var presence (warn-only), Node.js availability, headed chrome-mcp auto-connect warning, and output-dir writability — aggregating to error/ok with misuse vs config_invalid error kinds.
  • Six new tests in TestAgentDryRun cover the happy path, both misuse error cases, config-invalid for an unwritable dir, the no-browser invariant, and --help text.

Confidence Score: 4/5

Safe to merge; the dry-run path exits before any browser or LLM work, keeping the existing agent flow completely untouched.

The implementation is correct and well-tested. The one issue is that a failed probe.unlink() after a successful probe.write_text() causes the output-dir check to be falsely marked as an error and leaves a stray .dry_run_write_probe file on disk — a real but low-probability edge case that does not affect the browser-launch guard or any existing behavior.

The probe-file cleanup in _build_dry_run_payload inside src/reverse_api/cli.py is the only spot worth a second look before merging.

Important Files Changed

Filename Overview
src/reverse_api/cli.py Adds _build_dry_run_payload and --dry-run flag to the agent command. Logic is sound; minor probe-file cleanup issue if unlink() raises after a successful write.
tests/test_cli_followups.py Adds 6 new tests covering ok path, misuse errors, config errors, no-browser invariant, and help text. Coverage is thorough and assertions are correct.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/reverse_api/cli.py:282-285
If `probe.unlink()` raises (e.g., a race condition where another process deletes the file first, or an OS-level permission quirk), the outer `except Exception` catches it and marks `output_dir` as an error — even though the directory was demonstrably writable. The probe file is also left behind as a stray `.dry_run_write_probe` artifact. Adding `missing_ok=True` makes the unlink unconditional and avoids both problems.

```suggestion
        probe = base / ".dry_run_write_probe"
        probe.write_text("")
        try:
            probe.unlink(missing_ok=True)
        except Exception:
            pass  # write succeeded; ignore cleanup failure
        checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
```

Reviews (1): Last reviewed commit: "feat(cli): agent --dry-run for safe pre-..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

…item 5)

Lets agents and CI wrappers sanity-check inputs (prompt, url, config,
env vars, deps, output dir writability) without launching a browser
or burning LLM tokens. Implies --json since --dry-run is fundamentally
about machine-parseable validation.

Output shape: same top-level fields as `agent --json` (schema_version,
status, run_id=null, prompt, url, mode="dry-run", error, error_kind)
plus:
  - `would_run` sub-object: agent_provider, sdk, model, output_dir,
    headless — what an actual `agent` invocation would do
  - `checks` array: one entry per validation step with name, status
    (ok | warn | error), and a human message

Validations:
  1. prompt non-empty
  2. url is http(s):// if provided
  3. agent_provider in {auto, chrome-mcp}
  4. SDK env var present (warn — SDK may resolve auth elsewhere)
  5. node binary in PATH (required by both MCP servers) + version
  6. chrome-mcp without --headless: warn that auto-connect needs Chrome
     146+ with remote-debugging enabled (not auto-checkable)
  7. output_dir writable (probe-write-and-delete)

Aggregate status is `error` if any check is `error`, otherwise `ok`.
Error kinds are `misuse` for prompt/url issues, `config_invalid` for
env/deps/output_dir issues — matching the schema-v1 error_kind enum.

Tests: 6 new in TestAgentDryRun covering: ok path, missing prompt
(misuse), bad url (misuse), unwritable output_dir (config_invalid),
that run_agent_capture is NOT called under --dry-run, and --help
mentions the flag with "Implies --json".

Closes the last medium-priority blocker on issue #62. Item #8
(`run --json` wrapped) intentionally deferred — orthogonal surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8a23c51a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/reverse_api/cli.py Outdated
"would_run": {
"agent_provider": agent_provider,
"sdk": sdk,
"model": model or config_manager.get("claude_code_model", "claude-sonnet-4-6"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Report the SDK-specific model in dry-run

When the configured SDK is opencode or copilot, this would_run.model value does not match the real agent invocation: run_auto_capture ignores the CLI model for those SDKs and uses opencode_model or copilot_model from config instead. In those configurations agent --dry-run can approve and display a Claude model even though the actual run will use a different SDK/model, making the pre-flight manifest misleading for wrappers that rely on it.

Useful? React with 👍 / 👎.

Comment thread src/reverse_api/cli.py
Comment on lines +254 to +256
node = shutil.which("node")
if node is None:
checks.append({
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check for npx, not just node

In environments where node is present but npx is not on PATH, this check reports the dependency as ok even though every MCP server configuration launched by the agent uses npx as the executable. That makes agent --dry-run succeed while the subsequent real agent run fails before starting the browser, so the pre-flight validation misses the actual required binary.

Useful? React with 👍 / 👎.

Comment thread src/reverse_api/cli.py Outdated
Comment on lines +282 to +285
probe = base / ".dry_run_write_probe"
probe.write_text("")
probe.unlink()
checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 If probe.unlink() raises (e.g., a race condition where another process deletes the file first, or an OS-level permission quirk), the outer except Exception catches it and marks output_dir as an error — even though the directory was demonstrably writable. The probe file is also left behind as a stray .dry_run_write_probe artifact. Adding missing_ok=True makes the unlink unconditional and avoids both problems.

Suggested change
probe = base / ".dry_run_write_probe"
probe.write_text("")
probe.unlink()
checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
probe = base / ".dry_run_write_probe"
probe.write_text("")
try:
probe.unlink(missing_ok=True)
except Exception:
pass # write succeeded; ignore cleanup failure
checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/reverse_api/cli.py
Line: 282-285

Comment:
If `probe.unlink()` raises (e.g., a race condition where another process deletes the file first, or an OS-level permission quirk), the outer `except Exception` catches it and marks `output_dir` as an error — even though the directory was demonstrably writable. The probe file is also left behind as a stray `.dry_run_write_probe` artifact. Adding `missing_ok=True` makes the unlink unconditional and avoids both problems.

```suggestion
        probe = base / ".dry_run_write_probe"
        probe.write_text("")
        try:
            probe.unlink(missing_ok=True)
        except Exception:
            pass  # write succeeded; ignore cleanup failure
        checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
```

How can I resolve this? If you propose a fix, please make it concise.

@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

Test Results

🛡️ 5.5/6

Results

# Test Status Details
1 Targeted TestAgentDryRun pytest coverage ✅ passed Ran pytest -q tests/test_cli_followups.py -k TestAgentDryRun; all 6 dry-run tests passed, covering ok/error aggregate status, error_kind mapping, help text, warnings, and exit code behavior.
2 Broader CLI regression file ✅ passed Ran pytest -q tests/test_cli_followups.py; all 43 tests in the CLI follow-up file passed, so the changed CLI surface did not regress within its adjacent test coverage.
3 Manual agent --dry-run CLI probes ✅ passed Verified direct dry-run behavior with controlled config/env and patched execution entrypoints: ok JSON output with real node on PATH, missing SDK env as warning, missing prompt/invalid URL as misuse, missing node/unknown provider/unknown SDK/unwritable output dir as config_invalid, chrome-mcp without --headless warning, and no actual agent execution entrypoint invocation during dry-run.
4 Repository-wide pytest suite ⚠️ inconclusive A full pytest -q run reported many existing failures outside the changed CLI area (starting in tests unrelated to this PR). This prevents using the whole suite as a clean regression signal for the branch, but it did not block validation of the dry-run change itself.

Issues Found

Repository-wide pytest suite: The full test suite is not green in this environment due to numerous unrelated failures outside the agent --dry-run change. I did not see evidence that the new dry-run behavior caused those failures, but they limit branch-wide regression confidence beyond the CLI-focused coverage.

Summary

The agent --dry-run change behaved as intended in targeted tests and direct CLI probes. The new validation path produced machine-readable JSON, returned the expected status/error semantics, and avoided invoking the real agent execution path. The only notable risk is that the repository-wide suite is already failing elsewhere, so overall branch health cannot be inferred from a full-suite pass.


View full run details · Tested by Kind

I tested d8a23c5 (feat(cli): agent --dry-run for safe pre-flight validation (issue #62 item 5))

📹 View browser recording

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/reverse_api/cli.py">

<violation number="1" location="src/reverse_api/cli.py:254">
P2: The pre-flight check verifies `node` is on PATH, but the actual MCP server configurations launched by the agent use `npx` as the executable. In environments where `node` is installed but `npx` is missing (e.g., minimal Docker images), dry-run will report ok while the real agent run will fail. Add a check for `npx` availability as well.</violation>

<violation number="2" location="src/reverse_api/cli.py:282">
P2: Using a fixed `.dry_run_write_probe` filename can overwrite and delete an existing file in the output directory during validation.</violation>

<violation number="3" location="src/reverse_api/cli.py:318">
P2: The `would_run.model` field always reads from `claude_code_model` regardless of which SDK is configured. When `sdk` is `opencode` or `copilot`, the actual agent run resolves the model from a different config key (e.g., `opencode_model`, `copilot_model`), so the dry-run manifest will report an incorrect model for those SDKs. Consider branching on `sdk` to select the appropriate model config key.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/reverse_api/cli.py Outdated
Comment thread src/reverse_api/cli.py
Comment thread src/reverse_api/cli.py Outdated
Three P2 issues flagged by cubic-dev-ai:

1. Check npx availability separately from node
   MCP servers shell out to `npx <package>` (not just node), so a
   minimal Docker image with node-but-no-npx would pass dry-run and
   then fail the real run. Added a dedicated `npx` check.

2. Use a unique probe filename for output_dir writability
   The fixed `.dry_run_write_probe` could legitimately exist in a
   user's output dir and would be deleted by the probe. Now uses
   `.rae_dry_run_probe_{pid}_{8hex}` and refuses to touch any path
   that already exists, guaranteeing we never clobber user data.

3. Resolve `would_run.model` per-SDK
   Previously always read `claude_code_model` regardless of which
   SDK was configured, so an opencode/copilot session would have a
   misleading manifest. Now branches on `sdk` to pick
   `opencode_model` / `copilot_model` / `claude_code_model` with
   the matching default — mirroring the live capture path.

Adds 4 regression tests in TestAgentDryRun:
  - test_dry_run_checks_npx_separately_from_node (mocks shutil.which)
  - test_dry_run_probe_does_not_clobber_existing_files (creates a
    canary file at the OLD probe path, asserts it survives)
  - test_dry_run_resolves_correct_model_per_sdk (opencode case)
  - test_dry_run_copilot_model_resolution (copilot case)

10/10 in TestAgentDryRun, 706 passing on full suite (5 pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kalil0321
Copy link
Copy Markdown
Owner Author

@cubic-dev-ai — all 3 issues acknowledged and fixed in 060e527

  • P2 (cli.py:254) — npx check missing alongside node: Ack. Confirmed: both MCP servers (rae-playwright-mcp, chrome-devtools-mcp) are launched via npx <package>, so node-without-npx (e.g. minimal Docker images) would slip past dry-run and crash the real run. Added a dedicated npx check; missing npx now reports error_kind: "config_invalid". Regression test: test_dry_run_checks_npx_separately_from_node mocks shutil.which to simulate the failure mode.

  • P2 (cli.py:282) — probe filename could clobber existing data: Ack. The fixed .dry_run_write_probe could legitimately exist in someone's output dir and would have been silently deleted. Probe is now .rae_dry_run_probe_{pid}_{8-char-hex} (collision-resistant), and we refuse to touch any path that already exists. Regression test creates a canary at the OLD probe path with content "user data — do not delete" and asserts it survives.

  • P2 (cli.py:318) — would_run.model ignores SDK: Ack. Previously hard-coded claude_code_model; now branches on sdk to read opencode_model / copilot_model / claude_code_model with the matching default — same logic as the live capture path. Two regression tests cover the opencode and copilot cases.

@kind-agent — partial Ack

  • Tests 1, 2, 3 passing: noted, thanks.
  • Test 4 (full suite inconclusive): the 5 failures (test_base_engineer, test_opencode_ui, test_sync ×2, test_tui) all reproduce on main — verified earlier on this stack. They're tracked separately and not introduced by any PR in this stack.

10/10 in TestAgentDryRun post-fix; 706 pass on full suite.

@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

Re-running tests because the PR changed after the previous run.


Triggered by Kind

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented May 6, 2026

@cubic-dev-ai — all 3 issues acknowledged and fixed in 060e527

  • P2 (cli.py:254) — npx check missing alongside node: Ack. Confirmed: both MCP servers (rae-playwright-mcp, chrome-devtools-mcp) are launched via npx <package>, so node-without-npx (e.g. minimal Docker images) would slip past dry-run and crash the real run. Added a dedicated npx check; missing npx now reports error_kind: "config_invalid". Regression test: test_dry_run_checks_npx_separately_from_node mocks shutil.which to simulate the failure mode.

  • P2 (cli.py:282) — probe filename could clobber existing data: Ack. The fixed .dry_run_write_probe could legitimately exist in someone's output dir and would have been silently deleted. Probe is now .rae_dry_run_probe_{pid}_{8-char-hex} (collision-resistant), and we refuse to touch any path that already exists. Regression test creates a canary at the OLD probe path with content "user data — do not delete" and asserts it survives.
    ...

@kalil0321 Thanks for the thorough follow-up, and for addressing all three points from the earlier review.

Quick confirmation after looking at the updated code in 060e527:

  1. npx check — the dedicated npx block at line ~217 with the separate test mocking shutil.which looks right. Good call on catching the Docker-minimal-image footgun.

  2. Unique probe filename — the {pid}_{8hex} scheme plus the probe.exists() guard means the old fixed-name clobber risk is gone. The regression test that creates .dry_run_write_probe as a canary and asserts it survives is exactly the right kind of regression guard.

  3. SDK-aware model resolution — the sdk→model_key/default_model mapping now mirrors the live path. test_dry_run_resolves_correct_model_per_sdk confirming "irrelevant" is not in the resolved model is a clean assertion.

10/10 in TestAgentDryRun and 706 passing across the full suite (with the 5 pre-existing failures confirmed on main) — that's solid coverage. No further concerns from my side.

@kind-agent
Copy link
Copy Markdown

kind-agent Bot commented May 6, 2026

Test Results

🛡️ 5.5/6

Results

# Test Status Details
1 Targeted agent --dry-run regressions ✅ passed Ran pytest -q tests/test_cli_followups.py -k 'DryRun or dry_run' after installing the package in editable mode; 10 tests passed, 37 deselected. This re-covered the newly acknowledged fixes for separate npx detection, collision-safe output-dir probe handling, and SDK-specific would_run.model selection.
2 Adjacent CLI follow-up coverage ✅ passed Ran pytest -q tests/test_cli_followups.py; all 47 tests in the file passed, indicating no regression in nearby CLI/help/JSON follow-up behavior on this branch.
3 --dry-run help / observable contract ✅ passed The passing Click-based dry-run tests on this commit exercise the user-visible contract for dry-run payloads and misuse/config cases, including help coverage in the same follow-up suite. I attempted additional direct shell invocations, but sandbox security heuristics blocked those specific commands before execution.
4 Broader suite health ⚠️ inconclusive A plain pytest -q run did not provide reliable PR signal here because the environment lacked an active async pytest plugin under the current interpreter/tooling, causing many unrelated async tests to fail with framework/plugin errors. That prevented a meaningful comparison against the author's note about existing full-suite failures.

Summary

The latest acknowledged dry-run fixes in 060e527 retested cleanly: targeted regression coverage passed, and broader CLI follow-up coverage in tests/test_cli_followups.py also passed. I did not find evidence of a regression in the changed dry-run behavior; the only incomplete area was full-suite validation, which was blocked by this sandbox's pytest async setup rather than by the PR's CLI changes.


View full run details · Tested by Kind

I tested 060e527 (fix(cli): address review feedback on agent --dry-run (PR #67))

📹 View browser recording

@kalil0321 kalil0321 merged commit 9cc9bf6 into feat/agent-friendly-schema-v2 May 6, 2026
3 checks passed
@kalil0321 kalil0321 deleted the feat/agent-dry-run branch May 6, 2026 00:34
kalil0321 added a commit that referenced this pull request May 6, 2026
Three P2 issues flagged by cubic-dev-ai:

1. Check npx availability separately from node
   MCP servers shell out to `npx <package>` (not just node), so a
   minimal Docker image with node-but-no-npx would pass dry-run and
   then fail the real run. Added a dedicated `npx` check.

2. Use a unique probe filename for output_dir writability
   The fixed `.dry_run_write_probe` could legitimately exist in a
   user's output dir and would be deleted by the probe. Now uses
   `.rae_dry_run_probe_{pid}_{8hex}` and refuses to touch any path
   that already exists, guaranteeing we never clobber user data.

3. Resolve `would_run.model` per-SDK
   Previously always read `claude_code_model` regardless of which
   SDK was configured, so an opencode/copilot session would have a
   misleading manifest. Now branches on `sdk` to pick
   `opencode_model` / `copilot_model` / `claude_code_model` with
   the matching default — mirroring the live capture path.

Adds 4 regression tests in TestAgentDryRun:
  - test_dry_run_checks_npx_separately_from_node (mocks shutil.which)
  - test_dry_run_probe_does_not_clobber_existing_files (creates a
    canary file at the OLD probe path, asserts it survives)
  - test_dry_run_resolves_correct_model_per_sdk (opencode case)
  - test_dry_run_copilot_model_resolution (copilot case)

10/10 in TestAgentDryRun, 706 passing on full suite (5 pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant