feat(cli): agent --dry-run for pre-flight validation (issue #62 item 5) #67
Lets agents and CI wrappers sanity-check inputs (prompt, url, config,
env vars, deps, output dir writability) without launching a browser
or burning LLM tokens. Implies --json since --dry-run is fundamentally
about machine-parseable validation.
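Because the JSON has to stay machine-parseable, anything human-oriented must go somewhere other than stdout. A minimal sketch of that split, assuming a hypothetical `emit_dry_run_result` helper and logger name (neither comes from the PR, which only states that logs go to stderr and JSON to stdout):

```python
from __future__ import annotations

import json
import logging
import sys
from typing import Any, TextIO


def emit_dry_run_result(payload: dict[str, Any], out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> None:
    # Hypothetical logger; its only handler writes to `err`.
    logger = logging.getLogger("dry_run_sketch")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(err))
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger from echoing the line elsewhere
    logger.info("dry-run: %d checks, status=%s", len(payload["checks"]), payload["status"])
    # stdout carries exactly one JSON document and nothing else.
    out.write(json.dumps(payload) + "\n")
```

With this split, `... | jq` only ever sees the JSON document, while progress lines still reach the terminal.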
Output shape: same top-level fields as `agent --json` (schema_version,
status, run_id=null, prompt, url, mode="dry-run", error, error_kind)
plus:
- `would_run` sub-object: agent_provider, sdk, model, output_dir,
headless — what an actual `agent` invocation would do
- `checks` array: one entry per validation step with name, status
(ok | warn | error), and a human message
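The output shape above can be sketched as a plain dict builder. `build_payload_skeleton` is a hypothetical name, and leaving `error`/`error_kind` for the caller to fill is an assumption; the field names are the ones listed in this description.

```python
from __future__ import annotations

from typing import Any


def build_payload_skeleton(prompt: str, url: str | None, would_run: dict[str, Any],
                           checks: list[dict[str, str]]) -> dict[str, Any]:
    # Hypothetical helper: same top-level fields as `agent --json`, plus would_run/checks.
    has_error = any(c["status"] == "error" for c in checks)
    return {
        "schema_version": 1,
        "status": "error" if has_error else "ok",
        "run_id": None,       # nothing is launched under --dry-run
        "prompt": prompt,
        "url": url,
        "mode": "dry-run",
        "error": None,        # assumption: the caller fills these when status == "error"
        "error_kind": None,
        "would_run": would_run,
        "checks": checks,
    }
```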
Validations:
1. prompt non-empty
2. url is http(s):// if provided
3. agent_provider in {auto, chrome-mcp}
4. SDK env var present (warn — SDK may resolve auth elsewhere)
5. node binary in PATH (required by both MCP servers) + version
6. chrome-mcp without --headless: warn that auto-connect needs Chrome
146+ with remote-debugging enabled (not auto-checkable)
7. output_dir writable (probe-write-and-delete)
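Checks 1-3 and 6 need nothing but the CLI arguments. A minimal sketch, assuming a hypothetical `input_checks` function; the message strings and the `chrome_mcp_autoconnect` check name are illustrative, not taken from the PR.

```python
from __future__ import annotations

from urllib.parse import urlparse

VALID_PROVIDERS = {"auto", "chrome-mcp"}


def input_checks(prompt: str, url: str | None, agent_provider: str, headless: bool) -> list[dict[str, str]]:
    checks: list[dict[str, str]] = []

    # 1. prompt non-empty (treating whitespace-only as empty is an assumption)
    if prompt and prompt.strip():
        checks.append({"name": "prompt", "status": "ok", "message": f"{len(prompt)} chars"})
    else:
        checks.append({"name": "prompt", "status": "error", "message": "prompt is empty"})

    # 2. url is optional; when given it must be http:// or https:// with a host
    if url is not None:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            checks.append({"name": "url", "status": "ok", "message": url})
        else:
            checks.append({"name": "url", "status": "error", "message": f"not an http(s) URL: {url}"})

    # 3. agent_provider must be a supported value
    if agent_provider in VALID_PROVIDERS:
        checks.append({"name": "agent_provider", "status": "ok", "message": agent_provider})
    else:
        checks.append({"name": "agent_provider", "status": "error", "message": f"unknown provider: {agent_provider}"})

    # 6. headed chrome-mcp relies on auto-connect, which cannot be verified up front
    if agent_provider == "chrome-mcp" and not headless:
        checks.append({"name": "chrome_mcp_autoconnect", "status": "warn",
                       "message": "auto-connect needs Chrome 146+ with remote debugging enabled"})
    return checks
```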
Aggregate status is `error` if any check is `error`, otherwise `ok`.
Error kinds are `misuse` for prompt/url issues, `config_invalid` for
env/deps/output_dir issues — matching the schema-v1 error_kind enum.
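The aggregation rule reduces to a few lines. This sketch adds assumptions the description does not spell out: warnings never fail the run, every error outside prompt/url (including `agent_provider`) maps to `config_invalid`, `misuse` wins when both kinds occur, and the exit code is 0/1.

```python
from __future__ import annotations

MISUSE_CHECKS = {"prompt", "url"}  # user-input problems; everything else is config


def aggregate(checks: list[dict[str, str]]) -> tuple[str, str | None, int]:
    """Return (status, error_kind, exit_code) for a list of checks."""
    errors = [c["name"] for c in checks if c["status"] == "error"]
    if not errors:
        return "ok", None, 0  # "warn" entries do not affect the aggregate
    kind = "misuse" if any(name in MISUSE_CHECKS for name in errors) else "config_invalid"
    return "error", kind, 1
```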
Tests: 6 new in TestAgentDryRun covering: ok path, missing prompt
(misuse), bad url (misuse), unwritable output_dir (config_invalid),
that run_agent_capture is NOT called under --dry-run, and --help
mentions the flag with "Implies --json".
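The "run_agent_capture is NOT called" test is the key safety invariant. The real test presumably patches `run_agent_capture` in the CLI module; this self-contained sketch injects the capture function instead, and `agent_command` is a hypothetical stand-in for the CLI entry point.

```python
from __future__ import annotations

from typing import Any, Callable
from unittest import mock


def agent_command(prompt: str, dry_run: bool, capture: Callable[..., Any]) -> Any:
    # Toy entry point: the dry-run branch must return before capture is reached.
    if dry_run:
        return {"mode": "dry-run", "status": "ok" if prompt else "error"}
    return capture(prompt)


def test_dry_run_skips_capture() -> None:
    capture = mock.MagicMock(name="run_agent_capture")
    result = agent_command("fetch jobs", dry_run=True, capture=capture)
    assert result["mode"] == "dry-run"
    capture.assert_not_called()  # no browser, no LLM call, no cost
```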
Closes the last medium-priority blocker on issue #62. Item #8
(`run --json` wrapped) intentionally deferred — orthogonal surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8a23c51a5
```python
"would_run": {
    "agent_provider": agent_provider,
    "sdk": sdk,
    "model": model or config_manager.get("claude_code_model", "claude-sonnet-4-6"),
```
Report the SDK-specific model in dry-run
When the configured SDK is opencode or copilot, this would_run.model value does not match the real agent invocation: run_auto_capture ignores the CLI model for those SDKs and uses opencode_model or copilot_model from config instead. In those configurations agent --dry-run can approve and display a Claude model even though the actual run will use a different SDK/model, making the pre-flight manifest misleading for wrappers that rely on it.
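One way to make the manifest match the live path is a per-SDK lookup. The config keys come from the reviews; only the Claude default (`claude-sonnet-4-6`) appears in this PR, so the opencode/copilot defaults are placeholders (`None`), and falling back to the Claude entry for unknown SDKs is an assumption. `resolve_model` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# (config key, default) per SDK; non-Claude defaults are unknown here.
MODEL_KEYS: dict[str, tuple[str, str | None]] = {
    "claude": ("claude_code_model", "claude-sonnet-4-6"),
    "opencode": ("opencode_model", None),
    "copilot": ("copilot_model", None),
}


def resolve_model(sdk: str, cli_model: str | None, config: Mapping[str, Any]) -> str | None:
    key, default = MODEL_KEYS.get(sdk, MODEL_KEYS["claude"])
    if sdk in ("opencode", "copilot"):
        # Per the review, the live path ignores the CLI model for these SDKs.
        return config.get(key, default)
    return cli_model or config.get(key, default)
```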
```python
node = shutil.which("node")
if node is None:
    checks.append({
```
In environments where node is present but npx is not on PATH, this check reports the dependency as ok even though every MCP server configuration launched by the agent uses npx as the executable. That makes agent --dry-run succeed while the subsequent real agent run fails before starting the browser, so the pre-flight validation misses the actual required binary.
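A sketch of checking both binaries, with `which` injectable so the check can be tested without touching the real PATH. `dependency_checks` is a hypothetical name; the version probe is off by default in this sketch, although the PR's check does report the node version.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Callable, Optional

Which = Callable[[str], Optional[str]]


def dependency_checks(which: Which = shutil.which, probe_version: bool = False) -> list[dict[str, str]]:
    checks = []
    # The MCP servers are launched through npx, so node alone is not enough.
    for binary in ("node", "npx"):
        path = which(binary)
        if path is None:
            checks.append({"name": binary, "status": "error", "message": f"{binary} not found on PATH"})
            continue
        message = path
        if probe_version and binary == "node":
            # e.g. "v22.22.2"; fall back to the path if nothing is printed
            result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
            message = result.stdout.strip() or path
        checks.append({"name": binary, "status": "ok", "message": message})
    return checks
```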
```python
probe = base / ".dry_run_write_probe"
probe.write_text("")
probe.unlink()
checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
```
Path: src/reverse_api/cli.py
Line: 282-285
Comment:
If `probe.unlink()` raises (e.g., a race condition where another process deletes the file first, or an OS-level permission quirk), the outer `except Exception` catches it and marks `output_dir` as an error, even though the directory was demonstrably writable. In the permission case, the probe file is also left behind as a stray `.dry_run_write_probe` artifact. Passing `missing_ok=True` covers the race, and wrapping the unlink in its own `try`/`except` keeps any other cleanup failure from failing the check.
```suggestion
probe = base / ".dry_run_write_probe"
probe.write_text("")
try:
probe.unlink(missing_ok=True)
except Exception:
pass # write succeeded; ignore cleanup failure
checks.append({"name": "output_dir", "status": "ok", "message": str(base)})
```
Test Results 🛡️ 5.5/6
Issues found: Repository-wide pytest suite: the full test suite is not green in this environment due to numerous unrelated failures outside the …
Summary: The …
Tested by Kind
3 issues found across 2 files
<file name="src/reverse_api/cli.py">
<violation number="1" location="src/reverse_api/cli.py:254">
P2: The pre-flight check verifies `node` is on PATH, but the actual MCP server configurations launched by the agent use `npx` as the executable. In environments where `node` is installed but `npx` is missing (e.g., minimal Docker images), dry-run will report ok while the real agent run will fail. Add a check for `npx` availability as well.</violation>
<violation number="2" location="src/reverse_api/cli.py:282">
P2: Using a fixed `.dry_run_write_probe` filename can overwrite and delete an existing file in the output directory during validation.</violation>
<violation number="3" location="src/reverse_api/cli.py:318">
P2: The `would_run.model` field always reads from `claude_code_model` regardless of which SDK is configured. When `sdk` is `opencode` or `copilot`, the actual agent run resolves the model from a different config key (e.g., `opencode_model`, `copilot_model`), so the dry-run manifest will report an incorrect model for those SDKs. Consider branching on `sdk` to select the appropriate model config key.</violation>
</file>
Three P2 issues flagged by cubic-dev-ai:
1. Check npx availability separately from node
MCP servers shell out to `npx <package>` (not just node), so a
minimal Docker image with node-but-no-npx would pass dry-run and
then fail the real run. Added a dedicated `npx` check.
2. Use a unique probe filename for output_dir writability
The fixed `.dry_run_write_probe` could legitimately exist in a
user's output dir and would be deleted by the probe. Now uses
`.rae_dry_run_probe_{pid}_{8hex}` and refuses to touch any path
that already exists, guaranteeing we never clobber user data.
3. Resolve `would_run.model` per-SDK
Previously always read `claude_code_model` regardless of which
SDK was configured, so an opencode/copilot session would have a
misleading manifest. Now branches on `sdk` to pick
`opencode_model` / `copilot_model` / `claude_code_model` with
the matching default — mirroring the live capture path.
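A minimal sketch of a non-clobbering writability probe, using the name pattern from item 2. Exclusive creation (`open` mode `"x"`) is one way to "refuse to touch any path that already exists"; it is not necessarily how the PR implements it, and `probe_writable` is a hypothetical name.

```python
from __future__ import annotations

import os
import secrets
from pathlib import Path


def probe_writable(base: Path) -> tuple[bool, str]:
    """Return (writable, message) without modifying any pre-existing file."""
    # 4 random bytes -> 8 hex chars, plus the pid, so collisions are very unlikely.
    probe = base / f".rae_dry_run_probe_{os.getpid()}_{secrets.token_hex(4)}"
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with probe.open("x"):
            pass
    except FileExistsError:
        return False, f"probe path unexpectedly exists: {probe}"
    except OSError as exc:
        return False, f"not writable: {exc}"
    try:
        probe.unlink(missing_ok=True)
    except OSError:
        pass  # the write already succeeded; a cleanup failure should not fail the check
    return True, str(base)
```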
Adds 4 regression tests in TestAgentDryRun:
- test_dry_run_checks_npx_separately_from_node (mocks shutil.which)
- test_dry_run_probe_does_not_clobber_existing_files (creates a
canary file at the OLD probe path, asserts it survives)
- test_dry_run_resolves_correct_model_per_sdk (opencode case)
- test_dry_run_copilot_model_resolution (copilot case)
10/10 in TestAgentDryRun, 706 passing on full suite (5 pre-existing failures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cubic-dev-ai — all 3 issues acknowledged and fixed in
Re-running tests because the PR changed after the previous run. Triggered by Kind
@kalil0321 Thanks for the thorough follow-up, and for addressing all three points from the earlier review. Quick confirmation after looking at the updated code in
10/10 in
Test Results 🛡️ 5.5/6
Summary: The latest acknowledged dry-run fixes in …
Tested by Kind
Stacked on #66 (`feat/agent-friendly-schema-v2`).
Last medium-priority item from the agent-friendliness backlog (#62). Lets agents and CI wrappers sanity-check inputs without launching a browser or burning LLM tokens.
Example
```bash
$ reverse-api-engineer agent --dry-run -p "fetch jobs" -u https://jobs.example.com | jq
{
"schema_version": 1,
"status": "ok",
"mode": "dry-run",
"prompt": "fetch jobs",
"url": "https://jobs.example.com",
"would_run": {
"agent_provider": "auto",
"sdk": "claude",
"model": "claude-sonnet-4-6",
"output_dir": "/home/kalil/.reverse-api/runs",
"headless": false
},
"checks": [
{ "name": "prompt", "status": "ok", "message": "10 chars" },
{ "name": "url", "status": "ok", "message": "https://jobs.example.com" },
{ "name": "agent_provider", "status": "ok", "message": "auto" },
{ "name": "sdk:claude", "status": "warn", "message": "ANTHROPIC_API_KEY not set in env (the SDK may still resolve auth via a config file)" },
{ "name": "node", "status": "ok", "message": "v22.22.2" },
{ "name": "output_dir", "status": "ok", "message": "/home/kalil/.reverse-api/runs" }
],
"error": null,
"error_kind": null
}
```
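A CI wrapper might capture the command's stdout (for example via `subprocess.run([...], capture_output=True, text=True)`, which is an assumption about how it is invoked) and gate on the document. This sketch only consumes the JSON; `gate` is a hypothetical name, and surfacing warnings without blocking is a design choice, not something the PR prescribes.

```python
from __future__ import annotations

import json


def gate(payload_text: str) -> int:
    """Return 0 to proceed or 1 to stop, based on a dry-run JSON document."""
    payload = json.loads(payload_text)
    for check in payload.get("checks", []):
        if check["status"] != "ok":
            # Surface warnings and errors, e.g. a missing ANTHROPIC_API_KEY.
            print(f"[{check['status']}] {check['name']}: {check['message']}")
    if payload.get("status") == "error":
        print(f"dry-run failed ({payload.get('error_kind')}): {payload.get('error')}")
        return 1
    return 0
```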
Validations
- `prompt` non-empty
- `url` is `http(s)://` if given
- `agent_provider` in `{auto, chrome-mcp}`
- `node` binary + version
- `output_dir` writable (probe-write)

Aggregate `status` is `error` if any check is `error`, else `ok`. `error_kind` is `misuse` for prompt/url issues, `config_invalid` for env/deps/output_dir issues.
Implies --json
`--dry-run` is fundamentally about machine-parseable validation, so it implies `--json`. Logs go to stderr, JSON to stdout.
Test plan
- 6 new tests (`TestAgentDryRun`): ok path, missing prompt → misuse, bad url → misuse, unwritable output_dir → config_invalid, does NOT call `run_agent_capture` (no browser, no LLM, no cost), `--help` advertises the flag
- … `main`)
- … `error_kind: "misuse"`
What's next on #62
Only item #8 (`run --json` wrapped, ~25 lines) remains, intentionally deferred — orthogonal surface that wraps a script's stdout/stderr/exit_code rather than the agent contract.
🤖 Generated with Claude Code
Summary by cubic
Add `agent --dry-run` to validate prompts, URLs, config, env, and deps without launching a browser or spending LLM tokens. Adds `npx` checks, a unique output-dir probe, and per-SDK model resolution; addresses #62 item 5.
- `--dry-run` for `agent` validates: prompt, URL scheme, `agent_provider` (`auto`|`chrome-mcp`), SDK env var, `node` and `npx` presence, headed `chrome-mcp` warning, and output-dir writability via a unique, non-clobbering probe.
- Implies `--json`: JSON to stdout, logs to stderr.
- Output matches `agent --json` with `mode: "dry-run"`, plus `would_run` (agent_provider, sdk, model resolved per SDK, output_dir, headless) and `checks`.
- Status is `error` if any check fails; `error_kind` is `misuse` (prompt/url) or `config_invalid` (env/deps/output_dir). Exits 0/1.
- Skips `run_agent_capture` entirely (no browser, no LLM).

Written for commit 060e527. Summary will update on new commits.
Greptile Summary
This PR adds `--dry-run` to the `agent` command, letting agents and CI pipelines validate prompt/URL/config/env without launching a browser or consuming LLM tokens. It introduces `_build_dry_run_payload`, which runs seven sequential checks and returns a structured JSON manifest, and wires it into the `agent` command so it short-circuits before any real work begins.
- `_build_dry_run_payload` covers prompt presence, URL scheme, `agent_provider` validity, SDK env-var presence (warn-only), Node.js availability, the headed `chrome-mcp` auto-connect warning, and output-dir writability, aggregating to `error`/`ok` with `misuse` vs `config_invalid` error kinds.
- Six new tests in `TestAgentDryRun` cover the happy path, both misuse error cases, config-invalid for an unwritable dir, the no-browser invariant, and `--help` text.

Confidence Score: 4/5
Safe to merge; the dry-run path exits before any browser or LLM work, keeping the existing agent flow completely untouched.
The implementation is correct and well-tested. The one issue is that a failed `probe.unlink()` after a successful `probe.write_text()` causes the output-dir check to be falsely marked as an error and leaves a stray `.dry_run_write_probe` file on disk: a real but low-probability edge case that does not affect the browser-launch guard or any existing behavior.
The probe-file cleanup in `_build_dry_run_payload` inside `src/reverse_api/cli.py` is the only spot worth a second look before merging.
Important Files Changed
- `src/reverse_api/cli.py`: adds `_build_dry_run_payload` and the `--dry-run` flag to the `agent` command. Logic is sound; minor probe-file cleanup issue if `unlink()` raises after a successful write.
Reviews (1): Last reviewed commit: "feat(cli): agent --dry-run for safe pre-..."