test(agent-runtime): cover slash command regressions by yacosta738 · Pull Request #783 · dallay/corvus

yacosta738 · 2026-05-05T14:00:14Z

Related Issues

Closes #543

Summary

Adds slash-command regression coverage for the current registry-backed session commands so the command-platform rework has stable behavior guards.

- Freezes the current built-in slash command registry descriptors, aliases, argument shapes, capabilities, permissions, and backend requirements.
- Covers parser/argument failures, unknown slash-command fallthrough, normalized error behavior, plan-mode interactions, and CLI/Gateway HTTP parity.
- Adds real SQLite-backed authz/ownership coverage for /resume missing-scope and wrong-owner failures across transports.

Tested Information

cargo test session_command_regression -- --nocapture
cargo test pre_execution::tests:: -- --nocapture

Both commands pass locally.

Documentation Impact

Docs updated in: N/A
No docs update required because: this PR adds regression tests only and does not change runtime behavior, setup, configuration, UX, APIs, or operations.
I verified the documentation matches the current behavior.

Breaking Changes

None.

Checklist

I have checked that there isn’t already a PR solving the same problem.
I have read the Contributing Guidelines.
I ensured my code follows the project's style guidelines.
I have added or updated tests that prove my fix is effective or that my feature works.
I have updated the documentation, or I explained above why no documentation update is needed.
I verified the documentation matches the current behavior.
I have documented any breaking changes in the Breaking Changes section.
I have linked the related issue (if any).

cloudflare-workers-and-pages · 2026-05-05T14:00:16Z

Deploying corvus with Cloudflare Pages

Latest commit:	`7ca8ab4`
Status:	✅ Deploy successful!
Preview URL:	https://cb840d80.corvus-42x.pages.dev
Branch Preview URL:	https://test-slash-command-regressio.corvus-42x.pages.dev

coderabbitai · 2026-05-05T14:00:32Z

📝 Walkthrough

Summary by CodeRabbit

Tests
- Expanded test suite with regression tests for registry stability snapshots, command validation error handling, transport parity between CLI and HTTP, plan-mode behavior consistency, and authorization enforcement across transports.
- Added shared test helper for comparing ingress decision values.

Walkthrough

A test helper function and five regression tests are added to the pre-execution module to verify slash-command behavior: registry stability, invalid input normalization, CLI/gateway transport parity, plan-mode invariance, and authorization enforcement across transports.

Changes

Slash-Command Regression Test Suite

Layer / File(s)	Summary
Test Infrastructure `clients/agent-runtime/src/pre_execution/mod.rs` (lines 110–124)	`assert_same_ingress_decision` helper compares `IngressDecision` outcomes with context-aware failure messages for structural assertion.
Registry Stability `clients/agent-runtime/src/pre_execution/mod.rs` (lines 543–670)	`session_command_regression_registry_freezes_current_session_commands` snapshots `default_registry()` descriptors (names, aliases, argument shapes, capabilities, permissions, backends) and asserts exact equality against hardcoded canonical state.
Input Validation `clients/agent-runtime/src/pre_execution/mod.rs` (lines 672–753)	`session_command_regression_invalid_inputs_have_normalized_failures` verifies invalid prompts produce normalized `SessionCommandOutcome::Failure` with correct `command`, `kind`, and `message` fields; conditionally asserts `session_id` presence/absence based on failure type.
Transport Parity `clients/agent-runtime/src/pre_execution/mod.rs` (lines 754–813)	`session_command_regression_cli_gateway_transport_parity` evaluates identical prompts via CLI and gateway HTTP ingress builders and asserts decision equivalence using the helper.
Plan-Mode Invariance `clients/agent-runtime/src/pre_execution/mod.rs` (lines 814–853)	`session_command_regression_plan_mode_keeps_slash_command_path` compares slash-command handling between Standard and Plan execution modes to ensure no behavior drift.
Authorization Enforcement `clients/agent-runtime/src/pre_execution/mod.rs` (lines 854–954)	`session_command_regression_authz_and_ownership_enforcement_is_transport_stable` exercises `/resume` with missing scope and wrong owner over both transports, asserting failure classification and decision transport-stability.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

dallay/corvus#775: Introduces session-command dispatch and failure-classification changes that these regression tests directly validate.

Suggested labels

area:rust

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title follows Conventional Commit style with test prefix and clear scope, at 52 characters, well under the 72-character limit.
Description check	✅ Passed	Description includes all required sections: Related Issues, Summary, Tested Information, Documentation Impact, Breaking Changes, and completed Checklist; matches template structure.
Linked Issues check	✅ Passed	PR fully addresses `#543` acceptance criteria: covers parser failures, authz/ownership failures, CLI/gateway parity, plan-mode interactions, and normalized error behavior across existing session commands.
Out of Scope Changes check	✅ Passed	All changes are confined to test code in pre_execution/mod.rs with no modifications to exported entities or runtime behavior, staying within the regression testing scope.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@clients/agent-runtime/src/pre_execution/mod.rs`:
- Around line 716-734: The test currently derives the expected session_id by
checking message.starts_with("invalid slash command usage"), which is brittle;
update the test cases to include an explicit expected_session_id value for each
case and assert failure.session_id equals that expected value instead of using
the message-prefix heuristic. Locate the match arm handling
IngressDecision::SessionCommand (pattern binding outcome:
SessionCommandOutcome::Failure(failure)) and replace the conditional on
message.starts_with(...) with an assertion comparing failure.session_id to the
new expected_session_id field from the test case tuple (e.g., use the same tuple
that supplies command, kind, message to also supply expected_session_id),
keeping the existing equality checks for failure.command, failure.kind, and
failure.message.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a71684a8-5de0-465c-a70a-d35690626be6

📥 Commits

Reviewing files that changed from the base of the PR and between db03e13 and 9066b07.

📒 Files selected for processing (1)

clients/agent-runtime/src/pre_execution/mod.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: pr-checks
GitHub Check: sonar
GitHub Check: semgrep-cloud-platform/scan
GitHub Check: Cloudflare Pages

🧰 Additional context used

📓 Path-based instructions (4)

clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

clients/agent-runtime/src/pre_execution/mod.rs

clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

clients/agent-runtime/src/pre_execution/mod.rs

**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

clients/agent-runtime/src/pre_execution/mod.rs

**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

clients/agent-runtime/src/pre_execution/mod.rs

🔇 Additional comments (5)

clients/agent-runtime/src/pre_execution/mod.rs (5)

110-124: LGTM — clean custom equality helper.

Taking ownership and delegating to assert_eq! on the inner types is the right call given IngressDecision doesn't implement PartialEq. The fallthrough arm's panic! with {context} gives enough signal to diagnose variant mismatches.
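A minimal sketch of the helper shape described above, assuming stand-in types: the `Passthrough` variant and the `String` payloads are hypothetical, and the real `IngressDecision` likely carries more data. It only illustrates comparing a non-`PartialEq` enum by matching both sides and delegating to `assert_eq!` on the inner, comparable values.

```rust
// Hypothetical stand-ins for the real types; variants and payloads are
// assumptions made for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum SessionCommandOutcome {
    Success(String),
    Failure(String),
}

// Deliberately does NOT derive PartialEq, mirroring the situation above.
#[derive(Debug, Clone)]
enum IngressDecision {
    SessionCommand { outcome: SessionCommandOutcome },
    Passthrough { prompt: String },
}

/// Takes both decisions by value and compares their inner parts.
/// Panics with `context` in the message when the variants differ.
fn assert_same_ingress_decision(left: IngressDecision, right: IngressDecision, context: &str) {
    match (left, right) {
        (
            IngressDecision::SessionCommand { outcome: l },
            IngressDecision::SessionCommand { outcome: r },
        ) => assert_eq!(l, r, "outcome mismatch for {}", context),
        (
            IngressDecision::Passthrough { prompt: l },
            IngressDecision::Passthrough { prompt: r },
        ) => assert_eq!(l, r, "prompt mismatch for {}", context),
        // Fallthrough arm: the two sides are different variants.
        (l, r) => panic!("variant mismatch for {}: {:?} vs {:?}", context, l, r),
    }
}

#[test]
fn identical_failures_compare_equal() {
    let make = || IngressDecision::SessionCommand {
        outcome: SessionCommandOutcome::Failure("bad args".to_string()),
    };
    assert_same_ingress_decision(make(), make(), "identical failures");
}
```

Because the helper consumes both arguments, a caller that needs a decision afterwards has to clone it first.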

543-670: LGTM — intentional golden-file regression guard.

Order-sensitive exact match is the right call for a freeze test. Using #[test] (sync) instead of #[tokio::test] is correct since default_registry() and .iter() are synchronous.
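The freeze pattern can be sketched as follows, under stated assumptions: `CommandDescriptor`, its fields, the `/s` alias, and the two registry entries are all hypothetical, and this `default_registry()` is a local stand-in rather than the project's function. The point is only that the snapshot is compared as an ordered `Vec`, so reordering, renaming, or adding a command fails the test.

```rust
// Hypothetical descriptor; per the PR, the real one also tracks
// capabilities, permissions, and backend requirements.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CommandDescriptor {
    name: &'static str,
    aliases: &'static [&'static str],
    accepts_arguments: bool,
}

// Local stand-in for `default_registry()`; the entries are illustrative only.
fn default_registry() -> Vec<CommandDescriptor> {
    vec![
        CommandDescriptor { name: "/tools", aliases: &[], accepts_arguments: false },
        CommandDescriptor { name: "/session", aliases: &["/s"], accepts_arguments: true },
    ]
}

type Row = (&'static str, Vec<&'static str>, bool);

// Flattens descriptors into plain tuples, preserving registry order.
fn snapshot(registry: &[CommandDescriptor]) -> Vec<Row> {
    registry
        .iter()
        .map(|d| (d.name, d.aliases.to_vec(), d.accepts_arguments))
        .collect()
}

// Synchronous test: nothing here awaits, so no async runtime is needed.
#[test]
fn registry_freezes_current_commands() {
    let expected: Vec<Row> = vec![
        ("/tools", vec![], false),
        ("/session", vec!["/s"], true),
    ];
    // Exact, order-sensitive equality against the hardcoded snapshot.
    assert_eq!(snapshot(&default_registry()), expected);
}
```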

754-812: LGTM — solid transport parity regression guard.

Using include_blocking_fallback: false correctly prevents the test from hitting the evaluate() codepath. Including /tool toggle shell (which hits InvalidArguments) confirms the parity check covers error paths too, not just success paths.
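A rough sketch of a parity check of this kind, assuming a simplified model: `Transport`, `IngressRequest`, `cli_request`, `gateway_request`, `decide`, and `parity_mismatches` are hypothetical names, and the parsing is a toy version that only knows `/tool`. It shows the same prompts being evaluated through both builders, including one that fails with invalid arguments.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Decision {
    Handled(String),
    Failure { command: String, message: String },
    NotACommand,
}

// Hypothetical transport tag carried by each request.
#[derive(Debug, Clone, Copy)]
enum Transport {
    Cli,
    GatewayHttp,
}

struct IngressRequest {
    transport: Transport,
    prompt: String,
}

fn cli_request(prompt: &str) -> IngressRequest {
    IngressRequest { transport: Transport::Cli, prompt: prompt.to_string() }
}

fn gateway_request(prompt: &str) -> IngressRequest {
    IngressRequest { transport: Transport::GatewayHttp, prompt: prompt.to_string() }
}

// Toy decision function; it deliberately never inspects the transport.
fn decide(request: &IngressRequest) -> Decision {
    let mut parts = request.prompt.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("/tool"), Some("enable" | "disable")) => Decision::Handled("/tool".to_string()),
        (Some("/tool"), Some(other)) => Decision::Failure {
            command: "/tool".to_string(),
            message: format!("Unknown /tool subcommand: '{}'. Use enable or disable.", other),
        },
        _ => Decision::NotACommand,
    }
}

// Returns one description per prompt whose decisions differ across transports.
fn parity_mismatches(prompts: &[&str]) -> Vec<String> {
    prompts
        .iter()
        .filter_map(|prompt| {
            let (cli, gateway) = (cli_request(prompt), gateway_request(prompt));
            let (a, b) = (decide(&cli), decide(&gateway));
            (a != b).then(|| {
                format!("{}: {:?} gave {:?}, {:?} gave {:?}", prompt, cli.transport, a, gateway.transport, b)
            })
        })
        .collect()
}

#[test]
fn cli_and_gateway_agree() {
    let prompts = ["/tool enable shell", "/tool toggle shell", "plain text"];
    assert!(parity_mismatches(&prompts).is_empty());
}
```

Collecting mismatches instead of panicking on the first one is a design choice of this sketch; it reports every drifting prompt in a single run.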

814-852: LGTM — plan-mode invariance correctly asserted.

854-953: LGTM — real SQLite-backed authz scenario with correct lifecycle setup.

The temp_dir binding correctly outlives memory. The cli_wrong_owner.clone() before assert_same_ingress_decision is necessary since the value is consumed by the helper then re-used for the kind assertion at line 943. Missing-scope and wrong-owner failure paths are both transport-stable.
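To illustrate the clone-before-consume point and the two failure paths, here is a sketch that swaps the SQLite store for an in-memory `HashMap`. Everything in it is an assumption for illustration: `ResumeError`, `Principal`, the `sessions:resume` scope name, and the `resume` function are hypothetical and do not reflect the project's API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum ResumeError {
    MissingScope,
    WrongOwner,
    NotFound,
}

// Hypothetical caller identity; the scope name used below is made up.
struct Principal<'a> {
    user: &'a str,
    scopes: &'a [&'a str],
}

// In-memory stand-in for the store: maps session id to owner.
fn resume(
    store: &HashMap<&str, &str>,
    principal: &Principal,
    session_id: &str,
) -> Result<(), ResumeError> {
    // Scope is checked before ownership in this sketch.
    if !principal.scopes.contains(&"sessions:resume") {
        return Err(ResumeError::MissingScope);
    }
    match store.get(session_id) {
        Some(owner) if *owner == principal.user => Ok(()),
        Some(_) => Err(ResumeError::WrongOwner),
        None => Err(ResumeError::NotFound),
    }
}

// Consumes both results, like the ownership-taking helper in the PR.
fn assert_same(left: Result<(), ResumeError>, right: Result<(), ResumeError>, context: &str) {
    assert_eq!(left, right, "transport drift for {}", context);
}

fn wrong_owner_is_transport_stable() -> ResumeError {
    let store = HashMap::from([("session-1", "alice")]);
    let bob = Principal { user: "bob", scopes: &["sessions:resume"] };
    // Both "transports" reach the same check in this simplified model.
    let cli = resume(&store, &bob, "session-1");
    let gateway = resume(&store, &bob, "session-1");
    // Clone first: `assert_same` takes ownership and `cli` is read again below.
    assert_same(cli.clone(), gateway, "wrong owner");
    cli.unwrap_err()
}

#[test]
fn resume_rejects_wrong_owner() {
    assert_eq!(wrong_owner_is_transport_stable(), ResumeError::WrongOwner);
}
```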

coderabbitai · 2026-05-05T14:05:18Z

+            match decision {
+                IngressDecision::SessionCommand {
+                    outcome: SessionCommandOutcome::Failure(failure),
+                } => {
+                    assert_eq!(failure.command, command, "command drift for {prompt}");
+                    assert_eq!(failure.kind, kind, "failure kind drift for {prompt}");
+                    assert_eq!(failure.message, message, "message drift for {prompt}");
+                    if message.starts_with("invalid slash command usage") {
+                        assert_eq!(
+                            failure.session_id, None,
+                            "argument-shape failures stay pre-context"
+                        );
+                    } else {
+                        assert_eq!(failure.session_id, Some("session-1".to_string()));
+                    }
+                }
+                other => panic!("prompt {prompt} should fail as session command, got {other:?}"),
+            }
+        }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make per-case session_id expectation explicit rather than derived from message prefix.

The message.starts_with("invalid slash command usage") heuristic at line 723 couples the session_id assertion to a message prefix. It works for the current cases, but a new case whose message shares that prefix while expecting a session_id (or lacks the prefix while expecting none) would silently get the wrong assertion. Exact message equality is already asserted, so the session_id check should use an explicit expected value per case.

🛠️ Proposed fix: add `expected_session_id` to the cases tuple

 let cases = [
     (
         "/tools extra",
         "/tools",
         SessionCommandFailureKind::InvalidArguments,
         "invalid slash command usage for /tools: this command does not accept trailing arguments",
+        None::<&str>,
     ),
     (
         "/mcp",
         "/mcp",
         SessionCommandFailureKind::InvalidArguments,
         "invalid slash command usage for /mcp: a subcommand argument is required",
+        None::<&str>,
     ),
     (
         "/tool toggle shell",
         "/tool",
         SessionCommandFailureKind::InvalidArguments,
         "Unknown /tool subcommand: 'toggle'. Use enable or disable.",
+        Some("session-1"),
     ),
     (
         "/session archive",
         "/session",
         SessionCommandFailureKind::InvalidArguments,
         "Unknown /session subcommand: 'archive'. Usage: /session, /session status, /session inspect, or /session list",
+        Some("session-1"),
     ),
 ];
-for (prompt, command, kind, message) in cases {
+for (prompt, command, kind, message, expected_session_id) in cases {
     // ...
     assert_eq!(failure.command, command, "command drift for {prompt}");
     assert_eq!(failure.kind, kind, "failure kind drift for {prompt}");
     assert_eq!(failure.message, message, "message drift for {prompt}");
-    if message.starts_with("invalid slash command usage") {
-        assert_eq!(
-            failure.session_id, None,
-            "argument-shape failures stay pre-context"
-        );
-    } else {
-        assert_eq!(failure.session_id, Some("session-1".to_string()));
-    }
+    assert_eq!(
+        failure.session_id,
+        expected_session_id.map(str::to_string),
+        "session_id drift for {prompt}"
+    );
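For readers who want to try the table-driven shape in isolation, here is a self-contained sketch. The `Failure` struct and the `run` dispatcher are hypothetical stand-ins (the real test goes through the ingress evaluation and also checks `kind`); only the two error messages are taken from the cases above.

```rust
#[derive(Debug, PartialEq)]
struct Failure {
    command: String,
    message: String,
    session_id: Option<String>,
}

// Hypothetical dispatcher: argument-shape errors are produced before any
// session context exists, subcommand errors after it is attached.
fn run(prompt: &str, session_id: &str) -> Option<Failure> {
    let mut parts = prompt.split_whitespace();
    let command = parts.next()?;
    match (command, parts.next()) {
        ("/tools", Some(_)) => Some(Failure {
            command: command.to_string(),
            message: "invalid slash command usage for /tools: this command does not accept trailing arguments".to_string(),
            session_id: None,
        }),
        ("/tool", Some(sub)) if sub != "enable" && sub != "disable" => Some(Failure {
            command: command.to_string(),
            message: format!("Unknown /tool subcommand: '{}'. Use enable or disable.", sub),
            session_id: Some(session_id.to_string()),
        }),
        _ => None,
    }
}

fn check_cases() {
    // Each case states its expected session_id explicitly.
    let cases: [(&str, &str, &str, Option<&str>); 2] = [
        (
            "/tools extra",
            "/tools",
            "invalid slash command usage for /tools: this command does not accept trailing arguments",
            None,
        ),
        (
            "/tool toggle shell",
            "/tool",
            "Unknown /tool subcommand: 'toggle'. Use enable or disable.",
            Some("session-1"),
        ),
    ];
    for (prompt, command, message, expected_session_id) in cases {
        let failure = run(prompt, "session-1")
            .unwrap_or_else(|| panic!("prompt {} should fail as session command", prompt));
        assert_eq!(failure.command, command, "command drift for {}", prompt);
        assert_eq!(failure.message, message, "message drift for {}", prompt);
        assert_eq!(
            failure.session_id,
            expected_session_id.map(str::to_string),
            "session_id drift for {}",
            prompt
        );
    }
}

#[test]
fn invalid_inputs_have_explicit_session_expectations() {
    check_cases();
}
```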

sonarqubecloud · 2026-05-05T14:32:44Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
99.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

yacosta738 added 2 commits May 5, 2026 15:55

test(agent-runtime): cover slash command regressions

c4f5c2d

style(agent-runtime): format slash regression tests

9066b07

github-actions Bot added the size/m Denotes a medium change size label May 5, 2026

coderabbitai Bot added the area:rust label May 5, 2026

coderabbitai Bot reviewed May 5, 2026

Merge branch 'main' into test/slash-command-regression-543

7ca8ab4

yacosta738 merged commit 420d5bd into main May 5, 2026
17 checks passed

yacosta738 deleted the test/slash-command-regression-543 branch May 5, 2026 15:11

yacosta738 merged 3 commits into main from test/slash-command-regression-543
