test(agent-runtime): cover slash command regressions #783

Merged
yacosta738 merged 3 commits into main from test/slash-command-regression-543
May 5, 2026

Conversation

@yacosta738
Contributor

Related Issues

Closes #543


Summary

Adds slash-command regression coverage for the current registry-backed session commands so the command-platform rework has stable behavior guards.

  • Freezes the current built-in slash command registry descriptors, aliases, argument shapes, capabilities, permissions, and backend requirements.
  • Covers parser/argument failures, unknown slash-command fallthrough, normalized error behavior, plan-mode interactions, and CLI/Gateway HTTP parity.
  • Adds real SQLite-backed authz/ownership coverage for /resume missing-scope and wrong-owner failures across transports.

Tested Information

  • cargo test session_command_regression -- --nocapture
  • cargo test pre_execution::tests:: -- --nocapture

Both commands pass locally.


Documentation Impact

  • Docs updated in: N/A
  • No docs update required because: this PR adds regression tests only and does not change runtime behavior, setup, configuration, UX, APIs, or operations.
  • I verified the documentation matches the current behavior.

Breaking Changes

None.


Checklist

  • I have checked that there isn’t already a PR solving the same problem.
  • I have read the Contributing Guidelines.
  • I ensured my code follows the project's style guidelines.
  • I have added or updated tests that prove my fix is effective or that my feature works.
  • I have updated the documentation, or I explained above why no documentation update is needed.
  • I verified the documentation matches the current behavior.
  • I have documented any breaking changes in the Breaking Changes section.
  • I have linked the related issue (if any).

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 5, 2026

Deploying corvus with Cloudflare Pages

Latest commit: 7ca8ab4
Status: ✅  Deploy successful!
Preview URL: https://cb840d80.corvus-42x.pages.dev
Branch Preview URL: https://test-slash-command-regressio.corvus-42x.pages.dev

@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Tests
    • Expanded test suite with regression tests for registry stability snapshots, command validation error handling, transport parity between CLI and HTTP, plan-mode behavior consistency, and authorization enforcement across transports.
    • Added shared test helper for comparing ingress decision values.

Walkthrough

A test helper function and five regression tests are added to the pre-execution module to verify slash-command behavior: registry stability, invalid input normalization, CLI/gateway transport parity, plan-mode invariance, and authorization enforcement across transports.

Changes

Slash-Command Regression Test Suite

| Layer / File(s) | Summary |
| --- | --- |
| Test Infrastructure: clients/agent-runtime/src/pre_execution/mod.rs (lines 110–124) | assert_same_ingress_decision helper compares IngressDecision outcomes with context-aware failure messages for structural assertion. |
| Registry Stability: clients/agent-runtime/src/pre_execution/mod.rs (lines 543–670) | session_command_regression_registry_freezes_current_session_commands snapshots default_registry() descriptors (names, aliases, argument shapes, capabilities, permissions, backends) and asserts exact equality against hardcoded canonical state. |
| Input Validation: clients/agent-runtime/src/pre_execution/mod.rs (lines 672–753) | session_command_regression_invalid_inputs_have_normalized_failures verifies invalid prompts produce normalized SessionCommandOutcome::Failure with correct command, kind, and message fields; conditionally asserts session_id presence/absence based on failure type. |
| Transport Parity: clients/agent-runtime/src/pre_execution/mod.rs (lines 754–813) | session_command_regression_cli_gateway_transport_parity evaluates identical prompts via CLI and gateway HTTP ingress builders and asserts decision equivalence using the helper. |
| Plan-Mode Invariance: clients/agent-runtime/src/pre_execution/mod.rs (lines 814–853) | session_command_regression_plan_mode_keeps_slash_command_path compares slash-command handling between Standard and Plan execution modes to ensure no behavior drift. |
| Authorization Enforcement: clients/agent-runtime/src/pre_execution/mod.rs (lines 854–954) | session_command_regression_authz_and_ownership_enforcement_is_transport_stable exercises /resume with missing scope and wrong owner over both transports, asserting failure classification and decision transport-stability. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • dallay/corvus#775: Introduces session-command dispatch and failure-classification changes that these regression tests directly validate.

Suggested labels

area:rust

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title follows Conventional Commit style with test prefix and clear scope, at 52 characters, well under the 72-character limit. |
| Description check | ✅ Passed | Description includes all required sections: Related Issues, Summary, Tested Information, Documentation Impact, Breaking Changes, and completed Checklist; matches template structure. |
| Linked Issues check | ✅ Passed | PR fully addresses #543 acceptance criteria: covers parser failures, authz/ownership failures, CLI/gateway parity, plan-mode interactions, and normalized error behavior across existing session commands. |
| Out of Scope Changes check | ✅ Passed | All changes are confined to test code in pre_execution/mod.rs with no modifications to exported entities or runtime behavior, staying within the regression testing scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

@github-actions github-actions Bot added the size/m Denotes a medium change size label May 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@clients/agent-runtime/src/pre_execution/mod.rs`:
- Around line 716-734: The test currently derives the expected session_id by
checking message.starts_with("invalid slash command usage"), which is brittle;
update the test cases to include an explicit expected_session_id value for each
case and assert failure.session_id equals that expected value instead of using
the message-prefix heuristic. Locate the match arm handling
IngressDecision::SessionCommand (pattern binding outcome:
SessionCommandOutcome::Failure(failure)) and replace the conditional on
message.starts_with(...) with an assertion comparing failure.session_id to the
new expected_session_id field from the test case tuple (e.g., use the same tuple
that supplies command, kind, message to also supply expected_session_id),
keeping the existing equality checks for failure.command, failure.kind, and
failure.message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a71684a8-5de0-465c-a70a-d35690626be6

📥 Commits

Reviewing files that changed from the base of the PR and between db03e13 and 9066b07.

📒 Files selected for processing (1)
  • clients/agent-runtime/src/pre_execution/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pr-checks
  • GitHub Check: sonar
  • GitHub Check: semgrep-cloud-platform/scan
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (4)
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/pre_execution/mod.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/pre_execution/mod.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/pre_execution/mod.rs
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • clients/agent-runtime/src/pre_execution/mod.rs
🔇 Additional comments (5)
clients/agent-runtime/src/pre_execution/mod.rs (5)

110-124: LGTM — clean custom equality helper.

Taking ownership and delegating to assert_eq! on the inner types is the right call given IngressDecision doesn't implement PartialEq. The fallthrough arm's panic! with {context} gives enough signal to diagnose variant mismatches.
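The pattern described here can be sketched in isolation. The following is a minimal illustration, not the helper from this PR: the enum shapes, variant names, and fields are hypothetical stand-ins, and only the idea (match on the pair, delegate to assert_eq! on inner values, panic with context on a variant mismatch) is taken from the comment above.

```rust
// Hypothetical stand-ins for the real types; the actual IngressDecision and
// SessionCommandOutcome in pre_execution/mod.rs have different variants and fields.
#[derive(Debug, Clone, PartialEq)]
enum SessionCommandOutcome {
    Success(String),
    Failure(String),
}

// Deliberately no PartialEq here, mirroring the situation the review describes.
#[derive(Debug, Clone)]
enum IngressDecision {
    SessionCommand { outcome: SessionCommandOutcome },
    PassThrough { prompt: String },
}

/// Compares two decisions variant by variant, delegating to `assert_eq!` on
/// the inner values and panicking with `context` when the variants differ.
fn assert_same_ingress_decision(left: IngressDecision, right: IngressDecision, context: &str) {
    match (left, right) {
        (
            IngressDecision::SessionCommand { outcome: l },
            IngressDecision::SessionCommand { outcome: r },
        ) => assert_eq!(l, r, "outcome mismatch for {context}"),
        (
            IngressDecision::PassThrough { prompt: l },
            IngressDecision::PassThrough { prompt: r },
        ) => assert_eq!(l, r, "prompt mismatch for {context}"),
        // Fallthrough arm: the two sides are different variants.
        (l, r) => panic!("variant mismatch for {context}: {l:?} vs {r:?}"),
    }
}
```

Matching on the tuple keeps each variant's comparison explicit, so adding a variant later forces a decision about how it should be compared instead of silently falling into the panic arm.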


543-670: LGTM — intentional golden-file regression guard.

Order-sensitive exact match is the right call for a freeze test. Using #[test] (sync) instead of #[tokio::test] is correct since default_registry() and .iter() are synchronous.
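As a rough sketch of what an order-sensitive freeze test looks like, assuming a much smaller descriptor than the real one: CommandDescriptor, its fields, the registry contents, and the "/s" alias below are all invented for illustration and do not reflect the actual default_registry().

```rust
// Hypothetical descriptor shape; the real registry descriptors also carry
// argument shapes, capabilities, permissions, and backend requirements.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CommandDescriptor {
    name: &'static str,
    aliases: &'static [&'static str],
    requires_session: bool,
}

// Hypothetical registry contents (the "/s" alias is invented), used only to
// illustrate the freeze pattern.
fn default_registry() -> Vec<CommandDescriptor> {
    vec![
        CommandDescriptor { name: "/tools", aliases: &[], requires_session: false },
        CommandDescriptor { name: "/session", aliases: &["/s"], requires_session: true },
    ]
}

/// Flattens the registry into plain tuples so the comparison is exact and
/// order-sensitive: reordering, renaming, or adding a command fails the test.
fn snapshot(registry: &[CommandDescriptor]) -> Vec<(&'static str, Vec<&'static str>, bool)> {
    registry
        .iter()
        .map(|d| (d.name, d.aliases.to_vec(), d.requires_session))
        .collect()
}

// Synchronous test: nothing here awaits, so no async runtime is needed.
#[test]
fn registry_freezes_current_commands() {
    let expected = vec![
        ("/tools", Vec::<&str>::new(), false),
        ("/session", vec!["/s"], true),
    ];
    assert_eq!(snapshot(&default_registry()), expected);
}
```

A test of this kind is expected to fail whenever the registry changes on purpose; the hardcoded expectation is then updated in the same commit, which makes the change visible in review.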


754-812: LGTM — solid transport parity regression guard.

Using include_blocking_fallback: false correctly prevents the test from hitting the evaluate() codepath. Including /tool toggle shell (which hits InvalidArguments) confirms the parity check covers error paths too, not just success paths.
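A parity check of this kind might look roughly like the sketch below. The Transport and Decision types and the toy evaluate function are assumptions made for the example; the real test drives the CLI and gateway HTTP ingress builders, which are not reproduced here.

```rust
// Hypothetical transports and evaluator, for illustration only.
#[derive(Debug, Clone, Copy)]
enum Transport {
    Cli,
    GatewayHttp,
}

#[derive(Debug, PartialEq)]
enum Decision {
    SessionCommand { command: String },
    InvalidArguments { command: String },
    PassThrough,
}

/// Toy evaluator: the decision depends only on the prompt, never on the
/// transport, which is the property a parity test guards.
fn evaluate(prompt: &str, _transport: Transport) -> Decision {
    let mut parts = prompt.split_whitespace();
    let head = parts.next();
    let has_args = parts.next().is_some();
    match (head, has_args) {
        (Some("/tools"), false) => Decision::SessionCommand { command: "/tools".to_string() },
        (Some("/tools"), true) => Decision::InvalidArguments { command: "/tools".to_string() },
        _ => Decision::PassThrough,
    }
}

/// Runs every prompt through both transports and requires identical decisions,
/// covering error paths as well as success paths.
fn assert_transport_parity(prompts: &[&str]) {
    for prompt in prompts {
        let cli = evaluate(prompt, Transport::Cli);
        let http = evaluate(prompt, Transport::GatewayHttp);
        assert_eq!(cli, http, "transport drift for {prompt}");
    }
}
```

Including at least one prompt that is known to fail (here "/tools extra") in the prompt list is what extends the guard to error classification.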


814-852: LGTM — plan-mode invariance correctly asserted.


854-953: LGTM — real SQLite-backed authz scenario with correct lifecycle setup.

The temp_dir binding correctly outlives memory. The cli_wrong_owner.clone() before assert_same_ingress_decision is necessary since the value is consumed by the helper then re-used for the kind assertion at line 943. Missing-scope and wrong-owner failure paths are both transport-stable.
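The clone-before-consume point can be shown with a small sketch. The Decision and FailureKind types and both function names here are hypothetical; the only thing carried over from the review is that the helper takes its arguments by value, so a value needed afterwards must be cloned first.

```rust
// Hypothetical types for illustration; not the PR's actual definitions.
#[derive(Debug, Clone, PartialEq)]
enum FailureKind {
    MissingScope,
    WrongOwner,
}

#[derive(Debug, Clone, PartialEq)]
struct Decision {
    kind: FailureKind,
}

/// Takes both decisions by value, so the caller loses them after the call.
fn assert_same_decision(left: Decision, right: Decision, context: &str) {
    assert_eq!(left, right, "decision drift for {context}");
}

/// Checks parity first, then returns the kind for a follow-up assertion.
fn check_parity_then_kind(cli: Decision, http: Decision) -> FailureKind {
    // Clone first: passing `cli` itself would move it into the helper and
    // make the `cli.kind` access below a compile error.
    assert_same_decision(cli.clone(), http, "authz failure");
    cli.kind
}
```

An alternative would be a helper that borrows its arguments, which avoids the clone; in test code the extra allocation is usually not a concern.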

Comment on lines +716 to +734
        match decision {
            IngressDecision::SessionCommand {
                outcome: SessionCommandOutcome::Failure(failure),
            } => {
                assert_eq!(failure.command, command, "command drift for {prompt}");
                assert_eq!(failure.kind, kind, "failure kind drift for {prompt}");
                assert_eq!(failure.message, message, "message drift for {prompt}");
                if message.starts_with("invalid slash command usage") {
                    assert_eq!(
                        failure.session_id, None,
                        "argument-shape failures stay pre-context"
                    );
                } else {
                    assert_eq!(failure.session_id, Some("session-1".to_string()));
                }
            }
            other => panic!("prompt {prompt} should fail as session command, got {other:?}"),
        }
    }

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make per-case session_id expectation explicit rather than derived from message prefix.

The message.starts_with("invalid slash command usage") heuristic at line 723 couples the session_id assertion to a message prefix. For current cases it works, but adding a new case whose message happens to start with that prefix (or not) would silently apply the wrong assertion. The exact message equality is already asserted, so this second check should use an explicit expected value per case.

🛠️ Proposed fix: add `expected_session_id` to the cases tuple
     let cases = [
         (
             "/tools extra",
             "/tools",
             SessionCommandFailureKind::InvalidArguments,
             "invalid slash command usage for /tools: this command does not accept trailing arguments",
+            None::<&str>,
         ),
         (
             "/mcp",
             "/mcp",
             SessionCommandFailureKind::InvalidArguments,
             "invalid slash command usage for /mcp: a subcommand argument is required",
+            None::<&str>,
         ),
         (
             "/tool toggle shell",
             "/tool",
             SessionCommandFailureKind::InvalidArguments,
             "Unknown /tool subcommand: 'toggle'. Use enable or disable.",
+            Some("session-1"),
         ),
         (
             "/session archive",
             "/session",
             SessionCommandFailureKind::InvalidArguments,
             "Unknown /session subcommand: 'archive'. Usage: /session, /session status, /session inspect, or /session list",
+            Some("session-1"),
         ),
     ];

-    for (prompt, command, kind, message) in cases {
+    for (prompt, command, kind, message, expected_session_id) in cases {
         // ...
             assert_eq!(failure.command, command, "command drift for {prompt}");
             assert_eq!(failure.kind, kind, "failure kind drift for {prompt}");
             assert_eq!(failure.message, message, "message drift for {prompt}");
-            if message.starts_with("invalid slash command usage") {
-                assert_eq!(
-                    failure.session_id, None,
-                    "argument-shape failures stay pre-context"
-                );
-            } else {
-                assert_eq!(failure.session_id, Some("session-1".to_string()));
-            }
+            assert_eq!(
+                failure.session_id,
+                expected_session_id.map(str::to_string),
+                "session_id drift for {prompt}"
+            );
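For readers who want to try the suggested pattern outside the repository, here is a self-contained sketch of a table-driven test with an explicit per-case session_id. The Failure struct and the classify function are hypothetical stand-ins for the real ingress evaluation; only the shape of the cases table and the final assertion follow the proposal above.

```rust
// Hypothetical failure type and classifier standing in for the real ingress
// evaluation; only the table-driven assertion pattern is the point here.
#[derive(Debug, PartialEq)]
struct Failure {
    command: String,
    session_id: Option<String>,
}

/// Toy classifier: argument-shape failures are raised before a session
/// context exists, subcommand failures after it has been attached.
fn classify(prompt: &str) -> Failure {
    let command = prompt
        .split_whitespace()
        .next()
        .unwrap_or_default()
        .to_string();
    let session_id = match command.as_str() {
        "/tool" | "/session" => Some("session-1".to_string()),
        _ => None,
    };
    Failure { command, session_id }
}

#[test]
fn invalid_inputs_have_explicit_session_id_expectations() {
    // Each case states its own expected session_id instead of deriving it
    // from a message prefix.
    let cases = [
        ("/tools extra", "/tools", None::<&str>),
        ("/tool toggle shell", "/tool", Some("session-1")),
        ("/session archive", "/session", Some("session-1")),
    ];

    for (prompt, command, expected_session_id) in cases {
        let failure = classify(prompt);
        assert_eq!(failure.command, command, "command drift for {prompt}");
        assert_eq!(
            failure.session_id,
            expected_session_id.map(str::to_string),
            "session_id drift for {prompt}"
        );
    }
}
```

With the expectation in the table, a new case cannot pick up the wrong assertion by accident; whoever adds the case has to state the expected session_id.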

@sonarqubecloud

sonarqubecloud Bot commented May 5, 2026

@yacosta738 yacosta738 merged commit 420d5bd into main May 5, 2026
17 checks passed
@yacosta738 yacosta738 deleted the test/slash-command-regression-543 branch May 5, 2026 15:11
Labels

area:rust size/m Denotes a medium change size

Development

Successfully merging this pull request may close these issues.

Slash Command Regression Test Suite

1 participant