test(agent-runtime): cover slash command regressions#783
Conversation
Deploying corvus with
|
| Latest commit: |
7ca8ab4
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://cb840d80.corvus-42x.pages.dev |
| Branch Preview URL: | https://test-slash-command-regressio.corvus-42x.pages.dev |
📝 WalkthroughSummary by CodeRabbit
WalkthroughA test helper function and five regression tests are added to the pre-execution module to verify slash-command behavior: registry stability, invalid input normalization, CLI/gateway transport parity, plan-mode invariance, and authorization enforcement across transports. ChangesSlash-Command Regression Test Suite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@clients/agent-runtime/src/pre_execution/mod.rs`:
- Around line 716-734: The test currently derives the expected session_id by
checking message.starts_with("invalid slash command usage"), which is brittle;
update the test cases to include an explicit expected_session_id value for each
case and assert failure.session_id equals that expected value instead of using
the message-prefix heuristic. Locate the match arm handling
IngressDecision::SessionCommand (pattern binding outcome:
SessionCommandOutcome::Failure(failure)) and replace the conditional on
message.starts_with(...) with an assertion comparing failure.session_id to the
new expected_session_id field from the test case tuple (e.g., use the same tuple
that supplies command, kind, message to also supply expected_session_id),
keeping the existing equality checks for failure.command, failure.kind, and
failure.message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a71684a8-5de0-465c-a70a-d35690626be6
📒 Files selected for processing (1)
clients/agent-runtime/src/pre_execution/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (4)
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
clients/agent-runtime/src/pre_execution/mod.rs
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run
cargo fmt --all -- --check,cargo clippy --all-targets -- -D warnings, andcargo testfor code validation, or document which checks were skipped and why
Files:
clients/agent-runtime/src/pre_execution/mod.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/agent-runtime/src/pre_execution/mod.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/agent-runtime/src/pre_execution/mod.rs
🔇 Additional comments (5)
clients/agent-runtime/src/pre_execution/mod.rs (5)
110-124: LGTM — clean custom equality helper.Taking ownership and delegating to
assert_eq!on the inner types is the right call givenIngressDecisiondoesn't implementPartialEq. The fallthrough arm'spanic!with{context}gives enough signal to diagnose variant mismatches.
543-670: LGTM — intentional golden-file regression guard.Order-sensitive exact match is the right call for a freeze test. Using
#[test](sync) instead of#[tokio::test]is correct sincedefault_registry()and.iter()are synchronous.
754-812: LGTM — solid transport parity regression guard.Using
include_blocking_fallback: falsecorrectly prevents the test from hitting theevaluate()codepath. Including/tool toggle shell(which hitsInvalidArguments) confirms the parity check covers error paths too, not just success paths.
814-852: LGTM — plan-mode invariance correctly asserted.
854-953: LGTM — real SQLite-backed authz scenario with correct lifecycle setup.The
temp_dirbinding correctly outlivesmemory. Thecli_wrong_owner.clone()beforeassert_same_ingress_decisionis necessary since the value is consumed by the helper then re-used for the kind assertion at line 943. Missing-scope and wrong-owner failure paths are both transport-stable.
| match decision { | ||
| IngressDecision::SessionCommand { | ||
| outcome: SessionCommandOutcome::Failure(failure), | ||
| } => { | ||
| assert_eq!(failure.command, command, "command drift for {prompt}"); | ||
| assert_eq!(failure.kind, kind, "failure kind drift for {prompt}"); | ||
| assert_eq!(failure.message, message, "message drift for {prompt}"); | ||
| if message.starts_with("invalid slash command usage") { | ||
| assert_eq!( | ||
| failure.session_id, None, | ||
| "argument-shape failures stay pre-context" | ||
| ); | ||
| } else { | ||
| assert_eq!(failure.session_id, Some("session-1".to_string())); | ||
| } | ||
| } | ||
| other => panic!("prompt {prompt} should fail as session command, got {other:?}"), | ||
| } | ||
| } |
There was a problem hiding this comment.
Make per-case session_id expectation explicit rather than derived from message prefix.
The message.starts_with("invalid slash command usage") heuristic at line 723 couples the session_id assertion to a message prefix. For current cases it works, but adding a new case whose message happens to start with that prefix (or not) would silently apply the wrong assertion. The exact message equality is already asserted, so this second check should use an explicit expected value per case.
🛠️ Proposed fix: add `expected_session_id` to the cases tuple
let cases = [
(
"/tools extra",
"/tools",
SessionCommandFailureKind::InvalidArguments,
"invalid slash command usage for /tools: this command does not accept trailing arguments",
+ None::<&str>,
),
(
"/mcp",
"/mcp",
SessionCommandFailureKind::InvalidArguments,
"invalid slash command usage for /mcp: a subcommand argument is required",
+ None::<&str>,
),
(
"/tool toggle shell",
"/tool",
SessionCommandFailureKind::InvalidArguments,
"Unknown /tool subcommand: 'toggle'. Use enable or disable.",
+ Some("session-1"),
),
(
"/session archive",
"/session",
SessionCommandFailureKind::InvalidArguments,
"Unknown /session subcommand: 'archive'. Usage: /session, /session status, /session inspect, or /session list",
+ Some("session-1"),
),
];
- for (prompt, command, kind, message) in cases {
+ for (prompt, command, kind, message, expected_session_id) in cases {
// ...
assert_eq!(failure.command, command, "command drift for {prompt}");
assert_eq!(failure.kind, kind, "failure kind drift for {prompt}");
assert_eq!(failure.message, message, "message drift for {prompt}");
- if message.starts_with("invalid slash command usage") {
- assert_eq!(
- failure.session_id, None,
- "argument-shape failures stay pre-context"
- );
- } else {
- assert_eq!(failure.session_id, Some("session-1".to_string()));
- }
+ assert_eq!(
+ failure.session_id,
+ expected_session_id.map(str::to_string),
+ "session_id drift for {prompt}"
+ );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@clients/agent-runtime/src/pre_execution/mod.rs` around lines 716 - 734, The
test currently derives the expected session_id by checking
message.starts_with("invalid slash command usage"), which is brittle; update the
test cases to include an explicit expected_session_id value for each case and
assert failure.session_id equals that expected value instead of using the
message-prefix heuristic. Locate the match arm handling
IngressDecision::SessionCommand (pattern binding outcome:
SessionCommandOutcome::Failure(failure)) and replace the conditional on
message.starts_with(...) with an assertion comparing failure.session_id to the
new expected_session_id field from the test case tuple (e.g., use the same tuple
that supplies command, kind, message to also supply expected_session_id),
keeping the existing equality checks for failure.command, failure.kind, and
failure.message.
|



Related Issues
Closes #543
Summary
Adds slash-command regression coverage for the current registry-backed session commands so the command-platform rework has stable behavior guards.
/resumemissing-scope and wrong-owner failures across transports.Tested Information
cargo test session_command_regression -- --nocapturecargo test pre_execution::tests:: -- --nocaptureBoth commands pass locally.
Documentation Impact
Breaking Changes
None.
Checklist