refactor(agent-loop): consolidate and unify runtime execution#123
Conversation
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces the legacy agent loop with a stream-based AgentLoop and step-wise Agent APIs; adds DispatchAction and tool-risk/approval checks; introduces retry/backoff canonical execution and SSE preview flows; removes Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant AgentLoop
participant Agent
participant Provider
participant Dispatcher
participant Tool
Client->>AgentLoop: submit prompt (session)
AgentLoop->>Agent: prepare_turn(user_message)
Agent->>Provider: chat(effective_model, user_message)
Provider-->>Agent: LLM response (text / tool_call)
Agent->>Dispatcher: check_tool_risk(tool_name, args)
alt ApprovalRequired
Dispatcher-->>AgentLoop: ApprovalRequired(tool)
AgentLoop-->>Client: emit ApprovalRequired event
Client->>AgentLoop: approval granted/denied
AgentLoop->>Dispatcher: resume with approval
else Execute
Dispatcher->>Tool: execute(tool_name, args)
Tool-->>Dispatcher: ToolExecutionResult(output, success, action)
Dispatcher-->>Agent: tool result
Agent->>AgentLoop: update history / emit ToolDispatchCompleted
end
AgentLoop->>Agent: step -> final_text or continue
AgentLoop-->>Client: emit Complete(final_text) / Error
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Contributor ReportUser: @yacosta738
Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-03-03 to 2026-03-03 |
There was a problem hiding this comment.
Actionable comments posted: 16
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
clients/agent-runtime/src/agent/agent.rs (1)
401-407:⚠️ Potential issue | 🟠 Major
ToolExecutionResult.successis incorrect for failures.Line 404 hardcodes
success: true, even when the tool errors or is unknown, which breaks downstream status semantics.✅ Suggested correctness fix
- let result = if let Some(tool) = self.tools.iter().find(|t| t.name() == call.name) { + let (result, success) = if let Some(tool) = self.tools.iter().find(|t| t.name() == call.name) { match tool.execute(call.arguments.clone()).await { Ok(r) => { self.observer.record_event(&ObserverEvent::ToolCall { tool: call.name.clone(), duration: start.elapsed(), success: r.success, }); if r.success { - r.output + (r.output, true) } else { - format!("Error: {}", r.error.unwrap_or(r.output)) + (format!("Error: {}", r.error.unwrap_or(r.output)), false) } } Err(e) => { self.observer.record_event(&ObserverEvent::ToolCall { tool: call.name.clone(), duration: start.elapsed(), success: false, }); - format!("Error executing {}: {e}", call.name) + (format!("Error executing {}: {e}", call.name), false) } } } else { - format!("Unknown tool: {}", call.name) + (format!("Unknown tool: {}", call.name), false) }; ToolExecutionResult { name: call.name.clone(), output: result, - success: true, + success, tool_call_id: call.tool_call_id.clone(), action: crate::agent::dispatcher::DispatchAction::Execute, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/agent.rs` around lines 401 - 407, The ToolExecutionResult is being constructed with success: true unconditionally; change it to reflect the actual outcome (e.g., success: result.is_ok() or success: matches!(result, Ok(_))) and ensure any unknown-tool branch sets success = false so downstream status semantics are correct — update the constructor in agent::agent.rs where ToolExecutionResult { name: call.name.clone(), output: result, ... } is created to compute success from the result value rather than hardcoding true.
🟡 Minor comments (6)
openspec/changes/archive/2026-03-03-agent-loop/archive-report.md-6-6 (1)
6-6:⚠️ Potential issue | 🟡 MinorUpdate verification prerequisite path to the archived location.
Line 6 points to a pre-archive path (
openspec/changes/agent-loop/...), which will be stale once archived.📝 Suggested fix
-**Verification prerequisite**: PASS (`openspec/changes/agent-loop/verify-report.md`) +**Verification prerequisite**: PASS (`openspec/changes/archive/2026-03-03-agent-loop/verify-report.md`)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/archive-report.md` at line 6, Update the verification prerequisite path in archive-report.md so it points to the archived location instead of the pre-archive path; replace the reference "openspec/changes/agent-loop/verify-report.md" with the archived path (e.g., under the archive directory structure) so the line reading "**Verification prerequisite**: PASS (`openspec/changes/agent-loop/verify-report.md`)" references the correct archived file.clients/agent-runtime/tests/cli_loop_events_e2e.rs-7-15 (1)
7-15:⚠️ Potential issue | 🟡 MinorHarden environment isolation for CLI E2E commands.
These
Commandinvocations inherit parent env, so pre-setCORVUS_UNIFIED_*vars can make outcomes host-dependent and flaky.Also applies to: 32-41, 53-60, 71-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/tests/cli_loop_events_e2e.rs` around lines 7 - 15, The Command invocations (the Command::new(...).args(...).env(...).output() calls used to run the corvus binary) inherit the parent process environment which can make tests flaky; call env_clear() on each Command before setting the required env vars and then explicitly set HOME, XDG_CONFIG_HOME, CORVUS_UNIFIED_LOOP_PREVIEW, CORVUS_UNIFIED_LOOP_ONLY and RUST_LOG (and any other CORVUS_UNIFIED_* vars used in the other blocks at lines 32-41, 53-60, 71-80) so the CLI E2E tests run with a deterministic environment and do not pick up host-level variables.openspec/changes/archive/2026-03-03-agent-loop/proposal.md-99-100 (1)
99-100:⚠️ Potential issue | 🟡 MinorReplace unresolved placeholder in dependency path.
Line 99 still uses
{argument}and will not resolve to a real artifact path.📝 Suggested fix
-- Existing exploration artifact: `openspec/changes/{argument}/exploration.md`. +- Existing exploration artifact: `openspec/changes/archive/2026-03-03-agent-loop/exploration.md`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md` around lines 99 - 100, The path string "openspec/changes/{argument}/exploration.md" in proposal.md contains the unresolved placeholder "{argument}"; replace it with the real change/artifact slug (e.g., the change folder name "2026-03-03-agent-loop" or the intended artifact identifier) so the reference becomes "openspec/changes/2026-03-03-agent-loop/exploration.md" (update the string in the proposal.md file accordingly).openspec/changes/archive/2026-03-03-agent-loop/verify-report.md-9-9 (1)
9-9:⚠️ Potential issue | 🟡 MinorFix heading level increment in markdown structure.
The heading jumps from H1 to H3; insert an H2 level to satisfy markdownlint MD001.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/verify-report.md` at line 9, The markdown jumps from an H1 to an H3 at the "Completeness" section causing MD001; change the "### Completeness" heading to an H2 (i.e. "## Completeness") or insert an intermediate H2 heading above it so the document has a proper H1→H2→H3 hierarchy; update the "Completeness" heading token (###) to "##" to fix the heading level increment.openspec/changes/archive/2026-03-03-agent-loop/design.md-100-108 (1)
100-108:⚠️ Potential issue | 🟡 MinorUpdate spec contract to match implemented loop event names.
The interface block still documents
LoopEvent::Done(...), while the runtime/tests in this PR useLoopEvent::Complete(...). Please align the design artifact to avoid API drift in archived docs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/design.md` around lines 100 - 108, The design doc's LoopEvent enum is out of sync: it documents LoopEvent::Done(FinalResponse) but runtime/tests use LoopEvent::Complete(...). Update the spec in the design.md to replace the Done variant with Complete and ensure the variant signature matches the implementation (e.g., Complete(FinalResponse) or whatever the runtime uses), updating any surrounding references to LoopEvent::Done to LoopEvent::Complete so the archived contract matches the code.clients/agent-runtime/tests/agent_loop_integration.rs-163-164 (1)
163-164:⚠️ Potential issue | 🟡 MinorUse an isolated workspace directory for this test.
Using
std::env::temp_dir()shares state across tests/processes and can make this integration test flaky.💡 Suggested fix
#[tokio::test] async fn full_prompt_tool_response_cycle_with_dummy_provider() { + let workspace = tempfile::TempDir::new().unwrap(); let provider = Box::new(IntegrationProvider { calls: Mutex::new(vec![ @@ let mut agent = Agent::builder() @@ - .workspace_dir(std::env::temp_dir()) + .workspace_dir(workspace.path().to_path_buf()) .build() .unwrap();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/tests/agent_loop_integration.rs` around lines 163 - 164, The test should use an isolated temporary workspace instead of std::env::temp_dir(): create a tempfile::TempDir (or assert_fs::TempDir) at the start of the test, pass its path to the .workspace_dir(...) call (referenced where .workspace_dir(std::env::temp_dir()) is used), and keep the TempDir variable in scope for the duration of the test so the directory isn't deleted prematurely.
🧹 Nitpick comments (3)
openspec/config.yaml (1)
31-33: Test execution rule is too permissiveLine 31 says to run tests only if infrastructure exists; in this repo, test infrastructure is part of the standard stack, so this wording can allow avoidable skips.
Suggested fix
checks: - - Run tests if test infrastructure exists + - Run tests for all affected modules/packages - Compare implementation against every spec scenarioAs per coding guidelines, "Testing: Verify that tests cover critical paths and follow established naming conventions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/config.yaml` around lines 31 - 33, Update the permissive test execution note so tests run by default: replace the conditional wording "- Run tests if test infrastructure exists" with a definitive statement that tests are executed as part of the standard stack and ensure the test_command key remains "make test"; adjust the adjacent description "- Compare implementation against every spec scenario" if needed to reflect mandatory execution and confirm naming conventions and critical-path coverage are enforced in the repository's test policy referenced by test_command.clients/agent-runtime/src/main.rs (1)
445-447:tool_callsis currently a dead API parameter.
collect_unified_loop_resultignores_tool_calls, butcollect_unified_loop_eventsstill accepts and forwards it. Either use it or remove it to avoid misleading call sites.Also applies to: 531-536
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/main.rs` around lines 445 - 447, collect_unified_loop_result currently ignores the _tool_calls parameter while collect_unified_loop_events still accepts and forwards it; update the API so it’s not misleading by either (A) using the tool_calls value inside collect_unified_loop_result (and adjust callers) to influence behavior, or (B) remove the parameter from collect_unified_loop_result and stop forwarding it from collect_unified_loop_events (and update all call sites). Locate the symbols collect_unified_loop_result and collect_unified_loop_events in the diff, decide which approach fits existing logic, then consistently remove or propagate the tool_calls parameter across both functions and their callers so there is no dead/unused parameter.clients/agent-runtime/tests/legacy_loop_guard.rs (1)
5-9: Make the re-export guard less brittle.This assertion only catches one exact formatting of the re-export. Consider asserting absence of broader
loop_public export patterns so harmless formatting/style changes don’t create false negatives.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/tests/legacy_loop_guard.rs` around lines 5 - 9, The current assertion checks for one exact string and is brittle; update the test that reads into mod_rs to search for broader public re-export patterns of loop_ instead of the exact "pub use loop_::{process_message, run};". Modify the assertion to fail if mod_rs matches a regex or contains patterns like "pub use loop_", "pub use loop_::", or any "pub(crate) use ...loop_", i.e. use a regex (e.g., r"pub\s+(?:crate\s+)?use\s+[^;]*\bloop_\b") or multiple contains checks on the mod_rs string to catch variations in spacing, visibility, or symbol lists while preserving the same assert/expect flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 520-527: In the agent runtime step where you push assistant
responses to history (the block that calls
ConversationMessage::Chat(ChatMessage::assistant(...)) and then uses
print!("{text}") and std::io::stdout().flush()), remove the direct stdout write
and synchronous flush: stop printing raw model text and avoid cloning the text
for that purpose (remove text.clone()). If an emission/telemetry callback is
required instead, call a provided non-blocking event/observer API or emit via a
logger that redacts sensitive content (e.g., use a debug/info path that
sanitizes the text) rather than writing to stdout; keep only the
history.push(ConversationMessage::Chat(ChatMessage::assistant(...))) and ensure
no blocking I/O or secret data is logged.
- Around line 534-537: The current flow calls self.execute_tools(&calls) without
invoking ToolDispatcher::check_tool_risk, allowing approval-gated tools to run;
update the sequence in the agent run path so that before calling execute_tools
you call self.tool_dispatcher.check_tool_risk(&calls) (or similar) to evaluate
each parsed call, handle the returned gating result (e.g., require user
approval, abort, or filter out disallowed calls), and only pass approved calls
into self.execute_tools(&approved_calls). After execution keep the existing
formatting/history steps (self.tool_dispatcher.format_results,
self.history.push, self.trim_history) but ensure results reflect only executed,
approved tools and that unapproved calls are not executed.
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 35-43: The current check_tool_risk in function check_tool_risk
defaults unknown tool names to DispatchAction::Execute, which is too permissive;
change the fallback to require approval for unrecognized tools (e.g., return
DispatchAction::ApprovalRequired(tool_name.to_string()) or an explicit
ApprovalRequired variant) so any new aliases or unknown tool names cannot bypass
approval; update match in check_tool_risk to only allow automatic Execute for
the known safe names ("shell", "bash", "execute_command" if intended) and treat
the _ arm as ApprovalRequired, referencing the check_tool_risk function and the
DispatchAction enum for where to apply the change.
In `@clients/agent-runtime/src/agent/unified_entrypoint.rs`:
- Around line 50-150: The retry loop currently executes AgentLoop::run twice per
retry cycle (first into current_events and again into retried), causing
duplicated side effects; change the control flow so each loop iteration runs
AgentLoop::run exactly once: move creation of next_config/update of options
(timeout/step_duration/backoff) and the prompt "tool-failure" fallback check so
they happen after detecting a recoverable error and before the next iteration,
remove the second run block that collects into retried, and ensure retries_used,
options.max_retries checks and the UnifiedExecutionResult returns are performed
consistently around that single run; keep references to AgentLoop::new, run,
retries_used, options.step_duration, options.backoff_millis,
prompt.contains("tool-failure"), and UnifiedExecutionResult to locate the
affected code.
- Around line 95-101: The shift 1u32 << retries_used can overflow when
retries_used >= 32; change computation of timeout_multiplier in
unified_entrypoint.rs to use checked_shl (e.g.,
1u32.checked_shl(retries_used.into())) and handle None by capping the multiplier
(use u32::MAX or a sensible max like (1u32 << 31)) so the code does not panic;
then use that safe timeout_multiplier when computing next_config.timeout (and
keep the existing min Duration::from_millis(1) guards for options.step_duration
and timeout).
In `@clients/agent-runtime/src/agent/unified_loop.rs`:
- Around line 46-80: The pending_approval Mutex can retain state across runs; in
run() acquire self.pending_approval.lock() at the start and set *pending = None
to clear any stale approval before proceeding, and similarly update the resume()
method to clear or validate pending_approval (e.g., set to None on successful
resume or reject if it points to a different run) so resume() never operates on
leftover state from prior runs; reference the pending_approval field and the
run() and resume() functions when making these changes.
- Line 52: The event stream currently includes raw prompt text by constructing
LoopEvent::LLMProgress(prompt.to_string()), which can leak sensitive content;
change the emission to avoid embedding the full prompt (e.g., send a redacted
summary, a length/character count, a hash, or a prompt_id) so no raw prompt
payload is placed into the events vec; update the creation site where events is
set (the vec![LoopEvent::Start, LoopEvent::LLMProgress(...)] expression) to use
a sanitized representation (or call a helper like redact_prompt or
make_prompt_id) and ensure LoopEvent consumers expect that sanitized format.
- Around line 75-80: The approval gating currently checks prompt text via
prompt.contains("needs-approval") which is bypassable; instead determine
approval by the tool's risk classification or explicit approval flag. Update the
logic in unified_loop (where idx == 0, pending_approval.lock(), and
LoopEvent::ApprovalRequired(tool_name) are used) to query the tool metadata or
risk assessor (e.g., a ToolDescriptor.needs_approval / Tool::risk_level or a
RiskAssessmentService.classify(tool_name)) and only set *pending_approval and
emit LoopEvent::ApprovalRequired(tool_name) when that classification indicates
high risk; remove reliance on prompt.contains and ensure the same
pending_approval and early return behavior stays intact.
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 444-445: The current approval logic uses
msg.content.contains("#approve") which allows anyone to self-approve risky
actions; change approval_granted so it no longer inspects free-form message
text. Instead rely only on the CORVUS_UNIFIED_APPROVE environment flag or an
explicit, structured approval field (e.g., a boolean field on the message like
msg.metadata.approved or an authorization token/header), or check sender
privileges via an auth/ACL check before granting approval; update the approval
check around the approval_granted variable and any downstream tests to use the
structured/authorized approval mechanism rather than
msg.content.contains("#approve").
In `@clients/agent-runtime/src/cron/scheduler.rs`:
- Around line 139-141: The success arm in the match is discarding the agent's
actual output by returning a constant string; change the match on run_result to
capture and pass through the successful payload (e.g., use Ok(output) => (true,
output)) instead of Ok(()) => ("agent job executed"), and ensure sensitive-data
handling is done at persistence/delivery boundaries by applying redaction inside
persist_job_result and announce_delivery (or their equivalents) rather than
replacing the agent response here.
In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 1715-1720: The code computes approval_granted by allowing either
the environment variable CORVUS_UNIFIED_APPROVE or the untrusted header
"X-Unified-Approve" to grant approval; remove the header-based check so
approval_granted is derived only from server-side policy (e.g.,
std::env::var("CORVUS_UNIFIED_APPROVE") comparison) and do not consult
headers.get("X-Unified-Approve") or the associated to_str/trim/map chain in the
approval_granted assignment (update the variable initialization where
approval_granted is defined to drop the header branch).
- Around line 677-684: The SSE frame currently injects the entire payload on one
data: line which allows newlines in loop_event_payload() to break the frame;
change map_loop_event_to_sse_frame to sanitize by normalizing CRLF (e.g.,
replace "\r\n" with "\n"), split the payload into lines, and emit one "data:
{line}\n" per sanitized line (joining them together) instead of a single
interpolated data line, then assemble the frame with the existing id and event
headers (use loop_event_name and loop_event_payload to locate the values).
In `@clients/agent-runtime/src/main.rs`:
- Around line 622-628: The current info!(?event, "Unified loop preview event")
call logs full unified-loop payloads (preview_events / event /
canonical.session_id) which may include secrets; replace this with logging only
non-sensitive metadata (e.g., event type/kind, canonical.session_id, timestamps,
tool name) or sanitize/scrub sensitive fields before logging, and avoid printing
full event structs at info level (if needed keep full payloads only under
trace/debug behind an explicit scrubber); update the logging site that uses
preview_events/event to extract and log safe fields instead of ?event.
- Around line 635-654: The current branches for approval_required,
timeout_aborted, and fallback_response print a message but return Ok(()) which
falsely signals success; update each branch in the function that handles
`canonical` to return an explicit Err value instead of Ok(()) using the crate's
error type (e.g., an anyhow::Error or the local error enum) and include
contextual information (session id and the reason/fallback/tool) in the error.
Specifically, replace the Ok(()) returns in the `if let Some(tool) =
canonical.approval_required`, `if canonical.timeout_aborted`, and `if let
Some(fallback) = canonical.fallback_response` branches with Err(...) bearing
clear messages referencing `canonical.session_id`, `tool`, and `fallback` as
appropriate so callers receive non-zero/error results for
blocked/aborted/fallbacked requests.
In `@openspec/config.yaml`:
- Around line 31-35: The coverage policy is effectively disabled because
coverage_threshold is set to 0; update the openspec configuration by raising
coverage_threshold to a non-zero interim value (e.g., 50–70) to enforce coverage
checks, and ensure the test_command ("make test") and any CI step that reads
coverage_threshold will fail the build when coverage falls below this new value;
modify the coverage_threshold entry accordingly and, if needed, add a comment
documenting the baseline and plan to increment it over time.
---
Outside diff comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 401-407: The ToolExecutionResult is being constructed with
success: true unconditionally; change it to reflect the actual outcome (e.g.,
success: result.is_ok() or success: matches!(result, Ok(_))) and ensure any
unknown-tool branch sets success = false so downstream status semantics are
correct — update the constructor in agent::agent.rs where ToolExecutionResult {
name: call.name.clone(), output: result, ... } is created to compute success
from the result value rather than hardcoding true.
---
Minor comments:
In `@clients/agent-runtime/tests/agent_loop_integration.rs`:
- Around line 163-164: The test should use an isolated temporary workspace
instead of std::env::temp_dir(): create a tempfile::TempDir (or
assert_fs::TempDir) at the start of the test, pass its path to the
.workspace_dir(...) call (referenced where .workspace_dir(std::env::temp_dir())
is used), and keep the TempDir variable in scope for the duration of the test so
the directory isn't deleted prematurely.
In `@clients/agent-runtime/tests/cli_loop_events_e2e.rs`:
- Around line 7-15: The Command invocations (the
Command::new(...).args(...).env(...).output() calls used to run the corvus
binary) inherit the parent process environment which can make tests flaky; call
env_clear() on each Command before setting the required env vars and then
explicitly set HOME, XDG_CONFIG_HOME, CORVUS_UNIFIED_LOOP_PREVIEW,
CORVUS_UNIFIED_LOOP_ONLY and RUST_LOG (and any other CORVUS_UNIFIED_* vars used
in the other blocks at lines 32-41, 53-60, 71-80) so the CLI E2E tests run with
a deterministic environment and do not pick up host-level variables.
In `@openspec/changes/archive/2026-03-03-agent-loop/archive-report.md`:
- Line 6: Update the verification prerequisite path in archive-report.md so it
points to the archived location instead of the pre-archive path; replace the
reference "openspec/changes/agent-loop/verify-report.md" with the archived path
(e.g., under the archive directory structure) so the line reading
"**Verification prerequisite**: PASS
(`openspec/changes/agent-loop/verify-report.md`)" references the correct
archived file.
In `@openspec/changes/archive/2026-03-03-agent-loop/design.md`:
- Around line 100-108: The design doc's LoopEvent enum is out of sync: it
documents LoopEvent::Done(FinalResponse) but runtime/tests use
LoopEvent::Complete(...). Update the spec in the design.md to replace the Done
variant with Complete and ensure the variant signature matches the
implementation (e.g., Complete(FinalResponse) or whatever the runtime uses),
updating any surrounding references to LoopEvent::Done to LoopEvent::Complete so
the archived contract matches the code.
In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md`:
- Around line 99-100: The path string
"openspec/changes/{argument}/exploration.md" in proposal.md contains the
unresolved placeholder "{argument}"; replace it with the real change/artifact
slug (e.g., the change folder name "2026-03-03-agent-loop" or the intended
artifact identifier) so the reference becomes
"openspec/changes/2026-03-03-agent-loop/exploration.md" (update the string in
the proposal.md file accordingly).
In `@openspec/changes/archive/2026-03-03-agent-loop/verify-report.md`:
- Line 9: The markdown jumps from an H1 to an H3 at the "Completeness" section
causing MD001; change the "### Completeness" heading to an H2 (i.e. "##
Completeness") or insert an intermediate H2 heading above it so the document has
a proper H1→H2→H3 hierarchy; update the "Completeness" heading token (###) to
"##" to fix the heading level increment.
---
Nitpick comments:
In `@clients/agent-runtime/src/main.rs`:
- Around line 445-447: collect_unified_loop_result currently ignores the
_tool_calls parameter while collect_unified_loop_events still accepts and
forwards it; update the API so it’s not misleading by either (A) using the
tool_calls value inside collect_unified_loop_result (and adjust callers) to
influence behavior, or (B) remove the parameter from collect_unified_loop_result
and stop forwarding it from collect_unified_loop_events (and update all call
sites). Locate the symbols collect_unified_loop_result and
collect_unified_loop_events in the diff, decide which approach fits existing
logic, then consistently remove or propagate the tool_calls parameter across
both functions and their callers so there is no dead/unused parameter.
In `@clients/agent-runtime/tests/legacy_loop_guard.rs`:
- Around line 5-9: The current assertion checks for one exact string and is
brittle; update the test that reads into mod_rs to search for broader public
re-export patterns of loop_ instead of the exact "pub use
loop_::{process_message, run};". Modify the assertion to fail if mod_rs matches
a regex or contains patterns like "pub use loop_", "pub use loop_::", or any
"pub(crate) use ...loop_", i.e. use a regex (e.g.,
r"pub\s+(?:crate\s+)?use\s+[^;]*\bloop_\b") or multiple contains checks on the
mod_rs string to catch variations in spacing, visibility, or symbol lists while
preserving the same assert/expect flow.
In `@openspec/config.yaml`:
- Around line 31-33: Update the permissive test execution note so tests run by
default: replace the conditional wording "- Run tests if test infrastructure
exists" with a definitive statement that tests are executed as part of the
standard stack and ensure the test_command key remains "make test"; adjust the
adjacent description "- Compare implementation against every spec scenario" if
needed to reflect mandatory execution and confirm naming conventions and
critical-path coverage are enforced in the repository's test policy referenced
by test_command.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (23)
clients/agent-runtime/src/agent/agent.rsclients/agent-runtime/src/agent/dispatcher.rsclients/agent-runtime/src/agent/loop_.rsclients/agent-runtime/src/agent/mod.rsclients/agent-runtime/src/agent/tests.rsclients/agent-runtime/src/agent/unified_entrypoint.rsclients/agent-runtime/src/agent/unified_loop.rsclients/agent-runtime/src/channels/mod.rsclients/agent-runtime/src/cron/scheduler.rsclients/agent-runtime/src/gateway/mod.rsclients/agent-runtime/src/main.rsclients/agent-runtime/tests/agent_loop_integration.rsclients/agent-runtime/tests/cli_loop_events_e2e.rsclients/agent-runtime/tests/legacy_loop_guard.rsopenspec/changes/archive/2026-03-03-agent-loop/archive-report.mdopenspec/changes/archive/2026-03-03-agent-loop/design.mdopenspec/changes/archive/2026-03-03-agent-loop/exploration.mdopenspec/changes/archive/2026-03-03-agent-loop/proposal.mdopenspec/changes/archive/2026-03-03-agent-loop/specs/agent-loop/spec.mdopenspec/changes/archive/2026-03-03-agent-loop/tasks.mdopenspec/changes/archive/2026-03-03-agent-loop/verify-report.mdopenspec/config.yamlopenspec/specs/agent-loop/spec.md
💤 Files with no reviewable changes (1)
- clients/agent-runtime/src/agent/loop_.rs
Deploying corvus with
|
| Latest commit: |
28d3d8a
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://59e6d05a.corvus-42x.pages.dev |
| Branch Preview URL: | https://feature-dallay-136-agent-loo.corvus-42x.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 12
♻️ Duplicate comments (2)
clients/agent-runtime/src/main.rs (1)
641-647:⚠️ Potential issue | 🟠 MajorAvoid emitting full event payloads in preview output.
Line 641 prints
loop_event={event:?}; event payloads can include model/tool/error text and leak sensitive content into stdout/log collectors. Prefer emitting only event kind + safe metadata.As per coding guidelines:
clients/agent-runtime/src/**/*.rs: "Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/main.rs` around lines 641 - 647, The preview currently prints the entire event payload via println!("loop_event={event:?}") which can leak sensitive data; remove that debug print and instead emit only non-sensitive metadata (e.g., use event_kind and canonical.session_id already passed to info!) or a redacted event identifier; update the code around the println and the info! call (references: the println! with event, the info! macro using canonical.session_id and event_kind) so no raw event payload or Debug formatting of `event` is written to stdout or logs.clients/agent-runtime/src/agent/unified_entrypoint.rs (1)
106-120:⚠️ Potential issue | 🟠 MajorRemove prompt-keyword test hooks from production control flow.
Lines 106-120 and 143-152 let user text (
"tool-failure","timeout","needs-approval") directly force fallback/timeout/approval behavior. That makes runtime semantics content-triggerable and can be abused or cause false outcomes in real usage.🛠️ Safer pattern
pub struct UnifiedExecutionConfig { pub tool_calls: usize, pub step_duration: Duration, pub max_retries: u8, pub backoff_millis: u64, + pub deterministic_test_hooks: bool, } impl Default for UnifiedExecutionConfig { fn default() -> Self { Self { tool_calls: 1, step_duration: Duration::from_millis(1), max_retries: 1, backoff_millis: 25, + deterministic_test_hooks: false, } } } @@ - if prompt.contains("tool-failure") { + if options.deterministic_test_hooks && prompt.contains("tool-failure") { @@ - if prompt.contains("timeout") { + if options.deterministic_test_hooks && prompt.contains("timeout") { @@ - if prompt.contains("needs-approval") { + if options.deterministic_test_hooks && prompt.contains("needs-approval") {Also applies to: 143-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/unified_entrypoint.rs` around lines 106 - 120, The code currently checks prompt.contains(...) to force fallbacks/timeouts/approvals (the branches that push LoopEvent::Error/Complete and return UnifiedExecutionResult with used_fallback), which must be removed from production; instead, gate all such keyword-triggered behaviors behind a dedicated test-only flag or configuration (e.g., a boolean field like enable_test_triggers passed into the unified entrypoint function or struct) and only execute the existing branches (the prompt.contains checks and the associated creation of LoopEvent::Error/Complete and return of UnifiedExecutionResult with session_id, events, retries_used, used_fallback) when that flag is true; update the function/method that owns prompt handling to read the new flag and ensure default is false in production, and remove direct reliance on prompt contents for runtime control flow.
🧹 Nitpick comments (2)
openspec/changes/archive/2026-03-03-agent-loop/verify-report.md (1)
44-44: Consider introducing a non-zero coverage gate.
coverage_threshold: 0keeps verification permissive; even a modest baseline would strengthen regression protection for this refactor.As per coding guidelines, “Testing: Verify that tests cover critical paths and follow established naming conventions”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/verify-report.md` at line 44, The "Coverage" block currently sets coverage_threshold: 0; change the coverage gate to a non-zero baseline by updating the coverage_threshold key in the "Coverage" section (verify-report.md) to an appropriate value for this project (e.g., 70–80 for strict projects or a conservative 20–40 as a minimum), and ensure the CI/verification pipeline that reads coverage_threshold enforces the new value so regressions are blocked; adjust any related test/coverage configs if the pipeline reads a different source.openspec/changes/archive/2026-03-03-agent-loop/proposal.md (1)
105-109: Make success criteria measurable and test-gated.These criteria are directionally good but too qualitative for phased enforcement. Add explicit thresholds (e.g., max latency regression %, retry ceiling, approval/risk conformance test pass requirements) so go/no-go decisions are objective.
As per coding guidelines: "Testing: Verify that tests cover critical paths..." and "Architecture: Ensure changes respect the multi-module structure and contract boundaries between KMP and Rust sidecars."Proposed tightening of success criteria
- [ ] A canonical Agent Loop contract is documented and approved for CLI, channels, and gateway. - [ ] Entrypoint behavior differences are either eliminated or explicitly specified with justification. - [ ] Session scoping and approval/risk invariants are consistently enforced across all loop surfaces. - [ ] Migration achieves parity with existing production behavior before legacy path removal. - [ ] No critical security regressions and no unacceptable performance regressions during phased rollout. + - [ ] Canonical contract approved and mapped to executable conformance tests for CLI/channels/gateway. + - [ ] Any remaining entrypoint semantic differences are documented with owner, rationale, and expiry date. + - [ ] 100% pass on approval/risk/session invariant tests across all entrypoints. + - [ ] Parity validated by baseline-vs-unified golden tests before legacy path removal. + - [ ] Security: zero critical/high findings in rollout validation. + - [ ] Performance: p95 turn latency regression <= 10% and no increase in timeout/error budget burn.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md` around lines 105 - 109, Update the checklist items in the proposal to be measurable and test-gated: for each bullet (e.g., "Agent Loop contract", "Entrypoint behavior differences", "Session scoping and approval/risk invariants", "Migration parity", "No critical security/regressions") replace qualitative language with explicit thresholds and pass criteria (for example: "max latency regression ≤ X% measured over Y runs", "retry ceiling ≤ N attempts", "approval/risk conformance: ≥ 99% green on the ApprovalConformanceTest suite", "migration parity: all ProdBehaviorTests pass with zero discrepancies", "security: no High/Critical findings in SCA/pen-test and CI denies merge on new high findings"), and add corresponding test gates named in the doc (e.g., ApprovalConformanceTest, ProdBehaviorTests, PerformanceRegressionTest, SecurityGate) that must pass before rollout; ensure each criterion includes how it is measured (metric, window, and acceptance threshold) and that KMP/Rust contract boundaries are validated by specific cross-module tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 521-529: The code currently pushes both ConversationMessage::Chat
(via ChatMessage::assistant) and ConversationMessage::AssistantToolCalls,
causing duplicate assistant entries when a tool-call response is returned
(affects self.history in agent.rs and flows like XmlToolDispatcher). Update the
logic in the method that handles responses so you only push the plain Chat entry
when response.tool_calls is empty and push
ConversationMessage::AssistantToolCalls (with response.text and
response.tool_calls) when tool_calls is non-empty; reference the symbols
self.history, ConversationMessage::Chat, ChatMessage::assistant,
ConversationMessage::AssistantToolCalls, and XmlToolDispatcher to locate and
change the branching accordingly.
- Around line 531-550: The current logic collects gated_results and
approved_calls separately then appends executed_results to gated_results, which
can reorder outputs versus the original calls; change this to preserve original
call order by: (1) when iterating calls with tool_dispatcher.check_tool_risk,
record gated ToolExecutionResult into a temporary HashMap keyed by
call.tool_call_id and collect approved_calls as before; (2) await
self.execute_tools(&approved_calls) to get executed_results and insert each
executed ToolExecutionResult into the same HashMap keyed by tool_call_id; (3)
finally iterate the original calls vector and push into the final results Vec
the entry from the HashMap for each call.tool_call_id so the output order
matches the input order (use the call.tool_call_id string as the unique key and
functions involved: tool_dispatcher.check_tool_risk, execute_tools,
DispatchAction, approved_calls, gated_results).
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 45-47: The current prefix-based rule that returns
DispatchAction::Execute for any name where name.starts_with("file_") ||
name.starts_with("memory_") is unsafe because it will auto-approve future
high-risk tools; replace this ad-hoc prefix check with an explicit allowlist or
registry lookup (e.g., SAFE_TOOL_NAMES.contains(&name) or
ToolRegistry::is_safe(name)) and default to a gated action (e.g., require
approval) for unknown names; update the match arm that currently returns
DispatchAction::Execute to consult that allowlist/registry and only return
DispatchAction::Execute for explicitly known safe tool names, otherwise return a
conservative action (like RequestApproval or DispatchAction::RequireApproval)
and log the event for auditability.
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 2930-2943: The test mutates global env var CORVUS_UNIFIED_APPROVE
around an await (process_channel_message) which can race with other tests;
serialize access by guarding the env-var mutation with a static mutex (e.g., a
once_cell::sync::Lazy<Mutex<()>> or lazy_static<MUTEX>) and lock it before
calling std::env::set_var, keep the lock across the await and
std::env::remove_var, or alternately mark the test to run serially; update the
test that calls process_channel_message to acquire the static lock, set the env
var, await process_channel_message(...), then remove the env var and drop the
lock to ensure no concurrent tests observe the temporary override.
- Around line 311-323: The code currently normalizes XML tool-aliases only when
params.dispatcher_mode == "xml", but create_channel_dispatcher may return
XmlToolDispatcher even for other dispatcher_mode values (fallback path), so move
the normalization decision to check the actual dispatcher implementation instead
of the raw string; in practice, detect whether the chosen dispatcher is
XmlToolDispatcher (or whether dispatcher implements/uses XML parsing) when
building response_for_parse and call normalize_xml_tool_aliases on response.text
accordingly (update the branch that constructs response_for_parse and reference
params.dispatcher_mode, create_channel_dispatcher, XmlToolDispatcher,
normalize_xml_tool_aliases, and the response_for_parse /
crate::providers::ChatResponse creation to ensure XML aliases are normalized
whenever XmlToolDispatcher is in use).
In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 2678-2689: These tests mutate the global env var
CORVUS_GATEWAY_UNIFIED_LOOP_PREVIEW around calling handle_webhook (State(state)
/ test_connect_info()), which can race in parallel test runs; make the env
change isolated by either using a test-level serialization (e.g., apply a global
async Mutex or the serial_test crate to serialize tests that touch env) or by
using a scoped/env-restore helper that saves the prior value and restores it in
a finally block; update both the block around the handle_webhook call shown
(using std::env::set_var/remove_var) and the similar block at the other
occurrence (lines referenced 2769-2783) to use the chosen isolation pattern so
the env var is always restored even on panic/failure.
- Around line 719-721: Remove the ad-hoc test hooks that change runtime behavior
based on message content: delete the prompt.contains("timeout") conditional that
sets config.timeout = Duration::from_millis(1) (and the analogous logic around
lines 1829–1838) and ensure timeout/tool-call behavior is driven only by
explicit configuration or flags (e.g., RuntimeConfig/preview options or
env/config getters) rather than inspecting the `prompt` string; update any
callers of `config.timeout` or tool-invocation logic to respect the explicit
configuration API and add a clear debug/feature-flag if needed for local testing
instead of embedding content-based switches.
- Around line 1724-1731: The canonical path currently runs full request
processing via run_canonical_outcome -> execute_with_retry_backoff (which builds
an AgentLoop) and on success falls through to call simple_chat, causing
duplicate execution; fix by short-circuiting on successful canonical results:
after awaiting run_canonical_outcome, detect non-error outcomes (i.e., not
approval_required, timeout_aborted, or fallback_response-without-used_fallback)
and return or forward that canonical result immediately instead of calling
simple_chat, or alternatively refactor run_canonical_outcome to perform
validation-only (not invoke execute_with_retry_backoff/AgentLoop) so no full
processing occurs in the canonical phase—apply the chosen change at the gateway
flow that currently calls run_canonical_outcome and later calls simple_chat.
In `@clients/agent-runtime/tests/legacy_loop_guard.rs`:
- Around line 19-20: The current check in legacy_loop_guard.rs only looks for
the literal "agent::loop_" which misses grouped imports like
"agent::{loop_::run}"; update the detection logic so it flags either the direct
path or grouped imports by checking for content.contains("agent::loop_") OR
(content.contains("agent::{") && content.contains("loop_")); apply the same
change to the other occurrence on lines 29-30 so both assertions detect grouped
import syntax such as agent::{loop_::...}.
- Line 5: The test uses relative paths like
fs::read_to_string("src/agent/mod.rs") and Path::new("src/...") which break when
tests run from the workspace root; update these to build paths from the
CARGO_MANIFEST_DIR (e.g., use std::env::var("CARGO_MANIFEST_DIR") and
PathBuf::from(manifest_dir).join("src/...")) for each occurrence (references:
the fs::read_to_string call that reads "src/agent/mod.rs" and other Path::new
usages around lines noted), and ensure existence checks likewise use the
manifest-rooted PathBuf; additionally add a short comment at the top of
clients/agent-runtime/tests/legacy_loop_guard.rs stating which validation checks
were run or skipped (cargo fmt --all -- --check, cargo clippy --all-targets --
-D warnings, cargo test) and why.
In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md`:
- Line 54: The proposal currently references the legacy compatibility file
loop_.rs as the rollback target but that file may be removed; update the
document to specify a concrete post-merge fallback/rollback mechanism instead of
a removed module: either (A) describe a feature-flagged unified loop path (e.g.,
"agent::loop::unified" behind a runtime/feature flag) that can be switched back
to at runtime, or (B) explicitly require retaining a small compatibility shim
named loop_.rs that exposes the old API and forwards to the new implementation;
apply the same replacement for the other occurrences noted (the mentions at
lines 86-87) so the rollback path is executable after merge.
In `@openspec/changes/archive/2026-03-03-agent-loop/verify-report.md`:
- Line 79: Fix the wording in the sentence that currently reads "Verification
still requires targeted Cargo tests in addition to `make build` and `make test`
due current Gradle task wiring (`:agent-runtime:cargoTest` skipped)." by
inserting the missing preposition so it reads "due to current Gradle task
wiring"; update that exact sentence in verify-report.md accordingly.
---
Duplicate comments:
In `@clients/agent-runtime/src/agent/unified_entrypoint.rs`:
- Around line 106-120: The code currently checks prompt.contains(...) to force
fallbacks/timeouts/approvals (the branches that push LoopEvent::Error/Complete
and return UnifiedExecutionResult with used_fallback), which must be removed
from production; instead, gate all such keyword-triggered behaviors behind a
dedicated test-only flag or configuration (e.g., a boolean field like
enable_test_triggers passed into the unified entrypoint function or struct) and
only execute the existing branches (the prompt.contains checks and the
associated creation of LoopEvent::Error/Complete and return of
UnifiedExecutionResult with session_id, events, retries_used, used_fallback)
when that flag is true; update the function/method that owns prompt handling to
read the new flag and ensure default is false in production, and remove direct
reliance on prompt contents for runtime control flow.
In `@clients/agent-runtime/src/main.rs`:
- Around line 641-647: The preview currently prints the entire event payload via
println!("loop_event={event:?}") which can leak sensitive data; remove that
debug print and instead emit only non-sensitive metadata (e.g., use event_kind
and canonical.session_id already passed to info!) or a redacted event
identifier; update the code around the println and the info! call (references:
the println! with event, the info! macro using canonical.session_id and
event_kind) so no raw event payload or Debug formatting of `event` is written to
stdout or logs.
---
Nitpick comments:
In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md`:
- Around line 105-109: Update the checklist items in the proposal to be
measurable and test-gated: for each bullet (e.g., "Agent Loop contract",
"Entrypoint behavior differences", "Session scoping and approval/risk
invariants", "Migration parity", "No critical security/regressions") replace
qualitative language with explicit thresholds and pass criteria (for example:
"max latency regression ≤ X% measured over Y runs", "retry ceiling ≤ N
attempts", "approval/risk conformance: ≥ 99% green on the
ApprovalConformanceTest suite", "migration parity: all ProdBehaviorTests pass
with zero discrepancies", "security: no High/Critical findings in SCA/pen-test
and CI denies merge on new high findings"), and add corresponding test gates
named in the doc (e.g., ApprovalConformanceTest, ProdBehaviorTests,
PerformanceRegressionTest, SecurityGate) that must pass before rollout; ensure
each criterion includes how it is measured (metric, window, and acceptance
threshold) and that KMP/Rust contract boundaries are validated by specific
cross-module tests.
In `@openspec/changes/archive/2026-03-03-agent-loop/verify-report.md`:
- Line 44: The "Coverage" block currently sets coverage_threshold: 0; change the
coverage gate to a non-zero baseline by updating the coverage_threshold key in
the "Coverage" section (verify-report.md) to an appropriate value for this
project (e.g., 70–80 for strict projects or a conservative 20–40 as a minimum),
and ensure the CI/verification pipeline that reads coverage_threshold enforces
the new value so regressions are blocked; adjust any related test/coverage
configs if the pipeline reads a different source.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (15)
clients/agent-runtime/src/agent/agent.rsclients/agent-runtime/src/agent/dispatcher.rsclients/agent-runtime/src/agent/unified_entrypoint.rsclients/agent-runtime/src/agent/unified_loop.rsclients/agent-runtime/src/channels/mod.rsclients/agent-runtime/src/gateway/mod.rsclients/agent-runtime/src/main.rsclients/agent-runtime/tests/agent_loop_integration.rsclients/agent-runtime/tests/cli_loop_events_e2e.rsclients/agent-runtime/tests/legacy_loop_guard.rsopenspec/changes/archive/2026-03-03-agent-loop/archive-report.mdopenspec/changes/archive/2026-03-03-agent-loop/design.mdopenspec/changes/archive/2026-03-03-agent-loop/proposal.mdopenspec/changes/archive/2026-03-03-agent-loop/verify-report.mdopenspec/config.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- openspec/config.yaml
- openspec/changes/archive/2026-03-03-agent-loop/design.md
- clients/agent-runtime/src/agent/unified_loop.rs
…nsolidation-and-migration
…ng with unified entrypoint configuration
…factoring-consolidation-and-migration' into feature/dallay-136-agent-loop-refactoring-consolidation-and-migration
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
clients/agent-runtime/src/agent/agent.rs (1)
654-661:⚠️ Potential issue | 🟠 Major
--peripheraloverrides are currently ignored in runtime execution.
runaccepts_peripheral_overridesbut never applies or rejects them, so the CLI option is effectively a silent no-op in this path.Based on learnings: “Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/agent.rs` around lines 654 - 661, The run function currently accepts _peripheral_overrides but ignores them, silently making the CLI --peripheral flag a no-op; update run (and the parameter name) to either apply the overrides to the runtime Config or explicitly reject them: if you don't implement peripheral handling, change the parameter from _peripheral_overrides to peripheral_overrides and add a guard near the start of run that checks if peripheral_overrides is non-empty and returns an Err with a clear message saying peripheral overrides are unsupported; alternatively, if peripherals are supported elsewhere, forward peripheral_overrides into the Config or the component that consumes peripherals so they're actually applied (reference the run function and the _peripheral_overrides / peripheral_overrides variable and Config struct to locate where to validate or forward).
♻️ Duplicate comments (2)
clients/agent-runtime/src/agent/dispatcher.rs (1)
6-18:⚠️ Potential issue | 🟠 MajorDefault-allow risk fallback still weakens approval gating.
check_tool_riskstill returnsExecutefor the_branch, andSAFE_TOOL_NAMESis not consulted. This means newly added tool names can run without explicit classification.Suggested hardening
fn check_tool_risk(&self, tool_name: &str, _arguments: &Value) -> DispatchAction { - match tool_name { + let normalized = tool_name.to_ascii_lowercase(); + match normalized.as_str() { "shell" | "bash" | "execute_command" => { DispatchAction::ApprovalRequired(tool_name.to_string()) } - // All other tools - agent has already verified they exist - // Unknown tools are handled by the agent (executed and return "Unknown tool" error) - _ => DispatchAction::Execute, + name if SAFE_TOOL_NAMES.contains(&name) => DispatchAction::Execute, + _ => DispatchAction::ApprovalRequired(format!("unclassified tool: {tool_name}")), } }As per coding guidelines:
**/*— “Security: Check for vulnerabilities in encryption, crypto code, and configuration handling.”Also applies to: 52-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/dispatcher.rs` around lines 6 - 18, The risk-check currently defaults to allowing execution because check_tool_risk returns Execute in the wildcard branch and SAFE_TOOL_NAMES is never used; update check_tool_risk to treat unknown tools as Deny (or RequireApproval) by default and explicitly consult SAFE_TOOL_NAMES (the SAFE_TOOL_NAMES slice) to allow only those known-safe names, e.g., change the '_' match to consult SAFE_TOOL_NAMES.contains(&tool_name) and return Execute only if present, otherwise return Deny/RequireApproval; ensure any callers of check_tool_risk handle the stricter Deny/RequireApproval return value.clients/agent-runtime/src/gateway/mod.rs (1)
1743-1751:⚠️ Potential issue | 🟠 MajorCanonical webhook path still falls through to a second model execution.
After
run_canonical_outcomesucceeds (no approval/timeout/fallback return), the handler continues tosimple_chat, so the request can be processed twice.Also applies to: 1782-1816
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/gateway/mod.rs` around lines 1743 - 1751, run_canonical_outcome can produce a final canonical response but the handler currently falls through and calls simple_chat causing duplicate model execution; update the handler around the call to crate::agent::unified_entrypoint::run_canonical_outcome (and the similar block at lines ~1782-1816) to inspect the returned outcome and short-circuit/return immediately when it indicates a final/handled result (e.g., approval, timeout, fallback, or canonical success) instead of continuing to call simple_chat; locate the unified_entrypoint::CanonicalOutcome return type/variants and add an early return or match arm that sends that response back to the client and prevents the subsequent simple_chat invocation.
🧹 Nitpick comments (4)
clients/agent-runtime/tests/legacy_loop_guard.rs (1)
1-10: LGTM! Path resolution and validation documentation properly addressed.The
runtime_pathhelper correctly roots all file accesses atCARGO_MANIFEST_DIR, fixing the flaky test execution from workspace root. The validation comment on lines 4-6 documents which checks were run/skipped per coding guidelines.One minor preference:
env!("CARGO_MANIFEST_DIR")(compile-time macro) is slightly more idiomatic thanstd::env::var()(runtime lookup), but both work correctly since Cargo always sets this variable during test runs.Optional: use compile-time env! macro
fn runtime_path(relative: &str) -> PathBuf { - PathBuf::from(std::env::var("CARGO_MANIFEST_DIR").expect("CARGO_MANIFEST_DIR")).join(relative) + PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(relative) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/tests/legacy_loop_guard.rs` around lines 1 - 10, The runtime_path helper uses a runtime env lookup; replace the std::env::var("CARGO_MANIFEST_DIR") call in the runtime_path function with the compile-time env! macro (env!("CARGO_MANIFEST_DIR")) so the path is resolved at compile time—update runtime_path to join the provided relative path onto env!("CARGO_MANIFEST_DIR") while keeping the PathBuf return type and existing behavior.clients/agent-runtime/src/main.rs (1)
633-635: Avoid cloning preview events before iterating.
let preview_events = canonical.events.clone();is unnecessary here; iterate over&canonical.eventsto avoid extra allocation on large event sets.Based on learnings: “Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/main.rs` around lines 633 - 635, The code clones canonical.events into preview_events before iterating, causing an unnecessary allocation; change the loop to iterate by reference over &canonical.events instead of cloning (remove let preview_events = canonical.events.clone() and use for event in &canonical.events { ... } when printing), so you avoid duplicating the event Vec and reduce memory/cpu overhead in the block that contains the println!("loop_event={event:?}"); ensure any use of event inside the loop accounts for it being a reference.clients/agent-runtime/src/channels/mod.rs (2)
1509-1512: Tool instructions are now duplicated in the system prompt.
build_system_promptalready emits a “Tool Use Protocol” section; appendingbuild_tool_instructions_textadds another large protocol block and increases token cost/conflict risk. Prefer one canonical generator.As per coding guidelines:
clients/agent-runtime/src/**/*.rs: Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 1509 - 1512, The system prompt is getting duplicate "Tool Use Protocol" text because build_system_prompt already emits tool instructions, so remove the extra generation: delete the tools_registry.iter().map(...).collect() into tool_specs and the system_prompt.push_str(&build_tool_instructions_text(&tool_specs)) call; instead rely on build_system_prompt to produce the canonical protocol (or, if build_system_prompt needs tool info, pass tools_registry (or a borrowed iterator) directly into build_system_prompt to avoid allocating a Vec and extra clones). Ensure no other code expects build_tool_instructions_text output here and remove any now-unused imports/variables.
291-293: Avoid rebuildingtool_specson every message.Line 291 allocates/spec-copies per request. Precompute once in runtime context (or use
crate::agent::dispatcher::tool_specs) and reuse a shared slice.As per coding guidelines:
clients/agent-runtime/src/**/*.rs: Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 291 - 293, The code currently rebuilds tool_specs on every message by mapping tools_registry.iter().map(|tool| tool.spec()).collect(), causing per-request allocations; instead precompute and store the Vec<crate::tools::ToolSpec> once in the runtime context (or use the existing crate::agent::dispatcher::tool_specs), expose it as a shared reference (e.g., Arc<Vec<...>> or &'static [ToolSpec]) and replace the per-request construction in mod.rs with a cheap clone/reference to that shared slice so handlers read from the precomputed tool_specs rather than recreating it each time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 546-569: The code currently discards the dispatcher-provided
reason by using matches! on self.tool_dispatcher.check_tool_risk(&call.name,
&call.arguments) and then reusing call.name as the ApprovalRequired reason;
instead call check_tool_risk directly, match its result to extract the
ApprovalRequired(reason) payload (retain the owned String or clone as needed),
set needs_approval based on that match, and use that extracted reason for the
ToolExecutionResult.output and action fields (where
DispatchAction::ApprovalRequired(reason) is inserted into results_by_call_id) so
the original dispatcher context from check_tool_risk is preserved instead of
being replaced with call.name; reference functions/types: check_tool_risk,
DispatchAction::ApprovalRequired, ToolExecutionResult, and results_by_call_id.
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 113-138: The function map_loop_event_to_channel_content is only
used by tests and triggers dead-code warnings in non-test builds; fix by
restricting its compilation to tests—either move it into the module's test code
or add a cfg attribute like #[cfg(test)] directly above fn
map_loop_event_to_channel_content so it is compiled only for tests (preserve the
existing signature and behavior); alternatively, if you intend it to be
available at runtime, make it pub(crate) instead of private.
- Around line 2939-2953: The env var is set then removed around an await call so
a panic in process_channel_message(...).await can skip cleanup; wrap the
set/remove in an RAII guard so the variable is always removed on scope exit:
after std::env::set_var("CORVUS_UNIFIED_APPROVE","1") create a guard (or use
scopeguard::guard) that calls std::env::remove_var("CORVUS_UNIFIED_APPROVE") in
Drop, then call process_channel_message(runtime_ctx, traits::ChannelMessage {
... }).await normally and let the guard drop to ensure cleanup even if the await
panics.
In `@openspec/changes/archive/2026-03-03-agent-loop/proposal.md`:
- Around line 86-88: Update the rollback paragraph to explicitly state that
rollback does not disable or weaken enforcement of approval, risk, or auth
checks and preserves deny-by-default behavior: when reverting via runtime flags
(`CORVUS_UNIFIED_LOOP_PREVIEW=0`, `CORVUS_UNIFIED_LOOP_ONLY=0`) and “disabling
convergence-specific adapters,” add a clarifying sentence that all
approval/risk/auth enforcement and deny-by-default access controls across
CLI/channel/gateway paths remain active and unchanged, and that only
convergence-specific adapter selection is toggled while canonical security
checks continue to execute.
---
Outside diff comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 654-661: The run function currently accepts _peripheral_overrides
but ignores them, silently making the CLI --peripheral flag a no-op; update run
(and the parameter name) to either apply the overrides to the runtime Config or
explicitly reject them: if you don't implement peripheral handling, change the
parameter from _peripheral_overrides to peripheral_overrides and add a guard
near the start of run that checks if peripheral_overrides is non-empty and
returns an Err with a clear message saying peripheral overrides are unsupported;
alternatively, if peripherals are supported elsewhere, forward
peripheral_overrides into the Config or the component that consumes peripherals
so they're actually applied (reference the run function and the
_peripheral_overrides / peripheral_overrides variable and Config struct to
locate where to validate or forward).
---
Duplicate comments:
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 6-18: The risk-check currently defaults to allowing execution
because check_tool_risk returns Execute in the wildcard branch and
SAFE_TOOL_NAMES is never used; update check_tool_risk to treat unknown tools as
Deny (or RequireApproval) by default and explicitly consult SAFE_TOOL_NAMES (the
SAFE_TOOL_NAMES slice) to allow only those known-safe names, e.g., change the
'_' match to consult SAFE_TOOL_NAMES.contains(&tool_name) and return Execute
only if present, otherwise return Deny/RequireApproval; ensure any callers of
check_tool_risk handle the stricter Deny/RequireApproval return value.
In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 1743-1751: run_canonical_outcome can produce a final canonical
response but the handler currently falls through and calls simple_chat causing
duplicate model execution; update the handler around the call to
crate::agent::unified_entrypoint::run_canonical_outcome (and the similar block
at lines ~1782-1816) to inspect the returned outcome and short-circuit/return
immediately when it indicates a final/handled result (e.g., approval, timeout,
fallback, or canonical success) instead of continuing to call simple_chat;
locate the unified_entrypoint::CanonicalOutcome return type/variants and add an
early return or match arm that sends that response back to the client and
prevents the subsequent simple_chat invocation.
---
Nitpick comments:
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1509-1512: The system prompt is getting duplicate "Tool Use
Protocol" text because build_system_prompt already emits tool instructions, so
remove the extra generation: delete the tools_registry.iter().map(...).collect()
into tool_specs and the
system_prompt.push_str(&build_tool_instructions_text(&tool_specs)) call; instead
rely on build_system_prompt to produce the canonical protocol (or, if
build_system_prompt needs tool info, pass tools_registry (or a borrowed
iterator) directly into build_system_prompt to avoid allocating a Vec and extra
clones). Ensure no other code expects build_tool_instructions_text output here
and remove any now-unused imports/variables.
- Around line 291-293: The code currently rebuilds tool_specs on every message
by mapping tools_registry.iter().map(|tool| tool.spec()).collect(), causing
per-request allocations; instead precompute and store the
Vec<crate::tools::ToolSpec> once in the runtime context (or use the existing
crate::agent::dispatcher::tool_specs), expose it as a shared reference (e.g.,
Arc<Vec<...>> or &'static [ToolSpec]) and replace the per-request construction
in mod.rs with a cheap clone/reference to that shared slice so handlers read
from the precomputed tool_specs rather than recreating it each time.
In `@clients/agent-runtime/src/main.rs`:
- Around line 633-635: The code clones canonical.events into preview_events
before iterating, causing an unnecessary allocation; change the loop to iterate
by reference over &canonical.events instead of cloning (remove let
preview_events = canonical.events.clone() and use for event in &canonical.events
{ ... } when printing), so you avoid duplicating the event Vec and reduce
memory/cpu overhead in the block that contains the
println!("loop_event={event:?}"); ensure any use of event inside the loop
accounts for it being a reference.
In `@clients/agent-runtime/tests/legacy_loop_guard.rs`:
- Around line 1-10: The runtime_path helper uses a runtime env lookup; replace
the std::env::var("CARGO_MANIFEST_DIR") call in the runtime_path function with
the compile-time env! macro (env!("CARGO_MANIFEST_DIR")) so the path is resolved
at compile time—update runtime_path to join the provided relative path onto
env!("CARGO_MANIFEST_DIR") while keeping the PathBuf return type and existing
behavior.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
assets/corvus-isotype.svgis excluded by!**/*.svg
📒 Files selected for processing (10)
clients/agent-runtime/src/agent/agent.rsclients/agent-runtime/src/agent/dispatcher.rsclients/agent-runtime/src/agent/unified_entrypoint.rsclients/agent-runtime/src/channels/mod.rsclients/agent-runtime/src/gateway/mod.rsclients/agent-runtime/src/main.rsclients/agent-runtime/tests/cli_loop_events_e2e.rsclients/agent-runtime/tests/legacy_loop_guard.rsopenspec/changes/archive/2026-03-03-agent-loop/proposal.mdopenspec/changes/archive/2026-03-03-agent-loop/verify-report.md
🚧 Files skipped from review as they are similar to previous changes (2)
- clients/agent-runtime/tests/cli_loop_events_e2e.rs
- clients/agent-runtime/src/agent/unified_entrypoint.rs
There was a problem hiding this comment.
♻️ Duplicate comments (4)
clients/agent-runtime/tests/legacy_loop_guard.rs (1)
16-18:⚠️ Potential issue | 🟠 MajorHarden legacy-loop detection to avoid misses and unrelated false positives.
Line 16’s checks can miss valid re-export forms like
pub use self::loop_::....
Lines 28-29 and 40-41 combine broad file-wide substring checks; this can fail whenagent::{...}and an unrelatedloop_appear in different places.Suggested patch
use std::fs; use std::path::PathBuf; @@ fn runtime_path(relative: &str) -> PathBuf { PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(relative) } + +fn has_legacy_loop_reexport(content: &str) -> bool { + content.lines().any(|line| { + let line = line.trim_start(); + (line.starts_with("pub use ") || line.starts_with("pub(crate) use ")) + && line.contains("loop_::") + }) +} + +fn has_legacy_loop_reference(content: &str) -> bool { + content.lines().any(|line| { + let trimmed = line.trim_start(); + if trimmed.starts_with("//") { + return false; + } + let compact: String = trimmed.chars().filter(|c| !c.is_whitespace()).collect(); + compact.contains("agent::loop_") + || (compact.contains("agent::{") && compact.contains("loop_::")) + }) +} @@ let mod_rs = fs::read_to_string(runtime_path("src/agent/mod.rs")).expect("read agent/mod.rs"); assert!( - !mod_rs.contains("pub use loop_") - && !mod_rs.contains("pub(crate) use loop_") - && !mod_rs.contains("pub use loop_::"), + !has_legacy_loop_reexport(&mod_rs), "legacy loop re-export must be removed" ); } @@ let content = fs::read_to_string(runtime_path(path)).expect("read runtime entrypoint"); assert!( - !(content.contains("agent::loop_") - || (content.contains("agent::{") && content.contains("loop_"))), + !has_legacy_loop_reference(&content), "{path} should not call legacy loop module directly" ); @@ assert!( - !(content.contains("agent::loop_") - || (content.contains("agent::{") && content.contains("loop_"))), + !has_legacy_loop_reference(&content), "channels runtime should not import legacy loop module" ); }As per coding guidelines: “Testing: Verify that tests cover critical paths and follow established naming conventions”.
Also applies to: 28-29, 40-41
clients/agent-runtime/src/main.rs (1)
631-635:⚠️ Potential issue | 🟠 MajorPreview/canonical-only paths still emit raw loop event payloads.
println!("loop_event={event:?}")can expose sensitive prompt/tool/error payload data. Print only event kind (or scrubbed metadata) here as well.🧹 Suggested fix
- for event in &canonical.events { - println!("loop_event={event:?}"); + for event in &canonical.events { let event_kind = match event { crate::agent::unified_loop::LoopEvent::Start => "start", crate::agent::unified_loop::LoopEvent::LLMProgress(_) => "llm_progress", @@ crate::agent::unified_loop::LoopEvent::Complete(_) => "complete", crate::agent::unified_loop::LoopEvent::Error(_) => "error", }; + println!("loop_event_kind={event_kind}"); info!( session_id = %canonical.session_id, event_kind, @@ if std::env::var("CORVUS_UNIFIED_CANONICAL_ONLY").as_deref() == Ok("1") { println!("loop_session={}", canonical.session_id); - for event in canonical.events { - println!("loop_event={event:?}"); + for event in canonical.events { + let event_kind = match event { + crate::agent::unified_loop::LoopEvent::Start => "start", + crate::agent::unified_loop::LoopEvent::LLMProgress(_) => "llm_progress", + crate::agent::unified_loop::LoopEvent::ToolDispatchStarted(_) => "tool_dispatch_started", + crate::agent::unified_loop::LoopEvent::ToolDispatchCompleted(_) => "tool_dispatch_completed", + crate::agent::unified_loop::LoopEvent::CompactionTriggered => "compaction_triggered", + crate::agent::unified_loop::LoopEvent::ApprovalRequired(_) => "approval_required", + crate::agent::unified_loop::LoopEvent::Complete(_) => "complete", + crate::agent::unified_loop::LoopEvent::Error(_) => "error", + }; + println!("loop_event_kind={event_kind}"); } return Ok(()); }As per coding guidelines:
clients/agent-runtime/src/**/*.rs— “Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements.”Also applies to: 695-699
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/main.rs` around lines 631 - 635, The current preview code prints raw event payloads via println!("loop_event={event:?}") which can leak sensitive data; change it to emit only non-sensitive identifiers (e.g., event kinds or scrubbed metadata) by replacing the raw debug print with a limited print that references only event_kind or a sanitized field derived from each entry in canonical.events, and do the same for the other occurrence (the later println at the second block around the event handling). Locate the block using canonical.session_id and canonical.events and update the println calls so they never interpolate the full event (event) — only event_kind or a small whitelist of safe fields.clients/agent-runtime/src/agent/dispatcher.rs (1)
49-62:⚠️ Potential issue | 🟠 MajorDefault
check_tool_riskis still fail-open for unclassified tools.The
_ => DispatchAction::Executebranch auto-approves any new/renamed tool that is not explicitly matched, which weakens approval gating.🔐 Suggested hardening
fn check_tool_risk(&self, tool_name: &str, _arguments: &Value) -> DispatchAction { - // Only these specific commands require approval - they're the only truly risky operations - match tool_name { + let name = tool_name.to_ascii_lowercase(); + match name.as_str() { "shell" | "bash" | "execute_command" => { DispatchAction::ApprovalRequired(tool_name.to_string()) } - // All other tools - including those not in SAFE_TOOL_NAMES - are allowed - // The agent checks tool existence separately and handles unknown tools appropriately - _ => DispatchAction::Execute, + safe if SAFE_TOOL_NAMES.contains(&safe) => DispatchAction::Execute, + _ => DispatchAction::ApprovalRequired(format!("unclassified tool: {tool_name}")), } }As per coding guidelines:
**/*— “Security: Check for vulnerabilities in encryption, crypto code, and configuration handling.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/dispatcher.rs` around lines 49 - 62, The check_tool_risk function currently "fails open" by executing any unrecognized tool; change it to fail-closed by requiring approval for unknown/unclassified tools: update check_tool_risk (and/or introduce a SAFE_TOOL_NAMES whitelist) so only explicitly approved safe tool names return DispatchAction::Execute and every other tool_name returns DispatchAction::ApprovalRequired(tool_name.to_string()); keep the existing risky-tool branch for "shell" | "bash" | "execute_command" but flip the `_` match arm from Execute to ApprovalRequired (or replace the match with a membership check against SAFE_TOOL_NAMES to return Execute only for those entries).clients/agent-runtime/src/agent/agent.rs (1)
522-534:⚠️ Potential issue | 🟠 MajorNative tool-call turns can still duplicate assistant context.
When
response.tool_callsis non-empty andtextis non-empty, this path pushes bothConversationMessage::ChatandConversationMessage::AssistantToolCalls, causing duplicate assistant content in the next provider request.♻️ Suggested fix
- // Push assistant text as separate Chat message (for intermediate text preservation) - if !text.is_empty() { - self.history - .push(ConversationMessage::Chat(ChatMessage::assistant(text))); - } - - // Also push tool calls message - if !response.tool_calls.is_empty() { + if self.tool_dispatcher.should_send_tool_specs() { self.history.push(ConversationMessage::AssistantToolCalls { text: response.text.clone(), tool_calls: response.tool_calls.clone(), }); + } else if !text.is_empty() { + self.history + .push(ConversationMessage::Chat(ChatMessage::assistant(text))); }As per coding guidelines:
clients/agent-runtime/src/**/*.rs— “Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/agent.rs` around lines 522 - 534, The current logic pushes both ConversationMessage::Chat and ConversationMessage::AssistantToolCalls when text and response.tool_calls are non-empty, causing duplicate assistant content and extra clones; change the push logic in the Agent (the block that manipulates self.history) so that if response.tool_calls is non-empty you only push ConversationMessage::AssistantToolCalls (containing the assistant text and tool calls) and skip pushing ConversationMessage::Chat, and when constructing the AssistantToolCalls message avoid unnecessary cloning of response.text and response.tool_calls by taking ownership or moving them into the history entry (e.g., consume response fields or use into/move semantics) instead of calling .clone().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 522-534: The current logic pushes both ConversationMessage::Chat
and ConversationMessage::AssistantToolCalls when text and response.tool_calls
are non-empty, causing duplicate assistant content and extra clones; change the
push logic in the Agent (the block that manipulates self.history) so that if
response.tool_calls is non-empty you only push
ConversationMessage::AssistantToolCalls (containing the assistant text and tool
calls) and skip pushing ConversationMessage::Chat, and when constructing the
AssistantToolCalls message avoid unnecessary cloning of response.text and
response.tool_calls by taking ownership or moving them into the history entry
(e.g., consume response fields or use into/move semantics) instead of calling
.clone().
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 49-62: The check_tool_risk function currently "fails open" by
executing any unrecognized tool; change it to fail-closed by requiring approval
for unknown/unclassified tools: update check_tool_risk (and/or introduce a
SAFE_TOOL_NAMES whitelist) so only explicitly approved safe tool names return
DispatchAction::Execute and every other tool_name returns
DispatchAction::ApprovalRequired(tool_name.to_string()); keep the existing
risky-tool branch for "shell" | "bash" | "execute_command" but flip the `_`
match arm from Execute to ApprovalRequired (or replace the match with a
membership check against SAFE_TOOL_NAMES to return Execute only for those
entries).
In `@clients/agent-runtime/src/main.rs`:
- Around line 631-635: The current preview code prints raw event payloads via
println!("loop_event={event:?}") which can leak sensitive data; change it to
emit only non-sensitive identifiers (e.g., event kinds or scrubbed metadata) by
replacing the raw debug print with a limited print that references only
event_kind or a sanitized field derived from each entry in canonical.events, and
do the same for the other occurrence (the later println at the second block
around the event handling). Locate the block using canonical.session_id and
canonical.events and update the println calls so they never interpolate the full
event (event) — only event_kind or a small whitelist of safe fields.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
clients/agent-runtime/src/agent/agent.rsclients/agent-runtime/src/agent/dispatcher.rsclients/agent-runtime/src/channels/mod.rsclients/agent-runtime/src/main.rsclients/agent-runtime/tests/legacy_loop_guard.rsopenspec/changes/archive/2026-03-03-agent-loop/proposal.md
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
clients/agent-runtime/src/agent/agent.rs (1)
542-554:⚠️ Potential issue | 🟠 MajorUnknown-tool branch bypasses dispatcher fail-closed policy.
Line [542]–Line [554] skips
check_tool_riskwhen the tool is not locally registered. That means unknown names bypass the dispatcher’s defaultApprovalRequiredcontract and are forced down the execute path, which weakens risk signaling consistency.🧩 Suggested fix
- // Check if tool exists - unknown tools should be executed to return "Unknown tool" error - let tool_exists = self.tools.iter().any(|t| t.name() == call.name); - - // Only require approval for known risky tools, not unknown ones - // Extract the actual reason from check_tool_risk instead of using call.name - let (needs_approval, extracted_reason) = if tool_exists { - match self - .tool_dispatcher - .check_tool_risk(&call.name, &call.arguments) - { - DispatchAction::ApprovalRequired(reason) => (true, reason), - DispatchAction::Execute => (false, String::new()), - } - } else { - (false, String::new()) // Unknown tools will be executed and return "Unknown tool: {name}" - }; - - if needs_approval { + match self + .tool_dispatcher + .check_tool_risk(&call.name, &call.arguments) + { + DispatchAction::ApprovalRequired(extracted_reason) => { let key = call .tool_call_id .clone() .unwrap_or_else(|| format!("{}#{index}", call.name)); results_by_call_id.insert( key, ToolExecutionResult { name: call.name.clone(), output: format!("approval required before executing `{extracted_reason}`"), success: false, tool_call_id: call.tool_call_id.clone(), action: DispatchAction::ApprovalRequired(extracted_reason), }, ); - } else { - approved_calls.push(call.clone()); - approved_call_keys.push( - call.tool_call_id - .clone() - .unwrap_or_else(|| format!("{}#{index}", call.name)), - ); + } + DispatchAction::Execute => { + approved_calls.push(call.clone()); + approved_call_keys.push( + call.tool_call_id + .clone() + .unwrap_or_else(|| format!("{}#{index}", call.name)), + ); + } }Based on learnings: "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable."
Also applies to: 556-578
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/agent.rs` around lines 542 - 554, The unknown-tool branch currently bypasses the dispatcher risk check (the tool_exists conditional) and thus weakens the fail-closed policy; change the logic in the agent method that computes (needs_approval, extracted_reason) to always call self.tool_dispatcher.check_tool_risk(&call.name, &call.arguments) regardless of tool_exists, and treat non-local/unknown tools the same as any other result (handle DispatchAction::ApprovalRequired(reason) => (true, reason) and DispatchAction::Execute => (false, String::new())); ensure any downstream behavior that assumed unknown tools auto-execute (the else arm returning (false, String::new())) is removed so unknown names are subject to ApprovalRequired if the dispatcher signals it.clients/agent-runtime/src/agent/dispatcher.rs (1)
6-19:⚠️ Potential issue | 🟠 MajorAuto-executing side-effect tools weakens approval gating.
Line [8]–Line [15] currently classify write/network/delegation-style tools as safe, and Line [61] executes them without approval. That makes the risk gate too permissive for high-impact actions.
🛡️ Suggested hardening
const SAFE_TOOL_NAMES: &[&str] = &[ "file_read", - "file_write", - "memory_store", "memory_recall", - "memory_forget", - "schedule", - "delegate", - "composio", - "browser_open", "echo", "mock_price", "counter", ]; @@ match tool_name { "shell" | "bash" | "execute_command" => { DispatchAction::ApprovalRequired(tool_name.to_string()) } + "file_write" + | "memory_store" + | "memory_forget" + | "schedule" + | "delegate" + | "composio" + | "browser_open" => { + DispatchAction::ApprovalRequired(tool_name.to_string()) + } _ => { // Fail-closed: only tools in SAFE_TOOL_NAMES are allowed without approval if SAFE_TOOL_NAMES.contains(&tool_name) { DispatchAction::Execute } else {Based on learnings: "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable."
Also applies to: 53-66
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/agent/dispatcher.rs` around lines 6 - 19, SAFE_TOOL_NAMES currently marks high-impact tools (e.g., "file_write", "delegate", "browser_open", "schedule") as safe and the dispatcher auto-executes them without approval; remove those from SAFE_TOOL_NAMES and enforce an explicit approval check in the dispatcher execution path. Update the SAFE_TOOL_NAMES constant to only include truly read-only/non-side-effect tools, and change the code path that uses SAFE_TOOL_NAMES (the dispatcher logic that decides auto-run/execute_tool or dispatch) to require an explicit approved flag or approval callback before running any tool not in a strict read-only allowlist; add/adjust unit tests to cover denied-by-default behavior and the explicit approval flow.
🧹 Nitpick comments (1)
clients/agent-runtime/src/main.rs (1)
634-651: Extract duplicated loop-event kind mapping to a helper.The same
matchappears twice. Centralizing it avoids drift and simplifies future event additions.♻️ Suggested refactor
+fn loop_event_kind(event: &crate::agent::unified_loop::LoopEvent) -> &'static str { + match event { + crate::agent::unified_loop::LoopEvent::Start => "start", + crate::agent::unified_loop::LoopEvent::LLMProgress(_) => "llm_progress", + crate::agent::unified_loop::LoopEvent::ToolDispatchStarted(_) => "tool_dispatch_started", + crate::agent::unified_loop::LoopEvent::ToolDispatchCompleted(_) => "tool_dispatch_completed", + crate::agent::unified_loop::LoopEvent::CompactionTriggered => "compaction_triggered", + crate::agent::unified_loop::LoopEvent::ApprovalRequired(_) => "approval_required", + crate::agent::unified_loop::LoopEvent::Complete(_) => "complete", + crate::agent::unified_loop::LoopEvent::Error(_) => "error", + } +} @@ - let event_kind = match event { - ... - }; + let event_kind = loop_event_kind(event); @@ - let event_kind = match event { - ... - }; + let event_kind = loop_event_kind(event);Also applies to: 699-716
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/main.rs` around lines 634 - 651, Extract the duplicated match into a single helper that maps crate::agent::unified_loop::LoopEvent to a static &str (e.g., fn loop_event_kind(event: &crate::agent::unified_loop::LoopEvent) -> &'static str), implement the match once inside that helper covering all variants shown (Start, LLMProgress, ToolDispatchStarted, ToolDispatchCompleted, CompactionTriggered, ApprovalRequired, Complete, Error), and replace both in-file occurrences of the inline match with calls to loop_event_kind(&event) so future event additions only need to be updated in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 365-371: Remove the stale "Failing test for TDD" comment and
expand the test_risk_classification unit test to assert multiple classifications
from NativeToolDispatcher::check_tool_risk: keep the existing assertion that
"shell" returns DispatchAction::ApprovalRequired by calling
check_tool_risk("shell", ...), add at least one assertion that a known-safe tool
(e.g., "echo" or similar) returns DispatchAction::Allowed (or the appropriate
safe variant returned by check_tool_risk), and add another assertion that an
unknown tool (e.g., "unknown_tool") returns DispatchAction::Unknown or the
appropriate unknown/Default variant; reference the DispatchAction enum variants
and the check_tool_risk method to locate and update expectations accordingly.
---
Duplicate comments:
In `@clients/agent-runtime/src/agent/agent.rs`:
- Around line 542-554: The unknown-tool branch currently bypasses the dispatcher
risk check (the tool_exists conditional) and thus weakens the fail-closed
policy; change the logic in the agent method that computes (needs_approval,
extracted_reason) to always call
self.tool_dispatcher.check_tool_risk(&call.name, &call.arguments) regardless of
tool_exists, and treat non-local/unknown tools the same as any other result
(handle DispatchAction::ApprovalRequired(reason) => (true, reason) and
DispatchAction::Execute => (false, String::new())); ensure any downstream
behavior that assumed unknown tools auto-execute (the else arm returning (false,
String::new())) is removed so unknown names are subject to ApprovalRequired if
the dispatcher signals it.
In `@clients/agent-runtime/src/agent/dispatcher.rs`:
- Around line 6-19: SAFE_TOOL_NAMES currently marks high-impact tools (e.g.,
"file_write", "delegate", "browser_open", "schedule") as safe and the dispatcher
auto-executes them without approval; remove those from SAFE_TOOL_NAMES and
enforce an explicit approval check in the dispatcher execution path. Update the
SAFE_TOOL_NAMES constant to only include truly read-only/non-side-effect tools,
and change the code path that uses SAFE_TOOL_NAMES (the dispatcher logic that
decides auto-run/execute_tool or dispatch) to require an explicit approved flag
or approval callback before running any tool not in a strict read-only
allowlist; add/adjust unit tests to cover denied-by-default behavior and the
explicit approval flow.
---
Nitpick comments:
In `@clients/agent-runtime/src/main.rs`:
- Around line 634-651: Extract the duplicated match into a single helper that
maps crate::agent::unified_loop::LoopEvent to a static &str (e.g., fn
loop_event_kind(event: &crate::agent::unified_loop::LoopEvent) -> &'static str),
implement the match once inside that helper covering all variants shown (Start,
LLMProgress, ToolDispatchStarted, ToolDispatchCompleted, CompactionTriggered,
ApprovalRequired, Complete, Error), and replace both in-file occurrences of the
inline match with calls to loop_event_kind(&event) so future event additions
only need to be updated in one place.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
clients/agent-runtime/src/agent/agent.rsclients/agent-runtime/src/agent/dispatcher.rsclients/agent-runtime/src/agent/tests.rsclients/agent-runtime/src/main.rsclients/agent-runtime/tests/cli_loop_events_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- clients/agent-runtime/tests/cli_loop_events_e2e.rs
|




This pull request introduces significant improvements to the agent runtime, focusing on stepwise execution, risk-based tool dispatching, and a new unified entrypoint for agent execution. The changes enhance modularity, testability, and safety in tool invocation, and introduce robust retry and approval mechanisms.
Agent Execution Refactoring and Stepwise Processing:
Agentstruct to split the message handling intoprepare_turn(model selection and prompt setup) andstep(single step execution), enabling more granular control and easier testing of agent behavior. The mainturnmethod now repeatedly callsstepuntil a final result is produced or the iteration limit is reached. [1] [2] [3] [4] [5] [6]Tool Dispatching and Risk Management:
DispatchActionenum to distinguish between normal tool execution and cases where tool invocation requires explicit approval (e.g., for risky tools like shell commands). TheToolDispatchertrait now includes acheck_tool_riskmethod, and theToolExecutionResultstruct records the action taken. [1] [2]DispatchActionfield in tool execution results, and added a unit test for risk classification logic. [1] [2] [3] [4] [5] [6]Unified Entrypoint and Retry Logic:
unified_entrypointwith a robustexecute_with_retry_backofffunction that manages agent execution with retry and exponential backoff on recoverable errors, and provides fallback responses when retries are exhausted. Therun_canonical_outcomefunction encapsulates the canonical execution path, including approval and timeout handling.Module Organization and API Updates:
Agent execution and stepwise processing:
Agentto split message handling intoprepare_turnandstep, enabling granular, testable, and interruptible agent execution. The mainturnmethod now iterates overstepcalls. [1] [2] [3] [4] [5] [6]Tool dispatching and risk management:
DispatchActionandcheck_tool_riskto enforce approval for risky tool calls, and updatedToolExecutionResultand related tests to reflect this action. [1] [2] [3] [4] [5] [6] [7] [8]Unified entrypoint and retry logic:
unified_entrypointmodule withexecute_with_retry_backoffandrun_canonical_outcomefor robust, retryable, and approval-aware agent execution, including comprehensive tests.Module organization:
Summary by CodeRabbit
New Features
Refactor
Bug Fixes