Skip to content

feat: rig 0.31 upgrade, tool nudging, and invariant harness#292

Merged
jamiepine merged 3 commits intospacedriveapp:mainfrom
vsumner:fix/rig-review-closure
Mar 4, 2026
Merged

feat: rig 0.31 upgrade, tool nudging, and invariant harness#292
jamiepine merged 3 commits intospacedriveapp:mainfrom
vsumner:fix/rig-review-closure

Conversation

@vsumner
Copy link
Contributor

@vsumner vsumner commented Mar 3, 2026

feat: rig 0.31 upgrade, tool nudging, and invariant harness

Summary

Upgrades the rig LLM framework to 0.31, adds automatic tool usage nudging for workers, introduces an invariant-driven fault harness for regression testing, and fixes branch orphaning on failure. Includes vendored imap-proto fix for future-incompat warnings.

Motivation

Rig 0.31

We were on rig 0.30 which lacked structured outputs and had API inconsistencies. 0.31 brings prompt_typed<T>() for type-safe LLM extraction (used immediately in cortex profile generation), cleaner provider APIs, and reqwest 0.13 compatibility.

Tool Nudging

Workers often respond with "I'll help you with that" without calling tools, wasting tokens and stalling tasks. Tool nudging detects text-only responses early and retries with a nudge prompt.

Invariant Harness

The agent loop has implicit invariants (worker limits, cortex single-flight, retry budgets) that are hard to test. The harness provides property-based testing via deterministic fault injection.

Branch Orphaning Fix

When branches failed, they didn't send BranchResult, leaving the channel waiting forever. Now failures are properly signaled.

Changes

Dependencies

Package Before After
rig 0.30.0 0.31
reqwest 0.12 0.13
opentelemetry 0.29 0.31
schemars 0.8 1.2

New Files

File Purpose
src/agent/invariant_harness.rs Deterministic fault injection for worker limit, failure classification, event receiver, cortex single-flight, and retry budget invariants
docs/design-docs/tool-nudging.md Documentation for tool nudging behavior and policy
tests/tool_nudge.rs Integration tests for tool nudging public API
vendor/imap-proto-0.10.2/ Vendored imap-proto with future-incompat fixes

Modified

Core Hook System (src/hooks/spacebot.rs)

  • Added ToolNudgePolicy enum with process-scoped defaults (Worker=Enabled, Branch/Channel=Disabled)
  • Added prompt_with_tool_nudge_retry() for bounded retry loops
  • Added prompt_once() for single-shot prompts without retry
  • Added TOOL_NUDGE_PROMPT constant: "Please proceed and use the available tools."
  • Added history pruning to prevent synthetic nudge accumulation
  • Added secret scanning for tool arguments (prevents leaking secrets into tool calls)

Worker (src/agent/worker.rs)

  • Uses prompt_with_tool_nudge_retry() for main task loop
  • Follow-up prompts use prompt_once() with cloned hook
  • Tool nudging disabled for follow-ups to avoid annoying users

Cortex (src/agent/cortex.rs)

  • Profile synthesis now uses prompt_typed::<ProfileLlmResponse>() for type-safe extraction
  • Removes manual JSON parsing

Channel Dispatch (src/agent/channel_dispatch.rs)

Branch (src/agent/branch.rs)

  • Simplified construction via AgentCtx helper
  • Cleaner error propagation

Tools (src/tools.rs)

  • create_branch_tool_server() now takes ChannelState directly

API Changes

None. All changes are internal or additive:

  • Tool nudging is automatic and scoped by process type
  • Worker completion events maintain same shape
  • No config changes required

Testing

Invariant Harness (6 tests)

cargo test --lib invariant_harness
  • deterministic_worker_limit_invariant — worker cap enforcement
  • deterministic_failure_classification_invariant — success/cancel/fail mapping
  • deterministic_event_receiver_invariant — lagged/closed handling
  • deterministic_cortex_single_flight_invariant — max 1 inflight cortex
  • deterministic_retry_budget_invariant — max 4 retries
  • seeded_fault_sequence_preserves_invariants — 300 random faults

Tool Nudging (12 integration tests)

cargo test --test tool_nudge
  • Policy defaults by process type
  • Hook constants and helpers
  • Event emission (status, tool started, tool completed)
  • Hook action responses

Unit Tests (existing)

cargo test --lib hooks::spacebot::tests      # 10 tests
cargo test --lib channel_dispatch::tests      # 2 tests

Verification

# Build
$ cargo build --lib
✅ Finished

# Format
$ cargo fmt -- --check
✅ Clean

# Clippy
$ cargo clippy --tests
✅ Clean

# Tests
$ cargo test --lib invariant_harness          ✅ 6 passed
$ cargo test --test tool_nudge                ✅ 12 passed
$ cargo test --lib hooks::spacebot::tests     ✅ 10 passed
$ cargo test --lib channel_dispatch::tests    ✅ 2 passed

Future Work

  • Streaming: rig 0.31 has improved SSE support; we should implement streaming for webchat
  • Extractor module: Use rig's extractor for structured outputs beyond just cortex
  • EmbeddingModel trait: Implement rig's trait for memory/search standardization

Related


Note: The 19 failing config tests are pre-existing test isolation issues (mutex poisoning/env pollution) documented in .github/issues/test-isolation-failures.md. They fail when run together but pass individually, unrelated to this PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 3, 2026

Walkthrough

Refactors agent prompting to use hook-based prompt flows (SpacebotHook/CortexHook), implements a tool-nudging mechanism with bounded retries and history hygiene, introduces structured worker completion/error classification and new event error disposition handling, adds Cortex chat send locking and 409 Busy mapping, and includes many tests and docs updates.

Changes

Cohort / File(s) Summary
Docs & README
README.md, docs/content/docs/(core)/architecture.mdx, docs/design-docs/cortex-chat.md, docs/design-docs/tool-nudging.md
Bump Rig version in README; document Rig structured output for profile synthesis; clarify POST /api/cortex-chat/send 409 CONFLICT behavior and SSE example; add new Tool Nudging design doc describing policy, flow, history hygiene, metrics, and tests.
Hooks & Tool-Nudging Impl
src/hooks.rs, src/hooks/spacebot.rs
Add ToolNudgePolicy export and implement tool-nudge flow in SpacebotHook: prompt_with_tool_nudge_retry, prompt_once, nudge state, history pruning, guard_tool_result, tool event emissions, metrics, and extensive unit tests.
Agent core utilities & harness
src/agent.rs, src/agent/invariant_harness.rs
Add EventRecvDisposition enum and classify_event_recv_error, extract_last_assistant_text, and a new invariant_harness test module exercising agent/channel/worker invariants under fault sequences.
Agent - branch/channel/dispatch/worker/compactor/ingestion/cortex
src/agent/branch.rs, src/agent/channel.rs, src/agent/channel_dispatch.rs, src/agent/worker.rs, src/agent/compactor.rs, src/agent/ingestion.rs, src/agent/cortex.rs, src/agent/cortex_chat.rs
Replace direct agent.prompt calls with hook-based prompt_once / prompt_with_tool_nudge_retry; add WorkerCompletionError/WorkerCompletionKind, map_worker_completion_result, completion_flags, reserve_worker_slot_local, change spawn_worker_task signature; wire ChannelId into compactor; typed prompt for profile generation; add CortexChatSendError, send-lock semantics, timeout handling, and UTF‑8-safe previews; update worker/ingestion/error flows to use new helpers and nudging.
API surface & SSE/event handling
src/api/cortex.rs, src/api/state.rs, src/api/system.rs
Add map_cortex_chat_send_error mapping Busy->409 and use it in cortex_chat_send; unify event receiver error handling using classify_event_recv_error to log lag or stop the stream.
LLM & tools utilities
src/llm/model.rs, src/tools.rs, src/tools/worker_inspect.rs
Propagate message_id field in CompletionResponse initialization paths; add UTF‑8-safe truncation helpers truncate_at_char_boundary and truncate_utf8_ellipsis and adopt them in worker_inspect and other previews.
Tests & integration
tests/tool_nudge.rs, tests embedded in src/hooks/spacebot.rs, src/agent/* test modules
Add extensive unit and integration tests for tool-nudging behavior, SpacebotHook APIs, Cortex chat send locking and truncation, worker completion mapping, and a deterministic invariant harness.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • jamiepine
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: rig 0.31 upgrade, tool nudging implementation, and invariant harness addition.
Description check ✅ Passed The description is well-structured, covering motivation, changes, new files, testing, and verification related to the rig upgrade and new features.
Linked Issues check ✅ Passed The PR addresses issue #279 by ensuring BranchResult is sent on branch failure. Changes in channel_dispatch.rs and branch.rs implement the required fix for branch orphaning.
Out of Scope Changes check ✅ Passed All changes align with stated objectives: rig 0.31 upgrade, tool nudging, invariant harness, branch orphaning fix, and necessary dependency updates.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vsumner vsumner changed the title fix(rig): upgrade to 0.31, add tool nudging, invariant harness feat: rig 0.31 upgrade, tool nudging, and invariant harness Mar 3, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/tool_nudge.rs (1)

80-94: Test name implies policy assertion but doesn't verify it.

The test is named worker_hook_has_enabled_policy_by_default but it only verifies hook creation and cloning succeed—it doesn't actually assert that the policy is Enabled. Consider adding an assertion or renaming to clarify intent (e.g., worker_hook_creates_and_clones_successfully).

The same issue applies to branch_hook_has_disabled_policy_by_default (lines 97-109) and channel_hook_has_disabled_policy_by_default (lines 111-124).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tool_nudge.rs` around lines 80 - 94, The test
worker_hook_has_enabled_policy_by_default currently only constructs and clones
SpacebotHook; update it to assert the hook's policy is Enabled by calling the
accessor on the created SpacebotHook instance (e.g., hook.policy() or
hook.get_policy()) and compare to the Enabled enum variant, or if the accessor
is different use the appropriate method on SpacebotHook; similarly update
branch_hook_has_disabled_policy_by_default and
channel_hook_has_disabled_policy_by_default to assert their policies are
Disabled (or rename the tests to reflect they only verify creation/cloning if
you prefer not to add assertions).
src/agent/compactor.rs (1)

310-322: Consider documenting or removing the catch-all arm.

The _ => 0 catch-all (line 320) silently handles any future ReasoningContent variants by contributing 0 to the estimate. This is defensive but could mask newly added variants that should be counted.

Consider either:

  1. Adding a comment explaining this is intentional future-proofing, or
  2. Explicitly matching all variants and using a compile-time check
💡 Option: Make future variants explicit
             rig::message::ReasoningContent::Summary(summary) => summary.len(),
-            _ => 0,
+            // Future variants default to 0; update this match when new variants are added
+            #[allow(unreachable_patterns)]
+            _ => 0,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/compactor.rs` around lines 310 - 322, The catch-all `_ => 0` in the
mapping inside AssistantContent::Reasoning silently ignores future
ReasoningContent variants; either replace it with explicit matches for every
current rig::message::ReasoningContent variant so the compiler will force
updates when new variants are added (remove the wildcard arm), or if you want to
keep defensive behavior, add a clear inline comment on the `_ => 0` arm stating
this is intentional future-proofing and why it should remain (referencing the
closure inside AssistantContent::Reasoning and the ReasoningContent enum).
src/agent/cortex_chat.rs (1)

339-346: Minor: ProcessId variant doesn't match ProcessType.

ProcessId::Worker(uuid::Uuid::new_v4()) is used with ProcessType::Cortex. While this works functionally, a ProcessId::Cortex variant (if available) would be more semantically accurate. If no such variant exists, this is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex_chat.rs` around lines 339 - 346, The ProcessId used when
constructing SpacebotHook is semantically mismatched: you're creating
SpacebotHook via SpacebotHook::new(...) with ProcessType::Cortex but passing
ProcessId::Worker(uuid::Uuid::new_v4()); update the ProcessId to use a
Cortex-specific variant (e.g., ProcessId::Cortex or similar) if such a variant
exists so the ProcessId aligns with ProcessType::Cortex in the SpacebotHook::new
call before wrapping it in CortexChatHook::new(event_tx.clone(), spacebot_hook).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools.rs`:
- Around line 197-206: truncate_utf8_ellipsis currently truncates to max_bytes
and then appends "..." which can exceed max_bytes; change it so the final string
is never longer than max_bytes by accounting for the ellipsis length: compute
available = if max_bytes > 3 { max_bytes - 3 } else { max_bytes }, call
truncate_at_char_boundary(value, available) and append "..." only when max_bytes
> 3 (otherwise return the truncated slice with no ellipsis); ensure you still
preserve UTF‑8 boundaries and refer to truncate_utf8_ellipsis and
truncate_at_char_boundary when making the change.

---

Nitpick comments:
In `@src/agent/compactor.rs`:
- Around line 310-322: The catch-all `_ => 0` in the mapping inside
AssistantContent::Reasoning silently ignores future ReasoningContent variants;
either replace it with explicit matches for every current
rig::message::ReasoningContent variant so the compiler will force updates when
new variants are added (remove the wildcard arm), or if you want to keep
defensive behavior, add a clear inline comment on the `_ => 0` arm stating this
is intentional future-proofing and why it should remain (referencing the closure
inside AssistantContent::Reasoning and the ReasoningContent enum).

In `@src/agent/cortex_chat.rs`:
- Around line 339-346: The ProcessId used when constructing SpacebotHook is
semantically mismatched: you're creating SpacebotHook via SpacebotHook::new(...)
with ProcessType::Cortex but passing ProcessId::Worker(uuid::Uuid::new_v4());
update the ProcessId to use a Cortex-specific variant (e.g., ProcessId::Cortex
or similar) if such a variant exists so the ProcessId aligns with
ProcessType::Cortex in the SpacebotHook::new call before wrapping it in
CortexChatHook::new(event_tx.clone(), spacebot_hook).

In `@tests/tool_nudge.rs`:
- Around line 80-94: The test worker_hook_has_enabled_policy_by_default
currently only constructs and clones SpacebotHook; update it to assert the
hook's policy is Enabled by calling the accessor on the created SpacebotHook
instance (e.g., hook.policy() or hook.get_policy()) and compare to the Enabled
enum variant, or if the accessor is different use the appropriate method on
SpacebotHook; similarly update branch_hook_has_disabled_policy_by_default and
channel_hook_has_disabled_policy_by_default to assert their policies are
Disabled (or rename the tests to reflect they only verify creation/cloning if
you prefer not to add assertions).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ebd1e1 and 685db40.

⛔ Files ignored due to path filters (15)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
  • Cargo.toml is excluded by !**/*.toml
  • vendor/imap-proto-0.10.2/Cargo.toml is excluded by !**/*.toml, !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body_structure.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/command.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/core.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/lib.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/macros.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc3501.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc4551.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc5464.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/types.rs is excluded by !**/vendor/**
📒 Files selected for processing (23)
  • README.md
  • docs/content/docs/(core)/architecture.mdx
  • docs/design-docs/cortex-chat.md
  • docs/design-docs/tool-nudging.md
  • src/agent.rs
  • src/agent/branch.rs
  • src/agent/channel.rs
  • src/agent/channel_dispatch.rs
  • src/agent/compactor.rs
  • src/agent/cortex.rs
  • src/agent/cortex_chat.rs
  • src/agent/ingestion.rs
  • src/agent/invariant_harness.rs
  • src/agent/worker.rs
  • src/api/cortex.rs
  • src/api/state.rs
  • src/api/system.rs
  • src/hooks.rs
  • src/hooks/spacebot.rs
  • src/llm/model.rs
  • src/tools.rs
  • src/tools/worker_inspect.rs
  • tests/tool_nudge.rs

}
Err(error) => Err(WorkerCompletionError::from_spacebot_error(error)),
};
let (result_text, kind) = classify_worker_completion_result(worker_result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth scrubbing result_text on the error paths too (not just Ok(text)), since some error strings can include echoed inputs/tool output.

Suggested change
let (result_text, kind) = classify_worker_completion_result(worker_result);
let (mut result_text, kind) = classify_worker_completion_result(worker_result);
if let Some(store) = &secrets_store {
result_text = crate::secrets::scrub::scrub_with_store(&result_text, store);
}

src/tools.rs Outdated

let end = truncate_at_char_boundary(value, max_bytes);

format!("{}...", &value[..end])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small edge case: truncate_utf8_ellipsis can return a string longer than max_bytes because it appends ... after truncating to max_bytes.

Suggested change
format!("{}...", &value[..end])
if max_bytes <= 3 {
let end = truncate_at_char_boundary(value, max_bytes);
return value[..end].to_string();
}
let end = truncate_at_char_boundary(value, max_bytes - 3);
format!("{}...", &value[..end])

Comment on lines +358 to +360
super::EventRecvDisposition::Continue { .. } => {
tracing::debug!(channel_id = %self.id, %error, "event receiver lagged, continuing channel loop");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice handling of Lagged vs Closed. Since classify_event_recv_error returns the lag count, it might be worth logging it for observability.

Suggested change
super::EventRecvDisposition::Continue { .. } => {
tracing::debug!(channel_id = %self.id, %error, "event receiver lagged, continuing channel loop");
}
super::EventRecvDisposition::Continue { lagged_count } => {
tracing::debug!(channel_id = %self.id, %error, lagged_count = ?lagged_count, "event receiver lagged, continuing channel loop");
}

@vsumner vsumner force-pushed the fix/rig-review-closure branch from 469ca25 to 139b26b Compare March 3, 2026 03:39
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/agent/compactor.rs (1)

181-238: ⚠️ Potential issue | 🔴 Critical

Remove the LLM agent from compaction; implement as a programmatic monitor instead.

Per architectural requirements, the Compactor must not be an LLM process. It should be a programmatic monitor that watches context size and triggers compaction using tiered thresholds (>80% background, >85% aggressive, >95% emergency truncation). Replace AgentBuilder and hook.prompt_once() with direct logic to compute summaries and extract memories without invoking an LLM agent. The ProcessId::Worker pairing with ProcessType::Compactor is acceptable given that ProcessId::Compactor does not exist in the current enum; however, fixing the implementation approach is the priority.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/compactor.rs` around lines 181 - 238, The compactor currently
creates an LLM agent (SpacebotModel, AgentBuilder, tool_server with
MemorySaveTool) and calls hook.prompt_once, but per architecture it must be a
programmatic monitor instead; remove AgentBuilder, SpacebotHook::prompt_once
call, SpacebotModel creation, ToolServer/MemorySaveTool usage, and replace with
code in run_compaction that 1) uses
render_messages_as_transcript(&removed_messages) to compute current token/size,
2) applies tiered thresholds (>80% background, >85% aggressive, >95% emergency
truncation) to decide whether to summarize, extract memories, or truncate, and
3) programmatically produce/serialize summaries and extracted memories and
persist them directly via deps.memory_search (or other memory storage APIs) and
push summary back into history; keep ProcessId::Worker/ProcessType::Compactor
pairing but implement all summary/extraction logic without invoking prompt_once
or any LLM-related types (SpacebotModel, AgentBuilder, SpacebotHook,
ToolServerHandle).
♻️ Duplicate comments (1)
src/agent/channel_dispatch.rs (1)

588-633: ⚠️ Potential issue | 🟠 Major

Scrub result_text on error paths too (secret leak risk via event payload + logs).

Right now, secret scrubbing is only applied on Ok(text). Error strings can still contain echoed inputs/tool output; since result_text is logged and sent to the channel as an event payload, this is worth scrubbing unconditionally.

Proposed fix (scrub after classification, before logging/sending)
-        let (result_text, kind) = classify_worker_completion_result(worker_result);
+        let (mut result_text, kind) = classify_worker_completion_result(worker_result);
+        if let Some(store) = &secrets_store {
+            result_text = crate::secrets::scrub::scrub_with_store(&result_text, store);
+        }
         match kind {
             WorkerCompletionKind::Success => {}
             WorkerCompletionKind::Cancelled => {
                 tracing::info!(worker_id = %worker_id, result = %result_text, "worker cancelled");
             }
             WorkerCompletionKind::Failed => {
                 tracing::error!(worker_id = %worker_id, result = %result_text, "worker failed");
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_dispatch.rs` around lines 588 - 633, The result_text
emitted from classify_worker_completion_result is not scrubbed on error paths,
risking secret leaks; after calling
classify_worker_completion_result(&worker_result) but before any logging or
sending (i.e., before the match on WorkerCompletionKind and before sending
ProcessEvent::WorkerComplete) run the scrub logic used earlier
(crate::secrets::scrub::scrub_with_store) using the same secrets_store option to
replace result_text with a scrubbed version; update references to result_text
(used in tracing::info!/error! and the ProcessEvent payload) to use the scrubbed
value so both logs and the event payload never contain raw secrets.
🧹 Nitpick comments (3)
tests/tool_nudge.rs (2)

80-152: Consider a small helper to reduce hook-construction duplication in tests.

All three “hook_has_policy_by_default” tests repeat the same setup. A tiny fn make_hook(process_type, process_id, channel_id, event_tx) helper would keep future additions lean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tool_nudge.rs` around lines 80 - 152, Introduce a small test helper
(e.g. make_hook) that wraps SpacebotHook::new to eliminate duplicated
construction logic used by worker_hook_has_enabled_policy_by_default,
branch_hook_has_disabled_policy_by_default,
channel_hook_has_disabled_policy_by_default, and hook_clone_works; the helper
should accept the same identifying inputs (agent: Arc<str>, process_id:
ProcessId, process_type: ProcessType, channel_id: Option<Arc<str>>, or
optionally create its own (event_tx, event_rx) via
tokio::sync::broadcast::channel) and return a SpacebotHook, then update those
test functions to call make_hook(...) instead of repeating the SpacebotHook::new
call (refer to SpacebotHook::new, ToolNudgePolicy, and the test function names
to locate usages).

217-274: Harden ToolCompleted assertion by not “best-effort” draining with try_recv().

let _ = event_rx.try_recv(); can silently fail (e.g., empty), leaving the started event to be consumed when you expect ToolCompleted. It’s unlikely here, but making it deterministic would reduce test flake risk.

Proposed test hardening
 async fn tool_result_emits_completed_event() {
     use rig::agent::PromptHook;
+    use std::time::Duration;

     let (event_tx, mut event_rx) = tokio::sync::broadcast::channel(8);
     let hook = spacebot::hooks::SpacebotHook::new(
         std::sync::Arc::from("test-agent"),
         spacebot::ProcessId::Worker(uuid::Uuid::new_v4()),
         spacebot::ProcessType::Worker,
         None,
         event_tx,
     );

     // First emit a tool started event to get a call_id
     let _ =
         <spacebot::hooks::SpacebotHook as PromptHook<spacebot::llm::SpacebotModel>>::on_tool_call(
             &hook,
             "test_tool",
             Some("call_123".to_string()),
             "internal_id",
             "{}",
         )
         .await;

-    // Consume the started event
-    let _ = event_rx.try_recv();
+    // Consume (and assert) the started event
+    let started = tokio::time::timeout(Duration::from_secs(1), event_rx.recv())
+        .await
+        .expect("tool started event should be delivered")
+        .expect("broadcast receive should succeed");
+    assert!(matches!(started, spacebot::ProcessEvent::ToolStarted { .. }));

     // Now emit the result
     let action = <spacebot::hooks::SpacebotHook as PromptHook<spacebot::llm::SpacebotModel>>::on_tool_result(
         &hook,
         "test_tool",
         Some("call_123".to_string()),
         "internal_id",
         "{}",
         "Tool result content",
     )
     .await;

@@
-    let event = event_rx
-        .try_recv()
-        .expect("Should receive tool completed event");
+    let event = tokio::time::timeout(Duration::from_secs(1), event_rx.recv())
+        .await
+        .expect("tool completed event should be delivered")
+        .expect("broadcast receive should succeed");
     match event {
         spacebot::ProcessEvent::ToolCompleted {
             tool_name, result, ..
         } => {
             assert_eq!(tool_name, "test_tool");
             assert_eq!(result, "Tool result content");
         }
         _ => panic!("Expected ToolCompleted event, got {:?}", event),
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tool_nudge.rs` around lines 217 - 274, The test
tool_result_emits_completed_event is brittle because it uses event_rx.try_recv()
to "drain" the started event which can silently fail; replace the
non-deterministic drain with an await and explicit assertion: call
event_rx.recv().await (or tokio::time::timeout around it) to receive the started
event emitted by PromptHook::on_tool_call, match/assert it is a
spacebot::ProcessEvent::ToolStarted (checking call_id/tool_name as needed), then
proceed to call PromptHook::on_tool_result and assert the ToolCompleted event
from event_rx.recv().await; update references in the test to use
event_rx.recv().await instead of try_recv() and add a clear panic/assert message
if the started event is not received.
src/agent/worker.rs (1)

327-417: Follow-up handling is clearer; consider using transition_to() for state changes in this block.

In the follow-up loop, self.state = ... is used in several places. Using self.transition_to(...) (or at least routing these assignments through it) would ensure future edits can’t accidentally introduce illegal transitions without surfacing a runtime error.

As per coding guidelines: "Worker state machine must validate transitions with can_transition_to() using matches!; illegal transitions are runtime errors, not silent bugs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 327 - 417, The follow-up loop directly
assigns self.state in multiple places (e.g., WorkerState::WaitingForInput,
WorkerState::Running, WorkerState::Failed, WorkerState::Done); replace those
direct assignments with calls to self.transition_to(...) so state changes are
validated (which uses can_transition_to()/matches!) and will raise on illegal
transitions; update every occurrence in this block (the follow-up loop and the
final completion/failure handling) to call transition_to with the same target
WorkerState and remove the raw self.state = ... assignments, ensuring behavior
otherwise stays the same (preserve the hook.send_status calls and logging around
each transition).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design-docs/tool-nudging.md`:
- Around line 20-30: Add a language specifier to the fenced block (e.g.,
```text) to satisfy markdownlint MD040 and replace the ambiguous termination
label "tool_nudge" with the exact constant used elsewhere
"spacebot_tool_nudge_retry" so the flow reads "→ Terminate with special
"spacebot_tool_nudge_retry" reason"; update the fenced block content accordingly
to keep wording tight and consistent with the runtime constant.

In `@src/agent/cortex_chat.rs`:
- Around line 100-103: The save_message call currently discards its Result with
let _ =, so change both uses of store.save_message(thread_id, "assistant",
&message, channel_ref).await to handle errors explicitly: match or use if let
Err(e) to either log the error via the existing logger (or send an error event)
and/or propagate it instead of ignoring it; keep let _ = only for event_tx.send
where receiver may be dropped. Update the success path and the error path (where
CortexChatEvent::Error { message } is sent) to record failures from save_message
(include thread_id/message context) rather than silently dropping them.

In `@src/agent/cortex.rs`:
- Around line 839-842: The AgentBuilder chains creating the Cortex agents (the
AgentBuilder::new(model) calls that end with
.preamble(&bulletin_prompt).hook(CortexHook::new()).build()) do not set an
explicit turn limit; add a .default_max_turns(...) call to those builder chains
to make behavior deterministic—use .default_max_turns(50) for worker agents,
.default_max_turns(10) for branch agents, .default_max_turns(5) for channel
agents (apply the appropriate value where each AgentBuilder is used, including
the two Cortex agent sites mentioned).

In `@src/hooks/spacebot.rs`:
- Around line 464-467: The guard path in spacebot.rs exits early when
self.guard_tool_result(tool_name, result) returns a non-Continue HookAction,
which prevents cleaning up TOOL_CALL_TIMERS and leaves orphaned timer entries;
modify the early-return path so that before returning a non-Continue HookAction
(e.g., Terminate) you call record_tool_result_metrics(tool_name, result) and
remove/clear any related entries in TOOL_CALL_TIMERS (or invoke the same cleanup
helper used after normal completion), and apply the same change pattern to the
analogous guard-first/metrics-later sequence in src/agent/cortex_chat.rs to
ensure timers are always cleaned up on guarded termination.

---

Outside diff comments:
In `@src/agent/compactor.rs`:
- Around line 181-238: The compactor currently creates an LLM agent
(SpacebotModel, AgentBuilder, tool_server with MemorySaveTool) and calls
hook.prompt_once, but per architecture it must be a programmatic monitor
instead; remove AgentBuilder, SpacebotHook::prompt_once call, SpacebotModel
creation, ToolServer/MemorySaveTool usage, and replace with code in
run_compaction that 1) uses render_messages_as_transcript(&removed_messages) to
compute current token/size, 2) applies tiered thresholds (>80% background, >85%
aggressive, >95% emergency truncation) to decide whether to summarize, extract
memories, or truncate, and 3) programmatically produce/serialize summaries and
extracted memories and persist them directly via deps.memory_search (or other
memory storage APIs) and push summary back into history; keep
ProcessId::Worker/ProcessType::Compactor pairing but implement all
summary/extraction logic without invoking prompt_once or any LLM-related types
(SpacebotModel, AgentBuilder, SpacebotHook, ToolServerHandle).

---

Duplicate comments:
In `@src/agent/channel_dispatch.rs`:
- Around line 588-633: The result_text emitted from
classify_worker_completion_result is not scrubbed on error paths, risking secret
leaks; after calling classify_worker_completion_result(&worker_result) but
before any logging or sending (i.e., before the match on WorkerCompletionKind
and before sending ProcessEvent::WorkerComplete) run the scrub logic used
earlier (crate::secrets::scrub::scrub_with_store) using the same secrets_store
option to replace result_text with a scrubbed version; update references to
result_text (used in tracing::info!/error! and the ProcessEvent payload) to use
the scrubbed value so both logs and the event payload never contain raw secrets.

---

Nitpick comments:
In `@src/agent/worker.rs`:
- Around line 327-417: The follow-up loop directly assigns self.state in
multiple places (e.g., WorkerState::WaitingForInput, WorkerState::Running,
WorkerState::Failed, WorkerState::Done); replace those direct assignments with
calls to self.transition_to(...) so state changes are validated (which uses
can_transition_to()/matches!) and will raise on illegal transitions; update
every occurrence in this block (the follow-up loop and the final
completion/failure handling) to call transition_to with the same target
WorkerState and remove the raw self.state = ... assignments, ensuring behavior
otherwise stays the same (preserve the hook.send_status calls and logging around
each transition).

In `@tests/tool_nudge.rs`:
- Around line 80-152: Introduce a small test helper (e.g. make_hook) that wraps
SpacebotHook::new to eliminate duplicated construction logic used by
worker_hook_has_enabled_policy_by_default,
branch_hook_has_disabled_policy_by_default,
channel_hook_has_disabled_policy_by_default, and hook_clone_works; the helper
should accept the same identifying inputs (agent: Arc<str>, process_id:
ProcessId, process_type: ProcessType, channel_id: Option<Arc<str>>, or
optionally create its own (event_tx, event_rx) via
tokio::sync::broadcast::channel) and return a SpacebotHook, then update those
test functions to call make_hook(...) instead of repeating the SpacebotHook::new
call (refer to SpacebotHook::new, ToolNudgePolicy, and the test function names
to locate usages).
- Around line 217-274: The test tool_result_emits_completed_event is brittle
because it uses event_rx.try_recv() to "drain" the started event which can
silently fail; replace the non-deterministic drain with an await and explicit
assertion: call event_rx.recv().await (or tokio::time::timeout around it) to
receive the started event emitted by PromptHook::on_tool_call, match/assert it
is a spacebot::ProcessEvent::ToolStarted (checking call_id/tool_name as needed),
then proceed to call PromptHook::on_tool_result and assert the ToolCompleted
event from event_rx.recv().await; update references in the test to use
event_rx.recv().await instead of try_recv() and add a clear panic/assert message
if the started event is not received.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 685db40 and 139b26b.

⛔ Files ignored due to path filters (15)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
  • Cargo.toml is excluded by !**/*.toml
  • vendor/imap-proto-0.10.2/Cargo.toml is excluded by !**/*.toml, !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body_structure.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/command.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/core.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/lib.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/macros.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc3501.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc4551.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc5464.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/types.rs is excluded by !**/vendor/**
📒 Files selected for processing (23)
  • README.md
  • docs/content/docs/(core)/architecture.mdx
  • docs/design-docs/cortex-chat.md
  • docs/design-docs/tool-nudging.md
  • src/agent.rs
  • src/agent/branch.rs
  • src/agent/channel.rs
  • src/agent/channel_dispatch.rs
  • src/agent/compactor.rs
  • src/agent/cortex.rs
  • src/agent/cortex_chat.rs
  • src/agent/ingestion.rs
  • src/agent/invariant_harness.rs
  • src/agent/worker.rs
  • src/api/cortex.rs
  • src/api/state.rs
  • src/api/system.rs
  • src/hooks.rs
  • src/hooks/spacebot.rs
  • src/llm/model.rs
  • src/tools.rs
  • src/tools/worker_inspect.rs
  • tests/tool_nudge.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/content/docs/(core)/architecture.mdx
  • README.md
  • src/api/cortex.rs
  • src/agent/ingestion.rs
  • docs/design-docs/cortex-chat.md

Comment on lines +20 to +30
```
Worker loop starts
→ First completion call
→ If text-only response (no tool calls)
→ Terminate with special "tool_nudge" reason
→ Retry with nudge prompt
→ Max 2 retries per prompt request
→ If tool call present
→ Continue normally
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix markdownlint MD040 + tighten the doc wording around the termination reason.

  • The fenced block at Line 20 should declare a language.
  • The flow says "tool_nudge" but the constant used elsewhere is "spacebot_tool_nudge_retry"—worth making explicit to avoid confusion.
Doc fix
-```
+```text
 Worker loop starts
   → First completion call
   → If text-only response (no tool calls)
-    → Terminate with special "tool_nudge" reason
+    → Terminate with special "spacebot_tool_nudge_retry" reason
     → Retry with nudge prompt
     → Max 2 retries per prompt request
   → If tool call present
     → Continue normally
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.21.0)</summary>

[warning] 20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/design-docs/tool-nudging.md around lines 20 - 30, Add a language
specifier to the fenced block (e.g., ```text) to satisfy markdownlint MD040 and
replace the ambiguous termination label "tool_nudge" with the exact constant
used elsewhere "spacebot_tool_nudge_retry" so the flow reads "→ Terminate with
special "spacebot_tool_nudge_retry" reason"; update the fenced block content
accordingly to keep wording tight and consistent with the runtime constant.


</details>

<!-- fingerprinting:phantom:medusa:falcon -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +839 to +842
let agent = AgentBuilder::new(model)
.preamble(&bulletin_prompt)
.hook(CortexHook::new())
.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Set explicit turn limits on both Cortex LLM agents.

Both builders still rely on implicit defaults. Please set default_max_turns(...) explicitly to keep behavior bounded and deterministic across upgrades.

♻️ Proposed fix
     let agent = AgentBuilder::new(model)
         .preamble(&bulletin_prompt)
+        .default_max_turns(3)
         .hook(CortexHook::new())
         .build();
@@
     let agent = AgentBuilder::new(model)
         .preamble(&profile_prompt)
+        .default_max_turns(3)
         .hook(CortexHook::new())
         .build();

As per coding guidelines "Agent construction must set max_turns explicitly; use max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels; typically 1-3 turns for channels."

Also applies to: 1021-1024

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex.rs` around lines 839 - 842, The AgentBuilder chains creating
the Cortex agents (the AgentBuilder::new(model) calls that end with
.preamble(&bulletin_prompt).hook(CortexHook::new()).build()) do not set an
explicit turn limit; add a .default_max_turns(...) call to those builder chains
to make behavior deterministic—use .default_max_turns(50) for worker agents,
.default_max_turns(10) for branch agents, .default_max_turns(5) for channel
agents (apply the appropriate value where each AgentBuilder is used, including
the two Cortex agent sites mentioned).

Comment on lines +464 to 467
let guard_action = self.guard_tool_result(tool_name, result);
if !matches!(guard_action, HookAction::Continue) {
return guard_action;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clean up tool-call timers before returning on guarded termination.

If guard_tool_result returns Terminate, the method exits before record_tool_result_metrics, leaving TOOL_CALL_TIMERS entries orphaned (metrics feature). Repeated guarded terminations can grow the map over time.

🔧 Proposed fix
     async fn on_tool_result(
         &self,
         tool_name: &str,
         _tool_call_id: Option<String>,
         internal_call_id: &str,
         _args: &str,
         result: &str,
     ) -> HookAction {
         let guard_action = self.guard_tool_result(tool_name, result);
         if !matches!(guard_action, HookAction::Continue) {
+            #[cfg(feature = "metrics")]
+            if let Ok(mut timers) = TOOL_CALL_TIMERS.lock() {
+                timers.remove(internal_call_id);
+            }
             return guard_action;
         }

         // Cap the result stored in the broadcast event to avoid blowing up
         // event subscribers with multi-MB tool results.
         self.emit_tool_completed_event(tool_name, result);
@@
         self.record_tool_result_metrics(tool_name, internal_call_id);
Downstream impact: `src/agent/cortex_chat.rs` Line 141-Line 150 follows the same guard-first/metrics-later pattern and should mirror this cleanup path.

Also applies to: 480-480

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hooks/spacebot.rs` around lines 464 - 467, The guard path in spacebot.rs
exits early when self.guard_tool_result(tool_name, result) returns a
non-Continue HookAction, which prevents cleaning up TOOL_CALL_TIMERS and leaves
orphaned timer entries; modify the early-return path so that before returning a
non-Continue HookAction (e.g., Terminate) you call
record_tool_result_metrics(tool_name, result) and remove/clear any related
entries in TOOL_CALL_TIMERS (or invoke the same cleanup helper used after normal
completion), and apply the same change pattern to the analogous
guard-first/metrics-later sequence in src/agent/cortex_chat.rs to ensure timers
are always cleaned up on guarded termination.

vsumner and others added 3 commits March 3, 2026 22:07
- Add tests/tool_nudge.rs with 12 integration tests for tool nudging
- Add docs/design-docs/tool-nudging.md with comprehensive documentation
- Tests cover: policy defaults, constants, event emission, hook behavior

Part of fix/rig-review-closure addressing review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vsumner vsumner force-pushed the fix/rig-review-closure branch from 139b26b to 0d0843c Compare March 4, 2026 03:09
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/agent/worker.rs (1)

348-382: ⚠️ Potential issue | 🟠 Major

Preserve cancellation semantics in interactive follow-ups.

PromptCancelled in follow-up prompts currently falls through the generic error arm and is later returned as AgentError::Other, so cancelled follow-ups are emitted as failures.

🔧 Suggested fix
-        let mut follow_up_failure: Option<String> = None;
+        let mut follow_up_failure: Option<crate::error::AgentError> = None;

-                let follow_up_result: std::result::Result<(), String> = loop {
+                let follow_up_result: std::result::Result<(), crate::error::AgentError> = loop {
                     match follow_up_hook
                         .prompt_once(&agent, &mut history, &follow_up_prompt)
                         .await
                     {
                         Ok(_response) => break Ok(()),
+                        Err(rig::completion::PromptError::PromptCancelled { reason, .. }) => {
+                            self.hook.send_status("cancelled");
+                            break Err(crate::error::AgentError::Cancelled { reason });
+                        }
                         Err(error) if is_context_overflow_error(&error.to_string()) => {
                             ...
                         }
                         Err(error) => {
                             let failure_reason = format!("follow-up failed: {error}");
                             self.write_failure_log(&history, &failure_reason);
-                            break Err(failure_reason);
+                            break Err(crate::error::AgentError::Other(anyhow::anyhow!(failure_reason)));
                         }
                     }
                 };

-                if let Err(failure_reason) = follow_up_result {
+                if let Err(failure) = follow_up_result {
                     self.state = WorkerState::Failed;
                     self.hook.send_status("failed");
-                    follow_up_failure = Some(failure_reason);
+                    follow_up_failure = Some(failure);
                     break;
                 }

-        if let Some(failure_reason) = follow_up_failure {
+        if let Some(failure) = follow_up_failure {
             self.persist_transcript(&compacted_history, &history);
-            tracing::error!(worker_id = %self.id, reason = %failure_reason, "worker failed");
-            return Err(crate::error::AgentError::Other(anyhow::anyhow!(failure_reason)).into());
+            tracing::error!(worker_id = %self.id, error = %failure, "worker failed");
+            return Err(failure.into());
         }

Also applies to: 398-402

♻️ Duplicate comments (4)
src/agent/channel_dispatch.rs (1)

615-628: ⚠️ Potential issue | 🟠 Major

Scrub panic-derived failure text before emitting WorkerComplete.

Normal error messages are scrubbed with the secret store, but panic payloads are currently forwarded unsanitized into worker completion text.

🔒 Suggested fix
             Err(panic_payload) => {
                 let panic_message = panic_payload
                     .downcast_ref::<&str>()
                     .map(|message| (*message).to_string())
                     .or_else(|| panic_payload.downcast_ref::<String>().cloned())
                     .unwrap_or_else(|| "unknown panic payload".to_string());
+                let panic_message = if let Some(store) = &secrets_store {
+                    crate::secrets::scrub::scrub_with_store(&panic_message, store)
+                } else {
+                    panic_message
+                };
                 tracing::error!(
                     worker_id = %worker_id,
                     panic_message = %panic_message,
                     "worker task panicked"
                 );
                 Err(WorkerCompletionError::failed(format!(
                     "worker task panicked: {panic_message}"
                 )))
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_dispatch.rs` around lines 615 - 628, The panic payload
string (panic_payload -> panic_message) is being forwarded unsanitized into both
the tracing::error log and WorkerCompletionError::failed; pass panic_message
through the service's secret-scrubbing/redaction utility used for normal errors
before logging and before constructing WorkerCompletionError::failed so secrets
are not leaked. Locate the block handling Err(panic_payload) (variables
panic_payload and panic_message, the tracing::error call, and
WorkerCompletionError::failed) and replace direct uses of panic_message with the
scrubbed/redacted version returned by the existing secret-store scrub function
used elsewhere in the codebase.
src/agent/cortex_chat.rs (1)

100-103: ⚠️ Potential issue | 🟠 Major

Handle save_message failures explicitly instead of dropping them.

Both paths can silently lose persisted chat history if SQLite writes fail.

🔧 Suggested fix
 async fn persist_and_emit_cortex_chat_error(
@@
 ) {
-    let _ = store
-        .save_message(thread_id, "assistant", &message, channel_ref)
-        .await;
+    if let Err(error) = store
+        .save_message(thread_id, "assistant", &message, channel_ref)
+        .await
+    {
+        tracing::warn!(
+            %error,
+            thread_id = %thread_id,
+            "failed to persist cortex chat error message"
+        );
+    }
     let _ = event_tx.send(CortexChatEvent::Error { message }).await;
 }
@@
                 Ok(Ok(response)) => {
-                    let _ = store
-                        .save_message(&thread_id, "assistant", &response, channel_ref)
-                        .await;
+                    if let Err(error) = store
+                        .save_message(&thread_id, "assistant", &response, channel_ref)
+                        .await
+                    {
+                        tracing::warn!(
+                            %error,
+                            thread_id = %thread_id,
+                            "failed to persist cortex assistant response"
+                        );
+                    }
                     let _ = event_tx
                         .send(CortexChatEvent::Done {
                             full_text: response,
                         })

As per coding guidelines "Don't silently discard errors; use let _ = only on channel sends where the receiver may be dropped; handle, log, or propagate all other errors."

Also applies to: 375-377

docs/design-docs/tool-nudging.md (1)

20-30: ⚠️ Potential issue | 🟡 Minor

Fix the fenced code block language and align the termination reason label.

The fenced block should declare a language (MD040), and the reason label should match runtime naming to avoid confusion.

📝 Suggested fix
-```
+```text
 Worker loop starts
   → First completion call
   → If text-only response (no tool calls)
-    → Terminate with special "tool_nudge" reason
+    → Terminate with special "spacebot_tool_nudge_retry" reason
     → Retry with nudge prompt
     → Max 2 retries per prompt request
   → If tool call present
     → Continue normally
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/design-docs/tool-nudging.md around lines 20 - 30, Update the fenced
code block in the tool-nudging design doc to declare a language (e.g., use

runtime name "spacebot_tool_nudge_retry"; specifically locate the block
containing the Worker loop steps and replace the fence indicator with a language
and the line "→ Terminate with special "tool_nudge" reason" with "→ Terminate
with special "spacebot_tool_nudge_retry" reason" so the doc matches runtime
naming.
src/agent/cortex.rs (1)

839-842: ⚠️ Potential issue | 🟠 Major

Set explicit turn limits on both Cortex agent builders.

Both agent constructions still rely on implicit defaults. Please set explicit turn caps to keep behavior bounded across model/runtime upgrades.

♻️ Suggested fix
     let agent = AgentBuilder::new(model)
         .preamble(&bulletin_prompt)
+        .default_max_turns(cortex_config.bulletin_max_turns)
         .hook(CortexHook::new())
         .build();
@@
     let agent = AgentBuilder::new(model)
         .preamble(&profile_prompt)
+        .default_max_turns(3)
         .hook(CortexHook::new())
         .build();

As per coding guidelines "Agent construction must set max_turns explicitly; use max_turns(50) for workers, max_turns(10) for branches, max_turns(5) for channels; typically 1-3 turns for channels."

Also applies to: 1021-1024

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex.rs` around lines 839 - 842, Both Cortex AgentBuilder chains
need an explicit max_turns set; update the AgentBuilder::new(model) chains that
call .preamble(&bulletin_prompt).hook(CortexHook::new()).build() to include
.max_turns(10) before .build() (apply the same change to the other AgentBuilder
instance around lines ~1021-1024) so both agents have an explicit turn cap per
agent construction guidelines.
🧹 Nitpick comments (1)
src/hooks/spacebot.rs (1)

253-268: Consolidate leak-check logic in on_tool_result.

guard_tool_result() already terminates on leaked output before the channel-specific leak branch runs, so the later leak check/reason path is effectively redundant and can drift from actual behavior.

Also applies to: 477-490

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hooks/spacebot.rs` around lines 253 - 268, The duplicate leak-detection
logic in on_tool_result duplicates guard_tool_result behavior and can drift;
remove the redundant leak check/termination branch inside on_tool_result and
instead call/return the result from guard_tool_result so all leak handling
(including the termination reason) is centralized; update any other similar
duplicate at the second occurrence (the branch around lines referenced as
477-490) to use guard_tool_result or defer to scan_for_leaks via
guard_tool_result, ensuring the termination HookAction and reason text only come
from guard_tool_result for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent.rs`:
- Around line 35-51: The function extract_last_assistant_text currently scans
backward and keeps looking past the most recent Assistant message when that
message contains no text, potentially returning stale text; modify the logic so
that when you encounter the first rig::message::Message::Assistant during the
reverse iteration you aggregate its rig::message::AssistantContent::Text parts
into combined and then immediately return Some(combined) if non-empty or return
None if empty (i.e., do not continue scanning older messages after the first
Assistant); keep the existing content iteration and newline handling but ensure
the function stops at that first Assistant message.

In `@src/agent/worker.rs`:
- Around line 386-395: The code currently assigns self.state directly (e.g.,
self.state = WorkerState::Failed and WorkerState::WaitingForInput) after
handling follow_up_result; change these to use the state-machine helpers by
calling self.transition_to(WorkerState::Failed) and
self.transition_to(WorkerState::WaitingForInput) instead, and ensure you
validate allowed transitions first using the can_transition_to/matches! pattern
(e.g., matches!(self.state.can_transition_to(&WorkerState::X), true) or
equivalent) so illegal transitions panic rather than silently mutate state.

---

Duplicate comments:
In `@docs/design-docs/tool-nudging.md`:
- Around line 20-30: Update the fenced code block in the tool-nudging design doc
to declare a language (e.g., use ```text) and change the termination reason
label from "tool_nudge" to the runtime name "spacebot_tool_nudge_retry";
specifically locate the block containing the Worker loop steps and replace the
fence indicator with a language and the line "→ Terminate with special
"tool_nudge" reason" with "→ Terminate with special "spacebot_tool_nudge_retry"
reason" so the doc matches runtime naming.

In `@src/agent/channel_dispatch.rs`:
- Around line 615-628: The panic payload string (panic_payload -> panic_message)
is being forwarded unsanitized into both the tracing::error log and
WorkerCompletionError::failed; pass panic_message through the service's
secret-scrubbing/redaction utility used for normal errors before logging and
before constructing WorkerCompletionError::failed so secrets are not leaked.
Locate the block handling Err(panic_payload) (variables panic_payload and
panic_message, the tracing::error call, and WorkerCompletionError::failed) and
replace direct uses of panic_message with the scrubbed/redacted version returned
by the existing secret-store scrub function used elsewhere in the codebase.

In `@src/agent/cortex.rs`:
- Around line 839-842: Both Cortex AgentBuilder chains need an explicit
max_turns set; update the AgentBuilder::new(model) chains that call
.preamble(&bulletin_prompt).hook(CortexHook::new()).build() to include
.max_turns(10) before .build() (apply the same change to the other AgentBuilder
instance around lines ~1021-1024) so both agents have an explicit turn cap per
agent construction guidelines.

---

Nitpick comments:
In `@src/hooks/spacebot.rs`:
- Around line 253-268: The duplicate leak-detection logic in on_tool_result
duplicates guard_tool_result behavior and can drift; remove the redundant leak
check/termination branch inside on_tool_result and instead call/return the
result from guard_tool_result so all leak handling (including the termination
reason) is centralized; update any other similar duplicate at the second
occurrence (the branch around lines referenced as 477-490) to use
guard_tool_result or defer to scan_for_leaks via guard_tool_result, ensuring the
termination HookAction and reason text only come from guard_tool_result for
consistency.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c82c8e3-d132-43c5-8338-05fdb0186c96

📥 Commits

Reviewing files that changed from the base of the PR and between 139b26b and 0d0843c.

⛔ Files ignored due to path filters (15)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
  • Cargo.toml is excluded by !**/*.toml
  • vendor/imap-proto-0.10.2/Cargo.toml is excluded by !**/*.toml, !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/body_structure.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/command.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/builders/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/core.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/lib.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/macros.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/mod.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc3501.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc4551.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/parser/rfc5464.rs is excluded by !**/vendor/**
  • vendor/imap-proto-0.10.2/src/types.rs is excluded by !**/vendor/**
📒 Files selected for processing (23)
  • README.md
  • docs/content/docs/(core)/architecture.mdx
  • docs/design-docs/cortex-chat.md
  • docs/design-docs/tool-nudging.md
  • src/agent.rs
  • src/agent/branch.rs
  • src/agent/channel.rs
  • src/agent/channel_dispatch.rs
  • src/agent/compactor.rs
  • src/agent/cortex.rs
  • src/agent/cortex_chat.rs
  • src/agent/ingestion.rs
  • src/agent/invariant_harness.rs
  • src/agent/worker.rs
  • src/api/cortex.rs
  • src/api/state.rs
  • src/api/system.rs
  • src/hooks.rs
  • src/hooks/spacebot.rs
  • src/llm/model.rs
  • src/tools.rs
  • src/tools/worker_inspect.rs
  • tests/tool_nudge.rs
🚧 Files skipped from review as they are similar to previous changes (8)
  • src/agent/ingestion.rs
  • tests/tool_nudge.rs
  • docs/content/docs/(core)/architecture.mdx
  • src/hooks.rs
  • src/agent/compactor.rs
  • src/api/system.rs
  • README.md
  • docs/design-docs/cortex-chat.md

Comment on lines +35 to +51
pub(crate) fn extract_last_assistant_text(history: &[rig::message::Message]) -> Option<String> {
for message in history.iter().rev() {
if let rig::message::Message::Assistant { content, .. } = message {
let mut combined = String::new();
for item in content.iter() {
if let rig::message::AssistantContent::Text(text) = item {
if !combined.is_empty() {
combined.push('\n');
}
combined.push_str(&text.text);
}
}
if !combined.is_empty() {
return Some(combined);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid returning stale text from older assistant turns.

On Line 36, the reverse scan continues past the newest assistant message when that message has no text, which can return text from an older turn. For a function named extract_last_assistant_text, this should stop at the first assistant message and return None if it has no text.

💡 Proposed fix
 pub(crate) fn extract_last_assistant_text(history: &[rig::message::Message]) -> Option<String> {
     for message in history.iter().rev() {
         if let rig::message::Message::Assistant { content, .. } = message {
             let mut combined = String::new();
             for item in content.iter() {
                 if let rig::message::AssistantContent::Text(text) = item {
                     if !combined.is_empty() {
                         combined.push('\n');
                     }
                     combined.push_str(&text.text);
                 }
             }
-            if !combined.is_empty() {
-                return Some(combined);
-            }
+            return if combined.is_empty() {
+                None
+            } else {
+                Some(combined)
+            };
         }
     }

     None
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent.rs` around lines 35 - 51, The function extract_last_assistant_text
currently scans backward and keeps looking past the most recent Assistant
message when that message contains no text, potentially returning stale text;
modify the logic so that when you encounter the first
rig::message::Message::Assistant during the reverse iteration you aggregate its
rig::message::AssistantContent::Text parts into combined and then immediately
return Some(combined) if non-empty or return None if empty (i.e., do not
continue scanning older messages after the first Assistant); keep the existing
content iteration and newline handling but ensure the function stops at that
first Assistant message.

Comment on lines +386 to 395
if let Err(failure_reason) = follow_up_result {
self.state = WorkerState::Failed;
self.hook.send_status("failed");
follow_up_failure = Some(failure_reason);
break;
}

self.state = WorkerState::WaitingForInput;
self.hook.send_status("waiting for input");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use transition_to() in the new follow-up state path.

This new flow writes self.state directly instead of using the state-machine guard already defined on the type.

🔁 Suggested fix
-                    self.state = WorkerState::Failed;
+                    self.transition_to(WorkerState::Failed)?;
                     self.hook.send_status("failed");
                     follow_up_failure = Some(failure_reason);
                     break;
                 }

-                self.state = WorkerState::WaitingForInput;
+                self.transition_to(WorkerState::WaitingForInput)?;
                 self.hook.send_status("waiting for input");
As per coding guidelines, "Worker state machine must validate transitions with `can_transition_to()` using `matches!`; illegal transitions are runtime errors, not silent bugs."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if let Err(failure_reason) = follow_up_result {
self.state = WorkerState::Failed;
self.hook.send_status("failed");
follow_up_failure = Some(failure_reason);
break;
}
self.state = WorkerState::WaitingForInput;
self.hook.send_status("waiting for input");
}
if let Err(failure_reason) = follow_up_result {
self.transition_to(WorkerState::Failed)?;
self.hook.send_status("failed");
follow_up_failure = Some(failure_reason);
break;
}
self.transition_to(WorkerState::WaitingForInput)?;
self.hook.send_status("waiting for input");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/worker.rs` around lines 386 - 395, The code currently assigns
self.state directly (e.g., self.state = WorkerState::Failed and
WorkerState::WaitingForInput) after handling follow_up_result; change these to
use the state-machine helpers by calling self.transition_to(WorkerState::Failed)
and self.transition_to(WorkerState::WaitingForInput) instead, and ensure you
validate allowed transitions first using the can_transition_to/matches! pattern
(e.g., matches!(self.state.can_transition_to(&WorkerState::X), true) or
equivalent) so illegal transitions panic rather than silently mutate state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants