Agent harness: USD cost tracking, stop hooks, per-tool result caps#1268
Conversation
The fork-mode prefix-replay path was built but never wired into a real archetype: only the synthetic `fork` definition set `uses_fork_context = true`, no agent.toml opted in, and modern providers do automatic prefix caching anyway. Remove it. Kept: PARENT_CONTEXT / ParentExecutionContext — load-bearing for every spawn_subagent / dispatch / triage path. Removed: - ForkContext struct + FORK_CONTEXT task-local + with_fork_context / current_fork helpers - AgentDefinition.uses_fork_context flag (and is_false serde helper) - Synthetic fork_definition() + builtin_definitions.rs append - SubagentMode::Fork variant - SubagentRunError::NoForkContext variant - run_fork_mode + the dispatch branch in subagent_runner/ops.rs - Agent::build_fork_context + the spawn_subagent fork-mode branch in session/turn.rs - spawn_subagent `mode` argument (always typed now) — schema slimmed, telemetry hardcoded to "typed" Tests dropped: fork_mode_replays_parent_prefix_bytes, fork_mode_errors_when_no_fork_context, turn_dispatches_spawn_subagent_in_fork_mode, build_fork_context_uses_visible_specs_and_prompt_argument, fork_and_parent_contexts_are_visible_only_within_scope. The fork_definition presence assertion in registry_load_merges_builtins_and_custom was relaxed accordingly.
Adds `agent::cost::TurnCost`, a running tally folded across every
provider call inside a tool-call loop iteration. Prefers the
authoritative `charged_amount_usd` the OpenHuman backend already
returns on `UsageInfo`; falls back to a small tier-keyed token-rate
estimate (`reasoning-v1` / `agentic-v1` / `coding-v1`) when the
backend doesn't surface billing.
Wires the accumulator into `harness::tool_loop::run_tool_call_loop`
and emits a new `AgentProgress::TurnCostUpdated { model, iteration,
input_tokens, output_tokens, cached_input_tokens, total_usd }` event
after each LLM response that carries usage. End-of-turn log line now
includes cumulative tokens and USD.
Match exhaustiveness preserved in:
- `channels/providers/web.rs`: log debug line for the new event
- `threads/turn_state/mirror.rs`: ack without flushing the snapshot
(cost doesn't change lifecycle / phase)
The accumulator is the substrate the next change (mid-turn stop hooks)
will use to enforce a `max_turn_usd` cap. Telemetry-only for now.
Tests: 8 new unit tests in `agent::cost` covering the pricing table,
fallback estimator, charged-vs-estimated split, and token aggregation.
Adds `agent::stop_hooks` — a task-local-scoped hook trait fired at the top of each iteration in `run_tool_call_loop`. Distinct from the existing `InterruptFence` (user-driven cancellation): stop hooks are the policy lever for budget caps, rate limits, and custom kill switches that should cut a turn before the next provider call rather than after the fact. Built-in hooks: - `BudgetStopHook` — caps cumulative turn cost in USD, reading from the new `TurnCost` accumulator. - `MaxIterationsStopHook` — per-call iteration cap independent of the agent's persistent `max_tool_iterations` config. Wiring uses a task-local (`with_stop_hooks` / `current_stop_hooks`) mirroring `PARENT_CONTEXT` and `CURRENT_AGENT_SANDBOX_MODE`. This avoids touching the 13 call sites of `run_tool_call_loop`, whose signature is already 16 params wide. When a hook returns `StopDecision::Stop`, the loop bails with `anyhow::Error` whose message names the hook and reason, so the caller can surface "$X.XX cap reached" or similar to the user. Tests: 5 unit tests on the hook semantics and task-local scope, plus 2 integration tests proving (a) a hook actually aborts the loop with the reason propagated and (b) the loop is byte-identical when no hooks are installed.
Adds `Tool::max_result_size_chars` (default `None`) so individual tools can declare a fast deterministic cap on the body they thread back to the LLM. When set and exceeded, the agent's tool loop truncates with a `[truncated by tool cap: N more chars not shown]` marker and skips the global `PayloadSummarizer` for that call. This is the cheap counterpart to the LLM-summarizer path: tools that *know* their output is bounded but unpredictable (`shell`, `web_fetch`) get a hard cap; tools whose callers genuinely want full content (`read_file`, `grep`) leave it unset. Caps applied: - `shell` → 30k chars (prevents `find /` / install-log blowups) - `web_fetch` → 50k chars (1MB byte cap was still tens of thousands of tokens; agent rarely needs that much). Test: `run_tool_call_loop_applies_per_tool_max_result_size_cap` exercises a tool that emits a 200k-char body and declares a 100-char cap; verifies the body is truncated to <1KB in history with the cap marker present.
Adds `Tool::is_concurrency_safe(&self, args) -> bool` (default `false`) so individual tools can declare whether two concurrent invocations are safe to fan out across parallel awaits. Annotates the four read-only built-ins that obviously qualify: - `file_read` - `grep` - `glob` - `web_fetch` (idempotent GET) Tools that mutate state (`shell`, write tools, MCP exec) keep the default `false`. The annotation is intentionally **not yet load-bearing** — the dispatch layer in `harness::tool_loop::run_tool_call_loop` still runs every call serially. Shipping the trait surface separately lets the dispatch refactor land without coordinating with every tool author. That refactor is tracked in tinyhumansai#1267 (parallel batching of consecutive concurrency-safe calls within a single LLM iteration). Tests: two unit tests pin the trait defaults (`is_concurrency_safe = false`, `max_result_size_chars = None`) so future trait-shape changes can't silently regress the contract.
Pre-push hook ran rustfmt and reformatted lines this branch touched — folding those reformat-only edits in so the next push is hook-clean. No behaviour change.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
📝 WalkthroughWalkthroughRemoves fork-mode subagent plumbing in favor of typed-only runs, adds per‑turn cost accounting with pricing/TurnCost and mid‑turn stop hooks, extends the Tool trait with concurrency/result-size hints, and wires cost tracking and stop-hook checks into the agent tool-call loop and progress events. ChangesFork Context Removal
Per-Turn Cost Accounting & Stop Hooks
Tool Trait Extensions for Concurrency Safety & Result Size
Sequence Diagram(s)sequenceDiagram
participant Agent
participant TurnLoop
participant StopHooks
participant LLM
participant Tool as Tool<br/>(e.g., file_read)
participant Cost as Cost<br/>Tracker
activate TurnLoop
Note over TurnLoop,Cost: Initialize TurnCost
TurnLoop->>Cost: TurnCost::new()
Note over TurnLoop,StopHooks: Capture stop hooks
TurnLoop->>StopHooks: current_stop_hooks()
rect rgba(100, 150, 255, 0.5)
Note over TurnLoop: Loop iteration: evaluate hooks before LLM call
TurnLoop->>StopHooks: check(ctx: iteration, max, cost, model)
alt Hook signals Stop
StopHooks-->>TurnLoop: StopDecision::Stop { reason }
TurnLoop->>TurnLoop: Abort loop with message
else Hooks permit Continue
StopHooks-->>TurnLoop: StopDecision::Continue
TurnLoop->>LLM: Send prompt / model call
LLM-->>TurnLoop: Response + UsageInfo
TurnLoop->>Cost: add_call(model, usage)
Cost-->>TurnLoop: Updated totals
TurnLoop->>Tool: Execute tool call (serial or parallel per tool hints)
Tool-->>TurnLoop: Tool result
TurnLoop->>Agent: Emit TurnCostUpdated(model, iteration, tokens, usd)
end
end
TurnLoop->>Agent: TurnCompleted (final costs)
deactivate TurnLoop
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Empty commit to nudge CodeRabbit into reviewing the branch.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/openhuman/agent/stop_hooks.rs (1)
120-132: ⚡ Quick winAdd debug/trace logs when a hook decides to stop.
These branches are the new policy gate for aborting a turn, but they currently leave no stable breadcrumb with the hook name, iteration, model, or threshold values. That will make unexpected mid-turn halts hard to reconstruct from runtime logs. As per coding guidelines, "Use
log/tracingatdebugortracelevel on RPC entry/exit, error paths, state transitions, and any branch that is hard to infer from tests; include stable prefixes ... and correlation fields."Possible shape
async fn check(&self, ctx: &TurnState<'_>) -> StopDecision { let spent = ctx.cost.total_usd(); if spent >= self.max_usd { + log::debug!( + "[agent:stop_hooks] hook=budget decision=stop iteration={} model={} spent_usd={:.4} cap_usd={:.4}", + ctx.iteration, + ctx.model, + spent, + self.max_usd + ); StopDecision::Stop { reason: format!( "turn cost ${spent:.4} reached cap ${cap:.4}", @@ async fn check(&self, ctx: &TurnState<'_>) -> StopDecision { if ctx.iteration > self.cap { + log::debug!( + "[agent:stop_hooks] hook=max_iterations decision=stop iteration={} model={} cap={}", + ctx.iteration, + ctx.model, + self.cap + ); StopDecision::Stop { reason: format!( "turn reached iteration cap {} (about to start iteration {})",Also applies to: 157-168
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/stop_hooks.rs` around lines 120 - 132, Add a structured debug/trace log inside the StopHook::check implementation so every decision records context: log when spending >= self.max_usd and when continuing, using a stable prefix like "stop_hook" and include the hook identity (type or a name field from self), the current iteration and model identifiers from ctx (e.g., ctx.iteration, ctx.model or ctx.model_name if available), plus numeric values spent (ctx.cost.total_usd()) and threshold (self.max_usd); use tracing::debug! or tracing::trace! with key=value fields so runtime breadcrumbs exist for both the Stop { reason: ... } branch and the Continue branch in async fn check(&self, ctx: &TurnState<'_>).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/cost.rs`:
- Around line 97-103: The lookup_pricing() branch incorrectly groups "coding"
with the agentic/sonnet pricing (returning PRICING_TABLE[1]); split the
condition so "coding" is checked separately and returns the coding pricing row
instead of the agentic row (i.e., add a dedicated match for
model.to_ascii_lowercase().contains("coding") that returns the coding
PRICING_TABLE entry rather than PRICING_TABLE[1], leaving "sonnet" and "agentic"
mapped to PRICING_TABLE[1]).
In `@src/openhuman/agent/stop_hooks.rs`:
- Around line 108-111: BudgetStopHook::new currently accepts any f64 which lets
NaN/neg/inf silently disable the guard; change BudgetStopHook::new to validate
max_usd (use f64::is_finite and ensure max_usd > 0.0 and not NaN) and return a
Result<Self, Error> (or panic with a clear message) instead of unconditionally
constructing self; update any other constructors/places that create
BudgetStopHook (the other impls referenced around lines 120-132) to handle the
Result or propagate the error so invalid caps are rejected up front.
---
Nitpick comments:
In `@src/openhuman/agent/stop_hooks.rs`:
- Around line 120-132: Add a structured debug/trace log inside the
StopHook::check implementation so every decision records context: log when
spending >= self.max_usd and when continuing, using a stable prefix like
"stop_hook" and include the hook identity (type or a name field from self), the
current iteration and model identifiers from ctx (e.g., ctx.iteration, ctx.model
or ctx.model_name if available), plus numeric values spent
(ctx.cost.total_usd()) and threshold (self.max_usd); use tracing::debug! or
tracing::trace! with key=value fields so runtime breadcrumbs exist for both the
Stop { reason: ... } branch and the Continue branch in async fn check(&self,
ctx: &TurnState<'_>).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5a0d1b2d-4708-4b48-82f3-b0b3a1cb90db
📒 Files selected for processing (33)
src/openhuman/agent/cost.rssrc/openhuman/agent/debug/mod.rssrc/openhuman/agent/harness/builtin_definitions.rssrc/openhuman/agent/harness/definition.rssrc/openhuman/agent/harness/definition_loader.rssrc/openhuman/agent/harness/definition_tests.rssrc/openhuman/agent/harness/fork_context.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/payload_summarizer.rssrc/openhuman/agent/harness/sandbox_context.rssrc/openhuman/agent/harness/session/tests.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/turn_tests.rssrc/openhuman/agent/harness/subagent_runner/ops.rssrc/openhuman/agent/harness/subagent_runner/ops_tests.rssrc/openhuman/agent/harness/subagent_runner/types.rssrc/openhuman/agent/harness/tool_loop.rssrc/openhuman/agent/harness/tool_loop_tests.rssrc/openhuman/agent/mod.rssrc/openhuman/agent/progress.rssrc/openhuman/agent/stop_hooks.rssrc/openhuman/channels/providers/web.rssrc/openhuman/channels/runtime/dispatch.rssrc/openhuman/threads/turn_state/mirror.rssrc/openhuman/tools/impl/agent/spawn_subagent.rssrc/openhuman/tools/impl/filesystem/file_read.rssrc/openhuman/tools/impl/filesystem/glob_search.rssrc/openhuman/tools/impl/filesystem/grep.rssrc/openhuman/tools/impl/network/web_fetch.rssrc/openhuman/tools/impl/system/shell.rssrc/openhuman/tools/orchestrator_tools.rssrc/openhuman/tools/traits.rstests/agent_harness_public.rs
💤 Files with no reviewable changes (8)
- src/openhuman/agent/harness/definition_loader.rs
- src/openhuman/channels/runtime/dispatch.rs
- src/openhuman/agent/harness/session/turn_tests.rs
- src/openhuman/agent/harness/session/tests.rs
- src/openhuman/agent/harness/payload_summarizer.rs
- src/openhuman/agent/harness/definition_tests.rs
- src/openhuman/agent/harness/definition.rs
- src/openhuman/tools/orchestrator_tools.rs
- agent/cost.rs: route 'coding' model strings to PRICING_TABLE[2] (coding tier) instead of PRICING_TABLE[1] (agentic). The two rows have identical numbers today, but they're separate tiers — when they diverge the misroute would silently misestimate turn cost. - agent/stop_hooks.rs: BudgetStopHook now fails closed on non-finite or non-positive max_usd values. NaN comparisons always return false, so 'spent >= NaN' previously disabled the guard silently. Negative / infinite caps now stop with an explicit reason instead of producing undefined behaviour. Tests: - lookup_pricing_routes_coding_to_coding_row_not_agentic - budget_hook_fails_closed_on_nan_cap - budget_hook_fails_closed_on_non_positive_cap
Summary
agent::cost::TurnCostaccumulator andAgentProgress::TurnCostUpdatedevent — per-turn USD tracking, prefers backend-reportedcharged_amount_usdwith a tier-keyed token-rate fallback.agent::stop_hookstask-local + built-inBudgetStopHook/MaxIterationsStopHook— fired between iterations ofrun_tool_call_loopso policy can halt a turn before the next provider call.Tool::max_result_size_charscap (defaultNone) — applied toshell(30k chars) andweb_fetch(50k chars) so chatty tools can't blow the context window.Tool::is_concurrency_safeflag — annotated onfile_read/grep/glob/web_fetch. Trait surface only; dispatch wiring is tracked in Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267.FORK_CONTEXT/uses_fork_contexthalf of the fork-context module — only the syntheticforkarchetype ever set it, modern providers do automatic prefix caching, ~570 lines deleted.Problem
Comparing the agent harness to the Claude Code spec at kuberwastaken/claurst surfaced four gaps that produce real pain on long turns:
InterruptFence; runaway turns burn tokens untilmax_tool_iterationssaves us.shell(find /,pnpm install) orweb_fetch(1 MB HTML page) call could push 100k+ chars into history, evicting useful context.Separately, the
FORK_CONTEXThalf ofharness::fork_contexthad been built but never wired: only the syntheticforkarchetype setuses_fork_context = true, and zeroagent.tomlfiles used it. Carrying that code is a maintenance tax for no win.Solution
Five focused commits, each with its own tests:
refactor(agent): drop dead FORK_CONTEXT half of fork_context— keptPARENT_CONTEXT(load-bearing for everyspawn_subagent/ triage path), removedForkContext,with_fork_context/current_fork, the syntheticfork_definition(), theuses_fork_contextflag,run_fork_mode, and themode: \"fork\"arg onspawn_subagent.feat(agent): per-turn USD cost accumulator—agent::cost::TurnCostsummed insiderun_tool_call_loop, emitsTurnCostUpdatedafter every provider response with a usage block. Falls back to a tier-keyed pricing table (reasoning-v1/agentic-v1/coding-v1) when the backend doesn't surface billing.feat(agent): mid-turn stop hooks for policy-driven halts— task-local-scopedStopHooktrait. Built-ins:BudgetStopHook(USD cap, reads fromTurnCost),MaxIterationsStopHook. Distinct fromInterruptFence(which is user-driven). Hooks ride a task-local because the loop signature is already 16 params and the function has 13 call sites.feat(tools): per-tool max_result_size_chars cap—Tool::max_result_size_chars(defaultNone). Applied toshellandweb_fetch. When the cap fires, the body is truncated with[truncated by tool cap: N more chars not shown]and the globalPayloadSummarizeris skipped for that call.feat(tools): is_concurrency_safe trait method— the trait surface for parallel dispatch. Annotated onfile_read/grep/glob/web_fetch. The dispatch refactor (a ~300-line restructuring of the per-call body) is intentionally a follow-up — see Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267.Submission Checklist
docs/TESTING-STRATEGY.mdtool_loop.rscovered by 2 new integration tests (stop-hook abort + per-tool cap).docs/TESTING-STRATEGY.md)Closes #NNNin the## RelatedsectionImpact
TurnCostUpdatedevent lands on the existingAgentProgresschannel;web.rsconsumes it as a debug log line for now.shell/web_fetchheavy turns; cost tracker adds onempsc::sendper provider response (negligible).mode: \"fork\"callers (none in tree) would now get an unknown-arg error; nobody uses it.PARENT_CONTEXTretained, system-prompt byte stability preserved.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
Validation Blocked
Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Improvements
Tests