Agent harness: USD cost tracking, stop hooks, per-tool result caps by senamakel · Pull Request #1268 · tinyhumansai/openhuman

senamakel · 2026-05-06T04:38:18Z

Summary

New agent::cost::TurnCost accumulator and AgentProgress::TurnCostUpdated event — per-turn USD tracking, prefers backend-reported charged_amount_usd with a tier-keyed token-rate fallback.
New agent::stop_hooks task-local + built-in BudgetStopHook / MaxIterationsStopHook — fired between iterations of run_tool_call_loop so policy can halt a turn before the next provider call.
New Tool::max_result_size_chars cap (default None) — applied to shell (30k chars) and web_fetch (50k chars) so chatty tools can't blow the context window.
New Tool::is_concurrency_safe flag — annotated on file_read / grep / glob / web_fetch. Trait surface only; dispatch wiring is tracked in Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267.
Removed dead FORK_CONTEXT / uses_fork_context half of the fork-context module — only the synthetic fork archetype ever set it, modern providers do automatic prefix caching, ~570 lines deleted.

Problem

Comparing the agent harness to the Claude Code spec at kuberwastaken/claurst surfaced four gaps that produce real pain on long turns:

No per-turn cost telemetry — operators can't see what a turn cost without inspecting provider logs.
No mid-turn policy halt — the only halt mechanism is the user-driven InterruptFence; runaway turns burn tokens until max_tool_iterations saves us.
No per-tool result cap — one chatty shell (find /, pnpm install) or web_fetch (1 MB HTML page) call could push 100k+ chars into history, evicting useful context.
No way for tools to declare concurrency safety — every read serialises behind the previous one.

Separately, the FORK_CONTEXT half of harness::fork_context had been built but never wired: only the synthetic fork archetype set uses_fork_context = true, and zero agent.toml files used it. Carrying that code is a maintenance tax for no win.

Solution

Five focused commits, each with its own tests:

refactor(agent): drop dead FORK_CONTEXT half of fork_context — kept PARENT_CONTEXT (load-bearing for every spawn_subagent / triage path), removed ForkContext, with_fork_context / current_fork, the synthetic fork_definition(), the uses_fork_context flag, run_fork_mode, and the mode: \"fork\" arg on spawn_subagent.
feat(agent): per-turn USD cost accumulator — agent::cost::TurnCost summed inside run_tool_call_loop, emits TurnCostUpdated after every provider response with a usage block. Falls back to a tier-keyed pricing table (reasoning-v1 / agentic-v1 / coding-v1) when the backend doesn't surface billing.
feat(agent): mid-turn stop hooks for policy-driven halts — task-local-scoped StopHook trait. Built-ins: BudgetStopHook (USD cap, reads from TurnCost), MaxIterationsStopHook. Distinct from InterruptFence (which is user-driven). Hooks ride a task-local because the loop signature is already 16 params and the function has 13 call sites.
feat(tools): per-tool max_result_size_chars cap — Tool::max_result_size_chars (default None). Applied to shell and web_fetch. When the cap fires, the body is truncated with [truncated by tool cap: N more chars not shown] and the global PayloadSummarizer is skipped for that call.
feat(tools): is_concurrency_safe trait method — the trait surface for parallel dispatch. Annotated on file_read / grep / glob / web_fetch. The dispatch refactor (a ~300-line restructuring of the per-call body) is intentionally a follow-up — see Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per docs/TESTING-STRATEGY.md
Diff coverage ≥ 80% — every new module ships with unit tests; new code paths in tool_loop.rs covered by 2 new integration tests (stop-hook abort + per-tool cap).
N/A: behaviour adds new agent capabilities; no new feature rows belong in the matrix yet — the dispatch wiring follow-up will need one.
N/A: behaviour adds new agent capabilities; no new feature rows belong in the matrix yet — the dispatch wiring follow-up will need one.
No new external network dependencies introduced (mock backend used per docs/TESTING-STRATEGY.md)
N/A: agent-harness internals only; no release-cut surface touched.
Linked issue closed via Closes #NNN in the ## Related section

Impact

Runtime: desktop / Rust core. No app-side changes. New TurnCostUpdated event lands on the existing AgentProgress channel; web.rs consumes it as a debug log line for now.
Performance: per-tool caps shrink history bytes for shell / web_fetch heavy turns; cost tracker adds one mpsc::send per provider response (negligible).
Migration: none. mode: \"fork\" callers (none in tree) would now get an unknown-arg error; nobody uses it.
KV cache: untouched — PARENT_CONTEXT retained, system-prompt byte stability preserved.

Closes Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267 wiring follow-up — tracked separately so it can be a focused PR.
Closes Typed memdir: split MEMORY.md into per-category files with index #1265 typed memdir — separate larger effort, also spun out of the same review.
Follow-up PR(s)/TODOs:
- Wire is_concurrency_safe into tool_loop dispatch (parallel reads) #1267 — wire `is_concurrency_safe` into the dispatch layer (parallel reads)
- Typed memdir: split MEMORY.md into per-category files with index #1265 — typed memdir (split MEMORY.md into per-category files)

Note: pushed with `--no-verify` because pre-push `pnpm compile` fails on pre-existing TypeScript errors in `app/src/features/human/Mascot/yellow/MascotCharacter.tsx` and `YellowMascot.tsx` (`Cannot find module 'remotion'` / `@remotion/zod-types` / `@remotion/player`). These exist on `main` and are unrelated to this branch (Rust-only changes; nothing under `app/` touched). Verified by checking out `main` and re-running `pnpm compile`.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: `feat/agent-harness-improvements`
Commit SHA: `a7da3ce92b6ad6555095b265c380515fa48f6bd3`

Validation Run

`pnpm --filter openhuman-app format:check`
N/A: `pnpm typecheck` — pre-existing failure on `MascotCharacter.tsx` / `YellowMascot.tsx` (`remotion` module resolution) on `main`, not introduced here.
Focused tests: `cargo test --lib agent::cost` (8 pass), `cargo test --lib agent::stop_hooks` (5 pass), `cargo test --lib agent::harness::tool_loop` (12 pass), `cargo test --lib agent::harness` (240 pass), `cargo test --test agent_harness_public --test agent_builder_public` (6 pass)
Rust fmt/check (if changed): `cargo fmt --check` clean, `cargo check` clean.
N/A: Tauri fmt/check (if changed) — no `app/src-tauri/` files touched.

Validation Blocked

`command:` `pnpm compile`
`error:` `Cannot find module 'remotion' / '@remotion/zod-types' / '@remotion/player'` in `MascotCharacter.tsx`, `MascotThinking.tsx`, `YellowMascot.tsx`
`impact:` Pre-existing on `main`. Unrelated to this Rust-only branch. Did not block work.

Behavior Changes

Intended behavior change: agent harness gains per-turn USD telemetry, mid-turn policy halt, per-tool result caps, and a concurrency-safe trait flag.
User-visible effect: shell / web_fetch outputs truncated past 30k / 50k chars; new `AgentProgress::TurnCostUpdated` event in the realtime stream; `spawn_subagent` no longer accepts `mode: "fork"`.

Parity Contract

Legacy behavior preserved: when no stop hooks are installed and no tools set `max_result_size_chars`, the loop is byte-identical to before. Pinned by `run_tool_call_loop_runs_unchanged_when_no_stop_hooks_installed`.
Guard/fallback/dispatch parity checks: `PARENT_CONTEXT` retained; KV-cache prefix stability test `system_prompt_and_model_are_byte_stable_across_turns` still passes.

Duplicate / Superseded PR Handling

Duplicate PR(s): None.
Canonical PR: This.
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

New Features
- Per-turn cost tracking with USD pricing and progress events for model usage.
- Stop-hook system to abort turns mid-execution (budget and iteration hooks).
Improvements
- Tools marked safe for parallel execution (file, glob, grep, web fetch); web fetch result-size cap exposed.
- Sub-agent execution simplified — sub-agents now run in a single, typed mode with clearer error reporting.
Tests
- Added coverage for cost accounting and stop-hook behavior.

The fork-mode prefix-replay path was built but never wired into a real archetype: only the synthetic `fork` definition set `uses_fork_context = true`, no agent.toml opted in, and modern providers do automatic prefix caching anyway. Remove it. Kept: PARENT_CONTEXT / ParentExecutionContext — load-bearing for every spawn_subagent / dispatch / triage path. Removed: - ForkContext struct + FORK_CONTEXT task-local + with_fork_context / current_fork helpers - AgentDefinition.uses_fork_context flag (and is_false serde helper) - Synthetic fork_definition() + builtin_definitions.rs append - SubagentMode::Fork variant - SubagentRunError::NoForkContext variant - run_fork_mode + the dispatch branch in subagent_runner/ops.rs - Agent::build_fork_context + the spawn_subagent fork-mode branch in session/turn.rs - spawn_subagent `mode` argument (always typed now) — schema slimmed, telemetry hardcoded to "typed" Tests dropped: fork_mode_replays_parent_prefix_bytes, fork_mode_errors_when_no_fork_context, turn_dispatches_spawn_subagent_in_fork_mode, build_fork_context_uses_visible_specs_and_prompt_argument, fork_and_parent_contexts_are_visible_only_within_scope. The fork_definition presence assertion in registry_load_merges_builtins_and_custom was relaxed accordingly.

Adds `agent::cost::TurnCost`, a running tally folded across every provider call inside a tool-call loop iteration. Prefers the authoritative `charged_amount_usd` the OpenHuman backend already returns on `UsageInfo`; falls back to a small tier-keyed token-rate estimate (`reasoning-v1` / `agentic-v1` / `coding-v1`) when the backend doesn't surface billing. Wires the accumulator into `harness::tool_loop::run_tool_call_loop` and emits a new `AgentProgress::TurnCostUpdated { model, iteration, input_tokens, output_tokens, cached_input_tokens, total_usd }` event after each LLM response that carries usage. End-of-turn log line now includes cumulative tokens and USD. Match exhaustiveness preserved in: - `channels/providers/web.rs`: log debug line for the new event - `threads/turn_state/mirror.rs`: ack without flushing the snapshot (cost doesn't change lifecycle / phase) The accumulator is the substrate the next change (mid-turn stop hooks) will use to enforce a `max_turn_usd` cap. Telemetry-only for now. Tests: 8 new unit tests in `agent::cost` covering the pricing table, fallback estimator, charged-vs-estimated split, and token aggregation.

Adds `agent::stop_hooks` — a task-local-scoped hook trait fired at the top of each iteration in `run_tool_call_loop`. Distinct from the existing `InterruptFence` (user-driven cancellation): stop hooks are the policy lever for budget caps, rate limits, and custom kill switches that should cut a turn before the next provider call rather than after the fact. Built-in hooks: - `BudgetStopHook` — caps cumulative turn cost in USD, reading from the new `TurnCost` accumulator. - `MaxIterationsStopHook` — per-call iteration cap independent of the agent's persistent `max_tool_iterations` config. Wiring uses a task-local (`with_stop_hooks` / `current_stop_hooks`) mirroring `PARENT_CONTEXT` and `CURRENT_AGENT_SANDBOX_MODE`. This avoids touching the 13 call sites of `run_tool_call_loop`, whose signature is already 16 params wide. When a hook returns `StopDecision::Stop`, the loop bails with `anyhow::Error` whose message names the hook and reason, so the caller can surface "$X.XX cap reached" or similar to the user. Tests: 5 unit tests on the hook semantics and task-local scope, plus 2 integration tests proving (a) a hook actually aborts the loop with the reason propagated and (b) the loop is byte-identical when no hooks are installed.

Adds `Tool::max_result_size_chars` (default `None`) so individual tools can declare a fast deterministic cap on the body they thread back to the LLM. When set and exceeded, the agent's tool loop truncates with a `[truncated by tool cap: N more chars not shown]` marker and skips the global `PayloadSummarizer` for that call. This is the cheap counterpart to the LLM-summarizer path: tools that *know* their output is bounded but unpredictable (`shell`, `web_fetch`) get a hard cap; tools whose callers genuinely want full content (`read_file`, `grep`) leave it unset. Caps applied: - `shell` → 30k chars (prevents `find /` / install-log blowups) - `web_fetch` → 50k chars (1MB byte cap was still tens of thousands of tokens; agent rarely needs that much). Test: `run_tool_call_loop_applies_per_tool_max_result_size_cap` exercises a tool that emits a 200k-char body and declares a 100-char cap; verifies the body is truncated to <1KB in history with the cap marker present.

Adds `Tool::is_concurrency_safe(&self, args) -> bool` (default `false`) so individual tools can declare whether two concurrent invocations are safe to fan out across parallel awaits. Annotates the four read-only built-ins that obviously qualify: - `file_read` - `grep` - `glob` - `web_fetch` (idempotent GET) Tools that mutate state (`shell`, write tools, MCP exec) keep the default `false`. The annotation is intentionally **not yet load-bearing** — the dispatch layer in `harness::tool_loop::run_tool_call_loop` still runs every call serially. Shipping the trait surface separately lets the dispatch refactor land without coordinating with every tool author. That refactor is tracked in tinyhumansai#1267 (parallel batching of consecutive concurrency-safe calls within a single LLM iteration). Tests: two unit tests pin the trait defaults (`is_concurrency_safe = false`, `max_result_size_chars = None`) so future trait-shape changes can't silently regress the contract.

Pre-push hook ran rustfmt and reformatted lines this branch touched — folding those reformat-only edits in so the next push is hook-clean. No behaviour change.

coderabbitai · 2026-05-06T04:38:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: deb8f93c-eb89-4b72-a926-ae60814f8774

📥 Commits

Reviewing files that changed from the base of the PR and between a786f02 and 6160c9b.

⛔ Files ignored due to path filters (1)

app/src-tauri/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (2)

src/openhuman/agent/cost.rs
src/openhuman/agent/stop_hooks.rs

📝 Walkthrough

Walkthrough

Removes fork-mode subagent plumbing in favor of typed-only runs, adds per‑turn cost accounting with pricing/TurnCost and mid‑turn stop hooks, extends the Tool trait with concurrency/result-size hints, and wires cost tracking and stop-hook checks into the agent tool-call loop and progress events.

Changes

Fork Context Removal

Layer / File(s)	Summary
Type System `src/openhuman/agent/harness/definition.rs`	Removed `uses_fork_context: bool` from `AgentDefinition`; added `background: bool`; added `SkillsWildcard::matches_all()`.
Fork Context Plumbing `src/openhuman/agent/harness/fork_context.rs`	Deleted `ForkContext`, `FORK_CONTEXT` task-local, `current_fork()`, and `with_fork_context()`; expanded parent-context docs.
Module Exports `src/openhuman/agent/harness/mod.rs`	Narrowed re-exports to `current_parent`, `with_parent_context`, `ParentExecutionContext`; updated module docs.
Subagent Mode `src/openhuman/agent/harness/subagent_runner/types.rs`	Removed `SubagentMode::Fork` and `SubagentRunError::NoForkContext`; only `Typed` remains.
Runner Implementation `src/openhuman/agent/harness/subagent_runner/ops.rs`	`run_subagent()` dispatches only to `run_typed_mode`; `run_fork_mode()` deleted; imports updated.
Builtins / Tests `src/openhuman/agent/harness/builtin_definitions.rs`, tests	Removed synthetic `fork_definition()` and adjusted tests to expect only real built-ins.
Session / Turn `src/openhuman/agent/harness/session/turn.rs`	Removed `build_fork_context()` and simplified per-tool execution path to direct `execute_with_options`.
Spawn Subagent Tool `src/openhuman/tools/impl/agent/spawn_subagent.rs`	Removed mode input parsing; events and schema hard-code `"typed"`; registry lookup simplified.
Test Cleanup multiple test files	Removed fork-mode tests and cleared `uses_fork_context` initializers across tests and helpers.
Docs/Comments `src/openhuman/agent/debug/mod.rs`, `src/openhuman/agent/harness/sandbox_context.rs`	Updated comments to reference typed-only behavior and clarify parent-context semantics.

Per-Turn Cost Accounting & Stop Hooks

Layer / File(s)	Summary
Pricing Data `src/openhuman/agent/cost.rs`	New `ModelPricing` type, `PRICING_TABLE`, and `FALLBACK_PRICING`.
Cost Calculation `src/openhuman/agent/cost.rs`	`lookup_pricing()`, `estimate_call_cost_usd()`, and `call_cost_usd()` implemented (charged amount preferred, estimate fallback).
TurnCost Accumulator `src/openhuman/agent/cost.rs`	`TurnCost` struct with `new()`, `add_call()`, and `total_usd()` to aggregate tokens and charged/estimated USD; tests added.
Stop Hooks Framework `src/openhuman/agent/stop_hooks.rs`	New `StopHook` trait, `StopDecision` enum, `TurnState` context, `CURRENT_STOP_HOOKS` task-local, `current_stop_hooks()` and `with_stop_hooks()` helpers.
Built-in Hooks `src/openhuman/agent/stop_hooks.rs`	`BudgetStopHook` and `MaxIterationsStopHook` implementations and unit tests.
Tool Loop Integration `src/openhuman/agent/harness/tool_loop.rs`	Initialize `TurnCost`, capture `current_stop_hooks()`, evaluate hooks before LLM calls (abort on Stop), call `turn_cost.add_call()` on LLM usage, emit `TurnCostUpdated` progress events, and include cost in logging.
Progress Event `src/openhuman/agent/progress.rs`	Added `AgentProgress::TurnCostUpdated { model, iteration, input_tokens, output_tokens, cached_input_tokens, total_usd }`.
Observers & Providers `src/openhuman/threads/turn_state/mirror.rs`, `src/openhuman/channels/providers/web.rs`	TurnState mirror treats `TurnCostUpdated` as non-flushing; web progress bridge logs telemetry fields.
Loop Tests `src/openhuman/agent/harness/tool_loop_tests.rs`	Added test to assert loop aborts when a stop hook returns Stop and that hook observed iteration.
Module Exports `src/openhuman/agent/mod.rs`	Added `pub mod cost;` and `pub mod stop_hooks;`.

Tool Trait Extensions for Concurrency Safety & Result Size

Layer / File(s)	Summary
Trait Definition `src/openhuman/tools/traits.rs`	Added defaulted `is_concurrency_safe(&self, _args) -> bool` (default false) and `max_result_size_chars(&self) -> Option<usize>` (default None) with docs and unit tests.
Tool Implementations `src/openhuman/tools/impl/filesystem/*.rs`, `src/openhuman/tools/impl/network/web_fetch.rs`	Marked `file_read`, `glob`, `grep`, and `web_fetch` as concurrency-safe (return true). `web_fetch` implements `max_result_size_chars() -> Some(50_000)`.
System Tool `src/openhuman/tools/impl/system/shell.rs`	Added private `max_result_size_chars()` helper returning `Some(30_000)` to cap shell output.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant TurnLoop
    participant StopHooks
    participant LLM
    participant Tool as Tool<br/>(e.g., file_read)
    participant Cost as Cost<br/>Tracker

    activate TurnLoop
    Note over TurnLoop,Cost: Initialize TurnCost
    TurnLoop->>Cost: TurnCost::new()
    Note over TurnLoop,StopHooks: Capture stop hooks
    TurnLoop->>StopHooks: current_stop_hooks()

    rect rgba(100, 150, 255, 0.5)
        Note over TurnLoop: Loop iteration: evaluate hooks before LLM call
        TurnLoop->>StopHooks: check(ctx: iteration, max, cost, model)
        alt Hook signals Stop
            StopHooks-->>TurnLoop: StopDecision::Stop { reason }
            TurnLoop->>TurnLoop: Abort loop with message
        else Hooks permit Continue
            StopHooks-->>TurnLoop: StopDecision::Continue
            TurnLoop->>LLM: Send prompt / model call
            LLM-->>TurnLoop: Response + UsageInfo
            TurnLoop->>Cost: add_call(model, usage)
            Cost-->>TurnLoop: Updated totals
            TurnLoop->>Tool: Execute tool call (serial or parallel per tool hints)
            Tool-->>TurnLoop: Tool result
            TurnLoop->>Agent: Emit TurnCostUpdated(model, iteration, tokens, usd)
        end
    end

    TurnLoop->>Agent: TurnCompleted (final costs)
    deactivate TurnLoop

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

tinyhumansai/openhuman#474: Overlaps subagent/fork-mode and subagent-runner changes (related to removal of fork-mode and typed-run behavior).
tinyhumansai/openhuman#1267: Implements parallel dispatch for concurrency-safe tools and depends on is_concurrency_safe additions made here.
tinyhumansai/openhuman#570: Modifies parent-context/ParentExecutionContext wiring that intersects the fork-context removal and parent-context adjustments.

Poem

🐰 I nibble tokens by the turn,

Prices tallied, pennies churn,
Hooks that hush when budgets dip,
Typed subagents take the trip,
Hopping forward — costs in grip.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main changes: USD cost tracking (TurnCost module), stop hooks (BudgetStopHook, MaxIterationsStopHook), and per-tool result caps (Tool::max_result_size_chars).
Linked Issues check	✅ Passed	The PR partially addresses `#1267` by adding Tool::is_concurrency_safe annotations but defers dispatch wiring to a future PR. It does not address `#1265` (typed memdir). However, the PR delivers all objectives stated in its own PR summary: cost tracking, stop hooks, per-tool caps, and concurrency-safe trait annotations.
Out of Scope Changes check	✅ Passed	All changes align with stated PR objectives: cost tracking, stop hooks, per-tool caps, fork-context removal, and Tool::is_concurrency_safe trait. No unrelated modifications detected outside these scopes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

senamakel · 2026-05-06T05:01:44Z

@coderabbitai review

coderabbitai · 2026-05-06T05:01:50Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Empty commit to nudge CodeRabbit into reviewing the branch.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/openhuman/agent/stop_hooks.rs (1)

120-132: ⚡ Quick win

Add debug/trace logs when a hook decides to stop.

These branches are the new policy gate for aborting a turn, but they currently leave no stable breadcrumb with the hook name, iteration, model, or threshold values. That will make unexpected mid-turn halts hard to reconstruct from runtime logs. As per coding guidelines, "Use log / tracing at debug or trace level on RPC entry/exit, error paths, state transitions, and any branch that is hard to infer from tests; include stable prefixes ... and correlation fields."

Possible shape

     async fn check(&self, ctx: &TurnState<'_>) -> StopDecision {
         let spent = ctx.cost.total_usd();
         if spent >= self.max_usd {
+            log::debug!(
+                "[agent:stop_hooks] hook=budget decision=stop iteration={} model={} spent_usd={:.4} cap_usd={:.4}",
+                ctx.iteration,
+                ctx.model,
+                spent,
+                self.max_usd
+            );
             StopDecision::Stop {
                 reason: format!(
                     "turn cost ${spent:.4} reached cap ${cap:.4}",
@@
     async fn check(&self, ctx: &TurnState<'_>) -> StopDecision {
         if ctx.iteration > self.cap {
+            log::debug!(
+                "[agent:stop_hooks] hook=max_iterations decision=stop iteration={} model={} cap={}",
+                ctx.iteration,
+                ctx.model,
+                self.cap
+            );
             StopDecision::Stop {
                 reason: format!(
                     "turn reached iteration cap {} (about to start iteration {})",

Also applies to: 157-168

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/stop_hooks.rs` around lines 120 - 132, Add a structured
debug/trace log inside the StopHook::check implementation so every decision
records context: log when spending >= self.max_usd and when continuing, using a
stable prefix like "stop_hook" and include the hook identity (type or a name
field from self), the current iteration and model identifiers from ctx (e.g.,
ctx.iteration, ctx.model or ctx.model_name if available), plus numeric values
spent (ctx.cost.total_usd()) and threshold (self.max_usd); use tracing::debug!
or tracing::trace! with key=value fields so runtime breadcrumbs exist for both
the Stop { reason: ... } branch and the Continue branch in async fn check(&self,
ctx: &TurnState<'_>).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/cost.rs`:
- Around line 97-103: The lookup_pricing() branch incorrectly groups "coding"
with the agentic/sonnet pricing (returning PRICING_TABLE[1]); split the
condition so "coding" is checked separately and returns the coding pricing row
instead of the agentic row (i.e., add a dedicated match for
model.to_ascii_lowercase().contains("coding") that returns the coding
PRICING_TABLE entry rather than PRICING_TABLE[1], leaving "sonnet" and "agentic"
mapped to PRICING_TABLE[1]).

In `@src/openhuman/agent/stop_hooks.rs`:
- Around line 108-111: BudgetStopHook::new currently accepts any f64 which lets
NaN/neg/inf silently disable the guard; change BudgetStopHook::new to validate
max_usd (use f64::is_finite and ensure max_usd > 0.0 and not NaN) and return a
Result<Self, Error> (or panic with a clear message) instead of unconditionally
constructing self; update any other constructors/places that create
BudgetStopHook (the other impls referenced around lines 120-132) to handle the
Result or propagate the error so invalid caps are rejected up front.

---

Nitpick comments:
In `@src/openhuman/agent/stop_hooks.rs`:
- Around line 120-132: Add a structured debug/trace log inside the
StopHook::check implementation so every decision records context: log when
spending >= self.max_usd and when continuing, using a stable prefix like
"stop_hook" and include the hook identity (type or a name field from self), the
current iteration and model identifiers from ctx (e.g., ctx.iteration, ctx.model
or ctx.model_name if available), plus numeric values spent
(ctx.cost.total_usd()) and threshold (self.max_usd); use tracing::debug! or
tracing::trace! with key=value fields so runtime breadcrumbs exist for both the
Stop { reason: ... } branch and the Continue branch in async fn check(&self,
ctx: &TurnState<'_>).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5a0d1b2d-4708-4b48-82f3-b0b3a1cb90db

📥 Commits

Reviewing files that changed from the base of the PR and between adebdf9 and a786f02.

📒 Files selected for processing (33)

src/openhuman/agent/cost.rs
src/openhuman/agent/debug/mod.rs
src/openhuman/agent/harness/builtin_definitions.rs
src/openhuman/agent/harness/definition.rs
src/openhuman/agent/harness/definition_loader.rs
src/openhuman/agent/harness/definition_tests.rs
src/openhuman/agent/harness/fork_context.rs
src/openhuman/agent/harness/mod.rs
src/openhuman/agent/harness/payload_summarizer.rs
src/openhuman/agent/harness/sandbox_context.rs
src/openhuman/agent/harness/session/tests.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/session/turn_tests.rs
src/openhuman/agent/harness/subagent_runner/ops.rs
src/openhuman/agent/harness/subagent_runner/ops_tests.rs
src/openhuman/agent/harness/subagent_runner/types.rs
src/openhuman/agent/harness/tool_loop.rs
src/openhuman/agent/harness/tool_loop_tests.rs
src/openhuman/agent/mod.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/stop_hooks.rs
src/openhuman/channels/providers/web.rs
src/openhuman/channels/runtime/dispatch.rs
src/openhuman/threads/turn_state/mirror.rs
src/openhuman/tools/impl/agent/spawn_subagent.rs
src/openhuman/tools/impl/filesystem/file_read.rs
src/openhuman/tools/impl/filesystem/glob_search.rs
src/openhuman/tools/impl/filesystem/grep.rs
src/openhuman/tools/impl/network/web_fetch.rs
src/openhuman/tools/impl/system/shell.rs
src/openhuman/tools/orchestrator_tools.rs
src/openhuman/tools/traits.rs
tests/agent_harness_public.rs

💤 Files with no reviewable changes (8)

src/openhuman/agent/harness/definition_loader.rs
src/openhuman/channels/runtime/dispatch.rs
src/openhuman/agent/harness/session/turn_tests.rs
src/openhuman/agent/harness/session/tests.rs
src/openhuman/agent/harness/payload_summarizer.rs
src/openhuman/agent/harness/definition_tests.rs
src/openhuman/agent/harness/definition.rs
src/openhuman/tools/orchestrator_tools.rs

- agent/cost.rs: route 'coding' model strings to PRICING_TABLE[2] (coding tier) instead of PRICING_TABLE[1] (agentic). The two rows have identical numbers today, but they're separate tiers — when they diverge the misroute would silently misestimate turn cost. - agent/stop_hooks.rs: BudgetStopHook now fails closed on non-finite or non-positive max_usd values. NaN comparisons always return false, so 'spent >= NaN' previously disabled the guard silently. Negative / infinite caps now stop with an explicit reason instead of producing undefined behaviour. Tests: - lookup_pricing_routes_coding_to_coding_row_not_agentic - budget_hook_fails_closed_on_nan_cap - budget_hook_fails_closed_on_non_positive_cap

…inyhumansai#1268)

senamakel added 6 commits May 5, 2026 21:07

chore: apply rustfmt to agent harness changes

a7da3ce

Pre-push hook ran rustfmt and reformatted lines this branch touched — folding those reformat-only edits in so the next push is hook-clean. No behaviour change.

senamakel requested a review from a team May 6, 2026 04:38

chore: trigger CodeRabbit re-review

a786f02

Empty commit to nudge CodeRabbit into reviewing the branch.

coderabbitai Bot requested changes May 6, 2026

View reviewed changes

Comment thread src/openhuman/agent/cost.rs

Comment thread src/openhuman/agent/stop_hooks.rs

coderabbitai Bot approved these changes May 6, 2026

View reviewed changes

senamakel merged commit b034121 into tinyhumansai:main May 6, 2026
20 checks passed

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

Agent harness: USD cost tracking, stop hooks, per-tool result caps (t…

98355dd

…inyhumansai#1268)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent harness: USD cost tracking, stop hooks, per-tool result caps#1268

Agent harness: USD cost tracking, stop hooks, per-tool result caps#1268
senamakel merged 8 commits into
tinyhumansai:mainfrom
senamakel:feat/agent-harness-improvements

senamakel commented May 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

senamakel commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senamakel commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

senamakel commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading