feat: token & cost governance (library exposure + pipeline ceilings) #68

clintecker merged 18 commits into `main`.
Conversation
Adds CostByProvider and TotalCostUSD methods that map accumulated per-provider Usage to dollar cost via a caller-supplied ModelResolver callback. Enables the library API to expose cost breakdowns without the llm package knowing how pipelines resolve models. Refs #62
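The shape of that API can be sketched as follows. The names `CostByProvider`, `ModelResolver`, and `ProviderCost` come from this PR; the `Usage` fields, the price table, and the `estimateCost` helper are illustrative assumptions, not the library's actual definitions.

```go
package main

import "fmt"

// Usage is a stand-in for the library's accumulated token counts
// (field names are assumptions for this sketch).
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// ModelResolver maps a provider name to the model the pipeline ran.
// The caller supplies it, so the llm package never needs to know how
// pipelines resolve models.
type ModelResolver func(provider string) string

// ProviderCost pairs raw usage with the resolved model and its dollar cost.
type ProviderCost struct {
	Usage Usage
	Model string
	USD   float64
}

// estimateCost is a hypothetical per-model price table (USD per million tokens).
func estimateCost(model string, u Usage) float64 {
	perMillion := map[string]struct{ in, out float64 }{
		"claude-sonnet": {3.0, 15.0},
	}
	p := perMillion[model]
	return float64(u.InputTokens)/1e6*p.in + float64(u.OutputTokens)/1e6*p.out
}

// CostByProvider maps accumulated per-provider usage to dollar cost via the
// caller-supplied resolver, as the commit message describes.
func CostByProvider(usage map[string]Usage, resolve ModelResolver) map[string]ProviderCost {
	out := make(map[string]ProviderCost, len(usage))
	for provider, u := range usage {
		var model string
		if resolve != nil {
			model = resolve(provider)
		}
		out[provider] = ProviderCost{Usage: u, Model: model, USD: estimateCost(model, u)}
	}
	return out
}

func main() {
	usage := map[string]Usage{"anthropic": {InputTokens: 2_000_000, OutputTokens: 100_000}}
	costs := CostByProvider(usage, func(provider string) string { return "claude-sonnet" })
	fmt.Printf("%.2f\n", costs["anthropic"].USD) // 2M in * $3/M + 0.1M out * $15/M = 7.50
}
```

The resolver callback is the key design choice: cost attribution stays in the caller's hands, so the tracker never embeds pipeline-specific model lookup.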
Adds SessionStats.Provider and UsageSummary.ProviderTotals so consumers (library callers, TUI, CLI summary) can see per-provider token and cost breakdowns from the trace alone, without needing to cross-reference the llm.TokenTracker middleware. Plumbing: SessionResult.Provider populated from SessionConfig.Provider in session.Run(), copied to SessionStats by buildSessionStats(). Refs #62 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Result.Cost populated from the llm.TokenTracker middleware via a per-provider cost rollup resolved against the graph's default llm_model attribute. Library consumers can now read per-provider token and dollar cost directly off the Result without cross-referencing trace entries. Refs #62
Adds a new PipelineEvent type carrying a CostSnapshot payload (TotalTokens, TotalCostUSD, per-provider breakdown, wall elapsed). Emitted after every terminal node outcome so streaming consumers can live-update cost dashboards without polling Trace.AggregateUsage. Also declares EventBudgetExceeded for use by the BudgetGuard in the follow-up task. Refs #62 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
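A consumer-side sketch of subscribing to such a stream: the `CostSnapshot` field names follow the commit message (TotalTokens, TotalCostUSD, per-provider breakdown, wall elapsed), while the event struct, kind strings, and handler shape are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// CostSnapshot mirrors the payload named in the commit message; exact
// field types are assumptions for this sketch.
type CostSnapshot struct {
	TotalTokens    int
	TotalCostUSD   float64
	ProviderTotals map[string]int
	WallElapsed    time.Duration
}

// PipelineEvent is a minimal stand-in for the engine's event type.
type PipelineEvent struct {
	Kind     string // e.g. "cost_updated" or "budget_exceeded"
	Snapshot CostSnapshot
}

// dashboard keeps only the latest snapshot, so it never needs to poll
// Trace.AggregateUsage — each terminal-node event pushes fresh totals.
type dashboard struct{ latest CostSnapshot }

func (d *dashboard) Handle(ev PipelineEvent) {
	if ev.Kind == "cost_updated" {
		d.latest = ev.Snapshot
	}
}

func main() {
	d := &dashboard{}
	// Simulate two terminal-node outcomes emitting cost updates.
	d.Handle(PipelineEvent{Kind: "cost_updated", Snapshot: CostSnapshot{TotalTokens: 1200, TotalCostUSD: 0.03}})
	d.Handle(PipelineEvent{Kind: "cost_updated", Snapshot: CostSnapshot{TotalTokens: 2600, TotalCostUSD: 0.07}})
	fmt.Println(d.latest.TotalTokens) // 2600
}
```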
Adds BudgetLimits/BudgetGuard/BudgetBreach in a new pipeline/budget.go file, wires OutcomeBudgetExceeded into the run loop immediately after every terminal-node emitCostUpdate, and surfaces EventBudgetExceeded with the current CostSnapshot payload. EngineResult gains BudgetLimitsHit (list of dimension names that tripped) populated on halt. Token and cost ceilings are inclusive — exact equality is not a breach. The exit-node path uses a thin checkBudgetHaltForExit wrapper that returns *EngineResult (matching handleExitNode's signature) rather than loopResult; the advanceToNextNode path uses checkBudgetAfterEmit directly and returns *loopResult inline — both bubble up through processExitNode/processActiveNode to the main Run loop. Refs #17 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regression test that workflow-level max_total_tokens, max_cost_cents, and max_wall_time land on graph.Attrs with their exact keys, so the library can read them without further mapping. The test documents the expected behavior for when dippin-lang v0.x+ adds MaxTotalTokens, MaxCostCents, and MaxWallTime fields to WorkflowDefaults. Currently these fields don't exist in dippin-lang v0.18.0, so the test uses a manual graph construction to verify the adapter would handle them correctly. Refs #17 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 11cbfaf.
Library consumers set Config.Budget = pipeline.BudgetLimits{...} to
enforce token/cost/wall-time ceilings on a run. When any limit is
tripped, Result.Status becomes OutcomeBudgetExceeded and Result.Cost.LimitsHit
names the dimension(s) that halted the run.
Graph-attr fallback (reading limits from .dip workflow attrs) is a
follow-up blocked on dippin-lang IR support — tracked in #67.
Refs #17
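A hedged sketch of the consumer-facing flow the commit describes: the `BudgetLimits` dimensions match the flags and attrs named elsewhere in this PR, but the reduced `Config`/`Result` shapes and the `run` simulation are illustrative, not the library's real API.

```go
package main

import (
	"fmt"
	"time"
)

// BudgetLimits follows the dimensions named in this PR; a zero value
// means "no limit", matching the CLI flag semantics.
type BudgetLimits struct {
	MaxTotalTokens int
	MaxCostCents   int
	MaxWallTime    time.Duration
}

// Config and Result are reduced stand-ins for the library types.
type Config struct{ Budget BudgetLimits }

type Result struct {
	Status    string
	LimitsHit []string // dimension names that halted the run
}

// run simulates a pipeline that has already spent spentTokens; it halts
// with a budget-exceeded status when a ceiling trips.
func run(cfg Config, spentTokens int) Result {
	if cfg.Budget.MaxTotalTokens > 0 && spentTokens > cfg.Budget.MaxTotalTokens {
		return Result{Status: "budget_exceeded", LimitsHit: []string{"tokens"}}
	}
	return Result{Status: "success"}
}

func main() {
	cfg := Config{Budget: BudgetLimits{MaxTotalTokens: 100_000, MaxCostCents: 500, MaxWallTime: 30 * time.Minute}}
	res := run(cfg, 150_000)
	fmt.Println(res.Status, res.LimitsHit) // budget_exceeded [tokens]
}
```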
Three new flags on 'tracker run' populate Config.Budget, routing through pipeline.BudgetGuard. When a limit trips, the summary surfaces a 'HALTED: budget exceeded' section naming the dimension and showing the accumulated spend. Refs #17 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- events_jsonl.go now includes CostSnapshot fields (TotalTokens, TotalCostUSD, ProviderTotals, WallElapsedMs) so cost_updated and budget_exceeded events in activity.jsonl carry their payload.
- buildLogEntry refactored from method to package-level function for direct testability; JSONLEventHandler.HandlePipelineEvent delegates to it.
- tracker diagnose scans activity.jsonl for budget_exceeded events and prints a dedicated halt section with token/cost/wall-time and remediation guidance before the standard node-failure blocks.

Refs #17 #62 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough

Adds per-provider cost accounting and streaming cost events, exposes run-level CostReport, and implements a pipeline-level BudgetGuard enforcing token/cost/wall-time ceilings that can halt execution; wires CLI flags, diagnostics, and result/trace fields to surface budget behavior and cost snapshots.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/CLI
    participant Engine as Pipeline Engine
    participant Node as Node Handler
    participant Budget as Budget Guard
    participant Events as Event Handler
    participant Token as TokenTracker
    Client->>Engine: Run(graph, options)
    loop For each node
        Engine->>Node: Execute node
        Node-->>Engine: SessionResult (provider, tokens)
        Engine->>Token: AddUsage(provider, tokens)
        Engine->>Engine: Add trace entry
        Engine->>Engine: emitCostUpdate()
        Engine->>Events: EventCostUpdated + CostSnapshot
        Engine->>Budget: Check(UsageSummary, started)
        alt Budget exceeded
            Budget-->>Engine: BudgetBreach
            Engine->>Engine: Set EndTime, Status=BudgetExceeded
            Engine->>Events: EventBudgetExceeded + CostSnapshot
            Engine-->>Client: Halt execution
        else Within limits
            Budget-->>Engine: BudgetOK
            Engine->>Engine: Continue
        end
    end
    Engine-->>Client: EngineResult (with BudgetLimitsHit)
    Client->>Client: Tracker populates Result.Cost from TokenTracker.CostByProvider()
```
```mermaid
sequenceDiagram
    participant Consumer as Library Consumer
    participant Tracker as Tracker Library
    participant Guard as Budget Guard
    participant Engine as Pipeline Engine
    participant Token as TokenTracker
    Consumer->>Tracker: Run(config{Budget})
    Tracker->>Guard: NewBudgetGuard(config.Budget)
    Tracker->>Engine: WithBudgetGuard(guard) and Run(graph)
    Engine->>Token: AddUsage(provider, usage)
    Engine->>Guard: Check(aggregated_usage, time)
    alt Exceeded
        Guard-->>Engine: BudgetBreach -> EngineResult.Status=BudgetExceeded
    else OK
        Guard-->>Engine: BudgetOK -> continue
    end
    Engine-->>Tracker: EngineResult
    Tracker->>Token: CostByProvider(resolver)
    Token-->>Tracker: Per-provider costs
    Tracker-->>Consumer: Result{Cost: {TotalUSD, ByProvider, LimitsHit}}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74334f7c23
```go
return loopResult{
	action: loopReturn,
	result: &EngineResult{
		RunID:  s.runID,
		Status: OutcomeBudgetExceeded,
```
Persist checkpoint before returning budget halt
haltForBudget returns immediately without saving checkpoint state. Because the guard is evaluated before advanceToNextNode writes the updated checkpoint, a budget-triggered stop can leave the on-disk state behind the in-memory completion, so --resume may re-run the last completed node and repeat side effects/cost instead of continuing from the correct frontier.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
```go
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return *lr
```
Enforce budget checks on retry paths
Budget enforcement is only invoked in the normal advance/exit flow, not in retry handling. When a node returns OutcomeRetry, execution goes through handleRetryWithinBudget/handleRetryExhausted without calling checkBudgetAfterEmit, so repeated retries can exceed token/cost limits and still continue (or end as OutcomeFail) instead of halting with OutcomeBudgetExceeded.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
```go
if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
	return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
}
if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
```
Compare cost ceilings without rounding down overages
The cost breach check uses math.Round(usage.TotalCostUSD*100), which can hide real overages below 0.5¢ (for example, 1.4¢ rounds to 1¢ and passes a 1-cent ceiling). That violates the stated "exceeds this value" behavior for MaxCostCents and can let runs overspend without tripping the guard.
By design. math.Round gives us cent-level granularity, which matches the MaxCostCents unit. Sub-0.5¢ values round down, but the alternative (Ceil) would false-trip on any fractional cent above zero. The original bug was int() truncation dropping whole cents — Round fixes that without over-triggering.
Actionable comments posted: 9
🧹 Nitpick comments (1)
cmd/tracker/run.go (1)
129-131: Consider deduplicating budget-guard option wiring.

Both engine builders now repeat the same guard-append block. A tiny helper would reduce drift risk between console and TUI paths.
♻️ Possible refactor

```diff
 func buildEngineOptions(artifactDir, checkpoint string, evtHandler pipeline.PipelineEventHandler) []pipeline.EngineOption {
 @@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		opts = append(opts, pipeline.WithBudgetGuard(guard))
-	}
+	opts = appendBudgetGuardOption(opts, activeBudgetLimits)
 	return opts
 }
 @@
 func buildTUIEngine(graph *pipeline.Graph, registry *pipeline.HandlerRegistry, artifactDir, checkpoint string, pipelineCombo pipeline.PipelineEventHandler) *pipeline.Engine {
 @@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		engineOpts = append(engineOpts, pipeline.WithBudgetGuard(guard))
-	}
+	engineOpts = appendBudgetGuardOption(engineOpts, activeBudgetLimits)
 	return pipeline.NewEngine(graph, registry, engineOpts...)
 }
+
+func appendBudgetGuardOption(opts []pipeline.EngineOption, limits pipeline.BudgetLimits) []pipeline.EngineOption {
+	if guard := pipeline.NewBudgetGuard(limits); guard != nil {
+		return append(opts, pipeline.WithBudgetGuard(guard))
+	}
+	return opts
+}
```

Also applies to: 415-417
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/run.go` around lines 129 - 131, Both engine-builder code paths duplicate the same budget-guard wiring (calling pipeline.NewBudgetGuard and conditionally appending pipeline.WithBudgetGuard); extract that logic into a small helper (e.g., buildBudgetGuardOpts or appendBudgetGuardIfAny) and call it from both engine builders so the guard creation and conditional append happen in one place, referencing pipeline.NewBudgetGuard and pipeline.WithBudgetGuard to locate the duplicated block and update both sites to use the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 8-22: Update the CHANGELOG.md Unreleased section to include a "###
Fixed" subsection and move the existing non-feature caveat into a standard
heading (e.g., under "### Notes" should instead be folded into "### Changed" or
"### Fixed" as appropriate); specifically add one or more brief bullet(s) under
"### Fixed" describing the behavioral fixes shipped in this PR, ensure the file
follows the "Keep a Changelog" headings order (Added, Changed, Fixed, Removed)
and remove the custom "Notes" header or convert its content into the appropriate
standard heading; reference the existing Unreleased entries such as
`Result.Cost`, `pipeline.BudgetGuard`, `tracker.Config.Budget`, CLI flags on
`tracker run`, and pipeline events when deciding whether each item belongs under
Added vs Changed/Fixed.
In `@cmd/tracker/diagnose.go`:
- Around line 87-92: The budgetHalt struct currently lacks the
breached-dimension message so tracker diagnose drops the
EventBudgetExceeded.Message; add a Reason string field to budgetHalt, populate
it from the EventBudgetExceeded.Message wherever budgetHalt instances are
created (the branches around the existing EventBudgetExceeded handling noted in
the diff), and update printBudgetHalt to print "reason: <Reason>" only when
Reason is non-empty so the diagnose output shows whether tokens, cost, or wall
time triggered the halt.
In `@cmd/tracker/flags.go`:
- Around line 135-137: parseRunFlags must reject negative values for the three
budget flags: cfg.maxTokens, cfg.maxCostCents, and cfg.maxWallTime; add
validation in the parseRunFlags function after flag parsing that checks
cfg.maxTokens >= 0, cfg.maxCostCents >= 0 and cfg.maxWallTime >= 0 (or >0? keep
as >= 0 per spec) and return a descriptive error if any are negative so the CLI
fails fast with a clear message. Ensure the error mentions which flag is invalid
(e.g., "max-tokens must be >= 0") and hook this validation into the existing
error path of parseRunFlags.
In `@docs/superpowers/plans/2026-04-14-token-cost-governance.md`:
- Around line 7-8: The plan incorrectly schedules reading budget attributes from
`.dip` workflow attrs and assigns adapter pass-through / graph-attr fallback
work even though that capability is blocked by issue `#67`; update the doc (Part 3
and any sections referencing max_total_tokens, max_cost_cents, max_wall_time,
BudgetGuard, and the adapter pass-through/fallback work) to remove or mark as
deferred the `.dip` budget-attrs scheduling, stop assigning follow-up work that
depends on `.dip` IR support, and explicitly state that graph-level budget
loading is blocked by issue `#67` so the next worker is not sent prematurely.
In `@llm/token_tracker_cost.go`:
- Around line 21-33: The read lock is held while calling external callbacks
(resolve and EstimateCost); instead snapshot t.usage into a local map under
t.mu.RLock()/RUnlock(), then release the lock and iterate the snapshot to build
the out map, calling resolve(provider) and EstimateCost(model, usage) outside
the lock; update references to t.mu, t.usage, resolve, ProviderCost and
EstimateCost accordingly so only the minimal copy is done while locked and all
resolver/cost work happens after releasing the lock.
In `@pipeline/budget.go`:
- Around line 73-99: The current logic in BudgetGuard.Check and checkUsage only
returns the first BudgetBreach it encounters, so multiple simultaneous limits
(tokens, cost, wall time) are never reported; update these functions to evaluate
all limit checks and accumulate every violated dimension into the returned
breach (e.g., populate BudgetBreach.LimitsHit or create a slice of breaches)
instead of early-returning on the first match—specifically, change checkUsage
(referencing g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.
In `@pipeline/engine.go`:
- Around line 310-313: The retry and fallback/redirect paths (specifically in
handleRetryWithinBudget, handleRetryExhausted, and the retry/fallback branches
at the top of handleExitNode) append trace usage and continue without calling
emitCostUpdate and checkBudgetAfterEmit, letting retries bypass BudgetGuard
ceilings (MaxTotalTokens/MaxCostCents); fix by applying the same pattern used on
the normal edge-advance path: after appending trace usage in those branches call
e.emitCostUpdate(s) and then if lr := e.checkBudgetAfterEmit(s); lr != nil {
return *lr } so the BudgetGuard enforcement runs consistently across
retry/fallback/redirect flows.
In `@README.md`:
- Around line 397-404: Replace the incorrect "tracker run ..." example with the
documented root-command syntax so copy-paste works; update the snippet to use
"tracker [flags] <pipeline>" by moving flags before the pipeline path (e.g.,
change "tracker run examples/ask_and_execute.dip --max-tokens 100000 --max-cost
500 --max-wall-time 30m" to "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m examples/ask_and_execute.dip"), ensuring references to the
flags (--max-tokens, --max-cost, --max-wall-time) and the pipeline file
(examples/ask_and_execute.dip) remain.
In `@tracker.go`:
- Around line 439-456: populateResultTokensAndCost currently re-computes cost
via tokenTracker and defaultModelResolver which misestimates when per-node/model
usage is recorded in engineResult. Instead, change populateResultTokensAndCost
to prefer engineResult.Usage when present: if engineResult != nil and
engineResult.Usage has provider entries, build Result.Cost from that usage
(aggregate TotalUSD and set ByProvider from engineResult.Usage) and skip the
tokenTracker cost path and defaultModelResolver; otherwise fall back to the
existing tokenTracker logic (keep result.TokensByProvider =
e.tokenTracker.AllProviderUsage() behavior). Use the existing CostReport struct
shape and fields (TotalUSD, ByProvider) when constructing the cost from
pipeline.EngineResult.Usage.
📒 Files selected for processing (29)
CHANGELOG.md, CLAUDE.md, README.md, agent/result.go, agent/session.go, agent/session_test.go, cmd/tracker/commands.go, cmd/tracker/diagnose.go, cmd/tracker/flags.go, cmd/tracker/main.go, cmd/tracker/run.go, cmd/tracker/summary.go, docs/superpowers/plans/2026-04-14-token-cost-governance.md, llm/token_tracker_cost.go, llm/token_tracker_cost_test.go, pipeline/budget.go, pipeline/budget_test.go, pipeline/engine.go, pipeline/engine_run.go, pipeline/engine_test.go, pipeline/events.go, pipeline/events_jsonl.go, pipeline/events_jsonl_test.go, pipeline/handlers/transcript.go, pipeline/handlers/transcript_test.go, pipeline/trace.go, pipeline/trace_test.go, tracker.go, tracker_test.go
```markdown
## [Unreleased]

### Added

- `Result.Cost` on the library API with per-provider rollup (`map[string]llm.ProviderCost`) and `TotalUSD`. Populated from the `llm.TokenTracker` middleware and priced via `llm.EstimateCost`. Closes #62.
- `pipeline.BudgetGuard` enforcing `MaxTotalTokens`, `MaxCostCents`, and `MaxWallTime` limits. Halts the run with `pipeline.OutcomeBudgetExceeded` when any dimension trips. Closes #17.
- New `tracker.Config.Budget` field (type `pipeline.BudgetLimits`) for library consumers.
- New CLI flags on `tracker run`: `--max-tokens`, `--max-cost` (cents), `--max-wall-time`.
- New pipeline events `cost_updated` (streaming per-node cost snapshots) and `budget_exceeded` (fired on halt). Both carry a `CostSnapshot` payload with `TotalTokens`, `TotalCostUSD`, `ProviderTotals`, and `WallElapsed`.
- `tracker diagnose` surfaces a "Budget halt detected" section when a run halts on budget.
- `UsageSummary.ProviderTotals` (per-provider token and cost rollup) on `pipeline.Trace.AggregateUsage()` output.

### Notes

- Reading budget limits from `.dip` workflow attrs is blocked on dippin-lang IR support; tracked in #67.
```
The Unreleased entry is still missing the Fixed bucket.
This PR includes behavior fixes as well as features, but the changelog only records Added items and introduces a custom Notes section. Please fold the #67 caveat into one of the standard headings and add the shipped fixes here. As per coding guidelines: keep CHANGELOG.md updated with every feature, fix, and breaking change in the same PR as the code change, and use Keep a Changelog format — group entries under Added, Changed, Fixed, Removed.
@clintecker, that's a reasonable distinction — intra-PR corrections to newly introduced code are not the same as regression fixes against previously-released behavior, so omitting a Fixed bucket makes sense here. I'll drop that concern.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: `tracker version` shows commit hash, build time, and which providers are configured. Uses Go VCS metadata for `go install` builds, GoReleaser ldflags for releases.
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Applies to CHANGELOG.md : Keep CHANGELOG.md updated with every feature, fix, and breaking change using Keep a Changelog format. Group entries under Added, Changed, Fixed, Removed. Update the changelog in the same PR as the code change.
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.
```go
fs.IntVar(&cfg.maxTokens, "max-tokens", 0, "Halt if total tokens across the run exceed this value (0 = no limit)")
fs.IntVar(&cfg.maxCostCents, "max-cost", 0, "Halt if total cost in cents exceeds this value (0 = no limit)")
fs.DurationVar(&cfg.maxWallTime, "max-wall-time", 0, "Halt if pipeline wall time exceeds this duration (0 = no limit)")
```
Reject negative budget limits during CLI parsing.
Line 135–137 introduces limits but allows negative values, which can lead to confusing behavior (e.g., immediate budget halt). Please validate all three flags as >= 0 in parseRunFlags.
🔧 Proposed fix

```diff
 func parseRunFlags(args []string, cfg runConfig) (runConfig, error) {
 @@
 	if err := validateBackend(cfg.backend); err != nil {
 		return cfg, err
 	}
+	if cfg.maxTokens < 0 {
+		return cfg, fmt.Errorf("invalid --max-tokens: must be >= 0")
+	}
+	if cfg.maxCostCents < 0 {
+		return cfg, fmt.Errorf("invalid --max-cost: must be >= 0")
+	}
+	if cfg.maxWallTime < 0 {
+		return cfg, fmt.Errorf("invalid --max-wall-time: must be >= 0")
+	}
 	return cfg, nil
 }
```
Fixed in a1939ab. parseRunFlags now validates non-negative values.
@clintecker, thanks for the fix! The validateBudgetLimits check in parseRunFlags correctly rejects negative values for all three budget flags with a clear error message.
> **Architecture:** Part 1 promotes `llm.TokenTracker` from a CLI-summary helper to a first-class library output by surfacing its per-provider totals on `Result` / `EngineResult`, pricing each provider via `llm.EstimateCost`, and emitting a new `EventCostUpdated` event after every trace entry so consumers see streaming updates. Part 2 adds a `BudgetGuard` evaluated inside the engine loop after each node's outcome is applied; on breach it writes a terminal trace entry, emits `EventBudgetExceeded`, and returns an `EngineResult` with status `OutcomeBudgetExceeded`. Part 3 wires graph attrs (`max_total_tokens`, `max_cost_cents`, `max_wall_time`) and CLI flags that override them.
Scope drift: this plan still schedules .dip budget attrs as if they shipped here.
The rest of this PR explicitly treats graph-level budget loading as blocked by #67, but this plan still assigns adapter pass-through and graph-attr fallback work. That will send the next worker after unsupported behavior instead of the intended follow-up. Based on learnings: reading budgets from `.dip` workflow attrs is blocked on dippin-lang IR support (issue #67).
Also applies to: 30-31, 773-927
Fair point — the plan is a historical artifact that predates the scope reduction. Tasks 6/7 were adjusted during execution. Adding a note to the plan doc would be revisionism; it serves as a record of the original intent. #67 tracks the follow-up.
@clintecker, that's a fair framing — keeping the plan as an unmodified record of original intent makes sense, and #67 is the right place to track the deferred .dip attrs work.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
```go
t.mu.RLock()
defer t.mu.RUnlock()
out := make(map[string]ProviderCost, len(t.usage))
for provider, usage := range t.usage {
	var model string
	if resolve != nil {
		model = resolve(provider)
	}
	out[provider] = ProviderCost{
		Usage: usage,
		Model: model,
		USD:   EstimateCost(model, usage),
	}
```
🧩 Analysis chain

Scripts executed against `2389-research/tracker` to verify the finding:

```shell
# Find and read the actual file to confirm the code snippet
find . -name "token_tracker_cost.go" -type f

# Read the relevant file section
if [ -f "./llm/token_tracker_cost.go" ]; then
  cat -n ./llm/token_tracker_cost.go | head -60
fi

# Search for the ModelResolver type definition
rg "type ModelResolver" --type go

# Search for CostByProvider calls to understand usage patterns
rg "CostByProvider" --type go -B 2 -A 2

# Check the TokenTracker type and its methods
rg "type TokenTracker" --type go -A 20
rg "func \(t \*TokenTracker\)" --type go

# Check what defaultModelResolver does, and look for ModelResolver
# implementations that might call back into TokenTracker
rg "defaultModelResolver" --type go -A 10
rg "ModelResolver" --type go -B 3 -A 5

# Check for concurrent writes to TokenTracker during CostByProvider calls
rg "AddUsage\|ProviderUsage\|TotalUsage" --type go -B 2 -A 2 | head -100

# Get full context of tracker.go usage around CostByProvider
rg -B 5 -A 5 "CostByProvider\(resolver\)" tracker.go

# Check the EstimateCost implementation for slow or blocking behavior
rg "func EstimateCost" --type go -A 15

# Check tests that exercise the tracker concurrently
rg "TestTokenTracker" --type go | head -20
cat -n ./llm/token_tracker.go | head -80
cat -n ./llm/token_tracker_test.go | grep -A 30 "TestTokenTrackerConcurrentSafe"

# Verify defaultModelResolver doesn't call back into TokenTracker or do blocking I/O
rg "Graph\|Attrs" tracker.go | head -20
```
Hold the read lock only while copying t.usage; release before calling resolve and EstimateCost.
While the current ModelResolver implementation (defaultModelResolver) is trivial and safe, holding a lock during external callbacks is not a defensive pattern. If the resolver logic ever becomes complex or calls back into the tracker, this could cause contention or re-entrancy issues. Snapshot the usage map under lock, then release before computing costs.
Suggested fix

```diff
 func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
-	t.mu.RLock()
-	defer t.mu.RUnlock()
-	out := make(map[string]ProviderCost, len(t.usage))
-	for provider, usage := range t.usage {
+	t.mu.RLock()
+	usageByProvider := make(map[string]Usage, len(t.usage))
+	for provider, usage := range t.usage {
+		usageByProvider[provider] = usage
+	}
+	t.mu.RUnlock()
+
+	out := make(map[string]ProviderCost, len(usageByProvider))
+	for provider, usage := range usageByProvider {
 		var model string
 		if resolve != nil {
 			model = resolve(provider)
 		}
 		out[provider] = ProviderCost{
 			Usage: usage,
 			Model: model,
 			USD:   EstimateCost(model, usage),
 		}
 	}
 	return out
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
	t.mu.RLock()
	usageByProvider := make(map[string]Usage, len(t.usage))
	for provider, usage := range t.usage {
		usageByProvider[provider] = usage
	}
	t.mu.RUnlock()

	out := make(map[string]ProviderCost, len(usageByProvider))
	for provider, usage := range usageByProvider {
		var model string
		if resolve != nil {
			model = resolve(provider)
		}
		out[provider] = ProviderCost{
			Usage: usage,
			Model: model,
			USD:   EstimateCost(model, usage),
		}
	}
	return out
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@llm/token_tracker_cost.go` around lines 21 - 33, The read lock is held while
calling external callbacks (resolve and EstimateCost); instead snapshot t.usage
into a local map under t.mu.RLock()/RUnlock(), then release the lock and iterate
the snapshot to build the out map, calling resolve(provider) and
EstimateCost(model, usage) outside the lock; update references to t.mu, t.usage,
resolve, ProviderCost and EstimateCost accordingly so only the minimal copy is
done while locked and all resolver/cost work happens after releasing the lock.
Acknowledged — the implementation matches the spec and all tests pass.
@clintecker, understood! The current implementation is straightforward and the lock-scope concern is preventive rather than an immediate bug. I'll defer to your judgment here.
```go
func (g *BudgetGuard) Check(usage *UsageSummary, started time.Time) BudgetBreach {
	if g == nil {
		return BudgetBreach{Kind: BudgetOK}
	}
	if breach := g.checkUsage(usage); breach.Kind != BudgetOK {
		return breach
	}
	if g.limits.MaxWallTime > 0 && time.Since(started) > g.limits.MaxWallTime {
		return BudgetBreach{Kind: BudgetWallTime, Message: "max_wall_time exceeded"}
	}
	return BudgetBreach{Kind: BudgetOK}
}

// checkUsage evaluates token and cost limits against a usage snapshot.
// Returns BudgetOK when usage is nil or no limit is breached.
func (g *BudgetGuard) checkUsage(usage *UsageSummary) BudgetBreach {
	if usage == nil {
		return BudgetBreach{Kind: BudgetOK}
	}
	if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
		return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
	}
	if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
		return BudgetBreach{Kind: BudgetCost, Message: "max_cost_cents exceeded"}
	}
	return BudgetBreach{Kind: BudgetOK}
}
```
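The ceilings above are inclusive (exact equality is not a breach) and USD is rounded to cents before comparing. A minimal standalone sketch of that boundary semantics — the `limits` type and `breached` helper here are illustrative, not the library's actual API:

```go
package main

import (
	"fmt"
	"math"
)

// limits mirrors the token/cost ceilings described above (hypothetical names).
type limits struct {
	maxTotalTokens int
	maxCostCents   int
}

// breached reports which dimensions exceed their ceilings. Ceilings are
// inclusive: usage exactly at the limit does not trip.
func breached(l limits, totalTokens int, totalCostUSD float64) []string {
	var hit []string
	if l.maxTotalTokens > 0 && totalTokens > l.maxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	// Round dollars to whole cents before comparing, as checkUsage does.
	if l.maxCostCents > 0 && int(math.Round(totalCostUSD*100)) > l.maxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	return hit
}

func main() {
	l := limits{maxTotalTokens: 1000, maxCostCents: 500}
	fmt.Println(breached(l, 1000, 5.00)) // exactly at both ceilings: no breach
	fmt.Println(breached(l, 1001, 5.01)) // one past each ceiling: both trip
}
```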
Only the first exceeded budget dimension is ever surfaced.
Check returns a single BudgetBreach, and checkUsage stops at the first match. In practice BudgetLimitsHit can never contain more than one entry, even when a run exceeds tokens and cost (or cost and wall time) at the same halt point. That loses part of the halt reason the new API is meant to expose.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/budget.go` around lines 73 - 99, The current logic in
BudgetGuard.Check and checkUsage only returns the first BudgetBreach it
encounters, so multiple simultaneous limits (tokens, cost, wall time) are never
reported; update these functions to evaluate all limit checks and accumulate
every violated dimension into the returned breach (e.g., populate
BudgetBreach.LimitsHit or create a slice of breaches) instead of early-returning
on the first match—specifically, change checkUsage (referencing
g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.
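For illustration only, the collect-all shape the prompt describes could be sketched as follows. All names here are hypothetical; the shipped code deliberately short-circuits on the first breach instead:

```go
package main

import "fmt"

// budgetLimits and usageSnapshot are illustrative stand-ins for the real types.
type budgetLimits struct {
	maxTotalTokens int
	maxCostCents   int
	maxWallSeconds int
}

type usageSnapshot struct {
	totalTokens int
	costCents   int
	wallSeconds int
}

// limitsHit accumulates every breached dimension rather than returning on the
// first match, so a simultaneous token+cost overrun reports both.
func limitsHit(l budgetLimits, u usageSnapshot) []string {
	var hit []string
	if l.maxTotalTokens > 0 && u.totalTokens > l.maxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	if l.maxCostCents > 0 && u.costCents > l.maxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	if l.maxWallSeconds > 0 && u.wallSeconds > l.maxWallSeconds {
		hit = append(hit, "max_wall_time")
	}
	return hit
}

func main() {
	l := budgetLimits{maxTotalTokens: 100, maxCostCents: 50, maxWallSeconds: 60}
	fmt.Println(limitsHit(l, usageSnapshot{totalTokens: 150, costCents: 75, wallSeconds: 10}))
	// [max_total_tokens max_cost_cents]
}
```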
By design — Check short-circuits on the first breached dimension because the run halts immediately. Surfacing all breached dimensions would require a collect-all-then-halt pattern, which adds complexity for a scenario that rarely matters in practice (the user raises the limit that fired). Happy to revisit if users report confusion.
@clintecker, understood — that's a reasonable trade-off. Since the pipeline halts at the first breach, collecting additional violated dimensions at that same instant would require a fundamentally different control flow with no practical benefit until a user actually hits it. I'll note this for future reviews.
✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
```go
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return *lr
}
```
Budget ceilings are bypassed on retry and goal-gate redirect paths.
This check only runs on the normal edge-advance path. The retry flows in handleRetryWithinBudget / handleRetryExhausted and the retry/fallback branches at the top of handleExitNode also append trace usage and continue, but they never call emitCostUpdate or checkBudgetAfterEmit. A node that keeps returning OutcomeRetry can therefore spend past MaxTotalTokens / MaxCostCents before the breaker ever trips.
💸 Suggested fix

```diff
 func (e *Engine) handleRetryWithinBudget(ctx context.Context, s *runState, currentNodeID string, execNode *Node, traceEntry *TraceEntry) (string, bool, *EngineResult, error) {
 	...
 	traceEntry.EdgeTo = target
 	s.trace.AddEntry(*traceEntry)
+	e.emitCostUpdate(s)
+	if lr := e.checkBudgetAfterEmit(s); lr != nil {
+		return "", false, lr.result, lr.err
+	}
 	e.clearDownstream(target, s.cp)
 	...
 }
```

Apply the same pattern to the fallback / redirect branches that add a trace entry and continue execution.
As per coding guidelines "Pipeline engine enforces optional cost ceilings via pipeline.BudgetGuard, evaluated between nodes after emitCostUpdate (v0.17.0+)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/engine.go` around lines 310 - 313, The retry and fallback/redirect
paths (specifically in handleRetryWithinBudget, handleRetryExhausted, and the
retry/fallback branches at the top of handleExitNode) append trace usage and
continue without calling emitCostUpdate and checkBudgetAfterEmit, letting
retries bypass BudgetGuard ceilings (MaxTotalTokens/MaxCostCents); fix by
applying the same pattern used on the normal edge-advance path: after appending
trace usage in those branches call e.emitCostUpdate(s) and then if lr :=
e.checkBudgetAfterEmit(s); lr != nil { return *lr } so the BudgetGuard
enforcement runs consistently across retry/fallback/redirect flows.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
@clintecker, confirmed in a1939ab — the fix looks correct:
handleRetryWithinBudget:emitCostUpdate+checkBudgetAfterEmitare now called immediately afters.trace.AddEntry, matching the pattern on the normal advance path. Retries can no longer silently overspend pastMaxTotalTokens/MaxCostCents.haltForBudget:saveCheckpointis called first, so a budget-halted run can safely resume from its last completed node without re-executing already-finished work (relevant given the global restart budget described in the checkpoint docs).
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Checkpoints store completed nodes and context snapshots with deterministic edge selection per node for replay. The restart counter is global across the entire run — a fix loop on milestone 1 consumes restart budget that milestone 10 needs.
```go
func (e *Engine) populateResultTokensAndCost(result *Result, engineResult *pipeline.EngineResult) {
	if e.tokenTracker == nil {
		return
	}
	result.TokensByProvider = e.tokenTracker.AllProviderUsage()
	resolver := e.defaultModelResolver()
	byProvider := e.tokenTracker.CostByProvider(resolver)
	if len(byProvider) > 0 {
		total := 0.0
		for _, pc := range byProvider {
			total += pc.USD
		}
		result.Cost = &CostReport{
			TotalUSD:   total,
			ByProvider: byProvider,
		}
	}
}
```
Result.Cost is computed from a model guess instead of the executed usage.
This re-estimates spend from TokenTracker using one graph-level llm_model for every provider. That makes Result.Cost wrong or empty whenever the run uses per-node model overrides, multiple models on the same provider, or a backend/path that records usage in EngineResult.Usage without going through the middleware tracker. The trace already has aggregate provider totals with cost; Result.Cost should prefer engineResult.Usage when it's available.
Based on learnings Token usage flows through llm.Usage → agent.SessionResult.Usage → pipeline.SessionStats → EngineResult.Usage, and CLI summary treats llm.TokenTracker and EngineResult.Usage as independent data sources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracker.go` around lines 439 - 456, populateResultTokensAndCost currently
re-computes cost via tokenTracker and defaultModelResolver which misestimates
when per-node/model usage is recorded in engineResult. Instead, change
populateResultTokensAndCost to prefer engineResult.Usage when present: if
engineResult != nil and engineResult.Usage has provider entries, build
Result.Cost from that usage (aggregate TotalUSD and set ByProvider from
engineResult.Usage) and skip the tokenTracker cost path and
defaultModelResolver; otherwise fall back to the existing tokenTracker logic
(keep result.TokensByProvider = e.tokenTracker.AllProviderUsage() behavior). Use
the existing CostReport struct shape and fields (TotalUSD, ByProvider) when
constructing the cost from pipeline.EngineResult.Usage.
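The preference ordering the prompt describes can be sketched as follows. The types and the `pickCost` helper are illustrative only, not the real `CostReport`/`EngineResult` API:

```go
package main

import "fmt"

// costReport is a stand-in for the library's cost breakdown type.
type costReport struct {
	totalUSD   float64
	byProvider map[string]float64
}

// pickCost prefers the authoritative per-provider totals recorded on the
// engine result and only falls back to the middleware tracker's estimate
// when the engine recorded nothing.
func pickCost(fromEngine, fromTracker map[string]float64) costReport {
	src := fromTracker
	if len(fromEngine) > 0 {
		src = fromEngine
	}
	r := costReport{byProvider: src}
	for _, usd := range src {
		r.totalUSD += usd
	}
	return r
}

func main() {
	engine := map[string]float64{"anthropic": 0.42, "openai": 0.10}
	tracker := map[string]float64{"anthropic": 0.40}
	fmt.Printf("%.2f\n", pickCost(engine, tracker).totalUSD) // 0.52 — engine usage wins
	fmt.Printf("%.2f\n", pickCost(nil, tracker).totalUSD)    // 0.40 — tracker fallback
}
```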
Known limitation, documented in code comments on defaultModelResolver. Per-node model tracking requires plumbing through the trace that doesn't exist yet. Follow-up — the existing design gives accurate numbers for single-model pipelines and zero (not wrong) for unknown models.
@clintecker, understood — thanks for the context. The "zero for unknown models rather than a wrong estimate" contract is a reasonable defensive choice, and the inline doc on defaultModelResolver already captures the per-node limitation. I'll note this for future reviews.
✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: tracker.go:439-456
Timestamp: 2026-04-15T16:42:00.353Z
Learning: In `tracker.go`, `defaultModelResolver` intentionally uses a single graph-level `llm_model` attr for all providers. It returns zero USD (not an incorrect estimate) for unknown/multi-model scenarios. Per-node model tracking in `Result.Cost` is a known future work item requiring trace plumbing that does not yet exist. Do not flag this as a bug.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
Pull request overview
Implements token/cost governance across the tracker library + pipeline engine by surfacing per-provider cost in library results, streaming cost snapshots as events, and enforcing optional token/cost/wall-time ceilings that halt execution.
Changes:

- Exposes `Result.Cost` (total USD + per-provider breakdown) and propagates budget halt reasons back to library callers.
- Adds `pipeline.BudgetGuard` + `OutcomeBudgetExceeded`, plus `cost_updated`/`budget_exceeded` events carrying `CostSnapshot` and JSONL logging support.
- Wires budget limits through CLI flags and surfaces budget halts in CLI summary and `tracker diagnose`.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `tracker.go` | Adds `Config.Budget`, `Result.Cost`, and populates tokens/cost + halt reasons from engine result. |
| `tracker_test.go` | Adds integration tests for `Result.Cost` population and config-driven budget halts. |
| `pipeline/budget.go` / `pipeline/budget_test.go` | Introduces `BudgetLimits`/`BudgetGuard` with unit tests for ceilings and boundary rounding. |
| `pipeline/engine.go` | Adds budget guard plumbing, `OutcomeBudgetExceeded`, `BudgetLimitsHit`, and `Engine.Graph()` accessor. |
| `pipeline/engine_run.go` | Emits streaming cost snapshots and halts runs on budget breach (emits `EventBudgetExceeded`). |
| `pipeline/engine_test.go` | Adds integration tests for streaming cost events and budget halting behavior. |
| `pipeline/events.go` | Adds new event types and `CostSnapshot` payload on `PipelineEvent`. |
| `pipeline/events_jsonl.go` / `pipeline/events_jsonl_test.go` | Logs `CostSnapshot` fields into `activity.jsonl` and tests the JSONL entry build. |
| `pipeline/trace.go` / `pipeline/trace_test.go` | Adds `SessionStats.Provider` and aggregates per-provider totals into `UsageSummary.ProviderTotals`. |
| `pipeline/handlers/transcript.go` / `pipeline/handlers/transcript_test.go` | Populates provider into session stats emitted into the trace. |
| `llm/token_tracker_cost.go` / `llm/token_tracker_cost_test.go` | Adds per-provider cost rollup on `TokenTracker` via model resolver + `EstimateCost`. |
| `cmd/tracker/flags.go` / `cmd/tracker/main.go` | Adds `--max-tokens`, `--max-cost`, `--max-wall-time` flags to run config. |
| `cmd/tracker/commands.go` | Plumbs parsed CLI budget limits into the run execution path. |
| `cmd/tracker/run.go` | Attaches `BudgetGuard` to both non-TUI and TUI engine construction via active budget limits. |
| `cmd/tracker/summary.go` | Prints a budget-halt banner when `OutcomeBudgetExceeded` is returned. |
| `cmd/tracker/diagnose.go` | Detects `budget_exceeded` in `activity.jsonl` and prints a budget-halt section. |
| `agent/result.go` / `agent/session.go` / `agent/session_test.go` | Adds provider field to `SessionResult` and ensures it's populated. |
| `README.md` | Documents cost governance usage for library, CLI, and event streaming consumers. |
| `CHANGELOG.md` | Adds unreleased entries for cost exposure, budgets, events, CLI flags, and diagnose behavior. |
| `CLAUDE.md` | Documents the new cost governance flow and where budget enforcement happens. |
| `docs/superpowers/plans/2026-04-14-token-cost-governance.md` | Adds an implementation plan document for the feature set. |
```go
func (e *Engine) emitCostUpdate(s *runState) {
	summary := s.trace.AggregateUsage()
	if summary == nil {
		return
	}
	e.emit(PipelineEvent{
		Type:      EventCostUpdated,
		Timestamp: time.Now(),
		RunID:     s.runID,
		Cost: &CostSnapshot{
			TotalTokens:    summary.TotalTokens,
			TotalCostUSD:   summary.TotalCostUSD,
			ProviderTotals: summary.ProviderTotals,
			WallElapsed:    time.Since(s.trace.StartTime),
		},
	})
```
emitCostUpdate() and checkBudgetAfterEmit() both call trace.AggregateUsage(), which iterates over all trace entries. Since these are invoked after each completed node, this makes run-time cost/budget checking O(n^2) in number of nodes, and also duplicates work (AggregateUsage is computed at least twice per node, plus again in haltForBudget). Consider computing the UsageSummary once after AddEntry and reusing it for both the emitted CostSnapshot and the BudgetGuard check, or maintaining an incremental aggregate on runState/Trace that updates as entries are appended.
Valid perf concern for very long pipelines. Maintaining a running UsageSummary on runState (updated incrementally per node) would eliminate the repeated scan. Not blocking for this PR — typical pipelines are 10-30 nodes. Happy to optimize in a follow-up if profiling shows it matters.
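The incremental rollup discussed above could look like the following sketch. Types and method names are illustrative stand-ins, not the engine's actual `runState`/`Trace`:

```go
package main

import "fmt"

// usageSummary is a stand-in for the aggregate the engine would cache.
type usageSummary struct {
	totalTokens int
	totalUSD    float64
	byProvider  map[string]int
}

type runningTrace struct {
	summary usageSummary
}

type entry struct {
	provider string
	tokens   int
	usd      float64
}

// addEntry updates the aggregate in O(1) per entry, so per-node cost events
// and budget checks can read the cached summary instead of rescanning every
// trace entry (which is O(n^2) over the whole run).
func (t *runningTrace) addEntry(e entry) {
	if t.summary.byProvider == nil {
		t.summary.byProvider = map[string]int{}
	}
	t.summary.totalTokens += e.tokens
	t.summary.totalUSD += e.usd
	t.summary.byProvider[e.provider] += e.tokens
}

func main() {
	var tr runningTrace
	tr.addEntry(entry{"anthropic", 1200, 0.03})
	tr.addEntry(entry{"openai", 800, 0.01})
	fmt.Println(tr.summary.totalTokens) // 2000
}
```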
```go
// Graph returns the graph this engine executes. Used by library callers
// that need to inspect graph attributes after construction.
func (e *Engine) Graph() *Graph { return e.graph }
```
Graph() exposes the engine's internal *Graph pointer directly. Since Graph (and its Attrs/Nodes/Edges maps) is mutable, external callers could mutate it during/after construction and potentially introduce data races or break engine invariants. Consider returning a read-only snapshot (e.g., a copy of Attrs) or clearly preventing mutation by returning a deep copy / wrapper type, especially since Engine.Run is documented as safe for concurrent use in other places.
The graph is immutable by convention after construction — the engine and all handlers treat it as read-only. Adding a copy or interface wrapper adds allocation overhead on every call for no practical safety gain. If we ever need to enforce immutability, a ReadOnlyGraph wrapper is the right approach, but it's not warranted today.
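If enforced immutability were ever needed, the read-only-snapshot alternative mentioned above might look like this sketch (hypothetical names, not the engine's real API):

```go
package main

import "fmt"

// graph and engine are illustrative stand-ins for the pipeline types.
type graph struct {
	attrs map[string]string
}

type engine struct {
	graph *graph
}

// GraphAttrs returns a defensive copy of the attribute map; mutating the
// returned map never touches the engine's internal state.
func (e *engine) GraphAttrs() map[string]string {
	out := make(map[string]string, len(e.graph.attrs))
	for k, v := range e.graph.attrs {
		out[k] = v
	}
	return out
}

func main() {
	e := &engine{graph: &graph{attrs: map[string]string{"llm_model": "claude-sonnet"}}}
	snap := e.GraphAttrs()
	snap["llm_model"] = "mutated" // only the copy changes
	fmt.Println(e.graph.attrs["llm_model"]) // claude-sonnet
}
```

The trade-off is one allocation plus a copy per call, which is the overhead the maintainer declined to pay absent a demonstrated need.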
…agnose dimension

- BudgetGuard now fires after retry attempts, not just normal advance
- haltForBudget saves checkpoint before returning so restarts are safe
- diagnose extracts breach.Message from JSONL for specific guidance
- CLI rejects negative budget limits
- README uses correct CLI syntax (no 'run' subcommand)

Refs #17 #62
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
README.md (1)

397-404: ⚠️ Potential issue | 🟡 Minor — Fix CLI example ordering to match `tracker [flags] <pipeline>`.

On Line 400, the pipeline path is placed before flags, but the documented command shape (Line 421) is flags first. This is the same copy-paste trap previously raised.

Suggested doc fix

```diff
-tracker examples/ask_and_execute.dip \
-  --max-tokens 100000 \
-  --max-cost 500 \
-  --max-wall-time 30m
+tracker --max-tokens 100000 \
+  --max-cost 500 \
+  --max-wall-time 30m \
+  examples/ask_and_execute.dip
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 397 - 404, The CLI example places the pipeline path
before flags which contradicts the documented usage "tracker [flags]
<pipeline>"; update the example so flags come before the pipeline path (e.g.,
"tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m
examples/ask_and_execute.dip") so the command shape for the `tracker` CLI and
the example ordering are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 399-404: The fenced code block in README.md for the CLI snippet
(starting with "tracker examples/ask_and_execute.dip") lacks a language tag;
update the opening fence to include "bash" (i.e., ```bash) so markdownlint
recognizes it as a shell snippet and renders/validates correctly; optionally
also adopt the suggested formatting order in the comment but the key change is
adding the "bash" language identifier to the existing fenced block.
---
Duplicate comments:
In `@README.md`:
- Around line 397-404: The CLI example places the pipeline path before flags
which contradicts the documented usage "tracker [flags] <pipeline>"; update the
example so flags come before the pipeline path (e.g., "tracker --max-tokens
100000 --max-cost 500 --max-wall-time 30m examples/ask_and_execute.dip") so the
command shape for the `tracker` CLI and the example ordering are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 79f30480-06f9-4e10-b56d-1fad03edad7b
📒 Files selected for processing (5)
- README.md
- cmd/tracker/diagnose.go
- cmd/tracker/flags.go
- pipeline/engine_run.go
- pipeline/engine_test.go
✅ Files skipped from review due to trivial changes (1)
- pipeline/engine_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pipeline/engine_run.go
- cmd/tracker/diagnose.go
Flags before pipeline path to match existing examples. Refs #17
♻️ Duplicate comments (1)

README.md (1)

399-402: ⚠️ Potential issue | 🟡 Minor — Add the `bash` language identifier to the fenced code block.

This was already flagged in a previous review but remains unfixed. The fenced code block needs a language specifier for proper syntax highlighting and markdownlint compliance.

📝 Suggested fix

````diff
-```
+```bash
 tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m \
   examples/ask_and_execute.dip
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 399 - 402, The fenced code block showing the
tracker command lacks a language specifier; update that Markdown block (the
triple-backtick block containing "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m \ examples/ask_and_execute.dip") to include the "bash"
language identifier so it reads ```bash at the start of the block for proper
syntax highlighting and markdownlint compliance.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:

In `@README.md`:
- Around line 399-402: The fenced code block showing the tracker command lacks a
language specifier; update that Markdown block (the triple-backtick block
containing "tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m \
examples/ask_and_execute.dip") to include the "bash" language identifier so it
reads ```bash at the start of the block for proper syntax highlighting and
markdownlint compliance.

ℹ️ Review info

⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc03ed77-660c-4601-8cea-c129e708fdd5

📥 Commits
Reviewing files that changed from the base of the PR and between a1939ab3203ab02ac3161ab321c6b39e1fdc99d5 and 6345d72d8773c39e7b155aabd417a18c2e9ffa03.

📒 Files selected for processing (1)
- README.md
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
```go
	},
})
if result.Status == pipeline.OutcomeBudgetExceeded {
	log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
```
The README code sample uses %s to print result.Cost.LimitsHit, but LimitsHit is a []string in the API, so the snippet prints Go's bracketed slice form ([max_total_tokens max_cost_cents]) rather than a readable list. Consider using %v (the conventional verb for arbitrary values) or strings.Join(result.Cost.LimitsHit, ", ") for cleaner log output.
```diff
-	log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
+	log.Printf("halt: %v, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
```
```go
traceEntry.EdgeTo = target
s.trace.AddEntry(*traceEntry)
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return "", false, lr.result, nil
}
```
Budget enforcement is applied on the retry path in handleRetryWithinBudget (emitCostUpdate + checkBudgetAfterEmit), but the retry-exhausted path (handleRetryExhausted) still appends the retry TraceEntry and then continues to fallback/fail without running the BudgetGuard. This can let a run continue (via fallback_retry_target) even after the last retry attempt pushed totals over MaxTotalTokens/MaxCostCents/MaxWallTime, and it will also skip emitting a final cost_updated snapshot for that attempt. Consider invoking emitCostUpdate(s) and checkBudgetAfterEmit(s) immediately after s.trace.AddEntry(*traceEntry) in handleRetryExhausted (both fallback and no-fallback branches), returning OutcomeBudgetExceeded when tripped.
```go
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if len(result.BudgetLimitsHit) > 0 {
		fmt.Printf("  reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 {
			totalToks := total.InputTokens + total.OutputTokens
			fmt.Printf("  spent: %s tokens", formatNumber(totalToks))
			if total.EstimatedCost > 0 {
				fmt.Printf(", $%.4f", total.EstimatedCost)
			}
			fmt.Println()
		}
```
printBudgetHaltBanner calculates "spent" from the passed TokenTracker only. This will under-report (or omit) spend when TokenTracker isn't populated (e.g., claude-code/acp backends or any case where usage is only available via trace stats), even though EngineResult.Usage/Trace.AggregateUsage contains the authoritative totals used by BudgetGuard. Consider rendering tokens/cost from result.Usage (or result.Trace.AggregateUsage()) as the primary source, and only using TokenTracker as a fallback.
Current:

```go
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if len(result.BudgetLimitsHit) > 0 {
		fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 {
			totalToks := total.InputTokens + total.OutputTokens
			fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
			if total.EstimatedCost > 0 {
				fmt.Printf(", $%.4f", total.EstimatedCost)
			}
			fmt.Println()
		}
```

Suggested change:

```go
// budgetUsageForDisplay returns the best available aggregate usage for summary output.
// Prefer usage captured on the EngineResult/trace, and only fall back to TokenTracker.
func budgetUsageForDisplay(result *pipeline.EngineResult, tracker *llm.TokenTracker) (llm.TokenUsage, bool) {
	if result != nil {
		total := result.Usage
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
		total = result.Trace.AggregateUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
	}
	return llm.TokenUsage{}, false
}

// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if result != nil && len(result.BudgetLimitsHit) > 0 {
		fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if total, ok := budgetUsageForDisplay(result, tracker); ok {
		totalToks := total.InputTokens + total.OutputTokens
		fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
		if total.EstimatedCost > 0 {
			fmt.Printf(", $%.4f", total.EstimatedCost)
		}
		fmt.Println()
```
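As a standalone illustration of the source-preference pattern discussed above, here is a minimal, runnable sketch. The `Usage` type and its field names are simplified stand-ins for illustration, not the real `llm.TokenUsage`:

```go
package main

import "fmt"

// Usage is a simplified stand-in for an aggregate token/cost record.
type Usage struct {
	InputTokens   int
	OutputTokens  int
	EstimatedCost float64
}

// nonZero reports whether any usage was recorded at all.
func (u Usage) nonZero() bool {
	return u.InputTokens > 0 || u.OutputTokens > 0 || u.EstimatedCost > 0
}

// firstNonZero returns the first candidate with any recorded usage,
// mirroring the result -> trace -> tracker preference order.
func firstNonZero(candidates ...Usage) (Usage, bool) {
	for _, c := range candidates {
		if c.nonZero() {
			return c, true
		}
	}
	return Usage{}, false
}

func main() {
	resultUsage := Usage{} // empty: e.g., a backend that doesn't populate the tracker
	traceUsage := Usage{InputTokens: 1200, OutputTokens: 300, EstimatedCost: 0.0421}
	trackerUsage := Usage{InputTokens: 50, OutputTokens: 10} // partial; lowest priority

	if total, ok := firstNonZero(resultUsage, traceUsage, trackerUsage); ok {
		// prints: spent: 1500 tokens, $0.0421
		fmt.Printf("spent: %d tokens, $%.4f\n", total.InputTokens+total.OutputTokens, total.EstimatedCost)
	}
}
```

The key design point is that the authoritative aggregate (here `traceUsage`) wins even when a lower-priority source also has data.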
```go
	fmt.Fprintln(w, " suggestion: raise the relevant --max-tokens, --max-cost, or --max-wall-time flag,")
	fmt.Fprintln(w, " or remove the Config.Budget value in your pipeline configuration")
```
The budget-halt suggestion text is a bit misleading for CLI users and graph-attr based budgets: it says to "remove the Config.Budget value", but budget limits can also come from --max-* flags (and may later come from workflow attrs). Consider rephrasing to something like "remove or raise the configured budget limits" and mention the relevant source(s) (flags vs library Config).
Suggested change:

```go
	fmt.Fprintln(w, " suggestion: raise or remove the configured budget limits")
	fmt.Fprintln(w, " (--max-tokens, --max-cost, --max-wall-time, or pipeline/library budget config)")
```
Summary
- `Result.Cost` in the library API, populated from the `llm.TokenTracker` middleware via `llm.EstimateCost` (closes feat: ensure full token/cost data exposed from library #62).
- `MaxTotalTokens`/`MaxCostCents`/`MaxWallTime` ceilings via a new `pipeline.BudgetGuard`. Tripping any limit halts the run with `pipeline.OutcomeBudgetExceeded` and populates `EngineResult.BudgetLimitsHit` (closes fix(pipeline): no pipeline-level token budget or cost ceiling #17).
- `EventCostUpdated` events (and `EventBudgetExceeded` on halt), each carrying a `CostSnapshot` payload with total tokens, dollar cost, per-provider totals, and wall elapsed.
- `tracker` wiring: `Config.Budget` library field + `--max-tokens`, `--max-cost`, `--max-wall-time` CLI flags.
- `tracker diagnose` and `activity.jsonl` now carry cost/halt data.

Scope note
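The inclusive-ceiling semantics described in the commit messages ("exact equality is not a breach") can be sketched as follows. `Limits` and `Snapshot` are hypothetical stand-ins for `pipeline.BudgetLimits` and the `CostSnapshot` payload, with assumed field names, not the real types:

```go
package main

import (
	"fmt"
	"time"
)

// Limits holds the configured ceilings; a zero value means "no limit".
type Limits struct {
	MaxTotalTokens int
	MaxCostCents   int
	MaxWallTime    time.Duration
}

// Snapshot is a point-in-time rollup of what the run has spent so far.
type Snapshot struct {
	TotalTokens int
	CostCents   int
	WallElapsed time.Duration
}

// breaches returns the names of limits that were exceeded. Ceilings are
// inclusive: spending exactly the limit is not a breach, only strictly
// exceeding it is.
func breaches(l Limits, s Snapshot) []string {
	var hit []string
	if l.MaxTotalTokens > 0 && s.TotalTokens > l.MaxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	if l.MaxCostCents > 0 && s.CostCents > l.MaxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	if l.MaxWallTime > 0 && s.WallElapsed > l.MaxWallTime {
		hit = append(hit, "max_wall_time")
	}
	return hit
}

func main() {
	limits := Limits{MaxTotalTokens: 1000, MaxCostCents: 50}
	fmt.Println(breaches(limits, Snapshot{TotalTokens: 1000, CostCents: 50})) // exactly at limit: []
	fmt.Println(breaches(limits, Snapshot{TotalTokens: 1001, CostCents: 49})) // [max_total_tokens]
}
```

A guard like this would run after each terminal-node outcome, with the returned names feeding something like `EngineResult.BudgetLimitsHit`.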
Reading budget limits directly from `.dip` workflow attrs is blocked on dippin-lang IR support. Tracked as a follow-up in #67.

Test plan
- `go build ./...` — clean
- `go test ./... -short` — all 15 packages pass
- `dippin doctor` — A/100 on `ask_and_execute.dip`, `build_product.dip`, `build_product_with_superspec.dip`
- `BudgetGuard`: token / cost / wall-time / nil-guard / accumulated-float-rounding
- `TestEngine_HaltsOnBudgetBreach` (5-node graph halts after node 3)
- `TestRun_BudgetHalt_FromConfig` (library halt via `Config.Budget`)
- `TestEngine_EmitsCostUpdatedAfterEachNode` (3-node graph emits ≥3 events, monotonic totals)
- `TestRun_PopulatesResultCost` (per-provider cost in `Result.Cost.ByProvider`)
- `TestBuildLogEntry_CostSnapshot` / `TestBuildLogEntry_NilCost`
- `tracker run examples/ask_and_execute.dip --max-tokens 1` halts with the budget banner
- `tracker diagnose` after a halted run surfaces the budget section

Closes #62
Closes #17
🤖 Generated with Claude Code