Skip to content

feat: token & cost governance (library exposure + pipeline ceilings)#68

Merged
clintecker merged 18 commits intomainfrom
feat/token-cost-governance
Apr 15, 2026
Merged

feat: token & cost governance (library exposure + pipeline ceilings)#68
clintecker merged 18 commits intomainfrom
feat/token-cost-governance

Conversation

@clintecker
Copy link
Copy Markdown
Collaborator

@clintecker clintecker commented Apr 14, 2026

Summary

  • Expose per-provider token and dollar cost on Result.Cost in the library API, populated from the llm.TokenTracker middleware via llm.EstimateCost (closes feat: ensure full token/cost data exposed from library #62).
  • Enforce optional MaxTotalTokens / MaxCostCents / MaxWallTime ceilings via a new pipeline.BudgetGuard. Tripping any limit halts the run with pipeline.OutcomeBudgetExceeded and populates EngineResult.BudgetLimitsHit (closes fix(pipeline): no pipeline-level token budget or cost ceiling #17).
  • Stream per-node cost updates via new EventCostUpdated events (and EventBudgetExceeded on halt), each carrying a CostSnapshot payload with total tokens, dollar cost, per-provider totals, and wall elapsed.
  • New tracker.Config.Budget library field + --max-tokens, --max-cost, --max-wall-time CLI flags.
  • tracker diagnose and activity.jsonl now carry cost/halt data.

Scope note

Reading budget limits directly from .dip workflow attrs is blocked on dippin-lang IR support. Tracked as a follow-up in #67.

Test plan

  • go build ./... — clean
  • go test ./... -short — all 15 packages pass
  • dippin doctor — A/100 on ask_and_execute.dip, build_product.dip, build_product_with_superspec.dip
  • Unit: BudgetGuard token / cost / wall-time / nil-guard / accumulated-float-rounding
  • Integration: TestEngine_HaltsOnBudgetBreach (5-node graph halts after node 3)
  • Integration: TestRun_BudgetHalt_FromConfig (library halt via Config.Budget)
  • Integration: TestEngine_EmitsCostUpdatedAfterEachNode (3-node graph emits ≥3 events, monotonic totals)
  • Integration: TestRun_PopulatesResultCost (per-provider cost in Result.Cost.ByProvider)
  • JSONL: TestBuildLogEntry_CostSnapshot / TestBuildLogEntry_NilCost
  • Manual: tracker run examples/ask_and_execute.dip --max-tokens 1 halts with budget banner
  • Manual: tracker diagnose after a halted run surfaces the budget section

Closes #62
Closes #17

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Run results now include cost reports with per‑provider USD breakdowns and total spend; session/provider info is surfaced in traces.
    • Configurable budget enforcement (max tokens, max cost, max wall‑time) via config or CLI flags (--max-tokens, --max-cost, --max-wall-time).
    • Pipelines emit streaming cost snapshots and halt when budgets are exceeded; halted runs show a clear "HALTED: budget exceeded" banner and reason.
  • Documentation

    • Added cost governance guide and implementation plan covering visibility, enforcement, diagnostics, and CLI usage.
  • Tests

    • Added unit and integration tests for cost rollups, events, and budget-halt behavior.

clintecker and others added 15 commits April 14, 2026 14:46
Adds CostByProvider and TotalCostUSD methods that map accumulated
per-provider Usage to dollar cost via a caller-supplied ModelResolver
callback. Enables the library API to expose cost breakdowns without
the llm package knowing how pipelines resolve models.

Refs #62
Adds SessionStats.Provider and UsageSummary.ProviderTotals so
consumers (library callers, TUI, CLI summary) can see per-provider
token and cost breakdowns from the trace alone, without needing
to cross-reference the llm.TokenTracker middleware.

Plumbing: SessionResult.Provider populated from SessionConfig.Provider
in session.Run(), copied to SessionStats by buildSessionStats().

Refs #62

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guards the wiring between SessionConfig.Provider and the downstream
per-provider rollup in pipeline/trace.go.

Follow-up to 730eb50. Refs #62
Adds Result.Cost populated from the llm.TokenTracker middleware via
a per-provider cost rollup resolved against the graph's default
llm_model attribute. Library consumers can now read per-provider
token and dollar cost directly off the Result without cross-
referencing trace entries.

Refs #62
Adds a new PipelineEvent type carrying a CostSnapshot payload
(TotalTokens, TotalCostUSD, per-provider breakdown, wall elapsed).
Emitted after every terminal node outcome so streaming consumers
can live-update cost dashboards without polling Trace.AggregateUsage.

Also declares EventBudgetExceeded for use by the BudgetGuard in
the follow-up task.

Refs #62

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds BudgetLimits/BudgetGuard/BudgetBreach in a new pipeline/budget.go
file, wires OutcomeBudgetExceeded into the run loop immediately after
every terminal-node emitCostUpdate, and surfaces EventBudgetExceeded
with the current CostSnapshot payload.

EngineResult gains BudgetLimitsHit (list of dimension names that
tripped) populated on halt. Token and cost ceilings are inclusive —
exact equality is not a breach.

The exit-node path uses a thin checkBudgetHaltForExit wrapper that
returns *EngineResult (matching handleExitNode's signature) rather
than loopResult; the advanceToNextNode path uses checkBudgetAfterEmit
directly and returns *loopResult inline — both bubble up through
processExitNode/processActiveNode to the main Run loop.

Refs #17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- BudgetGuard cost ceiling now uses math.Round before int conversion
  so accumulated float64 drift does not under-count cents.
- haltForBudget computes trace.AggregateUsage() once and reuses the
  snapshot for both the emitted event and the returned EngineResult.

Follow-up to 7c90885. Refs #17
Regression test that workflow-level max_total_tokens, max_cost_cents,
and max_wall_time land on graph.Attrs with their exact keys, so the
library can read them without further mapping.

The test documents the expected behavior for when dippin-lang v0.x+
adds MaxTotalTokens, MaxCostCents, and MaxWallTime fields to
WorkflowDefaults. Currently these fields don't exist in dippin-lang
v0.18.0, so the test uses a manual graph construction to verify the
adapter would handle them correctly.

Refs #17

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Library consumers set Config.Budget = pipeline.BudgetLimits{...} to
enforce token/cost/wall-time ceilings on a run. When any limit is
tripped, Result.Status becomes OutcomeBudgetExceeded and Result.Cost.LimitsHit
names the dimension(s) that halted the run.

Graph-attr fallback (reading limits from .dip workflow attrs) is a
follow-up blocked on dippin-lang IR support — tracked in #67.

Refs #17
The BudgetGuard wiring landed in 720b75e; the trailing sentence is
now factually wrong.

Refs #17
Three new flags on 'tracker run' populate Config.Budget, routing
through pipeline.BudgetGuard. When a limit trips, the summary
surfaces a 'HALTED: budget exceeded' section naming the dimension
and showing the accumulated spend.

Refs #17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- events_jsonl.go now includes CostSnapshot fields (TotalTokens,
  TotalCostUSD, ProviderTotals, WallElapsedMs) so cost_updated and
  budget_exceeded events in activity.jsonl carry their payload.
- buildLogEntry refactored from method to package-level function for
  direct testability; JSONLEventHandler.HandlePipelineEvent delegates to it.
- tracker diagnose scans activity.jsonl for budget_exceeded events and
  prints a dedicated halt section with token/cost/wall-time and
  remediation guidance before the standard node-failure blocks.

Refs #17 #62

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the new Result.Cost / BudgetGuard / CLI flags / streaming
events in CHANGELOG, README, and CLAUDE.md. Notes the follow-up
blocker (#67) for .dip workflow-attr configuration.

Refs #17 #62
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a11fdf1-4682-4904-b630-90f3fb646b38

📥 Commits

Reviewing files that changed from the base of the PR and between 6345d72 and afe4067.

📒 Files selected for processing (1)
  • README.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

Walkthrough

Adds per-provider cost accounting and streaming cost events, exposes run-level CostReport, and implements a pipeline-level BudgetGuard enforcing token/cost/wall-time ceilings that can halt execution; wires CLI flags, diagnostics, and result/trace fields to surface budget behavior and cost snapshots.

Changes

Cohort / File(s) Summary
Documentation
CHANGELOG.md, README.md, CLAUDE.md, docs/superpowers/plans/2026-04-14-token-cost-governance.md
New cost-governance docs describing per-provider cost exposure, streaming cost events, BudgetGuard design, CLI flags, and note deferring .dip attr loading.
Token cost library
llm/token_tracker_cost.go, llm/token_tracker_cost_test.go
Add per-provider cost rollups: ProviderCost, ModelResolver, TokenTracker.CostByProvider() and TotalCostUSD() plus tests.
Pipeline budget primitives
pipeline/budget.go, pipeline/budget_test.go
New BudgetLimits, BudgetGuard, breach kinds and nil-safe creation; tests for tokens/cost/wall-time and boundary cases.
Pipeline events & serialization
pipeline/events.go, pipeline/events_jsonl.go, pipeline/events_jsonl_test.go
Add EventCostUpdated, EventBudgetExceeded, CostSnapshot payload and JSONL serialization fields; tests for log entry building.
Trace & usage aggregation
pipeline/trace.go, pipeline/trace_test.go, pipeline/handlers/transcript.go, pipeline/handlers/transcript_test.go
Add Provider to SessionStats, ProviderUsage type and UsageSummary.ProviderTotals; update aggregation and tests to roll up per-provider totals.
Engine integration & control flow
pipeline/engine.go, pipeline/engine_run.go, pipeline/engine_test.go
Wire budget guard into engine (WithBudgetGuard), add emitCostUpdate(), budget check/halt helpers, OutcomeBudgetExceeded, BudgetLimitsHit; tests for cost-update emission and halting on breach.
Tracker library & API surface
tracker.go, tracker_test.go
Add Config.Budget, new exported CostReport and Result.Cost (TotalUSD, ByProvider, LimitsHit); populate from token tracker using a default model resolver; tests added.
Agent session plumbing
agent/result.go, agent/session.go, agent/session_test.go
Add Provider to SessionResult and populate from session config; unit test added.
CLI wiring & diagnostics
cmd/tracker/flags.go, cmd/tracker/main.go, cmd/tracker/commands.go, cmd/tracker/run.go, cmd/tracker/summary.go, cmd/tracker/diagnose.go
Add --max-tokens, --max-cost, --max-wall-time flags and validation; propagate activeBudgetLimits into engine options; summary and diagnose surface budget halts and parse budget events.
Misc tests
agent/*, pipeline/*, tracker_test.go
Added/updated tests across components validating provider propagation, cost rollups, event emission, and budget halting scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/CLI
    participant Engine as Pipeline Engine
    participant Node as Node Handler
    participant Budget as Budget Guard
    participant Events as Event Handler
    participant Token as TokenTracker

    Client->>Engine: Run(graph, options)
    loop For each node
        Engine->>Node: Execute node
        Node-->>Engine: SessionResult (provider, tokens)
        Engine->>Token: AddUsage(provider, tokens)
        Engine->>Engine: Add trace entry
        Engine->>Engine: emitCostUpdate()
        Engine->>Events: EventCostUpdated + CostSnapshot
        Engine->>Budget: Check(UsageSummary, started)
        alt Budget exceeded
            Budget-->>Engine: BudgetBreach
            Engine->>Engine: Set EndTime, Status=BudgetExceeded
            Engine->>Events: EventBudgetExceeded + CostSnapshot
            Engine-->>Client: Halt execution
        else Within limits
            Budget-->>Engine: BudgetOK
            Engine->>Engine: Continue
        end
    end
    Engine-->>Client: EngineResult (with BudgetLimitsHit)
    Client->>Client: Tracker populates Result.Cost from TokenTracker.CostByProvider()
Loading
sequenceDiagram
    participant Consumer as Library Consumer
    participant Tracker as Tracker Library
    participant Guard as Budget Guard
    participant Engine as Pipeline Engine
    participant Token as TokenTracker

    Consumer->>Tracker: Run(config{Budget})
    Tracker->>Guard: NewBudgetGuard(config.Budget)
    Tracker->>Engine: WithBudgetGuard(guard) and Run(graph)
    Engine->>Token: AddUsage(provider, usage)
    Engine->>Guard: Check(aggregated_usage, time)
    alt Exceeded
        Guard-->>Engine: BudgetBreach -> EngineResult.Status=BudgetExceeded
    else OK
        Guard-->>Engine: BudgetOK -> continue
    end
    Engine-->>Tracker: EngineResult
    Tracker->>Token: CostByProvider(resolver)
    Token-->>Tracker: Per-provider costs
    Tracker-->>Consumer: Result{Cost: {TotalUSD, ByProvider, LimitsHit}}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I counted tokens, penny by penny,
I tracked each provider, gentle and ready.
When limits drew near, the guard gave a shout—
"Halt the run, friends, we've spent too much clout!"
Hooray for cost guards—safe hopping about. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: token & cost governance (library exposure + pipeline ceilings)' directly describes the core changes: exposing token/cost data from the library and adding pipeline-level budget enforcement.
Linked Issues check ✅ Passed The PR comprehensively addresses both linked issues: #62 requirements for per-provider cost exposure and real-time cost events are met via Result.Cost, CostSnapshot events, and ProviderTotals; #17 requirements for budget ceilings and halt behavior are met via BudgetGuard, OutcomeBudgetExceeded status, and BudgetLimitsHit tracking.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the PR objectives. The provider tracking (agent/result.go, agent/session.go, SessionStats) supports cost attribution; CLI flags and diagnose updates enable budget enforcement UX; documentation captures the feature scope and defers .dip attribute loading to #67 appropriately.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/token-cost-governance

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74334f7c23

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pipeline/engine_run.go
Comment on lines +56 to +60
return loopResult{
action: loopReturn,
result: &EngineResult{
RunID: s.runID,
Status: OutcomeBudgetExceeded,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Persist checkpoint before returning budget halt

haltForBudget returns immediately without saving checkpoint state. Because the guard is evaluated before advanceToNextNode writes the updated checkpoint, a budget-triggered stop can leave disk state behind in-memory completion, so --resume may re-run the last completed node and repeat side effects/cost instead of continuing from the correct frontier.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.

Comment thread pipeline/engine.go
Comment on lines +310 to +312
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
return *lr
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce budget checks on retry paths

Budget enforcement is only invoked in the normal advance/exit flow, not in retry handling. When a node returns OutcomeRetry, execution goes through handleRetryWithinBudget/handleRetryExhausted without calling checkBudgetAfterEmit, so repeated retries can exceed token/cost limits and still continue (or end as OutcomeFail) instead of halting with OutcomeBudgetExceeded.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.

Comment thread pipeline/budget.go
if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
}
if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compare cost ceilings without rounding down overages

The cost breach check uses math.Round(usage.TotalCostUSD*100), which can hide real overages below 0.5¢ (for example, 1.4¢ rounds to 1¢ and passes a 1-cent ceiling). That violates the stated "exceeds this value" behavior for MaxCostCents and can let runs overspend without tripping the guard.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design. math.Round gives us cent-level granularity, which matches the MaxCostCents unit. Sub-0.5¢ values round down, but the alternative (Ceil) would false-trip on any fractional cent above zero. The original bug was int() truncation dropping whole cents — Round fixes that without over-triggering.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (1)
cmd/tracker/run.go (1)

129-131: Consider deduplicating budget-guard option wiring.

Both engine builders now repeat the same guard-append block. A tiny helper would reduce drift risk between console and TUI paths.

♻️ Possible refactor
 func buildEngineOptions(artifactDir, checkpoint string, evtHandler pipeline.PipelineEventHandler) []pipeline.EngineOption {
@@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		opts = append(opts, pipeline.WithBudgetGuard(guard))
-	}
+	opts = appendBudgetGuardOption(opts, activeBudgetLimits)
 	return opts
 }
@@
 func buildTUIEngine(graph *pipeline.Graph, registry *pipeline.HandlerRegistry, artifactDir, checkpoint string, pipelineCombo pipeline.PipelineEventHandler) *pipeline.Engine {
@@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		engineOpts = append(engineOpts, pipeline.WithBudgetGuard(guard))
-	}
+	engineOpts = appendBudgetGuardOption(engineOpts, activeBudgetLimits)
 	return pipeline.NewEngine(graph, registry, engineOpts...)
 }
+
+func appendBudgetGuardOption(opts []pipeline.EngineOption, limits pipeline.BudgetLimits) []pipeline.EngineOption {
+	if guard := pipeline.NewBudgetGuard(limits); guard != nil {
+		return append(opts, pipeline.WithBudgetGuard(guard))
+	}
+	return opts
+}

Also applies to: 415-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/run.go` around lines 129 - 131, Both engine-builder code paths
duplicate the same budget-guard wiring (calling pipeline.NewBudgetGuard and
conditionally appending pipeline.WithBudgetGuard); extract that logic into a
small helper (e.g., buildBudgetGuardOpts or appendBudgetGuardIfAny) and call it
from both engine builders so the guard creation and conditional append happen in
one place, referencing pipeline.NewBudgetGuard and pipeline.WithBudgetGuard to
locate the duplicated block and update both sites to use the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 8-22: Update the CHANGELOG.md Unreleased section to include a "###
Fixed" subsection and move the existing non-feature caveat into a standard
heading (e.g., under "### Notes" should instead be folded into "### Changed" or
"### Fixed" as appropriate); specifically add one or more brief bullet(s) under
"### Fixed" describing the behavioral fixes shipped in this PR, ensure the file
follows the "Keep a Changelog" headings order (Added, Changed, Fixed, Removed)
and remove the custom "Notes" header or convert its content into the appropriate
standard heading; reference the existing Unreleased entries such as
`Result.Cost`, `pipeline.BudgetGuard`, `tracker.Config.Budget`, CLI flags on
`tracker run`, and pipeline events when deciding whether each item belongs under
Added vs Changed/Fixed.

In `@cmd/tracker/diagnose.go`:
- Around line 87-92: The budgetHalt struct currently lacks the
breached-dimension message so tracker diagnose drops the
EventBudgetExceeded.Message; add a Reason string field to budgetHalt, populate
it from the EventBudgetExceeded.Message wherever budgetHalt instances are
created (the branches around the existing EventBudgetExceeded handling noted in
the diff), and update printBudgetHalt to print "reason: <Reason>" only when
Reason is non-empty so the diagnose output shows whether tokens, cost, or wall
time triggered the halt.

In `@cmd/tracker/flags.go`:
- Around line 135-137: parseRunFlags must reject negative values for the three
budget flags: cfg.maxTokens, cfg.maxCostCents, and cfg.maxWallTime; add
validation in the parseRunFlags function after flag parsing that checks
cfg.maxTokens >= 0, cfg.maxCostCents >= 0 and cfg.maxWallTime >= 0 (or >0? keep
as >= 0 per spec) and return a descriptive error if any are negative so the CLI
fails fast with a clear message. Ensure the error mentions which flag is invalid
(e.g., "max-tokens must be >= 0") and hook this validation into the existing
error path of parseRunFlags.

In `@docs/superpowers/plans/2026-04-14-token-cost-governance.md`:
- Around line 7-8: The plan incorrectly schedules reading budget attributes from
`.dip` workflow attrs and assigns adapter pass-through / graph-attr fallback
work even though that capability is blocked by issue `#67`; update the doc (Part 3
and any sections referencing max_total_tokens, max_cost_cents, max_wall_time,
BudgetGuard, and the adapter pass-through/fallback work) to remove or mark as
deferred the `.dip` budget-attrs scheduling, stop assigning follow-up work that
depends on `.dip` IR support, and explicitly state that graph-level budget
loading is blocked by issue `#67` so the next worker is not sent prematurely.

In `@llm/token_tracker_cost.go`:
- Around line 21-33: The read lock is held while calling external callbacks
(resolve and EstimateCost); instead snapshot t.usage into a local map under
t.mu.RLock()/RUnlock(), then release the lock and iterate the snapshot to build
the out map, calling resolve(provider) and EstimateCost(model, usage) outside
the lock; update references to t.mu, t.usage, resolve, ProviderCost and
EstimateCost accordingly so only the minimal copy is done while locked and all
resolver/cost work happens after releasing the lock.

In `@pipeline/budget.go`:
- Around line 73-99: The current logic in BudgetGuard.Check and checkUsage only
returns the first BudgetBreach it encounters, so multiple simultaneous limits
(tokens, cost, wall time) are never reported; update these functions to evaluate
all limit checks and accumulate every violated dimension into the returned
breach (e.g., populate BudgetBreach.LimitsHit or create a slice of breaches)
instead of early-returning on the first match—specifically, change checkUsage
(referencing g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.

In `@pipeline/engine.go`:
- Around line 310-313: The retry and fallback/redirect paths (specifically in
handleRetryWithinBudget, handleRetryExhausted, and the retry/fallback branches
at the top of handleExitNode) append trace usage and continue without calling
emitCostUpdate and checkBudgetAfterEmit, letting retries bypass BudgetGuard
ceilings (MaxTotalTokens/MaxCostCents); fix by applying the same pattern used on
the normal edge-advance path: after appending trace usage in those branches call
e.emitCostUpdate(s) and then if lr := e.checkBudgetAfterEmit(s); lr != nil {
return *lr } so the BudgetGuard enforcement runs consistently across
retry/fallback/redirect flows.

In `@README.md`:
- Around line 397-404: Replace the incorrect "tracker run ..." example with the
documented root-command syntax so copy-paste works; update the snippet to use
"tracker [flags] <pipeline>" by moving flags before the pipeline path (e.g.,
change "tracker run examples/ask_and_execute.dip --max-tokens 100000 --max-cost
500 --max-wall-time 30m" to "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m examples/ask_and_execute.dip"), ensuring references to the
flags (--max-tokens, --max-cost, --max-wall-time) and the pipeline file
(examples/ask_and_execute.dip) remain.

In `@tracker.go`:
- Around line 439-456: populateResultTokensAndCost currently re-computes cost
via tokenTracker and defaultModelResolver which misestimates when per-node/model
usage is recorded in engineResult. Instead, change populateResultTokensAndCost
to prefer engineResult.Usage when present: if engineResult != nil and
engineResult.Usage has provider entries, build Result.Cost from that usage
(aggregate TotalUSD and set ByProvider from engineResult.Usage) and skip the
tokenTracker cost path and defaultModelResolver; otherwise fall back to the
existing tokenTracker logic (keep result.TokensByProvider =
e.tokenTracker.AllProviderUsage() behavior). Use the existing CostReport struct
shape and fields (TotalUSD, ByProvider) when constructing the cost from
pipeline.EngineResult.Usage.

---

Nitpick comments:
In `@cmd/tracker/run.go`:
- Around line 129-131: Both engine-builder code paths duplicate the same
budget-guard wiring (calling pipeline.NewBudgetGuard and conditionally appending
pipeline.WithBudgetGuard); extract that logic into a small helper (e.g.,
buildBudgetGuardOpts or appendBudgetGuardIfAny) and call it from both engine
builders so the guard creation and conditional append happen in one place,
referencing pipeline.NewBudgetGuard and pipeline.WithBudgetGuard to locate the
duplicated block and update both sites to use the helper.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fe5ee644-f62a-471f-b96f-6a2fe6b3f4d9

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8382e and 74334f7.

📒 Files selected for processing (29)
  • CHANGELOG.md
  • CLAUDE.md
  • README.md
  • agent/result.go
  • agent/session.go
  • agent/session_test.go
  • cmd/tracker/commands.go
  • cmd/tracker/diagnose.go
  • cmd/tracker/flags.go
  • cmd/tracker/main.go
  • cmd/tracker/run.go
  • cmd/tracker/summary.go
  • docs/superpowers/plans/2026-04-14-token-cost-governance.md
  • llm/token_tracker_cost.go
  • llm/token_tracker_cost_test.go
  • pipeline/budget.go
  • pipeline/budget_test.go
  • pipeline/engine.go
  • pipeline/engine_run.go
  • pipeline/engine_test.go
  • pipeline/events.go
  • pipeline/events_jsonl.go
  • pipeline/events_jsonl_test.go
  • pipeline/handlers/transcript.go
  • pipeline/handlers/transcript_test.go
  • pipeline/trace.go
  • pipeline/trace_test.go
  • tracker.go
  • tracker_test.go

Comment thread CHANGELOG.md
Comment on lines +8 to +22
## [Unreleased]

### Added

- `Result.Cost` on the library API with per-provider rollup (`map[string]llm.ProviderCost`) and `TotalUSD`. Populated from the `llm.TokenTracker` middleware and priced via `llm.EstimateCost`. Closes #62.
- `pipeline.BudgetGuard` enforcing `MaxTotalTokens`, `MaxCostCents`, and `MaxWallTime` limits. Halts the run with `pipeline.OutcomeBudgetExceeded` when any dimension trips. Closes #17.
- New `tracker.Config.Budget` field (type `pipeline.BudgetLimits`) for library consumers.
- New CLI flags on `tracker run`: `--max-tokens`, `--max-cost` (cents), `--max-wall-time`.
- New pipeline events `cost_updated` (streaming per-node cost snapshots) and `budget_exceeded` (fired on halt). Both carry a `CostSnapshot` payload with `TotalTokens`, `TotalCostUSD`, `ProviderTotals`, and `WallElapsed`.
- `tracker diagnose` surfaces a "Budget halt detected" section when a run halts on budget.
- `UsageSummary.ProviderTotals` (per-provider token and cost rollup) on `pipeline.Trace.AggregateUsage()` output.

### Notes

- Reading budget limits from `.dip` workflow attrs is blocked on dippin-lang IR support; tracked in #67.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The Unreleased entry is still missing the Fixed bucket.

This PR includes behavior fixes as well as features, but the changelog only records Added items and introduces a custom Notes section. Please fold the #67 caveat into one of the standard headings and add the shipped fixes here. As per coding guidelines Keep CHANGELOG.md updated with every feature, fix, and breaking change in the same PR as the code change and Use Keep a Changelog format in CHANGELOG.md — group entries under Added, Changed, Fixed, Removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 8 - 22, Update the CHANGELOG.md Unreleased section
to include a "### Fixed" subsection and move the existing non-feature caveat
into a standard heading (e.g., under "### Notes" should instead be folded into
"### Changed" or "### Fixed" as appropriate); specifically add one or more brief
bullet(s) under "### Fixed" describing the behavioral fixes shipped in this PR,
ensure the file follows the "Keep a Changelog" headings order (Added, Changed,
Fixed, Removed) and remove the custom "Notes" header or convert its content into
the appropriate standard heading; reference the existing Unreleased entries such
as `Result.Cost`, `pipeline.BudgetGuard`, `tracker.Config.Budget`, CLI flags on
`tracker run`, and pipeline events when deciding whether each item belongs under
Added vs Changed/Fixed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is all new functionality, not fixes of pre-existing bugs. The rounding fix (6b66072) and review-round fix (a1939ab) are corrections to code introduced in this same PR, not regression fixes. No Fixed bucket needed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, that's a reasonable distinction — intra-PR corrections to newly introduced code are not the same as regression fixes against previously-released behavior, so omitting a Fixed bucket makes sense here. I'll drop that concern.


🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: `tracker version` shows commit hash, build time, and which providers are configured. Uses Go VCS metadata for `go install` builds, GoReleaser ldflags for releases.

Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Applies to CHANGELOG.md : Keep CHANGELOG.md updated with every feature, fix, and breaking change using Keep a Changelog format. Group entries under Added, Changed, Fixed, Removed. Update the changelog in the same PR as the code change.

Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.

Comment thread cmd/tracker/diagnose.go
Comment thread cmd/tracker/flags.go
Comment on lines +135 to +137
fs.IntVar(&cfg.maxTokens, "max-tokens", 0, "Halt if total tokens across the run exceed this value (0 = no limit)")
fs.IntVar(&cfg.maxCostCents, "max-cost", 0, "Halt if total cost in cents exceeds this value (0 = no limit)")
fs.DurationVar(&cfg.maxWallTime, "max-wall-time", 0, "Halt if pipeline wall time exceeds this duration (0 = no limit)")
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject negative budget limits during CLI parsing.

Line 135–137 introduces limits but allows negative values, which can lead to confusing behavior (e.g., immediate budget halt). Please validate all three flags as >= 0 in parseRunFlags.

🔧 Proposed fix
 func parseRunFlags(args []string, cfg runConfig) (runConfig, error) {
@@
 	if err := validateBackend(cfg.backend); err != nil {
 		return cfg, err
 	}
+	if cfg.maxTokens < 0 {
+		return cfg, fmt.Errorf("invalid --max-tokens: must be >= 0")
+	}
+	if cfg.maxCostCents < 0 {
+		return cfg, fmt.Errorf("invalid --max-cost: must be >= 0")
+	}
+	if cfg.maxWallTime < 0 {
+		return cfg, fmt.Errorf("invalid --max-wall-time: must be >= 0")
+	}
 	return cfg, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fs.IntVar(&cfg.maxTokens, "max-tokens", 0, "Halt if total tokens across the run exceed this value (0 = no limit)")
fs.IntVar(&cfg.maxCostCents, "max-cost", 0, "Halt if total cost in cents exceeds this value (0 = no limit)")
fs.DurationVar(&cfg.maxWallTime, "max-wall-time", 0, "Halt if pipeline wall time exceeds this duration (0 = no limit)")
func parseRunFlags(args []string, cfg runConfig) (runConfig, error) {
if err := validateBackend(cfg.backend); err != nil {
return cfg, err
}
if cfg.maxTokens < 0 {
return cfg, fmt.Errorf("invalid --max-tokens: must be >= 0")
}
if cfg.maxCostCents < 0 {
return cfg, fmt.Errorf("invalid --max-cost: must be >= 0")
}
if cfg.maxWallTime < 0 {
return cfg, fmt.Errorf("invalid --max-wall-time: must be >= 0")
}
return cfg, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/flags.go` around lines 135 - 137, parseRunFlags must reject
negative values for the three budget flags: cfg.maxTokens, cfg.maxCostCents, and
cfg.maxWallTime; add validation in the parseRunFlags function after flag parsing
that checks cfg.maxTokens >= 0, cfg.maxCostCents >= 0 and cfg.maxWallTime >= 0
(or >0? keep as >= 0 per spec) and return a descriptive error if any are
negative so the CLI fails fast with a clear message. Ensure the error mentions
which flag is invalid (e.g., "max-tokens must be >= 0") and hook this validation
into the existing error path of parseRunFlags.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a1939ab. parseRunFlags now validates non-negative values.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, thanks for the fix! The validateBudgetLimits check in parseRunFlags correctly rejects negative values for all three budget flags with a clear error message.

Comment on lines +7 to +8
**Architecture:** Part 1 promotes `llm.TokenTracker` from a CLI-summary helper to a first-class library output by surfacing its per-provider totals on `Result` / `EngineResult`, pricing each provider via `llm.EstimateCost`, and emitting a new `EventCostUpdated` event after every trace entry so consumers see streaming updates. Part 2 adds a `BudgetGuard` evaluated inside the engine loop after each node's outcome is applied; on breach it writes a terminal trace entry, emits `EventBudgetExceeded`, and returns an `EngineResult` with status `OutcomeBudgetExceeded`. Part 3 wires graph attrs (`max_total_tokens`, `max_cost_cents`, `max_wall_time`) and CLI flags that override them.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Scope drift: this plan still schedules .dip budget attrs as if they shipped here.

The rest of this PR explicitly treats graph-level budget loading as blocked by #67, but this plan still assigns adapter pass-through and graph-attr fallback work. That will send the next worker after unsupported behavior instead of the intended follow-up. Based on learnings Reading budgets from .dipworkflow attrs is blocked on dippin-lang IR support (issue#67).

Also applies to: 30-31, 773-927

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-14-token-cost-governance.md` around lines 7 -
8, The plan incorrectly schedules reading budget attributes from `.dip` workflow
attrs and assigns adapter pass-through / graph-attr fallback work even though
that capability is blocked by issue `#67`; update the doc (Part 3 and any sections
referencing max_total_tokens, max_cost_cents, max_wall_time, BudgetGuard, and
the adapter pass-through/fallback work) to remove or mark as deferred the `.dip`
budget-attrs scheduling, stop assigning follow-up work that depends on `.dip` IR
support, and explicitly state that graph-level budget loading is blocked by
issue `#67` so the next worker is not sent prematurely.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — the plan is a historical artifact that predates the scope reduction. Tasks 6/7 were adjusted during execution. Adding a note to the plan doc would be revisionism; it serves as a record of the original intent. #67 tracks the follow-up.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, that's a fair framing — keeping the plan as an unmodified record of original intent makes sense, and #67 is the right place to track the deferred .dip attrs work.

(∩`-´)⊃━☆゚.*・。゚


🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.

Comment thread llm/token_tracker_cost.go
Comment on lines +21 to +33
t.mu.RLock()
defer t.mu.RUnlock()
out := make(map[string]ProviderCost, len(t.usage))
for provider, usage := range t.usage {
var model string
if resolve != nil {
model = resolve(provider)
}
out[provider] = ProviderCost{
Usage: usage,
Model: model,
USD: EstimateCost(model, usage),
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and read the actual file to confirm the code snippet
find . -name "token_tracker_cost.go" -type f

Repository: 2389-research/tracker

Length of output: 92


🏁 Script executed:

# Read the relevant file section
if [ -f "./llm/token_tracker_cost.go" ]; then
  cat -n ./llm/token_tracker_cost.go | head -60
fi

Repository: 2389-research/tracker

Length of output: 1837


🏁 Script executed:

# Search for ModelResolver type definition
rg "type ModelResolver" --type go

Repository: 2389-research/tracker

Length of output: 138


🏁 Script executed:

# Search for CostByProvider calls to understand usage patterns
rg "CostByProvider" --type go -B 2 -A 2

Repository: 2389-research/tracker

Length of output: 3777


🏁 Script executed:

# Check TokenTracker type and its methods
rg "type TokenTracker" --type go -A 20

Repository: 2389-research/tracker

Length of output: 1170


🏁 Script executed:

# Search for potential re-entrance paths - methods called on TokenTracker during cost operations
rg "func \(t \*TokenTracker\)" --type go

Repository: 2389-research/tracker

Length of output: 751


🏁 Script executed:

# Check what defaultModelResolver does
rg "defaultModelResolver" --type go -A 10

Repository: 2389-research/tracker

Length of output: 1286


🏁 Script executed:

# Look for any ModelResolver implementations that might call back into TokenTracker
rg "ModelResolver" --type go -B 3 -A 5

Repository: 2389-research/tracker

Length of output: 2802


🏁 Script executed:

# Check if there are any concurrent writes to TokenTracker during CostByProvider calls
rg "AddUsage\|ProviderUsage\|TotalUsage" --type go -B 2 -A 2 | head -100

Repository: 2389-research/tracker

Length of output: 47


🏁 Script executed:

# Get full context of tracker.go usage around CostByProvider
rg -B 5 -A 5 "CostByProvider\(resolver\)" tracker.go

Repository: 2389-research/tracker

Length of output: 356


🏁 Script executed:

# Check EstimateCost implementation to see if it could be slow or blocking
rg "func EstimateCost" --type go -A 15

Repository: 2389-research/tracker

Length of output: 463


🏁 Script executed:

# Check if there are any tests that show CostByProvider being called concurrently with AddUsage
rg "TestTokenTracker" --type go | head -20

Repository: 2389-research/tracker

Length of output: 1077


🏁 Script executed:

# Read the full token_tracker.go to understand AddUsage
cat -n ./llm/token_tracker.go | head -80

Repository: 2389-research/tracker

Length of output: 2747


🏁 Script executed:

# Check the concurrent test to understand typical load patterns
cat -n ./llm/token_tracker_test.go | grep -A 30 "TestTokenTrackerConcurrentSafe"

Repository: 2389-research/tracker

Length of output: 1148


🏁 Script executed:

# Verify that defaultModelResolver doesn't call back into TokenTracker or do blocking I/O
rg "Graph\|Attrs" tracker.go | head -20

Repository: 2389-research/tracker

Length of output: 47


Hold the read lock only while copying t.usage; release before calling resolve and EstimateCost.

While the current ModelResolver implementation (defaultModelResolver) is trivial and safe, holding a lock during external callbacks is not a defensive pattern. If the resolver logic ever becomes complex or calls back into the tracker, this could cause contention or re-entrancy issues. Snapshot the usage map under lock, then release before computing costs.

Suggested fix
 func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
-	t.mu.RLock()
-	defer t.mu.RUnlock()
-	out := make(map[string]ProviderCost, len(t.usage))
-	for provider, usage := range t.usage {
+	t.mu.RLock()
+	usageByProvider := make(map[string]Usage, len(t.usage))
+	for provider, usage := range t.usage {
+		usageByProvider[provider] = usage
+	}
+	t.mu.RUnlock()
+
+	out := make(map[string]ProviderCost, len(usageByProvider))
+	for provider, usage := range usageByProvider {
 		var model string
 		if resolve != nil {
 			model = resolve(provider)
 		}
 		out[provider] = ProviderCost{
 			Usage: usage,
 			Model: model,
 			USD:   EstimateCost(model, usage),
 		}
 	}
 	return out
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
t.mu.RLock()
defer t.mu.RUnlock()
out := make(map[string]ProviderCost, len(t.usage))
for provider, usage := range t.usage {
var model string
if resolve != nil {
model = resolve(provider)
}
out[provider] = ProviderCost{
Usage: usage,
Model: model,
USD: EstimateCost(model, usage),
}
func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
t.mu.RLock()
usageByProvider := make(map[string]Usage, len(t.usage))
for provider, usage := range t.usage {
usageByProvider[provider] = usage
}
t.mu.RUnlock()
out := make(map[string]ProviderCost, len(usageByProvider))
for provider, usage := range usageByProvider {
var model string
if resolve != nil {
model = resolve(provider)
}
out[provider] = ProviderCost{
Usage: usage,
Model: model,
USD: EstimateCost(model, usage),
}
}
return out
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@llm/token_tracker_cost.go` around lines 21 - 33, The read lock is held while
calling external callbacks (resolve and EstimateCost); instead snapshot t.usage
into a local map under t.mu.RLock()/RUnlock(), then release the lock and iterate
the snapshot to build the out map, calling resolve(provider) and
EstimateCost(model, usage) outside the lock; update references to t.mu, t.usage,
resolve, ProviderCost and EstimateCost accordingly so only the minimal copy is
done while locked and all resolver/cost work happens after releasing the lock.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the implementation matches the spec and all tests pass.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, understood! The current implementation is straightforward and the lock-scope concern is preventive rather than an immediate bug. I'll defer to your judgment here.

(¬、¬)

Comment thread pipeline/budget.go
Comment on lines +73 to +99
func (g *BudgetGuard) Check(usage *UsageSummary, started time.Time) BudgetBreach {
if g == nil {
return BudgetBreach{Kind: BudgetOK}
}
if breach := g.checkUsage(usage); breach.Kind != BudgetOK {
return breach
}
if g.limits.MaxWallTime > 0 && time.Since(started) > g.limits.MaxWallTime {
return BudgetBreach{Kind: BudgetWallTime, Message: "max_wall_time exceeded"}
}
return BudgetBreach{Kind: BudgetOK}
}

// checkUsage evaluates token and cost limits against a usage snapshot.
// Returns BudgetOK when usage is nil or no limit is breached.
func (g *BudgetGuard) checkUsage(usage *UsageSummary) BudgetBreach {
if usage == nil {
return BudgetBreach{Kind: BudgetOK}
}
if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
}
if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
return BudgetBreach{Kind: BudgetCost, Message: "max_cost_cents exceeded"}
}
return BudgetBreach{Kind: BudgetOK}
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only the first exceeded budget dimension is ever surfaced.

Check returns a single BudgetBreach, and checkUsage stops at the first match. In practice BudgetLimitsHit can never contain more than one entry, even when a run exceeds tokens and cost (or cost and wall time) at the same halt point. That loses part of the halt reason the new API is meant to expose.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/budget.go` around lines 73 - 99, The current logic in
BudgetGuard.Check and checkUsage only returns the first BudgetBreach it
encounters, so multiple simultaneous limits (tokens, cost, wall time) are never
reported; update these functions to evaluate all limit checks and accumulate
every violated dimension into the returned breach (e.g., populate
BudgetBreach.LimitsHit or create a slice of breaches) instead of early-returning
on the first match—specifically, change checkUsage (referencing
g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design — Check short-circuits on the first breached dimension because the run halts immediately. Surfacing all breached dimensions would require a collect-all-then-halt pattern, which adds complexity for a scenario that rarely matters in practice (the user raises the limit that fired). Happy to revisit if users report confusion.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, understood — that's a reasonable trade-off. Since the pipeline halts at the first breach, collecting additional violated dimensions at that same instant would require a fundamentally different control flow with no practical benefit until a user actually hits it. I'll note this for future reviews.


✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.

Comment thread pipeline/engine.go
Comment on lines +310 to +313
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
return *lr
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Budget ceilings are bypassed on retry and goal-gate redirect paths.

This check only runs on the normal edge-advance path. The retry flows in handleRetryWithinBudget / handleRetryExhausted and the retry/fallback branches at the top of handleExitNode also append trace usage and continue, but they never call emitCostUpdate or checkBudgetAfterEmit. A node that keeps returning OutcomeRetry can therefore spend past MaxTotalTokens / MaxCostCents before the breaker ever trips.

💸 Suggested fix
func (e *Engine) handleRetryWithinBudget(ctx context.Context, s *runState, currentNodeID string, execNode *Node, traceEntry *TraceEntry) (string, bool, *EngineResult, error) {
	...
	traceEntry.EdgeTo = target
	s.trace.AddEntry(*traceEntry)
+	e.emitCostUpdate(s)
+	if lr := e.checkBudgetAfterEmit(s); lr != nil {
+		return "", false, lr.result, lr.err
+	}
	e.clearDownstream(target, s.cp)
	...
}

Apply the same pattern to the fallback / redirect branches that add a trace entry and continue execution.

As per coding guidelines "Pipeline engine enforces optional cost ceilings via pipeline.BudgetGuard, evaluated between nodes after emitCostUpdate (v0.17.0+)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pipeline/engine.go` around lines 310 - 313, The retry and fallback/redirect
paths (specifically in handleRetryWithinBudget, handleRetryExhausted, and the
retry/fallback branches at the top of handleExitNode) append trace usage and
continue without calling emitCostUpdate and checkBudgetAfterEmit, letting
retries bypass BudgetGuard ceilings (MaxTotalTokens/MaxCostCents); fix by
applying the same pattern used on the normal edge-advance path: after appending
trace usage in those branches call e.emitCostUpdate(s) and then if lr :=
e.checkBudgetAfterEmit(s); lr != nil { return *lr } so the BudgetGuard
enforcement runs consistently across retry/fallback/redirect flows.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
git show a1939ab --stat
git show a1939ab -- pipeline/engine_run.go pipeline/engine.go

Length of output: 2767


@clintecker, confirmed in a1939ab — the fix looks correct:

  • handleRetryWithinBudget: emitCostUpdate + checkBudgetAfterEmit are now called immediately after s.trace.AddEntry, matching the pattern on the normal advance path. Retries can no longer silently overspend past MaxTotalTokens/MaxCostCents.
  • haltForBudget: saveCheckpoint is called first, so a budget-halted run can safely resume from its last completed node without re-executing already-finished work (relevant given the global restart budget described in the checkpoint docs).

🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Checkpoints store completed nodes and context snapshots with deterministic edge selection per node for replay. The restart counter is global across the entire run — a fix loop on milestone 1 consumes restart budget that milestone 10 needs.

Comment thread README.md Outdated
Comment thread tracker.go
Comment on lines +439 to +456
func (e *Engine) populateResultTokensAndCost(result *Result, engineResult *pipeline.EngineResult) {
if e.tokenTracker == nil {
return
}
result.TokensByProvider = e.tokenTracker.AllProviderUsage()
resolver := e.defaultModelResolver()
byProvider := e.tokenTracker.CostByProvider(resolver)
if len(byProvider) > 0 {
total := 0.0
for _, pc := range byProvider {
total += pc.USD
}
result.Cost = &CostReport{
TotalUSD: total,
ByProvider: byProvider,
}
}
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Result.Cost is computed from a model guess instead of the executed usage.

This re-estimates spend from TokenTracker using one graph-level llm_model for every provider. That makes Result.Cost wrong or empty whenever the run uses per-node model overrides, multiple models on the same provider, or a backend/path that records usage in EngineResult.Usage without going through the middleware tracker. The trace already has aggregate provider totals with cost; Result.Cost should prefer engineResult.Usage when it's available.

Based on learnings Token usage flows through llm.Usageagent.SessionResult.Usagepipeline.SessionStatsEngineResult.Usage, and CLI summary treats llm.TokenTracker and EngineResult.Usage as independent data sources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker.go` around lines 439 - 456, populateResultTokensAndCost currently
re-computes cost via tokenTracker and defaultModelResolver which misestimates
when per-node/model usage is recorded in engineResult. Instead, change
populateResultTokensAndCost to prefer engineResult.Usage when present: if
engineResult != nil and engineResult.Usage has provider entries, build
Result.Cost from that usage (aggregate TotalUSD and set ByProvider from
engineResult.Usage) and skip the tokenTracker cost path and
defaultModelResolver; otherwise fall back to the existing tokenTracker logic
(keep result.TokensByProvider = e.tokenTracker.AllProviderUsage() behavior). Use
the existing CostReport struct shape and fields (TotalUSD, ByProvider) when
constructing the cost from pipeline.EngineResult.Usage.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Known limitation, documented in code comments on defaultModelResolver. Per-node model tracking requires plumbing through the trace that doesn't exist yet. Follow-up — the existing design gives accurate numbers for single-model pipelines and zero (not wrong) for unknown models.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintecker, understood — thanks for the context. The "zero for unknown models rather than a wrong estimate" contract is a reasonable defensive choice, and the inline doc on defaultModelResolver already captures the per-node limitation. I'll note this for future reviews.


✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: tracker.go:439-456
Timestamp: 2026-04-15T16:42:00.353Z
Learning: In `tracker.go`, `defaultModelResolver` intentionally uses a single graph-level `llm_model` attr for all providers. It returns zero USD (not an incorrect estimate) for unknown/multi-model scenarios. Per-node model tracking in `Result.Cost` is a known future work item requiring trace plumbing that does not yet exist. Do not flag this as a bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements token/cost governance across the tracker library + pipeline engine by surfacing per-provider cost in library results, streaming cost snapshots as events, and enforcing optional token/cost/wall-time ceilings that halt execution.

Changes:

  • Exposes Result.Cost (total USD + per-provider breakdown) and propagates budget halt reasons back to library callers.
  • Adds pipeline.BudgetGuard + OutcomeBudgetExceeded, plus cost_updated/budget_exceeded events carrying CostSnapshot and JSONL logging support.
  • Wires budget limits through CLI flags and surfaces budget halts in CLI summary and tracker diagnose.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tracker.go Adds Config.Budget, Result.Cost, and populates tokens/cost + halt reasons from engine result.
tracker_test.go Adds integration tests for Result.Cost population and config-driven budget halts.
pipeline/budget.go / pipeline/budget_test.go Introduces BudgetLimits/BudgetGuard with unit tests for ceilings and boundary rounding.
pipeline/engine.go Adds budget guard plumbing, OutcomeBudgetExceeded, BudgetLimitsHit, and Engine.Graph() accessor.
pipeline/engine_run.go Emits streaming cost snapshots and halts runs on budget breach (emits EventBudgetExceeded).
pipeline/engine_test.go Adds integration tests for streaming cost events and budget halting behavior.
pipeline/events.go Adds new event types and CostSnapshot payload on PipelineEvent.
pipeline/events_jsonl.go / pipeline/events_jsonl_test.go Logs CostSnapshot fields into activity.jsonl and tests the JSONL entry build.
pipeline/trace.go / pipeline/trace_test.go Adds SessionStats.Provider and aggregates per-provider totals into UsageSummary.ProviderTotals.
pipeline/handlers/transcript.go / pipeline/handlers/transcript_test.go Populates provider into session stats emitted into the trace.
llm/token_tracker_cost.go / llm/token_tracker_cost_test.go Adds per-provider cost rollup on TokenTracker via model resolver + EstimateCost.
cmd/tracker/flags.go / cmd/tracker/main.go Adds --max-tokens, --max-cost, --max-wall-time flags to run config.
cmd/tracker/commands.go Plumbs parsed CLI budget limits into the run execution path.
cmd/tracker/run.go Attaches BudgetGuard to both non-TUI and TUI engine construction via active budget limits.
cmd/tracker/summary.go Prints a budget-halt banner when OutcomeBudgetExceeded is returned.
cmd/tracker/diagnose.go Detects budget_exceeded in activity.jsonl and prints a budget-halt section.
agent/result.go / agent/session.go / agent/session_test.go Adds provider field to SessionResult and ensures it’s populated.
README.md Documents cost governance usage for library, CLI, and event streaming consumers.
CHANGELOG.md Adds unreleased entries for cost exposure, budgets, events, CLI flags, and diagnose behavior.
CLAUDE.md Documents the new cost governance flow and where budget enforcement happens.
docs/superpowers/plans/2026-04-14-token-cost-governance.md Adds an implementation plan document for the feature set.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pipeline/engine_run.go
Comment on lines +16 to +31
func (e *Engine) emitCostUpdate(s *runState) {
summary := s.trace.AggregateUsage()
if summary == nil {
return
}
e.emit(PipelineEvent{
Type: EventCostUpdated,
Timestamp: time.Now(),
RunID: s.runID,
Cost: &CostSnapshot{
TotalTokens: summary.TotalTokens,
TotalCostUSD: summary.TotalCostUSD,
ProviderTotals: summary.ProviderTotals,
WallElapsed: time.Since(s.trace.StartTime),
},
})
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitCostUpdate() and checkBudgetAfterEmit() both call trace.AggregateUsage(), which iterates over all trace entries. Since these are invoked after each completed node, this makes run-time cost/budget checking O(n^2) in number of nodes, and also duplicates work (AggregateUsage is computed at least twice per node, plus again in haltForBudget). Consider computing the UsageSummary once after AddEntry and reusing it for both the emitted CostSnapshot and the BudgetGuard check, or maintaining an incremental aggregate on runState/Trace that updates as entries are appended.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid perf concern for very long pipelines. Maintaining a running UsageSummary on runState (updated incrementally per node) would eliminate the repeated scan. Not blocking for this PR — typical pipelines are 10-30 nodes. Happy to optimize in a follow-up if profiling shows it matters.

Comment thread pipeline/engine.go
Comment on lines +100 to +102
// Graph returns the graph this engine executes. Used by library callers
// that need to inspect graph attributes after construction.
func (e *Engine) Graph() *Graph { return e.graph }
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graph() exposes the engine's internal *Graph pointer directly. Since Graph (and its Attrs/Nodes/Edges maps) is mutable, external callers could mutate it during/after construction and potentially introduce data races or break engine invariants. Consider returning a read-only snapshot (e.g., a copy of Attrs) or clearly preventing mutation by returning a deep copy / wrapper type, especially since Engine.Run is documented as safe for concurrent use in other places.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The graph is immutable by convention after construction — the engine and all handlers treat it as read-only. Adding a copy or interface wrapper adds allocation overhead on every call for no practical safety gain. If we ever need to enforce immutability, a ReadOnlyGraph wrapper is the right approach, but it's not warranted today.

…agnose dimension

- BudgetGuard now fires after retry attempts, not just normal advance
- haltForBudget saves checkpoint before returning so restarts are safe
- diagnose extracts breach.Message from JSONL for specific guidance
- CLI rejects negative budget limits
- README uses correct CLI syntax (no 'run' subcommand)

Refs #17 #62

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
README.md (1)

397-404: ⚠️ Potential issue | 🟡 Minor

Fix CLI example ordering to match tracker [flags] <pipeline>.

On Line 400, the pipeline path is placed before flags, but the documented command shape (Line 421) is flags first. This is the same copy-paste trap previously raised.

Suggested doc fix
-tracker examples/ask_and_execute.dip \
-    --max-tokens 100000 \
-    --max-cost 500 \
-    --max-wall-time 30m
+tracker --max-tokens 100000 \
+    --max-cost 500 \
+    --max-wall-time 30m \
+    examples/ask_and_execute.dip
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 397 - 404, The CLI example places the pipeline path
before flags which contradicts the documented usage "tracker [flags]
<pipeline>"; update the example so flags come before the pipeline path (e.g.,
"tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m
examples/ask_and_execute.dip") so the command shape for the `tracker` CLI and
the example ordering are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 399-404: The fenced code block in README.md for the CLI snippet
(starting with "tracker examples/ask_and_execute.dip") lacks a language tag;
update the opening fence to include "bash" (i.e., ```bash) so markdownlint
recognizes it as a shell snippet and renders/validates correctly; optionally
also adopt the suggested formatting order in the comment but the key change is
adding the "bash" language identifier to the existing fenced block.

---

Duplicate comments:
In `@README.md`:
- Around line 397-404: The CLI example places the pipeline path before flags
which contradicts the documented usage "tracker [flags] <pipeline>"; update the
example so flags come before the pipeline path (e.g., "tracker --max-tokens
100000 --max-cost 500 --max-wall-time 30m examples/ask_and_execute.dip") so the
command shape for the `tracker` CLI and the example ordering are consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79f30480-06f9-4e10-b56d-1fad03edad7b

📥 Commits

Reviewing files that changed from the base of the PR and between 74334f7 and a1939ab.

📒 Files selected for processing (5)
  • README.md
  • cmd/tracker/diagnose.go
  • cmd/tracker/flags.go
  • pipeline/engine_run.go
  • pipeline/engine_test.go
✅ Files skipped from review due to trivial changes (1)
  • pipeline/engine_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pipeline/engine_run.go
  • cmd/tracker/diagnose.go

Comment thread README.md Outdated
Flags before pipeline path to match existing examples.

Refs #17
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
README.md (1)

399-402: ⚠️ Potential issue | 🟡 Minor

Add the bash language identifier to the fenced code block.

This was already flagged in a previous review but remains unfixed. The fenced code block needs a language specifier for proper syntax highlighting and markdownlint compliance.

📝 Suggested fix
-```
+```bash
 tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m \
     examples/ask_and_execute.dip
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 399 - 402, The fenced code block showing the tracker
command lacks a language specifier; update that Markdown block (the
triple-backtick block containing "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m \ examples/ask_and_execute.dip") to include the "bash"
language identifier so it reads ```bash at the start of the block for proper
syntax highlighting and markdownlint compliance.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @README.md:

  • Around line 399-402: The fenced code block showing the tracker command lacks a
    language specifier; update that Markdown block (the triple-backtick block
    containing "tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m
    examples/ask_and_execute.dip") to include the "bash" language identifier so it
    reads ```bash at the start of the block for proper syntax highlighting and
    markdownlint compliance.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `fc03ed77-660c-4601-8cea-c129e708fdd5`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between a1939ab3203ab02ac3161ab321c6b39e1fdc99d5 and 6345d72d8773c39e7b155aabd417a18c2e9ffa03.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `README.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
},
})
if result.Status == pipeline.OutcomeBudgetExceeded {
log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README code sample uses %s to print result.Cost.LimitsHit, but LimitsHit is a []string in the API, so this snippet won't compile if copied verbatim. Consider using %v or strings.Join(result.Cost.LimitsHit, ", ") to match the actual type.

Suggested change
log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
log.Printf("halt: %v, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)

Copilot uses AI. Check for mistakes.
Comment thread pipeline/engine_run.go
Comment on lines 424 to +429
traceEntry.EdgeTo = target
s.trace.AddEntry(*traceEntry)
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
return "", false, lr.result, nil
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Budget enforcement is applied on the retry path in handleRetryWithinBudget (emitCostUpdate + checkBudgetAfterEmit), but the retry-exhausted path (handleRetryExhausted) still appends the retry TraceEntry and then continues to fallback/fail without running the BudgetGuard. This can let a run continue (via fallback_retry_target) even after the last retry attempt pushed totals over MaxTotalTokens/MaxCostCents/MaxWallTime, and it will also skip emitting a final cost_updated snapshot for that attempt. Consider invoking emitCostUpdate(s) and checkBudgetAfterEmit(s) immediately after s.trace.AddEntry(*traceEntry) in handleRetryExhausted (both fallback and no-fallback branches), returning OutcomeBudgetExceeded when tripped.

Copilot uses AI. Check for mistakes.
Comment thread cmd/tracker/summary.go
Comment on lines +375 to +391
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
fmt.Println()
fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
if len(result.BudgetLimitsHit) > 0 {
fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
}
if tracker != nil {
total := tracker.TotalUsage()
if total.InputTokens > 0 || total.OutputTokens > 0 {
totalToks := total.InputTokens + total.OutputTokens
fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
if total.EstimatedCost > 0 {
fmt.Printf(", $%.4f", total.EstimatedCost)
}
fmt.Println()
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printBudgetHaltBanner calculates "spent" from the passed TokenTracker only. This will under-report (or omit) spend when TokenTracker isn't populated (e.g., claude-code/acp backends or any case where usage is only available via trace stats), even though EngineResult.Usage/Trace.AggregateUsage contains the authoritative totals used by BudgetGuard. Consider rendering tokens/cost from result.Usage (or result.Trace.AggregateUsage()) as the primary source, and only using TokenTracker as a fallback.

Suggested change
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
fmt.Println()
fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
if len(result.BudgetLimitsHit) > 0 {
fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
}
if tracker != nil {
total := tracker.TotalUsage()
if total.InputTokens > 0 || total.OutputTokens > 0 {
totalToks := total.InputTokens + total.OutputTokens
fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
if total.EstimatedCost > 0 {
fmt.Printf(", $%.4f", total.EstimatedCost)
}
fmt.Println()
}
// budgetUsageForDisplay returns the best available aggregate usage for summary output.
// Prefer usage captured on the EngineResult/trace, and only fall back to TokenTracker.
func budgetUsageForDisplay(result *pipeline.EngineResult, tracker *llm.TokenTracker) (llm.TokenUsage, bool) {
if result != nil {
total := result.Usage
if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
return total, true
}
total = result.Trace.AggregateUsage()
if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
return total, true
}
}
if tracker != nil {
total := tracker.TotalUsage()
if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
return total, true
}
}
return llm.TokenUsage{}, false
}
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
fmt.Println()
fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
if result != nil && len(result.BudgetLimitsHit) > 0 {
fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
}
if total, ok := budgetUsageForDisplay(result, tracker); ok {
totalToks := total.InputTokens + total.OutputTokens
fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
if total.EstimatedCost > 0 {
fmt.Printf(", $%.4f", total.EstimatedCost)
}
fmt.Println()

Copilot uses AI. Check for mistakes.
Comment thread cmd/tracker/diagnose.go
Comment on lines +182 to +183
fmt.Fprintln(w, " suggestion: raise the relevant --max-tokens, --max-cost, or --max-wall-time flag,")
fmt.Fprintln(w, " or remove the Config.Budget value in your pipeline configuration")
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The budget-halt suggestion text is a bit misleading for CLI users and graph-attr based budgets: it says to "remove the Config.Budget value", but budget limits can also come from --max-* flags (and may later come from workflow attrs). Consider rephrasing to something like "remove or raise the configured budget limits" and mention the relevant source(s) (flags vs library Config).

Suggested change
fmt.Fprintln(w, " suggestion: raise the relevant --max-tokens, --max-cost, or --max-wall-time flag,")
fmt.Fprintln(w, " or remove the Config.Budget value in your pipeline configuration")
fmt.Fprintln(w, " suggestion: raise or remove the configured budget limits")
fmt.Fprintln(w, " (--max-tokens, --max-cost, --max-wall-time, or pipeline/library budget config)")

Copilot uses AI. Check for mistakes.
@clintecker clintecker merged commit ad9ce91 into main Apr 15, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: ensure full token/cost data exposed from library fix(pipeline): no pipeline-level token budget or cost ceiling

2 participants