feat: token & cost governance (library exposure + pipeline ceilings) #68

clintecker merged 18 commits into `main`.
Conversation
Adds CostByProvider and TotalCostUSD methods that map accumulated per-provider Usage to dollar cost via a caller-supplied ModelResolver callback. Enables the library API to expose cost breakdowns without the llm package knowing how pipelines resolve models. Refs #62
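The shape of that API can be sketched as follows. The names `CostByProvider`, `ModelResolver`, and `ProviderCost` come from this PR; the `Usage` fields, the price table, and the `estimateCost` helper are illustrative assumptions, not the library's actual definitions.

```go
package main

import "fmt"

// Usage is a stand-in for the library's accumulated token counts
// (field names are assumptions for this sketch).
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// ModelResolver maps a provider name to the model the pipeline ran.
// The caller supplies it, so the llm package never needs to know how
// pipelines resolve models.
type ModelResolver func(provider string) string

// ProviderCost pairs raw usage with the resolved model and its dollar cost.
type ProviderCost struct {
	Usage Usage
	Model string
	USD   float64
}

// estimateCost is a hypothetical per-model price table (USD per million tokens).
func estimateCost(model string, u Usage) float64 {
	perMillion := map[string]struct{ in, out float64 }{
		"claude-sonnet": {3.0, 15.0},
	}
	p := perMillion[model]
	return float64(u.InputTokens)/1e6*p.in + float64(u.OutputTokens)/1e6*p.out
}

// CostByProvider maps accumulated per-provider usage to dollar cost via the
// caller-supplied resolver, as the commit message describes.
func CostByProvider(usage map[string]Usage, resolve ModelResolver) map[string]ProviderCost {
	out := make(map[string]ProviderCost, len(usage))
	for provider, u := range usage {
		var model string
		if resolve != nil {
			model = resolve(provider)
		}
		out[provider] = ProviderCost{Usage: u, Model: model, USD: estimateCost(model, u)}
	}
	return out
}

func main() {
	usage := map[string]Usage{"anthropic": {InputTokens: 2_000_000, OutputTokens: 100_000}}
	costs := CostByProvider(usage, func(provider string) string { return "claude-sonnet" })
	fmt.Printf("%.2f\n", costs["anthropic"].USD) // 2M in * $3/M + 0.1M out * $15/M = 7.50
}
```

The resolver callback is the key design choice: cost attribution stays in the caller's hands, so the tracker never embeds pipeline-specific model lookup.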
Adds SessionStats.Provider and UsageSummary.ProviderTotals so consumers (library callers, TUI, CLI summary) can see per-provider token and cost breakdowns from the trace alone, without needing to cross-reference the llm.TokenTracker middleware. Plumbing: SessionResult.Provider populated from SessionConfig.Provider in session.Run(), copied to SessionStats by buildSessionStats(). Refs #62 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Result.Cost populated from the llm.TokenTracker middleware via a per-provider cost rollup resolved against the graph's default llm_model attribute. Library consumers can now read per-provider token and dollar cost directly off the Result without cross-referencing trace entries. Refs #62
Adds a new PipelineEvent type carrying a CostSnapshot payload (TotalTokens, TotalCostUSD, per-provider breakdown, wall elapsed). Emitted after every terminal node outcome so streaming consumers can live-update cost dashboards without polling Trace.AggregateUsage. Also declares EventBudgetExceeded for use by the BudgetGuard in the follow-up task. Refs #62 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
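A consumer-side sketch of subscribing to such a stream: the `CostSnapshot` field names follow the commit message (TotalTokens, TotalCostUSD, per-provider breakdown, wall elapsed), while the event struct, kind strings, and handler shape are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// CostSnapshot mirrors the payload named in the commit message; exact
// field types are assumptions for this sketch.
type CostSnapshot struct {
	TotalTokens    int
	TotalCostUSD   float64
	ProviderTotals map[string]int
	WallElapsed    time.Duration
}

// PipelineEvent is a minimal stand-in for the engine's event type.
type PipelineEvent struct {
	Kind     string // e.g. "cost_updated" or "budget_exceeded"
	Snapshot CostSnapshot
}

// dashboard keeps only the latest snapshot, so it never needs to poll
// Trace.AggregateUsage — each terminal-node event pushes fresh totals.
type dashboard struct{ latest CostSnapshot }

func (d *dashboard) Handle(ev PipelineEvent) {
	if ev.Kind == "cost_updated" {
		d.latest = ev.Snapshot
	}
}

func main() {
	d := &dashboard{}
	// Simulate two terminal-node outcomes emitting cost updates.
	d.Handle(PipelineEvent{Kind: "cost_updated", Snapshot: CostSnapshot{TotalTokens: 1200, TotalCostUSD: 0.03}})
	d.Handle(PipelineEvent{Kind: "cost_updated", Snapshot: CostSnapshot{TotalTokens: 2600, TotalCostUSD: 0.07}})
	fmt.Println(d.latest.TotalTokens) // 2600
}
```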
Adds BudgetLimits/BudgetGuard/BudgetBreach in a new pipeline/budget.go file, wires OutcomeBudgetExceeded into the run loop immediately after every terminal-node emitCostUpdate, and surfaces EventBudgetExceeded with the current CostSnapshot payload. EngineResult gains BudgetLimitsHit (list of dimension names that tripped) populated on halt. Token and cost ceilings are inclusive — exact equality is not a breach. The exit-node path uses a thin checkBudgetHaltForExit wrapper that returns *EngineResult (matching handleExitNode's signature) rather than loopResult; the advanceToNextNode path uses checkBudgetAfterEmit directly and returns *loopResult inline — both bubble up through processExitNode/processActiveNode to the main Run loop. Refs #17 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regression test that workflow-level max_total_tokens, max_cost_cents, and max_wall_time land on graph.Attrs with their exact keys, so the library can read them without further mapping. The test documents the expected behavior for when dippin-lang v0.x+ adds MaxTotalTokens, MaxCostCents, and MaxWallTime fields to WorkflowDefaults. Currently these fields don't exist in dippin-lang v0.18.0, so the test uses a manual graph construction to verify the adapter would handle them correctly. Refs #17 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 11cbfaf.
Library consumers set Config.Budget = pipeline.BudgetLimits{...} to
enforce token/cost/wall-time ceilings on a run. When any limit is
tripped, Result.Status becomes OutcomeBudgetExceeded and Result.Cost.LimitsHit
names the dimension(s) that halted the run.
Graph-attr fallback (reading limits from .dip workflow attrs) is a
follow-up blocked on dippin-lang IR support — tracked in #67.
Refs #17
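A hedged sketch of the consumer-facing flow the commit describes: the `BudgetLimits` dimensions match the flags and attrs named elsewhere in this PR, but the reduced `Config`/`Result` shapes and the `run` simulation are illustrative, not the library's real API.

```go
package main

import (
	"fmt"
	"time"
)

// BudgetLimits follows the dimensions named in this PR; a zero value
// means "no limit", matching the CLI flag semantics.
type BudgetLimits struct {
	MaxTotalTokens int
	MaxCostCents   int
	MaxWallTime    time.Duration
}

// Config and Result are reduced stand-ins for the library types.
type Config struct{ Budget BudgetLimits }

type Result struct {
	Status    string
	LimitsHit []string // dimension names that halted the run
}

// run simulates a pipeline that has already spent spentTokens; it halts
// with a budget-exceeded status when a ceiling trips.
func run(cfg Config, spentTokens int) Result {
	if cfg.Budget.MaxTotalTokens > 0 && spentTokens > cfg.Budget.MaxTotalTokens {
		return Result{Status: "budget_exceeded", LimitsHit: []string{"tokens"}}
	}
	return Result{Status: "success"}
}

func main() {
	cfg := Config{Budget: BudgetLimits{MaxTotalTokens: 100_000, MaxCostCents: 500, MaxWallTime: 30 * time.Minute}}
	res := run(cfg, 150_000)
	fmt.Println(res.Status, res.LimitsHit) // budget_exceeded [tokens]
}
```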
Three new flags on 'tracker run' populate Config.Budget, routing through pipeline.BudgetGuard. When a limit trips, the summary surfaces a 'HALTED: budget exceeded' section naming the dimension and showing the accumulated spend. Refs #17 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- events_jsonl.go now includes CostSnapshot fields (TotalTokens, TotalCostUSD, ProviderTotals, WallElapsedMs) so cost_updated and budget_exceeded events in activity.jsonl carry their payload.
- buildLogEntry refactored from method to package-level function for direct testability; JSONLEventHandler.HandlePipelineEvent delegates to it.
- tracker diagnose scans activity.jsonl for budget_exceeded events and prints a dedicated halt section with token/cost/wall-time and remediation guidance before the standard node-failure blocks.

Refs #17 #62 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough

Adds per-provider cost accounting and streaming cost events, exposes run-level CostReport, and implements a pipeline-level BudgetGuard enforcing token/cost/wall-time ceilings that can halt execution; wires CLI flags, diagnostics, and result/trace fields to surface budget behavior and cost snapshots.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/CLI
    participant Engine as Pipeline Engine
    participant Node as Node Handler
    participant Budget as Budget Guard
    participant Events as Event Handler
    participant Token as TokenTracker
    Client->>Engine: Run(graph, options)
    loop For each node
        Engine->>Node: Execute node
        Node-->>Engine: SessionResult (provider, tokens)
        Engine->>Token: AddUsage(provider, tokens)
        Engine->>Engine: Add trace entry
        Engine->>Engine: emitCostUpdate()
        Engine->>Events: EventCostUpdated + CostSnapshot
        Engine->>Budget: Check(UsageSummary, started)
        alt Budget exceeded
            Budget-->>Engine: BudgetBreach
            Engine->>Engine: Set EndTime, Status=BudgetExceeded
            Engine->>Events: EventBudgetExceeded + CostSnapshot
            Engine-->>Client: Halt execution
        else Within limits
            Budget-->>Engine: BudgetOK
            Engine->>Engine: Continue
        end
    end
    Engine-->>Client: EngineResult (with BudgetLimitsHit)
    Client->>Client: Tracker populates Result.Cost from TokenTracker.CostByProvider()
```
```mermaid
sequenceDiagram
    participant Consumer as Library Consumer
    participant Tracker as Tracker Library
    participant Guard as Budget Guard
    participant Engine as Pipeline Engine
    participant Token as TokenTracker
    Consumer->>Tracker: Run(config{Budget})
    Tracker->>Guard: NewBudgetGuard(config.Budget)
    Tracker->>Engine: WithBudgetGuard(guard) and Run(graph)
    Engine->>Token: AddUsage(provider, usage)
    Engine->>Guard: Check(aggregated_usage, time)
    alt Exceeded
        Guard-->>Engine: BudgetBreach -> EngineResult.Status=BudgetExceeded
    else OK
        Guard-->>Engine: BudgetOK -> continue
    end
    Engine-->>Tracker: EngineResult
    Tracker->>Token: CostByProvider(resolver)
    Token-->>Tracker: Per-provider costs
    Tracker-->>Consumer: Result{Cost: {TotalUSD, ByProvider, LimitsHit}}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74334f7c23
```go
return loopResult{
	action: loopReturn,
	result: &EngineResult{
		RunID:  s.runID,
		Status: OutcomeBudgetExceeded,
```
Persist checkpoint before returning budget halt
haltForBudget returns immediately without saving checkpoint state. Because the guard is evaluated before advanceToNextNode writes the updated checkpoint, a budget-triggered stop can leave the on-disk state behind the in-memory completion, so --resume may re-run the last completed node and repeat side effects/cost instead of continuing from the correct frontier.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
```go
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return *lr
```
Enforce budget checks on retry paths
Budget enforcement is only invoked in the normal advance/exit flow, not in retry handling. When a node returns OutcomeRetry, execution goes through handleRetryWithinBudget/handleRetryExhausted without calling checkBudgetAfterEmit, so repeated retries can exceed token/cost limits and still continue (or end as OutcomeFail) instead of halting with OutcomeBudgetExceeded.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
```go
if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
	return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
}
if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
```
Compare cost ceilings without rounding down overages
The cost breach check uses math.Round(usage.TotalCostUSD*100), which can hide real overages below 0.5¢ (for example, 1.4¢ rounds to 1¢ and passes a 1-cent ceiling). That violates the stated "exceeds this value" behavior for MaxCostCents and can let runs overspend without tripping the guard.
By design. math.Round gives us cent-level granularity, which matches the MaxCostCents unit. Sub-0.5¢ values round down, but the alternative (Ceil) would false-trip on any fractional cent above zero. The original bug was int() truncation dropping whole cents — Round fixes that without over-triggering.
Actionable comments posted: 9
🧹 Nitpick comments (1)
cmd/tracker/run.go (1)
129-131: Consider deduplicating budget-guard option wiring.

Both engine builders now repeat the same guard-append block. A tiny helper would reduce drift risk between console and TUI paths.
♻️ Possible refactor

```diff
 func buildEngineOptions(artifactDir, checkpoint string, evtHandler pipeline.PipelineEventHandler) []pipeline.EngineOption {
 @@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		opts = append(opts, pipeline.WithBudgetGuard(guard))
-	}
+	opts = appendBudgetGuardOption(opts, activeBudgetLimits)
 	return opts
 }
 @@
 func buildTUIEngine(graph *pipeline.Graph, registry *pipeline.HandlerRegistry, artifactDir, checkpoint string, pipelineCombo pipeline.PipelineEventHandler) *pipeline.Engine {
 @@
-	if guard := pipeline.NewBudgetGuard(activeBudgetLimits); guard != nil {
-		engineOpts = append(engineOpts, pipeline.WithBudgetGuard(guard))
-	}
+	engineOpts = appendBudgetGuardOption(engineOpts, activeBudgetLimits)
 	return pipeline.NewEngine(graph, registry, engineOpts...)
 }
+
+func appendBudgetGuardOption(opts []pipeline.EngineOption, limits pipeline.BudgetLimits) []pipeline.EngineOption {
+	if guard := pipeline.NewBudgetGuard(limits); guard != nil {
+		return append(opts, pipeline.WithBudgetGuard(guard))
+	}
+	return opts
+}
```

Also applies to: 415-417
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/run.go` around lines 129 - 131, Both engine-builder code paths duplicate the same budget-guard wiring (calling pipeline.NewBudgetGuard and conditionally appending pipeline.WithBudgetGuard); extract that logic into a small helper (e.g., buildBudgetGuardOpts or appendBudgetGuardIfAny) and call it from both engine builders so the guard creation and conditional append happen in one place, referencing pipeline.NewBudgetGuard and pipeline.WithBudgetGuard to locate the duplicated block and update both sites to use the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 8-22: Update the CHANGELOG.md Unreleased section to include a "###
Fixed" subsection and move the existing non-feature caveat into a standard
heading (e.g., under "### Notes" should instead be folded into "### Changed" or
"### Fixed" as appropriate); specifically add one or more brief bullet(s) under
"### Fixed" describing the behavioral fixes shipped in this PR, ensure the file
follows the "Keep a Changelog" headings order (Added, Changed, Fixed, Removed)
and remove the custom "Notes" header or convert its content into the appropriate
standard heading; reference the existing Unreleased entries such as
`Result.Cost`, `pipeline.BudgetGuard`, `tracker.Config.Budget`, CLI flags on
`tracker run`, and pipeline events when deciding whether each item belongs under
Added vs Changed/Fixed.
In `@cmd/tracker/diagnose.go`:
- Around line 87-92: The budgetHalt struct currently lacks the
breached-dimension message so tracker diagnose drops the
EventBudgetExceeded.Message; add a Reason string field to budgetHalt, populate
it from the EventBudgetExceeded.Message wherever budgetHalt instances are
created (the branches around the existing EventBudgetExceeded handling noted in
the diff), and update printBudgetHalt to print "reason: <Reason>" only when
Reason is non-empty so the diagnose output shows whether tokens, cost, or wall
time triggered the halt.
In `@cmd/tracker/flags.go`:
- Around line 135-137: parseRunFlags must reject negative values for the three
budget flags: cfg.maxTokens, cfg.maxCostCents, and cfg.maxWallTime; add
validation in the parseRunFlags function after flag parsing that checks
cfg.maxTokens >= 0, cfg.maxCostCents >= 0 and cfg.maxWallTime >= 0 (or >0? keep
as >= 0 per spec) and return a descriptive error if any are negative so the CLI
fails fast with a clear message. Ensure the error mentions which flag is invalid
(e.g., "max-tokens must be >= 0") and hook this validation into the existing
error path of parseRunFlags.
In `@docs/superpowers/plans/2026-04-14-token-cost-governance.md`:
- Around line 7-8: The plan incorrectly schedules reading budget attributes from
`.dip` workflow attrs and assigns adapter pass-through / graph-attr fallback
work even though that capability is blocked by issue `#67`; update the doc (Part 3
and any sections referencing max_total_tokens, max_cost_cents, max_wall_time,
BudgetGuard, and the adapter pass-through/fallback work) to remove or mark as
deferred the `.dip` budget-attrs scheduling, stop assigning follow-up work that
depends on `.dip` IR support, and explicitly state that graph-level budget
loading is blocked by issue `#67` so the next worker is not sent prematurely.
In `@llm/token_tracker_cost.go`:
- Around line 21-33: The read lock is held while calling external callbacks
(resolve and EstimateCost); instead snapshot t.usage into a local map under
t.mu.RLock()/RUnlock(), then release the lock and iterate the snapshot to build
the out map, calling resolve(provider) and EstimateCost(model, usage) outside
the lock; update references to t.mu, t.usage, resolve, ProviderCost and
EstimateCost accordingly so only the minimal copy is done while locked and all
resolver/cost work happens after releasing the lock.
In `@pipeline/budget.go`:
- Around line 73-99: The current logic in BudgetGuard.Check and checkUsage only
returns the first BudgetBreach it encounters, so multiple simultaneous limits
(tokens, cost, wall time) are never reported; update these functions to evaluate
all limit checks and accumulate every violated dimension into the returned
breach (e.g., populate BudgetBreach.LimitsHit or create a slice of breaches)
instead of early-returning on the first match—specifically, change checkUsage
(referencing g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.
In `@pipeline/engine.go`:
- Around line 310-313: The retry and fallback/redirect paths (specifically in
handleRetryWithinBudget, handleRetryExhausted, and the retry/fallback branches
at the top of handleExitNode) append trace usage and continue without calling
emitCostUpdate and checkBudgetAfterEmit, letting retries bypass BudgetGuard
ceilings (MaxTotalTokens/MaxCostCents); fix by applying the same pattern used on
the normal edge-advance path: after appending trace usage in those branches call
e.emitCostUpdate(s) and then if lr := e.checkBudgetAfterEmit(s); lr != nil {
return *lr } so the BudgetGuard enforcement runs consistently across
retry/fallback/redirect flows.
In `@README.md`:
- Around line 397-404: Replace the incorrect "tracker run ..." example with the
documented root-command syntax so copy-paste works; update the snippet to use
"tracker [flags] <pipeline>" by moving flags before the pipeline path (e.g.,
change "tracker run examples/ask_and_execute.dip --max-tokens 100000 --max-cost
500 --max-wall-time 30m" to "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m examples/ask_and_execute.dip"), ensuring references to the
flags (--max-tokens, --max-cost, --max-wall-time) and the pipeline file
(examples/ask_and_execute.dip) remain.
In `@tracker.go`:
- Around line 439-456: populateResultTokensAndCost currently re-computes cost
via tokenTracker and defaultModelResolver which misestimates when per-node/model
usage is recorded in engineResult. Instead, change populateResultTokensAndCost
to prefer engineResult.Usage when present: if engineResult != nil and
engineResult.Usage has provider entries, build Result.Cost from that usage
(aggregate TotalUSD and set ByProvider from engineResult.Usage) and skip the
tokenTracker cost path and defaultModelResolver; otherwise fall back to the
existing tokenTracker logic (keep result.TokensByProvider =
e.tokenTracker.AllProviderUsage() behavior). Use the existing CostReport struct
shape and fields (TotalUSD, ByProvider) when constructing the cost from
pipeline.EngineResult.Usage.
📒 Files selected for processing (29)
CHANGELOG.md, CLAUDE.md, README.md, agent/result.go, agent/session.go, agent/session_test.go, cmd/tracker/commands.go, cmd/tracker/diagnose.go, cmd/tracker/flags.go, cmd/tracker/main.go, cmd/tracker/run.go, cmd/tracker/summary.go, docs/superpowers/plans/2026-04-14-token-cost-governance.md, llm/token_tracker_cost.go, llm/token_tracker_cost_test.go, pipeline/budget.go, pipeline/budget_test.go, pipeline/engine.go, pipeline/engine_run.go, pipeline/engine_test.go, pipeline/events.go, pipeline/events_jsonl.go, pipeline/events_jsonl_test.go, pipeline/handlers/transcript.go, pipeline/handlers/transcript_test.go, pipeline/trace.go, pipeline/trace_test.go, tracker.go, tracker_test.go
```markdown
## [Unreleased]

### Added

- `Result.Cost` on the library API with per-provider rollup (`map[string]llm.ProviderCost`) and `TotalUSD`. Populated from the `llm.TokenTracker` middleware and priced via `llm.EstimateCost`. Closes #62.
- `pipeline.BudgetGuard` enforcing `MaxTotalTokens`, `MaxCostCents`, and `MaxWallTime` limits. Halts the run with `pipeline.OutcomeBudgetExceeded` when any dimension trips. Closes #17.
- New `tracker.Config.Budget` field (type `pipeline.BudgetLimits`) for library consumers.
- New CLI flags on `tracker run`: `--max-tokens`, `--max-cost` (cents), `--max-wall-time`.
- New pipeline events `cost_updated` (streaming per-node cost snapshots) and `budget_exceeded` (fired on halt). Both carry a `CostSnapshot` payload with `TotalTokens`, `TotalCostUSD`, `ProviderTotals`, and `WallElapsed`.
- `tracker diagnose` surfaces a "Budget halt detected" section when a run halts on budget.
- `UsageSummary.ProviderTotals` (per-provider token and cost rollup) on `pipeline.Trace.AggregateUsage()` output.

### Notes

- Reading budget limits from `.dip` workflow attrs is blocked on dippin-lang IR support; tracked in #67.
```
The Unreleased entry is still missing the Fixed bucket.
This PR includes behavior fixes as well as features, but the changelog only records Added items and introduces a custom Notes section. Please fold the #67 caveat into one of the standard headings and add the shipped fixes here. As per coding guidelines: keep CHANGELOG.md updated with every feature, fix, and breaking change in the same PR as the code change, and use Keep a Changelog format — group entries under Added, Changed, Fixed, Removed.
@clintecker, that's a reasonable distinction — intra-PR corrections to newly introduced code are not the same as regression fixes against previously-released behavior, so omitting a Fixed bucket makes sense here. I'll drop that concern.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: `tracker version` shows commit hash, build time, and which providers are configured. Uses Go VCS metadata for `go install` builds, GoReleaser ldflags for releases.
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Applies to CHANGELOG.md : Keep CHANGELOG.md updated with every feature, fix, and breaking change using Keep a Changelog format. Group entries under Added, Changed, Fixed, Removed. Update the changelog in the same PR as the code change.
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.
```go
fs.IntVar(&cfg.maxTokens, "max-tokens", 0, "Halt if total tokens across the run exceed this value (0 = no limit)")
fs.IntVar(&cfg.maxCostCents, "max-cost", 0, "Halt if total cost in cents exceeds this value (0 = no limit)")
fs.DurationVar(&cfg.maxWallTime, "max-wall-time", 0, "Halt if pipeline wall time exceeds this duration (0 = no limit)")
```
Reject negative budget limits during CLI parsing.
Line 135–137 introduces limits but allows negative values, which can lead to confusing behavior (e.g., immediate budget halt). Please validate all three flags as >= 0 in parseRunFlags.
🔧 Proposed fix

```diff
 func parseRunFlags(args []string, cfg runConfig) (runConfig, error) {
 @@
 	if err := validateBackend(cfg.backend); err != nil {
 		return cfg, err
 	}
+	if cfg.maxTokens < 0 {
+		return cfg, fmt.Errorf("invalid --max-tokens: must be >= 0")
+	}
+	if cfg.maxCostCents < 0 {
+		return cfg, fmt.Errorf("invalid --max-cost: must be >= 0")
+	}
+	if cfg.maxWallTime < 0 {
+		return cfg, fmt.Errorf("invalid --max-wall-time: must be >= 0")
+	}
 	return cfg, nil
 }
```
Fixed in a1939ab. parseRunFlags now validates non-negative values.
@clintecker, thanks for the fix! The validateBudgetLimits check in parseRunFlags correctly rejects negative values for all three budget flags with a clear error message.
> **Architecture:** Part 1 promotes `llm.TokenTracker` from a CLI-summary helper to a first-class library output by surfacing its per-provider totals on `Result` / `EngineResult`, pricing each provider via `llm.EstimateCost`, and emitting a new `EventCostUpdated` event after every trace entry so consumers see streaming updates. Part 2 adds a `BudgetGuard` evaluated inside the engine loop after each node's outcome is applied; on breach it writes a terminal trace entry, emits `EventBudgetExceeded`, and returns an `EngineResult` with status `OutcomeBudgetExceeded`. Part 3 wires graph attrs (`max_total_tokens`, `max_cost_cents`, `max_wall_time`) and CLI flags that override them.
Scope drift: this plan still schedules .dip budget attrs as if they shipped here.
The rest of this PR explicitly treats graph-level budget loading as blocked by #67, but this plan still assigns adapter pass-through and graph-attr fallback work. That will send the next worker after unsupported behavior instead of the intended follow-up. Based on learnings: reading budgets from `.dip` workflow attrs is blocked on dippin-lang IR support (issue #67).
Also applies to: 30-31, 773-927
Fair point — the plan is a historical artifact that predates the scope reduction. Tasks 6/7 were adjusted during execution. Adding a note to the plan doc would be revisionism; it serves as a record of the original intent. #67 tracks the follow-up.
@clintecker, that's a fair framing — keeping the plan as an unmodified record of original intent makes sense, and #67 is the right place to track the deferred .dip attrs work.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
```go
t.mu.RLock()
defer t.mu.RUnlock()
out := make(map[string]ProviderCost, len(t.usage))
for provider, usage := range t.usage {
	var model string
	if resolve != nil {
		model = resolve(provider)
	}
	out[provider] = ProviderCost{
		Usage: usage,
		Model: model,
		USD:   EstimateCost(model, usage),
	}
```
🧩 Analysis chain

Scripts executed against `2389-research/tracker` to verify the finding:

```shell
# Find and read the actual file to confirm the code snippet
find . -name "token_tracker_cost.go" -type f

# Read the relevant file section
if [ -f "./llm/token_tracker_cost.go" ]; then
  cat -n ./llm/token_tracker_cost.go | head -60
fi

# Search for the ModelResolver type definition
rg "type ModelResolver" --type go

# Search for CostByProvider calls to understand usage patterns
rg "CostByProvider" --type go -B 2 -A 2

# Check the TokenTracker type and its methods
rg "type TokenTracker" --type go -A 20
rg "func \(t \*TokenTracker\)" --type go

# Check what defaultModelResolver does, and look for ModelResolver
# implementations that might call back into TokenTracker
rg "defaultModelResolver" --type go -A 10
rg "ModelResolver" --type go -B 3 -A 5

# Check for concurrent writes to TokenTracker during CostByProvider calls
rg "AddUsage\|ProviderUsage\|TotalUsage" --type go -B 2 -A 2 | head -100

# Get full context of tracker.go usage around CostByProvider
rg -B 5 -A 5 "CostByProvider\(resolver\)" tracker.go

# Check the EstimateCost implementation for slow or blocking behavior
rg "func EstimateCost" --type go -A 15

# Check tests that exercise the tracker concurrently
rg "TestTokenTracker" --type go | head -20
cat -n ./llm/token_tracker.go | head -80
cat -n ./llm/token_tracker_test.go | grep -A 30 "TestTokenTrackerConcurrentSafe"

# Verify defaultModelResolver doesn't call back into TokenTracker or do blocking I/O
rg "Graph\|Attrs" tracker.go | head -20
```
Hold the read lock only while copying t.usage; release before calling resolve and EstimateCost.
While the current ModelResolver implementation (defaultModelResolver) is trivial and safe, holding a lock during external callbacks is not a defensive pattern. If the resolver logic ever becomes complex or calls back into the tracker, this could cause contention or re-entrancy issues. Snapshot the usage map under lock, then release before computing costs.
Suggested fix

```diff
 func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
-	t.mu.RLock()
-	defer t.mu.RUnlock()
-	out := make(map[string]ProviderCost, len(t.usage))
-	for provider, usage := range t.usage {
+	t.mu.RLock()
+	usageByProvider := make(map[string]Usage, len(t.usage))
+	for provider, usage := range t.usage {
+		usageByProvider[provider] = usage
+	}
+	t.mu.RUnlock()
+
+	out := make(map[string]ProviderCost, len(usageByProvider))
+	for provider, usage := range usageByProvider {
 		var model string
 		if resolve != nil {
 			model = resolve(provider)
 		}
 		out[provider] = ProviderCost{
 			Usage: usage,
 			Model: model,
 			USD:   EstimateCost(model, usage),
 		}
 	}
 	return out
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (t *TokenTracker) CostByProvider(resolve ModelResolver) map[string]ProviderCost {
	t.mu.RLock()
	usageByProvider := make(map[string]Usage, len(t.usage))
	for provider, usage := range t.usage {
		usageByProvider[provider] = usage
	}
	t.mu.RUnlock()

	out := make(map[string]ProviderCost, len(usageByProvider))
	for provider, usage := range usageByProvider {
		var model string
		if resolve != nil {
			model = resolve(provider)
		}
		out[provider] = ProviderCost{
			Usage: usage,
			Model: model,
			USD:   EstimateCost(model, usage),
		}
	}
	return out
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@llm/token_tracker_cost.go` around lines 21 - 33, The read lock is held while
calling external callbacks (resolve and EstimateCost); instead snapshot t.usage
into a local map under t.mu.RLock()/RUnlock(), then release the lock and iterate
the snapshot to build the out map, calling resolve(provider) and
EstimateCost(model, usage) outside the lock; update references to t.mu, t.usage,
resolve, ProviderCost and EstimateCost accordingly so only the minimal copy is
done while locked and all resolver/cost work happens after releasing the lock.
Acknowledged — the implementation matches the spec and all tests pass.
@clintecker, understood! The current implementation is straightforward and the lock-scope concern is preventive rather than an immediate bug. I'll defer to your judgment here.
```go
func (g *BudgetGuard) Check(usage *UsageSummary, started time.Time) BudgetBreach {
	if g == nil {
		return BudgetBreach{Kind: BudgetOK}
	}
	if breach := g.checkUsage(usage); breach.Kind != BudgetOK {
		return breach
	}
	if g.limits.MaxWallTime > 0 && time.Since(started) > g.limits.MaxWallTime {
		return BudgetBreach{Kind: BudgetWallTime, Message: "max_wall_time exceeded"}
	}
	return BudgetBreach{Kind: BudgetOK}
}

// checkUsage evaluates token and cost limits against a usage snapshot.
// Returns BudgetOK when usage is nil or no limit is breached.
func (g *BudgetGuard) checkUsage(usage *UsageSummary) BudgetBreach {
	if usage == nil {
		return BudgetBreach{Kind: BudgetOK}
	}
	if g.limits.MaxTotalTokens > 0 && usage.TotalTokens > g.limits.MaxTotalTokens {
		return BudgetBreach{Kind: BudgetTokens, Message: "max_total_tokens exceeded"}
	}
	if g.limits.MaxCostCents > 0 && int(math.Round(usage.TotalCostUSD*100)) > g.limits.MaxCostCents {
		return BudgetBreach{Kind: BudgetCost, Message: "max_cost_cents exceeded"}
	}
	return BudgetBreach{Kind: BudgetOK}
}
```
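The ceilings above are inclusive (exact equality is not a breach) and USD is rounded to cents before comparing. A minimal standalone sketch of that boundary semantics — the `limits` type and `breached` helper here are illustrative, not the library's actual API:

```go
package main

import (
	"fmt"
	"math"
)

// limits mirrors the token/cost ceilings described above (hypothetical names).
type limits struct {
	maxTotalTokens int
	maxCostCents   int
}

// breached reports which dimensions exceed their ceilings. Ceilings are
// inclusive: usage exactly at the limit does not trip.
func breached(l limits, totalTokens int, totalCostUSD float64) []string {
	var hit []string
	if l.maxTotalTokens > 0 && totalTokens > l.maxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	// Round dollars to whole cents before comparing, as checkUsage does.
	if l.maxCostCents > 0 && int(math.Round(totalCostUSD*100)) > l.maxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	return hit
}

func main() {
	l := limits{maxTotalTokens: 1000, maxCostCents: 500}
	fmt.Println(breached(l, 1000, 5.00)) // exactly at both ceilings: no breach
	fmt.Println(breached(l, 1001, 5.01)) // one past each ceiling: both trip
}
```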
Only the first exceeded budget dimension is ever surfaced.
Check returns a single BudgetBreach, and checkUsage stops at the first match. In practice BudgetLimitsHit can never contain more than one entry, even when a run exceeds tokens and cost (or cost and wall time) at the same halt point. That loses part of the halt reason the new API is meant to expose.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/budget.go` around lines 73 - 99, The current logic in
BudgetGuard.Check and checkUsage only returns the first BudgetBreach it
encounters, so multiple simultaneous limits (tokens, cost, wall time) are never
reported; update these functions to evaluate all limit checks and accumulate
every violated dimension into the returned breach (e.g., populate
BudgetBreach.LimitsHit or create a slice of breaches) instead of early-returning
on the first match—specifically, change checkUsage (referencing
g.limits.MaxTotalTokens, g.limits.MaxCostCents and
usage.TotalTokens/usage.TotalCostUSD) to append both token and cost violations,
and change Check (referencing g.limits.MaxWallTime and time.Since(started)) to
also include wall-time violations into the same returned BudgetBreach so the
caller receives all exceeded limits at once.
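For illustration only, the collect-all shape the prompt describes could be sketched as follows. All names here are hypothetical; the shipped code deliberately short-circuits on the first breach instead:

```go
package main

import "fmt"

// budgetLimits and usageSnapshot are illustrative stand-ins for the real types.
type budgetLimits struct {
	maxTotalTokens int
	maxCostCents   int
	maxWallSeconds int
}

type usageSnapshot struct {
	totalTokens int
	costCents   int
	wallSeconds int
}

// limitsHit accumulates every breached dimension rather than returning on the
// first match, so a simultaneous token+cost overrun reports both.
func limitsHit(l budgetLimits, u usageSnapshot) []string {
	var hit []string
	if l.maxTotalTokens > 0 && u.totalTokens > l.maxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	if l.maxCostCents > 0 && u.costCents > l.maxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	if l.maxWallSeconds > 0 && u.wallSeconds > l.maxWallSeconds {
		hit = append(hit, "max_wall_time")
	}
	return hit
}

func main() {
	l := budgetLimits{maxTotalTokens: 100, maxCostCents: 50, maxWallSeconds: 60}
	fmt.Println(limitsHit(l, usageSnapshot{totalTokens: 150, costCents: 75, wallSeconds: 10}))
	// [max_total_tokens max_cost_cents]
}
```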
By design — Check short-circuits on the first breached dimension because the run halts immediately. Surfacing all breached dimensions would require a collect-all-then-halt pattern, which adds complexity for a scenario that rarely matters in practice (the user raises the limit that fired). Happy to revisit if users report confusion.
@clintecker, understood — that's a reasonable trade-off. Since the pipeline halts at the first breach, collecting additional violated dimensions at that same instant would require a fundamentally different control flow with no practical benefit until a user actually hits it. I'll note this for future reviews.
✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: pipeline/budget.go:73-99
Timestamp: 2026-04-15T16:41:59.198Z
Learning: In `pipeline/budget.go`, `BudgetGuard.Check` and `checkUsage` intentionally short-circuit on the first breached budget dimension (tokens → cost → wall_time). This is by design: the pipeline halts immediately on the first breach, so collecting all simultaneously-violated dimensions would add complexity for a scenario that rarely matters in practice. The maintainer is open to revisiting if users report confusion.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
```go
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return *lr
}
```
Budget ceilings are bypassed on retry and goal-gate redirect paths.
This check only runs on the normal edge-advance path. The retry flows in handleRetryWithinBudget / handleRetryExhausted and the retry/fallback branches at the top of handleExitNode also append trace usage and continue, but they never call emitCostUpdate or checkBudgetAfterEmit. A node that keeps returning OutcomeRetry can therefore spend past MaxTotalTokens / MaxCostCents before the breaker ever trips.
💸 Suggested fix

```diff
 func (e *Engine) handleRetryWithinBudget(ctx context.Context, s *runState, currentNodeID string, execNode *Node, traceEntry *TraceEntry) (string, bool, *EngineResult, error) {
 	...
 	traceEntry.EdgeTo = target
 	s.trace.AddEntry(*traceEntry)
+	e.emitCostUpdate(s)
+	if lr := e.checkBudgetAfterEmit(s); lr != nil {
+		return "", false, lr.result, lr.err
+	}
 	e.clearDownstream(target, s.cp)
 	...
 }
```

Apply the same pattern to the fallback / redirect branches that add a trace entry and continue execution.
As per coding guidelines "Pipeline engine enforces optional cost ceilings via pipeline.BudgetGuard, evaluated between nodes after emitCostUpdate (v0.17.0+)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/engine.go` around lines 310 - 313, The retry and fallback/redirect
paths (specifically in handleRetryWithinBudget, handleRetryExhausted, and the
retry/fallback branches at the top of handleExitNode) append trace usage and
continue without calling emitCostUpdate and checkBudgetAfterEmit, letting
retries bypass BudgetGuard ceilings (MaxTotalTokens/MaxCostCents); fix by
applying the same pattern used on the normal edge-advance path: after appending
trace usage in those branches call e.emitCostUpdate(s) and then if lr :=
e.checkBudgetAfterEmit(s); lr != nil { return *lr } so the BudgetGuard
enforcement runs consistently across retry/fallback/redirect flows.
Fixed in a1939ab. Budget check now fires on retry paths (handleRetryWithinBudget), and haltForBudget saves the checkpoint before returning.
@clintecker, confirmed in a1939ab — the fix looks correct:
handleRetryWithinBudget:emitCostUpdate+checkBudgetAfterEmitare now called immediately afters.trace.AddEntry, matching the pattern on the normal advance path. Retries can no longer silently overspend pastMaxTotalTokens/MaxCostCents.haltForBudget:saveCheckpointis called first, so a budget-halted run can safely resume from its last completed node without re-executing already-finished work (relevant given the global restart budget described in the checkpoint docs).
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Checkpoints store completed nodes and context snapshots with deterministic edge selection per node for replay. The restart counter is global across the entire run — a fix loop on milestone 1 consumes restart budget that milestone 10 needs.
```go
func (e *Engine) populateResultTokensAndCost(result *Result, engineResult *pipeline.EngineResult) {
	if e.tokenTracker == nil {
		return
	}
	result.TokensByProvider = e.tokenTracker.AllProviderUsage()
	resolver := e.defaultModelResolver()
	byProvider := e.tokenTracker.CostByProvider(resolver)
	if len(byProvider) > 0 {
		total := 0.0
		for _, pc := range byProvider {
			total += pc.USD
		}
		result.Cost = &CostReport{
			TotalUSD:   total,
			ByProvider: byProvider,
		}
	}
}
```
Result.Cost is computed from a model guess instead of the executed usage.
This re-estimates spend from TokenTracker using one graph-level llm_model for every provider. That makes Result.Cost wrong or empty whenever the run uses per-node model overrides, multiple models on the same provider, or a backend/path that records usage in EngineResult.Usage without going through the middleware tracker. The trace already has aggregate provider totals with cost; Result.Cost should prefer engineResult.Usage when it's available.
Based on learnings Token usage flows through llm.Usage → agent.SessionResult.Usage → pipeline.SessionStats → EngineResult.Usage, and CLI summary treats llm.TokenTracker and EngineResult.Usage as independent data sources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracker.go` around lines 439 - 456, populateResultTokensAndCost currently
re-computes cost via tokenTracker and defaultModelResolver which misestimates
when per-node/model usage is recorded in engineResult. Instead, change
populateResultTokensAndCost to prefer engineResult.Usage when present: if
engineResult != nil and engineResult.Usage has provider entries, build
Result.Cost from that usage (aggregate TotalUSD and set ByProvider from
engineResult.Usage) and skip the tokenTracker cost path and
defaultModelResolver; otherwise fall back to the existing tokenTracker logic
(keep result.TokensByProvider = e.tokenTracker.AllProviderUsage() behavior). Use
the existing CostReport struct shape and fields (TotalUSD, ByProvider) when
constructing the cost from pipeline.EngineResult.Usage.
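The preference ordering the prompt describes can be sketched as follows. The types and the `pickCost` helper are illustrative only, not the real `CostReport`/`EngineResult` API:

```go
package main

import "fmt"

// costReport is a stand-in for the library's cost breakdown type.
type costReport struct {
	totalUSD   float64
	byProvider map[string]float64
}

// pickCost prefers the authoritative per-provider totals recorded on the
// engine result and only falls back to the middleware tracker's estimate
// when the engine recorded nothing.
func pickCost(fromEngine, fromTracker map[string]float64) costReport {
	src := fromTracker
	if len(fromEngine) > 0 {
		src = fromEngine
	}
	r := costReport{byProvider: src}
	for _, usd := range src {
		r.totalUSD += usd
	}
	return r
}

func main() {
	engine := map[string]float64{"anthropic": 0.42, "openai": 0.10}
	tracker := map[string]float64{"anthropic": 0.40}
	fmt.Printf("%.2f\n", pickCost(engine, tracker).totalUSD) // 0.52 — engine usage wins
	fmt.Printf("%.2f\n", pickCost(nil, tracker).totalUSD)    // 0.40 — tracker fallback
}
```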
Known limitation, documented in code comments on defaultModelResolver. Per-node model tracking requires plumbing through the trace that doesn't exist yet. Follow-up — the existing design gives accurate numbers for single-model pipelines and zero (not wrong) for unknown models.
@clintecker, understood — thanks for the context. The "zero for unknown models rather than a wrong estimate" contract is a reasonable defensive choice, and the inline doc on defaultModelResolver already captures the per-node limitation. I'll note this for future reviews.
✏️ Learnings added
Learnt from: clintecker
Repo: 2389-research/tracker PR: 68
File: tracker.go:439-456
Timestamp: 2026-04-15T16:42:00.353Z
Learning: In `tracker.go`, `defaultModelResolver` intentionally uses a single graph-level `llm_model` attr for all providers. It returns zero USD (not an incorrect estimate) for unknown/multi-model scenarios. Per-node model tracking in `Result.Cost` is a known future work item requiring trace plumbing that does not yet exist. Do not flag this as a bug.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: 2389-research/tracker PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-14T17:00:24.914Z
Learning: Token usage flows through three layers: llm.Usage per API call → agent.SessionResult.Usage across turns → pipeline.SessionStats in trace → EngineResult.Usage aggregate. For parallel execution, the parallel handler aggregates branch SessionStats.
Pull request overview
Implements token/cost governance across the tracker library + pipeline engine by surfacing per-provider cost in library results, streaming cost snapshots as events, and enforcing optional token/cost/wall-time ceilings that halt execution.
Changes:

- Exposes `Result.Cost` (total USD + per-provider breakdown) and propagates budget halt reasons back to library callers.
- Adds `pipeline.BudgetGuard` + `OutcomeBudgetExceeded`, plus `cost_updated`/`budget_exceeded` events carrying `CostSnapshot` and JSONL logging support.
- Wires budget limits through CLI flags and surfaces budget halts in CLI summary and `tracker diagnose`.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `tracker.go` | Adds `Config.Budget`, `Result.Cost`, and populates tokens/cost + halt reasons from engine result. |
| `tracker_test.go` | Adds integration tests for `Result.Cost` population and config-driven budget halts. |
| `pipeline/budget.go` / `pipeline/budget_test.go` | Introduces `BudgetLimits`/`BudgetGuard` with unit tests for ceilings and boundary rounding. |
| `pipeline/engine.go` | Adds budget guard plumbing, `OutcomeBudgetExceeded`, `BudgetLimitsHit`, and `Engine.Graph()` accessor. |
| `pipeline/engine_run.go` | Emits streaming cost snapshots and halts runs on budget breach (emits `EventBudgetExceeded`). |
| `pipeline/engine_test.go` | Adds integration tests for streaming cost events and budget halting behavior. |
| `pipeline/events.go` | Adds new event types and `CostSnapshot` payload on `PipelineEvent`. |
| `pipeline/events_jsonl.go` / `pipeline/events_jsonl_test.go` | Logs `CostSnapshot` fields into `activity.jsonl` and tests the JSONL entry build. |
| `pipeline/trace.go` / `pipeline/trace_test.go` | Adds `SessionStats.Provider` and aggregates per-provider totals into `UsageSummary.ProviderTotals`. |
| `pipeline/handlers/transcript.go` / `pipeline/handlers/transcript_test.go` | Populates provider into session stats emitted into the trace. |
| `llm/token_tracker_cost.go` / `llm/token_tracker_cost_test.go` | Adds per-provider cost rollup on `TokenTracker` via model resolver + `EstimateCost`. |
| `cmd/tracker/flags.go` / `cmd/tracker/main.go` | Adds `--max-tokens`, `--max-cost`, `--max-wall-time` flags to run config. |
| `cmd/tracker/commands.go` | Plumbs parsed CLI budget limits into the run execution path. |
| `cmd/tracker/run.go` | Attaches `BudgetGuard` to both non-TUI and TUI engine construction via active budget limits. |
| `cmd/tracker/summary.go` | Prints a budget-halt banner when `OutcomeBudgetExceeded` is returned. |
| `cmd/tracker/diagnose.go` | Detects `budget_exceeded` in `activity.jsonl` and prints a budget-halt section. |
| `agent/result.go` / `agent/session.go` / `agent/session_test.go` | Adds provider field to `SessionResult` and ensures it's populated. |
| `README.md` | Documents cost governance usage for library, CLI, and event streaming consumers. |
| `CHANGELOG.md` | Adds unreleased entries for cost exposure, budgets, events, CLI flags, and diagnose behavior. |
| `CLAUDE.md` | Documents the new cost governance flow and where budget enforcement happens. |
| `docs/superpowers/plans/2026-04-14-token-cost-governance.md` | Adds an implementation plan document for the feature set. |
```go
func (e *Engine) emitCostUpdate(s *runState) {
	summary := s.trace.AggregateUsage()
	if summary == nil {
		return
	}
	e.emit(PipelineEvent{
		Type:      EventCostUpdated,
		Timestamp: time.Now(),
		RunID:     s.runID,
		Cost: &CostSnapshot{
			TotalTokens:    summary.TotalTokens,
			TotalCostUSD:   summary.TotalCostUSD,
			ProviderTotals: summary.ProviderTotals,
			WallElapsed:    time.Since(s.trace.StartTime),
		},
	})
```
emitCostUpdate() and checkBudgetAfterEmit() both call trace.AggregateUsage(), which iterates over all trace entries. Since these are invoked after each completed node, this makes run-time cost/budget checking O(n^2) in number of nodes, and also duplicates work (AggregateUsage is computed at least twice per node, plus again in haltForBudget). Consider computing the UsageSummary once after AddEntry and reusing it for both the emitted CostSnapshot and the BudgetGuard check, or maintaining an incremental aggregate on runState/Trace that updates as entries are appended.
Valid perf concern for very long pipelines. Maintaining a running UsageSummary on runState (updated incrementally per node) would eliminate the repeated scan. Not blocking for this PR — typical pipelines are 10-30 nodes. Happy to optimize in a follow-up if profiling shows it matters.
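The incremental rollup discussed above could look like the following sketch. Types and method names are illustrative stand-ins, not the engine's actual `runState`/`Trace`:

```go
package main

import "fmt"

// usageSummary is a stand-in for the aggregate the engine would cache.
type usageSummary struct {
	totalTokens int
	totalUSD    float64
	byProvider  map[string]int
}

type runningTrace struct {
	summary usageSummary
}

type entry struct {
	provider string
	tokens   int
	usd      float64
}

// addEntry updates the aggregate in O(1) per entry, so per-node cost events
// and budget checks can read the cached summary instead of rescanning every
// trace entry (which is O(n^2) over the whole run).
func (t *runningTrace) addEntry(e entry) {
	if t.summary.byProvider == nil {
		t.summary.byProvider = map[string]int{}
	}
	t.summary.totalTokens += e.tokens
	t.summary.totalUSD += e.usd
	t.summary.byProvider[e.provider] += e.tokens
}

func main() {
	var tr runningTrace
	tr.addEntry(entry{"anthropic", 1200, 0.03})
	tr.addEntry(entry{"openai", 800, 0.01})
	fmt.Println(tr.summary.totalTokens) // 2000
}
```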
```go
// Graph returns the graph this engine executes. Used by library callers
// that need to inspect graph attributes after construction.
func (e *Engine) Graph() *Graph { return e.graph }
```
Graph() exposes the engine's internal *Graph pointer directly. Since Graph (and its Attrs/Nodes/Edges maps) is mutable, external callers could mutate it during/after construction and potentially introduce data races or break engine invariants. Consider returning a read-only snapshot (e.g., a copy of Attrs) or clearly preventing mutation by returning a deep copy / wrapper type, especially since Engine.Run is documented as safe for concurrent use in other places.
The graph is immutable by convention after construction — the engine and all handlers treat it as read-only. Adding a copy or interface wrapper adds allocation overhead on every call for no practical safety gain. If we ever need to enforce immutability, a ReadOnlyGraph wrapper is the right approach, but it's not warranted today.
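If enforced immutability were ever needed, the read-only-snapshot alternative mentioned above might look like this sketch (hypothetical names, not the engine's real API):

```go
package main

import "fmt"

// graph and engine are illustrative stand-ins for the pipeline types.
type graph struct {
	attrs map[string]string
}

type engine struct {
	graph *graph
}

// GraphAttrs returns a defensive copy of the attribute map; mutating the
// returned map never touches the engine's internal state.
func (e *engine) GraphAttrs() map[string]string {
	out := make(map[string]string, len(e.graph.attrs))
	for k, v := range e.graph.attrs {
		out[k] = v
	}
	return out
}

func main() {
	e := &engine{graph: &graph{attrs: map[string]string{"llm_model": "claude-sonnet"}}}
	snap := e.GraphAttrs()
	snap["llm_model"] = "mutated" // only the copy changes
	fmt.Println(e.graph.attrs["llm_model"]) // claude-sonnet
}
```

The trade-off is one allocation plus a copy per call, which is the overhead the maintainer declined to pay absent a demonstrated need.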
…agnose dimension

- BudgetGuard now fires after retry attempts, not just normal advance
- haltForBudget saves checkpoint before returning so restarts are safe
- diagnose extracts breach.Message from JSONL for specific guidance
- CLI rejects negative budget limits
- README uses correct CLI syntax (no 'run' subcommand)

Refs #17 #62
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
README.md (1)

397-404: ⚠️ Potential issue | 🟡 Minor — Fix CLI example ordering to match `tracker [flags] <pipeline>`.

On Line 400, the pipeline path is placed before flags, but the documented command shape (Line 421) is flags first. This is the same copy-paste trap previously raised.

Suggested doc fix

```diff
-tracker examples/ask_and_execute.dip \
-  --max-tokens 100000 \
-  --max-cost 500 \
-  --max-wall-time 30m
+tracker --max-tokens 100000 \
+  --max-cost 500 \
+  --max-wall-time 30m \
+  examples/ask_and_execute.dip
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 397 - 404, The CLI example places the pipeline path
before flags which contradicts the documented usage "tracker [flags]
<pipeline>"; update the example so flags come before the pipeline path (e.g.,
"tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m
examples/ask_and_execute.dip") so the command shape for the `tracker` CLI and
the example ordering are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 399-404: The fenced code block in README.md for the CLI snippet
(starting with "tracker examples/ask_and_execute.dip") lacks a language tag;
update the opening fence to include "bash" (i.e., ```bash) so markdownlint
recognizes it as a shell snippet and renders/validates correctly; optionally
also adopt the suggested formatting order in the comment but the key change is
adding the "bash" language identifier to the existing fenced block.
---
Duplicate comments:
In `@README.md`:
- Around line 397-404: The CLI example places the pipeline path before flags
which contradicts the documented usage "tracker [flags] <pipeline>"; update the
example so flags come before the pipeline path (e.g., "tracker --max-tokens
100000 --max-cost 500 --max-wall-time 30m examples/ask_and_execute.dip") so the
command shape for the `tracker` CLI and the example ordering are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 79f30480-06f9-4e10-b56d-1fad03edad7b
📒 Files selected for processing (5)
- README.md
- cmd/tracker/diagnose.go
- cmd/tracker/flags.go
- pipeline/engine_run.go
- pipeline/engine_test.go
✅ Files skipped from review due to trivial changes (1)
- pipeline/engine_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pipeline/engine_run.go
- cmd/tracker/diagnose.go
Flags before pipeline path to match existing examples. Refs #17
♻️ Duplicate comments (1)

README.md (1)

399-402: ⚠️ Potential issue | 🟡 Minor — Add the `bash` language identifier to the fenced code block.

This was already flagged in a previous review but remains unfixed. The fenced code block needs a language specifier for proper syntax highlighting and markdownlint compliance.

📝 Suggested fix

````diff
-```
+```bash
 tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m \
   examples/ask_and_execute.dip
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 399 - 402, The fenced code block showing the
tracker command lacks a language specifier; update that Markdown block (the
triple-backtick block containing "tracker --max-tokens 100000 --max-cost 500
--max-wall-time 30m \ examples/ask_and_execute.dip") to include the "bash"
language identifier so it reads ```bash at the start of the block for proper
syntax highlighting and markdownlint compliance.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:

In `@README.md`:
- Around line 399-402: The fenced code block showing the tracker command lacks a
language specifier; update that Markdown block (the triple-backtick block
containing "tracker --max-tokens 100000 --max-cost 500 --max-wall-time 30m \
examples/ask_and_execute.dip") to include the "bash" language identifier so it
reads ```bash at the start of the block for proper syntax highlighting and
markdownlint compliance.

ℹ️ Review info

⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc03ed77-660c-4601-8cea-c129e708fdd5

📥 Commits
Reviewing files that changed from the base of the PR and between a1939ab3203ab02ac3161ab321c6b39e1fdc99d5 and 6345d72d8773c39e7b155aabd417a18c2e9ffa03.

📒 Files selected for processing (1)
- README.md
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
```go
	},
})
if result.Status == pipeline.OutcomeBudgetExceeded {
	log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
```
The README code sample uses %s to print result.Cost.LimitsHit, but LimitsHit is a []string in the API, so the snippet prints Go's bracketed slice form ([max_total_tokens max_cost_cents]) rather than a readable list. Consider using %v (the conventional verb for arbitrary values) or strings.Join(result.Cost.LimitsHit, ", ") for cleaner log output.
```diff
-	log.Printf("halt: %s, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
+	log.Printf("halt: %v, spent $%.4f", result.Cost.LimitsHit, result.Cost.TotalUSD)
```
```go
traceEntry.EdgeTo = target
s.trace.AddEntry(*traceEntry)
e.emitCostUpdate(s)
if lr := e.checkBudgetAfterEmit(s); lr != nil {
	return "", false, lr.result, nil
}
```
Budget enforcement is applied on the retry path in handleRetryWithinBudget (emitCostUpdate + checkBudgetAfterEmit), but the retry-exhausted path (handleRetryExhausted) still appends the retry TraceEntry and then continues to fallback/fail without running the BudgetGuard. This can let a run continue (via fallback_retry_target) even after the last retry attempt pushed totals over MaxTotalTokens/MaxCostCents/MaxWallTime, and it will also skip emitting a final cost_updated snapshot for that attempt. Consider invoking emitCostUpdate(s) and checkBudgetAfterEmit(s) immediately after s.trace.AddEntry(*traceEntry) in handleRetryExhausted (both fallback and no-fallback branches), returning OutcomeBudgetExceeded when tripped.
```go
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if len(result.BudgetLimitsHit) > 0 {
		fmt.Printf("  reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 {
			totalToks := total.InputTokens + total.OutputTokens
			fmt.Printf("  spent: %s tokens", formatNumber(totalToks))
			if total.EstimatedCost > 0 {
				fmt.Printf(", $%.4f", total.EstimatedCost)
			}
			fmt.Println()
		}
```
printBudgetHaltBanner calculates "spent" from the passed TokenTracker only. This will under-report (or omit) spend when TokenTracker isn't populated (e.g., claude-code/acp backends or any case where usage is only available via trace stats), even though EngineResult.Usage/Trace.AggregateUsage contains the authoritative totals used by BudgetGuard. Consider rendering tokens/cost from result.Usage (or result.Trace.AggregateUsage()) as the primary source, and only using TokenTracker as a fallback.
Current:

```go
// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if len(result.BudgetLimitsHit) > 0 {
		fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 {
			totalToks := total.InputTokens + total.OutputTokens
			fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
			if total.EstimatedCost > 0 {
				fmt.Printf(", $%.4f", total.EstimatedCost)
			}
			fmt.Println()
		}
```

Suggested change:

```go
// budgetUsageForDisplay returns the best available aggregate usage for summary output.
// Prefer usage captured on the EngineResult/trace, and only fall back to TokenTracker.
func budgetUsageForDisplay(result *pipeline.EngineResult, tracker *llm.TokenTracker) (llm.TokenUsage, bool) {
	if result != nil {
		total := result.Usage
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
		total = result.Trace.AggregateUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
	}
	if tracker != nil {
		total := tracker.TotalUsage()
		if total.InputTokens > 0 || total.OutputTokens > 0 || total.EstimatedCost > 0 {
			return total, true
		}
	}
	return llm.TokenUsage{}, false
}

// printBudgetHaltBanner prints a prominent halt notice when a budget limit was exceeded.
func printBudgetHaltBanner(result *pipeline.EngineResult, tracker *llm.TokenTracker) {
	fmt.Println()
	fmt.Println("─── HALTED: budget exceeded ───────────────────────────────")
	if result != nil && len(result.BudgetLimitsHit) > 0 {
		fmt.Printf(" reason: %s\n", strings.Join(result.BudgetLimitsHit, ", "))
	}
	if total, ok := budgetUsageForDisplay(result, tracker); ok {
		totalToks := total.InputTokens + total.OutputTokens
		fmt.Printf(" spent: %s tokens", formatNumber(totalToks))
		if total.EstimatedCost > 0 {
			fmt.Printf(", $%.4f", total.EstimatedCost)
		}
		fmt.Println()
```
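As a standalone illustration of the source-preference pattern discussed above, here is a minimal, runnable sketch. The `Usage` type and its field names are simplified stand-ins for illustration, not the real `llm.TokenUsage`:

```go
package main

import "fmt"

// Usage is a simplified stand-in for an aggregate token/cost record.
type Usage struct {
	InputTokens   int
	OutputTokens  int
	EstimatedCost float64
}

// nonZero reports whether any usage was recorded at all.
func (u Usage) nonZero() bool {
	return u.InputTokens > 0 || u.OutputTokens > 0 || u.EstimatedCost > 0
}

// firstNonZero returns the first candidate with any recorded usage,
// mirroring the result -> trace -> tracker preference order.
func firstNonZero(candidates ...Usage) (Usage, bool) {
	for _, c := range candidates {
		if c.nonZero() {
			return c, true
		}
	}
	return Usage{}, false
}

func main() {
	resultUsage := Usage{} // empty: e.g., a backend that doesn't populate the tracker
	traceUsage := Usage{InputTokens: 1200, OutputTokens: 300, EstimatedCost: 0.0421}
	trackerUsage := Usage{InputTokens: 50, OutputTokens: 10} // partial; lowest priority

	if total, ok := firstNonZero(resultUsage, traceUsage, trackerUsage); ok {
		// prints: spent: 1500 tokens, $0.0421
		fmt.Printf("spent: %d tokens, $%.4f\n", total.InputTokens+total.OutputTokens, total.EstimatedCost)
	}
}
```

The key design point is that the authoritative aggregate (here `traceUsage`) wins even when a lower-priority source also has data.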
```go
	fmt.Fprintln(w, " suggestion: raise the relevant --max-tokens, --max-cost, or --max-wall-time flag,")
	fmt.Fprintln(w, " or remove the Config.Budget value in your pipeline configuration")
```
The budget-halt suggestion text is a bit misleading for CLI users and graph-attr based budgets: it says to "remove the Config.Budget value", but budget limits can also come from --max-* flags (and may later come from workflow attrs). Consider rephrasing to something like "remove or raise the configured budget limits" and mention the relevant source(s) (flags vs library Config).
Suggested change:

```go
	fmt.Fprintln(w, " suggestion: raise or remove the configured budget limits")
	fmt.Fprintln(w, " (--max-tokens, --max-cost, --max-wall-time, or pipeline/library budget config)")
```
Summary
- `Result.Cost` in the library API, populated from the `llm.TokenTracker` middleware via `llm.EstimateCost` (closes feat: ensure full token/cost data exposed from library #62).
- `MaxTotalTokens`/`MaxCostCents`/`MaxWallTime` ceilings via a new `pipeline.BudgetGuard`. Tripping any limit halts the run with `pipeline.OutcomeBudgetExceeded` and populates `EngineResult.BudgetLimitsHit` (closes fix(pipeline): no pipeline-level token budget or cost ceiling #17).
- `EventCostUpdated` events (and `EventBudgetExceeded` on halt), each carrying a `CostSnapshot` payload with total tokens, dollar cost, per-provider totals, and wall elapsed.
- `tracker` wiring: `Config.Budget` library field + `--max-tokens`, `--max-cost`, `--max-wall-time` CLI flags.
- `tracker diagnose` and `activity.jsonl` now carry cost/halt data.

Scope note
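The inclusive-ceiling semantics described in the commit messages ("exact equality is not a breach") can be sketched as follows. `Limits` and `Snapshot` are hypothetical stand-ins for `pipeline.BudgetLimits` and the `CostSnapshot` payload, with assumed field names, not the real types:

```go
package main

import (
	"fmt"
	"time"
)

// Limits holds the configured ceilings; a zero value means "no limit".
type Limits struct {
	MaxTotalTokens int
	MaxCostCents   int
	MaxWallTime    time.Duration
}

// Snapshot is a point-in-time rollup of what the run has spent so far.
type Snapshot struct {
	TotalTokens int
	CostCents   int
	WallElapsed time.Duration
}

// breaches returns the names of limits that were exceeded. Ceilings are
// inclusive: spending exactly the limit is not a breach, only strictly
// exceeding it is.
func breaches(l Limits, s Snapshot) []string {
	var hit []string
	if l.MaxTotalTokens > 0 && s.TotalTokens > l.MaxTotalTokens {
		hit = append(hit, "max_total_tokens")
	}
	if l.MaxCostCents > 0 && s.CostCents > l.MaxCostCents {
		hit = append(hit, "max_cost_cents")
	}
	if l.MaxWallTime > 0 && s.WallElapsed > l.MaxWallTime {
		hit = append(hit, "max_wall_time")
	}
	return hit
}

func main() {
	limits := Limits{MaxTotalTokens: 1000, MaxCostCents: 50}
	fmt.Println(breaches(limits, Snapshot{TotalTokens: 1000, CostCents: 50})) // exactly at limit: []
	fmt.Println(breaches(limits, Snapshot{TotalTokens: 1001, CostCents: 49})) // [max_total_tokens]
}
```

A guard like this would run after each terminal-node outcome, with the returned names feeding something like `EngineResult.BudgetLimitsHit`.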
Reading budget limits directly from `.dip` workflow attrs is blocked on dippin-lang IR support. Tracked as a follow-up in #67.

Test plan
- `go build ./...` — clean
- `go test ./... -short` — all 15 packages pass
- `dippin doctor` — A/100 on `ask_and_execute.dip`, `build_product.dip`, `build_product_with_superspec.dip`
- `BudgetGuard`: token / cost / wall-time / nil-guard / accumulated-float-rounding
- `TestEngine_HaltsOnBudgetBreach` (5-node graph halts after node 3)
- `TestRun_BudgetHalt_FromConfig` (library halt via `Config.Budget`)
- `TestEngine_EmitsCostUpdatedAfterEachNode` (3-node graph emits ≥3 events, monotonic totals)
- `TestRun_PopulatesResultCost` (per-provider cost in `Result.Cost.ByProvider`)
- `TestBuildLogEntry_CostSnapshot` / `TestBuildLogEntry_NilCost`
- `tracker run examples/ask_and_execute.dip --max-tokens 1` halts with the budget banner
- `tracker diagnose` after a halted run surfaces the budget section

Closes #62
Closes #17
🤖 Generated with Claude Code