diff --git a/docs/adr/27137-add-ambient-context-metric.md b/docs/adr/27137-add-ambient-context-metric.md new file mode 100644 index 00000000000..d15d6e8178b --- /dev/null +++ b/docs/adr/27137-add-ambient-context-metric.md @@ -0,0 +1,77 @@ +# ADR-27137: Track Ambient Context via First LLM Invocation Token Metrics + +**Date**: 2026-04-19 +**Status**: Draft +**Deciders**: pelikhan, Copilot + +--- + +## Part 1 — Narrative (Human-Friendly) + +### Context + +The gh-aw tooling collects and aggregates per-run token usage from the firewall proxy's `token-usage.jsonl` log. Aggregate totals (total input, cache-read, and output tokens) are already surfaced in `audit` and `logs` outputs, but they conflate system-prompt overhead with actual task-execution cost. Teams want to compare how "heavy" the ambient context (system prompt, tools list, memory) is across different workflow configurations without writing custom log analysis. The first LLM invocation in a run is a natural proxy for ambient context size because it fires before the agent has accumulated any conversation history, so its input token count primarily reflects the static context loaded at startup. + +### Decision + +We will introduce an `AmbientContextMetrics` struct that captures the token footprint (`input_tokens`, `cached_tokens`, `effective_tokens`) of the chronologically first LLM invocation in `token-usage.jsonl`, and expose it as an optional `ambient_context` field in both the `audit` and `logs` JSON output schemas. Chronological ordering is determined by the `timestamp` field (RFC 3339 / RFC 3339 Nano); file order is used as a stable tiebreaker for entries that share a timestamp or lack one. The `effective_tokens` value is defined as `input_tokens + cache_read_tokens`, consistent with the existing effective-token convention in the codebase. 
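The selection and derivation rules in the decision above can be sketched as a standalone Go function. This is an illustrative sketch, not the gh-aw implementation; the `entry`, `ambient`, and `firstInvocation` names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// entry mirrors an illustrative subset of one token-usage.jsonl record.
type entry struct {
	Timestamp       string
	InputTokens     int
	CacheReadTokens int
}

// ambient holds the first-invocation token footprint.
type ambient struct {
	InputTokens, CachedTokens, EffectiveTokens int
}

// firstInvocation picks the chronologically earliest entry, using file
// order as a stable tiebreaker for equal or missing timestamps, and
// derives effective tokens as input + cache-read.
func firstInvocation(entries []entry) *ambient {
	if len(entries) == 0 {
		return nil
	}
	parse := func(s string) (time.Time, bool) {
		// RFC 3339 Nano first, plain RFC 3339 as fallback.
		if ts, err := time.Parse(time.RFC3339Nano, s); err == nil {
			return ts, true
		}
		if ts, err := time.Parse(time.RFC3339, s); err == nil {
			return ts, true
		}
		return time.Time{}, false
	}
	idx := make([]int, len(entries))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		ta, oka := parse(entries[idx[a]].Timestamp)
		tb, okb := parse(entries[idx[b]].Timestamp)
		if oka && okb {
			return ta.Before(tb)
		}
		if oka != okb {
			return oka // timestamped entries sort before untimestamped ones
		}
		return idx[a] < idx[b] // stable file-order tiebreak
	})
	first := entries[idx[0]]
	return &ambient{
		InputTokens:     first.InputTokens,
		CachedTokens:    first.CacheReadTokens,
		EffectiveTokens: first.InputTokens + first.CacheReadTokens,
	}
}

func main() {
	a := firstInvocation([]entry{
		{Timestamp: "2026-04-01T17:58:00Z", InputTokens: 12, CacheReadTokens: 99},
		{Timestamp: "2026-04-01T17:56:00Z", InputTokens: 7, CacheReadTokens: 3},
	})
	fmt.Println(a.InputTokens, a.CachedTokens, a.EffectiveTokens) // prints "7 3 10"
}
```

Sorting an index slice rather than the entries themselves keeps the original file order recoverable for the tiebreak without copying the records.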
+ +### Alternatives Considered + +#### Alternative 1: Average or Median Token Count Across All Invocations + +Computing an average or median across all invocations was considered as a way to characterize "typical" invocation cost. This was rejected because it mixes task-execution turns — which accumulate conversation history and grow over time — with the initial system-prompt turn, making it a poor proxy for ambient context size. The metric would also vary with run length, complicating cross-run comparisons. + +#### Alternative 2: Expose the Full Ordered Invocation List and Let Consumers Filter + +Surfacing the complete sorted list of token usage entries in the output and letting downstream tools select the first entry was considered to give consumers maximum flexibility. This was rejected because it would significantly increase output payload size for long-running agents (which may make hundreds of LLM calls) and because the first-invocation semantic is stable and well-understood enough to encode directly in the tool. + +#### Alternative 3: Use the Invocation with the Fewest Input Tokens as a Proxy + +Using the minimum-input-token invocation as a proxy for ambient context (assuming "lighter" calls reflect smaller context) was considered. This was rejected because minimum-token invocations can occur at any point during a run if the agent routes cheap subtasks to a smaller or cheaper model, making the metric unreliable as an ambient context indicator. + +### Consequences + +#### Positive +- Teams can compare ambient context overhead across workflow configurations using a single structured field without parsing raw logs. +- The metric is optional (`omitempty`) so existing JSON consumers of `audit` and `logs` output are not broken when it is absent. +- Chronological sorting of token usage entries is now an explicit, tested behavior that can be reused for future metrics that need temporal ordering. 
+ +#### Negative +- The first invocation is a heuristic proxy, not a guaranteed measure of system-prompt size. If a workflow fires a lightweight "warm-up" or health-check LLM call before the main agent invocation, the metric will reflect that call's token counts rather than the agent's true ambient context. +- Adding `AmbientContext` to `TokenUsageSummary` changes `parseTokenUsageFile` from streaming aggregation to collect-then-aggregate, which increases peak memory usage proportionally to the number of log entries (each retained line is capped by the `1 MB` scanner buffer, and the total is not expected to be significant in practice). + +#### Neutral +- `token_usage.go` now imports `time` from the standard library for timestamp parsing. +- The `parseTokenUsageFile` function's internal processing order changed (collect all entries, then aggregate), but the functional output for existing aggregate fields (`TotalInputTokens`, `CacheEfficiency`, etc.) is unchanged. +- Reference documentation for `audit` and `logs` commands was updated to describe the new field. + +--- + +## Part 2 — Normative Specification (RFC 2119) + +> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). + +### Ambient Context Extraction + +1. Implementations **MUST** compute ambient context metrics from the single earliest (chronologically first) token usage entry in `token-usage.jsonl`. +2. Implementations **MUST** sort entries by the `timestamp` field using RFC 3339 Nano format first, falling back to RFC 3339 format, when timestamps are present. +3. Implementations **MUST** use file-insertion order (entry index) as a stable tiebreaker when two entries share a timestamp or when one or both entries lack a timestamp. +4. 
Implementations **MUST NOT** include token counts from any invocation other than the first sorted entry in the `AmbientContextMetrics` calculation. +5. Implementations **MUST** set `effective_tokens` to `input_tokens + cache_read_tokens` for the ambient context metric. +6. Implementations **SHOULD** return `nil` and omit the field when no token usage entries are available, rather than emitting a zero-value struct. + +### Output Schema + +1. Implementations **MUST** expose the `ambient_context` field as an optional (`omitempty`) JSON object in the `MetricsData` struct used by `audit` JSON output. +2. Implementations **MUST** expose the `ambient_context` field as an optional (`omitempty`) JSON field on each `RunData` entry in `logs` JSON output. +3. Implementations **MUST NOT** render `ambient_context` in console-formatted tabular output; the field **MUST** carry a `console:"-"` tag. +4. Implementations **MAY** expose `ambient_context` in future report sections (e.g., audit diff, multi-run trend analysis) as the metric matures. + +### Conformance + +An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance. + +--- + +*ADR created by [adr-writer agent]. 
Review and finalize before changing status from Draft to Accepted.* diff --git a/docs/src/content/docs/reference/audit.md b/docs/src/content/docs/reference/audit.md index 2ac7e1dadc5..8a347901ff8 100644 --- a/docs/src/content/docs/reference/audit.md +++ b/docs/src/content/docs/reference/audit.md @@ -51,6 +51,11 @@ gh aw audit 1234567890 --repo owner/repo **Report sections** (rendered in Markdown or JSON): Overview, Comparison, Task/Domain, Behavior Fingerprint, Agentic Assessments, Metrics, Key Findings, Recommendations, Observability Insights, Performance Metrics, Engine Config, Prompt Analysis, Session Analysis, Safe Output Summary, MCP Server Health, Jobs, Downloaded Files, Missing Tools, Missing Data, Noops, MCP Failures, Firewall Analysis, Policy Analysis, Redacted Domains, Errors, Warnings, Tool Usage, MCP Tool Usage, Created Items. +The Metrics section includes an `ambient_context` object when available. Ambient context captures the first LLM inference footprint for the run: +- `ambient_context.input_tokens` — input tokens for the first invocation +- `ambient_context.cached_tokens` — cache-read tokens reused by the first invocation +- `ambient_context.effective_tokens` — `input_tokens + cached_tokens` + ## `gh aw audit diff [...]` Compare behavior between workflow runs. Detects policy regressions, new unauthorized domains, behavioral drift, and changes in MCP tool usage or run metrics. @@ -118,6 +123,8 @@ This feature is built into the `gh aw logs` command via the `--format` flag. The report output includes an executive summary, domain inventory, metrics trends, MCP server health, and per-run breakdown. It detects cross-run anomalies such as domain access spikes, elevated MCP error rates, and connection rate changes. +For each run in detailed logs JSON output, an `ambient_context` object is included when token usage data is available. It reflects only the first LLM invocation in the run (`input_tokens`, `cached_tokens`, `effective_tokens`). 
+ **Examples:** ```bash diff --git a/pkg/cli/audit_ambient_context_test.go b/pkg/cli/audit_ambient_context_test.go new file mode 100644 index 00000000000..8cafb20b905 --- /dev/null +++ b/pkg/cli/audit_ambient_context_test.go @@ -0,0 +1,37 @@ +//go:build !integration + +package cli + +import ( + "testing" + "time" + + "github.com/github/gh-aw/pkg/workflow" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestBuildAuditDataIncludesAmbientContext(t *testing.T) { + processedRun := ProcessedRun{ + Run: WorkflowRun{ + DatabaseID: 1, + WorkflowName: "test", + Status: "completed", + Conclusion: "success", + CreatedAt: time.Now(), + }, + TokenUsage: &TokenUsageSummary{ + AmbientContext: &AmbientContextMetrics{ + InputTokens: 1200, + CachedTokens: 300, + EffectiveTokens: 1500, + }, + }, + } + + auditData := buildAuditData(processedRun, workflow.LogMetrics{}, nil) + require.NotNil(t, auditData.Metrics.AmbientContext, "ambient context should be populated") + assert.Equal(t, 1200, auditData.Metrics.AmbientContext.InputTokens, "input tokens should match") + assert.Equal(t, 300, auditData.Metrics.AmbientContext.CachedTokens, "cached tokens should match") + assert.Equal(t, 1500, auditData.Metrics.AmbientContext.EffectiveTokens, "effective tokens should match") +} diff --git a/pkg/cli/audit_report.go b/pkg/cli/audit_report.go index 4b458291b73..fc90342c07c 100644 --- a/pkg/cli/audit_report.go +++ b/pkg/cli/audit_report.go @@ -98,13 +98,14 @@ type OverviewData struct { // MetricsData contains execution metrics type MetricsData struct { - TokenUsage int `json:"token_usage,omitempty" console:"header:Token Usage,format:number,omitempty"` - EffectiveTokens int `json:"effective_tokens,omitempty" console:"header:Effective Tokens,format:number,omitempty"` - EstimatedCost float64 `json:"estimated_cost,omitempty" console:"header:Estimated Cost,format:cost,omitempty"` - ActionMinutes float64 `json:"action_minutes,omitempty" console:"header:Action 
Minutes,omitempty"` - Turns int `json:"turns,omitempty" console:"header:Turns,omitempty"` - ErrorCount int `json:"error_count" console:"header:Errors"` - WarningCount int `json:"warning_count" console:"header:Warnings"` + TokenUsage int `json:"token_usage,omitempty" console:"header:Token Usage,format:number,omitempty"` + EffectiveTokens int `json:"effective_tokens,omitempty" console:"header:Effective Tokens,format:number,omitempty"` + AmbientContext *AmbientContextMetrics `json:"ambient_context,omitempty" console:"title:Ambient Context,omitempty"` + EstimatedCost float64 `json:"estimated_cost,omitempty" console:"header:Estimated Cost,format:cost,omitempty"` + ActionMinutes float64 `json:"action_minutes,omitempty" console:"header:Action Minutes,omitempty"` + Turns int `json:"turns,omitempty" console:"header:Turns,omitempty"` + ErrorCount int `json:"error_count" console:"header:Errors"` + WarningCount int `json:"warning_count" console:"header:Warnings"` } // JobData contains information about individual jobs @@ -285,6 +286,9 @@ func buildAuditData(processedRun ProcessedRun, metrics LogMetrics, mcpToolUsage } else if run.EffectiveTokens > 0 { metricsData.EffectiveTokens = run.EffectiveTokens } + if processedRun.TokenUsage != nil && processedRun.TokenUsage.AmbientContext != nil { + metricsData.AmbientContext = processedRun.TokenUsage.AmbientContext + } // Populate ActionMinutes from run duration so it is always visible even // when token/turn metrics are zero (e.g. Codex runs that exit early). 
diff --git a/pkg/cli/logs_ambient_context_test.go b/pkg/cli/logs_ambient_context_test.go new file mode 100644 index 00000000000..aafe58b35aa --- /dev/null +++ b/pkg/cli/logs_ambient_context_test.go @@ -0,0 +1,42 @@ +//go:build !integration + +package cli + +import ( + "testing" + "time" + + "github.com/github/gh-aw/pkg/testutil" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestBuildLogsDataIncludesAmbientContext(t *testing.T) { + tmpDir := testutil.TempDir(t, "logs-ambient-context") + processedRuns := []ProcessedRun{ + { + Run: WorkflowRun{ + DatabaseID: 1, + WorkflowName: "test", + Status: "completed", + Conclusion: "success", + CreatedAt: time.Now(), + LogsPath: tmpDir, + }, + TokenUsage: &TokenUsageSummary{ + AmbientContext: &AmbientContextMetrics{ + InputTokens: 800, + CachedTokens: 200, + EffectiveTokens: 1000, + }, + }, + }, + } + + data := buildLogsData(processedRuns, tmpDir, nil) + require.Len(t, data.Runs, 1, "should produce a single run") + require.NotNil(t, data.Runs[0].AmbientContext, "ambient context should be included") + assert.Equal(t, 800, data.Runs[0].AmbientContext.InputTokens, "input tokens should match") + assert.Equal(t, 200, data.Runs[0].AmbientContext.CachedTokens, "cached tokens should match") + assert.Equal(t, 1000, data.Runs[0].AmbientContext.EffectiveTokens, "effective tokens should match") +} diff --git a/pkg/cli/logs_report.go b/pkg/cli/logs_report.go index 2f3372b701e..ee311b76cab 100644 --- a/pkg/cli/logs_report.go +++ b/pkg/cli/logs_report.go @@ -76,50 +76,51 @@ type LogsSummary struct { // RunData contains information about a single workflow run type RunData struct { - DatabaseID int64 `json:"database_id" console:"header:Run ID"` - Number int `json:"number" console:"-"` - WorkflowName string `json:"workflow_name" console:"header:Workflow"` - WorkflowPath string `json:"workflow_path" console:"-"` - Agent string `json:"agent,omitempty" console:"header:Agent,omitempty"` - Status string 
`json:"status" console:"header:Status"` - Conclusion string `json:"conclusion,omitempty" console:"-"` - Classification string `json:"classification" console:"-"` - Duration string `json:"duration,omitempty" console:"header:Duration,omitempty"` - ActionMinutes float64 `json:"action_minutes,omitempty" console:"header:Action Minutes,omitempty"` - TokenUsage int `json:"token_usage,omitempty" console:"header:Tokens,format:number,omitempty"` - EffectiveTokens int `json:"effective_tokens,omitempty" console:"header:Effective Tokens,format:number,omitempty"` - EstimatedCost float64 `json:"estimated_cost,omitempty" console:"header:Cost ($),format:cost,omitempty"` - Turns int `json:"turns,omitempty" console:"header:Turns,omitempty"` - ErrorCount int `json:"error_count" console:"header:Errors"` - WarningCount int `json:"warning_count" console:"header:Warnings"` - MissingToolCount int `json:"missing_tool_count" console:"header:Missing Tools"` - MissingDataCount int `json:"missing_data_count" console:"header:Missing Data"` - SafeItemsCount int `json:"safe_items_count,omitempty" console:"header:Safe Items,omitempty"` - CreatedAt time.Time `json:"created_at" console:"header:Created"` - StartedAt time.Time `json:"started_at,omitzero" console:"-"` - UpdatedAt time.Time `json:"updated_at,omitzero" console:"-"` - URL string `json:"url" console:"-"` - LogsPath string `json:"logs_path" console:"header:Logs Path"` - Event string `json:"event" console:"-"` - Branch string `json:"branch" console:"-"` - HeadSHA string `json:"head_sha,omitempty" console:"-"` - DisplayTitle string `json:"display_title,omitempty" console:"-"` - Repository string `json:"repository,omitempty" console:"-"` - Organization string `json:"organization,omitempty" console:"-"` - Ref string `json:"ref,omitempty" console:"-"` - SHA string `json:"sha,omitempty" console:"-"` - Actor string `json:"actor,omitempty" console:"-"` - RunAttempt string `json:"run_attempt,omitempty" console:"-"` - TargetRepo string 
`json:"target_repo,omitempty" console:"-"` - EventName string `json:"event_name,omitempty" console:"-"` - Comparison *AuditComparisonData `json:"comparison,omitempty" console:"-"` - TaskDomain *TaskDomainInfo `json:"task_domain,omitempty" console:"-"` - BehaviorFingerprint *BehaviorFingerprint `json:"behavior_fingerprint,omitempty" console:"-"` - AgenticAssessments []AgenticAssessment `json:"agentic_assessments,omitempty" console:"-"` - AwContext *AwContext `json:"context,omitempty" console:"-"` // aw_context data from aw_info.json - TokenUsageSummary *TokenUsageSummary `json:"token_usage_summary,omitempty" console:"-"` // Token usage from firewall proxy - GitHubAPICalls int `json:"github_api_calls,omitempty" console:"header:GitHub API Calls,format:number,omitempty"` // GitHub API calls made during the run - AvgTimeBetweenTurns string `json:"avg_time_between_turns,omitempty" console:"-"` // Average time between consecutive LLM API calls (TBT) + DatabaseID int64 `json:"database_id" console:"header:Run ID"` + Number int `json:"number" console:"-"` + WorkflowName string `json:"workflow_name" console:"header:Workflow"` + WorkflowPath string `json:"workflow_path" console:"-"` + Agent string `json:"agent,omitempty" console:"header:Agent,omitempty"` + Status string `json:"status" console:"header:Status"` + Conclusion string `json:"conclusion,omitempty" console:"-"` + Classification string `json:"classification" console:"-"` + Duration string `json:"duration,omitempty" console:"header:Duration,omitempty"` + ActionMinutes float64 `json:"action_minutes,omitempty" console:"header:Action Minutes,omitempty"` + TokenUsage int `json:"token_usage,omitempty" console:"header:Tokens,format:number,omitempty"` + EffectiveTokens int `json:"effective_tokens,omitempty" console:"header:Effective Tokens,format:number,omitempty"` + AmbientContext *AmbientContextMetrics `json:"ambient_context,omitempty" console:"-"` + EstimatedCost float64 `json:"estimated_cost,omitempty" console:"header:Cost 
($),format:cost,omitempty"` + Turns int `json:"turns,omitempty" console:"header:Turns,omitempty"` + ErrorCount int `json:"error_count" console:"header:Errors"` + WarningCount int `json:"warning_count" console:"header:Warnings"` + MissingToolCount int `json:"missing_tool_count" console:"header:Missing Tools"` + MissingDataCount int `json:"missing_data_count" console:"header:Missing Data"` + SafeItemsCount int `json:"safe_items_count,omitempty" console:"header:Safe Items,omitempty"` + CreatedAt time.Time `json:"created_at" console:"header:Created"` + StartedAt time.Time `json:"started_at,omitzero" console:"-"` + UpdatedAt time.Time `json:"updated_at,omitzero" console:"-"` + URL string `json:"url" console:"-"` + LogsPath string `json:"logs_path" console:"header:Logs Path"` + Event string `json:"event" console:"-"` + Branch string `json:"branch" console:"-"` + HeadSHA string `json:"head_sha,omitempty" console:"-"` + DisplayTitle string `json:"display_title,omitempty" console:"-"` + Repository string `json:"repository,omitempty" console:"-"` + Organization string `json:"organization,omitempty" console:"-"` + Ref string `json:"ref,omitempty" console:"-"` + SHA string `json:"sha,omitempty" console:"-"` + Actor string `json:"actor,omitempty" console:"-"` + RunAttempt string `json:"run_attempt,omitempty" console:"-"` + TargetRepo string `json:"target_repo,omitempty" console:"-"` + EventName string `json:"event_name,omitempty" console:"-"` + Comparison *AuditComparisonData `json:"comparison,omitempty" console:"-"` + TaskDomain *TaskDomainInfo `json:"task_domain,omitempty" console:"-"` + BehaviorFingerprint *BehaviorFingerprint `json:"behavior_fingerprint,omitempty" console:"-"` + AgenticAssessments []AgenticAssessment `json:"agentic_assessments,omitempty" console:"-"` + AwContext *AwContext `json:"context,omitempty" console:"-"` // aw_context data from aw_info.json + TokenUsageSummary *TokenUsageSummary `json:"token_usage_summary,omitempty" console:"-"` // Token usage from 
firewall proxy + GitHubAPICalls int `json:"github_api_calls,omitempty" console:"header:GitHub API Calls,format:number,omitempty"` // GitHub API calls made during the run + AvgTimeBetweenTurns string `json:"avg_time_between_turns,omitempty" console:"-"` // Average time between consecutive LLM API calls (TBT) } // buildLogsData creates structured logs data from processed runs @@ -192,6 +193,11 @@ func buildLogsData(processedRuns []ProcessedRun, outputDir string, continuation comparison := buildAuditComparisonForProcessedRuns(pr, processedRuns) + var ambientContext *AmbientContextMetrics + if pr.TokenUsage != nil { + ambientContext = pr.TokenUsage.AmbientContext + } + runData := RunData{ DatabaseID: run.DatabaseID, Number: run.Number, @@ -203,6 +209,7 @@ func buildLogsData(processedRuns []ProcessedRun, outputDir string, continuation Classification: deriveRunClassification(comparison), TokenUsage: run.TokenUsage, EffectiveTokens: run.EffectiveTokens, + AmbientContext: ambientContext, EstimatedCost: run.EstimatedCost, ActionMinutes: run.ActionMinutes, Turns: run.Turns, diff --git a/pkg/cli/token_usage.go b/pkg/cli/token_usage.go index 79d05c8c2a5..9818d0ca08d 100644 --- a/pkg/cli/token_usage.go +++ b/pkg/cli/token_usage.go @@ -8,6 +8,7 @@ import ( "path/filepath" "sort" "strings" + "time" "github.com/github/gh-aw/pkg/logger" "github.com/github/gh-aw/pkg/timeutil" @@ -33,6 +34,13 @@ type TokenUsageEntry struct { ResponseBytes int `json:"response_bytes"` } +// AmbientContextMetrics captures token footprint for the first LLM invocation. 
+type AmbientContextMetrics struct { + InputTokens int `json:"input_tokens" console:"header:Ambient Input,format:number"` + CachedTokens int `json:"cached_tokens" console:"header:Ambient Cached,format:number"` + EffectiveTokens int `json:"effective_tokens" console:"header:Ambient Effective,format:number"` +} + // TokenUsageSummary contains aggregated token usage from the firewall proxy type TokenUsageSummary struct { TotalInputTokens int `json:"total_input_tokens" console:"header:Input Tokens,format:number"` @@ -44,6 +52,7 @@ type TokenUsageSummary struct { TotalResponseBytes int `json:"total_response_bytes"` CacheEfficiency float64 `json:"cache_efficiency"` TotalEffectiveTokens int `json:"total_effective_tokens" console:"header:Effective Tokens,format:number"` + AmbientContext *AmbientContextMetrics `json:"ambient_context,omitempty"` ByModel map[string]*ModelTokenUsage `json:"by_model"` } @@ -96,6 +105,7 @@ func parseTokenUsageFile(filePath string, customWeights *types.TokenWeights) (*T // Increase buffer size for potentially large lines scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) + entries := make([]TokenUsageEntry, 0) lineNum := 0 for scanner.Scan() { lineNum++ @@ -109,7 +119,19 @@ func parseTokenUsageFile(filePath string, customWeights *types.TokenWeights) (*T tokenUsageLog.Printf("Skipping invalid JSON at line %d: %v", lineNum, err) continue } + entries = append(entries, entry) + } + + if err := scanner.Err(); err != nil { + return nil, fmt.Errorf("error reading token usage file: %w", err) + } + + if len(entries) == 0 { + tokenUsageLog.Print("No token usage entries found") + return nil, nil + } + for _, entry := range entries { // Aggregate totals summary.TotalInputTokens += entry.InputTokens summary.TotalOutputTokens += entry.OutputTokens @@ -139,15 +161,6 @@ func parseTokenUsageFile(filePath string, customWeights *types.TokenWeights) (*T m.ResponseBytes += entry.ResponseBytes } - if err := scanner.Err(); err != nil { - return nil, fmt.Errorf("error 
reading token usage file: %w", err) - } - - if summary.TotalRequests == 0 { - tokenUsageLog.Print("No token usage entries found") - return nil, nil - } - // Compute cache efficiency: cache_read / (input + cache_read) totalInputPlusCacheRead := summary.TotalInputTokens + summary.TotalCacheReadTokens if totalInputPlusCacheRead > 0 { @@ -160,10 +173,67 @@ func parseTokenUsageFile(filePath string, customWeights *types.TokenWeights) (*T // Compute effective tokens using per-model multipliers (with optional custom overrides) populateEffectiveTokensWithCustomWeights(summary, customWeights) + summary.AmbientContext = extractAmbientContextMetrics(entries) return summary, nil } +func extractAmbientContextMetrics(entries []TokenUsageEntry) *AmbientContextMetrics { + if len(entries) == 0 { + return nil + } + + type orderedTokenEntry struct { + entry TokenUsageEntry + timestamp time.Time + hasTimestamp bool + order int + } + + ordered := make([]orderedTokenEntry, 0, len(entries)) + for i, entry := range entries { + ts, hasTimestamp := parseTokenUsageTimestamp(entry.Timestamp) + ordered = append(ordered, orderedTokenEntry{ + entry: entry, + timestamp: ts, + hasTimestamp: hasTimestamp, + order: i, + }) + } + + sort.SliceStable(ordered, func(i, j int) bool { + left := ordered[i] + right := ordered[j] + if left.hasTimestamp && right.hasTimestamp { + return left.timestamp.Before(right.timestamp) + } + if left.hasTimestamp != right.hasTimestamp { + return left.hasTimestamp + } + return left.order < right.order + }) + + firstCall := ordered[0].entry + return &AmbientContextMetrics{ + InputTokens: firstCall.InputTokens, + CachedTokens: firstCall.CacheReadTokens, + EffectiveTokens: firstCall.InputTokens + firstCall.CacheReadTokens, + } +} + +func parseTokenUsageTimestamp(value string) (time.Time, bool) { + if value == "" { + return time.Time{}, false + } + if ts, err := time.Parse(time.RFC3339Nano, value); err == nil { + return ts, true + } + if ts, err := time.Parse(time.RFC3339, 
value); err == nil { + return ts, true + } + return time.Time{}, false +} + // findTokenUsageFile searches for token-usage.jsonl in the run directory func findTokenUsageFile(runDir string) string { // Primary path: sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl diff --git a/pkg/cli/token_usage_test.go b/pkg/cli/token_usage_test.go index b7a4b4bab21..1e5ca341bf5 100644 --- a/pkg/cli/token_usage_test.go +++ b/pkg/cli/token_usage_test.go @@ -71,6 +71,39 @@ func TestParseTokenUsageFile(t *testing.T) { assert.InDelta(t, expectedEfficiency, summary.CacheEfficiency, 0.001, "cache efficiency") }) + t.Run("extracts ambient context from first chronological invocation", func(t *testing.T) { + tmpDir := testutil.TempDir(t, "token-usage") + filePath := filepath.Join(tmpDir, "token-usage.jsonl") + + content := `{"timestamp":"2026-04-01T17:58:00.000Z","request_id":"2","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages","status":200,"streaming":true,"input_tokens":12,"output_tokens":10,"cache_read_tokens":99,"cache_write_tokens":0,"duration_ms":4000,"response_bytes":3000} +{"timestamp":"2026-04-01T17:56:00.000Z","request_id":"1","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages","status":200,"streaming":true,"input_tokens":7,"output_tokens":5,"cache_read_tokens":3,"cache_write_tokens":0,"duration_ms":1000,"response_bytes":500}` + require.NoError(t, os.WriteFile(filePath, []byte(content+"\n"), 0o644), "should write test file") + + summary, err := parseTokenUsageFile(filePath, nil) + require.NoError(t, err, "should parse without error") + require.NotNil(t, summary, "should return non-nil summary") + require.NotNil(t, summary.AmbientContext, "ambient context should be present") + assert.Equal(t, 7, summary.AmbientContext.InputTokens, "ambient input tokens should come from first invocation") + assert.Equal(t, 3, summary.AmbientContext.CachedTokens, "ambient cached tokens should come from first invocation") + assert.Equal(t, 10, 
summary.AmbientContext.EffectiveTokens, "ambient effective tokens should be input + cached") + }) + + t.Run("ambient context defaults cached tokens to zero when absent", func(t *testing.T) { + tmpDir := testutil.TempDir(t, "token-usage") + filePath := filepath.Join(tmpDir, "token-usage.jsonl") + + content := `{"timestamp":"2026-04-01T17:56:00.000Z","request_id":"1","provider":"anthropic","model":"claude-sonnet-4-6","path":"/v1/messages","status":200,"streaming":true,"input_tokens":11,"output_tokens":5,"duration_ms":1000,"response_bytes":500}` + require.NoError(t, os.WriteFile(filePath, []byte(content+"\n"), 0o644), "should write test file") + + summary, err := parseTokenUsageFile(filePath, nil) + require.NoError(t, err, "should parse without error") + require.NotNil(t, summary, "should return non-nil summary") + require.NotNil(t, summary.AmbientContext, "ambient context should be present") + assert.Equal(t, 11, summary.AmbientContext.InputTokens, "ambient input tokens should match") + assert.Equal(t, 0, summary.AmbientContext.CachedTokens, "missing cached tokens should default to zero") + assert.Equal(t, 11, summary.AmbientContext.EffectiveTokens, "ambient effective tokens should fall back to input only") + }) + t.Run("empty file returns nil", func(t *testing.T) { tmpDir := testutil.TempDir(t, "token-usage") filePath := filepath.Join(tmpDir, "token-usage.jsonl")