77 changes: 77 additions & 0 deletions docs/adr/28003-fallback-audit-metrics-without-aw-info.md
@@ -0,0 +1,77 @@
# ADR-28003: Fallback Strategy for Audit Metrics When aw_info.json Is Absent

**Date**: 2026-04-23
**Status**: Draft
**Deciders**: pelikhan

---

## Part 1 — Narrative (Human-Friendly)

### Context

The `gh aw audit` command aggregates run-level metrics (token usage, turn count, engine config) to produce audit reports. These metrics are primarily sourced from `aw_info.json`, a structured artifact written by newer workflow runs. Legacy runs that predate the introduction of `aw_info.json` do not produce this artifact, causing the audit command to emit `engine_config: null`, `metrics.token_usage: null`, and `metrics.turns: null` even when alternative data sources — `agent_usage.json` and raw agent log files (`agent-stdio.log`, `events.jsonl`) — are present in the run directory. This gap reduces the usefulness of audit reports for historical analysis and fleet-wide comparisons.

### Decision

We will implement a multi-level fallback strategy in the audit pipeline that recovers metrics from alternative artifacts and log files when `aw_info.json` is absent. For token usage, the pipeline will fall back to `agent_usage.json` when the firewall proxy `token-usage.jsonl` is unavailable. For engine config, the pipeline will infer the engine by parsing available log files with all registered engine parsers and selecting the parser that recovers the strongest signal (prioritizing turn count, then token usage, then tool calls). For turn count and token usage in the audit report, the pipeline will cascade through: run-level parsed metrics → artifact token summaries → log inference.
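
A minimal sketch of the cascade for a single metric, assuming illustrative names (`resolveMetric` and its parameters are not the shipped API):

```go
// Illustrative sketch: walk the fallback chain in priority order and never
// overwrite a value already recovered from a higher-priority source.
func resolveMetric(runLevel, artifactSummary, logInferred int) int {
	if runLevel > 0 {
		return runLevel // 1. run-level parsed log metrics
	}
	if artifactSummary > 0 {
		return artifactSummary // 2. artifact token summaries (e.g. agent_usage.json)
	}
	return logInferred // 3. inference from raw agent log content
}
```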

### Alternatives Considered

#### Alternative 1: Require aw_info.json and Backfill Historical Data

Enforce `aw_info.json` as a mandatory artifact and run a one-time migration to retroactively populate it for historical runs. This was rejected because it requires coordinating infrastructure changes across all historical workflow runs and cannot recover data that was never recorded.

#### Alternative 2: Surface Null Values and Document Limitations

Accept `null` metric fields for older runs and document that pre-`aw_info.json` runs have incomplete audit data. This was rejected because it degrades the audit tool's utility for historical fleet analysis and provides no path forward for operators who need accurate metrics across their entire run history.

### Consequences

#### Positive
- Audit reports are populated for legacy runs, enabling accurate historical fleet analysis.
- The fallback chain is additive and non-destructive: runs with `aw_info.json` are unaffected.
- `agent_usage.json` token data (including `effective_tokens`) is surfaced through the same `TokenUsageSummary` abstraction already used by the primary path.

#### Negative
- The audit pipeline now has three distinct code paths for metric acquisition, increasing complexity and surface area for bugs.
- Inferred engine identification via log scoring is heuristic: the parser selection algorithm (weighted by turns, then token usage, then tool calls) may misidentify the engine when log content is ambiguous or shared across parsers.
- `agent_usage.json` is treated as a single-request summary, so per-model and per-request breakdowns are not available via this fallback.

#### Neutral
- The `TokenUsageEntry` struct gains an `effective_tokens` field to accommodate `agent_usage.json` data; `token-usage.jsonl` entries omit this field and continue using computed effective token totals (a field sketch follows this list).
- The engine inference function (`inferBestEngineMetricsFromContent`) iterates all registered engines and may add latency proportional to the number of registered parsers for runs without `aw_info.json`.
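
A minimal sketch of the struct change described above; only the `effective_tokens` addition comes from this ADR, and the surrounding fields and JSON tags are assumptions:

```go
// Sketch only: surrounding fields and tags are assumptions.
type TokenUsageEntry struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	// Populated from agent_usage.json; token-usage.jsonl entries omit it and
	// keep using computed effective token totals instead.
	EffectiveTokens int `json:"effective_tokens,omitempty"`
}
```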

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Token Usage Acquisition

1. Implementations **MUST** attempt to load token usage from `token-usage.jsonl` (firewall proxy log) first.
2. Implementations **MUST** fall back to `agent_usage.json` when `token-usage.jsonl` is absent or cannot be located (see the sketch following this list).
3. Implementations **MUST NOT** apply custom token weight overrides (from `aw_info.json`) to the `agent_usage.json` fallback path, as custom weights are only meaningful alongside the firewall proxy data.
4. Implementations **SHOULD** search for `agent_usage.json` at the root of the run directory before walking subdirectories, to minimize filesystem traversal.
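
A minimal sketch of this acquisition order; `loadProxyTokenUsage` and `loadAgentUsage` are hypothetical helpers standing in for the real loaders:

```go
// Hedged sketch: helper names are hypothetical stand-ins for the real loaders.
func loadTokenUsage(runDir string) (*TokenUsageSummary, error) {
	proxyPath := filepath.Join(runDir, "token-usage.jsonl")
	if summary, err := loadProxyTokenUsage(proxyPath); err == nil {
		return summary, nil // primary: firewall proxy log (custom token weights may apply here)
	}
	// Fallback: agent_usage.json, checked at the run-directory root before any
	// subdirectory walk; aw_info.json weight overrides are intentionally not applied.
	return loadAgentUsage(runDir)
}
```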

### Audit Metric Fallback Chain

1. Implementations **MUST** populate `metrics.token_usage` by cascading through, in order: (1) run-level parsed log metrics, (2) `input_tokens + output_tokens` from the artifact `TokenUsageSummary`, (3) token usage inferred from log content.
2. Implementations **MUST** populate `metrics.turns` by cascading through, in order: (1) run-level parsed log metrics, (2) turn count inferred from log content.
3. Implementations **MUST NOT** overwrite a non-zero metric value with a fallback value from a lower-priority source.

### Engine Config Inference

1. When `aw_info.json` is absent, implementations **MUST** attempt engine inference by parsing available log files using all engines registered in the global engine registry.
2. Implementations **MUST** select the inferred engine by maximizing a weighted score: `turns * 100000 + len(tool_calls) * 1000 + token_usage` (see the sketch following this list).
3. Implementations **MUST NOT** return an inferred engine config if no registered engine parser recovers any useful signal (turns, token usage, or tool calls).
4. Implementations **SHOULD** prefer `events.jsonl` over `agent-stdio.log` for engine inference when both are present.
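
A hedged sketch of the selection heuristic referenced in item 2; the per-engine parse results are simplified to plain integers and the surrounding types are assumptions:

```go
// Sketch only: the real code iterates the engine registry and calls each
// engine's log parser; here the parse results are pre-collected.
type inferredSignal struct {
	Turns, TokenUsage, ToolCalls int
}

func selectInferredEngine(candidates map[string]inferredSignal) string {
	bestID, bestScore := "", -1
	for id, s := range candidates {
		if s.Turns == 0 && s.TokenUsage == 0 && s.ToolCalls == 0 {
			continue // no useful signal: this parser cannot be selected
		}
		// Turns dominate the score, then tool calls, then raw token usage.
		score := s.Turns*100000 + s.ToolCalls*1000 + s.TokenUsage
		if score > bestScore {
			bestScore, bestID = score, id
		}
	}
	return bestID // empty when no parser recovered any useful signal
}
```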

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24834078573) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
99 changes: 99 additions & 0 deletions pkg/cli/audit_expanded.go
@@ -2,6 +2,7 @@ package cli

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
@@ -11,6 +12,7 @@ import (

	"github.com/github/gh-aw/pkg/logger"
	"github.com/github/gh-aw/pkg/timeutil"
	"github.com/github/gh-aw/pkg/workflow"
)

var auditExpandedLog = logger.New("cli:audit_expanded")
@@ -112,13 +114,27 @@ func findAwInfoPath(logsPath string) string {

// extractEngineConfig parses aw_info.json and returns an AuditEngineConfig
func extractEngineConfig(logsPath string) *AuditEngineConfig {
	return extractEngineConfigWithInferredEngine(logsPath, "")
}

// extractEngineConfigWithInferredEngine behaves like extractEngineConfig but, when
// aw_info.json is absent, falls back to the engine ID inferred from log content.
func extractEngineConfigWithInferredEngine(logsPath, inferredEngineID string) *AuditEngineConfig {
	if logsPath == "" {
		return nil
	}

	awInfoPath := findAwInfoPath(logsPath)
	if awInfoPath == "" {
		auditExpandedLog.Printf("aw_info.json not found in %s", logsPath)
		if inferredEngineID != "" {
			registry := workflow.GetGlobalEngineRegistry()
			if engine, err := registry.GetEngine(inferredEngineID); err == nil {
				auditExpandedLog.Printf("Inferred engine config without aw_info.json: engine=%s", inferredEngineID)
				return &AuditEngineConfig{
					EngineID:   inferredEngineID,
					EngineName: engine.GetDisplayName(),
				}
			}
		}
		return nil
	}
	awInfo, err := parseAwInfo(awInfoPath, false)
@@ -148,6 +164,89 @@ func extractEngineConfig(logsPath string) *AuditEngineConfig {
	return config
}

// inferFallbackLogMetrics recovers log metrics and an inferred engine ID from
// events.jsonl or agent-stdio.log when aw_info.json is unavailable.
func inferFallbackLogMetrics(logsPath string) (LogMetrics, string) {
	if logsPath == "" {
		return LogMetrics{}, ""
	}

	if eventsJSONLPath := findEventsJSONLFile(logsPath); eventsJSONLPath != "" {
		if metrics, err := parseEventsJSONLFile(eventsJSONLPath, false); err == nil && hasUsefulFallbackMetrics(metrics) {
			return metrics, "copilot"
		}
	}

	agentLogPath := findAgentStdioLogPath(logsPath)
	if agentLogPath == "" {
		return LogMetrics{}, ""
	}
	content, err := os.ReadFile(agentLogPath)
	if err != nil {
		return LogMetrics{}, ""
	}
	return inferBestEngineMetricsFromContent(string(content))
}

// findAgentStdioLogPath locates agent-stdio.log, checking the run directory
// root first and then walking subdirectories for the first match.
func findAgentStdioLogPath(logsPath string) string {
	root := filepath.Join(logsPath, "agent-stdio.log")
	if _, err := os.Stat(root); err == nil {
		return root
	}

	var found string
	walkErr := filepath.Walk(logsPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || info == nil || info.IsDir() {
			return nil
		}
		if info.Name() == "agent-stdio.log" {
			found = path
			return filepath.SkipAll
		}
		return nil
	})
	if walkErr != nil && !errors.Is(walkErr, filepath.SkipAll) {
		auditExpandedLog.Printf("Failed while searching for agent-stdio.log in %s: %v", logsPath, walkErr)
	}
	return found
}

// hasUsefulFallbackMetrics reports whether parsed log metrics carry any signal
// worth surfacing in the audit report.
func hasUsefulFallbackMetrics(metrics LogMetrics) bool {
	return metrics.TokenUsage > 0 || metrics.Turns > 0 || metrics.EstimatedCost > 0 || len(metrics.ToolCalls) > 0
}

// inferBestEngineMetricsFromContent parses the log content with every registered
// engine parser and returns the metrics and engine ID with the strongest signal.
func inferBestEngineMetricsFromContent(logContent string) (LogMetrics, string) {
	registry := workflow.GetGlobalEngineRegistry()
	engineIDs := registry.GetSupportedEngines()
	const (
		// Prioritize selecting parsers that recover turn count first (primary signal for audit quality),
		// then token usage, then tool call shape.
		fallbackTurnsWeight     = 100000
		fallbackToolCallsWeight = 1000
	)

	var bestMetrics LogMetrics
	var bestEngineID string
	bestScore := -1

	for _, engineID := range engineIDs {
		engine, err := registry.GetEngine(engineID)
		if err != nil {
			continue
		}
		metrics := engine.ParseLogMetrics(logContent, false)
		score := metrics.TokenUsage + (metrics.Turns * fallbackTurnsWeight) + (len(metrics.ToolCalls) * fallbackToolCallsWeight)
		if score > bestScore {
			bestScore = score
			bestMetrics = metrics
			bestEngineID = engineID
		}
	}

	if !hasUsefulFallbackMetrics(bestMetrics) {
		return LogMetrics{}, ""
	}
	return bestMetrics, bestEngineID
}

// extractPromptAnalysis reads prompt.txt and returns analysis metrics
func extractPromptAnalysis(logsPath string) *PromptAnalysis {
	if logsPath == "" {
23 changes: 23 additions & 0 deletions pkg/cli/audit_expanded_test.go
@@ -111,6 +111,29 @@ func TestExtractEngineConfigWithDetails(t *testing.T) {
	assert.Equal(t, "org/repo", result.Repository, "Repository should match")
}

func TestExtractEngineConfigInferredWithoutAwInfo(t *testing.T) {
	tmpDir := testutil.TempDir(t, "engine-infer-*")
	logContent := `{"type":"result","subtype":"success","num_turns":3,"usage":{"input_tokens":100,"output_tokens":200}}`
	require.NoError(t, os.WriteFile(filepath.Join(tmpDir, "agent-stdio.log"), []byte(logContent), 0o644))

	_, inferredEngineID := inferFallbackLogMetrics(tmpDir)
	result := extractEngineConfigWithInferredEngine(tmpDir, inferredEngineID)
	require.NotNil(t, result, "Engine config should be inferred when aw_info.json is missing but agent log is available")
	assert.NotEmpty(t, result.EngineID, "Inferred engine ID should not be empty")
}

func TestInferFallbackLogMetricsFindsNestedAgentStdioLog(t *testing.T) {
	tmpDir := testutil.TempDir(t, "engine-infer-nested-*")
	nestedDir := filepath.Join(tmpDir, "agent", "logs")
	require.NoError(t, os.MkdirAll(nestedDir, 0o755))
	logContent := `{"type":"result","subtype":"success","num_turns":4,"usage":{"input_tokens":120,"output_tokens":80}}`
	require.NoError(t, os.WriteFile(filepath.Join(nestedDir, "agent-stdio.log"), []byte(logContent), 0o644))

	metrics, inferredEngineID := inferFallbackLogMetrics(tmpDir)
	assert.Positive(t, metrics.Turns, "Fallback metrics should be extracted from nested agent-stdio.log")
	assert.NotEmpty(t, inferredEngineID, "Engine ID should be inferred from nested agent-stdio.log")
}

func TestExtractPromptAnalysis(t *testing.T) {
	tests := []struct {
		name string
28 changes: 27 additions & 1 deletion pkg/cli/audit_report.go
@@ -279,6 +279,32 @@ func buildAuditData(processedRun ProcessedRun, metrics LogMetrics, mcpToolUsage
		WarningCount: run.WarningCount,
	}

	needsFallbackMetrics := metricsData.TokenUsage == 0 || metricsData.Turns == 0
	needsFallbackEngineConfig := run.LogsPath != "" && findAwInfoPath(run.LogsPath) == ""
	var fallbackMetrics LogMetrics
	var inferredEngineID string
	if run.LogsPath != "" && (needsFallbackMetrics || needsFallbackEngineConfig) {
		fallbackMetrics, inferredEngineID = inferFallbackLogMetrics(run.LogsPath)
	}

	// Fallback token usage: when the run-level metric is missing/zero for older
	// runs, use aggregated input+output tokens from agent_usage/token usage artifacts.
	if metricsData.TokenUsage == 0 && processedRun.TokenUsage != nil {
		metricsData.TokenUsage = processedRun.TokenUsage.TotalInputTokens + processedRun.TokenUsage.TotalOutputTokens
	}
	if metricsData.TokenUsage == 0 && metrics.TokenUsage > 0 {
		metricsData.TokenUsage = metrics.TokenUsage
	}
	if metricsData.Turns == 0 && metrics.Turns > 0 {
		metricsData.Turns = metrics.Turns
	}
	if metricsData.TokenUsage == 0 && fallbackMetrics.TokenUsage > 0 {
		metricsData.TokenUsage = fallbackMetrics.TokenUsage
	}
	if metricsData.Turns == 0 && fallbackMetrics.Turns > 0 {
		metricsData.Turns = fallbackMetrics.Turns
	}

	// Populate effective tokens from the firewall proxy summary when available,
	// otherwise fall back to the effective tokens stored on the run itself.
	if processedRun.TokenUsage != nil && processedRun.TokenUsage.TotalEffectiveTokens > 0 {
@@ -347,7 +373,7 @@
	performanceMetrics := generatePerformanceMetrics(processedRun, metricsData, toolUsage)

	// Extract expanded audit data
	engineConfig := extractEngineConfig(run.LogsPath)
	engineConfig := extractEngineConfigWithInferredEngine(run.LogsPath, inferredEngineID)
	promptAnalysis := extractPromptAnalysis(run.LogsPath)
	sessionAnalysis := buildSessionAnalysis(processedRun, metrics)
	safeOutputSummary := buildSafeOutputSummary(createdItems)
29 changes: 29 additions & 0 deletions pkg/cli/audit_report_test.go
@@ -896,6 +896,35 @@ func TestBuildAuditDataMinimal(t *testing.T) {
	_ = auditData.Jobs
}

func TestBuildAuditDataFallbackMetricsWithoutAwInfo(t *testing.T) {
	tmpDir := testutil.TempDir(t, "audit-fallback-*")
	logContent := `{"type":"result","subtype":"success","num_turns":7,"usage":{"input_tokens":100,"output_tokens":200}}`
	require.NoError(t, os.WriteFile(filepath.Join(tmpDir, "agent-stdio.log"), []byte(logContent), 0o644))

	processedRun := ProcessedRun{
		Run: WorkflowRun{
			DatabaseID:   42,
			WorkflowName: "Fallback Metrics Workflow",
			Status:       "completed",
			Conclusion:   "success",
			LogsPath:     tmpDir,
			TokenUsage:   0,
			Turns:        0,
		},
		TokenUsage: &TokenUsageSummary{
			TotalInputTokens:      5944,
			TotalOutputTokens:     8698,
			TotalEffectiveTokens:  243846,
			TotalCacheReadTokens:  1170605,
			TotalCacheWriteTokens: 86049,
		},
	}

	auditData := buildAuditData(processedRun, workflow.LogMetrics{}, nil)
	assert.Equal(t, 14642, auditData.Metrics.TokenUsage, "token usage should fall back to input+output from agent usage summary")
	assert.Equal(t, 7, auditData.Metrics.Turns, "turns should fall back to inferred value from agent log")
}

func TestRenderJSONComplete(t *testing.T) {
	auditData := AuditData{
		Overview: OverviewData{