77 changes: 77 additions & 0 deletions docs/adr/28003-fallback-audit-metrics-without-aw-info.md
@@ -0,0 +1,77 @@
# ADR-28003: Fallback Strategy for Audit Metrics When aw_info.json Is Absent

**Date**: 2026-04-23
**Status**: Draft
**Deciders**: pelikhan

---

## Part 1 — Narrative (Human-Friendly)

### Context

The `gh aw audit` command aggregates run-level metrics (token usage, turn count, engine config) to produce audit reports. These metrics are primarily sourced from `aw_info.json`, a structured artifact written by newer workflow runs. Legacy runs that predate the introduction of `aw_info.json` do not produce this artifact, causing the audit command to emit `engine_config: null`, `metrics.token_usage: null`, and `metrics.turns: null` even when alternative data sources — `agent_usage.json` and raw agent log files (`agent-stdio.log`, `events.jsonl`) — are present in the run directory. This gap reduces the usefulness of audit reports for historical analysis and fleet-wide comparisons.

### Decision

We will implement a multi-level fallback strategy in the audit pipeline that recovers metrics from alternative artifacts and log files when `aw_info.json` is absent. For token usage, the pipeline will fall back to `agent_usage.json` when the firewall proxy `token-usage.jsonl` is unavailable. For engine config, the pipeline will infer the engine by parsing available log files with all registered engine parsers and selecting the parser that recovers the strongest signal (prioritizing turn count, then token usage, then tool calls). For turn count and token usage in the audit report, the pipeline will cascade through: run-level parsed metrics → artifact token summaries → log inference.
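
A minimal sketch of the cascade for a single metric, assuming illustrative names (`resolveMetric` and its parameters are not the shipped API):

```go
// Illustrative sketch: walk the fallback chain in priority order and never
// overwrite a value already recovered from a higher-priority source.
func resolveMetric(runLevel, artifactSummary, logInferred int) int {
	if runLevel > 0 {
		return runLevel // 1. run-level parsed log metrics
	}
	if artifactSummary > 0 {
		return artifactSummary // 2. artifact token summaries (e.g. agent_usage.json)
	}
	return logInferred // 3. inference from raw agent log content
}
```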

### Alternatives Considered

#### Alternative 1: Require aw_info.json and Backfill Historical Data

Enforce `aw_info.json` as a mandatory artifact and run a one-time migration to retroactively populate it for historical runs. This was rejected because it requires coordinating infrastructure changes across all historical workflow runs and cannot recover data that was never recorded.

#### Alternative 2: Surface Null Values and Document Limitations

Accept `null` metric fields for older runs and document that pre-`aw_info.json` runs have incomplete audit data. This was rejected because it degrades the audit tool's utility for historical fleet analysis and provides no path forward for operators who need accurate metrics across their entire run history.

### Consequences

#### Positive
- Audit reports are populated for legacy runs, enabling accurate historical fleet analysis.
- The fallback chain is additive and non-destructive: runs with `aw_info.json` are unaffected.
- `agent_usage.json` token data (including `effective_tokens`) is surfaced through the same `TokenUsageSummary` abstraction already used by the primary path.

#### Negative
- The audit pipeline now has three distinct code paths for metric acquisition, increasing complexity and surface area for bugs.
- Inferred engine identification via log scoring is heuristic: the parser selection algorithm (weighted by turns, then token usage, then tool calls) may misidentify the engine when log content is ambiguous or shared across parsers.
- `agent_usage.json` is treated as a single-request summary, so per-model and per-request breakdowns are not available via this fallback.

#### Neutral
- The `TokenUsageEntry` struct gains an `effective_tokens` field to accommodate `agent_usage.json` data; `token-usage.jsonl` entries omit this field and continue using computed effective token totals (a field sketch follows this list).
- The engine inference function (`inferBestEngineMetricsFromContent`) iterates all registered engines and may add latency proportional to the number of registered parsers for runs without `aw_info.json`.
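
A minimal sketch of the struct change described above; only the `effective_tokens` addition comes from this ADR, and the surrounding fields and JSON tags are assumptions:

```go
// Sketch only: surrounding fields and tags are assumptions.
type TokenUsageEntry struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	// Populated from agent_usage.json; token-usage.jsonl entries omit it and
	// keep using computed effective token totals instead.
	EffectiveTokens int `json:"effective_tokens,omitempty"`
}
```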

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Token Usage Acquisition

1. Implementations **MUST** attempt to load token usage from `token-usage.jsonl` (firewall proxy log) first.
2. Implementations **MUST** fall back to `agent_usage.json` when `token-usage.jsonl` is absent or cannot be located (see the sketch following this list).
3. Implementations **MUST NOT** apply custom token weight overrides (from `aw_info.json`) to the `agent_usage.json` fallback path, as custom weights are only meaningful alongside the firewall proxy data.
4. Implementations **SHOULD** search for `agent_usage.json` at the root of the run directory before walking subdirectories, to minimize filesystem traversal.
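
A minimal sketch of this acquisition order; `loadProxyTokenUsage` and `loadAgentUsage` are hypothetical helpers standing in for the real loaders:

```go
// Hedged sketch: helper names are hypothetical stand-ins for the real loaders.
func loadTokenUsage(runDir string) (*TokenUsageSummary, error) {
	proxyPath := filepath.Join(runDir, "token-usage.jsonl")
	if summary, err := loadProxyTokenUsage(proxyPath); err == nil {
		return summary, nil // primary: firewall proxy log (custom token weights may apply here)
	}
	// Fallback: agent_usage.json, checked at the run-directory root before any
	// subdirectory walk; aw_info.json weight overrides are intentionally not applied.
	return loadAgentUsage(runDir)
}
```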

### Audit Metric Fallback Chain

1. Implementations **MUST** populate `metrics.token_usage` by cascading through, in order: (1) run-level parsed log metrics, (2) `input_tokens + output_tokens` from the artifact `TokenUsageSummary`, (3) token usage inferred from log content.
2. Implementations **MUST** populate `metrics.turns` by cascading through, in order: (1) run-level parsed log metrics, (2) turn count inferred from log content.
3. Implementations **MUST NOT** overwrite a non-zero metric value with a fallback value from a lower-priority source.

### Engine Config Inference

1. When `aw_info.json` is absent, implementations **MUST** attempt engine inference by parsing available log files using all engines registered in the global engine registry.
2. Implementations **MUST** select the inferred engine by maximizing a weighted score: `turns * 100000 + len(tool_calls) * 1000 + token_usage` (see the sketch following this list).
3. Implementations **MUST NOT** return an inferred engine config if no registered engine parser recovers any useful signal (turns, token usage, or tool calls).
4. Implementations **SHOULD** prefer `events.jsonl` over `agent-stdio.log` for engine inference when both are present.
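
A hedged sketch of the selection heuristic referenced in item 2; the per-engine parse results are simplified to plain integers and the surrounding types are assumptions:

```go
// Sketch only: the real code iterates the engine registry and calls each
// engine's log parser; here the parse results are pre-collected.
type inferredSignal struct {
	Turns, TokenUsage, ToolCalls int
}

func selectInferredEngine(candidates map[string]inferredSignal) string {
	bestID, bestScore := "", -1
	for id, s := range candidates {
		if s.Turns == 0 && s.TokenUsage == 0 && s.ToolCalls == 0 {
			continue // no useful signal: this parser cannot be selected
		}
		// Turns dominate the score, then tool calls, then raw token usage.
		score := s.Turns*100000 + s.ToolCalls*1000 + s.TokenUsage
		if score > bestScore {
			bestScore, bestID = score, id
		}
	}
	return bestID // empty when no parser recovered any useful signal
}
```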

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24834078573) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
99 changes: 99 additions & 0 deletions pkg/cli/audit_expanded.go
@@ -2,6 +2,7 @@ package cli

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
@@ -11,6 +12,7 @@ import (

	"github.com/github/gh-aw/pkg/logger"
	"github.com/github/gh-aw/pkg/timeutil"
	"github.com/github/gh-aw/pkg/workflow"
)

var auditExpandedLog = logger.New("cli:audit_expanded")
@@ -112,13 +114,27 @@ func findAwInfoPath(logsPath string) string {

// extractEngineConfig parses aw_info.json and returns an AuditEngineConfig
func extractEngineConfig(logsPath string) *AuditEngineConfig {
	return extractEngineConfigWithInferredEngine(logsPath, "")
}

// extractEngineConfigWithInferredEngine behaves like extractEngineConfig but, when
// aw_info.json is absent, falls back to the engine ID inferred from log content.
func extractEngineConfigWithInferredEngine(logsPath, inferredEngineID string) *AuditEngineConfig {
	if logsPath == "" {
		return nil
	}

	awInfoPath := findAwInfoPath(logsPath)
	if awInfoPath == "" {
		auditExpandedLog.Printf("aw_info.json not found in %s", logsPath)
		if inferredEngineID != "" {
			registry := workflow.GetGlobalEngineRegistry()
			if engine, err := registry.GetEngine(inferredEngineID); err == nil {
				auditExpandedLog.Printf("Inferred engine config without aw_info.json: engine=%s", inferredEngineID)
				return &AuditEngineConfig{
					EngineID:   inferredEngineID,
					EngineName: engine.GetDisplayName(),
				}
			}
		}
		return nil
	}
	awInfo, err := parseAwInfo(awInfoPath, false)
@@ -148,6 +164,89 @@ func extractEngineConfig(logsPath string) *AuditEngineConfig {
	return config
}

// inferFallbackLogMetrics recovers log metrics and an inferred engine ID from
// events.jsonl or agent-stdio.log when aw_info.json is unavailable.
func inferFallbackLogMetrics(logsPath string) (LogMetrics, string) {
	if logsPath == "" {
		return LogMetrics{}, ""
	}

	if eventsJSONLPath := findEventsJSONLFile(logsPath); eventsJSONLPath != "" {
		if metrics, err := parseEventsJSONLFile(eventsJSONLPath, false); err == nil && hasUsefulFallbackMetrics(metrics) {
			return metrics, "copilot"
		}
	}

	agentLogPath := findAgentStdioLogPath(logsPath)
	if agentLogPath == "" {
		return LogMetrics{}, ""
	}
	content, err := os.ReadFile(agentLogPath)
	if err != nil {
		return LogMetrics{}, ""
	}
	return inferBestEngineMetricsFromContent(string(content))
}

// findAgentStdioLogPath locates agent-stdio.log, checking the run directory
// root first and then walking subdirectories for the first match.
func findAgentStdioLogPath(logsPath string) string {
	root := filepath.Join(logsPath, "agent-stdio.log")
	if _, err := os.Stat(root); err == nil {
		return root
	}

	var found string
	walkErr := filepath.Walk(logsPath, func(path string, info os.FileInfo, err error) error {
		if err != nil || info == nil || info.IsDir() {
			return nil
		}
		if info.Name() == "agent-stdio.log" {
			found = path
			return filepath.SkipAll
		}
		return nil
	})
	if walkErr != nil && !errors.Is(walkErr, filepath.SkipAll) {
		auditExpandedLog.Printf("Failed while searching for agent-stdio.log in %s: %v", logsPath, walkErr)
	}
	return found
}

// hasUsefulFallbackMetrics reports whether parsed log metrics carry any signal
// worth surfacing in the audit report.
func hasUsefulFallbackMetrics(metrics LogMetrics) bool {
	return metrics.TokenUsage > 0 || metrics.Turns > 0 || metrics.EstimatedCost > 0 || len(metrics.ToolCalls) > 0
}

// inferBestEngineMetricsFromContent parses the log content with every registered
// engine parser and returns the metrics and engine ID with the strongest signal.
func inferBestEngineMetricsFromContent(logContent string) (LogMetrics, string) {
	registry := workflow.GetGlobalEngineRegistry()
	engineIDs := registry.GetSupportedEngines()
	const (
		// Prioritize selecting parsers that recover turn count first (primary signal for audit quality),
		// then token usage, then tool call shape.
		fallbackTurnsWeight     = 100000
		fallbackToolCallsWeight = 1000
	)

	var bestMetrics LogMetrics
	var bestEngineID string
	bestScore := -1

	for _, engineID := range engineIDs {
		engine, err := registry.GetEngine(engineID)
		if err != nil {
			continue
		}
		metrics := engine.ParseLogMetrics(logContent, false)
		score := metrics.TokenUsage + (metrics.Turns * fallbackTurnsWeight) + (len(metrics.ToolCalls) * fallbackToolCallsWeight)
		if score > bestScore {
			bestScore = score
			bestMetrics = metrics
			bestEngineID = engineID
		}
	}

	if !hasUsefulFallbackMetrics(bestMetrics) {
		return LogMetrics{}, ""
	}
	return bestMetrics, bestEngineID
}

// extractPromptAnalysis reads prompt.txt and returns analysis metrics
func extractPromptAnalysis(logsPath string) *PromptAnalysis {
	if logsPath == "" {
23 changes: 23 additions & 0 deletions pkg/cli/audit_expanded_test.go
@@ -111,6 +111,29 @@ func TestExtractEngineConfigWithDetails(t *testing.T) {
	assert.Equal(t, "org/repo", result.Repository, "Repository should match")
}

func TestExtractEngineConfigInferredWithoutAwInfo(t *testing.T) {
	tmpDir := testutil.TempDir(t, "engine-infer-*")
	logContent := `{"type":"result","subtype":"success","num_turns":3,"usage":{"input_tokens":100,"output_tokens":200}}`
	require.NoError(t, os.WriteFile(filepath.Join(tmpDir, "agent-stdio.log"), []byte(logContent), 0o644))

	_, inferredEngineID := inferFallbackLogMetrics(tmpDir)
	result := extractEngineConfigWithInferredEngine(tmpDir, inferredEngineID)
	require.NotNil(t, result, "Engine config should be inferred when aw_info.json is missing but agent log is available")
	assert.NotEmpty(t, result.EngineID, "Inferred engine ID should not be empty")
}

func TestInferFallbackLogMetricsFindsNestedAgentStdioLog(t *testing.T) {
	tmpDir := testutil.TempDir(t, "engine-infer-nested-*")
	nestedDir := filepath.Join(tmpDir, "agent", "logs")
	require.NoError(t, os.MkdirAll(nestedDir, 0o755))
	logContent := `{"type":"result","subtype":"success","num_turns":4,"usage":{"input_tokens":120,"output_tokens":80}}`
	require.NoError(t, os.WriteFile(filepath.Join(nestedDir, "agent-stdio.log"), []byte(logContent), 0o644))

	metrics, inferredEngineID := inferFallbackLogMetrics(tmpDir)
	assert.Positive(t, metrics.Turns, "Fallback metrics should be extracted from nested agent-stdio.log")
	assert.NotEmpty(t, inferredEngineID, "Engine ID should be inferred from nested agent-stdio.log")
}

func TestExtractPromptAnalysis(t *testing.T) {
	tests := []struct {
		name string
28 changes: 27 additions & 1 deletion pkg/cli/audit_report.go
@@ -279,6 +279,32 @@ func buildAuditData(processedRun ProcessedRun, metrics LogMetrics, mcpToolUsage
		WarningCount: run.WarningCount,
	}

	needsFallbackMetrics := metricsData.TokenUsage == 0 || metricsData.Turns == 0
	needsFallbackEngineConfig := run.LogsPath != "" && findAwInfoPath(run.LogsPath) == ""
	var fallbackMetrics LogMetrics
	var inferredEngineID string
	if run.LogsPath != "" && (needsFallbackMetrics || needsFallbackEngineConfig) {
		fallbackMetrics, inferredEngineID = inferFallbackLogMetrics(run.LogsPath)
	}

	// Fallback token usage: when the run-level metric is missing/zero for older
	// runs, use aggregated input+output tokens from agent_usage/token usage artifacts.
	if metricsData.TokenUsage == 0 && processedRun.TokenUsage != nil {
		metricsData.TokenUsage = processedRun.TokenUsage.TotalInputTokens + processedRun.TokenUsage.TotalOutputTokens
	}
	if metricsData.TokenUsage == 0 && metrics.TokenUsage > 0 {
		metricsData.TokenUsage = metrics.TokenUsage
	}
	if metricsData.Turns == 0 && metrics.Turns > 0 {
		metricsData.Turns = metrics.Turns
	}
	if metricsData.TokenUsage == 0 && fallbackMetrics.TokenUsage > 0 {
		metricsData.TokenUsage = fallbackMetrics.TokenUsage
	}
	if metricsData.Turns == 0 && fallbackMetrics.Turns > 0 {
		metricsData.Turns = fallbackMetrics.Turns
	}

	// Populate effective tokens from the firewall proxy summary when available,
	// otherwise fall back to the effective tokens stored on the run itself.
	if processedRun.TokenUsage != nil && processedRun.TokenUsage.TotalEffectiveTokens > 0 {
@@ -347,7 +373,7 @@
	performanceMetrics := generatePerformanceMetrics(processedRun, metricsData, toolUsage)

	// Extract expanded audit data
	engineConfig := extractEngineConfig(run.LogsPath)
	engineConfig := extractEngineConfigWithInferredEngine(run.LogsPath, inferredEngineID)
	promptAnalysis := extractPromptAnalysis(run.LogsPath)
	sessionAnalysis := buildSessionAnalysis(processedRun, metrics)
	safeOutputSummary := buildSafeOutputSummary(createdItems)
29 changes: 29 additions & 0 deletions pkg/cli/audit_report_test.go
@@ -896,6 +896,35 @@ func TestBuildAuditDataMinimal(t *testing.T) {
	_ = auditData.Jobs
}

func TestBuildAuditDataFallbackMetricsWithoutAwInfo(t *testing.T) {
	tmpDir := testutil.TempDir(t, "audit-fallback-*")
	logContent := `{"type":"result","subtype":"success","num_turns":7,"usage":{"input_tokens":100,"output_tokens":200}}`
	require.NoError(t, os.WriteFile(filepath.Join(tmpDir, "agent-stdio.log"), []byte(logContent), 0o644))

	processedRun := ProcessedRun{
		Run: WorkflowRun{
			DatabaseID:   42,
			WorkflowName: "Fallback Metrics Workflow",
			Status:       "completed",
			Conclusion:   "success",
			LogsPath:     tmpDir,
			TokenUsage:   0,
			Turns:        0,
		},
		TokenUsage: &TokenUsageSummary{
			TotalInputTokens:      5944,
			TotalOutputTokens:     8698,
			TotalEffectiveTokens:  243846,
			TotalCacheReadTokens:  1170605,
			TotalCacheWriteTokens: 86049,
		},
	}

	auditData := buildAuditData(processedRun, workflow.LogMetrics{}, nil)
	assert.Equal(t, 14642, auditData.Metrics.TokenUsage, "token usage should fall back to input+output from agent usage summary")
	assert.Equal(t, 7, auditData.Metrics.Turns, "turns should fall back to inferred value from agent log")
}

func TestRenderJSONComplete(t *testing.T) {
	auditData := AuditData{
		Overview: OverviewData{