[audit-workflows] Daily Agentic Workflow Audit — 2026-04-28 #28988
Closed
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #29196.
Overview
Today's audit covers 218 workflow runs on 2026-04-28, spanning 29 unique run types. The fleet operated at a 97.7% success rate with 5 runs reporting errors. No missing tools were detected. One cache memory miss was recorded (Step Name Alignment). Total spend was $9.59 across 16.8M tokens.
This is the first audit run for this repository — no historical baselines are available yet. Baselines will accumulate over time.
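The headline rates follow directly from the raw counts. Below is a minimal sketch that assumes every run reporting errors counts as a failure. The helper name headline_metrics is hypothetical, and the cost per million tokens is only approximate because 16.8M is itself a rounded figure:

```python
def headline_metrics(total_runs: int, error_runs: int, cost_usd: float, tokens: int) -> dict:
    """Derive success rate and blended cost per million tokens (hypothetical helper)."""
    # Assumption: a run that reported errors is counted as a failed run.
    success_rate = (total_runs - error_runs) / total_runs
    cost_per_mtok = cost_usd / (tokens / 1_000_000)
    return {
        "success_pct": round(success_rate * 100, 1),
        "cost_per_mtok_usd": round(cost_per_mtok, 3),
    }


metrics = headline_metrics(total_runs=218, error_runs=5, cost_usd=9.59, tokens=16_800_000)
print(metrics)  # {'success_pct': 97.7, 'cost_per_mtok_usd': 0.571}
```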
Key Metrics
Engine distribution (last 24h): Claude: 4 runs · Copilot: 8 runs · Other/Unknown: 16 runs
Workflow Health Trends
Run activity peaked in the 19:00–21:00 UTC window, driven by bulk smoke test and CI runs. The 5 error-producing runs were spread across the day with no apparent clustering, suggesting isolated failures rather than a systemic issue.
Token & Cost Trends
Token spend is highly concentrated in a few high-cost Claude runs (Serena Go Expert, Copilot CLI Deep Research, Audit Agent). Most runs (smoke tests, deployment monitors) are zero-cost infrastructure checks. The 3-hour moving average shows that cost, unlike run volume, is front-loaded in the early UTC morning hours.
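The report does not state how the 3-hour moving average is computed. The sketch below assumes a trailing window over hourly cost totals that averages over fewer points at the start of the series. The hourly figures are invented for illustration and are not today's data:

```python
from collections import deque


def trailing_moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; the first window-1 points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = deque(maxlen=window)  # the deque drops the oldest value automatically once full
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical hourly cost totals in USD (00:00, 01:00, ... UTC), for illustration only.
hourly_cost = [3.0, 1.5, 4.5, 0.0, 0.0, 0.0]
smoothed = trailing_moving_average(hourly_cost)
print(smoothed)  # [3.0, 2.25, 3.0, 2.0, 1.5, 0.0]
```

A trailing window lags the raw series slightly. A centered window avoids that lag but needs the following hour's data before it can report.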
Issues Detected
missing_data (cache memory miss)
🔍 Observability Insights (4 findings)
[Medium] Execution Drift — Test Quality Sentinel
Turn count varied from 4 to 12 across runs (avg 8.0). This suggests the workflow's task shape is not stable — either the prompt is underspecified, the input varies significantly, or the agent is handling edge cases differently. Recommend adding turn-count bounds or tightening the task prompt.
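One way to turn the recommended turn-count bounds into a check is sketched below. Only the minimum (4), maximum (12), and average (8.0) come from today's runs. The per-run list is invented to match them, and the spread threshold and hard_max budget are hypothetical tuning values, not documented settings:

```python
from statistics import fmean, pstdev


def turn_drift(turns: list[int], max_spread_ratio: float = 0.5, hard_max: int = 10) -> dict:
    """Flag execution drift when the turn-count range is large relative to the mean.

    max_spread_ratio and hard_max are hypothetical knobs chosen for illustration.
    """
    avg = fmean(turns)
    spread_ratio = (max(turns) - min(turns)) / avg
    return {
        "avg": avg,
        "stdev": round(pstdev(turns), 2),
        "spread_ratio": spread_ratio,
        "drifting": spread_ratio > max_spread_ratio,
        # Runs that exceeded the per-run turn budget and deserve a closer look.
        "over_budget": [t for t in turns if t > hard_max],
    }


# Invented per-run turn counts matching the reported min 4, max 12, avg 8.0.
report = turn_drift([4, 7, 9, 12])
print(report)  # {'avg': 8.0, 'stdev': 2.92, 'spread_ratio': 1.0, 'drifting': True, 'over_budget': [12]}
```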
[Medium] Firewall Block Pressure — Daily Workflow Updater
9 out of 79 requests (11%) were blocked. This is the highest block pressure of any workflow today. Blocked domains include proxy.golang.org:443 and storage.googleapis.com:443, suggesting that Go module or artifact fetching is being firewalled. Review the allowed-domains list for this workflow.
[Medium] 4 High-Anomaly Events
Cross-run log analysis flagged 4 events with anomaly score > 0.6 across the 100 most recent runs. These events deviate from learned event templates and warrant review. No specific root cause was identified today; monitor for recurrence.
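The report does not describe how anomaly scores are produced. As a rough stand-in, the sketch below masks variable tokens to derive event templates and scores each template by its rarity relative to the most common one. The masking regex, the scoring formula, and the sample events are all assumptions, not the audit agent's method:

```python
import re
from collections import Counter

# Hypothetical masking rule: hex literals and bare integers become a wildcard.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b", re.IGNORECASE)


def to_template(line: str) -> str:
    """Collapse lines that differ only in numeric IDs onto one template."""
    return _VARIABLE.sub("<*>", line)


def flag_anomalies(lines: list[str], threshold: float = 0.6) -> list[tuple[str, float]]:
    """Score each template by rarity (1 - count / top_count) and keep those above threshold."""
    if not lines:
        return []
    counts = Counter(to_template(line) for line in lines)
    top = max(counts.values())
    flagged = []
    for template, n in counts.items():
        score = 1 - n / top  # 0.0 for the most frequent template, approaching 1.0 for rare ones
        if score > threshold:
            flagged.append((template, round(score, 2)))
    return flagged


# Invented events: six routine lines and one rare one.
events = [f"step {i} ok" for i in range(6)] + ["cache miss for key 42"]
flagged = flag_anomalies(events)
print(flagged)  # [('cache miss for key <*>', 0.83)]
```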
[Info] All runs read-only in the last-24h window
Zero write-capable safe outputs in the last-24h window (all 100 recent runs stayed read-only). This is consistent with audit/monitoring workflows. No concern.
🔒 Firewall Summary
Blocked domain breakdown (via Daily Workflow Updater):
- proxy.golang.org:443: Go module proxy (blocked)
- storage.googleapis.com:443: GCS artifact storage (partially blocked)
- (unknown): unclassified requests (blocked)
All blocking is happening within the Daily Workflow Updater. The other 5 workflows in the firewall log had 0 blocked requests.
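The block-rate figures can be reproduced from per-request firewall records. The sketch below assumes a simple (workflow, domain, blocked) tuple per request, which is a hypothetical shape rather than the actual log format. The per-domain split is also invented; only the totals of 9 blocked out of 79 come from this report:

```python
from collections import defaultdict


def block_summary(entries):
    """Aggregate per-workflow block counts, block rate, and the set of blocked domains.

    entries: iterable of (workflow, domain, blocked) tuples (hypothetical record shape).
    """
    counts = defaultdict(lambda: [0, 0])  # workflow -> [blocked, total]
    blocked_domains = defaultdict(set)
    for workflow, domain, blocked in entries:
        counts[workflow][1] += 1
        if blocked:
            counts[workflow][0] += 1
            blocked_domains[workflow].add(domain)
    return {
        workflow: {
            "blocked": n_blocked,
            "total": n_total,
            "rate": round(n_blocked / n_total, 3),
            "domains": sorted(blocked_domains[workflow]),
        }
        for workflow, (n_blocked, n_total) in counts.items()
    }


wf = "Daily Workflow Updater"
entries = (
    [(wf, "proxy.golang.org:443", True)] * 5
    + [(wf, "storage.googleapis.com:443", True)] * 2
    + [(wf, "storage.googleapis.com:443", False)] * 3  # partially blocked domain
    + [(wf, "(unknown)", True)] * 2
    + [(wf, "api.github.com:443", False)] * 67  # invented allowed traffic
)
summary = block_summary(entries)
print(summary[wf]["blocked"], summary[wf]["total"], summary[wf]["rate"])  # 9 79 0.114
```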
🛠️ MCP Server Usage
Top tools:
- Read (27 calls)
- serena_get_symbols_overview (12)
- serena_find_file (11)
- github_search_issues (6)
- serena_list_dir (5)
Recommendations
- Daily Workflow Updater (firewall review): The 11% overall block rate, which includes the Go-related domains proxy.golang.org:443 and storage.googleapis.com:443, suggests the allowed-domains list needs updating. Add these two domains if Go module fetching is a legitimate workflow requirement.
- Test Quality Sentinel (prompt stabilization): The 4–12 turn variance signals an unstable task shape. Consider adding explicit scope bounds to the prompt (e.g., "analyze at most N files") or reviewing which inputs trigger high-turn runs.
- Smoke Claude run investigation: The cancelled run (25075888667) recorded 4 errors but 0 tokens. This pattern suggests a pre-agent infrastructure failure rather than an agent error. Check whether the underlying smoke test trigger has a known flakiness issue.
- Step Name Alignment (cache seeding): The missing_data call indicates the cache was empty when this workflow ran. If this workflow depends on cache data populated by another workflow, check the execution order.
- Establish baselines: This is the first audit run. After 5–7 days of data accumulate, baselines will be available for comparing error rate, cost, and turn count.
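Once several days of history exist, a z-score against the trailing baseline is one simple way to flag a day that departs from normal. The sketch below is an assumption, not the audit agent's method: min_days mirrors the 5–7 day warm-up described in the recommendations, the 2.0 cutoff is arbitrary, and the history values in the second call are invented:

```python
import math
from statistics import fmean, stdev


def compare_to_baseline(history: list[float], today: float,
                        min_days: int = 5, z_threshold: float = 2.0) -> dict:
    """Compare today's metric with the mean and sample stdev of prior daily values."""
    if len(history) < min_days:
        return {"status": "insufficient_history", "days": len(history)}
    mu = fmean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all is treated as an infinite deviation.
        z = 0.0 if today == mu else math.copysign(math.inf, today - mu)
    else:
        z = (today - mu) / sigma
    status = "anomalous" if abs(z) > z_threshold else "normal"
    return {"status": status, "z": round(z, 2)}


# Today's error rate (5 of 218 runs) is the only data point so far, so the check abstains.
early = compare_to_baseline([5 / 218], today=5 / 218)
print(early)  # {'status': 'insufficient_history', 'days': 1}

# Five invented days of error rates, then a hypothetical 5% day.
late = compare_to_baseline([0.02, 0.03, 0.02, 0.03, 0.025], today=0.05)
print(late["status"])  # anomalous
```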