[observability] Agentic Workflow Observability Report — 2026-04-20 #27292
Closed
Replies: 1 comment
This discussion has been marked as outdated by Agentic Observability Kit. A newer discussion is available at Discussion #27636.
The observability window covers 2026-04-20 02:20Z to 09:00Z (~7 hours). A 14-day window was requested, but data collection timed out after 50 runs. A continuation cursor is available for deeper historical analysis. All episodes are standalone with no lineage edges; no orchestrated DAGs were detected in this snapshot.
Executive Summary
50 runs across 35 workflows were analyzed. No runs were classified risky; 4 were classified changed (all due to turns_increase). The dominant operational signal is pervasive resource heaviness: 17 of 50 runs (34%) carry a high-severity resource_heavy_for_domain assessment, distributed across 13 distinct workflows. Three workflows show poor_agentic_control. One workflow, Design Decision Gate 🏗️, crosses the escalation threshold with 2 consecutive runs showing high resource heaviness and increasing turn counts.
A separate concern: Daily CLI Tools Exploratory Tester logged 42 blocked network requests in a single run, and Daily CLI Performance Agent logged 19. Both counts stand out and warrant an access-policy review.
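The escalation condition mentioned above (2 consecutive runs with high resource heaviness and rising turn counts) can be sketched roughly as follows. This is a minimal illustration, not the kit's actual rule: the `Run` type and `crosses_escalation_threshold` function are hypothetical, the first run's assessment is assumed to be absent, and turn growth is compared against the previous run rather than against a cohort-matched baseline as the report does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical record for one workflow run."""
    turns: int
    resource_heavy: Optional[str]  # severity: "high", "medium", or None


def crosses_escalation_threshold(runs: List[Run], consecutive: int = 2) -> bool:
    """Return True if the latest `consecutive` runs are all high
    resource-heavy and each used more turns than the run before it."""
    if len(runs) < consecutive + 1:
        return False  # need one earlier run to compare turn counts against
    recent = runs[-consecutive:]
    previous = runs[-consecutive - 1:-1]
    return all(
        cur.resource_heavy == "high" and cur.turns > prev.turns
        for prev, cur in zip(previous, recent)
    )


# Turn counts 4, 8, 10 come from the report; the first run's
# assessment is an assumption (treated as not flagged).
design_decision_gate = [Run(4, None), Run(8, "high"), Run(10, "high")]
print(crosses_escalation_threshold(design_decision_gate))
```

Comparing against the previous run keeps the sketch self-contained; a closer match to the report would replace that comparison with a per-run baseline lookup.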
Key Metrics
[Table: run counts for risky, changed, resource_heavy_for_domain, poor_agentic_control, and partially_reducible (medium+)]
Highest Risk Episodes
Design Decision Gate 🏗️ — Issue Response / General Automation, claude engine
This is the only workflow crossing an escalation threshold. Across 3 runs today, turns escalated from 4 → 8 → 10, with cohort-matched baselines confirming turns_increase on the last two. Both later runs carry high resource_heavy_for_domain, and the most recent also carries medium poor_agentic_control (exploratory execution, broad tool usage, 9 tool types). The agentic fraction is ~50%, meaning about half the work could be moved to deterministic pre-steps.
Second run: resource_heavy_for_domain HIGH
Third run: resource_heavy_for_domain HIGH + poor_agentic_control MEDIUM
AI Moderator - codex engine, 6/6 runs blocked
Every run in the window has at least 1 blocked request. One run additionally carries medium resource_heavy_for_domain. The consistent blocking pattern (not an occasional spike) suggests the workflow routinely attempts access that the firewall disallows. This is a design-level issue worth reviewing.
Daily CLI Tools Exploratory Tester - copilot engine
The single run §24650053759 logged 42 blocked requests and 28 turns. Assessment: high resource_heavy_for_domain + medium partially_reducible (92% of turns could move to deterministic steps). This run has a minimal agentic fraction (0.07): the task is largely read-only data gathering masquerading as an agentic workflow.
Episode Regressions
turns_increase × 2
turns_increase × 1
turns_increase × 1
Recommended Actions
Design Decision Gate 🏗️: Tighten instructions and reduce tool breadth from 9 types. Move the ~50% of steps that are data gathering to deterministic pre-agent frontmatter steps. Consider a smaller model (claude-haiku-4-5). An escalation issue has been filed.
Daily CLI Tools Exploratory Tester: With 92% data-gathering turns and 42 blocked requests, this workflow is over-engineered for its task. Restructure it as a deterministic script with a thin agentic post-step, and review which domains are being blocked.
Daily CLI Performance Agent: 19 blocked requests indicate persistent network policy friction. Review network.allowed in the workflow frontmatter.
AI Moderator: Investigate why every run results in a blocked request. The 6/6 blocking rate is not noise; it is a systematic access mismatch. Review what the agent is trying to reach and whether it should be allowed.
Documentation Unbloat / Layout Specification Maintainer / jsweep: These are the top token consumers (4.9M, 3.5M, 2.4M tokens). All are single runs but worth monitoring for cost efficiency. Layout Specification Maintainer also shows poor_agentic_control.
Per-Workflow Resource Profile (all 35 workflows)
Behavioral Fingerprint Breakdown
The exploratory / selective_write / heavy cohort (4 runs) is the highest-priority group. These runs take broad exploratory action and also write output. It is worth checking whether the write posture is intentional for each.
Optimization Candidates (low-urgency)
These workflows are consistently lean, directed, and narrow — good candidates for deterministic automation if they require no real inference:
Workflows always using the latest_success fallback (no cohort match possible yet): Most single-run workflows fall here by definition. As they accumulate history, baseline quality will improve.
Data Collection Notes
The logs tool hit a timeout after ~2 minutes. A continuation cursor (before_run_id: 24645419031, start_date: 2026-04-06) is available for deeper historical analysis.
All episodes are standalone: no edges[] were populated, meaning no multi-workflow DAGs were detected. This could reflect the true topology (all workflows run independently) or a lineage detection gap if any workflows triggered others.
[...] none), likely slash-command or manual triggers with incomplete metadata.
[...] partially_reducible assessment accuracy.
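Resuming collection from the continuation cursor noted under Data Collection Notes could follow a loop like the one below. This is only a sketch under stated assumptions: `collect_runs`, `make_fake_fetch`, and the page shape are hypothetical and do not describe the real logs tool, the cursor is modelled as a bare before_run_id (the start_date bound is left out), and a page budget stands in for the ~2 minute timeout.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (run IDs on this page, next before_run_id or None).
Page = Tuple[List[int], Optional[int]]


def collect_runs(
    fetch_page: Callable[[Optional[int]], Page],
    before_run_id: Optional[int] = None,
    max_pages: int = 5,
) -> Tuple[List[int], Optional[int]]:
    """Fetch up to `max_pages` pages of run IDs, newest first.

    Returns the collected IDs plus a cursor to resume from, or None
    once the history is exhausted.
    """
    collected: List[int] = []
    cursor = before_run_id
    for _ in range(max_pages):
        run_ids, cursor = fetch_page(cursor)
        collected.extend(run_ids)
        if cursor is None:
            break
    return collected, cursor


def make_fake_fetch(all_ids: List[int], page_size: int) -> Callable[[Optional[int]], Page]:
    """Build an in-memory stand-in for the real data source (illustration only)."""
    ordered = sorted(all_ids, reverse=True)

    def fetch_page(before: Optional[int]) -> Page:
        older = [i for i in ordered if before is None or i < before]
        page = older[:page_size]
        next_cursor = page[-1] if len(older) > page_size else None
        return page, next_cursor

    return fetch_page


fetch = make_fake_fetch(list(range(1, 11)), page_size=3)
first_batch, cursor = collect_runs(fetch, max_pages=2)  # budget runs out early
rest, cursor_after = collect_runs(fetch, before_run_id=cursor, max_pages=10)
print(first_batch, cursor, rest, cursor_after)
```

Returning the cursor alongside the partial results is what lets a later invocation pick up where a timed-out one stopped, without re-reading runs already analyzed.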