[otel-advisor] OTel improvement: add env-var fallbacks for gh-aw.workflow.name in conclusion spans #26299

@github-actions

📡 OTel Instrumentation Improvement: gh-aw.workflow.name missing from conclusion spans on failed runs

Analysis Date: 2026-04-14
Priority: High
Effort: Small (< 2h)

Problem

sendJobConclusionSpan resolves workflowName exclusively from /tmp/gh-aw/aw_info.json:

// send_otlp_span.cjs line 684
const workflowName = awInfo.workflow_name || "";

aw_info.json is written by the agent job step. When a job fails before the agent runs — or on non-agent jobs (safe-outputs, activation-only, check jobs) — aw_info.json is absent or empty. The conclusion span then emits gh-aw.workflow.name: "".

By contrast, sendJobSetupSpan falls back through two environment variables before defaulting to an empty string (line 486):

const workflowName = process.env.GH_AW_INFO_WORKFLOW_NAME || process.env.GITHUB_WORKFLOW || "";

GH_AW_INFO_WORKFLOW_NAME is injected by the gh-aw compiler at the job level and is always present in the runner environment, including in the post-step where conclusion spans are sent. So conclusion spans can read it — they just don't today.

Why This Matters (DevOps Perspective)

The conclusion span is the primary failure signal: it carries gh-aw.agent.conclusion, error events, token counts, and rate-limit data. Engineers debugging outages filter by gh-aw.workflow.name to scope dashboards to a specific workflow.

Because conclusion spans for failed runs have an empty workflow name, those spans are invisible in any Grafana/Honeycomb/Datadog panel that groups or filters by gh-aw.workflow.name. The failure case — exactly the signal that matters most for MTTR reduction — drops out of the view. The result is that failure-rate panels built on conclusion spans silently undercount failures per workflow.

Current Behavior

// Current: actions/setup/js/send_otlp_span.cjs (line 684)
// Only reads aw_info.json; no env-var fallback.
const workflowName = awInfo.workflow_name || "";

If aw_info.json is missing, gh-aw.workflow.name is set to "" in the OTLP span and in the JSONL mirror.

For comparison, the setup span (line 486) reads:

// actions/setup/js/send_otlp_span.cjs (line 486)
const workflowName = process.env.GH_AW_INFO_WORKFLOW_NAME || process.env.GITHUB_WORKFLOW || "";

Proposed Change

Add the same env-var fallback chain to sendJobConclusionSpan:

// Proposed: actions/setup/js/send_otlp_span.cjs (line 684)
// Reuse the env-var fallbacks from sendJobSetupSpan so conclusion spans still
// carry the workflow name when aw_info.json is absent (early failures,
// non-agent jobs).
const workflowName = awInfo.workflow_name || process.env.GH_AW_INFO_WORKFLOW_NAME || process.env.GITHUB_WORKFLOW || "";

No other changes are needed — workflowName is already passed to buildAttr("gh-aw.workflow.name", workflowName) on the next line.
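To make the precedence of the proposed chain concrete, here is a minimal sketch that isolates it in a standalone function. The helper name resolveWorkflowName, its env parameter, and the extra null guard on awInfo are assumptions made for illustration; none of them exist in send_otlp_span.cjs.

```javascript
// Hypothetical helper (not part of send_otlp_span.cjs): isolates the proposed
// fallback chain so its precedence can be checked on its own.
function resolveWorkflowName(awInfo, env) {
  // `||` rather than `??`: an empty-string workflow_name from aw_info.json
  // must also fall through to the environment variables.
  return (awInfo && awInfo.workflow_name) || env.GH_AW_INFO_WORKFLOW_NAME || env.GITHUB_WORKFLOW || "";
}

// aw_info.json present: its value wins.
console.log(resolveWorkflowName({ workflow_name: "triage-issues" }, { GITHUB_WORKFLOW: "ci" })); // "triage-issues"

// aw_info.json absent (modeled as {}): the compiler-injected variable is used.
console.log(resolveWorkflowName({}, { GH_AW_INFO_WORKFLOW_NAME: "triage-issues", GITHUB_WORKFLOW: "ci" })); // "triage-issues"

// Only the runner-provided variable is set.
console.log(resolveWorkflowName({}, { GITHUB_WORKFLOW: "ci" })); // "ci"
```

Using `||` instead of `??` matters here: the nullish operator would keep an empty string read from aw_info.json, which is one of the cases this issue is trying to eliminate.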

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: gh-aw.workflow.name will be populated in conclusion spans for all jobs — including failed runs that never reached the agent step. Failure-rate panels grouped by workflow name will no longer silently drop failed jobs.
  • In the JSONL mirror: Every conclusion span in /tmp/gh-aw/otel.jsonl will include the correct gh-aw.workflow.name, making artifact-based post-hoc debugging consistent with what is sent to the OTLP backend.
  • For on-call engineers: Querying gh-aw.workflow.name = "triage-issues" will now return both setup and conclusion spans, enabling end-to-end trace correlation without gaps.
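One way to check the JSONL mirror outcome after a run is to scan the file for spans whose gh-aw.workflow.name is still empty. The sketch below assumes each line of /tmp/gh-aw/otel.jsonl is a standard OTLP/JSON export payload (resourceSpans, scopeSpans, spans, attributes); the issue does not document the mirror's exact shape, and spansMissingWorkflowName is a hypothetical helper.

```javascript
const fs = require("node:fs");

// Assumption: one OTLP/JSON export payload per line. Adjust the traversal
// if the mirror uses a different shape.
function spansMissingWorkflowName(jsonlText) {
  const missing = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let payload;
    try {
      payload = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than abort the scan
    }
    for (const rs of payload.resourceSpans || []) {
      for (const ss of rs.scopeSpans || []) {
        for (const span of ss.spans || []) {
          const attr = (span.attributes || []).find(a => a.key === "gh-aw.workflow.name");
          const value = attr && attr.value ? attr.value.stringValue : undefined;
          // Treat both a missing attribute and an empty string as a gap.
          if (!value) missing.push(span.name);
        }
      }
    }
  }
  return missing;
}

// Usage against the mirror path named in this issue, when the file exists.
const mirrorPath = "/tmp/gh-aw/otel.jsonl";
if (fs.existsSync(mirrorPath)) {
  console.log(spansMissingWorkflowName(fs.readFileSync(mirrorPath, "utf8")));
}
```

If the fix works as intended, the returned list should be empty for runs where GH_AW_INFO_WORKFLOW_NAME or GITHUB_WORKFLOW was set.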

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs, change line 684 from const workflowName = awInfo.workflow_name || ""; to const workflowName = awInfo.workflow_name || process.env.GH_AW_INFO_WORKFLOW_NAME || process.env.GITHUB_WORKFLOW || "";
  • Add a test in actions/setup/js/action_conclusion_otlp.test.cjs (or send_otlp_span.test.cjs) asserting that when aw_info.json is absent but GH_AW_INFO_WORKFLOW_NAME is set, the conclusion span carries the correct gh-aw.workflow.name attribute
  • Run cd actions/setup/js && npx vitest run (or make test-unit) to confirm all tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue
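The test described in the second step could start from a sketch like the one below. It uses only Node built-ins so it runs standalone; in the repository it would presumably be written as a vitest case against the real sendJobConclusionSpan. The readAwInfo and conclusionWorkflowName functions are hypothetical stand-ins, since the issue does not show how the module loads aw_info.json.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical stand-in for however the real module loads aw_info.json:
// returns {} when the file is missing, empty, or not valid JSON.
function readAwInfo(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8")) || {};
  } catch {
    return {};
  }
}

// The proposed resolution, isolated so it can be exercised without sending a span.
function conclusionWorkflowName(awInfoPath, env) {
  const awInfo = readAwInfo(awInfoPath);
  return awInfo.workflow_name || env.GH_AW_INFO_WORKFLOW_NAME || env.GITHUB_WORKFLOW || "";
}

// Scenario from the step above: aw_info.json absent, env var set.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "aw-info-"));
const awInfoPath = path.join(dir, "aw_info.json"); // intentionally not created yet
const absentResult = conclusionWorkflowName(awInfoPath, { GH_AW_INFO_WORKFLOW_NAME: "triage-issues" });
console.log(absentResult);

// Control: once the file exists, its value takes precedence over the env var.
fs.writeFileSync(awInfoPath, JSON.stringify({ workflow_name: "from-file" }));
const presentResult = conclusionWorkflowName(awInfoPath, { GH_AW_INFO_WORKFLOW_NAME: "triage-issues" });
console.log(presentResult);
```

Passing the environment as a parameter keeps the check independent of the real process.env, which avoids leaking state between test cases.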

Evidence from Static Code Analysis

No Sentry MCP was available for live span queries in this environment. The gap is confirmed by direct code inspection:

Location: workflowName resolution

  • sendJobSetupSpan (send_otlp_span.cjs:486): process.env.GH_AW_INFO_WORKFLOW_NAME || process.env.GITHUB_WORKFLOW || "" (✅ robust)
  • sendJobConclusionSpan (send_otlp_span.cjs:684): awInfo.workflow_name || "" (⚠️ no env-var fallback)

The two functions each resolve workflowName into their own local variable, and the env-var fallback chain was not carried over when the conclusion span was written. Because GH_AW_INFO_WORKFLOW_NAME is a compiler-injected job-level env var (documented in the JSDoc at line 428), it is available in the post-step runner environment where conclusion spans are emitted.

Related Files

  • actions/setup/js/send_otlp_span.cjs — primary change location (line 684)
  • actions/setup/js/action_conclusion_otlp.cjs — orchestrates conclusion span (no change needed)
  • actions/setup/js/action_conclusion_otlp.test.cjs — test coverage to add
  • actions/setup/js/send_otlp_span.test.cjs — optional unit test for the fallback logic

Generated by the Daily OTel Instrumentation Advisor workflow

  • expires on Apr 21, 2026, 9:30 PM UTC
