
[WIP] Investigate CI failure in run #38646 #19386

Closed
Copilot wants to merge 1 commit into main from copilot/investigate-ci-failure-again

Conversation


Copilot AI commented Mar 3, 2026

  • Add workflow-level concurrency group to ci.yml so entire runs are cancelled atomically (conclusion cancelled) rather than individual jobs being cancelled mid-execution (leaving run as failure)
  • Update ci-doctor.md investigation protocol to detect and handle concurrency-only cancellations (call noop instead of creating a spurious issue)
  • Run make recompile to regenerate ci-doctor.lock.yml
  • Run make agent-finish to validate all changes
Original prompt

This section details the original issue you should resolve.

<issue_title>[CI Failure Doctor] CI Failure Investigation - Run #38646</issue_title>
<issue_description>### Summary
The CI run was aborted because the build, security, and fuzz jobs were canceled while they were still executing; none of their logs show any failing tests or compilation errors.

Failure Details

Root Cause Analysis

Each of the affected jobs (build, security, the fuzz matrix) declares a concurrency group with cancel-in-progress: true so that a newer run for the same branch cancels any earlier jobs (see .github/workflows/ci.yml lines 507‑514 for build, 1400‑1406 for security, and 1246‑1254 for fuzz). The logs show ##[error]The operation was canceled for every job (e.g., build job line 576, security job line 332, fuzz job lines 310 and 273) after they had already run through their steps, which strongly indicates GitHub canceled them because a later run claimed the concurrency slot rather than because of a test failure.

Failed Jobs and Errors

  • build – ##[error]The operation was canceled after make recompile finished.
  • security – ##[error]The operation was canceled while the fuzz seed corpus was about to run.
  • fuzz (Workflow-Triggers & Workflow-Parsing matrices) – both were canceled mid-matrix with the same ##[error]The operation was canceled message while FuzzParseTriggerShorthand/FuzzTemplateRendering were running.

Investigation Findings
  • The build job uses concurrency group ci-${{ github.ref }}-build (lines 507-514 of ci.yml), so if a newer build job starts on the same ref the previous job is canceled automatically.
  • The security job is in group ci-${{ github.ref }}-security (lines 1400-1406). Likewise, each fuzz matrix member joins ci-${{ github.ref }}-fuzz-${{ matrix.group }} (lines 1246-1254).
  • Because the cancellation happened after all of the usual steps (dependency download, make build, fuzz runs) and only the cancellation log entries remain, this run never exposed a reproducible code failure; a newer run likely grabbed the concurrency slot and caused the older jobs to shut down.
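
The cancel-in-progress behaviour described in these findings can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Job` and `ConcurrencyScheduler` classes, the `group_for` helper, and the run IDs 1 and 2 are hypothetical, and the model ignores queued or pending jobs. It is not how GitHub Actions is implemented; it only mirrors the per-job group naming quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    run_id: int
    name: str
    status: str = "in_progress"  # set to "cancelled" when displaced


@dataclass
class ConcurrencyScheduler:
    """Toy model of concurrency groups with cancel-in-progress: true."""

    active: dict = field(default_factory=dict)  # group name -> running Job

    def start(self, job: Job, group: str) -> Job:
        previous = self.active.get(group)
        # A newer job claiming the group cancels the job currently holding it.
        if previous is not None and previous.status == "in_progress":
            previous.status = "cancelled"
        self.active[group] = job
        return job


def group_for(ref: str, job_name: str) -> str:
    # Hypothetical helper mirroring the per-job naming: ci-<ref>-<job>
    return f"ci-{ref}-{job_name}"


scheduler = ConcurrencyScheduler()
ref = "refs/heads/main"
job_names = ("build", "security")

# The older run (hypothetical ID 1) starts first, then a newer run (ID 2)
# on the same ref claims the same groups.
older = [scheduler.start(Job(1, n), group_for(ref, n)) for n in job_names]
newer = [scheduler.start(Job(2, n), group_for(ref, n)) for n in job_names]

print([(j.name, j.status) for j in older])
print([(j.name, j.status) for j in newer])
```

In this model every job of the older run ends up cancelled without any step failing, which matches the pattern the logs show.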

Recommended Actions

  • Re-run the CI workflow for b75f455b490ca570c5b0aa174644f0447933b0e9 (or the latest commit) after confirming no other run is already running for main, so the concurrency groups no longer cancel the job prematurely.

Prevention Strategies

  • Before re-running CI, check GitHub Actions for newer runs on main; if a newer run exists, wait for it to finish or cancel it so the concurrency groups do not automatically terminate this job.
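
The check described above could be sketched as follows. The record shape (`number`, `branch`, `status` keys) and the newer run numbers are assumptions made for illustration; in practice the records would have to be built from whatever the GitHub UI, API, or CLI reports for the workflow.

```python
def newer_active_runs(runs, branch, run_number):
    """Return runs on `branch` that are newer than `run_number` and not finished.

    `runs` is a list of dicts with hypothetical keys:
    "number" (int), "branch" (str), "status" (str).
    """
    return [
        r
        for r in runs
        if r["branch"] == branch
        and r["number"] > run_number
        and r["status"] != "completed"
    ]


def safe_to_rerun(runs, branch, run_number):
    """True when no newer run on the branch is still queued or in progress."""
    return not newer_active_runs(runs, branch, run_number)


# Illustrative data: 38646 is the investigated run, the others are made up.
runs = [
    {"number": 38646, "branch": "main", "status": "completed"},
    {"number": 38647, "branch": "main", "status": "in_progress"},
    {"number": 38648, "branch": "feature-x", "status": "queued"},
]

blocking = newer_active_runs(runs, "main", 38646)
print([r["number"] for r in blocking])
print(safe_to_rerun(runs, "main", 38646))
```

With the sample data, the newer in-progress run on main blocks the re-run, while the queued run on another branch is ignored because it belongs to a different ref and therefore a different concurrency group.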

AI Team Self-Improvement

When a failure only surfaces as ##[error]The operation was canceled, explicitly call out the concurrency groups, point out that there were no prior errors in the logs, and recommend rerunning the latest run instead of chasing phantom test failures.
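
One way to apply this guidance is a small log check, sketched below. The function name `is_cancellation_only` and the input shape (job name mapped to its log lines) are hypothetical; the only detail taken from the report is the `##[error]The operation was canceled` marker. The sketch treats a run as cancellation-only when every non-successful job has at least one error line and all of its error lines are that marker.

```python
CANCEL_MARKER = "##[error]The operation was canceled"
ERROR_PREFIX = "##[error]"


def is_cancellation_only(job_logs):
    """Return True when the only errors in every job log are cancellations.

    `job_logs` maps a job name to the list of log lines for that
    non-successful job (hypothetical input shape).
    """
    if not job_logs:
        return False
    for lines in job_logs.values():
        errors = [line for line in lines if ERROR_PREFIX in line]
        if not errors:
            # No error line at all: cannot attribute this job to cancellation.
            return False
        if any(CANCEL_MARKER not in line for line in errors):
            # A different error appeared, so this may be a real failure.
            return False
    return True


cancelled_run = {
    "build": ["make recompile", "##[error]The operation was canceled."],
    "security": ["##[error]The operation was canceled."],
}
real_failure = {
    "build": [
        "##[error]Process completed with exit code 1.",
        "##[error]The operation was canceled.",
    ],
}

print(is_cancellation_only(cancelled_run))
print(is_cancellation_only(real_failure))
```

A result of True would support reporting the concurrency groups and recommending a re-run; False means the logs contain another error that still needs investigation.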

Historical Context

Cancelled runs due to the cancel-in-progress: true concurrency groups are expected if multiple runs execute on the same branch; no other recent CI Failure Doctor investigations appear to remain open for this pattern.

🩺 Diagnosis provided by CI Failure Doctor

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d
  • expires on Mar 4, 2026, 9:54 AM UTC

Comments on the Issue (you are @copilot in this section)


Development

Successfully merging this pull request may close these issues.

[CI Failure Doctor] CI Failure Investigation - Run #38646
