CI: CDash status check is sticky-fail per commit SHA, blocks PRs after Azure rerun #6033

@hjmjohnson

Problem

The CDash GitHub commit status on PRs is sticky-fail per commit SHA: once any build for a given SHA submits a failing result to open.cdash.org, the CDash check stays red even when subsequent reruns of the same Azure DevOps pipeline submit a passing result for the same SHA.

This was observed on #6028 (commit a1a983d8a4), where:

  1. The initial ITK.macOS Azure pipeline reported one flaky test failure
  2. The macOS test failure was unrelated to the PR (FFTW is not enabled in that pipeline)
  3. The Azure pipeline was retriggered with /azp run and passed cleanly
  4. The Azure check turned green, but the CDash check remained red, blocking merge

$ gh pr checks 6028 --repo InsightSoftwareConsortium/ITK | grep -E 'fail|CDash'
CDash       fail    0    https://open.cdash.org/index.php?project=Insight&value1=a1a983d8a4...

The 0 duration and the URL pattern indicate this is a legacy GitHub commit Status (not a Check Run), posted by the open.cdash.org GitHub App when ITK's ctest_submit reports build/test results.
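One way to confirm which kind of check this is would be to query both the Statuses and the Check Runs endpoints for the commit. The following is a minimal sketch, assuming an authenticated gh CLI and a full commit SHA; the function names are hypothetical, and the context name CDash is taken from the gh pr checks output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; REPO and the "CDash" context come from the output above.
REPO="InsightSoftwareConsortium/ITK"

# List every legacy commit Status posted under the CDash context for a SHA.
# The Statuses API returns entries newest first, so the first line printed
# is the state GitHub currently shows for that context.
cdash_status_history() {
  local sha="$1"
  gh api "repos/${REPO}/commits/${sha}/statuses" \
    --jq '.[] | select(.context == "CDash") | [.created_at, .state, .creator.login] | @tsv'
}

# List Check Runs named CDash for the same SHA. An empty result suggests the
# check is a legacy Status rather than a Check Run.
cdash_check_runs() {
  local sha="$1"
  gh api "repos/${REPO}/commits/${sha}/check-runs" \
    --jq '.check_runs[] | select(.name == "CDash") | [.status, .conclusion] | @tsv'
}
```

Calling cdash_status_history with the affected SHA would also show who posted each status and when, which helps tell whether a later submission ever changed the state.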

Root cause

itk_common.cmake (on the dashboard branch) submits CTest results to open.cdash.org via ctest_submit. The CDash server's GitHub App integration then computes a single aggregate status across all submissions ever received for that SHA and posts it to GitHub. Once any build records a failure for the SHA, subsequent submissions cannot improve the aggregate from "any failed" to "all passed" — the GitHub status stays red.

This turns a flaky test in any one pipeline (Azure macOS, Linux, Windows, GitHub Actions ARM, etc.) into a permanent merge blocker for that commit, even after a rerun shows that the failure does not reproduce.
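The difference between the sticky behavior described above and a "latest result per build" rule can be illustrated with a small sketch. It assumes the aggregation works as this issue describes; it is not taken from the CDash source. The input format, the function names, and the ITK.Linux build name are made up for illustration.

```shell
#!/usr/bin/env bash
# Each input line is "<build-name> <pass|fail>", in submission order.
# Build names are assumed to contain no whitespace.

# Sticky aggregate, as described above: fail if any submission ever failed.
aggregate_any_failed() {
  awk '$2 == "fail" { bad = 1 } END { print (bad ? "failure" : "success") }'
}

# Alternative: keep only the latest submission per build name, then fail
# only if one of those latest results is a failure.
aggregate_latest_per_build() {
  awk '{ last[$1] = $2 }
       END {
         for (b in last) if (last[b] == "fail") bad = 1
         print (bad ? "failure" : "success")
       }'
}

# A flaky macOS failure followed by a clean rerun of the same build.
submissions='ITK.macOS fail
ITK.Linux pass
ITK.macOS pass'

printf '%s\n' "$submissions" | aggregate_any_failed        # prints: failure
printf '%s\n' "$submissions" | aggregate_latest_per_build  # prints: success
```

Under the first rule the rerun can never clear the red status; under the second, the passing rerun replaces the earlier failure for the same build name.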

Candidate solutions

Option 1 — Override CDash status from each workflow (smallest change)

Add a final "Refresh CDash status" step to every workflow that submits to CDash. The legacy Statuses API is last-write-wins for a given context, so a fresh state=success posted from the workflow after a green run will overwrite any stale red.

Example for .github/workflows/arm.yml:

# Requires the job to grant "statuses: write" to GITHUB_TOKEN.
- name: Refresh CDash status
  # Run even when earlier steps failed, so the status always reflects this run.
  if: always()
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    STATE: ${{ job.status == 'success' && 'success' || 'failure' }}
    SHA: ${{ github.event.pull_request.head.sha || github.sha }}
  run: |
    # Quote the URL so the shell does not interpret "?" or "&".
    gh api "repos/${{ github.repository }}/statuses/$SHA" \
      -f state="$STATE" \
      -f context=CDash \
      -f description="Refreshed by ${{ github.workflow }}" \
      -f target_url="https://open.cdash.org/index.php?project=Insight&value1=$SHA"

The same block can be added to the Azure DevOps pipeline YAMLs as a final task.
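For the Azure DevOps side, a rough sketch of the script such a final task might run is shown below. It is an assumption-laden illustration, not a tested pipeline step: the function names are hypothetical, GH_TOKEN is an assumed secret variable that would have to be mapped into the step's environment, and the choice of SYSTEM_PULLREQUEST_SOURCECOMMITID over BUILD_SOURCEVERSION for PR builds should be verified against the pipeline's actual variables.

```shell
#!/usr/bin/env bash
# Map Azure DevOps Agent.JobStatus values onto GitHub commit Status states.
azure_to_github_state() {
  case "$1" in
    Succeeded|SucceededWithIssues) echo success ;;
    Failed)                        echo failure ;;
    *)                             echo error ;;   # Canceled or unknown
  esac
}

# Post the CDash status with curl via the GitHub Statuses API.
# GH_TOKEN is an assumed secret; it needs permission to write commit statuses.
refresh_cdash_status() {
  local repo="$1" sha="$2" state="$3"
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${GH_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${repo}/statuses/${sha}" \
    -d "{\"state\":\"${state}\",\"context\":\"CDash\",\"description\":\"Refreshed by Azure pipeline\",\"target_url\":\"https://open.cdash.org/index.php?project=Insight&value1=${sha}\"}"
}

# Intended call from a step that runs with condition: always():
#   sha="${SYSTEM_PULLREQUEST_SOURCECOMMITID:-$BUILD_SOURCEVERSION}"
#   refresh_cdash_status "InsightSoftwareConsortium/ITK" "$sha" \
#     "$(azure_to_github_state "$AGENT_JOBSTATUS")"
```

Mapping a canceled job to error rather than failure keeps a manual cancel distinguishable from a real test failure in the status history.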

Pros: small, local, no infrastructure changes.
Cons: races with the open.cdash.org GitHub App — if CDash re-aggregates after the override is posted, the stale red can return. Forked PRs need elevated token permissions (pull_request_target or PAT secret).

Option 2 — Disable the CDash GitHub App and post status entirely from workflows

Stop relying on open.cdash.org's auto-status integration. Each workflow becomes the sole source of the CDash status, posting it after ctest_submit returns.

Pros: removes the race entirely, gives ITK full control over what "CDash" means as a check.
Cons: requires changing all workflows and the Azure pipelines at the same time; loses the per-build-detail status the CDash App provides.

Option 3 — Configure open.cdash.org to report latest-submission only (server-side fix)

If the CDash admin UI for project=Insight exposes "GitHub status: latest submission per SHA" vs "aggregate per SHA", switching to latest-only would fix this without any repo changes.

Pros: zero workflow churn.
Cons: requires CDash admin access at open.cdash.org and may not be configurable per-project; semantics of "latest" still need to be defined for parallel pipelines submitting to the same SHA.

Option 4 — Move CDash result reporting into a Check Run

Replace the legacy Status with a Check Run created/updated by a dedicated workflow listening for Azure DevOps webhook events. Check Runs support an explicit "Re-run" button in the GitHub UI and don't aggregate the same way.

Pros: users can re-run from the PR UI; better diagnostics.
Cons: largest implementation effort; requires a webhook receiver or polling job.
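As a rough illustration of the reporting half of Option 4, the sketch below creates a completed Check Run through the GitHub Checks API. It assumes an authenticated gh CLI whose token comes from a GitHub App with checks write permission (for example the GITHUB_TOKEN inside a workflow); the function name is hypothetical, and the webhook receiver or polling job is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish the CDash result for a SHA as a Check Run.
# conclusion is one of the values the Checks API accepts, e.g. success or failure.
create_cdash_check_run() {
  local repo="$1" sha="$2" conclusion="$3"
  gh api --method POST "repos/${repo}/check-runs" \
    -f name=CDash \
    -f head_sha="${sha}" \
    -f status=completed \
    -f conclusion="${conclusion}" \
    -f details_url="https://open.cdash.org/index.php?project=Insight&value1=${sha}"
}
```

A call such as create_cdash_check_run InsightSoftwareConsortium/ITK "$SHA" success would be issued once the receiver knows the outcome of the latest run for that SHA.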

Suggested next step

Start with Option 1 as a stopgap (add the refresh step to arm.yml, pixi.yml, and Azure DevOps pipeline YAMLs). Confirm it clears stale fails on a test PR. Investigate Option 3 in parallel since it would obsolete Option 1.

Reproduction

References

  • Modules/ThirdParty/.../ITK-dashboard/itk_common.cmake (on dashboard branch) — ctest_submit call
  • .github/workflows/arm.yml — example of a workflow that drives ctest -S ITK-dashboard/dashboard.cmake
  • GitHub Statuses API: https://docs.github.com/en/rest/commits/statuses
