fix(ci): add merge_group trigger to Merge Gate so it reports in queue #921

Merged
danielmeppiel merged 3 commits into main from fix/merge-gate-merge-queue on Apr 24, 2026

@danielmeppiel
Collaborator

Problem

Branch protection / merge-queue ruleset requires the gate check, but merge-gate.yml only triggered on pull_request and workflow_dispatch. In the merge queue, GitHub fires merge_group events against a temp merge commit -- the gate check was never created on that SHA, so PRs sat in the queue with gate stuck in "Expected -- Waiting for status to be reported" indefinitely.

Concrete example: PR #899. All real checks green, only gate perpetually pending, blocking the merge.

Fix

Single-authority gate pattern preserved -- one ruleset entry, one mental model -- now extended to merge queue.

.github/workflows/merge-gate.yml

  • Add merge_group trigger (types: [checks_requested]) alongside existing pull_request + workflow_dispatch.
  • Resolve head SHA per event:
    • workflow_dispatch -> gh api .../pulls/N --jq .head.sha
    • merge_group -> github.event.merge_group.head_sha (the temp merge commit, NOT the PR head)
    • pull_request -> github.event.pull_request.head.sha
  • Branch EXPECTED_CHECKS by event:
    • PR / dispatch: Build & Test (Linux),APM Self-Check
    • Merge queue: above + Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux) (the ci-integration.yml checks)
  • Bump TIMEOUT_MIN 30 -> 55 and job timeout-minutes 35 -> 60. ci-integration.yml's theoretical worst-case critical path is ~50m (Build -> Smoke -> Integration[20m] -> Release Validation[20m]); current observation is ~5m. Sized for growth.
  • Update header comment + recovery instructions for both contexts.
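
The per-event SHA resolution and EXPECTED_CHECKS branching described above can be sketched in bash roughly as follows. This is a minimal illustration, not the workflow's actual code: the function names and the PR_HEAD_SHA, MERGE_GROUP_HEAD_SHA and DISPATCH_HEAD_SHA variables are hypothetical stand-ins for the expressions listed in the bullets.

```shell
#!/usr/bin/env bash
# Sketch only. The three *_SHA variables are hypothetical stand-ins for
# github.event.pull_request.head.sha, github.event.merge_group.head_sha,
# and the SHA looked up through the API for workflow_dispatch.

# Print the SHA the gate should poll for the given event name.
resolve_head_sha() {
  local event_name="$1"
  case "$event_name" in
    pull_request)      printf '%s\n' "${PR_HEAD_SHA:?}" ;;
    merge_group)       printf '%s\n' "${MERGE_GROUP_HEAD_SHA:?}" ;;  # temp merge commit, not the PR head
    workflow_dispatch) printf '%s\n' "${DISPATCH_HEAD_SHA:?}" ;;
    *) echo "unsupported event: $event_name" >&2; return 1 ;;
  esac
}

# Print the comma-separated list of checks the gate waits for.
expected_checks() {
  local event_name="$1"
  local pr_checks='Build & Test (Linux),APM Self-Check'
  local queue_checks='Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux)'
  case "$event_name" in
    merge_group)                    printf '%s,%s\n' "$pr_checks" "$queue_checks" ;;
    pull_request|workflow_dispatch) printf '%s\n' "$pr_checks" ;;
    *) echo "unsupported event: $event_name" >&2; return 1 ;;
  esac
}
```

Keeping both decisions keyed on the same event name means a new trigger must be handled in both places or fail loudly, rather than silently polling the wrong SHA.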

.github/scripts/ci/merge_gate_wait.sh

  • New optional EVENT_NAME env var. On exit code 2 ("check never started"), emit event-aware recovery: in merge_group context, pushing a commit does NOT retrigger the merge_group event -- the user must remove and re-add the PR to the queue.
  • Add &filter=latest to the Checks API query so GitHub returns only the latest run per name, removing reliance on client-side sort and pagination.
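
As a rough sketch of the two script changes above: the helper names below are hypothetical, and the recovery wording is illustrative rather than the script's real output. The endpoint path and the filter and per_page query parameters are those of GitHub's "list check runs for a Git reference" REST endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: function names and message wording are assumptions.

# Build the API path for the check runs on one commit, asking the server
# to return only the latest run per check name.
check_runs_path() {
  local repo="$1" sha="$2"
  printf 'repos/%s/commits/%s/check-runs?per_page=100&filter=latest\n' "$repo" "$sha"
}

# Print a recovery hint for exit code 2 ("check never started").
# EVENT_NAME is optional; anything other than merge_group gets the PR hint.
recovery_hint() {
  local event_name="${1:-pull_request}"
  if [ "$event_name" = "merge_group" ]; then
    echo "Remove the PR from the merge queue and re-add it; pushing a commit does not retrigger merge_group."
  else
    echo "Push a new commit to the PR branch to retrigger the checks."
  fi
}
```

A path built this way could be passed to `gh api`, as the workflow_dispatch lookup above already does for the pulls endpoint.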

Why not just list checks directly in the merge-queue ruleset?

Considered. Rejected because the gate pattern works well at the PR tier today, and stretching it to the merge-queue tier keeps a single ruleset entry (gate), a single mental model, and a single place to add/rename checks. Listing checks directly would require ruleset edits on every check rename.

Concurrency

Existing key merge-gate-${{ pull_request.number || inputs.pr_number || github.ref }} falls through to github.ref in merge_group context. That ref is refs/heads/gh-readonly-queue/main/pr-N-<sha> -- unique per queue entry -- so cancel-in-progress dedupes correctly within a single temp branch and never collides across PR <-> merge_group channels.
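
The fallthrough can be mimicked in bash, assuming the `||` chain picks the first non-empty value. The function name and the sample values below are hypothetical; only the key shape comes from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the `a || b || c` fallthrough of the concurrency key.
# The first non-empty argument wins.
concurrency_group() {
  local pr_number="$1" input_pr_number="$2" ref="$3"
  local key="${pr_number:-${input_pr_number:-$ref}}"
  printf 'merge-gate-%s\n' "$key"
}

# pull_request context: the PR number is set, so the ref is ignored.
concurrency_group "921" "" "refs/pull/921/merge"

# merge_group context: no PR number and no dispatch input, so the key falls
# through to the per-entry queue ref (the SHA suffix here is made up).
concurrency_group "" "" "refs/heads/gh-readonly-queue/main/pr-921-abc123"
```

Because the two contexts produce keys of different shapes, a PR-tier run and a queue-tier run for the same PR cannot cancel each other.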

Self-deadlock

gate is intentionally absent from EXPECTED_CHECKS in both event contexts.
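
A guard for this invariant could look like the sketch below. It is not part of the PR; `list_contains` is a hypothetical helper that checks a comma-separated list for an exact name.

```shell
#!/usr/bin/env bash
# Sketch only: returns 0 if the comma-separated list contains the exact name.
list_contains() {
  local list="$1" name="$2" item
  local IFS=','
  for item in $list; do   # unquoted on purpose: split on commas only
    [ "$item" = "$name" ] && return 0
  done
  return 1
}

EXPECTED_CHECKS='Build & Test (Linux),APM Self-Check'
if list_contains "$EXPECTED_CHECKS" "gate"; then
  echo "gate must not wait on itself" >&2
  exit 1
fi
```

If gate were ever listed, the job would poll for its own completion and could only end by timing out.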

Audit

Design audited against live GitHub docs by a sub-agent grounded in:

  • merge_group webhook payload -- confirms checks_requested is the only action and head_sha field is correct.
  • Managing a merge queue -- confirms checks must be reported against the merge_group head SHA, and that branch-protection required checks apply to both PR and queue contexts.
  • Checks API runs -- confirms filter=latest semantics and per_page behavior.

Audit findings folded into this PR:

  • BLOCKER (fixed): Recovery message in merge_gate_wait.sh was wrong for merge_group (told users to push a commit). Now event-aware.
  • NICE-TO-HAVE (fixed): Replaced client-side sort_by(started_at) with server-side filter=latest.
  • IMPORTANT (fixed): Bumped poll budget + job timeout above the theoretical worst-case critical path of ci-integration.yml.
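
The timeout finding reduces to a simple ordering check, sketched here with the numbers quoted in this PR. The `budget_ok` helper is hypothetical, and the requirement that the poll budget stay below the job timeout is an assumption about the intent (so the script can exit with its own message before the job is killed).

```shell
#!/usr/bin/env bash
# Sketch only: succeeds when worst-case CI time < poll budget < job timeout.
# All values are whole minutes.
budget_ok() {
  local worst_case="$1" poll_budget="$2" job_timeout="$3"
  [ "$worst_case" -lt "$poll_budget" ] && [ "$poll_budget" -lt "$job_timeout" ]
}

# New values: ~50m worst case, TIMEOUT_MIN=55, timeout-minutes=60.
if budget_ok 50 55 60; then echo "new budget covers the worst case"; fi

# Old values: TIMEOUT_MIN=30, timeout-minutes=35.
if ! budget_ok 50 30 35; then echo "old budget could expire first"; fi
```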

Audit verdict: SHIP with these specific changes (all included).

Pre-existing observation, flagged but out of scope: paths-ignore on the pull_request trigger means doc-only PRs never run gate and, in theory, could never satisfy the PR ruleset. Doc-only PRs are not blocked in practice today, so this is not an active problem, but it is worth a separate investigation.

Validation

  • bash -n .github/scripts/ci/merge_gate_wait.sh clean.
  • YAML parses; triggers [pull_request, merge_group, workflow_dispatch]; env wiring matches spec.
  • This PR's own gate run will exercise the pull_request path; merging into the queue will exercise the merge_group path.

Fixes the queue-stall behaviour observed on #899.

Branch protection / merge-queue ruleset requires the 'gate' check on
both PR-time and merge-queue contexts, but the gate workflow only
fired on 'pull_request'. In the merge queue, GitHub fires 'merge_group'
events against a temp merge commit -- the gate check was never created
on that SHA, so PRs sat in the queue with 'gate' stuck in
'Expected -- Waiting for status to be reported' indefinitely
(observed on PR #899).

Changes
-------
.github/workflows/merge-gate.yml
- Add 'merge_group' (types: checks_requested) and keep existing
  'pull_request' + 'workflow_dispatch' triggers.
- Resolve head SHA per event:
    workflow_dispatch -> gh api .../pulls/N --jq .head.sha
    merge_group       -> github.event.merge_group.head_sha
    pull_request      -> github.event.pull_request.head.sha
- Branch EXPECTED_CHECKS by event:
    pull_request / workflow_dispatch: 'Build & Test (Linux),APM Self-Check'
    merge_group: + 'Build (Linux),Smoke Test (Linux),
                   Integration Tests (Linux),Release Validation (Linux)'
  (the merge_group-only checks emitted by ci-integration.yml plus the
  ci.yml checks that also run on merge_group)
- Bump TIMEOUT_MIN 30 -> 55 and job timeout-minutes 35 -> 60 to absorb
  ci-integration.yml's theoretical worst-case critical path (Build ->
  Smoke -> Integration[20m] -> Release Validation[20m]).
- Update header comment + recovery instructions to cover both contexts.

.github/scripts/ci/merge_gate_wait.sh
- Accept new optional EVENT_NAME env var; emit event-aware recovery
  instructions on exit code 2 (in merge_group context, pushing a commit
  does NOT retrigger the merge_group event -- the user must re-queue).
- Add '&filter=latest' to the Checks API query so GitHub returns only
  the latest run per name, removing reliance on client-side sort and
  pagination order.

Concurrency
-----------
The existing key 'merge-gate-${{ pull_request.number || inputs.pr_number
|| github.ref }}' falls through to github.ref in merge_group context.
github.ref there is 'refs/heads/gh-readonly-queue/main/pr-N-<sha>',
unique per queue entry, so cancel-in-progress dedupes correctly within
a single temp branch and never collides across PR/merge_group channels.

Self-deadlock
-------------
'gate' is intentionally absent from EXPECTED_CHECKS in both contexts.

Audit
-----
Design audited against live GitHub docs:
- docs.github.com/.../webhook-events-and-payloads#merge_group
- docs.github.com/.../managing-a-merge-queue
- docs.github.com/en/rest/checks/runs
Verdict: ship with the event-aware recovery message included here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 24, 2026 22:21
Contributor

Copilot AI left a comment

Pull request overview

Extends the repository's single-authority CI "gate" workflow so it also reports a gate check in merge queue (merge_group) context, preventing merge queue entries from stalling with "Expected -- Waiting for status to be reported".

Changes:

  • Add merge_group trigger to .github/workflows/merge-gate.yml and resolve the correct SHA per event type.
  • Branch EXPECTED_CHECKS to include Tier 2 (ci-integration.yml) checks when running under merge_group, and increase timeouts accordingly.
  • Update .github/scripts/ci/merge_gate_wait.sh to accept an optional EVENT_NAME, improve recovery messaging for merge queue, and query the Checks API with filter=latest.

| File | Description |
| --- | --- |
| .github/workflows/merge-gate.yml | Triggers gate on merge_group and polls the correct set of checks for merge queue SHAs. |
| .github/scripts/ci/merge_gate_wait.sh | Adds merge-queue-aware recovery messaging and improves check-run querying with filter=latest. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment thread .github/workflows/merge-gate.yml Outdated
Comment on lines 44 to 49
@@ -34,16 +47,21 @@ on:
- 'docs/**'
- '.gitignore'
- 'LICENSE'
Copilot AI Apr 24, 2026

pull_request.paths-ignore excludes docs/** (and a couple of root files). If the ruleset requires the gate check for PRs, a docs-only PR can end up with gate never being created ("Expected -- Waiting") because both Merge Gate and ci.yml are skipped. Consider either removing these paths-ignore entries (in both workflows) or adding explicit logic so docs-only PRs still report a gate result without waiting on checks that will never run.

See below for a potential fix:


Comment on lines +104 to +105
# latest run per name (avoids client-side sort / pagination races
# when a check has been re-run on the same SHA).
Copilot AI Apr 24, 2026

The new comment says filter=latest "avoids client-side sort", but the code still does sort_by(.started_at) | reverse when selecting .[0]. Consider tweaking the comment to reflect reality (e.g., filter=latest reduces pagination/rerun ambiguity; client-side sort remains as a defensive tie-breaker).

Suggested change
- # latest run per name (avoids client-side sort / pagination races
- # when a check has been re-run on the same SHA).
+ # latest run per name. This reduces pagination and re-run ambiguity
+ # on the same SHA; the client-side sort below remains as a defensive
+ # tie-breaker before selecting .[0].

danielmeppiel and others added 2 commits April 25, 2026 00:26
Both .github/workflows/merge-gate.yml and .github/workflows/ci.yml
carried identical paths-ignore (docs/**, .gitignore, LICENSE). For a
docs-only PR neither workflow fires, so the 'gate' check-run is never
created -- if the PR ruleset requires 'gate', branch protection
displays it as 'Expected -- Waiting' forever and the PR cannot merge.

Removing paths-ignore from BOTH (not just one) is required: dropping
it only from merge-gate.yml would leave the gate polling for ci.yml
checks that never appear, timing out at TIMEOUT_MIN with exit 2 (false
failure). Removing from both means ci.yml runs on docs-only PRs (~5
min of free GitHub-hosted runner time) and the gate aggregates as
normal -- coherent regardless of which ruleset tier requires gate.

Caught in code review on PR #921. Same observation was flagged but
left out-of-scope in the original PR description; folding in now.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
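
To make the docs-only case in the commit message above concrete: GitHub skips a workflow when every changed path matches its paths-ignore patterns. The sketch below approximates that rule for the three patterns named in the commit message. `is_docs_only` is a hypothetical helper, and the `case` globs only approximate GitHub's own pattern matching.

```shell
#!/usr/bin/env bash
# Sketch only: succeeds when every changed path matches one of the old
# paths-ignore patterns (docs/**, .gitignore, LICENSE), which is the
# situation where neither merge-gate.yml nor ci.yml would have run.
is_docs_only() {
  [ "$#" -gt 0 ] || return 1
  local path
  for path in "$@"; do
    case "$path" in
      docs/*|.gitignore|LICENSE) ;;   # ignored path, keep scanning
      *) return 1 ;;                  # one real change is enough to run CI
    esac
  done
  return 0
}

if is_docs_only "docs/guide/intro.md" "LICENSE"; then
  echo "with paths-ignore in place, no gate check-run would be created"
fi
```

The file paths in the usage lines are made-up examples.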
danielmeppiel merged commit e0eb0c0 into main on Apr 24, 2026
7 checks passed
danielmeppiel deleted the fix/merge-gate-merge-queue branch on April 24, 2026 22:42