fix(ci): add merge_group trigger to Merge Gate so it reports in queue #921
danielmeppiel merged 3 commits into main
Branch protection / merge-queue ruleset requires the 'gate' check on both PR-time and merge-queue contexts, but the gate workflow only fired on 'pull_request'. In the merge queue, GitHub fires 'merge_group' events against a temp merge commit -- the gate check was never created on that SHA, so PRs sat in the queue with 'gate' stuck in 'Expected -- Waiting for status to be reported' indefinitely (observed on PR #899).

Changes
-------

.github/workflows/merge-gate.yml
- Add 'merge_group' (types: checks_requested) and keep existing 'pull_request' + 'workflow_dispatch' triggers.
- Resolve head SHA per event:
    workflow_dispatch -> gh api .../pulls/N --jq .head.sha
    merge_group       -> github.event.merge_group.head_sha
    pull_request      -> github.event.pull_request.head.sha
- Branch EXPECTED_CHECKS by event:
    pull_request / workflow_dispatch: 'Build & Test (Linux),APM Self-Check'
    merge_group: + 'Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux)'
    (the merge_group-only checks emitted by ci-integration.yml plus the ci.yml checks that also run on merge_group)
- Bump TIMEOUT_MIN 30 -> 55 and job timeout-minutes 35 -> 60 to absorb ci-integration.yml's theoretical worst-case critical path (Build -> Smoke -> Integration[20m] -> Release Validation[20m]).
- Update header comment + recovery instructions to cover both contexts.

.github/scripts/ci/merge_gate_wait.sh
- Accept new optional EVENT_NAME env var; emit event-aware recovery instructions on exit code 2 (in merge_group context, pushing a commit does NOT retrigger the merge_group event -- the user must re-queue).
- Add '&filter=latest' to the Checks API query so GitHub returns only the latest run per name, removing reliance on client-side sort and pagination order.

Concurrency
-----------

The existing key 'merge-gate-${{ pull_request.number || inputs.pr_number || github.ref }}' falls through to github.ref in merge_group context. github.ref there is 'refs/heads/gh-readonly-queue/main/pr-N-<sha>', unique per queue entry, so cancel-in-progress dedupes correctly within a single temp branch and never collides across PR/merge_group channels.

Self-deadlock
-------------

'gate' is intentionally absent from EXPECTED_CHECKS in both contexts.

Audit
-----

Design audited against live GitHub docs:
- docs.github.com/.../webhook-events-and-payloads#merge_group
- docs.github.com/.../managing-a-merge-queue
- docs.github.com/en/rest/checks/runs

Verdict: ship with the event-aware recovery message included here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
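The per-event branching described in the commit message above can be sketched in bash. This is a minimal illustration, not the actual workflow step: the function names and the `PR_HEAD_SHA`, `MERGE_GROUP_HEAD_SHA`, and `DISPATCH_HEAD_SHA` variables are hypothetical stand-ins for `github.event.pull_request.head.sha`, `github.event.merge_group.head_sha`, and the result of the `gh api` lookup. It also assumes the "+" in the commit message means the merge_group list extends the PR-time list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the real logic lives in merge-gate.yml and may differ.

# Print the SHA the gate should poll for a given event name.
# The three *_SHA variables are assumed inputs, not names from the PR.
resolve_head_sha() {
  local event="$1" sha=""
  case "$event" in
    pull_request)      sha="${PR_HEAD_SHA:-}" ;;
    merge_group)       sha="${MERGE_GROUP_HEAD_SHA:-}" ;;  # temp merge commit
    workflow_dispatch) sha="${DISPATCH_HEAD_SHA:-}" ;;
    *) echo "unsupported event: $event" >&2; return 1 ;;
  esac
  if [ -z "$sha" ]; then
    echo "no head SHA available for event: $event" >&2
    return 1
  fi
  printf '%s\n' "$sha"
}

# Print the comma-separated check names the gate should wait for.
# 'gate' itself is deliberately never listed.
expected_checks() {
  local event="$1"
  local pr_checks='Build & Test (Linux),APM Self-Check'
  local queue_checks='Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux)'
  case "$event" in
    pull_request|workflow_dispatch) printf '%s\n' "$pr_checks" ;;
    merge_group)                    printf '%s\n' "$pr_checks,$queue_checks" ;;
    *) echo "unsupported event: $event" >&2; return 1 ;;
  esac
}
```

Keeping both decisions keyed on the event name means a new trigger fails loudly with "unsupported event" instead of silently polling the wrong SHA.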
Pull request overview
Extends the repository's single-authority CI "gate" workflow so it also reports a gate check in merge queue (merge_group) context, preventing merge queue entries from stalling with "Expected -- Waiting for status to be reported".
Changes:
- Add `merge_group` trigger to `.github/workflows/merge-gate.yml` and resolve the correct SHA per event type.
- Branch `EXPECTED_CHECKS` to include Tier 2 (`ci-integration.yml`) checks when running under `merge_group`, and increase timeouts accordingly.
- Update `.github/scripts/ci/merge_gate_wait.sh` to accept an optional `EVENT_NAME`, improve recovery messaging for merge queue, and query the Checks API with `filter=latest`.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/merge-gate.yml | Triggers gate on merge_group and polls the correct set of checks for merge queue SHAs. |
| .github/scripts/ci/merge_gate_wait.sh | Adds merge-queue-aware recovery messaging and improves check-run querying with filter=latest. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 2

    @@ -34,16 +47,21 @@ on:
          - 'docs/**'
          - '.gitignore'
          - 'LICENSE'

pull_request.paths-ignore excludes docs/** (and a couple of root files). If the ruleset requires the gate check for PRs, a docs-only PR can end up with gate never being created ("Expected -- Waiting") because both Merge Gate and ci.yml are skipped. Consider either removing these paths-ignore entries (in both workflows) or adding explicit logic so docs-only PRs still report a gate result without waiting on checks that will never run.
See below for a potential fix:

    # latest run per name (avoids client-side sort / pagination races
    # when a check has been re-run on the same SHA).

The new comment says filter=latest "avoids client-side sort", but the code still does sort_by(.started_at) | reverse when selecting .[0]. Consider tweaking the comment to reflect reality (e.g., filter=latest reduces pagination/rerun ambiguity; client-side sort remains as a defensive tie-breaker).
Suggested change:

    - # latest run per name (avoids client-side sort / pagination races
    - # when a check has been re-run on the same SHA).
    + # latest run per name. This reduces pagination and re-run ambiguity
    + # on the same SHA; the client-side sort below remains as a defensive
    + # tie-breaker before selecting .[0].

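The combination the review comment describes, server-side `filter=latest` plus a defensive client-side sort, might look roughly like the sketch below. It is not the code from `merge_gate_wait.sh`: the function names, the `per_page=100` value, and the "missing" / "pending" output strings are assumptions made for illustration, and it presumes `gh` and `jq` are installed.

```shell
#!/usr/bin/env bash
# Illustrative only; the actual query in merge_gate_wait.sh may differ.

# Build the check-runs endpoint path for a repo ("owner/name") and a commit SHA.
# filter=latest asks GitHub to return only the latest run per check name.
checks_url() {
  local repo="$1" sha="$2"
  printf 'repos/%s/commits/%s/check-runs?per_page=100&filter=latest\n' "$repo" "$sha"
}

# Read a check-runs JSON payload on stdin and print the conclusion of the
# most recently started run with the given name.
# Prints "missing" if no run has that name, "pending" if it has no conclusion yet.
latest_conclusion() {
  local name="$1"
  jq -r --arg name "$name" '
    [.check_runs[]? | select(.name == $name)]
    | sort_by(.started_at) | reverse | .[0]
    | if . == null then "missing" else (.conclusion // "pending") end
  '
}

# Usage (needs network access and an authenticated gh):
#   gh api "$(checks_url owner/repo "$SHA")" | latest_conclusion 'Build & Test (Linux)'
```

The sort is redundant when the server honours `filter=latest`, but it keeps the `.[0]` selection deterministic if more than one run with the same name is ever returned.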
Both .github/workflows/merge-gate.yml and .github/workflows/ci.yml carried identical paths-ignore (docs/**, .gitignore, LICENSE). For a docs-only PR neither workflow fires, so the 'gate' check-run is never created -- if the PR ruleset requires 'gate', branch protection displays it as 'Expected -- Waiting' forever and the PR cannot merge.

Removing paths-ignore from BOTH (not just one) is required: dropping it only from merge-gate.yml would leave the gate polling for ci.yml checks that never appear, timing out at TIMEOUT_MIN with exit 2 (false failure). Removing from both means ci.yml runs on docs-only PRs (~5 min of free GitHub-hosted runner time) and the gate aggregates as normal -- coherent regardless of which ruleset tier requires gate.

Caught in code review on PR #921. Same observation was flagged but left out-of-scope in the original PR description; folding in now.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Problem

Branch protection / merge-queue ruleset requires the `gate` check, but `merge-gate.yml` only triggered on `pull_request` and `workflow_dispatch`. In the merge queue, GitHub fires `merge_group` events against a temp merge commit -- the gate check was never created on that SHA, so PRs sat in the queue with `gate` stuck in "Expected -- Waiting for status to be reported" indefinitely.

Concrete example: PR #899. All real checks green, only `gate` perpetually pending, blocking the merge.

Fix

Single-authority gate pattern preserved -- one ruleset entry, one mental model -- now extended to merge queue.

`.github/workflows/merge-gate.yml`

- Add `merge_group` trigger (`types: [checks_requested]`) alongside existing `pull_request` + `workflow_dispatch`.
- Resolve head SHA per event:
  - `workflow_dispatch` -> `gh api .../pulls/N --jq .head.sha`
  - `merge_group` -> `github.event.merge_group.head_sha` (the temp merge commit, NOT the PR head)
  - `pull_request` -> `github.event.pull_request.head.sha`
- Branch `EXPECTED_CHECKS` by event:
  - `pull_request` / `workflow_dispatch`: `Build & Test (Linux),APM Self-Check`
  - `merge_group`: + `Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux)` (the `ci-integration.yml` checks)
- Bump `TIMEOUT_MIN` 30 -> 55 and job `timeout-minutes` 35 -> 60. `ci-integration.yml`'s theoretical worst-case critical path is ~50m (Build -> Smoke -> Integration[20m] -> Release Validation[20m]); current observation is ~5m. Sized for growth.

`.github/scripts/ci/merge_gate_wait.sh`

- Accept new optional `EVENT_NAME` env var. On exit code 2 ("check never started"), emit event-aware recovery: in `merge_group` context, pushing a commit does NOT retrigger the merge_group event -- the user must remove and re-add the PR to the queue.
- Add `&filter=latest` to the Checks API query so GitHub returns only the latest run per name. This reduces pagination and re-run ambiguity; the client-side sort remains as a defensive tie-breaker.

Why not just list checks directly in the merge-queue ruleset?

Considered. Rejected because the gate pattern works well at the PR tier today, and stretching it to the merge-queue tier keeps a single ruleset entry (`gate`), a single mental model, and a single place to add/rename checks. Listing checks directly would require ruleset edits on every check rename.

Concurrency

Existing key `merge-gate-${{ pull_request.number || inputs.pr_number || github.ref }}` falls through to `github.ref` in merge_group context. That ref is `refs/heads/gh-readonly-queue/main/pr-N-<sha>` -- unique per queue entry -- so `cancel-in-progress` dedupes correctly within a single temp branch and never collides across PR <-> merge_group channels.

Self-deadlock

`gate` is intentionally absent from `EXPECTED_CHECKS` in both event contexts, since a gate that waited on its own check would never complete.

Audit

Design audited against live GitHub docs by a sub-agent grounded in:

- docs.github.com/.../webhook-events-and-payloads#merge_group: `checks_requested` is the only action and `head_sha` field is correct.
- docs.github.com/.../managing-a-merge-queue
- docs.github.com/en/rest/checks/runs: `filter=latest` semantics and `per_page` behavior.

Audit findings folded into this PR:

- Recovery message in `merge_gate_wait.sh` was wrong for `merge_group` (told users to push a commit). Now event-aware.
- [...] `sort_by(started_at)` with server-side `filter=latest`.
- [...] `ci-integration.yml`.

Audit verdict: SHIP with these specific changes (all included).

Pre-existing observation flagged but out of scope:

- `paths-ignore` on the `pull_request` trigger means doc-only PRs would never run `gate` and theoretically could not satisfy the PR ruleset. Not actually a problem today (doc-only PRs aren't blocked in practice -- worth a separate investigation).

Validation

- `bash -n .github/scripts/ci/merge_gate_wait.sh` clean.
- [...] `[pull_request, merge_group, workflow_dispatch]`; env wiring matches spec.
- [...] `pull_request` path; merging into the queue will exercise the `merge_group` path.

Fixes the queue-stall behaviour observed on #899.
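
Two behaviours from the description above lend themselves to a small bash sketch: the `||` fallthrough of the concurrency key, and the event-aware recovery hint. Everything here is hypothetical (function names, argument order, message wording). GitHub evaluates the real concurrency expression, and the real messages live in `merge_gate_wait.sh`.

```shell
#!/usr/bin/env bash
# Illustrative only; names and message text are assumptions, not the PR's code.

# Mimic `pr_number || input_pr_number || ref`: the first non-empty value wins.
concurrency_group() {
  local pr_number="${1:-}" input_pr="${2:-}" ref="${3:-}"
  printf 'merge-gate-%s\n' "${pr_number:-${input_pr:-$ref}}"
}

# Print a recovery hint for the "check never started" case (exit code 2),
# depending on which event triggered the gate. Defaults to PR-style advice.
recovery_hint() {
  local event="${1:-pull_request}"
  case "$event" in
    merge_group)
      echo "Remove the PR from the merge queue and re-add it; pushing a commit does not retrigger merge_group."
      ;;
    *)
      echo "Push a new commit to retrigger the checks."
      ;;
  esac
}
```

For example, `concurrency_group "" "" "refs/heads/gh-readonly-queue/main/pr-N-<sha>"` yields a group name containing the queue ref, which is why a merge_group run cannot share a group with a run keyed on a PR number.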