Harden MCP Gateway startup health check against transient port-binding delays by Copilot · Pull Request #26697 · github/gh-aw

Copilot · 2026-04-16T17:41:54Z

The Start MCP Gateway step could fail on transient startup races: the gateway process launched, but /health polling used rigid timing and could give up before port 80 became reachable under runner contention. This PR updates startup probing to tolerate short bind delays without changing failure semantics for genuinely unhealthy gateways.

Health-check timing now uses bounded exponential backoff
- Updated both startup paths:
  - actions/setup/js/start_mcp_gateway.cjs
  - actions/setup/sh/start_mcp_gateway.sh
- Retry cadence now ramps quickly (250ms -> 500ms -> 1s) and then stays capped, while preserving the existing long retry horizon (120 attempts).
- This reduces false negatives immediately after container launch while still surfacing real startup failures.

Shell implementation avoids per-retry subprocess overhead
- Replaced dynamic delay calculation via subprocess with direct branch-based delay tiers in the retry loop.
- Keeps behavior explicit and deterministic in constrained runner environments.

Startup logging/attempt accounting tightened
- JS path now reports actual attempts made for clearer diagnostics when readiness is reached early or fails late.

Focused shell test coverage updated for backoff config
- actions/setup/sh/start_mcp_gateway_test.sh assertions were expanded to verify the expected backoff tiers and retry-delay usage in the startup script.

# start_mcp_gateway.sh retry tiering
if [ $RETRY_COUNT -eq 1 ]; then
  RETRY_DELAY="0.25"
elif [ $RETRY_COUNT -eq 2 ]; then
  RETRY_DELAY="0.5"
else
  RETRY_DELAY="1"
fi
sleep "$RETRY_DELAY"
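For context, here is a minimal sketch of how this tiering could sit inside the polling loop. Only the delay tiers, the 120-attempt horizon, the /health path, and port 80 come from this PR; the function names `set_retry_delay` and `wait_for_health`, the `SLEEP_CMD` override, the `ATTEMPTS_MADE` variable, and the localhost URL in the usage comment are hypothetical and are not taken from start_mcp_gateway.sh.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not from start_mcp_gateway.sh.

# Set RETRY_DELAY for the given attempt number without spawning a subprocess.
set_retry_delay() {
  if [ "$1" -eq 1 ]; then
    RETRY_DELAY="0.25"
  elif [ "$1" -eq 2 ]; then
    RETRY_DELAY="0.5"
  else
    RETRY_DELAY="1"   # attempt 3+ stays capped at 1s
  fi
}

# wait_for_health MAX_ATTEMPTS PROBE_COMMAND [ARGS...]
# Runs the probe until it succeeds or MAX_ATTEMPTS is exhausted.
# ATTEMPTS_MADE records how many probes actually ran.
wait_for_health() {
  local max_attempts="$1"
  shift
  local attempt=0
  ATTEMPTS_MADE=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    ATTEMPTS_MADE="$attempt"
    if "$@"; then
      return 0
    fi
    # Skip the delay after the final attempt; there is nothing left to wait for.
    if [ "$attempt" -lt "$max_attempts" ]; then
      set_retry_delay "$attempt"
      "${SLEEP_CMD:-sleep}" "$RETRY_DELAY"
    fi
  done
  return 1
}

# Example call (URL is an assumption):
#   wait_for_health 120 curl -sf -o /dev/null http://localhost:80/health
```

Passing the probe as a command keeps the loop independent of curl, so the same function can be driven by a stub in tests.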

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2026-04-16T18:32:22Z

  let httpCode = 0;
  let healthBody = "";
  let succeeded = false;
+  let attemptsMade = 0;


@copilot use withRetry helper

Updated to use the shared withRetry helper for MCP gateway health polling in actions/setup/js/start_mcp_gateway.cjs (commit 7682cd2), with equivalent retry behavior preserved.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Hardens MCP Gateway startup readiness checks to reduce false negatives caused by transient port-binding delays by introducing bounded exponential backoff in both JS and shell startup paths.

Changes:

Updated health-check polling in JS and shell to use exponential backoff capped at 1s while keeping the same retry horizon.
Improved JS logging to report actual attempts made.
Expanded shell test assertions to validate the new backoff configuration.
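
The capped schedule can also be read as a formula: delay = min(250ms * 2^(attempt-1), 1s). Below is a minimal sketch in integer milliseconds, assuming the 250ms base and 1s cap described above. `backoff_ms` and `BACKOFF_MS` are illustrative names; the shell script in this PR hardcodes the three tiers instead of computing them.

```shell
#!/usr/bin/env bash
# Sketch only: backoff_ms / BACKOFF_MS are hypothetical names.

# Compute min(250 * 2^(attempt-1), 1000) and store it in BACKOFF_MS.
# Doubling stops as soon as the cap is reached, so large attempt numbers
# (for example 120) cannot overflow.
backoff_ms() {
  local attempt="$1" base=250 cap=1000
  local delay="$base" i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay="$cap"
  fi
  BACKOFF_MS="$delay"
}
```

With these constants the formula yields the same values as the explicit tiers (250, 500, then 1000 for every later attempt), which is why branch-based tiers can replace it without changing timing.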

Show a summary per file

File	Description
actions/setup/sh/start_mcp_gateway.sh	Implements capped exponential backoff (250ms→500ms→1s) between health-check attempts.
actions/setup/js/start_mcp_gateway.cjs	Adds exponential backoff delays and correct attempt-count reporting in logs.
actions/setup/sh/start_mcp_gateway_test.sh	Adds assertions to confirm the backoff logic exists in the shell script.
.github/mcp.json	Introduces an MCP server configuration file (appears unrelated to gateway health-check changes).

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 2

Copilot · 2026-04-16T18:39:12Z

+{
+  "mcpServers": {
+    "github-agentic-workflows": {
+      "command": "gh",
+      "args": [
+        "aw",
+        "mcp-server"
+      ]
+    }
+  }
+}


This new MCP configuration file isn’t mentioned in the PR description and appears unrelated to hardening the gateway startup health check. If it’s intentional, please add rationale to the PR description; otherwise consider removing it from this PR to keep scope focused.

Copilot · 2026-04-16T18:39:13Z

+  if grep -q "RETRY_COUNT -eq 1" "$SCRIPT_PATH" &&
+    grep -q "RETRY_COUNT -eq 2" "$SCRIPT_PATH" &&
+    grep -q "elif \[ \$RETRY_COUNT -eq 2 \]" "$SCRIPT_PATH" &&
+    grep -q "else" "$SCRIPT_PATH" &&
+    grep -q "RETRY_DELAY=\"0.25\"" "$SCRIPT_PATH" &&
+    grep -q "RETRY_DELAY=\"0.5\"" "$SCRIPT_PATH" &&
+    grep -q "RETRY_DELAY=\"1\"" "$SCRIPT_PATH" &&
+    grep -q "attempt 3+ -> 1s" "$SCRIPT_PATH" &&
+    grep -q "sleep \"\$RETRY_DELAY\"" "$SCRIPT_PATH"; then


These greps are likely to produce false positives/negatives because they (a) match very generic tokens (e.g., else) that may appear elsewhere, and (b) over-constrain formatting (exact bracket/spacing/quoting) while not guaranteeing the lines are part of the same backoff block. Consider asserting a single, more specific pattern (e.g., matching the exact if/elif/else lines including sleep \"$RETRY_DELAY\"), or limiting the search to the health-check section (e.g., by extracting a window around the backoff comment and matching within it). This will make the test resilient to harmless refactors and more accurately validate intent.

Suggested change

-  if grep -q "RETRY_COUNT -eq 1" "$SCRIPT_PATH" &&
-    grep -q "RETRY_COUNT -eq 2" "$SCRIPT_PATH" &&
-    grep -q "elif \[ \$RETRY_COUNT -eq 2 \]" "$SCRIPT_PATH" &&
-    grep -q "else" "$SCRIPT_PATH" &&
-    grep -q "RETRY_DELAY=\"0.25\"" "$SCRIPT_PATH" &&
-    grep -q "RETRY_DELAY=\"0.5\"" "$SCRIPT_PATH" &&
-    grep -q "RETRY_DELAY=\"1\"" "$SCRIPT_PATH" &&
-    grep -q "attempt 3+ -> 1s" "$SCRIPT_PATH" &&
-    grep -q "sleep \"\$RETRY_DELAY\"" "$SCRIPT_PATH"; then
+  if awk '
+    /if \[ \$RETRY_COUNT -eq 1 \]; then/ { saw_if=1 }
+    saw_if && /RETRY_DELAY="0\.25"/ { saw_delay1=1 }
+    saw_delay1 && /elif \[ \$RETRY_COUNT -eq 2 \]; then/ { saw_elif=1 }
+    saw_elif && /RETRY_DELAY="0\.5"/ { saw_delay2=1 }
+    saw_delay2 && /else/ { saw_else=1 }
+    saw_else && /RETRY_DELAY="1"/ { saw_delay3=1 }
+    saw_delay3 && /attempt 3\+ -> 1s/ { saw_comment=1 }
+    saw_comment && /sleep "\$RETRY_DELAY"/ { saw_sleep=1 }
+    END { exit saw_sleep ? 0 : 1 }
+  ' "$SCRIPT_PATH"; then
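
The other option raised in this comment, limiting the search to the health-check section, might look like the sketch below. It assumes the backoff block begins at the line containing `RETRY_COUNT -eq 1` and ends at the first `sleep "$RETRY_DELAY"` line after it. `backoff_block` and `has_backoff_tiers` are hypothetical helper names, not functions from start_mcp_gateway_test.sh.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and block boundaries are assumptions.

# Print the lines from the first tier check through the sleep call.
backoff_block() {
  awk '
    /RETRY_COUNT -eq 1/ { in_block = 1 }
    in_block { print }
    in_block && /sleep "\$RETRY_DELAY"/ { exit }
  ' "$1"
}

# Succeed only if every tier and the sleep call appear inside that window.
has_backoff_tiers() {
  local block
  block="$(backoff_block "$1")"
  [ -n "$block" ] || return 1
  printf '%s\n' "$block" | grep -qF 'RETRY_DELAY="0.25"' &&
    printf '%s\n' "$block" | grep -qF 'RETRY_DELAY="0.5"' &&
    printf '%s\n' "$block" | grep -qF 'RETRY_DELAY="1"' &&
    printf '%s\n' "$block" | grep -qF 'sleep "$RETRY_DELAY"'
}
```

Fixed-string matching (`grep -F`) inside the extracted window avoids the escaping needed for brackets and dollar signs, and a stray `else` or `RETRY_DELAY` assignment elsewhere in the script no longer satisfies the check.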

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-04-16T18:41:50Z

🧪 Test Quality Sentinel Report

Test Quality Score: N/A

ℹ️ No Go or JavaScript test files were modified in this PR; the shell test is outside scoring scope.

Metric	Value
New/modified tests analyzed	0 (Go / JS)
✅ Design tests (behavioral contracts)	—
⚠️ Implementation tests (low value)	—
Tests with error/edge cases	—
Duplicate test clusters	0
Test inflation detected	No
🚨 Coding-guideline violations	None

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests
🟨 JavaScript (*.test.cjs, *.test.js): 0 tests (vitest)

ℹ️ Shell test detected but outside scoring scope: actions/setup/sh/start_mcp_gateway_test.sh was modified (+9 / -1 lines). Shell tests are detected but excluded from the Go/JS behavioral scoring rubric.

Shell Test Observation (Informational)

The modified shell test function checks that start_mcp_gateway.sh contains the new 3-tier hardcoded backoff constants by running grep against the script source. This is a structural/implementation-style check (it verifies code patterns are present rather than exercising the runtime backoff behavior), which is typical for shell "pattern tests" but worth noting:

✅ Covers all three delay tiers ("0.25", "0.5", "1") and the comment attempt 3+ -> 1s
✅ Verifies the sleep "$RETRY_DELAY" invocation is present
✅ Reflects the refactoring from the awk-based formula to the explicit if/elif/else structure
⚠️ Does not exercise the backoff at runtime (no integration test executing the retry loop with a mock HTTP endpoint), but that is expected for this style of lightweight shell pattern test

Test-to-production line ratio: ~1:1 (9 test lines vs 9 production lines), well within the 2:1 threshold.
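
A runtime check of the kind noted as missing above could be sketched as follows. This is an illustration, not part of the PR: it shadows `sleep` with a shell function that records the requested delays, and drives a loop that mirrors the tiering with a stub probe. The loop shape and the names `run_health_loop`, `probe_health`, `PROBE_SUCCEEDS_ON`, and `RECORDED_DELAYS` are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: the loop below mirrors the PR's tiers but is not the real script.

# Shadow sleep so the test records delays instead of waiting.
RECORDED_DELAYS=""
sleep() {
  RECORDED_DELAYS="${RECORDED_DELAYS}${RECORDED_DELAYS:+ }$1"
}

# Stub probe: fails until the configured call number is reached.
PROBE_CALLS=0
PROBE_SUCCEEDS_ON=4
probe_health() {
  PROBE_CALLS=$((PROBE_CALLS + 1))
  [ "$PROBE_CALLS" -ge "$PROBE_SUCCEEDS_ON" ]
}

# Loop under test: probe, then sleep for the tiered delay on failure.
run_health_loop() {
  local max_attempts="$1"
  RETRY_COUNT=0
  while [ "$RETRY_COUNT" -lt "$max_attempts" ]; do
    RETRY_COUNT=$((RETRY_COUNT + 1))
    if probe_health; then
      return 0
    fi
    if [ "$RETRY_COUNT" -eq 1 ]; then
      RETRY_DELAY="0.25"
    elif [ "$RETRY_COUNT" -eq 2 ]; then
      RETRY_DELAY="0.5"
    else
      RETRY_DELAY="1"
    fi
    sleep "$RETRY_DELAY"
  done
  return 1
}
```

After `run_health_loop 120`, a test can compare `RECORDED_DELAYS` against the expected sequence and `RETRY_COUNT` against the attempt on which the probe succeeded, which checks behavior rather than source text.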

Verdict

✅ Check passed. No Go or JavaScript tests were added or modified. No coding-guideline violations detected. The shell test update correctly tracks the production code refactoring.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: [§24527431225]

🧪 Test quality analysis by Test Quality Sentinel · ● 577.1K

github-actions

✅ Test Quality Sentinel: N/A score (no Go/JS tests in scope). No coding-guideline violations detected. The only test file changed is a shell script (start_mcp_gateway_test.sh) which is outside the Go/JavaScript scoring rubric. The shell test correctly reflects the production code refactoring.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Initial plan

9a1376c

Copilot AI assigned Copilot and pelikhan Apr 16, 2026

Copilot started work on behalf of pelikhan April 16, 2026 17:41 View session

Copilot AI linked an issue Apr 16, 2026 that may be closed by this pull request

MCP Gateway: port 80 health check fails with no retry on transient container startup delay #26696

Closed

Copilot AI and others added 7 commits April 16, 2026 17:51

chore: outline plan for MCP gateway health check retry fix

6194792

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

fix: add MCP gateway health-check retry backoff for startup delays

5fd98d1

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

refactor: cap MCP gateway startup backoff without subprocess overhead

70aa68e

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

test: tighten MCP gateway backoff assertions

93af3fc

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

chore: simplify MCP gateway JS backoff calculation

f5cb3d5

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

docs: clarify backoff formula and strengthen shell pattern checks

2d880a4

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

test: assert documented third backoff tier

d874474

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c923f8eb-a610-4433-b2c6-cfae71b2811d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Fix MCP Gateway port 80 health check to include retry logic" to "Harden MCP Gateway startup health check against transient port-binding delays" Apr 16, 2026

Copilot AI requested a review from pelikhan April 16, 2026 18:09

Copilot finished work on behalf of pelikhan April 16, 2026 18:09

pelikhan reviewed Apr 16, 2026

Copilot started work on behalf of pelikhan April 16, 2026 18:32 View session

pelikhan marked this pull request as ready for review April 16, 2026 18:36

Copilot AI review requested due to automatic review settings April 16, 2026 18:36

chore: plan follow-up for review comment

0bb9333

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI reviewed Apr 16, 2026

refactor: use shared withRetry helper in MCP gateway health polling

7682cd2

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions Bot mentioned this pull request Apr 16, 2026

[aw] No-Op Runs #25214

Closed

chore: clarify withRetry attempt semantics in gateway health check

ebf5c12

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions Bot approved these changes Apr 16, 2026

docs: clarify intentional retry policy for gateway health polling

7b4184f

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0e241816-4acc-4ea2-98b5-d6db16911e8a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan April 16, 2026 18:44

Copilot AI requested a review from pelikhan April 16, 2026 18:44

Merge branch 'main' into copilot/fix-mcp-gateway-health-check

3588931

pelikhan merged commit 0443968 into main Apr 16, 2026
53 of 54 checks passed

pelikhan deleted the copilot/fix-mcp-gateway-health-check branch April 16, 2026 18:46

pelikhan merged 13 commits into main from copilot/fix-mcp-gateway-health-check

3 participants
