Closed as not planned
Labels
cookie (Issue Monster Loves Cookies!)
Description
🏥 CI Failure Investigation - Run #36032
Summary
The Integration: CLI Completion & Other job fails because TestMCPRegistryClient_LiveGetServer now hits the live MCP registry, and the registry is returning "503 upstream connect error or disconnect/reset before headers" with a delayed connect failure. As a result, the test cannot look up io.github.netdata/mcp-server.
Failure Details
- Run: 22068117409
- Commit: 5e5b9d282752b1430867cdc76a09603348c08d4c
- Trigger: push
Root Cause Analysis
- TestMCPRegistryClient_LiveGetServer connects to the live MCP registry while exercising GetServer; the registry returned "503 upstream connect error or disconnect/reset before headers", with the latest retry reporting "delayed connect error: Connection refused", so the subtest cannot complete.
- Every subtest (get_github_server and get_nonexistent_server) tries to assert specific output but receives the same 503, which is treated as a failure instead of being skipped or mocked.
Failed Jobs and Errors
- Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_github_server, mcp_registry_live_test.go:141: GetServer failed for 'io.github.netdata/mcp-server': MCP registry returned status 503: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
- Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_nonexistent_server, mcp_registry_live_test.go:175: Expected error to contain 'not found in registry', got: MCP registry returned status 503: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
Investigation Findings
- Running go test -v -tags integration ./pkg/cli -run TestMCPRegistryClient_LiveGetServer against the live registry reproduces the 503/delayed connect error: the test looks up io.github.netdata/mcp-server, and the registry is currently refusing connections.
- The integration suite therefore fails before reporting a specific test, since the package-level run detects the panic/failure and aborts, logging that no individual test passed cleanly.
Recommended Actions
- Guard TestMCPRegistryClient_LiveGetServer (and similar MCP live tests) so that 5xx/delayed-connect responses are skipped or stubbed instead of failing the suite, e.g., detect the 503 and mark the test as skipped when the registry is unreachable.
- Replace the live MCP dependency in CI with a stub or canned response when possible so transient outages do not break the workflow.
- Rerun the integration job after MCP connectivity is restored to confirm there are no additional regressions.
Prevention Strategies
- Avoid calling production MCP services directly from CI without handling known failure modes (503s, connection refused, etc.) and mark the tests as flaky or skipped when the service is down.
- Use local stubs or recorded fixtures for MCP responses in GitHub Actions so network availability does not gate the whole suite.
AI Team Self-Improvement
- When generating tests that talk to MCP or other external services, guard them with explicit skip/retry logic and explain that 5xx/delayed connect errors should not be treated as regressions.
- Prefer mocking remote MCP responses in CI workflows so the tests stay deterministic even if the upstream service is temporarily unreachable.
Historical Context
- Run #35694 had the same TestMCPRegistryClient_LiveGetServer failure because the MCP registry returned a 503; see #15700 for the prior investigation.
🩺 Diagnosis provided by CI Failure Doctor
To install this workflow, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.
- expires on Feb 17, 2026, 3:26 PM UTC