Description
🏥 CI Failure Investigation - Run #35694
Summary
lint-go and the Integration: CLI Completion & Other suite both failed on run 35694. Staticcheck flagged the newly added Docker validation error strings, and TestMCPRegistryClient_LiveGetServer now trips because the live MCP registry returns 503 when the runner tries to connect.
Failure Details
- Run: 22020009447
- Commit: a5599b1a75f80a6473c871fd727f63ae214c845b
- Trigger: push
Root Cause Analysis
pkg/workflow/docker_validation.go now early-returns when Docker is unavailable, but the fmt.Errorf strings at lines 95 and 103 still begin with a capitalized "Docker", which violates staticcheck ST1005 and causes lint-go to fail. TestMCPRegistryClient_LiveGetServer hits the live MCP registry as part of the integration suite, and the service returned "503 upstream connect error or disconnect/reset before headers", with the latest retry reporting "delayed connect error: Connection refused", so the test no longer succeeds when the registry is unreachable.
Failed Jobs and Errors
- lint-go: staticcheck reported ST1005 on pkg/workflow/docker_validation.go:95 and :103 because the error strings start with an uppercase letter ("Docker not installed..." / "Docker daemon not running...").
- Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_github_server failed after mcp_registry_live_test.go:141 reported "MCP registry returned status 503: upstream connect error or disconnect/reset before headers ... delayed connect error: Connection refused".
Investigation Findings
- Running make golint locally reproduces the staticcheck violation because the new error messages still begin with a capitalized word and staticcheck enforces lowercase.
- Running go test -tags integration ./pkg/cli -run TestMCPRegistryClient_LiveGetServer now returns the 503 error quoted above; the test hits the remote MCP registry and cannot proceed when the registry or network refuses the connection.
Recommended Actions
- Start the two fmt.Errorf messages in pkg/workflow/docker_validation.go with lowercase text (e.g., "docker not installed...") so staticcheck ST1005 no longer fails the lint job.
- Make TestMCPRegistryClient_LiveGetServer resilient to MCP registry outages (skip when 5xx/delayed connect occurs, stub the service, or gate the test behind a flag) so a transient 503 does not break CI.
Prevention Strategies
- Run staticcheck/make golint after editing error strings or introducing new guard clauses that emit user-facing messages.
- Avoid calling live MCP services in CI tests unless the failure mode (network/5xx) is gracefully handled; prefer stubs or explicit skips for unreachable endpoints.
AI Team Self-Improvement
- When adjusting error handling, ensure any new error messages start lowercase to satisfy staticcheck ST1005.
- Guard integration tests that talk to MCP (or other external services) with retry/skip logic so transient 5xx responses do not fail the suite.
Historical Context
Run #35690 already reported a related lint failure and a Docker daemon-dependent integration test failure in issue #15697, but this run adds a separate instability triggered by the MCP registry returning 503s.
🩺 Diagnosis provided by CI Failure Doctor
To install this workflow, run
gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.
- expires on Feb 15, 2026, 3:49 PM UTC