
[CI Failure Doctor] CI Failure Investigation - Run #35694 #15700

@github-actions

🏥 CI Failure Investigation - Run #35694

Summary

Two jobs failed on run 35694: lint-go and the Integration: CLI Completion & Other suite. Staticcheck flagged the newly added Docker validation error strings, and TestMCPRegistryClient_LiveGetServer failed because the live MCP registry returned 503 when the runner tried to connect.

Failure Details

  • Run: 22020009447
  • Commit: a5599b1a75f80a6473c871fd727f63ae214c845b
  • Trigger: push

Root Cause Analysis

  1. pkg/workflow/docker_validation.go now early-returns when Docker is unavailable, but the fmt.Errorf strings at lines 95 and 103 begin with a capitalized "Docker". This violates staticcheck ST1005 and causes lint-go to fail.
  2. TestMCPRegistryClient_LiveGetServer hits the live MCP registry as part of the integration suite. The service returned "503 upstream connect error or disconnect/reset before headers", and the latest retry reported "delayed connect error: Connection refused". The test therefore fails whenever the registry is unreachable.

Failed Jobs and Errors

  • lint-go: staticcheck reported ST1005 on pkg/workflow/docker_validation.go:95 and :103 because the error strings start with an uppercase letter ("Docker not installed..." and "Docker daemon not running...").
  • Integration: CLI Completion & Other: TestMCPRegistryClient_LiveGetServer/get_github_server failed after mcp_registry_live_test.go:141 reported "MCP registry returned status 503: upstream connect error or disconnect/reset before headers ... delayed connect error: Connection refused".

Investigation Findings

  • Running make golint locally reproduces the staticcheck violation: the new error messages begin with a capitalized word, and ST1005 requires error strings to start lowercase.
  • Running go test -tags integration ./pkg/cli -run TestMCPRegistryClient_LiveGetServer reproduces the 503 error described above. The test calls the remote MCP registry and cannot proceed when the registry or the network refuses the connection.

Recommended Actions

  • Start the two fmt.Errorf messages in pkg/workflow/docker_validation.go with lowercase text (e.g., docker not installed...) so staticcheck ST1005 no longer fails the lint job.
  • Make TestMCPRegistryClient_LiveGetServer resilient to MCP registry outages (skip when 5xx/delayed connect occurs, stub the service, or gate the test behind a flag) so a transient 503 does not break CI.
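
The first recommendation can be sketched as follows. This is a minimal illustration, not the actual contents of pkg/workflow/docker_validation.go: the function validateDocker, its parameters, and the exact message wording are assumptions made for the example. The only point it demonstrates is the ST1005 convention that error strings start lowercase and carry no trailing punctuation.

```go
package main

import "fmt"

// validateDocker is a hypothetical stand-in for the real validation code.
// The boolean parameters replace whatever checks the real function performs.
func validateDocker(installed, running bool, image string) error {
	if !installed {
		// ST1005: the string starts lowercase and has no trailing period.
		return fmt.Errorf("docker not installed: cannot validate image %q", image)
	}
	if !running {
		return fmt.Errorf("docker daemon not running: cannot validate image %q", image)
	}
	return nil
}

func main() {
	fmt.Println(validateDocker(false, false, "alpine:3"))
	fmt.Println(validateDocker(true, false, "alpine:3"))
	fmt.Println(validateDocker(true, true, "alpine:3"))
}
```

Lowercase error strings also compose better when a caller wraps them with %w, since the wrapped text usually lands mid-sentence.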

Prevention Strategies

  • Run staticcheck/make golint after editing error strings or introducing new guard clauses that emit user-facing messages.
  • Avoid calling live MCP services in CI tests unless the failure mode (network/5xx) is gracefully handled; prefer stubs or explicit skips for unreachable endpoints.
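
One way to stub the registry, shown here only as a sketch, is to inject an http.Client whose transport returns a canned response, so no network connection is made. The names fetchServer, stubClient, and roundTripFunc, the /servers/ path, and the registry.invalid host are all assumptions for illustration; the real MCP registry client in pkg/cli may be shaped differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a plain function to the http.RoundTripper interface.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubClient returns a client that answers every request with the given
// status and body without touching the network.
func stubClient(status int, body string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: status,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
			Request:    r,
		}, nil
	})}
}

// fetchServer is a hypothetical, simplified registry lookup.
func fetchServer(c *http.Client, baseURL, name string) (string, error) {
	resp, err := c.Get(baseURL + "/servers/" + name)
	if err != nil {
		return "", fmt.Errorf("mcp registry request failed: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading mcp registry response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("mcp registry returned status %d: %s", resp.StatusCode, body)
	}
	return string(body), nil
}

func main() {
	// Simulate the outage from this run, then a healthy response.
	_, err := fetchServer(stubClient(503, "upstream connect error"), "http://registry.invalid", "github")
	fmt.Println(err)
	body, err := fetchServer(stubClient(200, `{"name":"github"}`), "http://registry.invalid", "github")
	fmt.Println(body, err)
}
```

With a stub like this, the 503 path becomes a deterministic test case instead of a source of CI flakiness, while a separate opt-in test can still exercise the live endpoint.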

AI Team Self-Improvement

  • When adjusting error handling, ensure any new error messages start lowercase to satisfy staticcheck ST1005.
  • Guard integration tests that talk to MCP (or other external services) with retry/skip logic so transient 5xx responses do not fail the suite.

Historical Context

Run #35690 already reported a related lint failure and a Docker daemon-dependent integration test failure in issue #15697, but this run adds a separate instability triggered by the MCP registry returning 503s.

🩺 Diagnosis provided by CI Failure Doctor

To install this workflow, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.

  • expires on Feb 15, 2026, 3:49 PM UTC
