
[CI Failure Doctor] CI Failure Investigation - Run #35690 #15697

@github-actions


🏥 CI Failure Investigation - Run #35690

Summary

The lint job and the integration test suite now fail because the new Docker validation checks surfaced two issues: staticcheck flags the capitalized error strings introduced in validateDockerImage, and TestValidateContainerImages/valid_container_image hard-fails when the runner’s Docker daemon is unreachable.

Failure Details

Root Cause Analysis

  1. validateDockerImage now returns early when the docker CLI is missing or the daemon is down, but the fmt.Errorf strings on both early-return paths begin with a capitalized "Docker". This violates staticcheck ST1005 and makes lint-go fail.
  2. TestValidateContainerImages assumes a working Docker setup. On the GitHub runner the daemon is not responsive, so the new daemon check returns an error. Because the tests skip only when the CLI is missing, they now report the daemon error instead of skipping.
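To show why only the capitalized messages break lint-go, here is a simplified Go approximation of the capitalization half of ST1005. It is a sketch, not staticcheck's source: it ignores the rule's punctuation checks, and the helper name flagsCapitalizedErrorString is hypothetical. Following staticcheck's documented intent, a first word that contains further capitals or digits (such as "HTTP") is treated as an initialism and accepted.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// flagsCapitalizedErrorString approximates the capitalization check of
// staticcheck ST1005 (hypothetical helper, not staticcheck's code).
// It returns true when the first word starts with an uppercase letter and
// contains no other uppercase letters or digits; words like "HTTP" or
// "IPv4" are treated as initialisms and accepted.
func flagsCapitalizedErrorString(msg string) bool {
	word, _, _ := strings.Cut(msg, " ")
	first, size := utf8.DecodeRuneInString(word)
	if !unicode.IsUpper(first) {
		return false
	}
	for _, r := range word[size:] {
		if unicode.IsUpper(r) || unicode.IsDigit(r) {
			return false // looks like an initialism, accept it
		}
	}
	return true
}

func main() {
	for _, msg := range []string{
		"Docker daemon not running - could not validate container image 'alpine:latest'",
		"docker daemon not running - could not validate container image 'alpine:latest'",
		"HTTP request failed",
	} {
		fmt.Printf("%-5t %s\n", flagsCapitalizedErrorString(msg), msg)
	}
}
```

The capitalized "Docker" is a plain word rather than an initialism, which is why lowercasing it is the expected fix.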

Failed Jobs and Errors

  • lint-go: staticcheck reported ST1005 at pkg/workflow/docker_validation.go lines 95 and 103 because both error strings start with a capitalized "Docker".
  • Integration: Workflow Actions & Containers: TestValidateContainerImages/valid_container_image now returns "Docker daemon not running - could not validate container image 'alpine:latest'" because validateDockerImage fails early when isDockerDaemonRunning() is false.

Investigation Findings

  • The lint failure comes directly from the new Docker validation guard: both early-return paths emit capitalized messages that staticcheck flags. Lowercasing those strings clears the lint error.
  • The integration failure occurs because the runner exposes the docker CLI but the daemon isn’t responsive, so validateDockerImage reports the daemon error and the subtest fails rather than skipping.
  • Running go test -tags integration ./pkg/workflow -run TestValidateContainerImages locally was blocked: the Go command tried to download the Go 1.25.0 toolchain, which the environment forbids, and the installed toolchain is only 1.24.13, so the test run could not complete.
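A minimal sketch of the guard described in these findings, with both messages lowercased. It assumes the two environment checks can be injected as functions (the hypothetical dockerChecks type) so the early returns can be exercised without Docker; the real package may call exec.LookPath("docker") and isDockerDaemonRunning() directly. The daemon message mirrors the CI log, while the wording of the CLI-missing message is a placeholder, not necessarily the text in docker_validation.go.

```go
package main

import "fmt"

// dockerChecks bundles the two environment probes so the guard can be
// tested with stubs. Hypothetical type for illustration only.
type dockerChecks struct {
	cliAvailable  func() bool
	daemonRunning func() bool
}

// validateDockerImage returns early when Docker is unusable. Both messages
// begin with a lowercase "docker", which satisfies staticcheck ST1005.
func validateDockerImage(image string, c dockerChecks) error {
	if !c.cliAvailable() {
		// Placeholder wording for the CLI-missing path.
		return fmt.Errorf("docker not installed - could not validate container image '%s'", image)
	}
	if !c.daemonRunning() {
		// Same text as the CI log, minus the capital D.
		return fmt.Errorf("docker daemon not running - could not validate container image '%s'", image)
	}
	// The actual image validation would follow here.
	return nil
}

func main() {
	daemonDown := dockerChecks{
		cliAvailable:  func() bool { return true },
		daemonRunning: func() bool { return false },
	}
	// prints: docker daemon not running - could not validate container image 'alpine:latest'
	fmt.Println(validateDockerImage("alpine:latest", daemonDown))
}
```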

Recommended Actions

  • Lowercase the fmt.Errorf messages in pkg/workflow/docker_validation.go so staticcheck ST1005 no longer fails the lint job.
  • Guard TestValidateContainerImages (or the tt.skipIfNoDocker path) with an isDockerDaemonRunning() check so tests skip when the daemon isn’t responsive instead of failing.
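One way the second recommendation could be applied, sketched under assumptions: dockerSkipReason, cliOnPath, and daemonResponds are hypothetical helpers, and daemonResponds stands in for the package's isDockerDaemonRunning() by treating a successful "docker info" within five seconds as a healthy daemon (the real check may differ). The helper returns a skip reason instead of calling t.Skip itself, so the decision logic can be verified without a *testing.T.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// cliOnPath reports whether a docker binary is on PATH.
func cliOnPath() bool {
	_, err := exec.LookPath("docker")
	return err == nil
}

// daemonResponds is a stand-in for isDockerDaemonRunning(): it treats a
// successful "docker info" within 5 seconds as a responsive daemon.
func daemonResponds() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, "docker", "info").Run() == nil
}

// dockerSkipReason returns "" when Docker-backed tests can run, otherwise a
// reason suitable for t.Skip. The CLI is checked first, then daemon health.
func dockerSkipReason(cliAvailable, daemonRunning func() bool) string {
	if !cliAvailable() {
		return "docker CLI not found on PATH"
	}
	if !daemonRunning() {
		return "docker CLI found but daemon is not responding"
	}
	return ""
}

func main() {
	if reason := dockerSkipReason(cliOnPath, daemonResponds); reason != "" {
		fmt.Println("skip:", reason)
	} else {
		fmt.Println("docker available: run container image tests")
	}
}
```

Inside the table-driven test, the tt.skipIfNoDocker branch would then call t.Skip(reason) whenever the returned reason is non-empty, so an unresponsive daemon produces a skip rather than a failure.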

Prevention Strategies

Document that any integration tests hitting Docker should check both CLI availability and daemon health before asserting success, and run staticcheck locally after touching error strings to catch capitalized messages before CI does.

AI Team Self-Improvement

  • Before landing Docker validation changes, ensure error messages start lowercase to satisfy staticcheck.
  • When adding integration coverage that relies on Docker, gate the tests with both exec.LookPath("docker") and isDockerDaemonRunning() so missing daemons are treated as skips.

Historical Context

No existing [CI Failure Doctor] issue referenced run #35690; this appears to be a new failure pattern introduced by the recent Docker validation perf change.

🩺 Diagnosis provided by CI Failure Doctor

To install this workflow, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. View source at https://github.com/githubnext/agentics/tree/ea350161ad5dcc9624cf510f134c6a9e39a6f94d/workflows/ci-doctor.md.

  • expires on Feb 15, 2026, 3:32 PM UTC

Metadata

Assignees: No one assigned

Labels: cookie (Issue Monster Loves Cookies!)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests