compile --actionlint reports zero errors but exits nonzero (false negative or integration bug) #20629

@grahame-white

Summary

When running gh aw compile --actionlint (particularly in strict mode), the step can report “0 errors” (or display "No issues found" in its summary) but still exit with a nonzero status. This causes CI and pre-commit validations to fail despite clean workflows, and is not consistent with direct actionlint usage. Manual reruns using direct actionlint often pass, indicating the bug is in gh-aw's integration, not the workflows themselves.

Root Cause Analysis

Most likely root cause:

  • gh aw maintains its own aggregate stats for actionlint findings, displaying "No issues found" if the count of errors and warnings is zero (see ActionlintStats and displayActionlintSummary in pkg/cli/actionlint.go).
  • However, orchestration logic (see compile_orchestration.go) also checks the error return from runBatchActionlint(...), which encapsulates not just linter findings, but also subprocess failures, JSON parsing errors, file access issues, or Docker invocation errors.
  • Therefore, it is possible for the summary of results (based on parsed linter findings) to be clean, but for the overall process to still exit with failure if:
    • Docker execution fails,
    • actionlint emits unrecognized output,
    • JSON output is truncated, malformed, or missing,
    • file system or path handling fails for a subset of batch files,
    • other integration errors occur.
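
The divergence described above can be sketched in a few lines. This is a minimal illustration, not the gh-aw code: `lintStats`, `summary`, `runBatch`, and `exitCode` are hypothetical stand-ins, and the simulated Docker failure is an assumption chosen to show a tooling error that yields no findings.

```go
package main

import (
	"errors"
	"fmt"
)

// lintStats is a hypothetical stand-in for the aggregate counters.
type lintStats struct {
	Errors   int
	Warnings int
}

// summary looks only at parsed findings, as the issue describes.
func summary(s lintStats) string {
	if s.Errors+s.Warnings == 0 {
		return "No issues found"
	}
	return fmt.Sprintf("%d errors, %d warnings", s.Errors, s.Warnings)
}

// runBatch is a hypothetical batch runner. It simulates a tooling failure
// that happens before any finding can be parsed.
func runBatch(files []string) (lintStats, error) {
	return lintStats{}, errors.New("docker: failed to start actionlint container")
}

// exitCode treats any returned error as a failure, whatever its cause.
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

func main() {
	stats, err := runBatch([]string{"a.lock.yml"})
	fmt.Println(summary(stats))         // summary is clean
	fmt.Println("exit:", exitCode(err)) // exit status is nonzero
}
```

Because the summary and the exit status are derived from different signals, nothing forces them to agree.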

Consequence:

  • Users are told their workflows pass linting ("0 errors"), but their CI/pipeline (or pre-commit hook) fails, creating confusion and churn.

Reproduction Steps

  1. Run: gh aw compile --actionlint (optionally with --strict or as part of pre-commit/CI)
  2. Observe: Output contains "No issues found" or "0 errors found" in the summary
  3. But: The command exits nonzero
  4. Run: Directly invoke actionlint on each generated .lock.yml (e.g. actionlint .github/workflows/*.lock.yml)
  5. Observe: Direct actionlint passes with exit code 0

Likely underlying mechanisms

  • Partial batch failure: If one file in a batch hits an invocation issue but the rest parse fine, the summary counts only the parsed subset while the process still exits with failure
  • Subprocess error handling: runBatchActionlint(...) may return an error on tool or parsing issues, rather than just on findings
  • Docker or path quirks: Dockerized actionlint may fail if paths are not mounted/translated correctly, even though the file compiles and actionlint parses it fine when run locally
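
As a hedged illustration of the partial batch failure case, the sketch below decodes one JSON array per file and records a decode failure separately from the finding count. `finding` and `parseBatch` are hypothetical names, the per-file map is an assumed input shape, and `errors.Join` assumes Go 1.20 or later.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// finding is a hypothetical, reduced shape for one lint finding.
type finding struct {
	Message string `json:"message"`
}

// parseBatch decodes one JSON array per file. A file whose output cannot be
// decoded contributes no findings but is recorded as a tooling error.
func parseBatch(outputs map[string]string) (int, error) {
	total := 0
	var errs []error
	for file, raw := range outputs {
		var fs []finding
		if err := json.Unmarshal([]byte(raw), &fs); err != nil {
			errs = append(errs, fmt.Errorf("%s: parse actionlint output: %w", file, err))
			continue
		}
		total += len(fs)
	}
	// errors.Join returns nil when no error was collected.
	return total, errors.Join(errs...)
}

func main() {
	outputs := map[string]string{
		"a.lock.yml": `[]`,           // clean file
		"b.lock.yml": `[{"message":`, // truncated output
	}
	n, err := parseBatch(outputs)
	fmt.Println("findings:", n)
	fmt.Println("tooling error:", err != nil)
}
```

Here the finding count is zero because the only file that failed to decode contributed nothing to it, yet the returned error is non-nil.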

Remediation Proposal

  1. In pkg/cli/actionlint.go and the orchestration path, make clear distinctions between:
    • Lint findings (errors/warnings)
    • Tooling/integration failures (subprocess, Docker, parsing, etc)
  2. If lint findings are zero but tooling integration fails, display an explicit message:
    • e.g. "No issues found, but actionlint invocation failed. This likely indicates a tooling or integration error, not a workflow problem."
  3. Never display "No issues found" on its own in a run that exits nonzero; the output must state whether the failure comes from lint findings or from the tooling
  4. Consider returning a custom error type or status code to distinguish linter failures from integration/tooling failures in orchestration.
  5. Add regression test coverage: zero findings plus an integration failure must emit a distinct error that helps users tell tooling failures apart from lint findings.

Impact

  • Prevents churn and confusion in PR check failure root-cause analysis
  • Aligns CI, local validation, and direct actionlint runs
  • Speeds up identifying real regressions when failures actually occur

NOTE: This issue was prepared following CONTRIBUTING.md agentic analysis guidelines. If more reproduction detail or implementation steps are needed, please request follow-up.
