
Fix non-deterministic engine validation error message #14313

Merged
pelikhan merged 2 commits into main from copilot/fix-cmd-tests-nondeterminism
Feb 7, 2026

Conversation

Contributor

Copilot AI commented Feb 7, 2026

TestValidateEngine was failing intermittently because the error message format depended on Go map iteration order, which is non-deterministic. The test expected 'claude', 'codex', 'copilot', or 'custom' but the actual output could be any permutation like 'codex', 'copilot', 'custom', 'claude'.

Changes

  • Sort engine names before formatting to ensure deterministic output
  • Add formatListWithOr helper for proper grammatical conjunction (handles 0/1/2/3+ items, with an Oxford comma for 3+ items)
  • Update test assertion to use prefix matching instead of exact equality, allowing "Did you mean?" suggestions

Result

Error messages now consistently display as:

invalid engine value 'gpt4'. Must be 'claude', 'codex', 'copilot', or 'custom'

The formatListWithOr helper can be reused for other CLI list formatting needs.
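
For readers without the diff at hand, the behavior described above can be sketched as follows. This is a hypothetical reconstruction, not the merged code: only the name formatListWithOr and the 0/1/2/3+ cases come from the description, while the quoteAll helper and the choice to quote names in the caller are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatListWithOr joins items with an "or" before the last entry and an
// Oxford comma for three or more items. Hypothetical reconstruction of the
// helper named in this PR; the merged implementation may differ.
func formatListWithOr(items []string) string {
	switch len(items) {
	case 0:
		return ""
	case 1:
		return items[0]
	case 2:
		return items[0] + " or " + items[1]
	default:
		last := len(items) - 1
		return strings.Join(items[:last], ", ") + ", or " + items[last]
	}
}

// quoteAll wraps each name in single quotes. This helper is an assumption;
// the real code may do the quoting elsewhere.
func quoteAll(names []string) []string {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = "'" + n + "'"
	}
	return quoted
}

func main() {
	// Order taken from the failing log; sorting makes the output stable.
	engines := []string{"codex", "copilot", "custom", "claude"}
	sort.Strings(engines)
	fmt.Printf("invalid engine value '%s'. Must be %s\n",
		"gpt4", formatListWithOr(quoteAll(engines)))
}
```

Keeping the sort in the caller rather than inside the helper leaves formatListWithOr usable for lists whose order is meaningful.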

Original prompt

This section details the original issue you should resolve

<issue_title>[CI Failure Doctor] CMD Tests fail because validateEngine list formatting is nondeterministic</issue_title>
<issue_description># 🏥 CI Failure Investigation - Run github/gh-aw#34185

Summary

The Integration: CMD Tests job running go test ./cmd/gh-aw failed due to TestValidateEngine asserting that the validation error string matches a hard-coded list of engine IDs, but the CLI code formatted that list in a nondeterministic order that changed when the registry map iteration order shifted.

Failure Details

Root Cause Analysis

validateEngine called workflow.GetGlobalEngineRegistry().GetSupportedEngines() and printed the result via strings.Join without sorting or inserting an "or" before the final entry. Because GetSupportedEngines iterates a map, the engine names sometimes come back in a different order (the failed log shows codex, copilot, custom, claude), which no longer matched the string that TestValidateEngine was hard-coded to expect ('claude', 'codex', 'copilot', or 'custom'). The exact string comparison therefore started failing even though the underlying validation logic was still correct.
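
The root cause can be illustrated in isolation. The sketch below assumes a registry shaped like a map keyed by engine ID (the real registry type is not shown in this report) and uses a hypothetical sortedKeys helper: ranging over a Go map yields keys in an unspecified order, so the keys are collected into a slice and sorted before use.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys collects the keys of m and returns them in ascending order.
// Ranging over a map gives no ordering guarantee, so the explicit sort is
// what makes the result deterministic. Hypothetical helper for illustration.
func sortedKeys(m map[string]struct{}) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Stand-in for the engine registry; the map shape is an assumption.
	registry := map[string]struct{}{
		"codex":   {},
		"copilot": {},
		"custom":  {},
		"claude":  {},
	}
	fmt.Println(sortedKeys(registry))
}
```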

Failed Jobs and Errors

  • Integration: CMD Tests (job 62830258415) – TestValidateEngine failed with invalid engine value 'gpt4'. Must be 'codex', 'copilot', 'custom', 'claude', but the test asserted the message had to say 'claude', 'codex', 'copilot', or 'custom'.

Investigation Findings

  • Logs show TestValidateEngine failing in cmd/gh-aw/main_entry_test.go because err.Error() no longer matched the literal string the test constructs.
  • The engine registry list is generated from a map, so iteration order is not stable. The error message also lacked the conjunction "or", so the exact string comparison could not pass regardless of the order in which the names came back.
  • Manual go test ./cmd/gh-aw invocation currently hits proxy.golang.org and fails with 403 Forbidden while downloading modules, so the job cannot be rerun locally until the proxy access issue is resolved.

Recommended Actions

  • Sort the supported engine list and humanize the joined string (including an "or" before the final entry) before embedding it in the error message so that the message is deterministic and matches what TestValidateEngine expects. (Fix applied in cmd/gh-aw/main.go.)
  • Once the environment can reach proxy.golang.org, rerun go test ./cmd/gh-aw (or the full Integration: CMD Tests job) to confirm the fix and that no additional regressions occur.

Prevention Strategies

Format validation error messages that depend on registry contents with deterministic sorting and natural-language connectors instead of relying on map ordering; tests that compare exact strings can remain stable if the formatter is predictable.

AI Team Self-Improvement

When comparing error output to golden strings, always sort any slice derived from a map and render it with human-readable punctuation (e.g., add "or" before the final entry) so the text stays stable even if the map order changes.

Historical Context

TestValidateEngine has previously failed whenever the engine registry order shifted, so this failure matches a known pattern: the tests assert a hard-coded string while the CLI code reconstructs that string from a map. This investigation reinforces that the formatting must be deterministic rather than relying on Go map iteration order.

AI generated by CI Failure Doctor

To add this workflow in your repository, run gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. See usage guide.

  • expires on Feb 8, 2026, 6:02 AM UTC

Comments on the Issue (you are @copilot in this section)



- Add formatListWithOr helper function to format lists with "or" conjunction
- Sort engine names before formatting error message for deterministic output
- Update validateEngine to use sorted engines with proper "or" formatting
- Update test to check for prefix match instead of exact match to allow "Did you mean" suggestions
- Add comprehensive tests for formatListWithOr helper function

Fixes issue where TestValidateEngine failed due to non-deterministic map iteration order when listing supported engines.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix nondeterministic list formatting in validateEngine" to "Fix non-deterministic engine validation error message" Feb 7, 2026
Copilot AI requested a review from pelikhan February 7, 2026 08:18
Contributor

github-actions bot commented Feb 7, 2026

🔍 PR Triage Results

Category: bug | Risk: low | Priority: 42/100

Scores Breakdown

  • Impact: 20/50 - Fixes test flakiness in CLI command validation. Low direct user impact but improves CI reliability and developer experience by eliminating non-deterministic test failures.
  • Urgency: 8/30 - Test stability fix (4.1 hours old). Addresses intermittent CI failures. Currently unstable. Draft status.
  • Quality: 14/20 - Good description with clear root cause analysis. Adds reusable formatListWithOr helper function. Draft status. CI needs stabilization.

📋 Recommended Action: batch_review

This test stability fix should be reviewed together with PRs #14312 and #14301 as part of the code quality improvements batch (batch-code-quality-001). The PR adds a reusable helper for formatting lists that could benefit other parts of the codebase.

Batch Context: Group with other test fixes and code quality improvements for efficient review.


Triaged by PR Triage Agent on 2026-02-07

AI generated by PR Triage Agent

pelikhan marked this pull request as ready for review February 7, 2026 13:13
Copilot AI review requested due to automatic review settings February 7, 2026 13:13
pelikhan merged commit e64069c into main Feb 7, 2026
1 check passed
pelikhan deleted the copilot/fix-cmd-tests-nondeterminism branch February 7, 2026 13:13
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes intermittent failures in the cmd/gh-aw CLI integration tests by making the engine validation error message deterministic, eliminating nondeterministic Go map iteration ordering from user-facing output.

Changes:

  • Sort supported engine IDs before rendering them in validateEngine error messages.
  • Add a formatListWithOr helper to produce grammatically correct lists (including Oxford comma).
  • Update TestValidateEngine to assert using prefix matching so optional “Did you mean” suggestions don’t break the test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
cmd/gh-aw/main.go | Sorts engine list and uses formatListWithOr to build deterministic, readable validation errors.
cmd/gh-aw/main_entry_test.go | Changes the assertion to prefix-match the base error message to allow appended suggestions.
cmd/gh-aw/format_list_test.go | Adds unit tests for formatListWithOr list formatting behavior.
Comments suppressed due to low confidence (1)

cmd/gh-aw/main.go:73

  • return fmt.Errorf("%s", errMsg) is equivalent to returning errors.New(errMsg); the extra formatting call adds no value. Consider using errors.New(errMsg) to make it clear that the message is already fully formatted and to avoid the redundant call.
		errMsg := fmt.Sprintf("invalid engine value '%s'. Must be %s", engine, formattedList)

		if len(suggestions) > 0 {
			errMsg = fmt.Sprintf("invalid engine value '%s'. Must be %s.\n\nDid you mean: %s?",
				engine, formattedList, suggestions[0])
		}

		return fmt.Errorf("%s", errMsg)
	}
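
The suppressed suggestion can be checked with a small standalone sketch. The newEngineError function below is hypothetical and only stands in for the final return statement; it shows that errors.New produces the same text as fmt.Errorf("%s", ...) and keeps any '%' in the message literal, since no format string is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// newEngineError wraps an already formatted message in an error value.
// Hypothetical stand-in for the final return in validateEngine.
func newEngineError(errMsg string) error {
	return errors.New(errMsg)
}

func main() {
	// The engine value contains '%' to show that no formatting is applied.
	errMsg := fmt.Sprintf("invalid engine value '%s'. Must be %s",
		"gpt%4", "'claude', 'codex', 'copilot', or 'custom'")

	viaErrorf := fmt.Errorf("%s", errMsg)
	viaNew := newEngineError(errMsg)

	// Both errors carry identical text.
	fmt.Println(viaErrorf.Error() == viaNew.Error())
}
```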


Comment on lines +90 to 95
// Check that error message contains the expected format
// Error may include "Did you mean" suggestions, so we check if it starts with the base message
expectedMsg := fmt.Sprintf("invalid engine value '%s'. Must be 'claude', 'codex', 'copilot', or 'custom'", tt.engine)
if tt.errMessage != "" && !strings.HasPrefix(err.Error(), expectedMsg) {
	t.Errorf("validateEngine(%q) error message = %v, want to start with %v", tt.engine, err.Error(), expectedMsg)
}

Copilot AI Feb 7, 2026


tt.errMessage is no longer used to form the expected assertion (it only gates the check). This makes the table data misleading and harder to maintain. Consider either (a) removing errMessage from the test cases and keying off expectErr, or (b) using tt.errMessage to build the expected prefix (e.g., include the full base message there) so each case asserts what it declares.
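
Option (b) might look roughly like the sketch below, in which each table entry declares the prefix it expects. Everything here is illustrative: validateEngine is a simplified stand-in, and the engineCase type, its errPrefix field, and checkCase are hypothetical names, not the test code in cmd/gh-aw/main_entry_test.go.

```go
package main

import (
	"fmt"
	"strings"
)

// validateEngine is a simplified stand-in for the real function, kept only
// so the sketch is self-contained.
func validateEngine(engine string) error {
	switch engine {
	case "claude", "codex", "copilot", "custom":
		return nil
	}
	return fmt.Errorf("invalid engine value '%s'. Must be 'claude', 'codex', 'copilot', or 'custom'", engine)
}

// engineCase is a hypothetical table entry. errPrefix holds the full base
// message the error must start with; empty means no error is expected.
type engineCase struct {
	engine    string
	errPrefix string
}

// checkCase returns a failure description, or "" when the case passes.
func checkCase(c engineCase) string {
	err := validateEngine(c.engine)
	switch {
	case c.errPrefix == "" && err != nil:
		return fmt.Sprintf("validateEngine(%q) unexpected error: %v", c.engine, err)
	case c.errPrefix != "" && err == nil:
		return fmt.Sprintf("validateEngine(%q) expected an error", c.engine)
	case c.errPrefix != "" && !strings.HasPrefix(err.Error(), c.errPrefix):
		return fmt.Sprintf("validateEngine(%q) = %q, want prefix %q", c.engine, err.Error(), c.errPrefix)
	}
	return ""
}

func main() {
	cases := []engineCase{
		{engine: "claude"},
		{engine: "gpt4", errPrefix: "invalid engine value 'gpt4'. Must be 'claude', 'codex', 'copilot', or 'custom'"},
	}
	for _, c := range cases {
		if msg := checkCase(c); msg != "" {
			fmt.Println("FAIL:", msg)
		}
	}
}
```

Because the comparison is a prefix match, an appended "Did you mean" suggestion would not break a case, while each entry still asserts exactly the message it declares.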

