Test: Intentional build and test failures for LLM plugin #22779
Closed
Improve the custom prompt to ignore non-real failures (the intentional exit 1 trigger, Danger PR Check, and broken/skipped jobs) and respond briefly when no actual failures exist. Migrate model from Sonnet 4.5 to Sonnet 4.6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use iangmaia/claude-summarize fork which adds build_log_mode, max_log_lines, and on-failure trigger. This feeds only failed job logs to Claude (capped at 1500 lines) for more focused analysis with less noise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add upload-claude-analysis.sh that checks non-essential step outcomes before uploading the Claude analysis pipeline. When only Danger or other non-critical jobs failed, the pipeline is not uploaded — no analysis, no annotation, no PR comment. Also simplify the custom prompt now that Danger is filtered upstream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When real failures co-occur with a Danger failure, Claude should still ignore Danger rather than wasting analysis space on it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…L scalar Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/f2282ed4-20ab-42a2-b37b-6fac3bfa3514 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>
…tial co-failures Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/1f6e9e79-7580-4d15-97d2-c85bb5452559 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>
The jq filter only matched state == "failed", missing jobs with state == "timed_out". This could cause Claude to be skipped when a real job times out alongside a non-essential failure like Danger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
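The state-matching fix described in this commit could look roughly like the sketch below. The `.jobs[].state` layout of the build JSON, the `FAILED_JOBS_FILTER` variable, and the `count_failed_jobs` helper are assumptions for illustration, not the contents of the actual script.

```shell
#!/usr/bin/env bash
# Sketch only: the JSON shape (.jobs[].state) and all names are assumed.

# Count jobs whose state is either "failed" or "timed_out".
FAILED_JOBS_FILTER='[.jobs[] | select(.state == "failed" or .state == "timed_out")] | length'

# Reads build JSON on stdin and prints the number of failed or timed-out jobs.
count_failed_jobs() {
  jq "$FAILED_JOBS_FILTER"
}

# Example input with one failed, one timed-out, and one passed job.
sample='{"jobs":[{"state":"failed"},{"state":"timed_out"},{"state":"passed"}]}'
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | count_failed_jobs
fi
```

Matching both states in one filter keeps a timed-out job from being mistaken for "no real failure" when it occurs next to a non-essential failure.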
Test the LLM CI plugin by introducing:
- A compilation error (reference to non-existent type)
- A test assertion failure (swapped expected values)
--- Generated with the help of Claude Code, https://code.claude.com Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that Claude analysis is skipped when only Danger (non-essential) fails. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
`buildkite-agent step get outcome` returns `hard_failed`, not `failed`. The wrong string meant non-essential failures were never detected. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
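A minimal sketch of the corrected comparison follows, assuming the outcome string has already been read from the agent. The `is_hard_failure` helper and the `danger` step key in the comment are hypothetical names, not taken from the actual script.

```shell
#!/usr/bin/env bash
# Sketch only: is_hard_failure and the "danger" step key are hypothetical.

# Succeeds (exit status 0) only when the outcome string is "hard_failed".
is_hard_failure() {
  [ "$1" = "hard_failed" ]
}

# In a pipeline, the outcome would come from the agent, for example:
#   outcome="$(buildkite-agent step get "outcome" --step "danger")"
# Here a few literal values are looped over instead, so the sketch runs anywhere.
for outcome in passed hard_failed failed; do
  if is_hard_failure "$outcome"; then
    echo "$outcome: counted as a failure"
  else
    echo "$outcome: not counted"
  fi
done
```

Comparing against `failed` never matches, which is why the non-essential failures went undetected before this change.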
Adds a step that always fails and registers it in the non-essential array to verify multi-entry gating. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
This reverts commit 885f030.
Query total failures first via the API. When zero, exit early instead of falling through to the non-essential check which assumed failures. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>


Summary
Test plan
Posted by Claude Code (Opus 4.6) on behalf of @mokagio with approval.
🤖 Generated with Claude Code
350e6db forces a build failure, which resulted in the expected build failure annotation
b88be39 reverts the above, leaving only Danger as a failure. Unfortunately, the script did not behave as expected and still ran the build failure analysis
78f16aa fixes it:
885f030 added another non-essential step to verify the array checks
91b15eb brought the build to green (I added the label and milestone to satisfy Danger) and revealed a bug: The gating step logged "Real failures detected, running Claude analysis" despite zero failures. The logic assumes non_essential_failures == 0 means essential steps failed, but it's also 0 when nothing failed.
Finally, 85e1e9c fixed it (TBD)
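The gating behaviour exercised by the commits above can be sketched as a three-way decision. This is an illustration under assumptions rather than the actual `upload-claude-analysis.sh`: the `decide_analysis` function, its two count arguments, and the output strings are hypothetical, and the real script gets its counts from the API and the agent.

```shell
#!/usr/bin/env bash
# Sketch only: decide_analysis and its messages are hypothetical.

# decide_analysis TOTAL_FAILURES NON_ESSENTIAL_FAILURES
# Prints whether the Claude analysis pipeline should be uploaded.
decide_analysis() {
  local total="$1" non_essential="$2"
  if [ "$total" -eq 0 ]; then
    # Green build: exit early instead of reaching the non-essential check.
    echo "skip: no failures"
  elif [ "$non_essential" -ge "$total" ]; then
    # Every failure is accounted for by non-essential steps such as Danger.
    echo "skip: only non-essential failures"
  else
    # At least one failure is not in the non-essential list.
    echo "run: real failures detected"
  fi
}

decide_analysis 0 0   # green build
decide_analysis 1 1   # only a non-essential step failed
decide_analysis 2 1   # a real failure alongside a non-essential one
```

Checking the total first means a count of zero non-essential failures is no longer read as "essential steps failed" on a green build.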