
Test: Intentional build and test failures for LLM plugin#22779

Closed
mokagio wants to merge 13 commits into trunk from iangmaia/test-llm-plugin-failures



mokagio commented Apr 9, 2026

Summary

Test plan

  • CI runs and fails as expected (build error + test failure)
  • The LLM plugin posts a comment explaining the failures

Posted by Claude Code (Opus 4.6) on behalf of @mokagio with approval.

🤖 Generated with Claude Code


350e6db forces a build failure, which resulted in the expected build failure annotation


b88be39 reverts the above, leaving Danger as the only failure. Unfortunately, the script did not behave as expected and still ran the build failure analysis.


78f16aa fixes it:


885f030 adds another non-essential step to verify that the checks work with more than one entry in the non-essential array.


91b15eb brought the build to green (I added the label and milestone to satisfy Danger) and revealed a bug: the gating step logged "Real failures detected, running Claude analysis" despite zero failures. The logic assumes that non_essential_failures == 0 means essential steps failed, but the count is also 0 when nothing failed at all.
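The ambiguity can be reproduced with a minimal sketch, assuming the gate reduces to two counts: total failed jobs and failed non-essential steps. The function name `gate_buggy`, its arguments, and the "run"/"skip" outputs are hypothetical, not taken from the actual script:

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the flawed gate (not the real script).
# $1: total number of failed jobs, $2: number of failed non-essential steps.
gate_buggy() {
  local total_failures="$1" non_essential_failures="$2"
  if [ "$non_essential_failures" -eq 0 ]; then
    # Assumes an essential step must have failed. This branch is also
    # taken on a fully green build, where total_failures is 0 too.
    echo "run"
  elif [ "$total_failures" -gt "$non_essential_failures" ]; then
    # Essential failures alongside non-essential ones (e.g. Danger).
    echo "run"
  else
    # Every failure came from a non-essential step.
    echo "skip"
  fi
}
```

Calling `gate_buggy 0 0` prints "run" even though nothing failed, because the zero branch never looks at the total.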


Finally, 85e1e9c fixed it (TBD)

iangmaia and others added 8 commits April 8, 2026 17:41
Improve the custom prompt to ignore non-real failures (the intentional
exit 1 trigger, Danger PR Check, and broken/skipped jobs) and respond
briefly when no actual failures exist. Migrate model from Sonnet 4.5
to Sonnet 4.6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use iangmaia/claude-summarize fork which adds build_log_mode, max_log_lines,
and on-failure trigger. This feeds only failed job logs to Claude (capped at
1500 lines) for more focused analysis with less noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
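The fork's internals are not shown in this PR. As an illustration only, capping the combined failed-job logs at 1500 lines could look like the sketch below, assuming the most recent lines are the ones worth keeping. `cap_logs` and `MAX_LOG_LINES` are hypothetical names, not the plugin's `max_log_lines` implementation:

```shell
#!/usr/bin/env bash
# Hypothetical cap on how much log text is handed to the model.
MAX_LOG_LINES=1500

# Concatenate the given failed-job log files and keep only the last
# MAX_LOG_LINES lines (assumption: errors tend to appear near the end).
cap_logs() {
  cat "$@" | tail -n "$MAX_LOG_LINES"
}
```

Keeping the tail rather than the head is a design guess; a build that fails early with a long trailing cleanup phase would be better served by a different window.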
Add upload-claude-analysis.sh that checks non-essential step outcomes
before uploading the Claude analysis pipeline. When only Danger or
other non-critical jobs failed, the pipeline is not uploaded — no
analysis, no annotation, no PR comment.

Also simplify the custom prompt now that Danger is filtered upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
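The non-essential outcome check could be sketched as follows. This assumes the script keeps the non-essential step keys in a bash array and reads each outcome with `buildkite-agent step get outcome --step <key>`, comparing against `hard_failed` (the value this command reports for a failed step). The array contents and function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical non-essential step keys; the real script's keys may differ.
NON_ESSENTIAL_STEPS=("danger")

# Read a step's outcome from the agent. Kept in its own function so the
# sketch can be exercised without a Buildkite agent by redefining it.
get_step_outcome() {
  buildkite-agent step get outcome --step "$1"
}

# Print how many non-essential steps ended as hard failures.
count_non_essential_failures() {
  local count=0 step
  for step in "${NON_ESSENTIAL_STEPS[@]}"; do
    if [ "$(get_step_outcome "$step")" = "hard_failed" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

The script would then call `buildkite-agent pipeline upload` for the analysis pipeline only when the total number of failures exceeds this count.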
When real failures co-occur with a Danger failure, Claude should still
ignore Danger rather than wasting analysis space on it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The jq filter only matched state == "failed", missing jobs with
state == "timed_out". This could cause Claude to be skipped when a
real job times out alongside a non-essential failure like Danger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
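A minimal version of the widened filter, assuming the input is build JSON (for example, from the Buildkite REST API) with a `jobs` array whose entries carry a `state` field. `count_failed_jobs` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Count jobs that represent real failures: both "failed" and "timed_out".
# Reads a build JSON object with a "jobs" array from stdin.
count_failed_jobs() {
  jq '[.jobs[] | select(.state == "failed" or .state == "timed_out")] | length'
}
```

With only `.state == "failed"`, a timed-out job would not be counted, so the total could equal the non-essential count and the analysis would be skipped.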
Test the LLM CI plugin by introducing:
- A compilation error (reference to non-existent type)
- A test assertion failure (swapped expected values)

---

Generated with the help of Claude Code, https://code.claude.com

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mokagio self-assigned this Apr 9, 2026
Verify that Claude analysis is skipped when only Danger (non-essential) fails.

---

Generated with the help of Claude Code, https://claude.ai/code

Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>

wpmobilebot commented Apr 9, 2026

📲 You can test the changes from this Pull Request in WordPress Android by scanning the QR code below to install the corresponding build.

App Name: WordPress Android
Build Type: Debug
Version: pr22779-85e1e9c
Build Number: 1488
Application ID: org.wordpress.android.prealpha
Commit: 85e1e9c
Installation URL: 0eim8c093oo1o
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.


wpmobilebot commented Apr 9, 2026

📲 You can test the changes from this Pull Request in Jetpack Android by scanning the QR code below to install the corresponding build.

App Name: Jetpack Android
Build Type: Debug
Version: pr22779-85e1e9c
Build Number: 1488
Application ID: com.jetpack.android.prealpha
Commit: 85e1e9c
Installation URL: 12g6qctnku9vo
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

mokagio and others added 3 commits April 9, 2026 14:25
`buildkite-agent step get outcome` returns `hard_failed`, not `failed`.
The wrong string meant non-essential failures were never detected.

---

Generated with the help of Claude Code, https://claude.ai/code

Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
Adds a step that always fails and registers it in
the non-essential array to verify multi-entry gating.

---

Generated with the help of Claude Code, https://claude.ai/code

Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
mokagio added this to the 26.8 milestone Apr 9, 2026
Query total failures first via the API.
When zero, exit early instead of falling through
to the non-essential check which assumed failures.

---

Generated with the help of Claude Code, https://claude.ai/code

Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>
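The reordered check can be sketched as follows, assuming the total comes from the API query and the non-essential count from the step-outcome checks. `gate` and its "run"/"skip" outputs are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical corrected gate: look at the total first so that
# "zero non-essential failures" is no longer ambiguous.
# $1: total number of failed jobs, $2: number of failed non-essential steps.
gate() {
  local total_failures="$1" non_essential_failures="$2"
  if [ "$total_failures" -eq 0 ]; then
    echo "skip"   # green build: nothing to analyze, exit early
  elif [ "$total_failures" -gt "$non_essential_failures" ]; then
    echo "run"    # at least one essential step failed
  else
    echo "skip"   # every failure came from a non-essential step
  fi
}
```

On a green build (`gate 0 0`) the early branch prints "skip" before the non-essential comparison is ever reached.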
mokagio closed this Apr 9, 2026