Fix Claude build analysis gate: correctly skip only when all failures are non-essential by iangmaia · Pull Request #22760 · wordpress-mobile/WordPress-Android

iangmaia · 2026-04-02T16:38:54Z

Description

Improves the Claude Build Analysis Buildkite integration:

Model upgrade: Sonnet 4.5 → Sonnet 4.6
Custom fork: Switches to iangmaia/claude-summarize fork with build_log_mode: "failed" and max_log_lines: 1500 for focused log analysis
Smart gating: Skips Claude analysis entirely when only non-essential jobs (e.g. Danger) fail — no annotation, no noise
Co-failure correctness: When non-essential jobs fail alongside real failures, Claude still runs. Uses the Buildkite API to count total failures and compare against non-essential failures. Fails safe (runs Claude if API is unreachable).
Prompt improvements: Tells Claude to ignore Danger, broken jobs, and the analysis job itself. Uses YAML literal scalar (|) to preserve list formatting.
Script hardening: Portable shebang, proper Bash array for step keys, timed_out jobs included in failure count.

Test Steps

Push a change that breaks the build → Claude analysis should run and annotate real failures
Push a change where only Danger fails → Claude analysis should be skipped entirely (no annotation)
If both Danger and a real test fail → Claude should still run

I have considered if this change warrants user-facing release notes and have added them to RELEASE-NOTES.txt if necessary.

wpmobilebot · 2026-04-02T17:24:06Z

📲 You can test the changes from this Pull Request in Jetpack Android by scanning the QR code below to install the corresponding build.

	App Name	Jetpack Android
	Build Type	Debug
	Version	pr22760-ab82dac
	Build Number	`1488`
	Application ID	`com.jetpack.android.prealpha`
	Commit	`ab82dac`
	Installation URL	5r3fh0k3ostko

Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

wpmobilebot · 2026-04-02T17:24:30Z

📲 You can test the changes from this Pull Request in WordPress Android by scanning the QR code below to install the corresponding build.

	App Name	WordPress Android
	Build Type	Debug
	Version	pr22760-ab82dac
	Build Number	`1488`
	Application ID	`org.wordpress.android.prealpha`
	Commit	`ab82dac`
	Installation URL	09mofgpufsmcg

Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

sonarqubecloud · 2026-04-02T19:17:55Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

.buildkite/commands/upload-claude-analysis.sh

mokagio · 2026-04-08T04:08:01Z

.buildkite/commands/upload-claude-analysis.sh

+# Non-essential steps whose failure alone should NOT trigger Claude analysis.
+# Add step keys here as needed.
+NON_ESSENTIAL_STEPS="danger"
+
+for step_key in ${NON_ESSENTIAL_STEPS}; do
+  OUTCOME=$(buildkite-agent step get outcome --step "${step_key}" 2>/dev/null || echo "not_run")
+  if [ "${OUTCOME}" = "failed" ]; then
+    echo "Only non-essential job '${step_key}' failed, skipping Claude analysis"
+    exit 0
+  fi
+done


@copilot can you check this logic for me? My Bash is not strong, but the input in the for loop does not look like an array to me.

You're right — it was a plain string, not an array. With an unquoted string variable, the for loop relies on word splitting (which works for simple keys but is fragile and can trigger glob expansion). Fixed in 0bbf3dd by converting NON_ESSENTIAL_STEPS to a proper Bash array and iterating with "${NON_ESSENTIAL_STEPS[@]}":

NON_ESSENTIAL_STEPS=("danger") for step_key in "${NON_ESSENTIAL_STEPS[@]}"; do

Adding more step keys is now just a matter of adding elements to the array: ("danger" "other-step").

@copilot Another question: In the way the check is, I get the idea that if one of the non-essential steps fail, e.g. danger, we'll hit the exit 0 line before we have a chance to check whether any of the essential steps failed. Or am I missing something?

You're right — that was a real logic bug. The old code exited as soon as it saw any non-essential step fail, so if Danger and essential tests both failed, Claude would be skipped incorrectly.

Fixed in 9ceba24. The new approach:

Loop through all non-essential steps and count their failures (no early exit).

If zero non-essential failures → something essential failed → run Claude.

If some non-essential failures → query the Buildkite REST API for the total failed job count in this build.

Only skip Claude when non_essential_failures == total_failures (i.e., all failures are accounted for by known non-essential steps).

If the API call fails → fail-safe: run Claude anyway.

Only skip Claude when non_essential_failures == total_failures (i.e., all failures are accounted for by known non-essential steps).

But in that case, if we have two non-essential failures and only one failed (e.g. danger and swiftlint are non-essential. danger is green, but swiftlint failed) we would not hit the return condition, would we?

@mokagio I believe I've fixed the issues you mentioned / found with Co-pilot in this PR and in the others. I was reluctant to use the build status call (it was also suggested to me by Claude) but it's probably the best way in terms of correctness and reliability of the pipeline.

I've tweaked the solution a little bit and updated on 3e7fd18, but the main changes are done in other repos.

Thanks for looking into it! 🙇

mokagio · 2026-04-08T04:11:58Z

.buildkite/claude-analysis.yml

+          build_log_mode: "failed"
+          trigger: "on-failure"
+          max_log_lines: 1500
+          custom_prompt: >


TIL about folded scalar > and literal scalar |.

> (folded scalar) joins lines into a single line — newlines become spaces. | (literal scalar) preserves newlines as-is. For a prompt sent to an LLM, > is fine since the model doesn't care about line breaks. | would matter if the exact formatting needed to be preserved (e.g., code blocks, numbered lists where you want literal newlines). In this case either works, but > is the more natural choice for prose.

Opus 4.6 says that > is better for prose, but I disagree. The prompt here has an ordered list and an intentional new line. If models care about all caps and bold, we should also carry over lists and paragraphs.

Suggested change

custom_prompt: >

custom_prompt: |

Agreed — the prompt has an ordered list and a blank paragraph separator, so | (literal scalar) is the right choice to preserve that structure. Changed in 0bbf3dd.

Improve the custom prompt to ignore non-real failures (the intentional exit 1 trigger, Danger PR Check, and broken/skipped jobs) and respond briefly when no actual failures exist. Migrate model from Sonnet 4.5 to Sonnet 4.6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use iangmaia/claude-summarize fork which adds build_log_mode, max_log_lines, and on-failure trigger. This feeds only failed job logs to Claude (capped at 1500 lines) for more focused analysis with less noise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add upload-claude-analysis.sh that checks non-essential step outcomes before uploading the Claude analysis pipeline. When only Danger or other non-critical jobs failed, the pipeline is not uploaded — no analysis, no annotation, no PR comment. Also simplify the custom prompt now that Danger is filtered upstream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When real failures co-occur with a Danger failure, Claude should still ignore Danger rather than wasting analysis space on it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…L scalar Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/f2282ed4-20ab-42a2-b37b-6fac3bfa3514 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>

…tial co-failures Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/1f6e9e79-7580-4d15-97d2-c85bb5452559 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>

The jq filter only matched state == "failed", missing jobs with state == "timed_out". This could cause Claude to be skipped when a real job times out alongside a non-essential failure like Danger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mokagio · 2026-04-09T04:21:48Z

Thanks @iangmaia !

I opened #22779 to test this. I verified the build analysis runs when there are build failures. Unfortunately, the script did not behave as expected when only Danger failed and still run the build failure analysis

I had Claude monitor the pipeline and this is what it guessed

⏺ CI Results — Build #25938

  The gating step did not skip Claude analysis as expected. Here's what happened:

  All builds and tests passed. Only Danger failed (missing label/milestone) — as expected.

  Bug found: The upload-claude-analysis.sh gating script failed to detect Danger's failure. The
  buildkite-agent step get outcome --step "danger" call likely can't resolve the step key when
  it's nested inside a group: "Linters" block in the pipeline YAML. It falls back to "not_run",
  so non_essential_failures stays 0, and the script hits the early exit path assuming essential
  steps must have failed.

  The net effect: Claude analysis ran unnecessarily and (correctly) reported the build is healthy
   — but ideally it shouldn't have run at all.

  Want to investigate the grouped step key issue in upload-claude-analysis.sh?

`buildkite-agent step get outcome` returns `hard_failed`, not `failed`. The wrong string meant non-essential failures were never detected. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>

mokagio · 2026-04-09T05:42:05Z

1ddc7f4 fixes the issue reported above.

I run a few more scenarios in the #22779 PR to see what happened and it was all good.

Ready to ship.

Query total failures first via the API. When zero, exit early instead of falling through to the non-essential check which assumed failures. --- Generated with the help of Claude Code, https://claude.ai/code Co-Authored-By: Claude Code Opus 4.6 <noreply@anthropic.com>

mokagio · 2026-04-09T10:01:57Z

@iangmaia sorry, I went a bit rogue here and committed directly on your PR. Copilot had already added commits here so I just rolled with it. Besides, this seemed to me like mostly a matter of trial and error with the script till the criteria were met, more that a whole design overhaul.

iangmaia self-assigned this Apr 2, 2026

iangmaia added the Tooling label Apr 2, 2026

iangmaia added this to the 26.8 milestone Apr 2, 2026

iangmaia requested review from mokagio and twstokes April 2, 2026 18:58

iangmaia enabled auto-merge (squash) April 2, 2026 19:01

mokagio reviewed Apr 8, 2026

View reviewed changes

Copilot started work on behalf of mokagio April 8, 2026 04:12 View session

auto-merge was automatically disabled April 8, 2026 04:13
Head branch was pushed to by a user without write access

Copilot AI changed the title ~~Update Claude build analysis prompt and model to Sonnet 4.6~~ Fix Bash script robustness and YAML scalar type in Claude build analysis Apr 8, 2026

Copilot finished work on behalf of mokagio April 8, 2026 04:14

Copilot AI requested a review from mokagio April 8, 2026 04:14

Copilot started work on behalf of mokagio April 8, 2026 04:18 View session

Copilot AI changed the title ~~Fix Bash script robustness and YAML scalar type in Claude build analysis~~ Fix Claude build analysis gate: correctly skip only when all failures are non-essential Apr 8, 2026

Copilot finished work on behalf of mokagio April 8, 2026 04:25

This was referenced Apr 8, 2026

Update Claude build analysis prompt and model to Sonnet 4.6 woocommerce/woocommerce-android#15605

Open

Update Claude build analysis prompt and model to Sonnet 4.6 woocommerce/woocommerce-ios#16884

Open

iangmaia and others added 7 commits April 8, 2026 17:41

Add Danger back to the prompt exclusion list

6238001

When real failures co-occur with a Danger failure, Claude should still ignore Danger rather than wasting analysis space on it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Address review feedback: fix shebang, use Bash array, use literal YAM…

9b69bc1

…L scalar Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/f2282ed4-20ab-42a2-b37b-6fac3bfa3514 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>

Fix upload-claude-analysis.sh to correctly handle essential+non-essen…

06a203a

…tial co-failures Agent-Logs-Url: https://github.com/wordpress-mobile/WordPress-Android/sessions/1f6e9e79-7580-4d15-97d2-c85bb5452559 Co-authored-by: mokagio <1218433+mokagio@users.noreply.github.com>

iangmaia force-pushed the iangmaia/update-claude-build-analysis branch from 6950c3a to 3e7fd18 Compare April 8, 2026 15:41

mokagio mentioned this pull request Apr 9, 2026

Test: Intentional build and test failures for LLM plugin #22779

Closed

2 tasks

mokagio enabled auto-merge (squash) April 9, 2026 07:36

mokagio approved these changes Apr 9, 2026

View reviewed changes

mokagio merged commit 167a2d9 into trunk Apr 9, 2026
21 checks passed

mokagio deleted the iangmaia/update-claude-build-analysis branch April 9, 2026 07:37

mokagio mentioned this pull request Apr 9, 2026

Update Claude build analysis prompt and model to Sonnet 4.6 wordpress-mobile/WordPress-iOS#25477

Merged

Conversation

iangmaia commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test Steps

Uh oh!

wpmobilebot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wpmobilebot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud bot commented Apr 2, 2026

Quality Gate passed

Uh oh!

Uh oh!

mokagio Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

mokagio Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

mokagio Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

iangmaia Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mokagio Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

mokagio commented Apr 9, 2026

Uh oh!

mokagio commented Apr 9, 2026

Uh oh!

Uh oh!

mokagio commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

iangmaia commented Apr 2, 2026 •

edited

Loading

wpmobilebot commented Apr 2, 2026 •

edited

Loading

wpmobilebot commented Apr 2, 2026 •

edited

Loading

iangmaia Apr 8, 2026 •

edited

Loading