From bc8b786b3ef7cd7c74ac2c5ec8e888ebb4639079 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 13:29:15 -0600 Subject: [PATCH 01/44] ci-analysis: replace canned recommendations with JSON summary + agent reasoning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Apply the Data vs. Reasoning Boundary pattern: - Script emits [CI_ANALYSIS_SUMMARY] JSON block with structured facts (totalFailedJobs, failedJobNames, knownIssues, prCorrelation, recommendationHint) - Removed 47-line if/elseif recommendation chain producing canned prose - Added 'Generating Recommendations' section to SKILL.md with decision table - Updated 'Presenting Results' to reference JSON summary flow - Agent now reasons over structured data instead of parroting script output Tested with Claude Sonnet 4 and GPT-5 against PR #124232 — both rated JSON completeness 4/5 and generated better recommendations than the old heuristic. --- .github/skills/ci-analysis/SKILL.md | 79 +++++++--- .../ci-analysis/scripts/Get-CIStatus.ps1 | 143 +++++++++++------- 2 files changed, 141 insertions(+), 81 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 13177698c00a33..28bd78e3946920 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -1,6 +1,6 @@ --- name: ci-analysis -description: Analyze CI build and test status from Azure DevOps and Helix for dotnet repository PRs. Use when checking CI status, investigating failures, determining if a PR is ready to merge, or given URLs containing dev.azure.com or helix.dot.net. Also use when asked "why is CI red", "test failures", "retry CI", "rerun tests", or "is CI green". +description: Analyze CI build and test status from Azure DevOps and Helix for dotnet repository PRs. Use when checking CI status, investigating failures, determining if a PR is ready to merge, or given URLs containing dev.azure.com or helix.dot.net. 
Also use when asked "why is CI red", "test failures", "retry CI", "rerun tests", "is CI green", "build failed", "checks failing", or "flaky tests". --- # Azure DevOps and Helix CI Analysis @@ -9,6 +9,8 @@ Analyze CI build status and test failures in Azure DevOps and Helix for dotnet r > 🚨 **NEVER** use `gh pr review --approve` or `--request-changes`. Only `--comment` is allowed. Approval and blocking are human-only actions. +**Workflow**: Run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice. + ## When to Use This Skill Use this skill when: @@ -81,11 +83,13 @@ The script operates in three distinct modes depending on what information you ha 6. Fetches console logs (with `-ShowLogs`) 7. Searches for known issues with "Known Build Error" label 8. Correlates failures with PR file changes -9. **Provides smart retry recommendations** +9. **Emits structured summary** — `[CI_ANALYSIS_SUMMARY]` JSON block with all key facts for the agent to reason over + +> **After the script runs**, you (the agent) generate recommendations. The script collects data; you synthesize the advice. See [Generating Recommendations](#generating-recommendations) below. ### Build ID Mode (`-BuildId`) 1. Fetches the build timeline directly (skips PR discovery) -2. Performs steps 3–7 and 9 from PR Analysis Mode, but does **not** fetch Build Analysis known issues or correlate failures with PR file changes (those require a PR number) +2. Performs steps 3–7 from PR Analysis Mode, but does **not** fetch Build Analysis known issues or correlate failures with PR file changes (those require a PR number). Still emits `[CI_ANALYSIS_SUMMARY]` JSON. ### Helix Job Mode (`-HelixJob` [and optional `-WorkItem`]) 1. 
With `-HelixJob` alone: enumerates work items for the job and summarizes their status @@ -116,16 +120,40 @@ The script operates in three distinct modes depending on what information you ha > ❌ **Missing packages on flow PRs are NOT always infrastructure failures.** When a codeflow or dependency-update PR fails with "package not found" or "version not available", don't assume it's a feed propagation delay. Flow PRs bring in behavioral changes from upstream repos that can cause the build to request *different* packages than before. Example: an SDK flow changed runtime pack resolution logic, causing builds to look for `Microsoft.NETCore.App.Runtime.browser-wasm` (CoreCLR — doesn't exist) instead of `Microsoft.NETCore.App.Runtime.Mono.browser-wasm` (what had always been used). The fix was in the flowed code, not in feed infrastructure. Always check *which* package is missing and *why* it's being requested before diagnosing as infrastructure. -## Retry Recommendations +## Generating Recommendations + +After the script outputs the `[CI_ANALYSIS_SUMMARY]` JSON block, **you** synthesize recommendations. Do not parrot the JSON — reason over it. + +### Decision logic + +Read `recommendationHint` as a starting point, then layer in context: + +| Hint | Action | +|------|--------| +| `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. | +| `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. | +| `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. | +| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking main branch, searching for issues, or retrying. | +| `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. 
| 

-The script provides a recommendation at the end:
+Then layer in nuance the heuristic can't capture:

-| Recommendation | Meaning |
-|----------------|---------|
-| **KNOWN ISSUES DETECTED** | Tracked issues found that may correlate with failures. Review details. |
-| **LIKELY PR-RELATED** | Failures correlate with PR changes. Fix issues first. |
-| **POSSIBLY TRANSIENT** | No clear cause - check main branch, search for issues. |
-| **REVIEW REQUIRED** | Could not auto-determine cause. Manual review needed. |
+- **Mixed signals**: Some failures match known issues AND some correlate with PR changes → separate them. Known issues = safe to retry; correlated = fix first.
+- **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs").
+- **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear.
+- **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count.
+- **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues" — say "Build Analysis not available in BuildId mode."
+- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on main. See the anti-patterns in "Interpreting Results" above.
+
+### How to Retry
+
+- **AzDO builds**: Comment `/azp run {pipeline-name}` on the PR (e.g., `/azp run dotnet-sdk-public`)
+- **All pipelines**: Comment `/azp run` to re-run all pipelines for the PR (it queues every pipeline, not just failing ones)
+- **Helix work items**: Cannot be individually retried — must re-run the entire AzDO build
+
+### Tone
+
+Be direct. Lead with the most important finding. Use 2-4 bullet points, not long paragraphs. 
## Analysis Workflow @@ -133,7 +161,8 @@ The script provides a recommendation at the end: 2. **Run the script** with `-ShowLogs` for detailed failure info 3. **Check Build Analysis** - Known issues are safe to retry 4. **Correlate with PR changes** - Same files failing = likely PR-related -5. **Interpret patterns** (but don't jump to conclusions): +5. **Compare with baseline** - If a test passes on main but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. +6. **Interpret patterns** (but don't jump to conclusions): - Same error across many jobs → Real code issue - Build Analysis flags a known issue → Safe to retry - Failure is **not** in Build Analysis → Investigate further before assuming transient @@ -142,19 +171,21 @@ The script provides a recommendation at the end: ## Presenting Results -The script provides a recommendation at the end, but this is based on heuristics and may be incomplete. Before presenting conclusions to the user: - -> ❌ **Don't blindly trust the script's recommendation.** The heuristic can misclassify failures. If the recommendation says "POSSIBLY TRANSIENT" but you see the same test failing 5 times on the same code path the PR touched — it's PR-related. +The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both: -1. Review the detailed failure information, not just the summary -2. Look for patterns the script may have missed (e.g., related failures across jobs) -3. Consider the PR context (what files changed, what the PR is trying to do) -4. Present findings with appropriate caveats - state what is known vs. uncertain -5. If the script's recommendation seems inconsistent with the details, trust the details +1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint) +2. 
Read the human-readable output for failure details, console logs, and error messages +3. Reason over both to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer +4. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds) +5. Consider the PR context (what files changed, what the PR is trying to do) +6. Present findings with appropriate caveats — state what is known vs. uncertain ## References - **Helix artifacts & binlogs**: See [references/helix-artifacts.md](references/helix-artifacts.md) +- **Binlog comparison (passing vs failing)**: See [references/binlog-comparison.md](references/binlog-comparison.md) +- **Subagent delegation patterns**: See [references/delegation-patterns.md](references/delegation-patterns.md) +- **Azure CLI deep investigation**: See [references/azure-cli.md](references/azure-cli.md) - **Manual investigation steps**: See [references/manual-investigation.md](references/manual-investigation.md) - **AzDO/Helix details**: See [references/azdo-helix-reference.md](references/azdo-helix-reference.md) @@ -164,9 +195,9 @@ Canceled jobs (typically from timeouts) often still have useful artifacts. The H **To investigate canceled jobs:** -1. **Download build artifacts**: Use the AzDO artifacts API to get `Logs_Build_*` pipeline artifacts for the canceled job. These contain binlogs even for canceled jobs. -2. **Extract Helix job IDs**: Use the MSBuild MCP server to load the `SendToHelix.binlog` and search for `"Sent Helix Job"` messages. Each contains a Helix job ID. -3. **Query Helix directly**: For each job ID, query `https://helix.dot.net/api/2019-06-17/jobs/{jobId}/workitems` to get actual pass/fail results. +1. **Download build artifacts**: Use `az pipelines runs artifact download` (see [references/azure-cli.md](references/azure-cli.md)) to get pipeline artifacts for the canceled job. These contain binlogs even for canceled jobs. 
+2. **Extract Helix job IDs**: Use the MSBuild MCP server to load the `SendToHelix.binlog` and search for `"Sent Helix Job"` messages. Each contains a Helix job ID. See [references/binlog-comparison.md](references/binlog-comparison.md) for the full "binlogs to find binlogs" workflow. +3. **Query Helix directly**: For each job ID, use the CI script: `./scripts/Get-CIStatus.ps1 -HelixJob "{GUID}" -FindBinlogs` **Example**: A `browser-wasm windows WasmBuildTests` job was canceled after 3 hours. The binlog (truncated) still contained 12 Helix job IDs. Querying them revealed all 226 work items passed — the "failure" was purely a timeout in the AzDO wrapper. @@ -278,5 +309,5 @@ This is especially useful when: 4. Use `-SearchMihuBot` for semantic search of related issues 5. Binlogs in artifacts help diagnose MSB4018 task failures 6. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties -7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** — `state` contains `SUCCESS`/`FAILURE` directly +7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly 8. 
When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index e386604be3fa5e..c4c053a89af93c 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1757,6 +1757,9 @@ try { $totalFailedJobs = 0 $totalLocalFailures = 0 $allFailuresForCorrelation = @() + $allFailedJobNames = @() + $allCanceledJobNames = @() + $lastBuildJobSummary = $null foreach ($currentBuildId in $buildIds) { Write-Host "`n=== Azure DevOps Build $currentBuildId ===" -ForegroundColor Yellow @@ -1800,7 +1803,37 @@ try { # Also check for local test failures (non-Helix) $localTestFailures = Get-LocalTestFailures -Timeline $timeline -BuildId $currentBuildId - if ((-not $failedJobs -or $failedJobs.Count -eq 0) -and $localTestFailures.Count -eq 0) { + # Accumulate totals and compute job summary BEFORE any continue branches + $totalFailedJobs += $failedJobs.Count + $totalLocalFailures += $localTestFailures.Count + $allFailedJobNames += @($failedJobs | ForEach-Object { $_.name }) + $allCanceledJobNames += @($canceledJobs | ForEach-Object { $_.name }) + + $allJobs = @() + $succeededJobs = 0 + $pendingJobs = 0 + $canceledJobCount = 0 + $skippedJobs = 0 + $warningJobs = 0 + if ($timeline -and $timeline.records) { + $allJobs = @($timeline.records | Where-Object { $_.type -eq "Job" }) + $succeededJobs = @($allJobs | Where-Object { $_.result -eq "succeeded" }).Count + $warningJobs = @($allJobs | Where-Object { $_.result -eq "succeededWithIssues" }).Count + $pendingJobs = @($allJobs | Where-Object { -not $_.result -or $_.state -eq "pending" -or $_.state -eq "inProgress" }).Count + $canceledJobCount = @($allJobs | Where-Object { $_.result -eq "canceled" }).Count + $skippedJobs = @($allJobs | Where-Object { $_.result -eq "skipped" }).Count + } + 
$lastBuildJobSummary = [ordered]@{ + total = $allJobs.Count + succeeded = $succeededJobs + failed = if ($failedJobs) { $failedJobs.Count } else { 0 } + canceled = $canceledJobCount + pending = $pendingJobs + warnings = $warningJobs + skipped = $skippedJobs + } + + if((-not $failedJobs -or $failedJobs.Count -eq 0) -and $localTestFailures.Count -eq 0) { if ($buildStatus -and $buildStatus.Status -eq "inProgress") { Write-Host "`nNo failures yet - build still in progress" -ForegroundColor Cyan Write-Host "Run again later to check for failures, or use -NoCache to get fresh data" -ForegroundColor Gray @@ -1885,7 +1918,6 @@ try { Write-Host "`n=== Summary ===" -ForegroundColor Yellow Write-Host "Local test failures: $($localTestFailures.Count)" -ForegroundColor Red Write-Host "Build URL: https://dev.azure.com/$Organization/$Project/_build/results?buildId=$currentBuildId" -ForegroundColor Cyan - $totalLocalFailures += $localTestFailures.Count continue } @@ -2055,25 +2087,6 @@ try { } } - $totalFailedJobs += $failedJobs.Count - $totalLocalFailures += $localTestFailures.Count - - # Compute job summary from timeline - $allJobs = @() - $succeededJobs = 0 - $pendingJobs = 0 - $canceledJobCount = 0 - $skippedJobs = 0 - $warningJobs = 0 - if ($timeline -and $timeline.records) { - $allJobs = @($timeline.records | Where-Object { $_.type -eq "Job" }) - $succeededJobs = @($allJobs | Where-Object { $_.result -eq "succeeded" }).Count - $warningJobs = @($allJobs | Where-Object { $_.result -eq "succeededWithIssues" }).Count - $pendingJobs = @($allJobs | Where-Object { -not $_.result -or $_.state -eq "pending" -or $_.state -eq "inProgress" }).Count - $canceledJobCount = @($allJobs | Where-Object { $_.result -eq "canceled" }).Count - $skippedJobs = @($allJobs | Where-Object { $_.result -eq "skipped" }).Count - } - Write-Host "`n=== Build $currentBuildId Summary ===" -ForegroundColor Yellow if ($allJobs.Count -gt 0) { $parts = @() @@ -2121,54 +2134,70 @@ if ($buildIds.Count -gt 1) { } } -# 
Smart retry recommendation -Write-Host "`n=== Recommendation ===" -ForegroundColor Magenta - -if ($knownIssuesFromBuildAnalysis.Count -gt 0) { - $knownIssueCount = $knownIssuesFromBuildAnalysis.Count - Write-Host "KNOWN ISSUES DETECTED" -ForegroundColor Yellow - Write-Host "$knownIssueCount tracked issue(s) found that may correlate with failures above." -ForegroundColor White - Write-Host "Review the failure details and linked issues to determine if retry is needed." -ForegroundColor Gray +# Build structured summary and emit as JSON +$summary = [ordered]@{ + mode = $PSCmdlet.ParameterSetName + repository = $Repository + prNumber = if ($PSCmdlet.ParameterSetName -eq 'PRNumber') { $PRNumber } else { $null } + builds = @($buildIds | ForEach-Object { + [ordered]@{ + buildId = $_ + url = "https://dev.azure.com/$Organization/$Project/_build/results?buildId=$_" + } + }) + totalFailedJobs = $totalFailedJobs + totalLocalFailures = $totalLocalFailures + lastBuildJobSummary = if ($lastBuildJobSummary) { $lastBuildJobSummary } else { [ordered]@{ + total = 0; succeeded = 0; failed = 0; canceled = 0; pending = 0; warnings = 0; skipped = 0 + } } + failedJobNames = @($allFailedJobNames) + canceledJobNames = @($allCanceledJobNames) + knownIssues = @($knownIssuesFromBuildAnalysis | ForEach-Object { + [ordered]@{ number = $_.Number; title = $_.Title; url = $_.Url } + }) + prCorrelation = [ordered]@{ + changedFileCount = $prChangedFiles.Count + hasCorrelation = $false + correlatedFiles = @() + } + recommendationHint = "" } -elseif ($totalFailedJobs -eq 0 -and $totalLocalFailures -eq 0) { - Write-Host "BUILD SUCCESSFUL" -ForegroundColor Green - Write-Host "No failures detected." 
-ForegroundColor White -} -elseif ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { - # Check if failures correlate with PR changes - $hasCorrelation = $false + +# Compute PR correlation +if ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { + $correlatedFiles = @() foreach ($failure in $allFailuresForCorrelation) { $failureText = ($failure.Errors + $failure.HelixLogs + $failure.FailedTests) -join " " foreach ($file in $prChangedFiles) { $fileName = [System.IO.Path]::GetFileNameWithoutExtension($file) if ($failureText -match [regex]::Escape($fileName)) { - $hasCorrelation = $true - break + $correlatedFiles += $file } } - if ($hasCorrelation) { break } - } - - if ($hasCorrelation) { - Write-Host "LIKELY PR-RELATED" -ForegroundColor Red - Write-Host "Failures appear to correlate with files changed in this PR." -ForegroundColor White - Write-Host "Review the 'PR Change Correlation' section above and fix the issues before retrying." -ForegroundColor Gray - } - else { - Write-Host "POSSIBLY TRANSIENT" -ForegroundColor Yellow - Write-Host "No known issues matched, but failures don't clearly correlate with PR changes." -ForegroundColor White - Write-Host "Consider:" -ForegroundColor Gray - Write-Host " 1. Check if same tests are failing on main branch" -ForegroundColor Gray - Write-Host " 2. Search for existing issues: gh issue list --label 'Known Build Error' --search ''" -ForegroundColor Gray - Write-Host " 3. If infrastructure-related (device not found, network errors), retry may help" -ForegroundColor Gray } + $correlatedFiles = @($correlatedFiles | Select-Object -Unique) + $summary.prCorrelation.hasCorrelation = $correlatedFiles.Count -gt 0 + $summary.prCorrelation.correlatedFiles = $correlatedFiles } -else { - Write-Host "REVIEW REQUIRED" -ForegroundColor Yellow - Write-Host "Could not automatically determine failure cause." 
-ForegroundColor White - Write-Host "Review the failures above to determine if they are PR-related or infrastructure issues." -ForegroundColor Gray + +# Compute recommendation hint +if ($knownIssuesFromBuildAnalysis.Count -gt 0) { + $summary.recommendationHint = "KNOWN_ISSUES_DETECTED" +} elseif ($totalFailedJobs -eq 0 -and $totalLocalFailures -eq 0) { + $summary.recommendationHint = "BUILD_SUCCESSFUL" +} elseif ($summary.prCorrelation.hasCorrelation) { + $summary.recommendationHint = "LIKELY_PR_RELATED" +} elseif ($prChangedFiles.Count -gt 0) { + $summary.recommendationHint = "POSSIBLY_TRANSIENT" +} else { + $summary.recommendationHint = "REVIEW_REQUIRED" } +Write-Host "" +Write-Host "[CI_ANALYSIS_SUMMARY]" +Write-Host ($summary | ConvertTo-Json -Depth 5) +Write-Host "[/CI_ANALYSIS_SUMMARY]" + } catch { Write-Error "Error: $_" From a3934944f2533108a660ae6aab61c2058b3dc485 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 13:49:23 -0600 Subject: [PATCH 02/44] ci-analysis: add Step 0 context gathering, structured output, verify-before-claiming - Add Step 0: Gather Context section with PR type classification table (code, flow, backport, merge, dependency update) that determines interpretation framework - Add Step 3: Verify before claiming - systematic checklist before labeling failures as infrastructure/transient/PR-related - Add structured output format (summary verdict, failure details, recommended actions) - Replace 'main branch' with 'target branch' throughout - backports and release-branch PRs need comparison against their actual base, not main - Remove redundant tip (covered by Step 0) --- .github/skills/ci-analysis/SKILL.md | 94 ++++++++++++++++++++++------- 1 file changed, 71 insertions(+), 23 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 28bd78e3946920..96a607ff542584 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -9,7 +9,7 @@ Analyze CI 
build status and test failures in Azure DevOps and Helix for dotnet r > 🚨 **NEVER** use `gh pr review --approve` or `--request-changes`. Only `--comment` is allowed. Approval and blocking are human-only actions. -**Workflow**: Run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice. +**Workflow**: Gather PR context (Step 0) → run the script → read the human-readable output + `[CI_ANALYSIS_SUMMARY]` JSON → synthesize recommendations yourself. The script collects data; you generate the advice. ## When to Use This Skill @@ -114,7 +114,7 @@ The script operates in three distinct modes depending on what information you ha **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. These can also match known issues - search for the test name with the "Known Build Error" label. -> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on main branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery. +> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on the target branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery. 
> ❌ **Don't confuse "environment-related" with "infrastructure."** A test that fails because a required framework isn't installed (e.g., .NET 2.2) is a **test defect** — the test has wrong assumptions about what's available. Infrastructure failures are *transient*: network timeouts, Docker pull failures, agent crashes, disk space. If the failure would reproduce 100% of the time on any machine with the same setup, it's a code/test issue, not infra. The word "environment" in the error doesn't make it an infrastructure problem. @@ -133,7 +133,7 @@ Read `recommendationHint` as a starting point, then layer in context: | `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. | | `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. | | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. | -| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking main branch, searching for issues, or retrying. | +| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking the target branch, searching for issues, or retrying. | | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. | Then layer in nuance the heuristic can't capture: @@ -143,7 +143,7 @@ Then layer in nuance the heuristic can't capture: - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear. - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count. - **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues" — say "Build Analysis not available in BuildId mode." -- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on main. 
See the anti-patterns in "Interpreting Results" above. +- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on the target branch. See the anti-patterns in "Interpreting Results" above. ### How to Retry @@ -157,27 +157,76 @@ Be direct. Lead with the most important finding. Use 2-4 bullet points, not long ## Analysis Workflow -1. **Read PR context first** - Check title, description, comments -2. **Run the script** with `-ShowLogs` for detailed failure info -3. **Check Build Analysis** - Known issues are safe to retry -4. **Correlate with PR changes** - Same files failing = likely PR-related -5. **Compare with baseline** - If a test passes on main but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. -6. **Interpret patterns** (but don't jump to conclusions): +### Step 0: Gather Context (before running anything) + +Before running the script, read the PR to understand what you're analyzing. Context changes how you interpret every failure. + +1. **Read PR metadata** — title, description, author, labels, linked issues +2. 
**Classify the PR type** — this determines your interpretation framework: + +| PR Type | How to detect | Interpretation shift | +|---------|--------------|---------------------| +| **Code PR** | Human author, code changes | Failures likely relate to the changes | +| **Flow/Codeflow PR** | Author is `dotnet-maestro[bot]`, title mentions "Update dependencies" | Missing packages may be behavioral, not infrastructure (see anti-pattern below) | +| **Backport** | Title mentions "backport", targets a release branch | Failures may be branch-specific; check if test exists on target branch | +| **Merge PR** | Merging between branches (e.g., release → main) | Conflicts and merge artifacts cause failures, not the individual changes | +| **Dependency update** | Bumps package versions, global.json changes | Build failures often trace to the dependency, not the PR's own code | + +3. **Check existing comments** — has someone already diagnosed the failures? Is there a retry pending? +4. **Note the changed files** — you'll use these to evaluate correlation after the script runs + +> ❌ **Don't skip Step 0.** Running the script without PR context leads to misdiagnosis — especially for flow PRs where "package not found" looks like infrastructure but is actually a code issue. + +### Step 1: Run the script + +Run with `-ShowLogs` for detailed failure info. + +### Step 2: Analyze results + +1. **Check Build Analysis** — Known issues are safe to retry +2. **Correlate with PR changes** — Same files failing = likely PR-related +3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. +4. 
**Interpret patterns** (but don't jump to conclusions): - Same error across many jobs → Real code issue - Build Analysis flags a known issue → Safe to retry - Failure is **not** in Build Analysis → Investigate further before assuming transient - - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against main branch first + - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first - Test timeout but tests passed → Executor issue, not test failure +### Step 3: Verify before claiming + +Before stating a failure's cause, verify your claim: + +- **"Infrastructure failure"** → Did Build Analysis flag it? Does the same test pass on the target branch? If neither, don't call it infrastructure. +- **"Transient/flaky"** → Has it failed before? Is there a known issue? A single non-reproducing failure isn't enough to call it flaky. +- **"PR-related"** → Do the changed files actually relate to the failing test? Correlation in the script output is heuristic, not proof. +- **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some? +- **"Not related to this PR"** → Have you checked if the test passes on the target branch? Don't assume — verify. + ## Presenting Results -The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both: +The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. Use both to produce a structured response. + +### Output structure + +Use this format — adapt sections based on what you find: + +**1. Summary verdict** (1-2 sentences) +Lead with the most important finding. Is CI green? Are failures PR-related? Known issues? + +**2. Failure details** (2-4 bullets) +For each distinct failure category, state: what failed, why (known/correlated/unknown), and evidence. + +**3. 
Recommended actions** (numbered list) +Specific next steps: retry, fix specific files, investigate further. Include `/azp run` commands if retrying. + +### How to synthesize 1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint) 2. Read the human-readable output for failure details, console logs, and error messages -3. Reason over both to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer -4. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds) -5. Consider the PR context (what files changed, what the PR is trying to do) +3. Layer in Step 0 context — PR type, author intent, changed files +4. Reason over all three to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer +5. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds) 6. Present findings with appropriate caveats — state what is known vs. uncertain ## References @@ -303,11 +352,10 @@ This is especially useful when: ## Tips -1. Read PR description and comments first for context -2. Check if same test fails on main branch before assuming transient -3. Look for `[ActiveIssue]` attributes for known skipped tests -4. Use `-SearchMihuBot` for semantic search of related issues -5. Binlogs in artifacts help diagnose MSB4018 task failures -6. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties -7. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly -8. 
When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls +1. Check if same test fails on the target branch before assuming transient +2. Look for `[ActiveIssue]` attributes for known skipped tests +3. Use `-SearchMihuBot` for semantic search of related issues +4. Binlogs in artifacts help diagnose MSB4018 task failures +5. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties +6. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly +7. When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls From c3cbad94f503c7f562492b50b1487e0801c2608e Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 14:10:46 -0600 Subject: [PATCH 03/44] Address review: add missing reference files, fix BuildId mode wording, fix base/target terminology --- .github/skills/ci-analysis/SKILL.md | 2 +- .../ci-analysis/references/azure-cli.md | 93 +++++++++++ .../references/binlog-comparison.md | 144 ++++++++++++++++++ .../references/delegation-patterns.md | 101 ++++++++++++ 4 files changed, 339 insertions(+), 1 deletion(-) create mode 100644 .github/skills/ci-analysis/references/azure-cli.md create mode 100644 .github/skills/ci-analysis/references/binlog-comparison.md create mode 100644 .github/skills/ci-analysis/references/delegation-patterns.md diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 96a607ff542584..b0c75e4a409755 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -142,7 +142,7 @@ Then layer in nuance the heuristic can't capture: - **Canceled jobs with recoverable results**: If 
`canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs"). - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear. - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count. -- **BuildId mode**: `knownIssues` and `prCorrelation` will be empty (those require a PR number). Don't say "no known issues" — say "Build Analysis not available in BuildId mode." +- **BuildId mode**: `knownIssues` will be empty and `prCorrelation` will show `hasCorrelation = false` with `changedFileCount = 0` (PR correlation is not available without a PR number). Don't say "no known issues" or "no correlation" — say "Build Analysis and PR correlation not available in BuildId mode." - **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on the target branch. See the anti-patterns in "Interpreting Results" above. ### How to Retry diff --git a/.github/skills/ci-analysis/references/azure-cli.md b/.github/skills/ci-analysis/references/azure-cli.md new file mode 100644 index 00000000000000..80c1c7e7880c2a --- /dev/null +++ b/.github/skills/ci-analysis/references/azure-cli.md @@ -0,0 +1,93 @@ +# Deep Investigation with Azure CLI + +When the CI script and GitHub APIs aren't enough (e.g., investigating internal pipeline definitions or downloading build artifacts), use the Azure CLI with the `azure-devops` extension. + +> 💡 **Prefer `az pipelines` / `az devops` commands over raw REST API calls.** The CLI handles authentication, pagination, and JSON output formatting. Only fall back to manual `Invoke-RestMethod` calls when the CLI doesn't expose the endpoint you need (e.g., build timelines). The CLI's `--query` (JMESPath) and `-o table` flags are powerful for filtering without extra scripting. 
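
To see what a `--query` projection will select before running it against a live organization, it can help to reason about the JMESPath semantics directly. The sketch below mirrors the filter `[?result=='failed'].{id:id, name:definition.name}` in plain Python; the sample records are invented and only loosely shaped like `az pipelines runs list -o json` output, so treat the field names as illustrative assumptions rather than the full schema.

```python
# Records shaped loosely like `az pipelines runs list -o json` output.
# Field names here are illustrative assumptions, not the full schema.
runs = [
    {"id": 1276327, "result": "failed", "definition": {"name": "runtime"}},
    {"id": 1276328, "result": "succeeded", "definition": {"name": "runtime"}},
]

# Plain-Python equivalent of the JMESPath query:
#   --query "[?result=='failed'].{id:id, name:definition.name}"
failed = [{"id": r["id"], "name": r["definition"]["name"]}
          for r in runs if r["result"] == "failed"]
print(failed)  # [{'id': 1276327, 'name': 'runtime'}]
```

JMESPath's `[?...]` is a filter projection over the top-level array, and the `.{...}` multiselect hash reshapes each surviving element, which is exactly what the comprehension does.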
+ +## Checking Authentication + +Before making AzDO API calls, verify the CLI is installed and authenticated: + +```powershell +# Ensure az is on PATH (Windows may need a refresh after install) +$env:Path = [System.Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path", "User") + +# Check if az CLI is available +az --version 2>$null | Select-Object -First 1 + +# Check if logged in and get current account +az account show --query "{name:name, user:user.name}" -o table 2>$null + +# If not logged in, prompt the user to authenticate: +# az login # Interactive browser login +# az login --use-device-code # Device code flow (for remote/headless) + +# Get an AAD access token for AzDO REST API calls (only needed for raw REST) +$accessToken = (az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798 --query accessToken -o tsv) +$headers = @{ "Authorization" = "Bearer $accessToken" } +``` + +> ⚠️ If `az` is not installed, use `winget install -e --id Microsoft.AzureCLI` (Windows). The `azure-devops` extension is also required — install or verify it with `az extension add --name azure-devops` (safe to run if already installed). Ask the user to authenticate if needed. + +> ⚠️ **Do NOT use `az devops configure --defaults`** — it sets user-wide defaults that may not match the organization/project needed for dotnet repositories. Always pass `--org` and `--project` (or `-p`) explicitly on each command. + +## Querying Pipeline Definitions and Builds + +```powershell +$org = "https://dev.azure.com/dnceng" +$project = "internal" + +# Find a pipeline definition by name +az pipelines list --name "dotnet-unified-build" --org $org -p $project --query "[].{id:id, name:name, path:path}" -o table + +# Get pipeline definition details (shows YAML path, triggers, etc.) 
+az pipelines show --id 1330 --org $org -p $project --query "{id:id, name:name, yamlPath:process.yamlFilename, repo:repository.name}" -o table + +# List recent builds for a pipeline (with filtering) +az pipelines runs list --pipeline-ids 1330 --branch "refs/heads/main" --top 5 --org $org -p $project --query "[].{id:id, result:result, finish:finishTime}" -o table + +# Get a specific build's details +az pipelines runs show --id $buildId --org $org -p $project --query "{id:id, result:result, sourceBranch:sourceBranch}" -o table + +# List build artifacts +az pipelines runs artifact list --run-id $buildId --org $org -p $project --query "[].{name:name, type:resource.type}" -o table + +# Download a build artifact +az pipelines runs artifact download --run-id $buildId --artifact-name "TestBuild_linux_x64" --path "$env:TEMP\artifact" --org $org -p $project +``` + +## REST API Fallback + +Fall back to REST API only when the CLI doesn't expose what you need: + +```powershell +# Get build timeline (stages, jobs, tasks with results and durations) — no CLI equivalent +$accessToken = (az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798 --query accessToken -o tsv) +$headers = @{ "Authorization" = "Bearer $accessToken" } +$timelineUrl = "https://dev.azure.com/dnceng/internal/_apis/build/builds/$buildId/timeline?api-version=7.1" +$timeline = (Invoke-RestMethod -Uri $timelineUrl -Headers $headers) +$timeline.records | Where-Object { $_.result -eq "failed" -and $_.type -eq "Job" } +``` + +## Examining Pipeline YAML + +All dotnet repos that use arcade put their pipeline definitions under `eng/pipelines/`. 
Use `az pipelines show` to find the YAML file path, then fetch it: + +```powershell +# Find the YAML path for a pipeline +az pipelines show --id 1330 --org $org -p $project --query "{yamlPath:process.yamlFilename, repo:repository.name}" -o table + +# Fetch the YAML from the repo (example: dotnet/runtime's runtime-official pipeline) +# github-mcp-server-get_file_contents owner:dotnet repo:runtime path:eng/pipelines/runtime-official.yml + +# For VMR unified builds, the YAML is in dotnet/dotnet: +# github-mcp-server-get_file_contents owner:dotnet repo:dotnet path:eng/pipelines/unified-build.yml + +# Templates are usually in eng/pipelines/common/ or eng/pipelines/templates/ +``` + +This is especially useful when: +- A job name doesn't clearly indicate what it builds +- You need to understand stage dependencies (why a job was canceled) +- You want to find which template defines a specific step +- Investigating whether a pipeline change caused new failures diff --git a/.github/skills/ci-analysis/references/binlog-comparison.md b/.github/skills/ci-analysis/references/binlog-comparison.md new file mode 100644 index 00000000000000..f179b2ff7564ad --- /dev/null +++ b/.github/skills/ci-analysis/references/binlog-comparison.md @@ -0,0 +1,144 @@ +# Deep Investigation: Binlog Comparison + +When a test **passes on main but fails on a PR**, comparing MSBuild binlogs from both runs reveals the exact difference in task parameters without guessing. + +## When to Use This Pattern + +- Test assertion compares "expected vs actual" build outputs (e.g., CSC args, reference lists) +- A build succeeds on one branch but fails on another with different MSBuild behavior +- You need to find which MSBuild property/item change caused a specific task to behave differently + +## The Pattern: Delegate to Subagents + +> ⚠️ **Do NOT download, load, and parse binlogs in the main conversation context.** This burns 10+ turns on mechanical work. Delegate to subagents instead. 
+ +### Step 1: Identify the two work items to compare + +Use `Get-CIStatus.ps1` to find the failing Helix job + work item, then find a corresponding passing build (recent PR merged to main, or a main CI run). + +**Finding Helix job IDs from build artifacts (binlogs to find binlogs):** +When the failing work item's Helix job ID isn't visible (e.g., canceled jobs, or finding a matching job from a passing build), the IDs are inside the build's `SendToHelix.binlog`: + +1. Download the build artifact with `az`: + ``` + az pipelines runs artifact list --run-id $buildId --org "https://dev.azure.com/dnceng-public" -p public --query "[].name" -o tsv + az pipelines runs artifact download --run-id $buildId --artifact-name "TestBuild_linux_x64" --path "$env:TEMP\artifact" --org "https://dev.azure.com/dnceng-public" -p public + ``` +2. Load the binlog and search for job IDs: + ``` + binlog-load_binlog path:"$env:TEMP\artifact\...\SendToHelix.binlog" + binlog-search_binlog binlog_file:"..." query:"Sent Helix Job" + ``` +3. Query each Helix job GUID with the CI script: + ``` + ./scripts/Get-CIStatus.ps1 -HelixJob "{GUID}" -FindBinlogs + ``` + +**For Helix work item binlogs (the common case):** +The CI script shows binlog URLs directly when you query a specific work item: +``` +./scripts/Get-CIStatus.ps1 -HelixJob "{JOB_ID}" -WorkItem "{WORK_ITEM}" +# Output includes: 🔬 msbuild.binlog: https://helix...blob.core.windows.net/... +``` + +### Step 2: Dispatch parallel subagents for extraction + +Launch two `task` subagents (can run in parallel), each with a prompt like: + +``` +Download the msbuild.binlog from Helix job {JOB_ID} work item {WORK_ITEM}. +Use the CI skill script to get the artifact URL: + C:\Users\lewing\.copilot\skills\ci-analysis\scripts\Get-CIStatus.ps1 -HelixJob "{JOB_ID}" -WorkItem "{WORK_ITEM}" +Download the binlog URL to $env:TEMP\{label}.binlog. +Load it with the binlog MCP server (binlog-load_binlog). 
+Search for the {TASK_NAME} task (binlog-search_tasks_by_name). +Get full task details (binlog-list_tasks_in_target) for the target containing the task. +Extract the CommandLineArguments parameter value. +Normalize paths: + - Replace Helix work dirs (/datadisks/disk1/work/XXXXXXXX) with {W} + - Replace runfile hashes (Program-[a-f0-9]+) with Program-{H} + - Replace temp dir names (dotnetSdkTests.[a-zA-Z0-9]+) with dotnetSdkTests.{T} +Parse into individual args using regex: (?:"[^"]+"|/[^\s]+|[^\s]+) +Sort the list and return it. +Report the total arg count prominently. +``` + +**Important:** When diffing, look for **extra or missing args** (different count), not value differences in existing args. A Debug/Release difference in `/define:` is expected noise — an extra `/analyzerconfig:` or `/reference:` arg is the real signal. + +### Step 3: Diff the results + +With two normalized arg lists, `Compare-Object` instantly reveals the difference. + +## Useful Binlog MCP Queries + +After loading a binlog with `binlog-load_binlog`, use these queries (pass the loaded path as `binlog_file`): + +``` +# Find all invocations of a specific task +binlog-search_tasks_by_name binlog_file:"$env:TEMP\my.binlog" taskName:"Csc" + +# Search for a property value +binlog-search_binlog binlog_file:"..." query:"analysislevel" + +# Find what happened inside a specific target +binlog-search_binlog binlog_file:"..." query:"under($target AddGlobalAnalyzerConfigForPackage_MicrosoftCodeAnalysisNetAnalyzers)" + +# Get all properties matching a pattern +binlog-search_binlog binlog_file:"..." query:"GlobalAnalyzerConfig" + +# List tasks in a target (returns full parameter details including CommandLineArguments) +binlog-list_tasks_in_target binlog_file:"..." projectId:22 targetId:167 +``` + +## Path Normalization + +Helix work items run on different machines with different paths. 
Normalize before comparing:
+
+| Pattern | Replacement | Description |
+|---------|-------------|---------|
+| `/datadisks/disk1/work/[A-F0-9]{8}` | `{W}` | Helix work directory (Linux) |
+| `C:\h\w\[A-F0-9]{8}` | `{W}` | Helix work directory (Windows) |
+| `Program-[a-f0-9]{64}` | `Program-{H}` | Runfile content hash |
+| `dotnetSdkTests\.[a-zA-Z0-9]+` | `dotnetSdkTests.{T}` | Temp test directory |
+
+### After normalizing paths, focus on structural differences
+
+> ⚠️ **Ignore value-only differences in existing args** (e.g., Debug vs Release in `/define:`, different hash paths). These are expected configuration differences. Focus on **extra or missing args** — a different arg count indicates a real build behavior change.
+
+## Example: CscArguments Investigation
+
+A merge PR (release/10.0.3xx → main) had 208 CSC args vs 207 on main. The diff:
+
+```
+FAIL-ONLY: /analyzerconfig:{W}/p/d/sdk/11.0.100-ci/Sdks/Microsoft.NET.Sdk/analyzers/build/config/analysislevel_11_default.globalconfig
+```
+
+### What the binlog properties showed
+
+Both builds had identical property resolution:
+- `EffectiveAnalysisLevel = 11.0`
+- `_GlobalAnalyzerConfigFileName = analysislevel_11_default.globalconfig`
+- `_GlobalAnalyzerConfigFile = .../config/analysislevel_11_default.globalconfig`
+
+### The actual root cause
+
+The `AddGlobalAnalyzerConfigForPackage` target has an `Exists()` condition:
+```xml
+
+
+
+```
+
+The merge's SDK layout **shipped** `analysislevel_11_default.globalconfig` on disk (from a newer roslyn-analyzers that flowed from 10.0.3xx), while main's SDK didn't have that file yet. Same property values, different files on disk = different build behavior.
+
+### Lesson learned
+
+Same MSBuild property resolution + different files on disk = different build behavior. Always check what's actually in the SDK layout, not just what the targets compute.
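
The normalization rules in the table and the "structural differences" guidance above can be sketched in a few lines. This is an illustrative Python sketch (the actual subagents work in PowerShell); the sample args are invented, and the regexes mirror the table.

```python
import re

# Normalization rules from the path table: work dirs, content hashes, temp dirs.
RULES = [
    (r"/datadisks/disk1/work/[A-F0-9]{8}", "{W}"),
    (r"C:\\h\\w\\[A-F0-9]{8}", "{W}"),
    (r"Program-[a-f0-9]{64}", "Program-{H}"),
    (r"dotnetSdkTests\.[a-zA-Z0-9]+", "dotnetSdkTests.{T}"),
]

def normalize(arg: str) -> str:
    for pattern, repl in RULES:
        arg = re.sub(pattern, repl, arg)
    return arg

# Invented sample args: the failing run has one extra /analyzerconfig: arg.
passing = ["/debug+", "/reference:/datadisks/disk1/work/AB12CD34/lib.dll"]
failing = ["/debug+", "/reference:/datadisks/disk1/work/99FF00AA/lib.dll",
           "/analyzerconfig:/datadisks/disk1/work/99FF00AA/analysislevel_11_default.globalconfig"]

a = {normalize(x) for x in passing}
b = {normalize(x) for x in failing}
# Structural diff: extra/missing args are the signal; value-only
# differences (machine-specific paths) normalize away to {W}.
print(sorted(b - a))  # ['/analyzerconfig:{W}/analysislevel_11_default.globalconfig']
```

Note how the `/reference:` args differ only in work-directory path between the two runs and cancel out after normalization, leaving the genuinely extra arg as the sole diff.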
+ +## Anti-Patterns + +> ❌ **Don't manually split/parse CSC command lines in the main conversation.** CSC args have quoted paths, spaces, and complex structure. Regex parsing in PowerShell is fragile and burns turns on trial-and-error. Use a subagent. + +> ❌ **Don't assume the MSBuild property diff explains the behavior diff.** Two branches can compute identical property values but produce different outputs because of different files on disk, different NuGet packages, or different task assemblies. Compare the actual task invocation. + +> ❌ **Don't load large binlogs and browse them interactively in main context.** Use targeted searches: `binlog-search_tasks_by_name` for a specific task, `binlog-search_binlog` with a focused query. Get in, get the data, get out. diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md new file mode 100644 index 00000000000000..41c71c2d2fc822 --- /dev/null +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -0,0 +1,101 @@ +# Subagent Delegation Patterns + +CI investigations often involve repetitive, mechanical work that burns main conversation context. Delegate these to subagents. + +## Pattern 1: Scanning Multiple Console Logs + +**When:** Multiple failing work items across several jobs — need to extract and deduplicate test failure names. + +**Problem:** Each work item's console log can be thousands of lines. Reading 5+ logs in main context burns most of your context budget on raw output. 
+ +**Delegate:** +``` +Fetch Helix console logs for these work items and extract all unique test failures: + +Job: {JOB_ID_1} + Work items: dotnet.Tests.dll.19, dotnet.Tests.dll.23 + +Job: {JOB_ID_2} + Work items: dotnet.Tests.dll.19 + +For each, use: + C:\Users\lewing\.copilot\skills\ci-analysis\scripts\Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}" + +From the console output, extract lines matching xUnit failure format: + [xUnit.net HH:MM:SS.ss] TestNamespace.TestClass.TestMethod [FAIL] + +IMPORTANT: Lines with [OUTPUT] or [PASS] are NOT failures. +Only lines ending with [FAIL] indicate actual test failures. + +Deduplicate across all work items. +Return: unique FAIL test names + which work items they appeared in. +``` + +**Result:** A clean list of unique failures instead of pages of raw logs. + +## Pattern 2: Finding a Baseline Build + +**When:** A test fails on a PR — need to confirm it passes on main to prove the failure is PR-caused. + +**Problem:** Requires searching recent merged PRs or main CI runs, finding the matching build, locating the right Helix job and work item. Multiple API calls. + +**Delegate:** +``` +Find a recent passing build on the main branch of dotnet/{REPO} that ran the same test leg as this failing build. + +Failing build: {BUILD_ID} (PR #{PR_NUMBER}) +Failing job name: {JOB_NAME} (e.g., "TestBuild linux x64") +Failing work item: {WORK_ITEM} (e.g., "dotnet.Tests.dll.19") + +Steps: +1. Use GitHub MCP to find recently merged PRs to main: + github-mcp-server-search_pull_requests query:"is:merged base:main" owner:dotnet repo:{REPO} +2. Pick the most recent merged PR +3. Run the CI script to check its build status: + ./scripts/Get-CIStatus.ps1 -PRNumber {MERGED_PR} -Repository "dotnet/{REPO}" +4. Find the build that passed with the same job name +5. Find the Helix job ID for that job (may need to download build artifacts — see azure-cli.md and binlog-comparison.md for "binlogs to find binlogs") +6. 
Confirm the matching work item passed + +Return: the passing build ID, Helix job ID, and work item name, or "no recent passing build found". +``` + +## Pattern 3: Narrowing Merge Diffs to Relevant Files + +**When:** A large merge PR (hundreds of commits, hundreds of changed files) has test failures — need to identify which changes are relevant. + +**Problem:** `git diff` on a 458-file merge is overwhelming. Most changes are unrelated to the specific failure. + +**Delegate:** +``` +Given these test failures on merge PR #{PR_NUMBER} (branch: {SOURCE} → {TARGET}): + - {TEST_1} + - {TEST_2} + +Find the changed files most likely to cause these failures. + +Steps: +1. Get the list of changed files: git diff --name-only {TARGET}...{SOURCE} +2. Filter to files matching these patterns (adjust per failure type): + - For MSBuild/build failures: *.targets, *.props, Directory.Build.*, eng/Versions.props + - For test failures: test project files, test assets + - For specific SDK areas: src/Tasks/, src/Cli/, src/WasmSdk/ +3. For each relevant file, show the key diff hunks (not the full diff) +4. Look for version bumps, property changes, or behavioral changes + +Return: the 5-10 most relevant changed files with a one-line summary of what changed in each. +``` + +## Pattern 4: Parallel Binlog Extraction + +**When:** Comparing two builds — see [binlog-comparison.md](binlog-comparison.md). + +**Key insight:** Launch two subagents simultaneously (one per build). Each downloads a binlog, loads it into the MCP server, extracts task parameters, normalizes paths, and returns a sorted arg list. The main agent just diffs the two lists. 
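
As a concrete illustration of the mechanical extraction these patterns delegate, here is a minimal sketch of Pattern 1's xUnit `[FAIL]` parsing and deduplication. The log lines are invented samples in the xUnit console format described above.

```python
import re

# Matches xUnit console failure lines; [OUTPUT]/[PASS] lines must not match.
FAIL_RE = re.compile(r"\[xUnit\.net \d{2}:\d{2}:\d{2}\.\d{2}\]\s+(\S+) \[FAIL\]$")

log = """\
[xUnit.net 00:01:02.03] Sdk.Tests.BuildTests.ItBuilds [PASS]
[xUnit.net 00:01:05.99] Sdk.Tests.PublishTests.ItPublishes [FAIL]
[xUnit.net 00:01:06.01] Sdk.Tests.PublishTests.ItPublishes [OUTPUT]
[xUnit.net 00:02:11.40] Sdk.Tests.PublishTests.ItPublishes [FAIL]
"""

# Dedupe across work items with a set; sort for stable reporting.
failures = sorted({m.group(1) for line in log.splitlines()
                   if (m := FAIL_RE.search(line))})
print(failures)  # ['Sdk.Tests.PublishTests.ItPublishes']
```

The anchored `[FAIL]$` is what keeps `[OUTPUT]` and `[PASS]` lines out, per the IMPORTANT note in the Pattern 1 prompt.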
+
+## General Guidelines
+
+- **Use `task` agent type** for all delegation (it has shell + MCP access)
+- **Run independent tasks in parallel** (e.g., two binlog extractions)
+- **Include the CI script path** in every prompt — subagents don't inherit skill context
+- **Ask for structured output** — "return a list of X" not "show me what you find"
+- **Don't delegate interpretation** — subagents extract data, main agent interprets meaning

From 9a3194c6d799dd95eb1c53be03bd3d8d1b5c0c5a Mon Sep 17 00:00:00 2001
From: Larry Ewing
Date: Tue, 10 Feb 2026 14:12:26 -0600
Subject: [PATCH 04/44] Fix empty array falsy check in Get-HelixWorkItemDetails

When ListFiles returns an empty array (0 files), the empty array is
falsy in PowerShell, causing fallback to the Details endpoint's broken
URIs. Use `$null -ne` check instead.
---
 .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1
index c4c053a89af93c..91172e756f47f2 100644
--- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1
+++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1
@@ -1390,7 +1390,7 @@ function Get-HelixWorkItemDetails {
     # (https://github.com/dotnet/dnceng/issues/6072). ListFiles returns direct
     # blob storage URIs that always work.
$listFiles = Get-HelixWorkItemFiles -JobId $JobId -WorkItemName $WorkItemName - if ($listFiles) { + if ($null -ne $listFiles) { $response.Files = @($listFiles | ForEach-Object { [PSCustomObject]@{ FileName = $_.Name From 5cf7c953c13bed6a495d1bdba9b60623d5dc06f7 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 14:31:16 -0600 Subject: [PATCH 05/44] Address review: fix target-branch refs in reference docs, remove hardcoded path, update Three Modes table --- .github/skills/ci-analysis/SKILL.md | 2 +- .github/skills/ci-analysis/references/binlog-comparison.md | 6 +++--- .../skills/ci-analysis/references/delegation-patterns.md | 2 +- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index b0c75e4a409755..0f2dc6fa057e3f 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -66,7 +66,7 @@ The script operates in three distinct modes depending on what information you ha | You have... | Use | What you get | |-------------|-----|-------------| -| A GitHub PR number | `-PRNumber 12345` | Full analysis: all builds, failures, known issues, retry recommendation | +| A GitHub PR number | `-PRNumber 12345` | Full analysis: all builds, failures, known issues, structured JSON summary | | An AzDO build ID | `-BuildId 1276327` | Single build analysis: timeline, failures, Helix results | | A Helix job ID (optionally a specific work item) | `-HelixJob "..." 
[-WorkItem "..."]` | Deep dive: list work items for the job, or with `-WorkItem`, focus on a single work item's console logs, artifacts, and test results | diff --git a/.github/skills/ci-analysis/references/binlog-comparison.md b/.github/skills/ci-analysis/references/binlog-comparison.md index f179b2ff7564ad..b9a06c9cbea9b5 100644 --- a/.github/skills/ci-analysis/references/binlog-comparison.md +++ b/.github/skills/ci-analysis/references/binlog-comparison.md @@ -1,6 +1,6 @@ # Deep Investigation: Binlog Comparison -When a test **passes on main but fails on a PR**, comparing MSBuild binlogs from both runs reveals the exact difference in task parameters without guessing. +When a test **passes on the target branch but fails on a PR**, comparing MSBuild binlogs from both runs reveals the exact difference in task parameters without guessing. ## When to Use This Pattern @@ -14,7 +14,7 @@ When a test **passes on main but fails on a PR**, comparing MSBuild binlogs from ### Step 1: Identify the two work items to compare -Use `Get-CIStatus.ps1` to find the failing Helix job + work item, then find a corresponding passing build (recent PR merged to main, or a main CI run). +Use `Get-CIStatus.ps1` to find the failing Helix job + work item, then find a corresponding passing build (recent PR merged to the target branch, or a CI run on that branch). **Finding Helix job IDs from build artifacts (binlogs to find binlogs):** When the failing work item's Helix job ID isn't visible (e.g., canceled jobs, or finding a matching job from a passing build), the IDs are inside the build's `SendToHelix.binlog`: @@ -48,7 +48,7 @@ Launch two `task` subagents (can run in parallel), each with a prompt like: ``` Download the msbuild.binlog from Helix job {JOB_ID} work item {WORK_ITEM}. 
Use the CI skill script to get the artifact URL: - C:\Users\lewing\.copilot\skills\ci-analysis\scripts\Get-CIStatus.ps1 -HelixJob "{JOB_ID}" -WorkItem "{WORK_ITEM}" + ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB_ID}" -WorkItem "{WORK_ITEM}" Download the binlog URL to $env:TEMP\{label}.binlog. Load it with the binlog MCP server (binlog-load_binlog). Search for the {TASK_NAME} task (binlog-search_tasks_by_name). diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index 41c71c2d2fc822..d430aca63ddd91 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -19,7 +19,7 @@ Job: {JOB_ID_2} Work items: dotnet.Tests.dll.19 For each, use: - C:\Users\lewing\.copilot\skills\ci-analysis\scripts\Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}" + ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}" From the console output, extract lines matching xUnit failure format: [xUnit.net HH:MM:SS.ss] TestNamespace.TestClass.TestMethod [FAIL] diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 91172e756f47f2..f726055b7a9775 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1833,7 +1833,7 @@ try { skipped = $skippedJobs } - if((-not $failedJobs -or $failedJobs.Count -eq 0) -and $localTestFailures.Count -eq 0) { + if ((-not $failedJobs -or $failedJobs.Count -eq 0) -and $localTestFailures.Count -eq 0) { if ($buildStatus -and $buildStatus.Status -eq "inProgress") { Write-Host "`nNo failures yet - build still in progress" -ForegroundColor Cyan Write-Host "Run again later to check for failures, or use -NoCache to get fresh data" -ForegroundColor Gray From bce86bebf437322a7fdd933af7fc7a78c73a6b73 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 
14:40:11 -0600 Subject: [PATCH 06/44] ci-analysis: move deep-dive content to references, reduce SKILL.md to ~3.3K tokens Move 'Deep Investigation with Azure CLI' section (97 lines) and detailed 'Recovering Results from Canceled Jobs' steps to references/. Content already exists in references/azure-cli.md. Remove duplicate 'Canceled != Failed' callout. SKILL.md is now ~3.3K tokens, within the 2K-4K target for script-driven skills. --- .github/skills/ci-analysis/SKILL.md | 112 +--------------------------- 1 file changed, 2 insertions(+), 110 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 0f2dc6fa057e3f..9198f5ec237501 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -96,8 +96,6 @@ The script operates in three distinct modes depending on what information you ha 2. With `-HelixJob` and `-WorkItem`: queries the specific work item for status and artifacts 3. Fetches console logs and file listings, displays detailed failure information -> ⚠️ **Canceled ≠ Failed.** Canceled jobs often have completed Helix work items — the AzDO wrapper timed out but tests may have passed. See "Recovering Results from Canceled Jobs" below. - ## Interpreting Results **Known Issues section**: Failures matching existing GitHub issues - these are tracked and being investigated. @@ -240,116 +238,10 @@ Specific next steps: retry, fix specific files, investigate further. Include `/a ## Recovering Results from Canceled Jobs -Canceled jobs (typically from timeouts) often still have useful artifacts. The Helix work items may have completed successfully even though the AzDO job was killed while waiting to collect results. - -**To investigate canceled jobs:** - -1. **Download build artifacts**: Use `az pipelines runs artifact download` (see [references/azure-cli.md](references/azure-cli.md)) to get pipeline artifacts for the canceled job. These contain binlogs even for canceled jobs. -2. 
**Extract Helix job IDs**: Use the MSBuild MCP server to load the `SendToHelix.binlog` and search for `"Sent Helix Job"` messages. Each contains a Helix job ID. See [references/binlog-comparison.md](references/binlog-comparison.md) for the full "binlogs to find binlogs" workflow. -3. **Query Helix directly**: For each job ID, use the CI script: `./scripts/Get-CIStatus.ps1 -HelixJob "{GUID}" -FindBinlogs` - -**Example**: A `browser-wasm windows WasmBuildTests` job was canceled after 3 hours. The binlog (truncated) still contained 12 Helix job IDs. Querying them revealed all 226 work items passed — the "failure" was purely a timeout in the AzDO wrapper. +Canceled jobs (typically from timeouts) often still have useful Helix results. See [references/azure-cli.md](references/azure-cli.md) for artifact download steps and [references/binlog-comparison.md](references/binlog-comparison.md) for the "binlogs to find binlogs" workflow. **Key insight**: "Canceled" ≠ "Failed". Always check artifacts before concluding results are lost. -## Deep Investigation with Azure CLI - -When the script and GitHub APIs aren't enough (e.g., investigating internal pipeline definitions or downloading build artifacts), you can use the Azure CLI with the `azure-devops` extension. - -> 💡 **Prefer `az pipelines` / `az devops` commands over raw REST API calls.** The CLI handles authentication, pagination, and JSON output formatting. Only fall back to manual `Invoke-RestMethod` calls when the CLI doesn't expose the endpoint you need (e.g., artifact download URLs, specialized timeline queries). The CLI's `--query` (JMESPath) and `-o table` flags are powerful for filtering without extra scripting. 
- -### Checking Azure CLI Authentication - -Before making direct AzDO API calls, verify the CLI is installed and authenticated: - -```powershell -# Ensure az is on PATH (Windows may need a refresh after install) -$env:Path = [System.Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path", "User") - -# Check if az CLI is available -az --version 2>$null | Select-Object -First 1 - -# Check if logged in and get current account -az account show --query "{name:name, user:user.name}" -o table 2>$null - -# If not logged in, prompt the user to authenticate: -# az login # Interactive browser login -# az login --use-device-code # Device code flow (for remote/headless) - -# Get an AAD access token for AzDO REST API calls -$accessToken = (az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798 --query accessToken -o tsv) -$headers = @{ "Authorization" = "Bearer $accessToken" } -``` - -> ⚠️ If `az` is not installed, use `winget install -e --id Microsoft.AzureCLI` (Windows). The `azure-devops` extension is also required — install or verify it with `az extension add --name azure-devops` (safe to run if already installed). Ask the user to authenticate if needed. - -> ⚠️ **Do NOT use `az devops configure --defaults`** — it writes to a global config file and will cause conflicts if multiple agents are running concurrently. Always pass `--org` and `--project` (or `-p`) explicitly on each command. - -### Querying Pipeline Definitions and Builds - -When investigating build failures, it's often useful to look at the pipeline definition itself to understand what stages, jobs, and templates are involved. - -**Use `az` CLI commands first** — they're simpler and handle auth automatically. 
Set `$buildId` from a runs list or from the AzDO URL: - -```powershell -$org = "https://dev.azure.com/dnceng" -$project = "internal" - -# Find a pipeline definition by name -az pipelines list --name "dotnet-unified-build" --org $org -p $project --query "[].{id:id, name:name, path:path}" -o table - -# Get pipeline definition details (shows YAML path, triggers, etc.) -az pipelines show --id 1330 --org $org -p $project --query "{id:id, name:name, yamlPath:process.yamlFilename, repo:repository.name}" -o table - -# List recent builds for a pipeline (with filtering) -az pipelines runs list --pipeline-ids 1330 --branch "refs/heads/main" --top 5 --org $org -p $project --query "[].{id:id, result:result, finish:finishTime}" -o table - -# Get a specific build's details -az pipelines runs show --id $buildId --org $org -p $project --query "{id:id, result:result, sourceBranch:sourceBranch}" -o table - -# List build artifacts -az pipelines runs artifact list --run-id $buildId --org $org -p $project --query "[].{name:name, type:resource.type}" -o table -``` - -**Fall back to REST API** only when the CLI doesn't expose what you need (e.g., build timelines, artifact downloads): - -```powershell -# Get build timeline (stages, jobs, tasks with results and durations) — no CLI equivalent -$accessToken = (az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798 --query accessToken -o tsv) -$headers = @{ "Authorization" = "Bearer $accessToken" } -$timelineUrl = "https://dev.azure.com/dnceng/internal/_apis/build/builds/$buildId/timeline?api-version=7.1" -$timeline = (Invoke-RestMethod -Uri $timelineUrl -Headers $headers) -$timeline.records | Where-Object { $_.result -eq "failed" -and $_.type -eq "Job" } - -# Download a specific artifact (e.g., build logs with binlogs) — no CLI equivalent for zip download -$artifactName = "Windows_Workloads_x64_BuildPass2_BuildLogs_Attempt1" -$downloadUrl = 
"https://dev.azure.com/dnceng/internal/_apis/build/builds/$buildId/artifacts?artifactName=$artifactName&api-version=7.1&`$format=zip" -Invoke-WebRequest -Uri $downloadUrl -Headers $headers -OutFile "$env:TEMP\artifact.zip" -``` - -### Examining Pipeline YAML - -All dotnet repos that use arcade put their pipeline definitions under `eng/pipelines/`. Use `az pipelines show` to find the YAML file path, then fetch it: - -```powershell -# Find the YAML path for a pipeline -az pipelines show --id 1330 --org $org -p $project --query "{yamlPath:process.yamlFilename, repo:repository.name}" -o table - -# Fetch the YAML from the repo (example: dotnet/runtime's runtime-official pipeline) -# github-mcp-server-get_file_contents owner:dotnet repo:runtime path:eng/pipelines/runtime-official.yml - -# For VMR unified builds, the YAML is in dotnet/dotnet: -# github-mcp-server-get_file_contents owner:dotnet repo:dotnet path:eng/pipelines/unified-build.yml - -# Templates are usually in eng/pipelines/common/ or eng/pipelines/templates/ -``` - -This is especially useful when: -- A job name doesn't clearly indicate what it builds -- You need to understand stage dependencies (why a job was canceled) -- You want to find which template defines a specific step -- Investigating whether a pipeline change caused new failures - ## Tips 1. Check if same test fails on the target branch before assuming transient @@ -358,4 +250,4 @@ This is especially useful when: 4. Binlogs in artifacts help diagnose MSB4018 task failures 5. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties 6. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly -7. 
When investigating internal AzDO pipelines, check `az account show` first to verify authentication before making REST API calls +7. When investigating internal AzDO pipelines, use the Azure CLI — see [references/azure-cli.md](references/azure-cli.md). Check `az account show` first to verify authentication From e93161905595da0bf222c6f768a33f6c6eba8988 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 14:58:15 -0600 Subject: [PATCH 07/44] ci-analysis: add prior-build mismatch detection guidance When a user asks about a job/error/cancellation that doesn't appear in the current build results, the agent should ask if they're referring to a prior build rather than silently missing context. Added Step 2 item 5 with concrete triggers: empty canceledJobNames when user mentions cancellations, green build when user says CI is failing, missing job names. Offers to re-run with -BuildId for the earlier build. --- .github/skills/ci-analysis/SKILL.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 9198f5ec237501..fe4a49c31f1670 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -190,6 +190,11 @@ Run with `-ShowLogs` for detailed failure info. - Failure is **not** in Build Analysis → Investigate further before assuming transient - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first - Test timeout but tests passed → Executor issue, not test failure +5. **Check for mismatch with user's question** — The script only reports builds for the current head SHA. If the user asks about a job, error, or cancellation that doesn't appear in the results, **ask** if they're referring to a prior build. 
Common triggers: + - User mentions a canceled job but `canceledJobNames` is empty + - User says "CI is failing" but the latest build is green + - User references a specific job name not in the current results + Offer to re-run with `-BuildId` if the user can provide the earlier build ID from AzDO. ### Step 3: Verify before claiming From 3a44083324e3eaed49682d21e459de1e732e351e Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 15:00:16 -0600 Subject: [PATCH 08/44] Fix POSSIBLY_TRANSIENT hint: require correlation data before claiming transient The recommendationHint fell through to POSSIBLY_TRANSIENT when prChangedFiles existed but allFailuresForCorrelation was empty (no failure details collected). Now requires both conditions so we only claim 'possibly transient' when correlation was actually attempted. --- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index f726055b7a9775..c57c3037f7dd13 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -2187,7 +2187,7 @@ if ($knownIssuesFromBuildAnalysis.Count -gt 0) { $summary.recommendationHint = "BUILD_SUCCESSFUL" } elseif ($summary.prCorrelation.hasCorrelation) { $summary.recommendationHint = "LIKELY_PR_RELATED" -} elseif ($prChangedFiles.Count -gt 0) { +} elseif ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { $summary.recommendationHint = "POSSIBLY_TRANSIENT" } else { $summary.recommendationHint = "REVIEW_REQUIRED" From 28f02ccdba619e657ee4980ca4c359c53c2ed376 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 15:21:35 -0600 Subject: [PATCH 09/44] Address review: guard against timeline fetch failure, fix target-branch refs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - 
recommendationHint now checks for missing lastBuildJobSummary when builds were requested — emits REVIEW_REQUIRED instead of false BUILD_SUCCESSFUL when timeline API fails - Fix hardcoded base:main in delegation-patterns.md search example - Fix hardcoded refs/heads/main in azure-cli.md pipeline query --- .github/skills/ci-analysis/references/azure-cli.md | 4 ++-- .../skills/ci-analysis/references/delegation-patterns.md | 6 +++--- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 4 +++- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/.github/skills/ci-analysis/references/azure-cli.md b/.github/skills/ci-analysis/references/azure-cli.md index 80c1c7e7880c2a..ba0c5995e2f42f 100644 --- a/.github/skills/ci-analysis/references/azure-cli.md +++ b/.github/skills/ci-analysis/references/azure-cli.md @@ -43,8 +43,8 @@ az pipelines list --name "dotnet-unified-build" --org $org -p $project --query " # Get pipeline definition details (shows YAML path, triggers, etc.) az pipelines show --id 1330 --org $org -p $project --query "{id:id, name:name, yamlPath:process.yamlFilename, repo:repository.name}" -o table -# List recent builds for a pipeline (with filtering) -az pipelines runs list --pipeline-ids 1330 --branch "refs/heads/main" --top 5 --org $org -p $project --query "[].{id:id, result:result, finish:finishTime}" -o table +# List recent builds for a pipeline (replace {TARGET_BRANCH} with the PR's base branch, e.g., main or release/9.0) +az pipelines runs list --pipeline-ids 1330 --branch "refs/heads/{TARGET_BRANCH}" --top 5 --org $org -p $project --query "[].{id:id, result:result, finish:finishTime}" -o table # Get a specific build's details az pipelines runs show --id $buildId --org $org -p $project --query "{id:id, result:result, sourceBranch:sourceBranch}" -o table diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index d430aca63ddd91..aa03a33f5a9ad5 100644 --- 
a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -48,9 +48,9 @@ Failing job name: {JOB_NAME} (e.g., "TestBuild linux x64") Failing work item: {WORK_ITEM} (e.g., "dotnet.Tests.dll.19") Steps: -1. Use GitHub MCP to find recently merged PRs to main: - github-mcp-server-search_pull_requests query:"is:merged base:main" owner:dotnet repo:{REPO} -2. Pick the most recent merged PR +1. Use GitHub MCP to find recently merged PRs targeting the same base branch as this PR: + github-mcp-server-search_pull_requests query:"is:merged base:{TARGET_BRANCH}" owner:dotnet repo:{REPO} +2. Set {TARGET_BRANCH} to the PR's base branch (e.g., main, release/9.0) and pick the most recent merged PR 3. Run the CI script to check its build status: ./scripts/Get-CIStatus.ps1 -PRNumber {MERGED_PR} -Repository "dotnet/{REPO}" 4. Find the build that passed with the same job name diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index c57c3037f7dd13..ccf50afac50b98 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -2181,7 +2181,9 @@ if ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { } # Compute recommendation hint -if ($knownIssuesFromBuildAnalysis.Count -gt 0) { +if (-not $lastBuildJobSummary -and $buildIds.Count -gt 0) { + $summary.recommendationHint = "REVIEW_REQUIRED" +} elseif ($knownIssuesFromBuildAnalysis.Count -gt 0) { $summary.recommendationHint = "KNOWN_ISSUES_DETECTED" } elseif ($totalFailedJobs -eq 0 -and $totalLocalFailures -eq 0) { $summary.recommendationHint = "BUILD_SUCCESSFUL" From 13ef005c75f9e8876dc21440d58854b3c0eb5ed3 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 17:08:26 -0600 Subject: [PATCH 10/44] ci-analysis: emit JSON summary for no-builds and merge-conflict PRs Instead of throwing when no AzDO builds 
are found, detect the PR's mergeable_state and emit a proper [CI_ANALYSIS_SUMMARY] JSON block with recommendationHint of MERGE_CONFLICTS or NO_BUILDS. This lets the agent provide useful guidance (resolve conflicts, offer to check prior builds) instead of crashing with an unstructured error. Adds MERGE_CONFLICTS and NO_BUILDS to the hint decision table in SKILL.md. --- .github/skills/ci-analysis/SKILL.md | 2 + .../ci-analysis/scripts/Get-CIStatus.ps1 | 52 ++++++++++++++++++- 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index fe4a49c31f1670..e5d0a7fe2806ef 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -133,6 +133,8 @@ Read `recommendationHint` as a starting point, then layer in context: | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. | | `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking the target branch, searching for issues, or retrying. | | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. | +| `MERGE_CONFLICTS` | PR has merge conflicts — CI won't run. Tell the user to resolve conflicts. Offer to analyze a previous build by ID. | +| `NO_BUILDS` | No AzDO builds found (CI not triggered). Offer to check if CI needs to be triggered or analyze a previous build. 
| Then layer in nuance the heuristic can't capture: diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index ccf50afac50b98..12e0fd83c5a1c7 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -360,6 +360,12 @@ function Get-AzDOBuildIdFromPR { throw "Failed to fetch CI status for PR #$PR in $Repository - check PR number and permissions" } + # Check if PR has merge conflicts (no CI runs when mergeable_state is dirty) + $prMergeState = $null + try { + $prMergeState = gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null + } catch {} + # Find ALL failing Azure DevOps builds $failingBuilds = @{} foreach ($line in $checksOutput) { @@ -386,7 +392,15 @@ function Get-AzDOBuildIdFromPR { } } } - throw "No CI build found for PR #$PR in $Repository - the CI pipeline has not been triggered yet" + if ($prMergeState -eq 'dirty') { + Write-Host "`nPR #$PR has merge conflicts (mergeable_state: dirty)" -ForegroundColor Red + Write-Host "CI will not run until conflicts are resolved." -ForegroundColor Yellow + Write-Host "Resolve conflicts and push to trigger CI, or use -BuildId to analyze a previous build." -ForegroundColor Gray + return @{ BuildIds = @(); Reason = "MERGE_CONFLICTS"; MergeState = $prMergeState } + } + Write-Host "`nNo CI build found for PR #$PR in $Repository" -ForegroundColor Red + Write-Host "The CI pipeline has not been triggered yet." 
-ForegroundColor Yellow + return @{ BuildIds = @(); Reason = "NO_BUILDS"; MergeState = $prMergeState } } # Return all unique failing build IDs @@ -1737,8 +1751,42 @@ try { $buildIds = @() $knownIssuesFromBuildAnalysis = @() $prChangedFiles = @() + $noBuildReason = $null if ($PSCmdlet.ParameterSetName -eq 'PRNumber') { - $buildIds = @(Get-AzDOBuildIdFromPR -PR $PRNumber) + $buildResult = Get-AzDOBuildIdFromPR -PR $PRNumber + if ($buildResult -is [hashtable] -and $buildResult.Reason) { + # No builds found — emit summary with reason and exit + $noBuildReason = $buildResult.Reason + $buildIds = @() + $summary = [ordered]@{ + mode = "PRNumber" + repository = $Repository + prNumber = $PRNumber + builds = @() + totalFailedJobs = 0 + totalLocalFailures = 0 + lastBuildJobSummary = [ordered]@{ + total = 0; succeeded = 0; failed = 0; canceled = 0; pending = 0; warnings = 0; skipped = 0 + } + failedJobNames = @() + canceledJobNames = @() + knownIssues = @() + prCorrelation = [ordered]@{ + changedFileCount = 0 + hasCorrelation = $false + correlatedFiles = @() + } + recommendationHint = if ($noBuildReason -eq "MERGE_CONFLICTS") { "MERGE_CONFLICTS" } else { "NO_BUILDS" } + noBuildReason = $noBuildReason + mergeState = $buildResult.MergeState + } + Write-Host "" + Write-Host "[CI_ANALYSIS_SUMMARY]" + Write-Host ($summary | ConvertTo-Json -Depth 5) + Write-Host "[/CI_ANALYSIS_SUMMARY]" + exit 0 + } + $buildIds = @($buildResult) # Check Build Analysis for known issues $knownIssuesFromBuildAnalysis = @(Get-BuildAnalysisKnownIssues -PR $PRNumber) From 7c66ba073abcefdd4ff7a52172e68bfb1779c640 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 17:16:31 -0600 Subject: [PATCH 11/44] Fix remaining main branch reference in delegation-patterns.md --- .github/skills/ci-analysis/references/delegation-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md 
b/.github/skills/ci-analysis/references/delegation-patterns.md index aa03a33f5a9ad5..13417524af790f 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -41,7 +41,7 @@ Return: unique FAIL test names + which work items they appeared in. **Delegate:** ``` -Find a recent passing build on the main branch of dotnet/{REPO} that ran the same test leg as this failing build. +Find a recent passing build on the PR's target/base branch of dotnet/{REPO} that ran the same test leg as this failing build. Failing build: {BUILD_ID} (PR #{PR_NUMBER}) Failing job name: {JOB_NAME} (e.g., "TestBuild linux x64") From 60f9b6e13e18e2b98250842d5d102e951b2eebbf Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 18:25:05 -0600 Subject: [PATCH 12/44] Add build progression analysis reference and fix step numbering - Add references/build-progression-analysis.md: fact-gathering technique for correlating PR builds to commits when investigating current failures - Add Step 4 (Check build progression) to SKILL.md Analysis Workflow - Add reference link in References section - Fix duplicate step 5 numbering (now 5 and 6) --- .github/skills/ci-analysis/SKILL.md | 6 +- .../references/build-progression-analysis.md | 92 +++++++++++++++++++ 2 files changed, 96 insertions(+), 2 deletions(-) create mode 100644 .github/skills/ci-analysis/references/build-progression-analysis.md diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index e5d0a7fe2806ef..95fe0c13eda6a4 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -186,13 +186,14 @@ Run with `-ShowLogs` for detailed failure info. 1. **Check Build Analysis** — Known issues are safe to retry 2. **Correlate with PR changes** — Same files failing = likely PR-related 3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. 
See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. -4. **Interpret patterns** (but don't jump to conclusions): +4. **Check build progression** — If the PR has multiple builds (multiple pushes), check whether earlier builds passed. A failure that appeared after a specific push narrows the investigation to those commits. See [references/build-progression-analysis.md](references/build-progression-analysis.md). Present findings as facts, not fix recommendations. +5. **Interpret patterns** (but don't jump to conclusions): - Same error across many jobs → Real code issue - Build Analysis flags a known issue → Safe to retry - Failure is **not** in Build Analysis → Investigate further before assuming transient - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first - Test timeout but tests passed → Executor issue, not test failure -5. **Check for mismatch with user's question** — The script only reports builds for the current head SHA. If the user asks about a job, error, or cancellation that doesn't appear in the results, **ask** if they're referring to a prior build. Common triggers: +6. **Check for mismatch with user's question** — The script only reports builds for the current head SHA. If the user asks about a job, error, or cancellation that doesn't appear in the results, **ask** if they're referring to a prior build. Common triggers: - User mentions a canceled job but `canceledJobNames` is empty - User says "CI is failing" but the latest build is green - User references a specific job name not in the current results @@ -238,6 +239,7 @@ Specific next steps: retry, fix specific files, investigate further. 
Include `/a - **Helix artifacts & binlogs**: See [references/helix-artifacts.md](references/helix-artifacts.md) - **Binlog comparison (passing vs failing)**: See [references/binlog-comparison.md](references/binlog-comparison.md) +- **Build progression (commit-to-build correlation)**: See [references/build-progression-analysis.md](references/build-progression-analysis.md) - **Subagent delegation patterns**: See [references/delegation-patterns.md](references/delegation-patterns.md) - **Azure CLI deep investigation**: See [references/azure-cli.md](references/azure-cli.md) - **Manual investigation steps**: See [references/manual-investigation.md](references/manual-investigation.md) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md new file mode 100644 index 00000000000000..3a1c7b5f3bcfe0 --- /dev/null +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -0,0 +1,92 @@ +# Deep Investigation: Build Progression Analysis + +When the current build is failing, the PR's build history can reveal whether the failure existed from the start or appeared after specific changes. This is a fact-gathering technique — like target-branch comparison — that provides context for understanding the current failure. + +## When to Use This Pattern + +- Standard analysis (script + logs) hasn't identified the root cause of the current failure +- The PR has multiple pushes and you want to know whether earlier builds passed or failed +- You need to understand whether a failure is inherent to the PR's approach or was introduced by a later change + +## The Pattern + +### Step 1: List all builds for the PR + +`gh pr checks` only shows checks for the current HEAD SHA. 
To see the full build history, query AzDO: + +```powershell +$org = "https://dev.azure.com/dnceng-public" +$project = "public" +az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 --org $org -p $project ` + --query "[].{id:id, result:result, sourceVersion:sourceVersion, finishTime:finishTime}" -o table +``` + +### Step 2: Map builds to commits + +For each build, identify the source commit and use `git log` to see what was included: + +```powershell +# Get source version for a build +az pipelines runs show --id $buildId --org $org -p $project ` + --query "{id:id, result:result, sourceVersion:sourceVersion}" -o json + +# See what commits are between two source versions +# git log --oneline $passingCommit..$failingCommit +``` + +### Step 3: Build a progression table + +Present the facts as a table: + +| Build | Source commit | Result | What changed since previous build | +|-------|-------------|--------|----------------------------------| +| 1284433 | abc123 | ✅ 9/9 | Initial PR commits | +| 1286087 | def456 | ❌ 7/9 | Added commit C | +| 1286967 | ghi789 | ❌ 7/9 | Modified commit C | + +### Step 4: Present findings, not conclusions + +Report what the progression shows: +- Which builds passed and which failed +- What commits were added between the last passing and first failing build +- Whether the failing commits were added in response to review feedback (check review threads) + +**Do not** make fix recommendations based solely on build progression. The progression narrows the investigation — it doesn't determine the right fix. The human may have context about why changes were made, what constraints exist, or what the reviewer intended. 
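+
+A minimal sketch tying Steps 1-3 together (assumes the `az` commands above; `{PR}` is a placeholder for the PR number, and the variable names here are illustrative, not part of the CI script):
+
+```powershell
+$org = "https://dev.azure.com/dnceng-public"
+$project = "public"
+
+# Step 1: list recent runs for the PR's merge branch
+$runs = az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 `
+    --org $org -p $project -o json | ConvertFrom-Json
+
+# Steps 2-3: map each run to its source commit and print a progression table
+# (ISO 8601 finishTime strings sort correctly as text)
+$runs | Sort-Object finishTime | ForEach-Object {
+    [pscustomobject]@{
+        Build  = $_.id
+        Commit = if ($_.sourceVersion) { $_.sourceVersion.Substring(0, 7) } else { "?" }
+        Result = $_.result
+        Finish = $_.finishTime
+    }
+} | Format-Table -AutoSize
+```
+
+The boundary between the last passing and first failing row gives the commit range to inspect with `git log`.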
+ +## Checking review context + +When the progression shows that a failure appeared after new commits, check whether those commits were review-requested: + +```powershell +# Get review comments with timestamps +gh api "repos/{OWNER}/{REPO}/pulls/{PR}/comments" \ + --jq '.[] | {author: .user.login, body: .body, created: .created_at}' +``` + +Present this as additional context: "Commit C was pushed after reviewer X commented requesting Y." Let the author decide how to proceed. + +## Combining with Binlog Comparison + +Build progression identifies **which change** correlates with the current failure. Binlog comparison (see [binlog-comparison.md](binlog-comparison.md)) shows **what's different** in the build between a passing and failing state. Together they provide a complete picture: + +1. Progression → "The current failure first appeared in build N+1, which added commit C" +2. Binlog comparison → "In the current (failing) build, task X receives parameter Y=Z, whereas in the passing build it received Y=W" + +## Relationship to Target-Branch Comparison + +Both techniques compare a failing build against a passing one: + +| Technique | Passing build from | Answers | +|-----------|-------------------|---------| +| **Target-branch comparison** | Recent build on the base branch (e.g., main) | "Does this test pass without the PR's changes at all?" | +| **Build progression** | Earlier build on the same PR | "Did this test pass with the PR's *earlier* changes?" | + +Use target-branch comparison first to confirm the failure is PR-related. Use build progression to narrow down *which part* of the PR introduced it. + +## Anti-Patterns + +> ❌ **Don't treat build history as a substitute for analyzing the current build.** The current build determines CI status. Build history is context for understanding and investigating the current failure. 
+ +> ❌ **Don't make fix recommendations from progression alone.** "Build N passed and build N+1 failed after adding commit C" is a fact worth reporting. "Therefore revert commit C" is a judgment that requires more context than the agent has — the commit may be addressing a critical review concern, fixing a different bug, or partially correct. + +> ❌ **Don't assume earlier passing builds prove the original approach was complete.** A build may pass because it didn't change enough to trigger the failing test scenario. The reviewer who requested additional changes may have identified a real gap. From eea00eb9b1713da85a4f1dec50192c2dcccb7502 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 18:36:33 -0600 Subject: [PATCH 13/44] =?UTF-8?q?Address=20review:=20consistent=20return?= =?UTF-8?q?=20type,=20fix=20main=E2=86=92target=20branch=20refs,=20fix=20l?= =?UTF-8?q?ine=20continuation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../ci-analysis/references/build-progression-analysis.md | 2 +- .../skills/ci-analysis/references/delegation-patterns.md | 4 ++-- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 8 ++++---- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index 3a1c7b5f3bcfe0..cf2d904fd27777 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -59,7 +59,7 @@ When the progression shows that a failure appeared after new commits, check whet ```powershell # Get review comments with timestamps -gh api "repos/{OWNER}/{REPO}/pulls/{PR}/comments" \ +gh api "repos/{OWNER}/{REPO}/pulls/{PR}/comments" ` --jq '.[] | {author: .user.login, body: .body, created: .created_at}' ``` diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md 
b/.github/skills/ci-analysis/references/delegation-patterns.md index 13417524af790f..619e3b9ff8995b 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -35,9 +35,9 @@ Return: unique FAIL test names + which work items they appeared in. ## Pattern 2: Finding a Baseline Build -**When:** A test fails on a PR — need to confirm it passes on main to prove the failure is PR-caused. +**When:** A test fails on a PR — need to confirm it passes on the PR's target/base branch to prove the failure is PR-caused. -**Problem:** Requires searching recent merged PRs or main CI runs, finding the matching build, locating the right Helix job and work item. Multiple API calls. +**Problem:** Requires searching recent merged PRs or target branch CI runs, finding the matching build, locating the right Helix job and work item. Multiple API calls. **Delegate:** ``` diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 12e0fd83c5a1c7..6dc984da20fcd6 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -388,7 +388,7 @@ function Get-AzDOBuildIdFromPR { $buildIdStr = $anyBuildMatch.Groups[1].Value $buildIdInt = 0 if ([int]::TryParse($buildIdStr, [ref]$buildIdInt)) { - return @($buildIdInt) + return @{ BuildIds = @($buildIdInt); Reason = $null; MergeState = $prMergeState } } } } @@ -413,7 +413,7 @@ function Get-AzDOBuildIdFromPR { } } - return $buildIds + return @{ BuildIds = $buildIds; Reason = $null; MergeState = $prMergeState } } function Get-BuildAnalysisKnownIssues { @@ -1754,7 +1754,7 @@ try { $noBuildReason = $null if ($PSCmdlet.ParameterSetName -eq 'PRNumber') { $buildResult = Get-AzDOBuildIdFromPR -PR $PRNumber - if ($buildResult -is [hashtable] -and $buildResult.Reason) { + if ($buildResult.Reason) { # No builds found — emit summary with reason and exit 
$noBuildReason = $buildResult.Reason $buildIds = @() @@ -1786,7 +1786,7 @@ try { Write-Host "[/CI_ANALYSIS_SUMMARY]" exit 0 } - $buildIds = @($buildResult) + $buildIds = @($buildResult.BuildIds) # Check Build Analysis for known issues $knownIssuesFromBuildAnalysis = @(Get-BuildAnalysisKnownIssues -PR $PRNumber) From 13403315adb9fdb866920625affb14120f509528 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 18:39:42 -0600 Subject: [PATCH 14/44] Extract Get-PRCorrelation helper to eliminate divergent duplication --- .../ci-analysis/scripts/Get-CIStatus.ps1 | 69 ++++++++----------- 1 file changed, 30 insertions(+), 39 deletions(-) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 6dc984da20fcd6..2519b843c0f06c 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -573,17 +573,15 @@ function Get-PRCorrelation { return $correlations | Select-Object -Unique -Property File, MatchType } -function Show-PRCorrelationSummary { +function Get-PRCorrelation { param( [array]$ChangedFiles, [array]$AllFailures ) - if ($ChangedFiles.Count -eq 0) { - return - } + $result = @{ CorrelatedFiles = @(); TestFiles = @() } + if ($ChangedFiles.Count -eq 0 -or $AllFailures.Count -eq 0) { return $result } - # Combine all failure info into searchable text $failureText = ($AllFailures | ForEach-Object { $_.TaskName $_.JobName @@ -592,23 +590,12 @@ function Show-PRCorrelationSummary { $_.FailedTests -join "`n" }) -join "`n" - # Also include the raw local test failure messages which may contain test class names - # These come from the "issues" property on local failures - - # Find correlations - $correlatedFiles = @() - $testFiles = @() - foreach ($file in $ChangedFiles) { $fileName = [System.IO.Path]::GetFileNameWithoutExtension($file) $fileNameWithExt = [System.IO.Path]::GetFileName($file) + $baseTestName = $fileName -replace 
'\.[^.]+$', '' - # For files like NtAuthTests.FakeServer.cs, also check NtAuthTests - $baseTestName = $fileName -replace '\.[^.]+$', '' # Remove .FakeServer etc. - - # Check if this file appears in any failure $isCorrelated = $false - if ($failureText -match [regex]::Escape($fileName) -or $failureText -match [regex]::Escape($fileNameWithExt) -or $failureText -match [regex]::Escape($file) -or @@ -616,18 +603,31 @@ function Show-PRCorrelationSummary { $isCorrelated = $true } - # Track test files separately - $isTestFile = $file -match '\.Tests?\.' -or $file -match '[/\\]tests?[/\\]' -or $file -match 'Test\.cs$' -or $file -match 'Tests\.cs$' - if ($isCorrelated) { - if ($isTestFile) { - $testFiles += $file - } else { - $correlatedFiles += $file - } + $isTestFile = $file -match '\.Tests?\.' -or $file -match '[/\\]tests?[/\\]' -or $file -match 'Test\.cs$' -or $file -match 'Tests\.cs$' + if ($isTestFile) { $result.TestFiles += $file } else { $result.CorrelatedFiles += $file } } } + $result.CorrelatedFiles = @($result.CorrelatedFiles | Select-Object -Unique) + $result.TestFiles = @($result.TestFiles | Select-Object -Unique) + return $result +} + +function Show-PRCorrelationSummary { + param( + [array]$ChangedFiles, + [array]$AllFailures + ) + + if ($ChangedFiles.Count -eq 0) { + return + } + + $correlation = Get-PRCorrelation -ChangedFiles $ChangedFiles -AllFailures $AllFailures + $correlatedFiles = $correlation.CorrelatedFiles + $testFiles = $correlation.TestFiles + # Show results if ($correlatedFiles.Count -gt 0 -or $testFiles.Count -gt 0) { Write-Host "`n=== PR Change Correlation ===" -ForegroundColor Magenta @@ -2211,21 +2211,12 @@ $summary = [ordered]@{ recommendationHint = "" } -# Compute PR correlation +# Compute PR correlation using shared helper if ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { - $correlatedFiles = @() - foreach ($failure in $allFailuresForCorrelation) { - $failureText = ($failure.Errors + $failure.HelixLogs + 
$failure.FailedTests) -join " " - foreach ($file in $prChangedFiles) { - $fileName = [System.IO.Path]::GetFileNameWithoutExtension($file) - if ($failureText -match [regex]::Escape($fileName)) { - $correlatedFiles += $file - } - } - } - $correlatedFiles = @($correlatedFiles | Select-Object -Unique) - $summary.prCorrelation.hasCorrelation = $correlatedFiles.Count -gt 0 - $summary.prCorrelation.correlatedFiles = $correlatedFiles + $correlation = Get-PRCorrelation -ChangedFiles $prChangedFiles -AllFailures $allFailuresForCorrelation + $allCorrelated = @($correlation.CorrelatedFiles) + @($correlation.TestFiles) | Select-Object -Unique + $summary.prCorrelation.hasCorrelation = $allCorrelated.Count -gt 0 + $summary.prCorrelation.correlatedFiles = @($allCorrelated) } # Compute recommendation hint From bdcfe22481698cde76d65eb6fc420bb0dd4e9a66 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 18:51:02 -0600 Subject: [PATCH 15/44] Remove duplicate Get-PRCorrelation function (old dead code) --- .../ci-analysis/scripts/Get-CIStatus.ps1 | 37 ------------------- 1 file changed, 37 deletions(-) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 2519b843c0f06c..0f7e32b34537a4 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -536,43 +536,6 @@ function Get-PRChangedFiles { } } -function Get-PRCorrelation { - param( - [array]$ChangedFiles, - [string]$FailureInfo - ) - - # Extract potential file/test names from the failure info - $correlations = @() - - foreach ($file in $ChangedFiles) { - $fileName = [System.IO.Path]::GetFileNameWithoutExtension($file) - $fileNameWithExt = [System.IO.Path]::GetFileName($file) - - # Check if the failure mentions this file - if ($FailureInfo -match [regex]::Escape($fileName) -or - $FailureInfo -match [regex]::Escape($fileNameWithExt)) { - $correlations += @{ - File = $file - MatchType = 
"direct" - } - } - - # Check for test file patterns - if ($file -match '\.Tests?\.' -or $file -match '/tests?/' -or $file -match '\\tests?\\') { - # This is a test file - check if the test name appears in failures - if ($FailureInfo -match [regex]::Escape($fileName)) { - $correlations += @{ - File = $file - MatchType = "test" - } - } - } - } - - return $correlations | Select-Object -Unique -Property File, MatchType -} - function Get-PRCorrelation { param( [array]$ChangedFiles, From fe251bf2c8cb65d3e0c6ea81f4c5d8ef25ad275b Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 19:08:37 -0600 Subject: [PATCH 16/44] Rewrite delegation patterns: JSON output, parallel artifact extraction, no subagent reasoning --- .../references/delegation-patterns.md | 126 +++++++++--------- 1 file changed, 65 insertions(+), 61 deletions(-) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index 619e3b9ff8995b..184f6816f52861 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -1,101 +1,105 @@ # Subagent Delegation Patterns -CI investigations often involve repetitive, mechanical work that burns main conversation context. Delegate these to subagents. +CI investigations involve repetitive, mechanical work that burns main conversation context. Delegate data gathering to subagents; keep interpretation in the main agent. ## Pattern 1: Scanning Multiple Console Logs -**When:** Multiple failing work items across several jobs — need to extract and deduplicate test failure names. - -**Problem:** Each work item's console log can be thousands of lines. Reading 5+ logs in main context burns most of your context budget on raw output. +**When:** Multiple failing work items across several jobs. 
**Delegate:** ``` -Fetch Helix console logs for these work items and extract all unique test failures: - -Job: {JOB_ID_1} - Work items: dotnet.Tests.dll.19, dotnet.Tests.dll.23 +Extract all unique test failures from these Helix work items: -Job: {JOB_ID_2} - Work items: dotnet.Tests.dll.19 +Job: {JOB_ID_1}, Work items: {ITEM_1}, {ITEM_2} +Job: {JOB_ID_2}, Work items: {ITEM_3} -For each, use: +For each, run: ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}" -From the console output, extract lines matching xUnit failure format: - [xUnit.net HH:MM:SS.ss] TestNamespace.TestClass.TestMethod [FAIL] - -IMPORTANT: Lines with [OUTPUT] or [PASS] are NOT failures. -Only lines ending with [FAIL] indicate actual test failures. +Extract lines ending with [FAIL] (xUnit format). Ignore [OUTPUT] and [PASS] lines. -Deduplicate across all work items. -Return: unique FAIL test names + which work items they appeared in. +Return JSON: { "failures": [{ "test": "Namespace.Class.Method", "workItems": ["item1", "item2"] }] } ``` -**Result:** A clean list of unique failures instead of pages of raw logs. - ## Pattern 2: Finding a Baseline Build -**When:** A test fails on a PR — need to confirm it passes on the PR's target/base branch to prove the failure is PR-caused. - -**Problem:** Requires searching recent merged PRs or target branch CI runs, finding the matching build, locating the right Helix job and work item. Multiple API calls. +**When:** A test fails on a PR — need to confirm it passes on the target branch. **Delegate:** ``` -Find a recent passing build on the PR's target/base branch of dotnet/{REPO} that ran the same test leg as this failing build. +Find a recent passing build on {TARGET_BRANCH} of dotnet/{REPO} that ran the same test leg. 
-Failing build: {BUILD_ID} (PR #{PR_NUMBER}) -Failing job name: {JOB_NAME} (e.g., "TestBuild linux x64") -Failing work item: {WORK_ITEM} (e.g., "dotnet.Tests.dll.19") +Failing build: {BUILD_ID}, job: {JOB_NAME}, work item: {WORK_ITEM} Steps: -1. Use GitHub MCP to find recently merged PRs targeting the same base branch as this PR: +1. Search for recently merged PRs: github-mcp-server-search_pull_requests query:"is:merged base:{TARGET_BRANCH}" owner:dotnet repo:{REPO} -2. Set {TARGET_BRANCH} to the PR's base branch (e.g., main, release/9.0) and pick the most recent merged PR -3. Run the CI script to check its build status: - ./scripts/Get-CIStatus.ps1 -PRNumber {MERGED_PR} -Repository "dotnet/{REPO}" -4. Find the build that passed with the same job name -5. Find the Helix job ID for that job (may need to download build artifacts — see azure-cli.md and binlog-comparison.md for "binlogs to find binlogs") -6. Confirm the matching work item passed - -Return: the passing build ID, Helix job ID, and work item name, or "no recent passing build found". -``` +2. Run: ./scripts/Get-CIStatus.ps1 -PRNumber {MERGED_PR} -Repository "dotnet/{REPO}" +3. Find the build with same job name that passed +4. Locate the Helix job ID (may need artifact download — see azure-cli.md) -## Pattern 3: Narrowing Merge Diffs to Relevant Files +Return JSON: { "found": true, "buildId": N, "helixJob": "...", "workItem": "...", "result": "Pass" } +Or: { "found": false, "reason": "no passing build in last 5 merged PRs" } -**When:** A large merge PR (hundreds of commits, hundreds of changed files) has test failures — need to identify which changes are relevant. +If authentication fails or API returns errors, STOP and return the error — don't troubleshoot. +``` + +## Pattern 3: Extracting Merge PR Changed Files -**Problem:** `git diff` on a 458-file merge is overwhelming. Most changes are unrelated to the specific failure. 
+**When:** A large merge PR (hundreds of files) has test failures — need the file list for the main agent to analyze. **Delegate:** ``` -Given these test failures on merge PR #{PR_NUMBER} (branch: {SOURCE} → {TARGET}): - - {TEST_1} - - {TEST_2} +List all changed files on merge PR #{PR_NUMBER} in dotnet/{REPO}. + +Use: github-mcp-server-pull_request_read method:get_files owner:dotnet repo:{REPO} pullNumber:{PR_NUMBER} + +For each file, note: path, change type (added/modified/deleted), lines changed. + +Return JSON: { "totalFiles": N, "files": [{ "path": "...", "changeType": "modified", "linesChanged": N }] } +``` + +> The main agent decides which files are relevant to the specific failures — don't filter in the subagent. + +## Pattern 4: Parallel Artifact Extraction -Find the changed files most likely to cause these failures. +**When:** Multiple builds or artifacts need independent analysis — binlog comparison, canceled job recovery, multi-build progression. + +**Key insight:** Launch one subagent per build/artifact in parallel. Each does its mechanical extraction independently. The main agent synthesizes results across all of them. + +**Delegate (per build, for binlog analysis):** +``` +Download and analyze binlog from AzDO build {BUILD_ID}, artifact {ARTIFACT_NAME}. Steps: -1. Get the list of changed files: git diff --name-only {TARGET}...{SOURCE} -2. Filter to files matching these patterns (adjust per failure type): - - For MSBuild/build failures: *.targets, *.props, Directory.Build.*, eng/Versions.props - - For test failures: test project files, test assets - - For specific SDK areas: src/Tasks/, src/Cli/, src/WasmSdk/ -3. For each relevant file, show the key diff hunks (not the full diff) -4. Look for version bumps, property changes, or behavioral changes - -Return: the 5-10 most relevant changed files with a one-line summary of what changed in each. +1. Download the artifact (see azure-cli.md) +2. Load: mcp-binlog-tool-load_binlog path:"{BINLOG_PATH}" +3. 
Extract target info: mcp-binlog-tool-search_tasks_by_name taskName:"Csc" +4. Get task parameters: mcp-binlog-tool-get_task_info + +Return JSON: { "buildId": N, "project": "...", "args": ["..."] } ``` -## Pattern 4: Parallel Binlog Extraction +**Delegate (per build, for canceled job recovery):** +``` +Check if canceled job "{JOB_NAME}" from build {BUILD_ID} has recoverable Helix results. -**When:** Comparing two builds — see [binlog-comparison.md](binlog-comparison.md). +Steps: +1. Download the testResults.xml artifact from the canceled job (see azure-cli.md) +2. If available, parse for pass/fail counts and work item status + +Return JSON: { "jobName": "...", "hasResults": true, "passed": N, "failed": N } +Or: { "jobName": "...", "hasResults": false, "reason": "no artifacts" } +``` -**Key insight:** Launch two subagents simultaneously (one per build). Each downloads a binlog, loads it into the MCP server, extracts task parameters, normalizes paths, and returns a sorted arg list. The main agent just diffs the two lists. +This pattern scales to any number of builds — launch N subagents for N builds, collect results, compare. 
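The collect-and-synthesize step of this fan-out can be sketched in Python. This is an illustrative sketch only, not part of the skill: the `subagent_results` payloads are hypothetical examples of the JSON shape the canceled-job-recovery prompt asks each subagent to return.

```python
import json

# Hypothetical per-build subagent replies, following the JSON shape the
# canceled-job-recovery prompt requests. Real replies come from task agents.
subagent_results = [
    '{"jobName": "TestBuild linux x64", "hasResults": true, "passed": 412, "failed": 0}',
    '{"jobName": "TestBuild windows x64", "hasResults": true, "passed": 410, "failed": 2}',
    '{"jobName": "TestBuild osx x64", "hasResults": false, "reason": "no artifacts"}',
]

recovered = [json.loads(r) for r in subagent_results]
with_results = [r for r in recovered if r["hasResults"]]
total_failed = sum(r["failed"] for r in with_results)

# The main agent compares across replies instead of holding raw logs in context.
print(f"{len(with_results)}/{len(recovered)} canceled jobs had recoverable results; "
      f"{total_failed} failing tests total")
```

Because every subagent returns the same structure, the same loop works whether three builds were fanned out or thirty.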
## General Guidelines -- **Use `task` agent type** for all delegation (it has shell + MCP access) -- **Run independent tasks in parallel** (e.g., two binlog extractions) -- **Include the CI script path** in every prompt — subagents don't inherit skill context -- **Ask for structured output** — "return a list of X" not "show me what you find" -- **Don't delegate interpretation** — subagents extract data, main agent interprets meaning +- **Use `task` agent type** — it has shell + MCP access +- **Run independent tasks in parallel** — the whole point of delegation +- **Include script paths** — subagents don't inherit skill context +- **Require structured JSON output** — enables comparison across subagents +- **Don't delegate interpretation** — subagents return facts, main agent reasons +- **STOP on errors** — subagents should return error details immediately, not troubleshoot auth/environment issues +- **Use SQL for many results** — when launching 5+ subagents or doing multi-phase delegation, store results in a SQL table (`CREATE TABLE results (agent_id TEXT, build_id INT, data TEXT, status TEXT)`) so you can query across all results instead of holding them in context From 5a852b135a915c4bb5927ad7f2ebd0dabf7dd52a Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 19:17:31 -0600 Subject: [PATCH 17/44] Trim SKILL.md from ~4.6K to ~4K tokens: condense anti-patterns, merge output format --- .github/skills/ci-analysis/SKILL.md | 54 +++++-------------- .../references/delegation-patterns.md | 6 +-- 2 files changed, 15 insertions(+), 45 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 95fe0c13eda6a4..048e29e8d1ec09 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -112,11 +112,9 @@ The script operates in three distinct modes depending on what information you ha **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. 
These can also match known issues - search for the test name with the "Known Build Error" label. -> ⚠️ **Be cautious labeling failures as "infrastructure."** If Build Analysis didn't flag a failure as a known issue, treat it as potentially real — even if it looks like a device failure, Docker issue, or network timeout. Only conclude "infrastructure" when you have strong evidence (e.g., identical failure on the target branch, Build Analysis match, or confirmed outage). Dismissing failures as transient without evidence delays real bug discovery. +> ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure when you have strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. "Environment" in the error doesn't make it infrastructure — a test requiring an uninstalled framework is a test defect, not infra. -> ❌ **Don't confuse "environment-related" with "infrastructure."** A test that fails because a required framework isn't installed (e.g., .NET 2.2) is a **test defect** — the test has wrong assumptions about what's available. Infrastructure failures are *transient*: network timeouts, Docker pull failures, agent crashes, disk space. If the failure would reproduce 100% of the time on any machine with the same setup, it's a code/test issue, not infra. The word "environment" in the error doesn't make it an infrastructure problem. - -> ❌ **Missing packages on flow PRs are NOT always infrastructure failures.** When a codeflow or dependency-update PR fails with "package not found" or "version not available", don't assume it's a feed propagation delay. Flow PRs bring in behavioral changes from upstream repos that can cause the build to request *different* packages than before. 
Example: an SDK flow changed runtime pack resolution logic, causing builds to look for `Microsoft.NETCore.App.Runtime.browser-wasm` (CoreCLR — doesn't exist) instead of `Microsoft.NETCore.App.Runtime.Mono.browser-wasm` (what had always been used). The fix was in the flowed code, not in feed infrastructure. Always check *which* package is missing and *why* it's being requested before diagnosing as infrastructure. +> ❌ **Missing packages on flow PRs ≠ infrastructure.** Flow PRs bring behavioral changes that can cause builds to request *different* packages. Always check *which* package is missing and *why* before assuming feed propagation delay. ## Generating Recommendations @@ -151,9 +149,14 @@ Then layer in nuance the heuristic can't capture: - **All pipelines**: Comment `/azp run` to retry all failing pipelines - **Helix work items**: Cannot be individually retried — must re-run the entire AzDO build -### Tone +### Tone and output format + +Be direct. Lead with the most important finding. Structure your response as: +1. **Summary verdict** (1-2 sentences) — Is CI green? Failures PR-related? Known issues? +2. **Failure details** (2-4 bullets) — what failed, why, evidence +3. **Recommended actions** (numbered) — retry, fix, investigate. Include `/azp run` commands. -Be direct. Lead with the most important finding. Use 2-4 bullet points, not long paragraphs. Distinguish what's known vs. uncertain. +Synthesize from: JSON summary (structured facts) + human-readable output (details/logs) + Step 0 context (PR type, author intent). ## Analysis Workflow @@ -209,32 +212,6 @@ Before stating a failure's cause, verify your claim: - **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some? - **"Not related to this PR"** → Have you checked if the test passes on the target branch? Don't assume — verify. -## Presenting Results - -The script outputs both human-readable failure details and a `[CI_ANALYSIS_SUMMARY]` JSON block. 
Use both to produce a structured response. - -### Output structure - -Use this format — adapt sections based on what you find: - -**1. Summary verdict** (1-2 sentences) -Lead with the most important finding. Is CI green? Are failures PR-related? Known issues? - -**2. Failure details** (2-4 bullets) -For each distinct failure category, state: what failed, why (known/correlated/unknown), and evidence. - -**3. Recommended actions** (numbered list) -Specific next steps: retry, fix specific files, investigate further. Include `/azp run` commands if retrying. - -### How to synthesize - -1. Read the JSON summary for structured facts (failed jobs, known issues, PR correlation, recommendation hint) -2. Read the human-readable output for failure details, console logs, and error messages -3. Layer in Step 0 context — PR type, author intent, changed files -4. Reason over all three to produce contextual recommendations — the `recommendationHint` is a starting point, not the final answer -5. Look for patterns the heuristic may have missed (e.g., same failure across multiple jobs, related failures in different builds) -6. Present findings with appropriate caveats — state what is known vs. uncertain - ## References - **Helix artifacts & binlogs**: See [references/helix-artifacts.md](references/helix-artifacts.md) @@ -245,18 +222,11 @@ Specific next steps: retry, fix specific files, investigate further. Include `/a - **Manual investigation steps**: See [references/manual-investigation.md](references/manual-investigation.md) - **AzDO/Helix details**: See [references/azdo-helix-reference.md](references/azdo-helix-reference.md) -## Recovering Results from Canceled Jobs - -Canceled jobs (typically from timeouts) often still have useful Helix results. See [references/azure-cli.md](references/azure-cli.md) for artifact download steps and [references/binlog-comparison.md](references/binlog-comparison.md) for the "binlogs to find binlogs" workflow. 
- -**Key insight**: "Canceled" ≠ "Failed". Always check artifacts before concluding results are lost. - ## Tips 1. Check if same test fails on the target branch before assuming transient 2. Look for `[ActiveIssue]` attributes for known skipped tests 3. Use `-SearchMihuBot` for semantic search of related issues -4. Binlogs in artifacts help diagnose MSB4018 task failures -5. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties -6. If checking CI status via `gh pr checks --json`, the valid fields are `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow`. There is **no `conclusion` field** in current `gh` versions — `state` contains `SUCCESS`/`FAILURE` directly -7. When investigating internal AzDO pipelines, use the Azure CLI — see [references/azure-cli.md](references/azure-cli.md). Check `az account show` first to verify authentication +4. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties +5. `gh pr checks --json` valid fields: `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow` — no `conclusion` field, `state` has `SUCCESS`/`FAILURE` directly +6. "Canceled" ≠ "Failed" — canceled jobs may have recoverable Helix results. Check artifacts before concluding results are lost. diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index 184f6816f52861..ee1e6527ed1370 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -73,9 +73,9 @@ Download and analyze binlog from AzDO build {BUILD_ID}, artifact {ARTIFACT_NAME} Steps: 1. Download the artifact (see azure-cli.md) -2. Load: mcp-binlog-tool-load_binlog path:"{BINLOG_PATH}" -3. Extract target info: mcp-binlog-tool-search_tasks_by_name taskName:"Csc" -4. 
Get task parameters: mcp-binlog-tool-get_task_info +2. Load: binlog-load_binlog path:"{BINLOG_PATH}" +3. Find tasks: binlog-search_tasks_by_name taskName:"Csc" +4. Get task parameters: binlog-get_task_info Return JSON: { "buildId": N, "project": "...", "args": ["..."] } ``` From 532a1b725a9b214aece0c03b5881db6f6792fb5e Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 19:31:09 -0600 Subject: [PATCH 18/44] Address review: standardize binlog tool names, document hint priority --- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 0f7e32b34537a4..0c7e248d780a3c 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -2183,6 +2183,8 @@ if ($prChangedFiles.Count -gt 0 -and $allFailuresForCorrelation.Count -gt 0) { } # Compute recommendation hint +# Priority: KNOWN_ISSUES wins over LIKELY_PR_RELATED intentionally. +# When both exist, SKILL.md "Mixed signals" guidance tells the agent to separate them. 
if (-not $lastBuildJobSummary -and $buildIds.Count -gt 0) { $summary.recommendationHint = "REVIEW_REQUIRED" } elseif ($knownIssuesFromBuildAnalysis.Count -gt 0) { From 2a9ffb90fb77dc80000f41213df4f2e0b23a3b86 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 19:54:47 -0600 Subject: [PATCH 19/44] build-progression: document triggerInfo.pr.sourceSha for commit mapping MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - sourceVersion is the merge commit, not the PR head — use pr.sourceSha instead - A PR may have more unique pr.sourceSha values than visible commits (force-pushes) - Updated example table to show grouping by pr.sourceSha --- .../references/build-progression-analysis.md | 37 ++++++++++++------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index cf2d904fd27777..1a7d4bf475ea09 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -21,28 +21,37 @@ az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 --org $org -p $p --query "[].{id:id, result:result, sourceVersion:sourceVersion, finishTime:finishTime}" -o table ``` -### Step 2: Map builds to commits +### Step 2: Map builds to the PR's head commit -For each build, identify the source commit and use `git log` to see what was included: +Each build's `triggerInfo` contains `pr.sourceSha` — the PR's HEAD commit when the build was triggered. 
This is the key for mapping builds to commits: ```powershell -# Get source version for a build -az pipelines runs show --id $buildId --org $org -p $project ` - --query "{id:id, result:result, sourceVersion:sourceVersion}" -o json - -# See what commits are between two source versions -# git log --oneline $passingCommit..$failingCommit +# Get full build details including triggerInfo +$allBuilds = az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 ` + --org $org -p $project -o json | ConvertFrom-Json + +# Extract PR head commit for each build +foreach ($build in $allBuilds) { + $prHead = $build.triggerInfo.'pr.sourceSha' + $short = if ($prHead) { $prHead.Substring(0,7) } else { "n/a" } + Write-Host "Build $($build.id): $($build.result) — PR HEAD: $short" +} ``` +> ⚠️ **`sourceVersion` is the merge commit**, not the PR's head commit. Use `triggerInfo.'pr.sourceSha'` instead. + +Note: a PR may have more unique `pr.sourceSha` values than commits visible on GitHub, because force-pushes replace the commit history. Each force-push triggers a new build with a new merge commit and a new `pr.sourceSha`. + ### Step 3: Build a progression table -Present the facts as a table: +Present the facts as a table. 
Group builds by `pr.sourceSha` since multiple pipelines run per push: -| Build | Source commit | Result | What changed since previous build | -|-------|-------------|--------|----------------------------------| -| 1284433 | abc123 | ✅ 9/9 | Initial PR commits | -| 1286087 | def456 | ❌ 7/9 | Added commit C | -| 1286967 | ghi789 | ❌ 7/9 | Modified commit C | +| PR HEAD | Builds | Result | Notes | +|---------|--------|--------|-------| +| 6d499c2 | 1283943-5 | ✅ 3/3 | Initial commit | +| 39dc0a6 | 1284433-5 | ✅ 3/3 | Added commit B | +| f186b93 | 1286087-9 | ❌ 1/3 | Added commit C | +| 2e74845 | 1286967-9 | ❌ 1/3 | Modified commit C | ### Step 4: Present findings, not conclusions From 18d9331ec8977481cb1370cab097d4f0aa7218f1 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 20:19:22 -0600 Subject: [PATCH 20/44] Add per-failure details, Python error patterns, and log tail fallback - Add Python/Traceback/timeout patterns to Format-TestFailure - Show tail of log in PR mode when no failure pattern matches (Helix Job mode already did this; PR mode silently swallowed) - Add failedJobDetails to JSON summary with per-job errorCategory: test-failure, build-error, test-timeout, crash, tests-passed-reporter-failed, unclassified - tests-passed-reporter-failed: detects when all tests passed but Helix work item failed due to post-test infrastructure crash (e.g., Python ModuleNotFoundError in result reporter) - Update SKILL.md: document failedJobDetails, error categories, clarify POSSIBLY_TRANSIENT means unclassified not transient --- .github/skills/ci-analysis/SKILL.md | 15 +++++- .../ci-analysis/scripts/Get-CIStatus.ps1 | 52 ++++++++++++++++++- 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 048e29e8d1ec09..1f61e982e7ca23 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -112,7 +112,17 @@ The script operates in three distinct 
modes depending on what information you ha **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. These can also match known issues - search for the test name with the "Known Build Error" label. -> ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure when you have strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. "Environment" in the error doesn't make it infrastructure — a test requiring an uninstalled framework is a test defect, not infra. +**Per-failure details** (`failedJobDetails` in JSON): Each failed job includes `errorCategory`, `errorSnippet`, `helixWorkItems`, and `knownIssues`. Use these for per-job classification instead of applying a single `recommendationHint` to all failures. + +Error categories: +- `test-failure` — test assertions failed +- `build-error` — compilation/build errors +- `test-timeout` — Helix work item timed out +- `crash` — process crash (SIGSEGV/SIGABRT exit codes 139/134) +- `tests-passed-reporter-failed` — all tests passed but the Helix work item still failed because post-test infrastructure (e.g., Python result reporter) crashed. This shows as a failed job in CI even though no tests failed. This is genuinely infrastructure. +- `unclassified` — no pattern matched; investigate manually + +> ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure when you have strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. "Environment" in the error doesn't make it infrastructure — a test requiring an uninstalled framework is a test defect, not infra. Exception: `tests-passed-reporter-failed` is genuinely infrastructure. > ❌ **Missing packages on flow PRs ≠ infrastructure.** Flow PRs bring behavioral changes that can cause builds to request *different* packages. Always check *which* package is missing and *why* before assuming feed propagation delay. 
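As a rough illustration of how these categories are assigned, here is a hypothetical Python analogue of the log-pattern matching; the regexes mirror the documented categories, but the actual classifier is the PowerShell in `Get-CIStatus.ps1`.

```python
import re

# Hypothetical sketch of per-job categorization. failure_info is the extracted
# failure text; full_log is the whole Helix console log for the work item.
def categorize(failure_info: str, full_log: str) -> str:
    if re.search(r"Timed Out \(timeout", failure_info):
        return "test-timeout"
    if re.search(r"Exit Code:\s*(139|134)", failure_info) or "createdump" in failure_info:
        return "crash"
    # Reporter crashed after the tests: Python traceback in the failure text,
    # but the log shows zero test failures.
    if re.search(r"Traceback \(most recent call last\)", failure_info) and \
       re.search(r"Tests run:.*Failures:\s*0", full_log):
        return "tests-passed-reporter-failed"
    return "test-failure"

print(categorize("Exit Code: 139", ""))  # crash
```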
@@ -129,13 +139,14 @@ Read `recommendationHint` as a starting point, then layer in context: | `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. | | `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. | | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. | -| `POSSIBLY_TRANSIENT` | No correlation with PR changes, no known issues. Suggest checking the target branch, searching for issues, or retrying. | +| `POSSIBLY_TRANSIENT` | Failures could not be automatically classified — does NOT mean they are transient. Use `failedJobDetails` to investigate each failure individually. | | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. | | `MERGE_CONFLICTS` | PR has merge conflicts — CI won't run. Tell the user to resolve conflicts. Offer to analyze a previous build by ID. | | `NO_BUILDS` | No AzDO builds found (CI not triggered). Offer to check if CI needs to be triggered or analyze a previous build. | Then layer in nuance the heuristic can't capture: +- **Use `failedJobDetails`**: When present, classify each job individually by its `errorCategory` rather than applying the `recommendationHint` uniformly. Different jobs may have different causes. - **Mixed signals**: Some failures match known issues AND some correlate with PR changes → separate them. Known issues = safe to retry; correlated = fix first. - **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs"). - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear. 
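The synthesis flow above starts by locating the `[CI_ANALYSIS_SUMMARY]` block in the script's stdout and parsing it. A minimal Python sketch, using a hypothetical output excerpt (the field values here are examples, not real script output):

```python
import json
import re

# Hypothetical excerpt of Get-CIStatus.ps1 output: human-readable details
# followed by the machine-readable summary between the two markers.
output = """
=== Build 1286087 ===
--- TestBuild linux x64 ---
[CI_ANALYSIS_SUMMARY]
{"totalFailedJobs": 1,
 "failedJobNames": ["TestBuild linux x64"],
 "recommendationHint": "LIKELY_PR_RELATED",
 "prCorrelation": {"hasCorrelation": true, "correlatedFiles": ["src/Foo.cs"]}}
[/CI_ANALYSIS_SUMMARY]
"""

m = re.search(r"\[CI_ANALYSIS_SUMMARY\](.*?)\[/CI_ANALYSIS_SUMMARY\]", output, re.S)
summary = json.loads(m.group(1))

print(summary["recommendationHint"])                 # LIKELY_PR_RELATED
print(summary["prCorrelation"]["correlatedFiles"])   # ['src/Foo.cs']
```

The parsed object gives the agent structured facts; the hint is a starting point for reasoning, not the final recommendation.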
diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 0c7e248d780a3c..bb5a6f5b843338 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1478,7 +1478,10 @@ function Format-TestFailure { 'BUG:', 'FAILED\s*$', 'END EXECUTION - FAILED', - 'System\.\w+Exception:' + 'System\.\w+Exception:', + 'Traceback \(most recent call last\)', + '\w+Error:', + 'Timed Out \(timeout' ) $combinedPattern = ($failureStartPatterns -join '|') @@ -1770,6 +1773,7 @@ try { $allFailuresForCorrelation = @() $allFailedJobNames = @() $allCanceledJobNames = @() + $allFailedJobDetails = @() $lastBuildJobSummary = $null foreach ($currentBuildId in $buildIds) { @@ -1957,6 +1961,15 @@ try { Write-Host "`n--- $($job.name) ---" -ForegroundColor Cyan Write-Host " Build: https://dev.azure.com/$Organization/$Project/_build/results?buildId=$currentBuildId&view=logs&j=$($job.id)" -ForegroundColor Gray + # Track per-job failure details for JSON summary + $jobDetail = [ordered]@{ + jobName = $job.name + errorSnippet = "" + helixWorkItems = @() + knownIssues = @() + errorCategory = "unclassified" + } + # Get Helix tasks for this job $helixTasks = Get-HelixJobInfo -Timeline $timeline -JobId $job.id @@ -1984,6 +1997,8 @@ try { HelixLogs = @() FailedTests = $failures | ForEach-Object { $_.TestName } } + $jobDetail.errorCategory = "test-failure" + $jobDetail.errorSnippet = ($failures | Select-Object -First 3 | ForEach-Object { $_.TestName }) -join "; " } # Extract and optionally fetch Helix URLs @@ -1999,6 +2014,7 @@ try { $workItemName = "" if ($url -match '/workitems/([^/]+)/console') { $workItemName = $Matches[1] + $jobDetail.helixWorkItems += $workItemName } $helixLog = Get-HelixConsoleLog -Url $url @@ -2007,9 +2023,36 @@ try { if ($failureInfo) { Write-Host $failureInfo -ForegroundColor White + # Categorize failure from log content + if ($failureInfo -match 'Timed 
Out \(timeout') { + $jobDetail.errorCategory = "test-timeout" + } elseif ($failureInfo -match 'Exit Code:\s*(139|134)' -or $failureInfo -match 'createdump') { + $jobDetail.errorCategory = "crash" + } elseif ($failureInfo -match 'Traceback \(most recent call last\)' -and $helixLog -match 'Tests run:.*Failures:\s*0') { + # Work item failed (non-zero exit from reporter crash) but all tests passed. + # The Python traceback is from Helix infrastructure, not from the test itself. + $jobDetail.errorCategory = "tests-passed-reporter-failed" + } elseif ($jobDetail.errorCategory -eq "unclassified") { + $jobDetail.errorCategory = "test-failure" + } + if (-not $jobDetail.errorSnippet) { + $jobDetail.errorSnippet = $failureInfo.Substring(0, [Math]::Min(200, $failureInfo.Length)) + } + # Search for known issues Show-KnownIssues -TestName $workItemName -ErrorMessage $failureInfo -IncludeMihuBot:$SearchMihuBot } + else { + # No failure pattern matched — show tail of log + $lines = $helixLog -split "`n" + $lastLines = $lines | Select-Object -Last 20 + $tailText = $lastLines -join "`n" + Write-Host $tailText -ForegroundColor White + if (-not $jobDetail.errorSnippet) { + $jobDetail.errorSnippet = $tailText.Substring(0, [Math]::Min(200, $tailText.Length)) + } + Show-KnownIssues -TestName $workItemName -ErrorMessage $tailText -IncludeMihuBot:$SearchMihuBot + } } } } @@ -2050,6 +2093,11 @@ try { HelixLogs = @() FailedTests = @() } + $jobDetail.errorCategory = "build-error" + if (-not $jobDetail.errorSnippet) { + $snippet = ($buildErrors | Select-Object -First 2) -join "; " + $jobDetail.errorSnippet = $snippet.Substring(0, [Math]::Min(200, $snippet.Length)) + } # Extract Helix log URLs from the full log content $helixLogUrls = Extract-HelixLogUrls -LogContent $logContent @@ -2085,6 +2133,7 @@ try { } } + $allFailedJobDetails += $jobDetail $processedJobs++ } catch { @@ -2162,6 +2211,7 @@ $summary = [ordered]@{ total = 0; succeeded = 0; failed = 0; canceled = 0; pending = 0; warnings = 0; 
skipped = 0 } } failedJobNames = @($allFailedJobNames) + failedJobDetails = @($allFailedJobDetails) canceledJobNames = @($allCanceledJobNames) knownIssues = @($knownIssuesFromBuildAnalysis | ForEach-Object { [ordered]@{ number = $_.Number; title = $_.Title; url = $_.Url } From f81cab5a450a5130dbede88cc0bfc8324284c7c5 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 20:27:17 -0600 Subject: [PATCH 21/44] Remove overly broad \w+Error: and Traceback failure-start patterns These patterns grab Python harness/reporter noise from Helix logs and swamp real .NET test failures. Traceback detection is still used in the errorCategory classifier (tests-passed-reporter-failed) where it's cross-referenced with 'Failures: 0' to avoid false positives. --- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index bb5a6f5b843338..ed779e15e1cf46 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1471,6 +1471,8 @@ function Format-TestFailure { $failureCount = 0 # Expanded failure detection patterns + # CAUTION: These trigger "failure block" capture. Overly broad patterns (e.g. \w+Error:) + # will grab Python harness/reporter noise and swamp the real test failure. 
$failureStartPatterns = @( '\[FAIL\]', 'Assert\.\w+\(\)\s+Failure', @@ -1479,8 +1481,6 @@ function Format-TestFailure { 'FAILED\s*$', 'END EXECUTION - FAILED', 'System\.\w+Exception:', - 'Traceback \(most recent call last\)', - '\w+Error:', 'Timed Out \(timeout' ) $combinedPattern = ($failureStartPatterns -join '|') From d26014f2335967d78d83bf6cca6db0848e54f005 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Tue, 10 Feb 2026 20:41:08 -0600 Subject: [PATCH 22/44] build-progression: warn that target branch moves between builds Each AzDO PR build merges pr.sourceSha into the target branch HEAD at build start time. If main moved between builds, a pass->fail transition may be caused by the new baseline, not the PR commit. --- .../ci-analysis/references/build-progression-analysis.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index 1a7d4bf475ea09..c3577be5f009f7 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -40,11 +40,13 @@ foreach ($build in $allBuilds) { > ⚠️ **`sourceVersion` is the merge commit**, not the PR's head commit. Use `triggerInfo.'pr.sourceSha'` instead. +> ⚠️ **Target branch moves between builds.** Each build merges `pr.sourceSha` into the target branch HEAD *at the time the build starts*. If `main` received new commits between build N and N+1, the two builds merged against different baselines — even if `pr.sourceSha` is the same. When the progression shows a transition from pass → fail, always check whether the target branch moved. Compare `sourceVersion` (the merge commit) parents, or check the target branch commit log for the time window between builds. 
+ Note: a PR may have more unique `pr.sourceSha` values than commits visible on GitHub, because force-pushes replace the commit history. Each force-push triggers a new build with a new merge commit and a new `pr.sourceSha`. ### Step 3: Build a progression table -Present the facts as a table. Group builds by `pr.sourceSha` since multiple pipelines run per push: +Present the facts as a table. Group builds by `pr.sourceSha` since multiple pipelines run per push. Include the target branch HEAD if available (from `sourceVersion` merge parents or commit timestamps) to catch baseline shifts: | PR HEAD | Builds | Result | Notes | |---------|--------|--------|-------| @@ -53,6 +55,8 @@ Present the facts as a table. Group builds by `pr.sourceSha` since multiple pipe | f186b93 | 1286087-9 | ❌ 1/3 | Added commit C | | 2e74845 | 1286967-9 | ❌ 1/3 | Modified commit C | +If a pass→fail transition has the **same** `pr.sourceSha`, the target branch moved — the PR didn't change, so the failure came from the new baseline. Check what merged into the target branch between those builds. + ### Step 4: Present findings, not conclusions Report what the progression shows: @@ -90,7 +94,7 @@ Both techniques compare a failing build against a passing one: | **Target-branch comparison** | Recent build on the base branch (e.g., main) | "Does this test pass without the PR's changes at all?" | | **Build progression** | Earlier build on the same PR | "Did this test pass with the PR's *earlier* changes?" | -Use target-branch comparison first to confirm the failure is PR-related. Use build progression to narrow down *which part* of the PR introduced it. +Use target-branch comparison first to confirm the failure is PR-related. Use build progression to narrow down *which part* of the PR introduced it. If build progression shows a pass→fail transition with the same `pr.sourceSha`, the target branch is the more likely culprit — use target-branch comparison to confirm. 
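The same-`pr.sourceSha` check described above reduces to a pairwise scan; a minimal sketch, assuming a chronologically ordered build list with `id`, `result`, and `sourceSha` fields (illustrative shapes, not an API):

```python
def baseline_shift_suspects(builds):
    """Flag pass->fail transitions where pr.sourceSha did not change.

    builds: chronological list of dicts with 'id', 'result', 'sourceSha'.
    An unchanged sourceSha across a pass->fail transition means the PR
    itself did not change, so the moving target branch is the suspect.
    """
    suspects = []
    for prev, cur in zip(builds, builds[1:]):
        same_sha = prev["sourceSha"] == cur["sourceSha"]
        if prev["result"] == "succeeded" and cur["result"] == "failed" and same_sha:
            suspects.append((prev["id"], cur["id"]))
    return suspects
```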
## Anti-Patterns From dab10c660ecea0d67c85438ad47a4c3de5b18d98 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 09:23:00 -0600 Subject: [PATCH 23/44] ci-analysis: polish SKILL.md, add target HEAD tracking to build progression - SKILL.md: condense error categories, trim redundant guidance - build-progression-analysis.md: add Step 0 (start recent), Step 2b (extract target branch HEAD from checkout logs), add Target HEAD column to progression table - Get-CIStatus.ps1: remove unused knownIssues init in failedJobDetail --- .github/skills/ci-analysis/SKILL.md | 18 ++----- .../references/build-progression-analysis.md | 53 +++++++++++++++---- .../ci-analysis/scripts/Get-CIStatus.ps1 | 1 - 3 files changed, 48 insertions(+), 24 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 1f61e982e7ca23..0dc907b1de2909 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -112,19 +112,13 @@ The script operates in three distinct modes depending on what information you ha **Local test failures**: Some repos (e.g., dotnet/sdk) run tests directly on build agents. These can also match known issues - search for the test name with the "Known Build Error" label. -**Per-failure details** (`failedJobDetails` in JSON): Each failed job includes `errorCategory`, `errorSnippet`, `helixWorkItems`, and `knownIssues`. Use these for per-job classification instead of applying a single `recommendationHint` to all failures. +**Per-failure details** (`failedJobDetails` in JSON): Each failed job includes `errorCategory`, `errorSnippet`, and `helixWorkItems`. Use these for per-job classification instead of applying a single `recommendationHint` to all failures. 
-Error categories: -- `test-failure` — test assertions failed -- `build-error` — compilation/build errors -- `test-timeout` — Helix work item timed out -- `crash` — process crash (SIGSEGV/SIGABRT exit codes 139/134) -- `tests-passed-reporter-failed` — all tests passed but the Helix work item still failed because post-test infrastructure (e.g., Python result reporter) crashed. This shows as a failed job in CI even though no tests failed. This is genuinely infrastructure. -- `unclassified` — no pattern matched; investigate manually +Error categories: `test-failure`, `build-error`, `test-timeout`, `crash` (exit codes 139/134), `tests-passed-reporter-failed` (all tests passed but reporter crashed — genuinely infrastructure), `unclassified` (investigate manually). -> ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure when you have strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. "Environment" in the error doesn't make it infrastructure — a test requiring an uninstalled framework is a test defect, not infra. Exception: `tests-passed-reporter-failed` is genuinely infrastructure. +> ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure with strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. Exception: `tests-passed-reporter-failed` is genuinely infrastructure. -> ❌ **Missing packages on flow PRs ≠ infrastructure.** Flow PRs bring behavioral changes that can cause builds to request *different* packages. Always check *which* package is missing and *why* before assuming feed propagation delay. +> ❌ **Missing packages on flow PRs ≠ infrastructure.** Flow PRs can cause builds to request *different* packages. Check *which* package and *why* before assuming feed delay. 
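Per-job classification over `failedJobDetails` might look like the following sketch; the category keys are the ones listed above, but the advice strings are paraphrases of this section, not script output:

```python
def per_job_advice(failed_job_details):
    """Map each failed job's errorCategory to a triage note.

    Categories come from the failedJobDetails schema; unknown values
    fall through to manual investigation rather than being guessed.
    """
    advice = {
        "test-failure": "inspect failing tests; correlate with PR changes",
        "build-error": "read errorSnippet; usually PR-related",
        "test-timeout": "check the Helix work item for hangs or machine load",
        "crash": "look for dump artifacts (exit codes 139/134)",
        "tests-passed-reporter-failed": "infrastructure; safe to retry",
        "unclassified": "investigate manually",
    }
    return [
        (job["jobName"], advice.get(job["errorCategory"], "investigate manually"))
        for job in failed_job_details
    ]
```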
## Generating Recommendations @@ -146,13 +140,11 @@ Read `recommendationHint` as a starting point, then layer in context: Then layer in nuance the heuristic can't capture: -- **Use `failedJobDetails`**: When present, classify each job individually by its `errorCategory` rather than applying the `recommendationHint` uniformly. Different jobs may have different causes. - **Mixed signals**: Some failures match known issues AND some correlate with PR changes → separate them. Known issues = safe to retry; correlated = fix first. - **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs"). - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear. - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count. -- **BuildId mode**: `knownIssues` will be empty and `prCorrelation` will show `hasCorrelation = false` with `changedFileCount = 0` (PR correlation is not available without a PR number). Don't say "no known issues" or "no correlation" — say "Build Analysis and PR correlation not available in BuildId mode." -- **Infrastructure vs code**: Don't label failures as "infrastructure" unless Build Analysis flagged them or the same test passes on the target branch. See the anti-patterns in "Interpreting Results" above. +- **BuildId mode**: `knownIssues` and `prCorrelation` won't be populated. Say "Build Analysis and PR correlation not available in BuildId mode." 
### How to Retry diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index c3577be5f009f7..6062de9672e657 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -10,7 +10,13 @@ When the current build is failing, the PR's build history can reveal whether the ## The Pattern -### Step 1: List all builds for the PR +### Step 0: Start with the recent builds + +Don't try to analyze the full build history upfront — especially on large PRs with many pushes. Start with the most recent N builds (5-8), present the progression table, and let the user decide whether to dig deeper into earlier builds. + +On large PRs, the user is usually iterating toward a solution. The recent builds are the most relevant. Offer: "Here are the last N builds — the pass→fail transition was between X and Y. Want me to look at earlier builds?" + +### Step 1: List builds for the PR `gh pr checks` only shows checks for the current HEAD SHA. To see the full build history, query AzDO: @@ -40,22 +46,49 @@ foreach ($build in $allBuilds) { > ⚠️ **`sourceVersion` is the merge commit**, not the PR's head commit. Use `triggerInfo.'pr.sourceSha'` instead. -> ⚠️ **Target branch moves between builds.** Each build merges `pr.sourceSha` into the target branch HEAD *at the time the build starts*. If `main` received new commits between build N and N+1, the two builds merged against different baselines — even if `pr.sourceSha` is the same. When the progression shows a transition from pass → fail, always check whether the target branch moved. Compare `sourceVersion` (the merge commit) parents, or check the target branch commit log for the time window between builds. +> ⚠️ **Target branch moves between builds.** Each build merges `pr.sourceSha` into the target branch HEAD *at the time the build starts*. 
If `main` received new commits between build N and N+1, the two builds merged against different baselines — even if `pr.sourceSha` is the same. Always extract the target branch HEAD to detect baseline shifts. + +### Step 2b: Extract the target branch HEAD from checkout logs + +The AzDO build API doesn't expose the target branch SHA, but the checkout task log contains it. The first checkout log (typically log ID 5) ends with: + +``` +HEAD is now at {mergeCommit} Merge {prSourceSha} into {targetBranchHead} +``` + +Extract it programmatically: + +```powershell +$token = az account get-access-token --resource "499b84ac-1321-427f-aa17-267ca6975798" --query accessToken -o tsv +$headers = @{ Authorization = "Bearer $token" } + +foreach ($build in $allBuilds) { + $logUrl = "https://dev.azure.com/{org}/{project}/_apis/build/builds/$($build.id)/logs/5" + $log = Invoke-RestMethod -Uri $logUrl -Headers $headers + $mergeLine = ($log -split "`n") | Where-Object { $_ -match 'Merge \w+ into \w+' } | Select-Object -First 1 + if ($mergeLine -match 'Merge (\w+) into (\w+)') { + $targetHead = $Matches[2].Substring(0, 7) + } +} +``` + +> Note: log ID 5 is the first checkout task in most pipelines. If it doesn't contain the merge line, check the build timeline for tasks named "Checkout" and use their log IDs. Note: a PR may have more unique `pr.sourceSha` values than commits visible on GitHub, because force-pushes replace the commit history. Each force-push triggers a new build with a new merge commit and a new `pr.sourceSha`. ### Step 3: Build a progression table -Present the facts as a table. Group builds by `pr.sourceSha` since multiple pipelines run per push. 
Include the target branch HEAD if available (from `sourceVersion` merge parents or commit timestamps) to catch baseline shifts: +Include the target branch HEAD to catch baseline shifts: -| PR HEAD | Builds | Result | Notes | -|---------|--------|--------|-------| -| 6d499c2 | 1283943-5 | ✅ 3/3 | Initial commit | -| 39dc0a6 | 1284433-5 | ✅ 3/3 | Added commit B | -| f186b93 | 1286087-9 | ❌ 1/3 | Added commit C | -| 2e74845 | 1286967-9 | ❌ 1/3 | Modified commit C | +| PR HEAD | Target HEAD | Builds | Result | Notes | +|---------|-------------|--------|--------|-------| +| 7af79ad | 2d638dc | 1283986 | ❌ | Initial commits | +| 28ec8a0 | 0b691ba | 1284169 | ❌ | Iteration 2 | +| 39dc0a6 | 18a3069 | 1284433 | ✅ | Iteration 3 | +| f186b93 | 5709f35 | 1286087 | ❌ | Added commit C; target moved ~35 commits | +| 2e74845 | 482d8f9 | 1286967 | ❌ | Modified commit C | -If a pass→fail transition has the **same** `pr.sourceSha`, the target branch moved — the PR didn't change, so the failure came from the new baseline. Check what merged into the target branch between those builds. +When both `pr.sourceSha` AND `Target HEAD` change between a pass→fail transition, either could be the cause. Analyze the failure content to determine which. If only the target moved (same `pr.sourceSha`), the failure came from the new baseline. 
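The only-target-moved vs. both-moved distinction in the table can be written as a tiny classifier; a sketch assuming per-build records with `result`, `prHead`, and `targetHead` (illustrative field names matching the table columns):

```python
def classify_transition(prev, cur):
    """Classify a pass->fail transition by which side of the merge moved.

    Returns None when there is no pass->fail transition to explain.
    """
    if not (prev["result"] == "succeeded" and cur["result"] == "failed"):
        return None
    pr_moved = prev["prHead"] != cur["prHead"]
    target_moved = prev["targetHead"] != cur["targetHead"]
    if pr_moved and target_moved:
        return "both-moved"      # analyze failure content to decide
    if target_moved:
        return "target-moved"    # new baseline is the suspect
    if pr_moved:
        return "pr-moved"        # the PR change is the suspect
    return "neither-moved"       # flaky/transient candidate
```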
### Step 4: Present findings, not conclusions diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index ed779e15e1cf46..97f876251417e2 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1966,7 +1966,6 @@ try { jobName = $job.name errorSnippet = "" helixWorkItems = @() - knownIssues = @() errorCategory = "unclassified" } From 41300531cfc36797c6e3543cfc2eaaae1edc565c Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 10:26:17 -0600 Subject: [PATCH 24/44] ci-analysis: MCP-first patterns from live analysis - Timeout recovery: explicit workflow for verifying timed-out builds have passing Helix results (learned from PR 124125 build 1284169) - Build progression: MCP tools (get_builds, get_build_log_by_id) as primary, az CLI as fallback - Delegation: Pattern 5 for parallel target HEAD extraction, Pattern 1 uses hlx_logs MCP instead of script - Tighten canceled job guidance with actionable verification steps --- .github/skills/ci-analysis/SKILL.md | 4 +- .../references/build-progression-analysis.md | 60 +++++++++---------- .../references/delegation-patterns.md | 22 ++++++- 3 files changed, 53 insertions(+), 33 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 0dc907b1de2909..280a03e7d77f98 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -100,9 +100,9 @@ The script operates in three distinct modes depending on what information you ha **Known Issues section**: Failures matching existing GitHub issues - these are tracked and being investigated. -**Canceled jobs**: Jobs that were canceled (not failed) due to earlier stage failures or timeouts. Dependency-canceled jobs (canceled because an earlier stage failed) don't need investigation. 
Timeout-canceled jobs may still have recoverable Helix results — see "Recovering Results from Canceled Jobs" below. +**Canceled/timed-out jobs**: Jobs canceled due to earlier stage failures or AzDO timeouts. Dependency-canceled jobs don't need investigation. **Timeout-canceled jobs may have all-passing Helix results** — the "failure" is just the AzDO job wrapper timing out, not actual test failures. To verify: use `hlx_status` on each Helix job in the timed-out build. If all work items passed, the build effectively passed. -> ❌ **Don't dismiss canceled jobs.** Timeout-canceled jobs may have passing Helix results that prove the "failure" was just an AzDO timeout wrapper issue. +> ❌ **Don't dismiss timed-out builds.** A build marked "failed" due to a 3-hour AzDO timeout can have 100% passing Helix work items. Check before concluding it failed. **PR Change Correlation**: Files changed by PR appearing in failures - likely PR-related. diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index 6062de9672e657..eadd241d9682bc 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -18,31 +18,29 @@ On large PRs, the user is usually iterating toward a solution. The recent builds ### Step 1: List builds for the PR -`gh pr checks` only shows checks for the current HEAD SHA. To see the full build history, query AzDO: +`gh pr checks` only shows checks for the current HEAD SHA. To see the full build history, use AzDO MCP or CLI: +**With AzDO MCP (preferred):** +``` +azure-devops-pipelines_get_builds with: + project: "public" + branchName: "refs/pull/{PR}/merge" + top: 20 + queryOrder: "QueueTimeDescending" +``` + +The response includes `triggerInfo` with `pr.sourceSha` — the PR's HEAD commit for each build. 
+ +**Without MCP (fallback):** ```powershell $org = "https://dev.azure.com/dnceng-public" $project = "public" -az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 --org $org -p $project ` - --query "[].{id:id, result:result, sourceVersion:sourceVersion, finishTime:finishTime}" -o table +az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 --org $org -p $project -o json ``` ### Step 2: Map builds to the PR's head commit -Each build's `triggerInfo` contains `pr.sourceSha` — the PR's HEAD commit when the build was triggered. This is the key for mapping builds to commits: - -```powershell -# Get full build details including triggerInfo -$allBuilds = az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 ` - --org $org -p $project -o json | ConvertFrom-Json - -# Extract PR head commit for each build -foreach ($build in $allBuilds) { - $prHead = $build.triggerInfo.'pr.sourceSha' - $short = if ($prHead) { $prHead.Substring(0,7) } else { "n/a" } - Write-Host "Build $($build.id): $($build.result) — PR HEAD: $short" -} -``` +Each build's `triggerInfo` contains `pr.sourceSha` — the PR's HEAD commit when the build was triggered. Extract it from the `get_builds` response or the `az` JSON output. > ⚠️ **`sourceVersion` is the merge commit**, not the PR's head commit. Use `triggerInfo.'pr.sourceSha'` instead. @@ -50,29 +48,31 @@ foreach ($build in $allBuilds) { ### Step 2b: Extract the target branch HEAD from checkout logs -The AzDO build API doesn't expose the target branch SHA, but the checkout task log contains it. The first checkout log (typically log ID 5) ends with: +The AzDO build API doesn't expose the target branch SHA. Extract it from the checkout task log. 
+**With AzDO MCP (preferred):** ``` -HEAD is now at {mergeCommit} Merge {prSourceSha} into {targetBranchHead} +azure-devops-pipelines_get_build_log_by_id with: + project: "public" + buildId: {BUILD_ID} + logId: 5 + startLine: 500 ``` -Extract it programmatically: +Search the output for the merge line: +``` +HEAD is now at {mergeCommit} Merge {prSourceSha} into {targetBranchHead} +``` +**Without MCP (fallback):** ```powershell $token = az account get-access-token --resource "499b84ac-1321-427f-aa17-267ca6975798" --query accessToken -o tsv $headers = @{ Authorization = "Bearer $token" } - -foreach ($build in $allBuilds) { - $logUrl = "https://dev.azure.com/{org}/{project}/_apis/build/builds/$($build.id)/logs/5" - $log = Invoke-RestMethod -Uri $logUrl -Headers $headers - $mergeLine = ($log -split "`n") | Where-Object { $_ -match 'Merge \w+ into \w+' } | Select-Object -First 1 - if ($mergeLine -match 'Merge (\w+) into (\w+)') { - $targetHead = $Matches[2].Substring(0, 7) - } -} +$logUrl = "https://dev.azure.com/{org}/{project}/_apis/build/builds/{BUILD_ID}/logs/5" +$log = Invoke-RestMethod -Uri $logUrl -Headers $headers ``` -> Note: log ID 5 is the first checkout task in most pipelines. If it doesn't contain the merge line, check the build timeline for tasks named "Checkout" and use their log IDs. +> Note: log ID 5 is the first checkout task in most pipelines. The merge line is typically around line 500-650. If log 5 doesn't contain it, check the build timeline for "Checkout" tasks. Note: a PR may have more unique `pr.sourceSha` values than commits visible on GitHub, because force-pushes replace the commit history. Each force-push triggers a new build with a new merge commit and a new `pr.sourceSha`. 
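The merge-line parse is a single regex; this sketch narrows the `\w+` pattern above to hex SHAs and returns 7-char short forms (the exact line shape is an assumption; adjust if your pipeline's checkout log prints differently):

```python
import re

MERGE_RE = re.compile(r"HEAD is now at ([0-9a-f]+) Merge ([0-9a-f]+) into ([0-9a-f]+)")

def parse_checkout_log(log_text):
    """Return (mergeCommit, prSourceSha, targetHead) short SHAs, or None.

    Returning None (rather than guessing) lets the caller report
    'merge line not found' as in the Pattern 5 subagent contract.
    """
    m = MERGE_RE.search(log_text)
    if not m:
        return None
    merge, pr_sha, target = (s[:7] for s in m.groups())
    return merge, pr_sha, target
```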
diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index ee1e6527ed1370..1a363de9af33e8 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -13,7 +13,8 @@ Extract all unique test failures from these Helix work items: Job: {JOB_ID_1}, Work items: {ITEM_1}, {ITEM_2} Job: {JOB_ID_2}, Work items: {ITEM_3} -For each, run: +For each, use hlx_logs with jobId and workItem to get console output. +If hlx MCP is not available, fall back to: ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}" Extract lines ending with [FAIL] (xUnit format). Ignore [OUTPUT] and [PASS] lines. @@ -94,6 +95,25 @@ Or: { "jobName": "...", "hasResults": false, "reason": "no artifacts" } This pattern scales to any number of builds — launch N subagents for N builds, collect results, compare. +## Pattern 5: Build Progression with Target HEAD Extraction + +**When:** PR has multiple builds and you need the full progression table with target branch HEADs. + +**Delegate (one subagent per build):** +``` +Extract the target branch HEAD from AzDO build {BUILD_ID}. + +Use azure-devops-pipelines_get_build_log_by_id with: + project: "public", buildId: {BUILD_ID}, logId: 5, startLine: 500 + +Search for: "HEAD is now at {mergeCommit} Merge {prSourceSha} into {targetBranchHead}" + +Return JSON: { "buildId": N, "targetHead": "abc1234", "mergeCommit": "def5678" } +Or: { "buildId": N, "targetHead": null, "error": "merge line not found in log 5" } +``` + +Launch one per build in parallel. The main agent combines with `get_builds` results to build the full progression table. 
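Combining the Pattern 5 subagent replies with the `get_builds` rows is a simple join; a sketch in which a build with no subagent reply keeps `targetHead = None` rather than a guessed value (field names follow the Pattern 5 JSON contract above):

```python
def merge_progression(builds, subagent_results):
    """Join get_builds rows with per-build subagent JSON replies.

    builds: list of dicts with 'id', 'prHead', 'result'.
    subagent_results: list of {'buildId': N, 'targetHead': 'abc1234' | None}.
    """
    heads = {r["buildId"]: r.get("targetHead") for r in subagent_results}
    return [
        {
            "buildId": b["id"],
            "prHead": b["prHead"],
            "targetHead": heads.get(b["id"]),  # None = extraction failed/missing
            "result": b["result"],
        }
        for b in builds
    ]
```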
+ ## General Guidelines - **Use `task` agent type** — it has shell + MCP access From e32a08e002026639075a16607c9dac0db0014f46 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 10:51:51 -0600 Subject: [PATCH 25/44] ci-analysis: fix review findings from multi-model audit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix nonexistent 'msbuild-mcp analyze' tool refs in manual-investigation.md, replace with actual mcp-binlog-tool-* tools - Remove interpretive prose from script ('likely PR-related') — agent reasons - Fix empty catch {} swallowing merge state errors — now Write-Verbose - Fix imprecise 'binlog.mcp' reference in SKILL.md tips --- .github/skills/ci-analysis/SKILL.md | 2 +- .../skills/ci-analysis/references/manual-investigation.md | 7 ++++--- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 4 ++-- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 280a03e7d77f98..fc532925849d44 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -230,6 +230,6 @@ Before stating a failure's cause, verify your claim: 1. Check if same test fails on the target branch before assuming transient 2. Look for `[ActiveIssue]` attributes for known skipped tests 3. Use `-SearchMihuBot` for semantic search of related issues -4. Use the MSBuild MCP server (`binlog.mcp`) to search binlogs for Helix job IDs, build errors, and properties +4. Use the binlog MCP tools (`mcp-binlog-tool-*`) to search binlogs for Helix job IDs, build errors, and properties 5. `gh pr checks --json` valid fields: `bucket`, `completedAt`, `description`, `event`, `link`, `name`, `startedAt`, `state`, `workflow` — no `conclusion` field, `state` has `SUCCESS`/`FAILURE` directly 6. "Canceled" ≠ "Failed" — canceled jobs may have recoverable Helix results. Check artifacts before concluding results are lost. 
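Tip 5 in practice: a sketch that reads `gh pr checks --json name,state,link` output and filters on `state` directly, since there is no `conclusion` field (the parsing is illustrative; field names are the ones tip 5 lists):

```python
import json

def failing_checks(gh_json):
    """Return (name, link) pairs for checks whose state is FAILURE.

    gh_json: the raw stdout of `gh pr checks <PR> --json name,state,link`.
    """
    return [
        (check["name"], check["link"])
        for check in json.loads(gh_json)
        if check["state"] == "FAILURE"
    ]
```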
diff --git a/.github/skills/ci-analysis/references/manual-investigation.md b/.github/skills/ci-analysis/references/manual-investigation.md index ea3e82fb589198..6c1e25c5c2c7fb 100644 --- a/.github/skills/ci-analysis/references/manual-investigation.md +++ b/.github/skills/ci-analysis/references/manual-investigation.md @@ -73,10 +73,11 @@ Binlogs contain detailed MSBuild execution traces for diagnosing: - NuGet restore problems - Target execution order issues -**Using MSBuild MCP Server:** +**Using MSBuild binlog MCP tools:** ``` -msbuild-mcp analyze --binlog path/to/build.binlog --errors -msbuild-mcp analyze --binlog path/to/build.binlog --target ResolveReferences +mcp-binlog-tool-load_binlog path: "path/to/build.binlog" +mcp-binlog-tool-get_diagnostics binlog_file: "path/to/build.binlog" +mcp-binlog-tool-search_binlog binlog_file: "path/to/build.binlog" query: "$error" ``` **Manual Analysis:** diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 97f876251417e2..4b2e4a00dfae30 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -364,7 +364,7 @@ function Get-AzDOBuildIdFromPR { $prMergeState = $null try { $prMergeState = gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null - } catch {} + } catch { Write-Verbose "Could not determine PR merge state: $_" } # Find ALL failing Azure DevOps builds $failingBuilds = @{} @@ -621,7 +621,7 @@ function Show-PRCorrelationSummary { } } - Write-Host "`nThese failures are likely PR-related." -ForegroundColor Yellow + Write-Host "`nCorrelated files found — check JSON summary for details." 
-ForegroundColor Yellow } } From 417a1388ff6b20490cebf33b1aa7e3863634a307 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 11:03:32 -0600 Subject: [PATCH 26/44] ci-analysis: trim mergeable_state output to fix whitespace comparison --- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 4b2e4a00dfae30..b7402d5c81be6c 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -363,7 +363,8 @@ function Get-AzDOBuildIdFromPR { # Check if PR has merge conflicts (no CI runs when mergeable_state is dirty) $prMergeState = $null try { - $prMergeState = gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null + $prMergeState = (gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null) + if ($prMergeState) { $prMergeState = $prMergeState.Trim() } } catch { Write-Verbose "Could not determine PR merge state: $_" } # Find ALL failing Azure DevOps builds From 658d1bb509da6df04f890eeb8f064b96777b2204 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 11:14:31 -0600 Subject: [PATCH 27/44] ci-analysis: consistent JSON schema and canonical MCP tool names - Add failedJobDetails to early-exit summary (NO_BUILDS/MERGE_CONFLICTS) for consistent JSON shape across all exit paths - Align binlog MCP tool names to canonical mcp-binlog-tool-* prefix in binlog-comparison.md and delegation-patterns.md --- .../references/binlog-comparison.md | 24 +++++++++---------- .../references/delegation-patterns.md | 6 ++--- .../ci-analysis/scripts/Get-CIStatus.ps1 | 1 + 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/.github/skills/ci-analysis/references/binlog-comparison.md b/.github/skills/ci-analysis/references/binlog-comparison.md index b9a06c9cbea9b5..15a5df3460a745 100644 --- 
a/.github/skills/ci-analysis/references/binlog-comparison.md +++ b/.github/skills/ci-analysis/references/binlog-comparison.md @@ -26,8 +26,8 @@ When the failing work item's Helix job ID isn't visible (e.g., canceled jobs, or ``` 2. Load the binlog and search for job IDs: ``` - binlog-load_binlog path:"$env:TEMP\artifact\...\SendToHelix.binlog" - binlog-search_binlog binlog_file:"..." query:"Sent Helix Job" + mcp-binlog-tool-load_binlog path:"$env:TEMP\artifact\...\SendToHelix.binlog" + mcp-binlog-tool-search_binlog binlog_file:"..." query:"Sent Helix Job" ``` 3. Query each Helix job GUID with the CI script: ``` @@ -50,9 +50,9 @@ Download the msbuild.binlog from Helix job {JOB_ID} work item {WORK_ITEM}. Use the CI skill script to get the artifact URL: ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB_ID}" -WorkItem "{WORK_ITEM}" Download the binlog URL to $env:TEMP\{label}.binlog. -Load it with the binlog MCP server (binlog-load_binlog). -Search for the {TASK_NAME} task (binlog-search_tasks_by_name). -Get full task details (binlog-list_tasks_in_target) for the target containing the task. +Load it with the binlog MCP server (mcp-binlog-tool-load_binlog). +Search for the {TASK_NAME} task (mcp-binlog-tool-search_tasks_by_name). +Get full task details (mcp-binlog-tool-list_tasks_in_target) for the target containing the task. Extract the CommandLineArguments parameter value. 
Normalize paths: - Replace Helix work dirs (/datadisks/disk1/work/XXXXXXXX) with {W} @@ -71,23 +71,23 @@ With two normalized arg lists, `Compare-Object` instantly reveals the difference ## Useful Binlog MCP Queries -After loading a binlog with `binlog-load_binlog`, use these queries (pass the loaded path as `binlog_file`): +After loading a binlog with `mcp-binlog-tool-load_binlog`, use these queries (pass the loaded path as `binlog_file`): ``` # Find all invocations of a specific task -binlog-search_tasks_by_name binlog_file:"$env:TEMP\my.binlog" taskName:"Csc" +mcp-binlog-tool-search_tasks_by_name binlog_file:"$env:TEMP\my.binlog" taskName:"Csc" # Search for a property value -binlog-search_binlog binlog_file:"..." query:"analysislevel" +mcp-binlog-tool-search_binlog binlog_file:"..." query:"analysislevel" # Find what happened inside a specific target -binlog-search_binlog binlog_file:"..." query:"under($target AddGlobalAnalyzerConfigForPackage_MicrosoftCodeAnalysisNetAnalyzers)" +mcp-binlog-tool-search_binlog binlog_file:"..." query:"under($target AddGlobalAnalyzerConfigForPackage_MicrosoftCodeAnalysisNetAnalyzers)" # Get all properties matching a pattern -binlog-search_binlog binlog_file:"..." query:"GlobalAnalyzerConfig" +mcp-binlog-tool-search_binlog binlog_file:"..." query:"GlobalAnalyzerConfig" # List tasks in a target (returns full parameter details including CommandLineArguments) -binlog-list_tasks_in_target binlog_file:"..." projectId:22 targetId:167 +mcp-binlog-tool-list_tasks_in_target binlog_file:"..." projectId:22 targetId:167 ``` ## Path Normalization @@ -141,4 +141,4 @@ Same MSBuild property resolution + different files on disk = different build beh > ❌ **Don't assume the MSBuild property diff explains the behavior diff.** Two branches can compute identical property values but produce different outputs because of different files on disk, different NuGet packages, or different task assemblies. Compare the actual task invocation. 
-> ❌ **Don't load large binlogs and browse them interactively in main context.** Use targeted searches: `binlog-search_tasks_by_name` for a specific task, `binlog-search_binlog` with a focused query. Get in, get the data, get out. +> ❌ **Don't load large binlogs and browse them interactively in main context.** Use targeted searches: `mcp-binlog-tool-search_tasks_by_name` for a specific task, `mcp-binlog-tool-search_binlog` with a focused query. Get in, get the data, get out. diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index 1a363de9af33e8..d233a6a3a95d77 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -74,9 +74,9 @@ Download and analyze binlog from AzDO build {BUILD_ID}, artifact {ARTIFACT_NAME} Steps: 1. Download the artifact (see azure-cli.md) -2. Load: binlog-load_binlog path:"{BINLOG_PATH}" -3. Find tasks: binlog-search_tasks_by_name taskName:"Csc" -4. Get task parameters: binlog-get_task_info +2. Load: mcp-binlog-tool-load_binlog path:"{BINLOG_PATH}" +3. Find tasks: mcp-binlog-tool-search_tasks_by_name taskName:"Csc" +4. 
Get task parameters: mcp-binlog-tool-get_task_info Return JSON: { "buildId": N, "project": "...", "args": ["..."] } ``` diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index b7402d5c81be6c..ef4d3a186e40f1 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1736,6 +1736,7 @@ try { total = 0; succeeded = 0; failed = 0; canceled = 0; pending = 0; warnings = 0; skipped = 0 } failedJobNames = @() + failedJobDetails = @() canceledJobNames = @() knownIssues = @() prCorrelation = [ordered]@{ From b89a53e2c3b64b9dff5bf64aff34f5b9f7eeb5a6 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 15:12:34 -0600 Subject: [PATCH 28/44] Address Copilot review: errorCategory precedence, gh exit code handling, MCP tool syntax - Add precedence to errorCategory classification (crash > timeout > reporter-failed > test-failure) so iterating multiple Helix logs doesn't downgrade a crash to a lesser category - Replace try/catch with LASTEXITCODE check for gh api merge state detection since native command failures don't throw in PowerShell - Align MCP tool argument formatting in manual-investigation.md to canonical style --- .../references/manual-investigation.md | 6 +++--- .../ci-analysis/scripts/Get-CIStatus.ps1 | 20 +++++++++++++------ 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/.github/skills/ci-analysis/references/manual-investigation.md b/.github/skills/ci-analysis/references/manual-investigation.md index 6c1e25c5c2c7fb..b0958f7dc3a82a 100644 --- a/.github/skills/ci-analysis/references/manual-investigation.md +++ b/.github/skills/ci-analysis/references/manual-investigation.md @@ -75,9 +75,9 @@ Binlogs contain detailed MSBuild execution traces for diagnosing: **Using MSBuild binlog MCP tools:** ``` -mcp-binlog-tool-load_binlog path: "path/to/build.binlog" -mcp-binlog-tool-get_diagnostics binlog_file: 
"path/to/build.binlog" -mcp-binlog-tool-search_binlog binlog_file: "path/to/build.binlog" query: "$error" +mcp-binlog-tool-load_binlog path:"path/to/build.binlog" +mcp-binlog-tool-get_diagnostics binlog_file:"path/to/build.binlog" +mcp-binlog-tool-search_binlog binlog_file:"path/to/build.binlog" query:"$error" ``` **Manual Analysis:** diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index ef4d3a186e40f1..c019d01425c057 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -362,10 +362,13 @@ function Get-AzDOBuildIdFromPR { # Check if PR has merge conflicts (no CI runs when mergeable_state is dirty) $prMergeState = $null - try { - $prMergeState = (gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null) - if ($prMergeState) { $prMergeState = $prMergeState.Trim() } - } catch { Write-Verbose "Could not determine PR merge state: $_" } + $prMergeStateOutput = & gh api "repos/$Repository/pulls/$PR" --jq '.mergeable_state' 2>$null + $ghMergeStateExitCode = $LASTEXITCODE + if ($ghMergeStateExitCode -eq 0 -and $prMergeStateOutput) { + $prMergeState = $prMergeStateOutput.Trim() + } else { + Write-Verbose "Could not determine PR merge state (gh exit code $ghMergeStateExitCode)." 
+ } # Find ALL failing Azure DevOps builds $failingBuilds = @{} @@ -2028,11 +2031,16 @@ try { if ($failureInfo -match 'Timed Out \(timeout') { $jobDetail.errorCategory = "test-timeout" } elseif ($failureInfo -match 'Exit Code:\s*(139|134)' -or $failureInfo -match 'createdump') { - $jobDetail.errorCategory = "crash" + # Crash takes highest precedence — don't downgrade + if ($jobDetail.errorCategory -notin @("crash")) { + $jobDetail.errorCategory = "crash" + } } elseif ($failureInfo -match 'Traceback \(most recent call last\)' -and $helixLog -match 'Tests run:.*Failures:\s*0') { # Work item failed (non-zero exit from reporter crash) but all tests passed. # The Python traceback is from Helix infrastructure, not from the test itself. - $jobDetail.errorCategory = "tests-passed-reporter-failed" + if ($jobDetail.errorCategory -notin @("crash", "test-timeout")) { + $jobDetail.errorCategory = "tests-passed-reporter-failed" + } } elseif ($jobDetail.errorCategory -eq "unclassified") { $jobDetail.errorCategory = "test-failure" } From 8851a6dd2b4e5d2c7272051220522f0b94626f07 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 16:37:29 -0600 Subject: [PATCH 29/44] ci-analysis: enforce Build Analysis check status in recommendations The Build Analysis GitHub check is green only when every failure matches a known issue. When it's red, at least one failure is unaccounted for. 
Updated guidance in four places to prevent the anti-pattern of seeing known issues in the script output and concluding 'all failures are known' without checking whether Build Analysis itself is green: - Added explicit Build Analysis check status interpretation - KNOWN_ISSUES_DETECTED hint now warns it doesn't mean full coverage - Step 2 item 1 now requires green/red check before concluding - 'Safe to retry' verification requires per-job known issue mapping --- .github/skills/ci-analysis/SKILL.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index fc532925849d44..cc6e6b5fb21f94 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -100,6 +100,8 @@ The script operates in three distinct modes depending on what information you ha **Known Issues section**: Failures matching existing GitHub issues - these are tracked and being investigated. +**Build Analysis check status**: The "Build Analysis" GitHub check is **green** only when *every* failure is matched to a known issue. If it's **red**, at least one failure is unaccounted for — do NOT claim "all failures are known issues" just because some known issues were found. You must verify each failing job is covered by a specific known issue before calling it safe to retry. + **Canceled/timed-out jobs**: Jobs canceled due to earlier stage failures or AzDO timeouts. Dependency-canceled jobs don't need investigation. **Timeout-canceled jobs may have all-passing Helix results** — the "failure" is just the AzDO job wrapper timing out, not actual test failures. To verify: use `hlx_status` on each Helix job in the timed-out build. If all work items passed, the build effectively passed. > ❌ **Don't dismiss timed-out builds.** A build marked "failed" due to a 3-hour AzDO timeout can have 100% passing Helix work items. Check before concluding it failed. 
@@ -131,7 +133,7 @@ Read `recommendationHint` as a starting point, then layer in context: | Hint | Action | |------|--------| | `BUILD_SUCCESSFUL` | No failures. Confirm CI is green. | -| `KNOWN_ISSUES_DETECTED` | Known tracked issues found. Recommend retry if failures match known issues. Link the issues. | +| `KNOWN_ISSUES_DETECTED` | Known tracked issues found — but this does NOT mean all failures are covered. Check the Build Analysis check status: if it's red, some failures are unmatched. Only recommend retry for failures that specifically match a known issue; investigate the rest. | | `LIKELY_PR_RELATED` | Failures correlate with PR changes. Lead with "fix these before retrying" and list `correlatedFiles`. | | `POSSIBLY_TRANSIENT` | Failures could not be automatically classified — does NOT mean they are transient. Use `failedJobDetails` to investigate each failure individually. | | `REVIEW_REQUIRED` | Could not auto-determine cause. Review failures manually. | @@ -189,13 +191,13 @@ Run with `-ShowLogs` for detailed failure info. ### Step 2: Analyze results -1. **Check Build Analysis** — Known issues are safe to retry +1. **Check Build Analysis** — If the Build Analysis GitHub check is **green**, all failures matched known issues and it's safe to retry. If it's **red**, some failures are unaccounted for — you must identify which failing jobs are covered by known issues and which are not. Never say "all failures are known issues" when Build Analysis is red. 2. **Correlate with PR changes** — Same files failing = likely PR-related 3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. 4. **Check build progression** — If the PR has multiple builds (multiple pushes), check whether earlier builds passed. 
A failure that appeared after a specific push narrows the investigation to those commits. See [references/build-progression-analysis.md](references/build-progression-analysis.md). Present findings as facts, not fix recommendations. 5. **Interpret patterns** (but don't jump to conclusions): - Same error across many jobs → Real code issue - - Build Analysis flags a known issue → Safe to retry + - Build Analysis flags a known issue → That *specific failure* is safe to retry (but others may not be) - Failure is **not** in Build Analysis → Investigate further before assuming transient - Device failures, Docker pulls, network timeouts → *Could* be infrastructure, but verify against the target branch first - Test timeout but tests passed → Executor issue, not test failure @@ -212,7 +214,7 @@ Before stating a failure's cause, verify your claim: - **"Infrastructure failure"** → Did Build Analysis flag it? Does the same test pass on the target branch? If neither, don't call it infrastructure. - **"Transient/flaky"** → Has it failed before? Is there a known issue? A single non-reproducing failure isn't enough to call it flaky. - **"PR-related"** → Do the changed files actually relate to the failing test? Correlation in the script output is heuristic, not proof. -- **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some? +- **"Safe to retry"** → Are ALL failures accounted for (known issues or infrastructure), or are you ignoring some? Check the Build Analysis check status — if it's red, not all failures are matched. Map each failing job to a specific known issue before concluding "safe to retry." - **"Not related to this PR"** → Have you checked if the test passes on the target branch? Don't assume — verify. 
## References From 6923d0e7711ecdb811dafe1175e959a356af14d4 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 17:06:35 -0600 Subject: [PATCH 30/44] ci-analysis: add crash/canceled job recovery procedure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Helix work items with exit code -4 (Crash) often have all tests passing in testResults.xml — the 'crash' is the wrapper timing out after test completion, not actual test failures. Added: - Warning that crash category needs verification before concluding failure - Step-by-step recovery procedure: find Helix job IDs from AzDO logs, check hlx_batch_status, download testResults.xml for crashed items, parse XML to determine if tests actually passed - Verdict guide: crash-with-passing-results = infrastructure, failed > 0 in XML = real failures, no XML = investigate further --- .github/skills/ci-analysis/SKILL.md | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index cc6e6b5fb21f94..611bf704e8ef2e 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -116,12 +116,34 @@ The script operates in three distinct modes depending on what information you ha **Per-failure details** (`failedJobDetails` in JSON): Each failed job includes `errorCategory`, `errorSnippet`, and `helixWorkItems`. Use these for per-job classification instead of applying a single `recommendationHint` to all failures. -Error categories: `test-failure`, `build-error`, `test-timeout`, `crash` (exit codes 139/134), `tests-passed-reporter-failed` (all tests passed but reporter crashed — genuinely infrastructure), `unclassified` (investigate manually). 
+Error categories: `test-failure`, `build-error`, `test-timeout`, `crash` (exit codes 139/134/-4), `tests-passed-reporter-failed` (all tests passed but reporter crashed — genuinely infrastructure), `unclassified` (investigate manually). + +> ⚠️ **`crash` does NOT always mean tests failed.** Exit code -4 often means the Helix work item wrapper timed out *after* tests completed. Always check `testResults.xml` before concluding a crash is a real failure. See [Recovering Results from Crashed/Canceled Jobs](#recovering-results-from-crashedcanceled-jobs). > ⚠️ **Be cautious labeling failures as "infrastructure."** Only conclude infrastructure with strong evidence: Build Analysis match, identical failure on target branch, or confirmed outage. Exception: `tests-passed-reporter-failed` is genuinely infrastructure. > ❌ **Missing packages on flow PRs ≠ infrastructure.** Flow PRs can cause builds to request *different* packages. Check *which* package and *why* before assuming feed delay. +### Recovering Results from Crashed/Canceled Jobs + +When an AzDO job is canceled (timeout) or Helix work items show `Crash` (exit code -4), the tests may have actually passed. Follow this procedure: + +1. **Find the Helix job IDs** — Read the AzDO "Send to Helix" step log (use `azure-devops-pipelines_get_build_log_by_id`) and search for lines containing `Sent Helix Job`. Extract the job GUIDs. + +2. **Check Helix job status** — Use `hlx_batch_status` (batches of 4) or `hlx_status` per job. Look at `failedCount` vs `passedCount`. + +3. **For work items marked Crash/Failed** — Use `hlx_files` to check if `testResults.xml` was uploaded. If it exists: + - Download it with `hlx_download_url` + - Parse the XML: `total`, `passed`, `failed` attributes on the `<assembly>` element + - If `failed=0` and `passed > 0`, the tests passed — the "crash" is the wrapper timing out after test completion + +4.
**Verdict**: + - All work items passed or crash-with-passing-results → **Tests effectively passed.** The failure is infrastructure (wrapper timeout). + - Some work items have `failed > 0` in testResults.xml → **Real test failures.** Investigate those specific tests. + - No testResults.xml uploaded → Tests may not have run at all. Check console logs for errors. + +> This pattern is common with long-running test suites (e.g., WasmBuildTests) where tests complete but the Helix work item wrapper exceeds its timeout during result upload or cleanup. + ## Generating Recommendations After the script outputs the `[CI_ANALYSIS_SUMMARY]` JSON block, **you** synthesize recommendations. Do not parrot the JSON — reason over it. From c405efc68b1ce8b6760db7d523662b4f6350fec1 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 18:01:28 -0600 Subject: [PATCH 31/44] ci-analysis: document MCP tool limitation for subagents Only the `general-purpose` subagent type has access to MCP tools (hlx-*, azure-devops-*, mcp-binlog-tool-*); `task` and `explore` agents do not. Updated delegation patterns to reflect this: - General guidelines: recommend the `general-purpose` agent type (shell + MCP access) instead of `task` - Pattern 4 (canceled job recovery): subagents query Helix directly with hlx_files/hlx_download_url instead of downloading AzDO artifacts - Note that MCP-heavy tasks should specify model: "claude-sonnet-4" since the default model may time out on multi-step tool chains --- .../ci-analysis/references/delegation-patterns.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index d233a6a3a95d77..a0a8a7f92f1378 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -86,11 +86,12 @@ Check if canceled job "{JOB_NAME}" from build {BUILD_ID} has recoverable Helix results. Steps: -1.
Download the testResults.xml artifact from the canceled job (see azure-cli.md) -2. If available, parse for pass/fail counts and work item status +1. Use hlx-hlx_files with jobId:"{HELIX_JOB_ID}" workItem:"{WORK_ITEM}" to find testResults.xml +2. Download with hlx-hlx_download_url using the testResults.xml URI +3. Parse the XML for pass/fail counts on the <assembly> element Return JSON: { "jobName": "...", "hasResults": true, "passed": N, "failed": N } -Or: { "jobName": "...", "hasResults": false, "reason": "no artifacts" } +Or: { "jobName": "...", "hasResults": false, "reason": "no testResults.xml uploaded" } ``` This pattern scales to any number of builds — launch N subagents for N builds, collect results, compare. @@ -116,10 +117,11 @@ Launch one per build in parallel. The main agent combines with `get_builds` resu ## General Guidelines -- **Use `task` agent type** — it has shell + MCP access +- **Use `general-purpose` agent type** — it has shell + MCP access (hlx-*, azure-devops-*, mcp-binlog-tool-*) - **Run independent tasks in parallel** — the whole point of delegation - **Include script paths** — subagents don't inherit skill context - **Require structured JSON output** — enables comparison across subagents - **Don't delegate interpretation** — subagents return facts, main agent reasons - **STOP on errors** — subagents should return error details immediately, not troubleshoot auth/environment issues - **Use SQL for many results** — when launching 5+ subagents or doing multi-phase delegation, store results in a SQL table (`CREATE TABLE results (agent_id TEXT, build_id INT, data TEXT, status TEXT)`) so you can query across all results instead of holding them in context +- **Specify `model: "claude-sonnet-4"` for MCP-heavy tasks** — default model may time out on multi-step MCP tool chains From 5e2d427c934224e6487567bb3be8c8e65f6b05fd Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 20:23:19 -0600 Subject: [PATCH 32/44] Address review: truncation metadata,
tool name consistency, hlx_batch_status clarification --- .github/skills/ci-analysis/SKILL.md | 2 +- .../skills/ci-analysis/references/build-progression-analysis.md | 2 +- .github/skills/ci-analysis/references/delegation-patterns.md | 2 +- .github/skills/ci-analysis/scripts/Get-CIStatus.ps1 | 1 + 4 files changed, 4 insertions(+), 3 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 611bf704e8ef2e..ac3df855bb3ef1 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -130,7 +130,7 @@ When an AzDO job is canceled (timeout) or Helix work items show `Crash` (exit co 1. **Find the Helix job IDs** — Read the AzDO "Send to Helix" step log (use `azure-devops-pipelines_get_build_log_by_id`) and search for lines containing `Sent Helix Job`. Extract the job GUIDs. -2. **Check Helix job status** — Use `hlx_batch_status` (batches of 4) or `hlx_status` per job. Look at `failedCount` vs `passedCount`. +2. **Check Helix job status** — Use `hlx_batch_status` (accepts comma-separated job IDs) or `hlx_status` per job. Look at `failedCount` vs `passedCount`. 3. **For work items marked Crash/Failed** — Use `hlx_files` to check if `testResults.xml` was uploaded. If it exists: - Download it with `hlx_download_url` diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index eadd241d9682bc..bee6783538108e 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -40,7 +40,7 @@ az pipelines runs list --branch "refs/pull/{PR}/merge" --top 20 --org $org -p $p ### Step 2: Map builds to the PR's head commit -Each build's `triggerInfo` contains `pr.sourceSha` — the PR's HEAD commit when the build was triggered. Extract it from the `get_builds` response or the `az` JSON output. 
+Each build's `triggerInfo` contains `pr.sourceSha` — the PR's HEAD commit when the build was triggered. Extract it from the `azure-devops-pipelines_get_builds` response or the `az` JSON output. > ⚠️ **`sourceVersion` is the merge commit**, not the PR's head commit. Use `triggerInfo.'pr.sourceSha'` instead. diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index a0a8a7f92f1378..a8e2a41813226f 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -113,7 +113,7 @@ Return JSON: { "buildId": N, "targetHead": "abc1234", "mergeCommit": "def5678" } Or: { "buildId": N, "targetHead": null, "error": "merge line not found in log 5" } ``` -Launch one per build in parallel. The main agent combines with `get_builds` results to build the full progression table. +Launch one per build in parallel. The main agent combines with `azure-devops-pipelines_get_builds` results to build the full progression table. 
## General Guidelines diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index c019d01425c057..9f4e149550f86c 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -2221,6 +2221,7 @@ $summary = [ordered]@{ } } failedJobNames = @($allFailedJobNames) failedJobDetails = @($allFailedJobDetails) + failedJobDetailsTruncated = ($allFailedJobNames.Count -gt $allFailedJobDetails.Count) canceledJobNames = @($allCanceledJobNames) knownIssues = @($knownIssuesFromBuildAnalysis | ForEach-Object { [ordered]@{ number = $_.Number; title = $_.Title; url = $_.Url } From 087ec048c9f79299a7d8ec1df060b21625ed24ca Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 20:34:56 -0600 Subject: [PATCH 33/44] Fix hlx tool name consistency in delegation-patterns --- .github/skills/ci-analysis/references/delegation-patterns.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index a8e2a41813226f..75e83006d49804 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -86,8 +86,8 @@ Return JSON: { "buildId": N, "project": "...", "args": ["..."] } Check if canceled job "{JOB_NAME}" from build {BUILD_ID} has recoverable Helix results. Steps: -1. Use hlx-hlx_files with jobId:"{HELIX_JOB_ID}" workItem:"{WORK_ITEM}" to find testResults.xml -2. Download with hlx-hlx_download_url using the testResults.xml URI +1. Use hlx_files with jobId:"{HELIX_JOB_ID}" workItem:"{WORK_ITEM}" to find testResults.xml +2. Download with hlx_download_url using the testResults.xml URI 3. 
Parse the XML for pass/fail counts on the <assembly> element Return JSON: { "jobName": "...", "hasResults": true, "passed": N, "failed": N } From bff659cce87d8529b8fd8c353966e2d83d728d98 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 20:54:28 -0600 Subject: [PATCH 34/44] Add SQL-based progression tracking to build-progression-analysis --- .../references/build-progression-analysis.md | 42 ++++++++++++++++++- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index bee6783538108e..9a79d196b96a5d 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -76,9 +76,47 @@ $log = Invoke-RestMethod -Uri $logUrl -Headers $headers Note: a PR may have more unique `pr.sourceSha` values than commits visible on GitHub, because force-pushes replace the commit history. Each force-push triggers a new build with a new merge commit and a new `pr.sourceSha`. -### Step 3: Build a progression table +### Step 3: Store progression in SQL + +Use the SQL tool to track builds as you discover them.
This avoids losing context and enables queries across the full history: + +```sql +CREATE TABLE IF NOT EXISTS build_progression ( + build_id INT PRIMARY KEY, + pr_sha TEXT, + target_sha TEXT, + result TEXT, -- passed, failed, canceled + queued_at TEXT, + failed_jobs TEXT, -- comma-separated job names + notes TEXT +); +``` + +Insert rows as you extract data from each build: + +```sql +INSERT INTO build_progression VALUES + (1283986, '7af79ad', '2d638dc', 'failed', '2026-02-08T10:00:00Z', 'WasmBuildTests', 'Initial commits'), + (1284169, '28ec8a0', '0b691ba', 'failed', '2026-02-08T14:00:00Z', 'WasmBuildTests', 'Iteration 2'), + (1284433, '39dc0a6', '18a3069', 'passed', '2026-02-09T09:00:00Z', NULL, 'Iteration 3'); +``` + +Then query to find the pass→fail transition: + +```sql +-- Find where it went from passing to failing +SELECT * FROM build_progression ORDER BY queued_at; + +-- Did the target branch move between pass and fail? +SELECT pr_sha, target_sha, result FROM build_progression +WHERE result IN ('passed', 'failed') ORDER BY queued_at; + +-- Which builds share the same PR SHA? 
(force-push detection) +SELECT pr_sha, COUNT(*) as builds, GROUP_CONCAT(result) as results +FROM build_progression GROUP BY pr_sha HAVING builds > 1; +``` -Include the target branch HEAD to catch baseline shifts: +Present the table to the user: | PR HEAD | Target HEAD | Builds | Result | Notes | |---------|-------------|--------|--------|-------| From e2fe9c475a5e68fa11eb001be99b38a24d201425 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 20:58:16 -0600 Subject: [PATCH 35/44] Add SQL failure tracking across builds for progression analysis --- .../references/build-progression-analysis.md | 33 +++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/.github/skills/ci-analysis/references/build-progression-analysis.md b/.github/skills/ci-analysis/references/build-progression-analysis.md index 9a79d196b96a5d..df327454a404ac 100644 --- a/.github/skills/ci-analysis/references/build-progression-analysis.md +++ b/.github/skills/ci-analysis/references/build-progression-analysis.md @@ -128,6 +128,39 @@ Present the table to the user: When both `pr.sourceSha` AND `Target HEAD` change between a pass→fail transition, either could be the cause. Analyze the failure content to determine which. If only the target moved (same `pr.sourceSha`), the failure came from the new baseline. 
+#### Tracking individual test failures across builds + +For deeper analysis, track which tests failed in each build: + +```sql +CREATE TABLE IF NOT EXISTS build_failures ( + build_id INT, + job_name TEXT, + test_name TEXT, + error_snippet TEXT, + helix_job TEXT, + work_item TEXT, + PRIMARY KEY (build_id, job_name, test_name) +); +``` + +Insert failures as you investigate each build, then query for patterns: + +```sql +-- Tests that fail in every build (persistent, not flaky) +SELECT test_name, COUNT(DISTINCT build_id) as fail_count, GROUP_CONCAT(build_id) as builds +FROM build_failures GROUP BY test_name HAVING fail_count > 1; + +-- New failures in the latest build (what changed?) +SELECT f.* FROM build_failures f +LEFT JOIN build_failures prev ON f.test_name = prev.test_name AND prev.build_id = {PREV_BUILD_ID} +WHERE f.build_id = {LATEST_BUILD_ID} AND prev.test_name IS NULL; + +-- Flaky tests: fail in some builds, pass in others +SELECT test_name FROM build_failures GROUP BY test_name +HAVING COUNT(DISTINCT build_id) < (SELECT COUNT(*) FROM build_progression WHERE result = 'failed'); +``` + ### Step 4: Present findings, not conclusions Report what the progression shows: From 72efae4a69269eec45828b231c4523f6a2f237bb Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 21:06:17 -0600 Subject: [PATCH 36/44] Add SQL tracking reference for failure-to-known-issue mapping --- .github/skills/ci-analysis/SKILL.md | 3 +- .../ci-analysis/references/sql-tracking.md | 63 +++++++++++++++++++ 2 files changed, 65 insertions(+), 1 deletion(-) create mode 100644 .github/skills/ci-analysis/references/sql-tracking.md diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index ac3df855bb3ef1..f089309ea434c7 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -213,7 +213,7 @@ Run with `-ShowLogs` for detailed failure info. ### Step 2: Analyze results -1. 
**Check Build Analysis** — If the Build Analysis GitHub check is **green**, all failures matched known issues and it's safe to retry. If it's **red**, some failures are unaccounted for — you must identify which failing jobs are covered by known issues and which are not. Never say "all failures are known issues" when Build Analysis is red. +1. **Check Build Analysis** — If the Build Analysis GitHub check is **green**, all failures matched known issues and it's safe to retry. If it's **red**, some failures are unaccounted for — you must identify which failing jobs are covered by known issues and which are not. For 3+ failures, use SQL tracking to avoid missed matches (see [references/sql-tracking.md](references/sql-tracking.md)). 2. **Correlate with PR changes** — Same files failing = likely PR-related 3. **Compare with baseline** — If a test passes on the target branch but fails on the PR, compare Helix binlogs. See [references/binlog-comparison.md](references/binlog-comparison.md) — **delegate binlog download/extraction to subagents** to avoid burning context on mechanical work. 4. **Check build progression** — If the PR has multiple builds (multiple pushes), check whether earlier builds passed. A failure that appeared after a specific push narrows the investigation to those commits. See [references/build-progression-analysis.md](references/build-progression-analysis.md). Present findings as facts, not fix recommendations. 
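The accounting in step 1, which failing jobs match known issues and which do not, can be sketched concretely. Python over in-memory SQLite, using a reduced subset of the `failed_jobs` schema; the job names and issue URL are placeholders, not real data:

```python
import sqlite3

# Sketch: with 3+ failed jobs, track known-issue matches in SQL so no
# failure is silently assumed to be covered.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE failed_jobs (
        build_id INT, job_name TEXT, known_issue_url TEXT,
        PRIMARY KEY (build_id, job_name)
    )
""")
conn.executemany("INSERT INTO failed_jobs VALUES (?, ?, NULL)", [
    (1234, "Libraries Test Run linux x64"),
    (1234, "CoreCLR Pri0 windows x64"),
    (1234, "Mono llvmaot Pri0"),
])
# A known issue was verified for one job; the URL is a placeholder.
conn.execute(
    "UPDATE failed_jobs SET known_issue_url = ? WHERE job_name LIKE ?",
    ("https://github.com/dotnet/runtime/issues/00000", "Libraries%"),
)
unmatched = conn.execute(
    "SELECT job_name FROM failed_jobs WHERE known_issue_url IS NULL"
).fetchall()
total, matched = conn.execute("""
    SELECT COUNT(*), SUM(CASE WHEN known_issue_url IS NOT NULL THEN 1 ELSE 0 END)
    FROM failed_jobs
""").fetchone()
print(f"{matched}/{total} matched; unmatched: {[j for (j,) in unmatched]}")
```

A `matched < total` result here is exactly the "Build Analysis red" situation: the two unmatched jobs still need investigation before any retry recommendation.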
@@ -247,6 +247,7 @@ Before stating a failure's cause, verify your claim: - **Subagent delegation patterns**: See [references/delegation-patterns.md](references/delegation-patterns.md) - **Azure CLI deep investigation**: See [references/azure-cli.md](references/azure-cli.md) - **Manual investigation steps**: See [references/manual-investigation.md](references/manual-investigation.md) +- **SQL tracking for investigations**: See [references/sql-tracking.md](references/sql-tracking.md) - **AzDO/Helix details**: See [references/azdo-helix-reference.md](references/azdo-helix-reference.md) ## Tips diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md new file mode 100644 index 00000000000000..353b4cf828260f --- /dev/null +++ b/.github/skills/ci-analysis/references/sql-tracking.md @@ -0,0 +1,63 @@ +# SQL Tracking for CI Investigations + +Use the SQL tool to track structured data during complex investigations. This avoids losing context across tool calls and enables queries that catch mistakes (like claiming "all failures known" when some are unmatched). + +## Failed Job Tracking + +Track each failure from the script output and map it to known issues as you verify them: + +```sql +CREATE TABLE IF NOT EXISTS failed_jobs ( + build_id INT, + job_name TEXT, + error_category TEXT, -- from failedJobDetails: test-failure, build-error, crash, etc. + error_snippet TEXT, + known_issue_url TEXT, -- NULL if unmatched + known_issue_title TEXT, + is_pr_correlated BOOLEAN DEFAULT FALSE, + recovery_status TEXT DEFAULT 'not-checked', -- effectively-passed, real-failure, no-results + notes TEXT, + PRIMARY KEY (build_id, job_name) +); +``` + +### Key queries + +```sql +-- Unmatched failures (Build Analysis red = these exist) +SELECT job_name, error_category, error_snippet FROM failed_jobs +WHERE known_issue_url IS NULL; + +-- Are ALL failures accounted for? 
+SELECT COUNT(*) as total, + SUM(CASE WHEN known_issue_url IS NOT NULL THEN 1 ELSE 0 END) as matched +FROM failed_jobs; + +-- Which crash/canceled jobs need recovery verification? +SELECT job_name, build_id FROM failed_jobs +WHERE error_category IN ('crash', 'unclassified') AND recovery_status = 'not-checked'; + +-- PR-correlated failures (fix before retrying) +SELECT job_name, error_snippet FROM failed_jobs WHERE is_pr_correlated = TRUE; +``` + +### Workflow + +1. After the script runs, insert one row per failed job from `failedJobDetails` +2. For each known issue from `knownIssues`, UPDATE matching rows with the issue URL +3. Query for unmatched failures — these need investigation +4. For crash/canceled jobs, update `recovery_status` after checking Helix results + +## Build Progression + +See [build-progression-analysis.md](build-progression-analysis.md) for the `build_progression` and `build_failures` tables that track pass/fail across multiple builds. + +## When to Use SQL vs. Not + +| Situation | Use SQL? 
| +|-----------|----------| +| 1-2 failed jobs, all match known issues | No — straightforward, hold in context | +| 3+ failed jobs across multiple builds | Yes — prevents missed matches | +| Build progression with 5+ builds | Yes — see build-progression-analysis.md | +| Crash recovery across multiple work items | Yes — cache testResults.xml findings | +| Single build, single failure | No — overkill | From 20cf6c7064081a2f323688532e9df2d247fcfba6 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 21:21:09 -0600 Subject: [PATCH 37/44] Fix section reference: 'Recovering Results from Crashed/Canceled Jobs' --- .github/skills/ci-analysis/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index f089309ea434c7..0139f5731506e5 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -165,7 +165,7 @@ Read `recommendationHint` as a starting point, then layer in context: Then layer in nuance the heuristic can't capture: - **Mixed signals**: Some failures match known issues AND some correlate with PR changes → separate them. Known issues = safe to retry; correlated = fix first. -- **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Canceled Jobs"). +- **Canceled jobs with recoverable results**: If `canceledJobNames` is non-empty, mention that canceled jobs may have passing Helix results (see "Recovering Results from Crashed/Canceled Jobs"). - **Build still in progress**: If `lastBuildJobSummary.pending > 0`, note that more failures may appear. - **Multiple builds**: If `builds` has >1 entry, `lastBuildJobSummary` reflects only the last build — use `totalFailedJobs` for the aggregate count. - **BuildId mode**: `knownIssues` and `prCorrelation` won't be populated. 
Say "Build Analysis and PR correlation not available in BuildId mode." From 2566a3184eb939dd0322ec909a55fd694dc5fb27 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 22:09:10 -0600 Subject: [PATCH 38/44] Add PR comment tracking pattern for deep analysis and PR chains --- .../ci-analysis/references/sql-tracking.md | 39 +++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md index 353b4cf828260f..3c8d62f2c6b590 100644 --- a/.github/skills/ci-analysis/references/sql-tracking.md +++ b/.github/skills/ci-analysis/references/sql-tracking.md @@ -52,6 +52,44 @@ SELECT job_name, error_snippet FROM failed_jobs WHERE is_pr_correlated = TRUE; See [build-progression-analysis.md](build-progression-analysis.md) for the `build_progression` and `build_failures` tables that track pass/fail across multiple builds. +## PR Comment Tracking + +For deep-dive analysis — especially across a chain of related PRs (e.g., dependency flow failures, sequential merge PRs, or long-lived PRs with weeks of triage) — store PR comments so you can query them without re-fetching: + +```sql +CREATE TABLE IF NOT EXISTS pr_comments ( + pr_number INT, + repo TEXT DEFAULT 'dotnet/runtime', + comment_id INT PRIMARY KEY, + author TEXT, + created_at TEXT, + body TEXT, + is_triage BOOLEAN DEFAULT FALSE -- set TRUE if comment diagnoses a failure +); +``` + +### Key queries + +```sql +-- What has already been diagnosed? (avoid re-investigating) +SELECT author, created_at, substr(body, 1, 200) FROM pr_comments +WHERE is_triage = TRUE ORDER BY created_at; + +-- Cross-PR: same failure discussed in multiple PRs? +SELECT pr_number, author, substr(body, 1, 150) FROM pr_comments +WHERE body LIKE '%BlazorWasm%' ORDER BY created_at; + +-- Who was asked to investigate what? 
+SELECT author, substr(body, 1, 200) FROM pr_comments +WHERE body LIKE '%PTAL%' OR body LIKE '%could you%look%'; +``` + +### When to use + +- Long-lived PRs (>1 week) with 10+ comments containing triage context +- Analyzing a chain of related PRs where earlier PRs have relevant diagnosis +- When the same failure appears across multiple merge/flow PRs and you need to know what was already tried + ## When to Use SQL vs. Not | Situation | Use SQL? | @@ -61,3 +99,4 @@ See [build-progression-analysis.md](build-progression-analysis.md) for the `buil | Build progression with 5+ builds | Yes — see build-progression-analysis.md | | Crash recovery across multiple work items | Yes — cache testResults.xml findings | | Single build, single failure | No — overkill | +| PR chain or long-lived PR with extensive triage comments | Yes — preserves diagnosis context across tool calls | From f9d9263541350bace40c63449683d89b8261ffba Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 22:14:00 -0600 Subject: [PATCH 39/44] Fix plain-text cross-references to use markdown links --- .github/skills/ci-analysis/references/delegation-patterns.md | 4 ++-- .github/skills/ci-analysis/references/sql-tracking.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index 75e83006d49804..e649ced3a5146f 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -37,7 +37,7 @@ Steps: github-mcp-server-search_pull_requests query:"is:merged base:{TARGET_BRANCH}" owner:dotnet repo:{REPO} 2. Run: ./scripts/Get-CIStatus.ps1 -PRNumber {MERGED_PR} -Repository "dotnet/{REPO}" 3. Find the build with same job name that passed -4. Locate the Helix job ID (may need artifact download — see azure-cli.md) +4. 
Locate the Helix job ID (may need artifact download — see [azure-cli.md](azure-cli.md)) Return JSON: { "found": true, "buildId": N, "helixJob": "...", "workItem": "...", "result": "Pass" } Or: { "found": false, "reason": "no passing build in last 5 merged PRs" } @@ -73,7 +73,7 @@ Return JSON: { "totalFiles": N, "files": [{ "path": "...", "changeType": "modifi Download and analyze binlog from AzDO build {BUILD_ID}, artifact {ARTIFACT_NAME}. Steps: -1. Download the artifact (see azure-cli.md) +1. Download the artifact (see [azure-cli.md](azure-cli.md)) 2. Load: mcp-binlog-tool-load_binlog path:"{BINLOG_PATH}" 3. Find tasks: mcp-binlog-tool-search_tasks_by_name taskName:"Csc" 4. Get task parameters: mcp-binlog-tool-get_task_info diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md index 3c8d62f2c6b590..2d2f253562d655 100644 --- a/.github/skills/ci-analysis/references/sql-tracking.md +++ b/.github/skills/ci-analysis/references/sql-tracking.md @@ -96,7 +96,7 @@ WHERE body LIKE '%PTAL%' OR body LIKE '%could you%look%'; |-----------|----------| | 1-2 failed jobs, all match known issues | No — straightforward, hold in context | | 3+ failed jobs across multiple builds | Yes — prevents missed matches | -| Build progression with 5+ builds | Yes — see build-progression-analysis.md | +| Build progression with 5+ builds | Yes — see [build-progression-analysis.md](build-progression-analysis.md) | | Crash recovery across multiple work items | Yes — cache testResults.xml findings | | Single build, single failure | No — overkill | | PR chain or long-lived PR with extensive triage comments | Yes — preserves diagnosis context across tool calls | From a3e2bdb2bf47afd6b502af88ebc9543c3780a6fc Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 22:50:36 -0600 Subject: [PATCH 40/44] =?UTF-8?q?Add=20buildId=20to=20failedJobDetails,=20?= 
=?UTF-8?q?include=20exit=20code=20-4=20in=20crash=20regex,=20fix=20$err?=
 =?UTF-8?q?or=20example?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 .github/skills/ci-analysis/references/manual-investigation.md | 2 +-
 .github/skills/ci-analysis/references/sql-tracking.md         | 2 +-
 .github/skills/ci-analysis/scripts/Get-CIStatus.ps1           | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/.github/skills/ci-analysis/references/manual-investigation.md b/.github/skills/ci-analysis/references/manual-investigation.md
index b0958f7dc3a82a..c7a67b98ea91a3 100644
--- a/.github/skills/ci-analysis/references/manual-investigation.md
+++ b/.github/skills/ci-analysis/references/manual-investigation.md
@@ -77,7 +77,7 @@ Binlogs contain detailed MSBuild execution traces for diagnosing:
 ```
 mcp-binlog-tool-load_binlog path:"path/to/build.binlog"
 mcp-binlog-tool-get_diagnostics binlog_file:"path/to/build.binlog"
-mcp-binlog-tool-search_binlog binlog_file:"path/to/build.binlog" query:"$error"
+mcp-binlog-tool-search_binlog binlog_file:"path/to/build.binlog" query:"error"
 ```
 
 **Manual Analysis:**
diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md
index 2d2f253562d655..948f2ea6ab1bd2 100644
--- a/.github/skills/ci-analysis/references/sql-tracking.md
+++ b/.github/skills/ci-analysis/references/sql-tracking.md
@@ -43,7 +43,7 @@ SELECT job_name, error_snippet FROM failed_jobs WHERE is_pr_correlated = TRUE;
 
 ### Workflow
 
-1. After the script runs, insert one row per failed job from `failedJobDetails`
+1. After the script runs, insert one row per failed job from `failedJobDetails` (each entry includes `buildId`)
 2. For each known issue from `knownIssues`, UPDATE matching rows with the issue URL
 3. Query for unmatched failures — these need investigation
 4.
For crash/canceled jobs, update `recovery_status` after checking Helix results diff --git a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 index 9f4e149550f86c..07a7e29ce280dd 100644 --- a/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 +++ b/.github/skills/ci-analysis/scripts/Get-CIStatus.ps1 @@ -1969,6 +1969,7 @@ try { # Track per-job failure details for JSON summary $jobDetail = [ordered]@{ jobName = $job.name + buildId = $currentBuildId errorSnippet = "" helixWorkItems = @() errorCategory = "unclassified" @@ -2030,7 +2031,7 @@ try { # Categorize failure from log content if ($failureInfo -match 'Timed Out \(timeout') { $jobDetail.errorCategory = "test-timeout" - } elseif ($failureInfo -match 'Exit Code:\s*(139|134)' -or $failureInfo -match 'createdump') { + } elseif ($failureInfo -match 'Exit Code:\s*(139|134|-4)' -or $failureInfo -match 'createdump') { # Crash takes highest precedence — don't downgrade if ($jobDetail.errorCategory -notin @("crash")) { $jobDetail.errorCategory = "crash" From 9533720c418905817e23c5855db6a3e418006389 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 23:02:02 -0600 Subject: [PATCH 41/44] Add downloaded artifact layout guide and SQL tracking for artifact management --- .../ci-analysis/references/helix-artifacts.md | 85 +++++++++++++++++++ .../ci-analysis/references/sql-tracking.md | 1 + 2 files changed, 86 insertions(+) diff --git a/.github/skills/ci-analysis/references/helix-artifacts.md b/.github/skills/ci-analysis/references/helix-artifacts.md index 16bd426aad04ad..840e3955a8d2a2 100644 --- a/.github/skills/ci-analysis/references/helix-artifacts.md +++ b/.github/skills/ci-analysis/references/helix-artifacts.md @@ -184,6 +184,91 @@ Get-ChildItem -Path $extractPath -Filter "*.binlog" -Recurse | ForEach-Object { If a test runs `dotnet build` internally (like SDK end-to-end tests), both sources may have relevant binlogs. 
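Since both sources may hold relevant binlogs, it helps to enumerate them together. A cross-platform sketch of that recursive search, gathering binlogs from both extraction roots so neither source is overlooked; the directory layout created here is an invented stand-in for real downloads, including the artifact-name-repeated nesting that AzDO ZIPs produce:

```python
import tempfile
from pathlib import Path

# Build a throwaway layout that mimics a Helix work-item download (flat)
# and an AzDO artifact extraction (nested), then list every binlog with
# its source so Helix vs. AzDO provenance stays visible.
base = Path(tempfile.mkdtemp())
helix = base / "helix-workitem"
azdo = base / "TestBuild_linux_x64"
helix.mkdir()
(azdo / "TestBuild_linux_x64" / "log" / "Release").mkdir(parents=True)
(helix / "msbuild0.binlog").touch()
(helix / "msbuild1.binlog").touch()
(azdo / "TestBuild_linux_x64" / "log" / "Release" / "Build.binlog").touch()

roots = {"helix": helix, "azdo": azdo}
binlogs = [(src, p.name) for src, root in roots.items()
           for p in sorted(root.rglob("*.binlog"))]
print(binlogs)
```

The same two-root enumeration works on real downloads by pointing `roots` at the actual temp directories returned by the download tools.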
+## Downloaded Artifact Layout + +When you download artifacts via MCP tools or manually, the directory structure can be confusing. Here's what to expect. + +### Helix Work Item Downloads (`hlx_download`) + +`hlx_download` saves files to a temp directory and returns local paths. The structure is **flat** — all files from the work item land in one directory: + +``` +C:\...\Temp\helix-{hash}\ +├── console.d991a56d.log # Console output +├── testResults.xml # Test pass/fail details +├── msbuild.binlog # Only if test invoked MSBuild +├── publish.msbuild.binlog # Only if test did a publish +├── msbuild0.binlog # Numbered: first test's build +├── msbuild1.binlog # Numbered: second test's build +└── core.1000.34 # Only on crash +``` + +**Key confusion point:** Numbered binlogs (`msbuild0.binlog`, `msbuild1.binlog`) correspond to individual test cases within the work item, not to build phases. A work item like `Microsoft.NET.Build.Tests.dll.18` runs dozens of tests, each invoking MSBuild separately. To map a binlog to a specific test: +1. Load it with `mcp-binlog-tool-load_binlog` +2. Check the project paths inside — they usually contain the test name +3. Or check `testResults.xml` to correlate test execution order with binlog numbering + +### AzDO Build Artifact Downloads + +AzDO artifacts download as **ZIP files** with nested directory structures: + +``` +$env:TEMP\TestBuild_linux_x64\ +└── TestBuild_linux_x64\ # Artifact name repeated as subfolder + └── log\Release\ + ├── Build.binlog # Main build + ├── TestBuildTests.binlog # Test build verification + ├── ToolsetRestore.binlog # Toolset restore + └── SendToHelix.binlog # Contains Helix job GUIDs +``` + +**Key confusion point:** The artifact name appears twice in the path (extract folder + subfolder inside the ZIP). Use the full nested path with `mcp-binlog-tool-load_binlog`. + +### Mapping Binlogs to Failures + +| You want to investigate... 
| Use this binlog | Source | +|---------------------------|-----------------|--------| +| Why a test's internal `dotnet build` failed | `msbuild.binlog` or `msbuild{N}.binlog` | Helix work item | +| Why the CI build itself failed to compile | `Build.binlog` | AzDO build artifact | +| Which Helix jobs were dispatched | `SendToHelix.binlog` | AzDO build artifact | +| AOT compilation failure | `AOTBuild.binlog` | Helix work item | + +### Tracking Downloaded Artifacts with SQL + +When downloading from multiple work items (e.g., binlog comparison between passing and failing builds), use SQL to avoid losing track of what's where: + +```sql +CREATE TABLE IF NOT EXISTS downloaded_artifacts ( + local_path TEXT PRIMARY KEY, + helix_job TEXT, + work_item TEXT, + build_id INT, + artifact_source TEXT, -- 'helix' or 'azdo' + file_type TEXT, -- 'binlog', 'testResults', 'console', 'crash' + notes TEXT -- e.g., 'passing baseline', 'failing PR build' +); +``` + +Key queries: +```sql +-- Find the pair of binlogs for comparison +SELECT local_path, notes FROM downloaded_artifacts +WHERE file_type = 'binlog' ORDER BY notes; + +-- What have I downloaded from a specific work item? +SELECT local_path, file_type FROM downloaded_artifacts +WHERE work_item = 'Microsoft.NET.Build.Tests.dll.18'; +``` + +Use this whenever you're juggling artifacts from 2+ Helix jobs (especially during the binlog comparison pattern in [binlog-comparison.md](binlog-comparison.md)). + +### Tips + +- **Multiple binlogs ≠ multiple builds.** A single work item can produce several binlogs if the test suite runs multiple `dotnet build`/`dotnet publish` commands. +- **Helix binlogs are test-time, AzDO binlogs are build-time.** If a test was built wrong, check AzDO artifacts. If it ran a build that produced wrong output, check Helix artifacts. +- **Not all work items have binlogs.** Standard unit tests only produce `testResults.xml` and console logs. 
+- **Use `hlx_download` with `pattern:"*.binlog"`** to filter downloads and avoid pulling large console logs. + ## Artifact Retention Helix artifacts are retained for a limited time (typically 30 days). Download important artifacts promptly if needed for long-term analysis. diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md index 948f2ea6ab1bd2..6d6ae673fae21b 100644 --- a/.github/skills/ci-analysis/references/sql-tracking.md +++ b/.github/skills/ci-analysis/references/sql-tracking.md @@ -100,3 +100,4 @@ WHERE body LIKE '%PTAL%' OR body LIKE '%could you%look%'; | Crash recovery across multiple work items | Yes — cache testResults.xml findings | | Single build, single failure | No — overkill | | PR chain or long-lived PR with extensive triage comments | Yes — preserves diagnosis context across tool calls | +| Downloading artifacts from 2+ Helix jobs (e.g., binlog comparison) | Yes — see [helix-artifacts.md](helix-artifacts.md) | From ce7083c81e29a18a39cc2b536c8e5409b618a8bd Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 23:03:23 -0600 Subject: [PATCH 42/44] Clarify build discovery scope, SQL table purposes, and use concrete tool name examples --- .github/skills/ci-analysis/SKILL.md | 2 +- .github/skills/ci-analysis/references/delegation-patterns.md | 2 +- .github/skills/ci-analysis/references/sql-tracking.md | 4 ++++ 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/SKILL.md b/.github/skills/ci-analysis/SKILL.md index 0139f5731506e5..8ae99a98c8b1d1 100644 --- a/.github/skills/ci-analysis/SKILL.md +++ b/.github/skills/ci-analysis/SKILL.md @@ -75,7 +75,7 @@ The script operates in three distinct modes depending on what information you ha ## What the Script Does ### PR Analysis Mode (`-PRNumber`) -1. Discovers all AzDO builds associated with the PR +1. 
Discovers AzDO builds associated with the PR (via `gh pr checks` — finds failing builds and one non-failing build as fallback; for full build history, use `azure-devops-pipelines_get_builds`) 2. Fetches Build Analysis for known issues 3. Gets failed jobs from Azure DevOps timeline 4. **Separates canceled jobs from failed jobs** (canceled may be dependency-canceled or timeout-canceled) diff --git a/.github/skills/ci-analysis/references/delegation-patterns.md b/.github/skills/ci-analysis/references/delegation-patterns.md index e649ced3a5146f..e0b191ed68c37f 100644 --- a/.github/skills/ci-analysis/references/delegation-patterns.md +++ b/.github/skills/ci-analysis/references/delegation-patterns.md @@ -117,7 +117,7 @@ Launch one per build in parallel. The main agent combines with `azure-devops-pip ## General Guidelines -- **Use `general-purpose` agent type** — it has shell + MCP access (hlx-*, azure-devops-*, mcp-binlog-tool-*) +- **Use `general-purpose` agent type** — it has shell + MCP access (`hlx_status`, `azure-devops-pipelines_get_builds`, `mcp-binlog-tool-load_binlog`, etc.) - **Run independent tasks in parallel** — the whole point of delegation - **Include script paths** — subagents don't inherit skill context - **Require structured JSON output** — enables comparison across subagents diff --git a/.github/skills/ci-analysis/references/sql-tracking.md b/.github/skills/ci-analysis/references/sql-tracking.md index 6d6ae673fae21b..950e2f61a4465e 100644 --- a/.github/skills/ci-analysis/references/sql-tracking.md +++ b/.github/skills/ci-analysis/references/sql-tracking.md @@ -52,6 +52,10 @@ SELECT job_name, error_snippet FROM failed_jobs WHERE is_pr_correlated = TRUE; See [build-progression-analysis.md](build-progression-analysis.md) for the `build_progression` and `build_failures` tables that track pass/fail across multiple builds. 
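What that test-level tracking buys across builds can be sketched in a few lines. Python over in-memory SQLite, mirroring the `build_failures` queries; the build IDs and test names are invented for illustration:

```python
import sqlite3

# Sketch: separate test failures that are new in the latest build from
# those that persist across builds (the persistent ones are not flaky).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build_failures (build_id INT, test_name TEXT)")
conn.executemany("INSERT INTO build_failures VALUES (?, ?)", [
    (201, "System.Net.Http.Tests.TimeoutTest"),     # fails in both builds
    (202, "System.Net.Http.Tests.TimeoutTest"),
    (202, "System.Text.Json.Tests.RoundTripTest"),  # appears only in latest
])
new_in_latest = conn.execute("""
    SELECT f.test_name FROM build_failures f
    LEFT JOIN build_failures prev
      ON f.test_name = prev.test_name AND prev.build_id = 201
    WHERE f.build_id = 202 AND prev.test_name IS NULL
""").fetchall()
persistent = conn.execute("""
    SELECT test_name FROM build_failures
    GROUP BY test_name HAVING COUNT(DISTINCT build_id) > 1
""").fetchall()
print("new:", new_in_latest, "persistent:", persistent)
```

The LEFT JOIN against the previous build is the key move: anything with no match in the earlier build's failures is a candidate for "introduced by the latest push".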
+> **`failed_jobs` vs `build_failures` — when to use each:** +> - `failed_jobs` (above): **Job-level** — maps each failed AzDO job to a known issue. Use for single-build triage ("are all failures accounted for?"). +> - `build_failures` (build-progression-analysis.md): **Test-level** — tracks individual test names across builds. Use for progression analysis ("which tests started failing after commit X?"). + ## PR Comment Tracking For deep-dive analysis — especially across a chain of related PRs (e.g., dependency flow failures, sequential merge PRs, or long-lived PRs with weeks of triage) — store PR comments so you can query them without re-fetching: From 788a4f8da36b6935e7efe3139a5df5b43bfb7cc5 Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 23:07:17 -0600 Subject: [PATCH 43/44] Soften binlog source guidance: AzDO and Helix boundaries aren't absolute --- .../ci-analysis/references/helix-artifacts.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/.github/skills/ci-analysis/references/helix-artifacts.md b/.github/skills/ci-analysis/references/helix-artifacts.md index 840e3955a8d2a2..715330f50b77a8 100644 --- a/.github/skills/ci-analysis/references/helix-artifacts.md +++ b/.github/skills/ci-analysis/references/helix-artifacts.md @@ -226,12 +226,17 @@ $env:TEMP\TestBuild_linux_x64\ ### Mapping Binlogs to Failures -| You want to investigate... | Use this binlog | Source | -|---------------------------|-----------------|--------| -| Why a test's internal `dotnet build` failed | `msbuild.binlog` or `msbuild{N}.binlog` | Helix work item | -| Why the CI build itself failed to compile | `Build.binlog` | AzDO build artifact | -| Which Helix jobs were dispatched | `SendToHelix.binlog` | AzDO build artifact | -| AOT compilation failure | `AOTBuild.binlog` | Helix work item | +This table shows the **typical** source for each binlog type. 
The boundaries aren't absolute — some repos run tests on the build agent (producing test binlogs in AzDO artifacts), and Helix work items for SDK/Blazor tests invoke `dotnet build` internally (producing build binlogs as Helix artifacts). + +| You want to investigate... | Look here first | But also check... | +|---------------------------|-----------------|-------------------| +| Why a test's internal `dotnet build` failed | Helix work item (`msbuild{N}.binlog`) | AzDO artifact if tests ran on agent | +| Why the CI build itself failed to compile | AzDO build artifact (`Build.binlog`) | — | +| Which Helix jobs were dispatched | AzDO build artifact (`SendToHelix.binlog`) | — | +| AOT compilation failure | Helix work item (`AOTBuild.binlog`) | — | +| Test build/publish behavior | Helix work item (`publish.msbuild.binlog`) | AzDO artifact (`TestBuildTests.binlog`) | + +> **Rule of thumb:** If the failing job name contains "Helix" or "Send to Helix", the test binlogs are in Helix. If the job runs tests directly (common in dotnet/sdk), check AzDO artifacts. ### Tracking Downloaded Artifacts with SQL @@ -265,7 +270,7 @@ Use this whenever you're juggling artifacts from 2+ Helix jobs (especially durin ### Tips - **Multiple binlogs ≠ multiple builds.** A single work item can produce several binlogs if the test suite runs multiple `dotnet build`/`dotnet publish` commands. -- **Helix binlogs are test-time, AzDO binlogs are build-time.** If a test was built wrong, check AzDO artifacts. If it ran a build that produced wrong output, check Helix artifacts. +- **Helix and AzDO binlogs can overlap.** Helix binlogs are *usually* from test execution and AzDO binlogs from the build phase, but SDK/Blazor tests invoke MSBuild inside Helix (producing build-like binlogs), and some repos run tests directly on the build agent (producing test binlogs in AzDO). Check both sources if you can't find what you need. 
- **Not all work items have binlogs.** Standard unit tests only produce `testResults.xml` and console logs. - **Use `hlx_download` with `pattern:"*.binlog"`** to filter downloads and avoid pulling large console logs. From c6a96ab543c832d261aee1376137957ae8d5b77a Mon Sep 17 00:00:00 2001 From: Larry Ewing Date: Wed, 11 Feb 2026 23:09:21 -0600 Subject: [PATCH 44/44] Clarify hlx_download vs hlx_download_url usage --- .github/skills/ci-analysis/references/helix-artifacts.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/.github/skills/ci-analysis/references/helix-artifacts.md b/.github/skills/ci-analysis/references/helix-artifacts.md index 715330f50b77a8..6b73691bd44a45 100644 --- a/.github/skills/ci-analysis/references/helix-artifacts.md +++ b/.github/skills/ci-analysis/references/helix-artifacts.md @@ -188,9 +188,13 @@ If a test runs `dotnet build` internally (like SDK end-to-end tests), both sourc When you download artifacts via MCP tools or manually, the directory structure can be confusing. Here's what to expect. -### Helix Work Item Downloads (`hlx_download`) +### Helix Work Item Downloads -`hlx_download` saves files to a temp directory and returns local paths. The structure is **flat** — all files from the work item land in one directory: +Two MCP tools download Helix artifacts: +- **`hlx_download`** — downloads multiple files from a work item, with optional glob `pattern` (e.g., `pattern:"*.binlog"`). Returns local file paths. +- **`hlx_download_url`** — downloads a single file by direct URI (from `hlx_files` output). Use when you know exactly which file you need. + +`hlx_download` saves files to a temp directory. The structure is **flat** — all files from the work item land in one directory: ``` C:\...\Temp\helix-{hash}\