ci-analysis skill: restore domain examples from eval regression analysis by lewing · Pull Request #124416 · dotnet/runtime

lewing · 2026-02-14T01:48:04Z

Summary

Waza eval progression testing (16 runs across 4 skill versions, then 12 more runs validating fixes) revealed that the tool-agnostic refactor (#124398) caused a 68% regression in tool calls (25→42) on the build progression task, while other tasks were stable or improved.

Root cause: Three domain-specific examples were incorrectly classified as tool schema restatements and removed. These are domain knowledge the agent genuinely cannot infer from tool descriptions:

| Restored | Why it matters |
| --- | --- |
| `refs/pull/{PR}/merge` branch pattern + AzDO query params | Agent can't infer the PR merge branch ref format |
| `gh api .../git/commits/{sha} --jq '.parents[0].sha'` + `get_commit` MCP alternative | Merge commit parent extraction pattern |
| log ID 5, line 500+ hints with emphasis | Magic numbers for checkout log location |
| Stop signal in Step 4 | Agent was doing 5-7 extra tool calls after having enough data |
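
To make the first restored item concrete, here is a minimal sketch that assembles build-list query parameters for a PR merge ref. The parameter names (`branchName`, `queryOrder`, `top`, `project`) are the ones named in this PR's commit message. The helper names, the `queueTimeDescending` ordering, the default `top` value, and the `"public"` project are illustrative assumptions, not taken from the skill files.

```python
def pr_merge_ref(pr_number: int) -> str:
    """Return the merge branch ref for a pull request."""
    return f"refs/pull/{pr_number}/merge"


def build_list_params(pr_number: int, project: str, top: int = 20) -> dict:
    """Assemble build-list query parameters for a PR merge ref (hypothetical helper)."""
    return {
        "project": project,
        "branchName": pr_merge_ref(pr_number),
        "queryOrder": "queueTimeDescending",  # assumed ordering: newest builds first
        "top": top,  # assumed default page size
    }


# "public" is a placeholder project name used only for illustration.
params = build_list_params(124416, project="public")
print(params["branchName"])  # refs/pull/124416/merge
```

The point of the sketch is that the ref format is a fixed string pattern that nothing in a generic "list builds" tool description would reveal.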

Eval Results (Tool Calls)

Each fix was validated independently with a full 4-task eval run:

| Version | Build Progression | CI Status | Helix | Retry | Total |
| --- | --- | --- | --- | --- | --- |
| 8fdec1f (pre-refactor best) | 25 | 5 | 20 | 5 | 55 |
| 833041c (tool-agnostic, #124398) | 42 📉 | 10 📉 | 21 | 4 | 77 |
| 75f75d3 (restore domain examples) | 36 | 7 | 24 | 4 | 71 |
| a9f5140 (+branch ref hints) | 32 | 6 | 26 | 4 | 68 |
| `aa4193b` (this PR) | 25 ✅ | 6 | 14 | 4 | 49 ✅ |

Eval Results (Duration)

| Version | Build Progression | CI Status | Helix | Retry | Total |
| --- | --- | --- | --- | --- | --- |
| 8fdec1f (pre-refactor best) | 5m20s | 1m28s | 3m13s | 1m03s | 11m04s |
| 833041c (tool-agnostic, #124398) | 7m38s | 1m38s | 3m34s | 0m56s | 13m45s |
| `aa4193b` (this PR) | 4m25s | 1m08s | 2m28s | 0m45s | 8m46s |

The final version matches the pre-refactor best on Build Progression (25 tool calls) and beats it overall (49 vs. 55 total tool calls, 8m46s vs. 11m04s).
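
The headline figures follow directly from the tool-call table. As a quick check, this sketch recomputes the row totals and the relative changes; the dictionary layout and function name are illustrative, while the numbers are copied from the table.

```python
# Tool calls per task, copied from the tool-call table:
# (Build Progression, CI Status, Helix, Retry)
TOOL_CALLS = {
    "8fdec1f": (25, 5, 20, 5),
    "833041c": (42, 10, 21, 4),
    "75f75d3": (36, 7, 24, 4),
    "a9f5140": (32, 6, 26, 4),
    "aa4193b": (25, 6, 14, 4),
}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Sum each row to reproduce the Total column.
totals = {version: sum(calls) for version, calls in TOOL_CALLS.items()}

# Build Progression: pre-refactor best vs. tool-agnostic refactor.
build_regression = percent_change(TOOL_CALLS["8fdec1f"][0], TOOL_CALLS["833041c"][0])

# Overall: pre-refactor best vs. this PR.
overall_change = percent_change(totals["8fdec1f"], totals["aa4193b"])

print(totals)
print(f"{build_regression:.0f}%")  # 68%
print(f"{overall_change:.1f}%")    # -10.9%
```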

Key Insight

Simple tasks (retry, CI status) benefit from less prescriptive guidance — retry improved from 5 to 4 calls. Complex multi-step tasks (build progression) need specific domain examples showing branch ref patterns, field names, and log locations. The rule of thumb: if removing an example leaves the agent unable to accomplish the task efficiently AND the information isn't in any tool description, it's a domain example — keep it.
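
The rule of thumb can be written as a two-condition predicate. This is only a sketch of the decision as stated above; the function and argument names are made up for illustration, and judging each condition still takes eval data or a careful read of the tool descriptions.

```python
def is_domain_example(needed_for_efficiency: bool, in_tool_description: bool) -> bool:
    """Keep an example only if the agent needs it AND no tool description supplies it."""
    return needed_for_efficiency and not in_tool_description


# The PR merge ref pattern: needed, and not derivable from tool descriptions -> keep.
print(is_domain_example(needed_for_efficiency=True, in_tool_description=False))  # True

# A restatement of a tool's own parameter list: already in the description -> drop.
print(is_domain_example(needed_for_efficiency=True, in_tool_description=True))   # False
```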

Changes

build-progression-analysis.md: Restore key parameters, merge parent extraction example, checkout log hints, stop signal
delegation-patterns.md: ALL CAPS emphasis on log ID/line hints in subagent template
SKILL.md: Mention refs/pull/{PR}/merge in step 1
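
The stop signal amounts to a termination check: once the progression table exists and the transition is found, the agent should present findings instead of making more tool calls. The sketch below assumes a progression is an ordered list of (build id, result) pairs and that a "transition" means the first pass-to-fail change between adjacent builds. Both assumptions, the result strings, and all names are illustrative; the skill file states the rule in prose.

```python
from typing import Optional


def find_transition(progression: list[tuple[str, str]]) -> Optional[tuple[str, str]]:
    """Return (last passing build, first failing build), or None if there is no such change."""
    for (prev_id, prev_result), (next_id, next_result) in zip(progression, progression[1:]):
        if prev_result == "succeeded" and next_result == "failed":
            return prev_id, next_id
    return None


def should_stop(progression: list[tuple[str, str]]) -> bool:
    """Stop once the table is non-empty and a transition has been identified."""
    return bool(progression) and find_transition(progression) is not None


# Hypothetical build ids, oldest first.
table = [("build-1", "succeeded"), ("build-2", "succeeded"), ("build-3", "failed")]
print(find_transition(table))  # ('build-2', 'build-3')
print(should_stop(table))      # True
```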

…ogression

Waza eval progression testing (16 runs across 4 skill versions) revealed the tool-agnostic refactor (dotnet#124398) caused a 68% regression in tool calls (25→42) for the build progression task.

Root cause: domain-specific examples were incorrectly classified as tool schema restatements.

Changes:
- build-progression-analysis.md: restore key AzDO query parameters (branchName, queryOrder, top, project) as inline hints
- build-progression-analysis.md: restore gh api merge parent extraction example and mention get_commit MCP alternative
- build-progression-analysis.md: restore logId:5 / startLine:500 hints with bold emphasis for checkout log extraction
- build-progression-analysis.md: add stop signal: present findings when the progression table and transition are identified
- delegation-patterns.md: add bold emphasis on log ID/line hints in subagent prompt template
- SKILL.md: mention refs/pull/{PR}/merge branch pattern in step 1

These are domain examples (branch ref formats, field names, log locations, jq expressions) that agents cannot infer from tool descriptions alone. Simple tasks (retry) still benefit from less prescriptive guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Restores domain-specific guidance in the ci-analysis skill docs that was removed during the tool-agnostic refactor, aiming to reduce unnecessary tool calls and improve efficiency for complex build progression investigations.

Changes:

Reintroduces AzDO build query specifics for refs/pull/{PR}/merge (project/ordering/top) and clarifies where pr.sourceSha lives.
Restores a concrete merge-parent extraction example for obtaining target branch HEAD from the merge commit.
Reinforces “checkout log ID 5 / line 500+” hints and adds an explicit stop signal once the progression table + transition are identified.
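
The jq expression `.parents[0].sha` from the PR description selects the first parent of the merge commit, which the overview above identifies as the target branch HEAD. A Python equivalent over an already-fetched commit payload might look like the sketch below. The payload shape mirrors what the jq expression expects (a `parents` array of objects with a `sha` field); the function name and the SHAs are made up for illustration.

```python
import json


def target_branch_head(commit_payload: dict) -> str:
    """Return the first parent's SHA, the equivalent of jq '.parents[0].sha'."""
    parents = commit_payload.get("parents", [])
    if not parents:
        raise ValueError("commit has no parents")
    return parents[0]["sha"]


# Hypothetical merge-commit payload with placeholder SHAs.
sample = json.loads(
    '{"sha": "merge000", "parents": [{"sha": "aaa111"}, {"sha": "bbb222"}]}'
)
print(target_branch_head(sample))  # aaa111
```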

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .github/skills/ci-analysis/references/delegation-patterns.md | Adds emphasis to the checkout-log hint in the subagent delegation template. |
| .github/skills/ci-analysis/references/build-progression-analysis.md | Restores key domain examples/parameters for PR build listing, merge-parent extraction, checkout-log extraction, and stopping criteria. |
| .github/skills/ci-analysis/SKILL.md | Updates PR analysis mode description to reference querying AzDO builds on the PR merge ref for full history. |

.github/skills/ci-analysis/references/delegation-patterns.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 14, 2026 01:48

github-actions bot added the needs-area-label label Feb 14, 2026

dotnet-policy-service bot assigned lewing Feb 14, 2026

lewing added the area-skills label and removed the needs-area-label label Feb 14, 2026

Copilot started reviewing on behalf of lewing February 14, 2026 01:48

Copilot AI reviewed Feb 14, 2026


.github/skills/ci-analysis/references/delegation-patterns.md (outdated, resolved)

Address review: use ALL CAPS instead of markdown bold inside code fence

fcbb6fd

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

lewing requested a review from steveisok February 14, 2026 01:55

hoyosjs approved these changes Feb 14, 2026


lewing merged commit 50e7fbb into dotnet:main Feb 14, 2026
18 checks passed

lewing deleted the ci-analysis-eval-regression-fix branch February 14, 2026 17:12

dotnet-maestro bot mentioned this pull request Feb 15, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#4873

Merged

lewing merged 2 commits into dotnet:main from lewing:ci-analysis-eval-regression-fix
