ci-analysis skill: restore domain examples from eval regression analysis #124416
Merged
lewing merged 2 commits into dotnet:main on Feb 14, 2026
…ogression

Waza eval progression testing (16 runs across 4 skill versions) revealed the tool-agnostic refactor (dotnet#124398) caused a 68% regression in tool calls (25→42) for the build progression task.

Root cause: domain-specific examples were incorrectly classified as tool schema restatements.

Changes:
- build-progression-analysis.md: restore key AzDO query parameters (branchName, queryOrder, top, project) as inline hints
- build-progression-analysis.md: restore gh api merge parent extraction example and mention get_commit MCP alternative
- build-progression-analysis.md: restore logId:5 / startLine:500 hints with bold emphasis for checkout log extraction
- build-progression-analysis.md: add stop signal: present findings when the progression table and transition are identified
- delegation-patterns.md: add bold emphasis on log ID/line hints in subagent prompt template
- SKILL.md: mention refs/pull/{PR}/merge branch pattern in step 1

These are domain examples (branch ref formats, field names, log locations, jq expressions) that agents cannot infer from tool descriptions alone. Simple tasks (retry) still benefit from less prescriptive guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Restores domain-specific guidance in the ci-analysis skill docs that was removed during the tool-agnostic refactor, aiming to reduce unnecessary tool calls and improve efficiency for complex build progression investigations.
Changes:
- Reintroduces AzDO build query specifics for `refs/pull/{PR}/merge` (project/ordering/top) and clarifies where `pr.sourceSha` lives.
- Restores a concrete merge-parent extraction example for obtaining target branch HEAD from the merge commit.
- Reinforces “checkout log ID 5 / line 500+” hints and adds an explicit stop signal once the progression table + transition are identified.
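
As a rough illustration of the restored hints, the Python sketch below builds the two Azure DevOps REST URLs involved: the build list for a PR merge ref and a line-ranged slice of a build log. The endpoint shapes follow the public AzDO Build REST API. The organization `dnceng-public`, project `public`, `api-version` value, and default `top` are assumptions for illustration, and the helper names are hypothetical. Log ID 5 and start line 500 are the hints from this PR, not a guarantee for every pipeline.

```python
from urllib.parse import urlencode

AZDO_BASE = "https://dev.azure.com"
API_VERSION = "7.1"  # assumption: any recent Build API version should behave the same


def pr_builds_url(pr_number, org="dnceng-public", project="public", top=20):
    """URL listing builds for a PR merge ref, newest queued first.

    org/project defaults are assumptions, not taken from the PR text.
    """
    params = {
        "branchName": f"refs/pull/{pr_number}/merge",
        "queryOrder": "queueTimeDescending",
        "$top": top,
        "api-version": API_VERSION,
    }
    # Keep "/" and "$" literal so the ref and the $top key stay readable.
    query = urlencode(params, safe="/$")
    return f"{AZDO_BASE}/{org}/{project}/_apis/build/builds?{query}"


def checkout_log_url(build_id, log_id=5, start_line=500,
                     org="dnceng-public", project="public"):
    """URL for a build log starting at a given line (defaults are the PR's hints)."""
    params = {"startLine": start_line, "api-version": API_VERSION}
    return (f"{AZDO_BASE}/{org}/{project}/_apis/build/builds/"
            f"{build_id}/logs/{log_id}?{urlencode(params)}")


if __name__ == "__main__":
    print(pr_builds_url(124416))
    print(checkout_log_url(1000))  # 1000 is a made-up build ID
```

Spelling out the branch ref and ordering up front is the point of the restored hints: the tool schema lists the parameters, but not which values matter for a PR's build history.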
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| .github/skills/ci-analysis/references/delegation-patterns.md | Adds emphasis to the checkout-log hint in the subagent delegation template. |
| .github/skills/ci-analysis/references/build-progression-analysis.md | Restores key domain examples/parameters for PR build listing, merge-parent extraction, checkout-log extraction, and stopping criteria. |
| .github/skills/ci-analysis/SKILL.md | Updates PR analysis mode description to reference querying AzDO builds on the PR merge ref for full history. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hoyosjs approved these changes on Feb 14, 2026
Summary
Waza eval progression testing (16 runs across 4 skill versions, then 12 more runs validating fixes) revealed the tool-agnostic refactor (#124398) caused a 68% regression in tool calls (25→42) for the build progression task while other tasks were stable or improved.
Root cause: Three domain-specific examples were incorrectly classified as tool schema restatements and removed. These are domain knowledge the agent genuinely cannot infer from tool descriptions:
- `refs/pull/{PR}/merge` branch pattern + AzDO query params
- `gh api .../git/commits/{sha} --jq '.parents[0].sha'` + `get_commit` MCP alternative

Eval Results (Tool Calls)
Each fix was validated independently with a full 4-task eval run:
Eval Results (Duration)
The final version matches the pre-refactor best on Build Progression (25 tools) and beats it overall (49 vs 55 total tools, 8m46s vs 11m04s).
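
For reference, the percentages implied by these figures can be checked with a few lines of Python. This is only a worked-arithmetic sketch using the numbers quoted in this description; the helper names are hypothetical.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def to_seconds(minutes, seconds):
    """Convert an XmYs duration to seconds."""
    return minutes * 60 + seconds


# Build Progression tool calls after the refactor: 25 -> 42
regression = percent_change(25, 42)

# Final version vs. pre-refactor best, all four tasks: 55 -> 49 tool calls
total_tools = percent_change(55, 49)

# Total duration: 11m04s -> 8m46s
duration = percent_change(to_seconds(11, 4), to_seconds(8, 46))

print(f"build progression regression: {regression:+.0f}%")
print(f"total tool calls, final vs pre-refactor: {total_tools:+.1f}%")
print(f"total duration, final vs pre-refactor: {duration:+.1f}%")
```

Under these numbers the final version uses roughly 11% fewer tool calls and about 21% less wall-clock time than the pre-refactor best.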
Key Insight
Simple tasks (retry, CI status) benefit from less prescriptive guidance — retry improved from 5 to 4 calls. Complex multi-step tasks (build progression) need specific domain examples showing branch ref patterns, field names, and log locations. The rule of thumb: if removing an example leaves the agent unable to accomplish the task efficiently AND the information isn't in any tool description, it's a domain example — keep it.
Changes
- `build-progression-analysis.md`: Restore key parameters, merge parent extraction example, checkout log hints, stop signal
- `delegation-patterns.md`: ALL CAPS emphasis on log ID/line hints in subagent template
- `SKILL.md`: Mention `refs/pull/{PR}/merge` in step 1
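
To make the merge parent extraction concrete, here is a minimal Python sketch of what the jq expression `.parents[0].sha` does to a commit payload. It assumes the payload has the shape of a GitHub Git commit object (a `parents` array of objects with a `sha` field); the sample SHAs are made up and the helper name is hypothetical.

```python
import json


def first_parent_sha(commit_json):
    """Return the SHA of the first parent, mirroring jq '.parents[0].sha'."""
    commit = json.loads(commit_json)
    parents = commit.get("parents", [])
    if not parents:
        raise ValueError("commit has no parents")
    return parents[0]["sha"]


# Made-up payload standing in for the response to the gh api call on a merge commit.
sample = json.dumps({
    "sha": "mergesha000",
    "parents": [
        {"sha": "aaa111"},  # first parent: target branch HEAD, per this PR
        {"sha": "bbb222"},  # second parent
    ],
})

if __name__ == "__main__":
    print(first_parent_sha(sample))
```

The index is the domain knowledge here: nothing in a generic commit-fetching tool description says which parent of a PR merge commit corresponds to the target branch.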