[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-03-17 #21391
Closed
Replies: 2 comments
🤖 Beep boop! The smoke test agent was here! Running diagnostics on your discussion while simultaneously questioning the meaning of existence. All systems nominal. Carry on, humans! 🚀
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #21587.
Daily NLP-based clustering analysis of Copilot agent task prompts across the last 30 days. TF-IDF vectorization + K-means clustering (k=6, selected by silhouette score) applied to 2,551 PRs with valid extracted prompts.
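A minimal sketch of the pipeline described above (TF-IDF vectorization, K-means, k chosen by silhouette score). The toy `prompts` corpus is a stand-in for the 2,551 extracted PR prompts; the actual analysis lives in `/tmp/gh-aw/analyze-prompts.py`, and the parameter choices here are illustrative assumptions, not the script's real settings.

```python
# Sketch: TF-IDF + K-means with k selected by silhouette score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy corpus standing in for the real extracted prompts.
prompts = [
    "update github actions workflow to add agent review step",
    "resolve issue: fix safe outputs handler for artifact logging",
    "analyze failing job logs, identify root cause, implement fix",
] * 10

# Unigrams and bigrams so phrases like "safe outputs" survive as terms.
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(prompts)

# Pick k by silhouette score over a small candidate range.
best_k, best_score = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"selected k={best_k} (silhouette={best_score:.3f})")
```

On the toy corpus this selects k=3, one cluster per distinct prompt; on the real corpus the same loop over a wider range would land on k=6 as reported.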
Summary
Cluster Overview

| Cluster | Theme | PRs | Merge rate |
|---|---|---|---|
| A | Workflow & Feature Development | 903 | 75% |
| B | Issue Resolution via `gh aw` | 892 | 65% |
| C | Safe Outputs Development | 269 | 80% |
| D | Agentic Workflow Management | 210 | 77% |
| E | Open-ended Copilot Tasks | 169 | 62% |
| F | CI Failure Diagnosis & Fixes | 108 | 77% |

Cluster Details — Sample Prompts & Characteristics
A · Workflow & Feature Development (903 PRs, 75% merge)
The largest cluster by volume. Prompts reference GitHub Actions workflows, feature additions, and `review`/`remove` operations. Tasks are often broad feature-implementation and refactoring requests referencing specific workflow files.

Top terms: `reference`, `update`, `github`, `workflow`, `add`, `agent`, `review`, `remove`
Example PRs: #11968, #11357, #16020
Sample prompt:
B · Issue Resolution via `gh aw` (892 PRs, 65% merge)

Tied for the largest cluster. Prompts follow a structured template referencing a GitHub issue body (`(issue_title)`, `(issue_description)`, and `(issue_number)` sections). The agent is asked to resolve issues from the tracker. The lower merge rate (65%) reflects that issue-driven tasks involve more ambiguity and back-and-forth.

Top terms: `issue`, `section`, `workflow`, `details`, `gh`, `aw`, `gh aw`, `resolve`
Example PRs: #13576, #11980, #16277
Sample prompt:
C · Safe Outputs Development (269 PRs, 80% merge) ★ Highest merge rate
Prompts are highly focused and specific — describing precise safe-output handler behaviour, MCP server configuration, and artifact logging. The specificity of these prompts correlates with the highest merge rate (80%) in the dataset.
Top terms: `safe`, `safe outputs`, `outputs`, `safe output`, `output`, `issue`, `agent`, `handler`
Example PRs: #16842, #11120, #18989
Sample prompt:
D · Agentic Workflow Management (210 PRs, 77% merge)
Prompts specifically mention agentic workflows, `.md` workflow files, and engine/compiler concepts. These are configuration-heavy tasks: updating templates, managing workflow lifecycle, and changing compiler behaviour.

Top terms: `agentic`, `agentic workflows`, `workflows`, `workflow`, `md`, `update`, `create`, `engine`
Example PRs: #11360, #13996, #11299
Sample prompt:
E · Open-ended Copilot Tasks (169 PRs, 62% merge) ★ Lowest merge rate
Prompts are the least specific, often just generic task handoffs ("Thanks for asking me to work on this…") or vague feature requests without clear acceptance criteria. The lowest merge rate (62%) signals that underspecified prompts produce lower-quality outcomes.

Top terms: `coding agent`, `coding`, `copilot coding`, `agent`, `copilot`, `set`, `work`, `input`
Example PRs: #20143, #15195, #18028
F · CI Failure Diagnosis & Fixes (108 PRs, 77% merge)
Prompts follow a structured CI-doctor template: provide a Job ID and URL, ask the agent to analyze workflow logs, identify root cause, and implement a fix. Smaller in volume but high success rate.
Top terms: `job`, `fix`, `analyze`, `identify`, `failing`, `implement`, `id`, `url`
Example PRs: #18876, #13949, #13592
Sample prompt:
Full PR Data Table (top 200 by recency)
(Table of representative PRs with prompt excerpts not recovered.) Full table available in clustering-report.md. Showing 26 representative PRs.
Key Findings
- **Two dominant task types account for 70% of all work:** Workflow & Feature Development (35%, 75% merge) and Issue Resolution via `gh aw` (35%, 65% merge). The 10 pp gap in merge rate between these two similarly sized clusters is the biggest actionable signal.
- **Prompt specificity predicts success:** Safe Outputs Development (80% merge) uses detailed, technically precise prompts; Open-ended Copilot Tasks (62% merge) uses vague or boilerplate prompts. This is a clear correlation between prompt specificity and outcome.
- **CI Failure Doctor template is effective:** The structured "Job ID + Job URL + fix instructions" template achieves a 77% merge rate with only 2.9 avg commits, the most efficient pattern in the dataset.
- **Issue-driven tasks are the weakest link:** The 892 PRs in Cluster B have the second-lowest merge rate (65%) despite being the most numerous. The issue-template format introduces ambiguity that the agent struggles with.
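The CI Failure Doctor template described in Cluster F might look roughly like the following. This is an illustrative reconstruction from the description above (Job ID, Job URL, analyze/identify/fix instructions), not the actual template text used in the repository:

```text
Job ID: <failing job id>
Job URL: <link to the failing workflow run>

Analyze the workflow logs for this failing job, identify the root
cause of the failure, and implement a fix in this repository.
```

The value of a template like this is that every field is concrete and verifiable, which matches the finding that specific prompts merge at higher rates.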
Recommendations
- **Improve issue-template prompts (Cluster B, 65% merge):** Add explicit acceptance criteria, expected file paths, and test requirements to issue bodies before dispatching to the agent. This alone could move ~60 PRs from closed/unmerged to merged per 30-day cycle.
- **Adopt the CI Failure Doctor pattern more broadly:** The structured Job ID + analysis template (Cluster F) achieves strong results efficiently. Consider adapting this template for other diagnostic tasks.
- **Audit open-ended Copilot tasks:** Review the 169 PRs in Cluster E before dispatching. Require a minimum prompt length and a checklist of acceptance criteria to avoid vague handoffs that waste agent cycles.
- **Safe Outputs is a model cluster:** Review the prompt patterns in Cluster C as examples of how to write agent tasks — specific output formats, file paths, and behavioral contracts drive the 80% merge rate.
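A quick back-of-envelope check of the "~60 PRs" estimate in the first recommendation, assuming Cluster B's 30-day volume stays flat and its merge rate improves by about 7 pp (closing most of the 10 pp gap to Cluster A):

```python
# Cluster B: 892 PRs at 65% merge. An assumed uplift of 7 pp
# (65% to 72%) yields the report's ~60 additional merged PRs.
cluster_b_prs = 892
uplift = 0.07  # assumed merge-rate improvement
extra_merged = round(cluster_b_prs * uplift)
print(extra_merged)  # 62
```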
References:
- `/tmp/gh-aw/analyze-prompts.py`
- `/tmp/gh-aw/pr-data/clustering-results.json`