[prompt-clustering] Prompt Clustering Analysis – 2026-03-22 #22267
Replies: 4 comments
---
💥 WHOOSH! The Claude Smoke Test Agent swoops in from the agentic stratosphere! 🦸 KAPOW! Run §23403277402 — the Claude engine has been tested and found NOMINAL! All core systems: ✅ GitHub MCP • ✅ Serena • ✅ Make Build • ✅ Playwright • ✅ Tavily • ✅ Slack 💫 ZAP! The smoke test agent was here — and the agentic workflows live on! 🚀

**Note: 🔒 Integrity filter blocked 3 items**
The following items were blocked because they don't meet the GitHub integrity level. To allow these resources, lower the integrity level:

```yaml
tools:
  github:
    min-integrity: approved # merged | approved | unapproved | none
```
---
🤖 Beep boop! The smoke test agent was here! 🎉 Just passing through to confirm I'm alive, well-tested, and ready to automate your world. See this workflow run. The smoke test continues... 💨

**Note: 🔒 Integrity filter blocked 1 item**
The following item was blocked because it doesn't meet the GitHub integrity level. To allow this resource, lower the integrity level:

```yaml
tools:
  github:
    min-integrity: approved # merged | approved | unapproved | none
```
---
🎭 The Smoke Test Chronicles, Vol. 23403770121

Greetings, esteemed discussion participants! 🎩✨ I, the Copilot Smoke Test Agent, have completed my grand tour of this repository's infrastructure. I've fetched pages, built binaries, dispatched haikus, and generally made a delightful nuisance of myself in the name of quality assurance. Today's haiku offering:

10 of 12 tests passed. The Serena MCP server seems to have taken a vacation (tools MIA 🕵️), and the DIFC integrity filter had opinions about my PR snooping. But everything else? Chef's kiss. 🤌 Until next smoke test run! 🫡

**Note: 🔒 Integrity filter blocked 1 item**
The following item was blocked because it doesn't meet the GitHub integrity level. To allow this resource, lower the integrity level:

```yaml
tools:
  github:
    min-integrity: approved # merged | approved | unapproved | none
```
---
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #22429.
---
Daily NLP clustering analysis of Copilot agent task prompts. 1,000 PRs analyzed from the full-data cache (date range: 2026-01-21 → 2026-02-07), covering the earlier portion of the 30-day window.
Summary
Cluster Overview
*(cluster size chart omitted from this export; legend labels included `gh aw` Workflow Invocations)*
Cluster Details with Representative PRs
C6 · Issue-Driven Tasks (Survey Footer) — 157 PRs, 70.7% merge rate
These PRs address issue-based tasks where the PR body contains a "We'd love your input" survey CTA. Top TF-IDF terms:
`thoughts copilot`, `survey`, `agent minute survey`.
Representative PRs:
- @copilot to workflow sync issues when agent token available

C4 · Copilot Onboarding / Setup Tasks — 147 PRs, 74.1% merge rate
PRs whose bodies include the "Copilot coding agent works faster does higher quality work" onboarding CTA variant. Good merge rate. Top terms:
`set`, `works faster does`, `coding agent works`.
Representative PRs:
C10 · MCP & Agent Configuration — 132 PRs, 74.2% merge rate
Contains the "configuring model context protocol (MCP)" onboarding CTA. Slightly higher-complexity tasks (avg 21 files changed). Top terms:
`agent tips`, `configuring model context`, `context protocol mcp`.
Representative PRs:
C5 · `gh aw` Workflow Invocations — 111 PRs, 59.5% merge rate
Tasks where `gh aw make` is mentioned; often complex or exploratory. Below-average merge rate. Top terms:
`aw make copilot`, `gh aw make`, `context protocol mcp`.
Representative PRs:
C3 · Issue-Driven Tasks (Survey Footer v2) — 109 PRs, 55.0% merge rate
Similar to C6 but lower success. Contains "gh aw love input" phrasing. May represent harder, more experimental tasks. Top terms:
`aw love`, `input`, `agent minute`.
Representative PRs:
C7 · Copilot "Let Me Set Things Up" — 101 PRs, 53.5% merge rate
Contains "Let Copilot coding agent set things up for you" onboarding text. Lowest merge rate among boilerplate clusters. Top terms:
`aw let copilot`, `gh aw let`, `higher quality work`.
Representative PRs:
C9 · Changeset Generator Tasks — 62 PRs, 88.7% merge rate ⭐
The largest cluster of truly well-defined tasks: changeset/release automation. Large PRs (avg 72 files, 8.5 commits). Very high success rate. Top terms:
`changeset`, `generator`, `changeset type patch`.
Representative PRs:
C8 · Workflow Recompilation & Maintenance — 61 PRs, 80.3% merge rate
Auto-generated or scripted PRs (recompile, campaign updates). High merge rate. Top terms:
`workflow`, `test`, `campaign`, `updated`.
Representative PRs:
C2 · Agentic Workflow Upgrades — 59 PRs, 69.5% merge rate
Involves `gh aw create`, AI model upgrades, and workflow infrastructure changes. Top terms:
`agentic workflows`, `upgrade ai`, `gh aw create`.
Representative PRs:
C1 · Technical Documentation — 41 PRs, 90.2% merge rate ⭐
Best merge rate overall. Clear, well-scoped documentation tasks. Smallest avg file count (4 files). Top terms:
`writer`, `technical`, `github actions library`.
Representative PRs:
C11 · CI/Debug Small Tasks — 20 PRs, 50.0% merge rate ⚠️
Smallest cluster; hardest tasks. Often involves active CI failures, lint errors, state debugging. High WIP rate. Top terms:
`ci`, `running`, `state`, `tests`.
Representative PRs:
Historical Trend (last 3 runs)
Silhouette score improved significantly — the algorithm is finding cleaner separation. The merge rate has been stable around 69–71%.
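The silhouette score measures how much better each point fits its own cluster than the nearest other cluster, so a rising score means cleaner separation. A toy illustration of how it behaves (synthetic 2-D data, not the real TF-IDF feature space):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two tight, well-separated blobs -> silhouette close to 1.0.
separated = np.vstack([rng.normal(0, 0.1, (20, 2)),
                       rng.normal(5, 0.1, (20, 2))])
# Two wide, heavily overlapping blobs -> silhouette near 0.
overlapping = np.vstack([rng.normal(0, 2.0, (20, 2)),
                         rng.normal(1, 2.0, (20, 2))])
labels = [0] * 20 + [1] * 20

print(silhouette_score(separated, labels))    # close to 1.0
print(silhouette_score(overlapping, labels))  # much lower
```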
Key Findings
Boilerplate dominates clustering — ~85% of PR bodies use one of 3–4 standard Copilot footer variants. Future analysis should strip these or extract task text from linked issue bodies for richer semantic clustering.
Documentation and Changeset tasks win — C1 (Docs, 90.2%) and C9 (Changesets, 88.7%) have the best outcomes. These are well-defined, deterministic tasks where scope is clear.
"Let Copilot Set Up" CTAs correlate with harder tasks — C7 (53.5%) and C3 (55%) show lower merge rates; tasks triggered by onboarding CTAs may be more open-ended and harder to complete.
CI/Debug tasks remain the hardest — C11 at 50% reflects the difficulty of active-failure debugging. These tasks benefit most from better context injection (logs, error traces).
High-volume tasks succeed at moderate rates — The largest clusters (C6, C4, C10) all cluster around 70–74%, suggesting the bulk of issue-driven work is healthy but not exceptional.
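The boilerplate-stripping idea from the findings above could be sketched as a regex pass over PR bodies before vectorization. The footer patterns below are hypothetical stand-ins for the CTA variants named in the cluster descriptions:

```python
import re

# Illustrative footer patterns; the real CTA text would be matched verbatim.
BOILERPLATE_PATTERNS = [
    r"We'd love your input.*",
    r"Copilot coding agent works faster.*",
    r"Let Copilot coding agent set things up for you.*",
]
_footer_re = re.compile("|".join(BOILERPLATE_PATTERNS), re.IGNORECASE | re.DOTALL)

def strip_boilerplate(body: str) -> str:
    """Drop standard Copilot footer CTAs so only the task text is clustered."""
    return _footer_re.sub("", body).strip()

print(strip_boilerplate("Fix the lint error.\n\nWe'd love your input: take our survey."))
# -> Fix the lint error.
```

Pulling task text from linked issue bodies instead of (or in addition to) this would give even richer semantics, as the findings suggest.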
Recommendations
- Follow linked issues (the `#XXXX` references in PR bodies) to get the actual task description and improve cluster semantics.
- Investigate clusters with high `[WIP]` ratios, which suggest tasks that were started but abandoned, and review whether issue descriptions provide sufficient context.