[copilot-cli-research] Copilot CLI Deep Research - 2026-04-16 #26727

2026-04-16T21:15:00Z

github-actions[bot]
Bot Apr 16, 2026

Analysis Date: 2026-04-16 | Repository: github/gh-aw | Scope: 192 total workflows, ~135 using Copilot engine (90 explicit + 19 object-format + 26 default)

📊 Executive Summary

This is the fourth consecutive daily analysis tracking Copilot CLI feature adoption across this repository. The headline today is a significant playwright regression (20→12, -40%) alongside a strong improvement in strict mode adoption (111→126, about +14%). Persistent feature gaps (version pinning, token-weights, mcp-gateway, engine.args/env, and most custom agent files) remain unchanged for the fourth day in a row, suggesting these are structural rather than temporary omissions.

The overall workflow count grew by 1 (191→192), Claude workflows grew by 1 (45→46), and Codex dropped by 1 (9→8). The most impactful quick win remains deploying the engine.agent field to route specialized workflows through the 9 idle custom agent files already committed to .github/agents/.

Critical Findings

🔴 High Priority

| Issue | Current | Prev | Trend |
| --- | --- | --- | --- |
| `playwright` drop | 6% (12/192) | 10% (20/191) | ⬇️ -40% |
| Version pinning | 0% (0/192) | 0% | = (4 days) |
| `bash:['*']` wildcard | 19% (37/192) | 19% | = (security risk) |
| `engine.args/env` unused | 0% | 0% | = (4 days) |

🟡 Medium Priority

| Opportunity | Current | Impact |
| --- | --- | --- |
| `max-continuations` | 1% (2/192) | Copilot-unique autopilot feature almost unused |
| `engine.agent` custom files | 2% (3/192) | 9/11 agent files idle |
| `bare: true` for analytics | 1% (2/192) | Many narrow-scope workflows carry AGENTS.md overhead |
| `token-weights` | 0% | No cost customization for 4 days |

1️⃣ Current State Analysis


Copilot CLI Capabilities Inventory (v1.0.21)

Engine Configuration Fields:

engine.version — Pin Copilot CLI version (e.g. "0.0.369")
engine.model — Override model (e.g. gpt-5.1-codex-mini, claude-haiku-4-5)
engine.agent — Load custom .github/agents/*.agent.md file
engine.args — Pass extra CLI flags (array of strings)
engine.env — Inject environment variables into the Copilot process
engine.api-target — Custom API endpoint for enterprise GHE
engine.command — Custom executable path (skip installation)
engine.token-weights — Override model cost multipliers
engine.bare — Pass --no-custom-instructions to suppress AGENTS.md
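As a minimal sketch of how one of these fields might look in a workflow's frontmatter, the fragment below overrides the model. It assumes the same `engine:` object form used in the examples later in this report; the model name is simply one of those listed above and is illustrative, not a recommendation.

```yaml
engine:
  id: copilot
  # Illustrative override; any model name accepted by the engine could go here.
  model: gpt-5.1-codex-mini
```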

Execution Features:

max-continuations → --autopilot --max-autopilot-continues N (Copilot-only)
sandbox.awf / agent: awf — Network firewall with AWF binary
bare: true — Suppress custom instruction loading
--disable-builtin-mcps — Always applied; no per-workflow override needed
--no-ask-user — Autonomous mode (v1.0.19+, always applied)
--add-dir — Directory access (auto-configured from cache-memory)
--allow-all-paths — Enabled when edit tool is present

Tool & MCP Features:

tools.bash, tools.edit, tools.github, tools.playwright, tools.web-fetch
mcp-scripts — Custom MCP server scripts
cache-memory — Cross-run persistent storage
repo-memory — Git-branch persistent storage
strict — Enable safe-inputs sanitization

Feature Flags:

features.copilot-requests: true — Show Copilot request counts
features.mcp-gateway — MCP gateway routing (unused)
features.copilot-integration-id — Integration telemetry (unused)
features.disable-xpia-prompt — Disable XPIA defense prompt (unused)
features.cli-proxy — CLI proxy routing (unused)
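For orientation, a feature flag from this list would presumably be enabled in frontmatter along these lines. This is a sketch that assumes `features` is a top-level map, as the dotted names above suggest; only the flag already in use in this repository is shown.

```yaml
features:
  # Show Copilot request counts for this workflow.
  copilot-requests: true
```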

Available Custom Agents (.github/agents/):
adr-writer, agentic-workflows, ci-cleaner, contribution-checker, create-safe-output-type, custom-engine-implementation, developer.instructions, grumpy-reviewer, interactive-agent-designer, technical-doc-writer, w3c-specification-writer


Usage Statistics

Engine Distribution (192 workflows):

| Engine | Count | % |
| --- | --- | --- |
| Copilot (explicit) | 90 | 47% |
| Copilot (object format) | 19 | 10% |
| Copilot (default/implicit) | 26 | 14% |
| Claude | 46 | 24% |
| Codex | 8 | 4% |
| Other/Custom | 3 | 2% |

Copilot effective total: ~135 workflows (70%)

2️⃣ Feature Usage Matrix

| Feature Category | Available | Used | Not Used | Rate |
| --- | --- | --- | --- | --- |
| `engine.version` | ✅ | 0 | 192 | 0% |
| `engine.model` | ✅ | 9 | 183 | 5% |
| `engine.agent` | ✅ | 3 | 189 | 2% |
| `engine.args` | ✅ | 0 | 192 | 0% |
| `engine.env` | ✅ | 0 | 192 | 0% |
| `engine.api-target` | ✅ | 0 | 192 | 0% |
| `engine.bare` | ✅ | 2 | 190 | 1% |
| `engine.token-weights` | ✅ | 0 | 192 | 0% |
| `max-continuations` | ✅ | 2 | 190 | 1% |
| `sandbox.awf` | ✅ | 14 | 178 | 7% |
| `cache-memory` | ✅ | 55 | 137 | 29% |
| `web-fetch` | ✅ | 17 | 175 | 9% |
| `playwright` | ✅ | 12 | 180 | 6% |
| `mcp-scripts` | ✅ | 1 | 191 | 1% |
| `strict` | ✅ | 126 | 66 | 66% |
| `safe-outputs` | ✅ | 157 | 35 | 82% |
| `features.copilot-requests` | ✅ | 46 | 146 | 24% |
| `features.mcp-gateway` | ✅ | 0 | 192 | 0% |
| `tools.timeout` | ✅ | 8 | 184 | 4% |
| `bash:['*']` wildcard | ⚠️ | 37 | 155 | 19% |

3️⃣ Missed Opportunities


🔴 High Priority

Opportunity 1: Playwright Regression (-40%)

What: Playwright workflows dropped from 20 to 12 in one day
Why It Matters: Browser automation is a high-value, Copilot-differentiated capability. A 40% single-day drop is unusual and warrants investigation
Where: 8 workflows were using playwright yesterday and aren't today
How to Investigate: Compare lock files for playwright-using workflows; check if any were deleted or converted

Opportunity 2: `bash:['*']` Wildcard Security (37 workflows, 19%)

What: 37 workflows grant the agent unrestricted bash execution (--allow-all-tools) instead of specifying exact tool permissions
Why It Matters: Principle of least privilege — a workflow that only reads issues shouldn't have unlimited shell access

Example fix:

tools:
  bash:
    - 'git log*'
    - 'grep *'
    - 'cat *'
  github:
    toolsets: [repos, issues]

Opportunity 3: Version Pinning at 0% (4-day persistent gap)

What: Zero workflows pin the Copilot CLI version
Why It Matters: Unpinned workflows silently upgrade on each run. A breaking Copilot CLI change could cause all 135 Copilot workflows to fail simultaneously

How to Implement:

engine:
  id: copilot
  version: "1.0.21"  # Pin to known-good version

Trade-off: Requires periodic bump PRs; consider pinning only production/critical workflows


🟡 Medium Priority

Opportunity 4: `max-continuations` Underuse (1%)

What: Only smoke-copilot (2) and test-quality-sentinel (40) set max-continuations, which is what enables --autopilot
Why It Matters: This is a Copilot-unique feature not available in Claude or Codex. Long-running tasks like architecture review, comprehensive code refactoring, or multi-file generation could benefit enormously
Best candidates: architecture-guardian, code-scanning-fixer, jsweep, issue-monster

Example:

engine:
  id: copilot
  max-continuations: 10

Opportunity 5: 9/11 Custom Agent Files Unused

What: Only technical-doc-writer (2 workflows) and ci-cleaner (1 workflow) are active; 9 agent files sit idle
Why It Matters: Custom agents provide specialized personas with consistent behavior across multiple runs
Idle agents and where they'd fit:
- adr-writer → ADR creation tasks
- contribution-checker → PR review workflows (currently via generic Copilot)
- grumpy-reviewer → Code review workflows (the agent file already exists)
- agentic-workflows → Workflow generation/editing tasks
- w3c-specification-writer → Documentation standards
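A sketch of how an idle agent could be wired into a workflow, assuming `engine.agent` takes the agent file's base name (so `contribution-checker` would resolve to `.github/agents/contribution-checker.agent.md`). That naming rule is inferred from the inventory above rather than confirmed against the engine code.

```yaml
engine:
  id: copilot
  # Assumed to map to .github/agents/contribution-checker.agent.md
  agent: contribution-checker
```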

Opportunity 6: `bare: true` for Analytics Workflows

What: Only smoke-copilot uses bare: true (--no-custom-instructions)
Why It Matters: Workflows that don't need project-level instructions (daily-fact, poem-bot, daily-news, etc.) load and process AGENTS.md unnecessarily, wasting tokens and potentially confusing the agent
Good candidates: Any "fun" or purely analytical workflow that doesn't touch the codebase:
- daily-fact, poem-bot, daily-news, daily-hippo-learn, constraint-solving-potd
```
engine:
  id: copilot
  bare: true
```

Opportunity 7: `engine.token-weights` at 0%

What: Token weight customization has never been used
Why It Matters: The api-consumption-report and agent-performance-analyzer workflows analyze token costs but use built-in model multipliers. Custom weights would make these reports accurate for non-standard models

Example:

engine:
  id: copilot
  token-weights:
    gpt-5.1-codex-mini: 0.3


🟢 Low Priority

Opportunity 8: `engine.env` for Workflow-Specific Configuration

What: Zero workflows use engine.env to inject custom env vars
Why It Matters: Could allow per-workflow model selection without hardcoding (pass COPILOT_MODEL dynamically) or inject feature flags via environment

Example:

engine:
  id: copilot
  env:
    MY_WORKFLOW_MODE: "analysis"

Opportunity 9: `tools.startup-timeout` for MCP-Heavy Workflows

What: No workflows set startup-timeout (only 8 set any timeout)
Why It Matters: Workflows using many MCP servers can timeout during server initialization; explicit startup timeouts fail fast with clear errors
Best candidates: Workflows with 3+ MCP servers
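A hedged sketch of what this could look like for an MCP-heavy workflow. It assumes `startup-timeout` sits under `tools` (as the opportunity title suggests) and is expressed in seconds; the value is purely illustrative and should be tuned to how long the servers actually take to initialize.

```yaml
tools:
  # Assumption: value is in seconds. Fail fast if MCP servers
  # have not finished initializing within this window.
  startup-timeout: 60
```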

Opportunity 10: `features.copilot-integration-id` for Telemetry

What: Integration ID tracking has never been used
Why It Matters: Would allow GitHub-side telemetry grouping for this repository's specific Copilot usage patterns

4️⃣ Specific Workflow Recommendations


`architecture-guardian.md`

Current: Generic Copilot engine, no continuations, no custom agent
Recommended: Add max-continuations: 8 (architectural analysis benefits from iterative refinement) + consider agent: grumpy-reviewer for consistency
Benefit: Deeper, more thorough analysis without hitting context limits
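Put together, the recommendation might look like the following frontmatter fragment. This is a sketch only: it follows the `engine:` layout used in the examples above, and the `agent` line is optional since the report merely suggests considering it.

```yaml
engine:
  id: copilot
  # Allow up to 8 autopilot continuations for iterative analysis.
  max-continuations: 8
  # Optional: route through the existing reviewer persona.
  agent: grumpy-reviewer
```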

`code-scanning-fixer.md`

Current: No continuations, bash wildcard
Recommended: Add max-continuations: 5, restrict bash to ['git *', 'grep *']
Benefit: Can tackle multi-file security fixes in one run
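The same change sketched as a frontmatter fragment, combining the continuation limit with the narrowed bash allowlist. The two patterns are the ones named in the recommendation; whether they cover everything this workflow needs is an assumption that should be checked against its actual commands before merging.

```yaml
engine:
  id: copilot
  max-continuations: 5
tools:
  # Replaces the bash wildcard with an explicit allowlist.
  bash:
    - 'git *'
    - 'grep *'
```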

`daily-fact.md` / `poem-bot.md` / `constraint-solving-potd.md`

Current: No bare mode; these load all project instructions needlessly
Recommended: bare: true
Benefit: Faster, cheaper, avoids project-specific instructions confusing fun workflows

`contribution-check.md`

Current: Generic Copilot execution
Recommended: agent: contribution-checker — a purpose-built agent file already exists!
Benefit: Consistent PR review behavior, agent is already authored for this purpose

`archie.md` / `adr-writer`-adjacent workflows

Current: No custom agent
Recommended: Use agent: adr-writer for architectural documentation tasks
Benefit: Specialized agent persona for structured ADR output

Workflows with `copilot-requests: true` (46 workflows)

Current: 24% of workflows track Copilot API usage
Recommended: Add engine.token-weights override to at least api-consumption-report.md for accurate cost reporting
Benefit: Correct token cost calculations using real model pricing

5️⃣ Trends & Insights

4-Day Historical Trends (Apr 13–16)

| Feature | Apr 13 | Apr 14 | Apr 15 | Apr 16 | Trend |
| --- | --- | --- | --- | --- | --- |
| Total workflows | 191 | 191 | 191 | 192 | +1 |
| Copilot workflows | ~101 | ~96 | ~91 | ~90 | ⬇️ |
| Claude workflows | ~44 | 45 | 45 | 46 | ⬆️ |
| playwright | ~8 | ~8 | 20 | 12 | ⚠️ volatile |
| web-fetch | ~15 | ~15 | 20 | 17 | ~ |
| cache-memory | ~36 | ~36 | 55 | 55 | ⬆️ stable |
| strict mode | ~100 | ~106 | 111 | 126 | ⬆️ improving |
| safe-outputs | ~161 | 161 | 161 | 157 | ⬇️ slight |
| version pinning | 0 | 0 | 0 | 0 | = persistent |
| token-weights | 0 | 0 | 0 | 0 | = persistent |
| mcp-gateway | 0 | 0 | 0 | 0 | = persistent |

Key Observations:

Playwright volatility: Jumped from ~8 to 20 on Apr 15, dropped to 12 on Apr 16 — this suggests workflows are being added and removed rapidly
Strict adoption: Steady improvement (+15 in one day) shows active security hardening
Cache-memory stability: Held at 55 after jumping from 36 — new adopters are sticking with it
Persistent gaps: Version pinning, token-weights, mcp-gateway have been 0% for all 4 days — these need explicit team decision (use or document as intentionally unused)
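As a quick check on the two headline movements, the day-over-day relative changes can be worked out directly from the Apr 15 and Apr 16 counts:

```latex
\Delta_{\text{playwright}} = \frac{12 - 20}{20} = -40\%
\qquad
\Delta_{\text{strict}} = \frac{126 - 111}{111} \approx +13.5\%
```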

6️⃣ Best Practice Guidelines

Based on 4 days of research, here are the recommended best practices:

Use engine.agent for specialized workflows: The .github/agents/ directory has 11 purpose-built agents. Route appropriate workflows through them for consistent, specialized behavior.
Avoid bash:['*']: Replace with explicit allowed patterns. This is the single highest-impact security improvement available — affects 37 workflows (19%).
Enable strict on all input-triggered workflows: Adoption is at 66% and improving. Any of the remaining 34% (66 workflows) that receive untrusted input (issues, PRs, comments) without strict mode are potential injection vectors.
Use bare: true for non-code workflows: Workflows that don't need project context (entertainment, analysis, standalone tools) shouldn't load AGENTS.md — it costs tokens and can confuse the agent.
Pin version for production workflows: At minimum, workflows in the critical path (CI, release, triage) should pin engine.version so they don't silently break on Copilot CLI upgrades.
Use max-continuations for complex tasks: If a task regularly hits the continuation limit, set max-continuations to a value between 5 and 10. It's a Copilot-unique capability not available in Claude or Codex.
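To illustrate the strict-mode guideline, here is a minimal sketch of an issue-triggered workflow's frontmatter. It assumes `strict` is a top-level boolean and that triggers use the standard GitHub Actions `on:` syntax; the trigger shown is a hypothetical example rather than one taken from a specific workflow in this repository.

```yaml
on:
  issues:
    types: [opened]   # untrusted input: issue titles and bodies
strict: true          # assumed top-level flag enabling input sanitization
engine:
  id: copilot
```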

7️⃣ Action Items

Immediate Actions (this week):

Investigate playwright regression: which 8 workflows were removed/changed?
Add agent: contribution-checker to contribution-check.md (zero-effort win)
Add bare: true to daily-fact.md, poem-bot.md, constraint-solving-potd.md

Short-term (this month):

Audit 37 bash:['*'] workflows and replace wildcards with specific patterns
Add max-continuations (a value between 5 and 10) to complex workflows (architecture-guardian, code-scanning-fixer)
Pin engine.version on at least 5 critical production workflows
Activate agent: grumpy-reviewer for code review workflows

Long-term (this quarter):

Add engine.token-weights to api-consumption-report.md for accurate cost data
Document team decision on features.mcp-gateway, features.copilot-integration-id — use or close gap
Evaluate whether all remaining strict: false workflows genuinely don't process untrusted input


Research Methodology

Data collection: All 192 .github/workflows/*.md files were scanned using ripgrep pattern matching for each feature. Engine distribution counted via exact pattern matching (^engine: copilot, ^engine: claude, etc.).

Sources examined:

pkg/workflow/copilot_engine.go — Engine interface and feature support flags
pkg/workflow/copilot_engine_execution.go — CLI argument construction, feature → flag mapping
pkg/workflow/engine.go — EngineConfig struct (all configurable fields)
.github/agents/*.agent.md — Available custom agent files
CHANGELOG.md — Recent feature additions
/tmp/gh-aw/repo-memory/default/copilot-research-latest.json — Historical trend data (4 days)

Limitations: Counts are based on static file analysis, not runtime execution. Some workflows may have the feature configured but conditionally disabled. Playwright count volatility may be due to rapid workflow authoring/deletion cycles.

References:

§24534029243 — This analysis run
§24478512391 — Previous run (Apr 15)
§24423068207 — Run before that (Apr 14)

Generated by Copilot CLI Deep Research Agent · ● 1.9M

expires on Apr 17, 2026, 9:15 PM UTC

pelikhan · 2026-04-16T21:22:05Z

pelikhan
Apr 16, 2026
Maintainer

/plan

1 reply

github-actions[bot] Bot Apr 16, 2026
Author

🚀 Plan Command has started processing this discussion comment

2026-04-17T22:55:46Z

github-actions[bot]
Bot Apr 17, 2026
Author

This discussion was automatically closed because it expired on 2026-04-17T21:15:00.303Z.

Closed by Workflow

0 replies
