feat(ci): automate apm-triage-panel on new and reopened issues by danielmeppiel · Pull Request #954 · microsoft/apm

danielmeppiel · 2026-04-26T14:13:59Z

TL;DR

Adds an automated triage workflow that:

Runs the existing apm-triage-panel skill on a daily fuzzy schedule (gh-aw "daily" → ~12:49 UTC), processing up to 10 untriaged open issues per sweep, oldest first. The per-sweep cap is the hard cost ceiling.
Provides an opt-in fast path: maintainers apply status/needs-triage to trigger immediate re-triage on any specific issue (security reports, urgent items).
Treats every triage decision as an agentic proposal pending human ratification: silence approves, and any maintainer label edit always wins.
Recompiles all gh-aw workflows on v0.68.3 (three lock files were still at v0.52.1).

Problem (WHY)

The apm-triage-panel skill exists at .apm/skills/apm-triage-panel/SKILL.md but a maintainer has to invoke it by hand on every issue. Real data from this repo, last 30 days:

Metric	Value
Issues opened	200
Mean / peak per day	6.7 / 17
Bot-authored	19
Got a `theme/*` label eventually	10 / 200 (5%)

Manual triage is severely under-resourced, so automation is the right call. But naive on: issues.opened triggers would mean ~200 panel runs/month with no ceiling. At ~30-50k tokens per panel run, that is roughly 6-10M LLM tokens/month at current volume with no upper bound, and every spike day burns through quota.
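The estimate above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: the inputs are the figures quoted in this description (200 runs/month, ~30-50k tokens per run), and the function name is illustrative.

```python
def monthly_token_range(runs_per_month, tokens_low, tokens_high):
    """Return the (low, high) monthly token estimate for a given run count."""
    return runs_per_month * tokens_low, runs_per_month * tokens_high


# Figures quoted in the PR description: ~200 runs/month, ~30-50k tokens each.
low, high = monthly_token_range(200, 30_000, 50_000)
print(f"{low / 1e6:.0f}M to {high / 1e6:.0f}M tokens/month")
```

The upper end of that range is the "~10M tokens/month" figure quoted elsewhere in this thread.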

Approach (WHAT)

Two cooperative paths plus manual dispatch:

                     event                          
                       |                            
    +------------------+------------------+         
    |                  |                  |         
issues:                schedule:          workflow_  
labeled                "daily" (~12:49Z)  dispatch   
(status/needs-                            (optional  
 triage only,                              issue#)   
 fast path)                                          
    |                  |                  |         
    v                  v                  v         
   on.steps         filter:             gh issue    
   front-gate       open + non-bot +    view #      
   (bot/locked      no status/triaged                
   /closed -> exit) -> oldest 10                    
    |                  |                  |         
    +------------------+------------------+         
                       |                            
                       v                            
              apm-triage-panel skill                
                       |                            
                       v                            
         safe-outputs.add-comment (max:12)          
              + safe-outputs.update-issue            
                                                    
        ----> human-ratification footer on every comment
        ----> apply theme/area/type/priority labels  
        ----> ADD status/triaged (skip-me signal)    
        ----> REMOVE status/needs-triage (consume trigger)
        ----> milestone only if absent (never overwrite)
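
The write rules at the bottom of the diagram can be read as a small pure function. The following is a sketch under assumptions, not the workflow's actual code (the real behavior lives in the agent prompt and the safe-outputs job); `plan_writes` and its argument names are hypothetical.

```python
def plan_writes(current_labels, current_milestone, decided_labels, decided_milestone):
    """Compute the label and milestone writes for one triaged issue.

    Mirrors the diagram: add status/triaged, consume status/needs-triage,
    and set a milestone only when the issue has none.
    """
    current = set(current_labels)
    # Only labels that are not already present need to be added.
    add = (set(decided_labels) | {"status/triaged"}) - current
    # The trigger label is removed only if it is actually on the issue.
    remove = {"status/needs-triage"} & current
    # Never overwrite an existing milestone.
    milestone = decided_milestone if current_milestone is None else None
    return {"add": sorted(add), "remove": sorted(remove), "milestone": milestone}


plan = plan_writes(
    current_labels=["status/needs-triage", "type/bug"],
    current_milestone="0.9.3",
    decided_labels=["type/bug", "area/cli"],
    decided_milestone="0.9.4",
)
```

Here `plan` adds `area/cli` and `status/triaged`, removes `status/needs-triage`, and leaves the existing milestone untouched.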

Lifecycle invariants (designed and documented in the agent prompt)

Event	Behavior
Issue opened by human	Sweep picks up next day; panel triages, adds `status/triaged`
Issue opened by bot	Filtered out; no run
Maintainer applies `status/needs-triage`	Fast path fires within minutes; panel re-runs; refreshes labels; removes `status/needs-triage`; keeps `status/triaged`
Maintainer edits a label (changes/removes a `theme/*`)	No retrigger. Agent never reverts human edits on subsequent runs. Human always wins.
Maintainer removes `status/triaged`	Sweep re-picks the issue next day (treated as fresh untriaged)
Issue reopened	Treated as still-triaged unless `status/triaged` is removed; maintainer applies `status/needs-triage` to force re-triage
Issue locked / closed	Front-gate skips; sweep filters out
17-issue spike day	Cap=10 oldest-first; tail rolls to tomorrow. Capacity 300/mo vs ~140/mo eligible → drains with margin
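
As an illustration of the sweep's candidate selection (open, human-authored, not yet `status/triaged`, oldest 10), here is a minimal sketch. The dictionary keys are assumptions made for the example and do not claim to match the GitHub API payload or the prompt's actual query.

```python
def select_sweep_candidates(issues, cap=10):
    """Pick up to `cap` untriaged issues, oldest first.

    Each issue is a dict with hypothetical keys: state, locked,
    author_type, labels, created_at (ISO-8601 UTC string).
    """
    eligible = [
        issue for issue in issues
        if issue["state"] == "open"
        and not issue["locked"]
        and issue["author_type"] != "Bot"
        and "status/triaged" not in issue["labels"]
    ]
    # ISO-8601 timestamps in the same format sort chronologically as strings.
    eligible.sort(key=lambda issue: issue["created_at"])
    return eligible[:cap]
```

Anything beyond the cap stays untriaged and is picked up by a later sweep, which is the "tail rolls to tomorrow" behavior in the table.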

Implementation (HOW)

Files

.github/workflows/triage-panel.md — gh-aw source (new)
.github/workflows/triage-panel.lock.yml — generated (new)
.github/workflows/{cli-consistency-checker,daily-doc-updater,daily-test-improver}.lock.yml — recompiled v0.52.1 → v0.68.3
.github/workflows/agentics-maintenance.yml — auto-regenerated by gh-aw v0.68.3

Verdict-comment footer (appended verbatim by the agent)

> **Triage status: agentic proposal pending human ratification.**
> Silence is approval. Maintainers can:
> - Override any label or milestone above by editing it directly --
>   human edits are authoritative and will not be reverted on
>   subsequent runs.
> - Re-trigger triage by applying the `status/needs-triage` label, or
>   by removing `status/triaged` to enroll the issue in the next
>   daily sweep.

This puts the human-in-the-loop contract on every issue, visible to contributor and maintainer alike.

Substrate choices

Choice	Rationale
`on: issues.labeled` (filtered to `status/needs-triage` via `on.steps:`)	Mirror of `pr-review-panel.md` pattern. Pre-activation gate kills out-of-scope label events at zero cost.
`schedule: cron 'daily'`	gh-aw fuzzy distributor (resolved to `49 12 * * *`). Avoids deterministic top-of-the-hour load spikes.
Drop `issues.opened` / `issues.reopened` subscriptions	Sweep handles within 24h; maintainer's `status/needs-triage` is the immediate-path escape hatch. Unbounded per-issue triggers would have no cost ceiling.
`concurrency.group` on issue number	Prevents fast-path and sweep racing on the same issue.
APM imports pinned to `@main`	Trusted-skill rule: a malicious issue could otherwise tamper with the triage prompt. Same rationale as `pr-review-panel.md`.
`safe-outputs.update-issue` (not direct `gh issue edit`)	Agent stays read-only; gh-aw mints scoped write only for the safe-outputs job.

Cost shape

Metric	Before (uncapped per-issue)	After (sweep + fast path)
Hard monthly ceiling	None	300 sweep + ~5-20 fast-path = ≤320 panel runs
Typical monthly volume	~200 (uncapped, would grow)	~140-180 (drains the eligible queue)
Spike protection	None	Cap=10/day, tail rolls forward
Worst-case latency for opened issues	Minutes	≤24h (≤48h on consecutive 10+ spike days)
Latency for explicit re-triage requests	n/a	Minutes (fast path)
Maintainer override mechanics	Implicit	Explicit and documented in every comment

Trade-offs

Routine triage latency goes from "minutes" to "≤24h" vs the original per-issue subscription. Acceptable because today's manual baseline is already days-to-never (5% in 30 days), and the maintainer-applied status/needs-triage label is the always-immediate escape hatch for anything urgent.
Sweep cap of 10/day means a sustained 11+/day open rate would let the queue grow. Real data shows mean 6.7/day so margin is large; if the repo becomes far more popular this knob can be raised in a small follow-up.
The agent runs the panel sequentially within one sweep (up to 10 invocations per run). This pays the runner cold-start cost once per sweep but does not amortize LLM tokens across issues (the apm-triage-panel skill is single-issue by contract). A future two-tier model ("lighter classifier first, escalate to panel") could cut tokens further; it is out of scope here.
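
The queue behavior described in these trade-offs can be checked with a tiny simulation. It is a simplified model: it assumes one sweep per day that sees all eligible issues opened so far, and it ignores the fast path.

```python
def simulate_backlog(daily_eligible, cap=10):
    """Return the backlog left after each daily sweep."""
    backlog = 0
    history = []
    for opened in daily_eligible:
        backlog += opened
        backlog -= min(backlog, cap)  # the sweep drains at most `cap` issues
        history.append(backlog)
    return history


spike = simulate_backlog([6, 17, 6, 6])   # one 17-issue spike day
sustained = simulate_backlog([11] * 5)    # sustained 11/day open rate
```

Under these assumptions `spike` is `[0, 7, 3, 0]` (the spike drains within two further sweeps), while `sustained` is `[1, 2, 3, 4, 5]` (the queue grows by one issue per day).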

Validation

$ gh aw compile --approve-updates
✓ .github/workflows/triage-panel.md (71.1 KB)
✓ Compiled 5 workflow(s): 0 error(s), 0 warning(s)

$ actionlint .github/workflows/triage-panel.lock.yml
2 'artifact_prefix' warnings -- pre-existing gh-aw codegen pattern
also present (4x) in pr-review-panel.lock.yml.

Security review (gh-aw safe-update baselines approved)

All flagged items are already in active use elsewhere in the repo:

microsoft/apm-action@v1.4.2 — same pin as pr-review-panel.lock.yml, daily-doc-updater.lock.yml, cli-consistency-checker.lock.yml.
GH_AW_PLUGINS_TOKEN — already configured at the repo level for the same imports.
CREATE_PR_PAT — already declared in daily-doc-updater.md and daily-test-improver.md sources; the v0.52.1 metadata schema simply didn't track it. No new external surface.

Pre-merge checklist

Create the status/triaged label:

gh label create status/triaged \
  --repo microsoft/apm \
  --description "Initial agentic triage complete; pending maintainer ratification (silence = approval)." \
  --color FBCA04

This is the "do not re-sweep me" signal the workflow depends on.

How to test (post-merge)

Fast path smoke test — apply status/needs-triage to any open untriaged issue. Within ~1-2 min the panel should run, post a verdict, apply labels, add status/triaged, and remove status/needs-triage. Confirm project-sync.yml then picks up the theme/* label change and adds the issue to the project board.
Sweep dry run — gh aw run triage-panel (no input) before the first scheduled fire. Confirm the agent finds untriaged issues, processes ≤10 oldest, and skips bot-authored / locked / already-status/triaged ones.
Out-of-scope label test — apply a theme/auth label directly to a fresh issue. The on.steps gate should drop the event before the agent runs (visible as a quick "skipping" step in the Actions log, no agent job).
Human-edit-wins regression — after a sweep triages an issue, manually swap one of its theme/* labels for a different one. Re-trigger by applying status/needs-triage. Verify the new comment surfaces any disagreement as a recommendation but does not revert your edit.

Out of scope

Editing apm-triage-panel/SKILL.md (already shippable; last refactored in 50d24b0a).
Editing project-sync.yml (clean handoff via labels already works).
Two-tier classifier-then-panel triage model (future iteration if cost still too high).
Backfilling the v0.9.3 docs site (separate item; PR #953, "fix(ci): deploy docs after bot-cut releases via workflow_call", fixes the future case).

Adds .github/workflows/triage-panel.md (gh-aw substrate) that auto-invokes the existing apm-triage-panel skill on issue events:

- issues opened
- issues reopened
- issues labeled with status/needs-triage (filtered via on.steps gate)
- workflow_dispatch (manual fallback with issue_number input)

Mirrors the proven pattern from pr-review-panel.md:

- Pre-activation label-name guard kills the pipeline at the cheapest point for out-of-scope label events (no runner cold-start, no agent spin-up).
- APM imports pinned to microsoft/apm@main so the triage panel always runs against the trusted, already-reviewed skill.
- Read-only agent permissions; writes go through gh-aw safe-outputs (add-comment max:2, update-issue target:*).
- roles: [admin, maintainer, write] matches the trust model of applying / opening triageable issues.

Clean separation of concerns:

- This workflow decides labels and milestone via the panel skill.
- The existing project-sync.yml workflow then picks up theme/* label changes and syncs the issue to the PGS board column.
- apm-triage-panel skill content (personas, rubric, output template) is untouched -- this PR only adds the trigger surface.

Validation:

- gh aw compile triage-panel: 0 errors, 0 warnings.
- actionlint on the generated lock: 2 artifact_prefix warnings that also appear (4x) in the existing pr-review-panel.lock.yml -- a known gh-aw compiler artifact, not from this change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Three lock files (cli-consistency-checker, daily-doc-updater, daily-test-improver) plus the auto-generated agentics-maintenance.yml were still pinned to gh-aw compiler v0.52.1. pr-review-panel and the new triage-panel are already at v0.68.3, so this aligns the entire repo on a single compiler version.

Ran from a clean checkout:

    gh aw compile --approve-updates

Approved baselines:

- CREATE_PR_PAT secret -- already declared in daily-doc-updater.md and daily-test-improver.md sources; the v0.52.1 metadata format simply didn't track it. No new external surface.

actionlint findings on the regenerated files are all pre-existing gh-aw codegen patterns (artifact_prefix property warnings, empty-string choice option in agentics-maintenance.yml). None are introduced by the upgrade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel · 2026-04-26T14:15:43Z

Added second commit recompiling all gh-aw workflows on v0.68.3.

Scope of the upgrade: 3 stale lock files (cli-consistency-checker, daily-doc-updater, daily-test-improver) and the auto-generated agentics-maintenance.yml were still pinned to v0.52.1. pr-review-panel and the new triage-panel are already at v0.68.3 — this aligns the whole repo on one compiler version.

Security baseline approved: CREATE_PR_PAT secret was flagged by safe-update mode but is already declared in the daily-doc-updater.md and daily-test-improver.md sources; the v0.52.1 metadata schema simply didn't track it. No new external surface (no new actions, no new secrets beyond what's already in those source files).

actionlint: only pre-existing codegen patterns flagged (artifact_prefix, empty-string choice option in agentics-maintenance) — neither introduced by the upgrade.

Copilot

Pull request overview

Adds a new Agentic Workflows (gh-aw) automation to run the existing apm-triage-panel skill on issue activity, post a synthesized triage comment, and apply the panel-decided labels/milestone via Safe Outputs.

Changes:

Add a new gh-aw source workflow to trigger triage on issues events (opened/reopened/labeled) and workflow_dispatch.
Add the compiled lock workflow generated by gh aw compile.

File	Description
.github/workflows/triage-panel.md	New gh-aw workflow definition to orchestrate `apm-triage-panel` on issue events and use Safe Outputs for comments + issue updates.
.github/workflows/triage-panel.lock.yml	Compiled GitHub Actions workflow generated from the gh-aw source, including Safe Outputs tool schema and execution plumbing.

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/triage-panel.lock.yml:623

Safe Outputs update_issue tool schema requires customValidation: requiresOneOf:status,title,body. If this validator is enforced, label/milestone-only updates (the stated intent of this workflow) will fail unless the agent also provides a no-op status/title/body. Either adjust the workflow/prompt to meet this constraint or configure Safe Outputs validation to allow label/milestone-only updates.

              "update_issue": {
                "defaultMax": 1,
                "fields": {
                  "assignees": {
                    "type": "array",
                    "itemType": "string",
                    "itemSanitize": true,
                    "itemMaxLength": 39
                  },
                  "body": {
                    "type": "string",
                    "sanitize": true,
                    "maxLength": 65000
                  },
                  "issue_number": {
                    "issueOrPRNumber": true
                  },
                  "labels": {
                    "type": "array",
                    "itemType": "string",
                    "itemSanitize": true,
                    "itemMaxLength": 128
                  },
                  "milestone": {
                    "optionalPositiveInteger": true
                  },
                  "operation": {
                    "type": "string",
                    "enum": [
                      "replace",
                      "append",
                      "prepend",
                      "replace-island"
                    ]
                  },
                  "repo": {
                    "type": "string",
                    "maxLength": 256
                  },
                  "status": {
                    "type": "string",
                    "enum": [
                      "open",
                      "closed"
                    ]
                  },
                  "title": {
                    "type": "string",
                    "sanitize": true,
                    "maxLength": 128
                  }
                },
                "customValidation": "requiresOneOf:status,title,body"
              }

Files reviewed: 5/6 changed files
Comments generated: 5

Copilot · 2026-04-26T14:19:26Z

+        if [ "$ACTION" = "labeled" ] && [ "$LABEL_NAME" = "status/needs-triage" ]; then
+          echo "Triggering label is 'status/needs-triage' -- proceeding."
+          exit 0
+        fi
+        echo "Event '$ACTION' / label '$LABEL_NAME' is out of scope; skipping."
+        exit 1


The pre-activation label gate exits with status 1 for out-of-scope issues:labeled events. That will mark the workflow run as failed for every non-status/needs-triage label applied to an issue (and potentially for label changes performed by this workflow), which can create significant Actions noise. Consider removing labeled from the trigger (keeping only opened/reopened + workflow_dispatch), or restructuring the gate so out-of-scope label events end in a neutral/green outcome while still preventing agent activation.
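
One way to reason about this concern is to separate the gate's decision from its exit status. The sketch below restates the quoted shell logic as a pure Python predicate; it is illustrative only, and how a "skip" would be surfaced as a neutral or green run in gh-aw is not shown here and is not something this sketch claims to know.

```python
def gate_decision(action, label_name):
    """Return (activate, message) for an issues event.

    Same condition as the quoted shell gate, but the caller decides
    which exit status (if any) a skip should map to.
    """
    if action == "labeled" and label_name == "status/needs-triage":
        return True, "Triggering label is 'status/needs-triage' -- proceeding."
    return False, f"Event '{action}' / label '{label_name}' is out of scope; skipping."


activate, message = gate_decision("labeled", "theme/auth")
```

With the decision isolated like this, an out-of-scope label event yields `activate == False` and a log message, without implying a failed run by itself.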

Copilot · 2026-04-26T14:19:27Z

+          cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_efa05e814fb58cf7_EOF'
+          {"add_comment":{"max":2},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"true"},"report_incomplete":{},"update_issue":{"allow_body":true,"max":1,"target":"*"}}
+          GH_AW_SAFE_OUTPUTS_CONFIG_efa05e814fb58cf7_EOF


Safe Outputs config enables update_issue.allow_body: true, which allows the agent to edit issue bodies via the write-sink even though the workflow intent is label/milestone-only updates. If the Safe Outputs framework supports disabling body/title edits for update_issue, it would be safer to do so here to reduce prompt-injection impact (issue bodies are untrusted input).

This issue also appears on line 570 of the same file.

Copilot · 2026-04-26T14:19:27Z

+        fi
+        echo "Event '$ACTION' / label '$LABEL_NAME' is out of scope; skipping."
+        exit 1
+  roles: [admin, maintainer, write]


roles: [admin, maintainer, write] means the workflow only activates when the triggering actor has write access. That prevents auto-triage for issues opened/reopened by external contributors, which is the primary use case described in this PR. Drop the roles gate (or change the triggers to only run on maintainer-applied labels) so newly opened issues from non-members are actually processed.

Suggested change

-  roles: [admin, maintainer, write]

Copilot · 2026-04-26T14:19:27Z

+# 1. issues: opened / reopened / labeled. We listen on plain `issues`
+#    (not `issues_target`) because issues -- unlike PRs -- don't have a
+#    fork-head untrusted code surface; the only untrusted input is the


The header comment says "issues_target" is an alternative event, but GitHub Actions does not have an issues_target trigger (unlike pull_request_target). This is likely to confuse future maintainers; please reword to just explain why plain issues is sufficient here.

Suggested change

-# 1. issues: opened / reopened / labeled. We listen on plain `issues`
-#    (not `issues_target`) because issues -- unlike PRs -- don't have a
-#    fork-head untrusted code surface; the only untrusted input is the
+# 1. issues: opened / reopened / labeled. Plain `issues` is sufficient
+#    here because issues -- unlike PRs -- do not introduce a fork-head
+#    untrusted code execution surface; the only untrusted input is the

Copilot · 2026-04-26T14:19:28Z

+   `safe-outputs.update-issue`. The `theme/*` label, if assigned,
+   will automatically trigger the existing `project-sync.yml`
+   workflow to add the issue to the appropriate PGS board column --
+   no extra action needed here.
+
+Do not perform any other writes. Do not edit the issue body or title.
+Do not close, reopen, lock, or assign the issue.


The prompt instructs the agent to apply only labels/milestone via safe-outputs.update-issue, but the compiled workflow's Safe Outputs tool schema requires at least one of status, title, or body (customValidation requiresOneOf:status,title,body). With the current instructions, label-only updates are likely to be rejected by the Safe Outputs validator and the workflow won't apply the panel's decisions. Update the instructions (or Safe Outputs config, if possible) so the agent includes a no-op status value (e.g., open) when updating labels/milestone, or otherwise satisfies the validator without changing title/body.

Suggested change

-   `safe-outputs.update-issue`. The `theme/*` label, if assigned,
-   will automatically trigger the existing `project-sync.yml`
-   workflow to add the issue to the appropriate PGS board column --
-   no extra action needed here.
-
-Do not perform any other writes. Do not edit the issue body or title.
-Do not close, reopen, lock, or assign the issue.
+   `safe-outputs.update-issue`, and include `status: open` in that
+   update payload as a validator no-op. This satisfies the Safe
+   Outputs schema requirement that at least one of `status`, `title`,
+   or `body` be present, while preserving the existing issue state and
+   avoiding any title/body edits. The `theme/*` label, if assigned,
+   will automatically trigger the existing `project-sync.yml`
+   workflow to add the issue to the appropriate PGS board column --
+   no extra action needed here.
+
+Do not perform any other writes. Do not edit the issue body or title.
+Do not close, reopen, lock, or assign the issue. When calling
+`safe-outputs.update-issue` for labels/milestone, use `status:
+open` only as the required no-op field for validator compliance.

…ication framing

Reshapes the triage workflow trigger model to bound LLM cost without sacrificing 'no genuine issue left untriaged'. Grounded in actual repo data: 200 issues / 30 days, mean 6.7/day, peak 17/day, 19 bot-authored, and only 5% currently get a theme/* label -- so unbounded per-issue runs would mean ~10M tokens/month with no ceiling.

Trigger model:

1. SCHEDULED_SWEEP (gh-aw fuzzy daily, ~12:49 UTC): finds open human-authored issues lacking 'status/triaged', oldest first, processes up to 10 per run. Hard ceiling: 10 * 30 = 300 panel runs / month. Drains the typical queue (~140 eligible/month) with headroom for spike days.
2. OPT_IN_RETRIAGE (issues: labeled, status/needs-triage): explicit fast-path for maintainers who need immediate triage, e.g. on a security report. Fires within minutes. Front-gate filters bots, locked, closed at zero cost.
3. MANUAL_DISPATCH (workflow_dispatch with optional issue_number): replay / debugging.

Lifecycle invariants now explicitly designed and documented in the agent prompt:

- Bots are never triaged (front-gate + sweep filter).
- Every human-authored open issue is triaged within 24h, or sooner if a maintainer applies status/needs-triage.
- Maintainer label edits ALWAYS WIN: agent never reverts human label changes; the panel may surface disagreement in the verdict comment but never silently overrides.
- status/triaged is the 'do not re-sweep me' signal. Removing it re-enrolls the issue in the next sweep. Applying status/needs-triage bypasses the sweep cap entirely (immediate fast-path).
- status/needs-triage is consumed (removed) by the workflow after re-triage so the trigger doesn't loop.
- Existing milestone is never overwritten -- panel only sets a milestone if none exists today.

Verdict comment now appends a verbatim 'agentic proposal pending human ratification, silence is approval' footer that documents the override mechanics inline with every triage comment, so contributors and maintainers see the human-in-the-loop framing on the issue itself.

Concurrency group prevents double-triage on the same issue when fast path and sweep race.

Compile: 0 errors, 0 warnings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Four targeted hardenings address realistic abuse vectors that could inflate panel cost beyond the per-month ceiling.

1. Body size cap (64 KB / 65536 chars) -- protects against a single adversarially-large issue body inflating per-run token cost from ~50k to >500k. Generous margin: typical APM PRD/design issues are <16 KB. Truncated bodies are flagged in the verdict so a maintainer can read the original.
2. Per-author quota in scheduled sweep (max 2 issues per distinct author per sweep) -- prevents a sock-puppet account from monopolizing daily triage capacity (10 slots/day). Excess issues from the same author roll to subsequent sweeps; they still get triaged eventually, just spread across days.
3. Output safety rails / batch allow-list -- explicit prompt-level rail that the agent computes its write allow-list (BATCH_ALLOW_LIST) from the workflow-controlled candidate selection BEFORE reading any issue body, so prompt-injection in a body cannot redirect writes to unrelated issues. Compensates for safe-outputs.update-issue.target: "*" being required for sweep mode (gh-aw does not support per-mode targets). Every workflow run logs every safe-output call for audit.
4. Spam-shape filter in scheduled sweep -- crude pre-panel filter skips issues whose bodies are >50 consecutive identical chars, >80% URLs, dominated by a single 3-char repeat, or have <20 chars of real content. Skipped issues stay in queue for manual review; maintainers can force a panel run via status/needs-triage.

Threat model NOT addressed by this PR: open-close-reopen loop (workflow does not subscribe to those events; cannot fire), label manipulation by external users (gated by repo write permission), fabricated bot login (we use user.type == Bot which is unspoofable from user side).

Lock file unchanged because gh-aw embeds the prompt via runtime-import, so .md edits propagate without recompile changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
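
To make the spam-shape rules in item 4 concrete, here is a rough Python sketch. In the PR the filter is expressed in the agent prompt, not in code, and the commit message does not define "real content" or "dominated by", so the interpretations marked as assumed below are guesses made for illustration.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def looks_like_spam(body: str) -> bool:
    """Crude shape check; thresholds follow the commit message, details are assumed."""
    compact = re.sub(r"\s+", "", body)
    # Rule: more than 50 consecutive identical (non-whitespace) characters.
    if re.search(r"(\S)\1{50,}", body):
        return True
    # Rule: fewer than 20 chars of real content
    # (assumed: non-whitespace characters outside URLs).
    real = re.sub(r"\s+", "", URL_RE.sub("", body))
    if len(real) < 20:
        return True
    # Rule: more than 80% of the non-whitespace characters sit inside URLs.
    url_chars = sum(len(url) for url in URL_RE.findall(body))
    if url_chars / len(compact) > 0.8:
        return True
    # Rule: dominated by a single 3-char repeat
    # (assumed: the most common trigram covers over half the text).
    grams = Counter(compact[i:i + 3] for i in range(len(compact) - 2))
    top, _ = grams.most_common(1)[0]
    return compact.count(top) * 3 / len(compact) > 0.5
```

A filter this crude will misclassify some bodies, which is consistent with the commit's choice to leave skipped issues in the queue for manual review instead of marking them triaged.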

danielmeppiel · 2026-04-26T14:54:34Z

Hardening commit (f0e7abe) addresses the open-close-loop concern + adjacent abuse vectors.

What I checked first

The specific attack named ("open-close issue indefinitely to burn tokens") doesn't actually fire any triggers in this PR -- workflow subscribes to issues.labeled (gated to write+ via roles:) plus schedule plus workflow_dispatch. No opened / reopened / closed subscriptions. Open-close-reopen as an external OSS user = zero panel runs.

Real surfaces I found and hardened

#	Surface	Hardening
1	Sock-puppet open spam (one actor floods 100 issues; sweep cap=10/day = 10 days of monopolized capacity)	Per-author quota: max 2 issues per distinct author per sweep
2	One issue with a 500 KB body inflates per-run tokens 10x	Body size cap: 64 KB / 65536 chars (4-8x typical APM issue), truncation flagged in verdict
3	Prompt-injection in body redirects writes to unrelated issues (`update-issue.target: "*"` is required by sweep)	Batch allow-list rail: agent computes BATCH_ALLOW_LIST from workflow-controlled candidate selection BEFORE reading any body; documented as a hard rail in the prompt; every safe-output call audited via run logs
4	Low-effort body spam	Spam-shape filter: skips bodies with >50 consecutive identical chars, >80% URLs, dominated by 3-char repeats, or <20 chars of real content; not auto-`status/triaged` so maintainers can force review
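
The per-author quota in row 1 can be sketched as a single pass over the oldest-first candidate list. This is a hypothetical illustration (the real rule is part of the agent prompt); the `author` and `number` keys are made up for the example.

```python
def apply_author_quota(candidates, per_author=2, cap=10):
    """Select up to `cap` issues, at most `per_author` per distinct author.

    `candidates` is assumed to be sorted oldest-first already.
    """
    taken = {}
    selected = []
    for issue in candidates:
        author = issue["author"]
        if taken.get(author, 0) >= per_author:
            continue  # excess issues from this author roll to a later sweep
        taken[author] = taken.get(author, 0) + 1
        selected.append(issue)
        if len(selected) == cap:
            break
    return selected


queue = [{"number": n, "author": "sock"} for n in range(1, 6)]
queue += [{"number": 6, "author": "alice"}, {"number": 7, "author": "bob"}]
picked = [issue["number"] for issue in apply_author_quota(queue)]
```

In this example the flooding account gets two slots (issues 1 and 2) and the other authors are still served in the same sweep.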

Why `update-issue.target` stays `"*"`

gh-aw doesn't support per-event/per-mode targets. Sweep mode legitimately needs to update N different issues per run; fast path needs only the triggering one. Splitting into two workflows would be over-engineering. Mitigation #3 (BATCH_ALLOW_LIST as agent-discipline rail + audit log) gets ~80% of the protection of target: "triggering" without the workflow split.
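
The check that the BATCH_ALLOW_LIST rail asks the agent to apply amounts to a membership test on the target issue number. The sketch below shows that test in isolation; it is not an enforcement mechanism in gh-aw, and the `issue_number` and `tool` keys and the function name are assumptions for the example.

```python
def filter_writes(batch_allow_list, proposed_writes):
    """Split proposed writes into (accepted, rejected) by target issue."""
    allowed = frozenset(batch_allow_list)
    accepted = [w for w in proposed_writes if w["issue_number"] in allowed]
    rejected = [w for w in proposed_writes if w["issue_number"] not in allowed]
    return accepted, rejected


# The allow-list comes from candidate selection, before any body is read.
accepted, rejected = filter_writes(
    [955, 956],
    [
        {"issue_number": 955, "tool": "add_labels"},
        {"issue_number": 1, "tool": "add_labels"},  # e.g. redirected by an injected body
    ],
)
```

Because the rail is agent discipline plus audit logs, a check like this is advisory; a hard guarantee would need the workflow split discussed above.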

NOT in scope

Account-age / contribution heuristics (biases against new genuine contributors; GitHub's own abuse detection handles the low end)
Cumulative-runs circuit breaker (the existing 320-runs/month cap from sweep + fast-path bounds is already the ceiling)
Two-tier classifier+panel cost optimization (deferred to follow-up if 320/month is still too expensive in practice)

danielmeppiel · 2026-04-26T16:24:15Z

Live validation outcome

Ran two end-to-end scenarios against the merged workflow on main. Validation iterated four times on scenario 1 and once on scenario 5; three real bugs were caught and fixed in the process, all now landed on main.

Scenarios run

#	Mode	Mock issue	What we tested	Result
1	`workflow_dispatch`	#955 (closed)	Genuine docs-quality triage on a `priority/low` issue	PASS after 4 iterations
5	`issues: [labeled]` (`status/needs-triage`)	#956 (closed)	Fast-path event-driven trigger	PASS on first attempt

Both issues received: full panel verdict comment, complete label set (7 labels each, including theme/* where warranted), milestone 0.9.4, and the status/triaged "do not re-sweep me" signal. For #956, status/needs-triage was correctly removed (consumed) when the panel finished.

Scenarios 2 and 3 (spam-shape filter, empty-body filter) only exercise the SCHEDULED_SWEEP candidate selection logic and are not testable via workflow_dispatch -- they will run in the daily cron sweep and surface as observability data over time. Scenario 4 (oversized body) is a Step 1 in-prompt rail that operates identically across all three execution modes; covered indirectly by scenario 1.

Bugs caught during validation (all fixed on `main`)

1. `update-issue` cannot apply labels or milestones (`f2fa33f9`)

The first run posted a verdict comment but applied zero labels and no milestone. Inspection of agent_output.json revealed the validator rejected every write with:

update_issue requires at least one of: 'status', 'title', 'body' fields

gh-aw update-issue requires at least one of status / title / body in every payload, so pure label or milestone updates are rejected even though the JSON schema lists labels / milestone as accepted fields. Verified against gh-aw v0.68.3 source at pkg/workflow/safe_outputs_validation_config.go:135-148 (requiresOneOf:status,title,body constraint).
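
A simplified Python model of the `requiresOneOf:status,title,body` constraint shows why a label-only payload fails. This is not gh-aw's validator code (which is Go), only a sketch of the rule as described above.

```python
def requires_one_of(payload, fields=("status", "title", "body")):
    """Return True when at least one of the required fields is present."""
    return any(payload.get(field) is not None for field in fields)


label_only = {"issue_number": 955, "labels": ["type/docs"], "milestone": 7}
with_status = {**label_only, "status": "open"}

label_only_ok = requires_one_of(label_only)    # rejected by the rule
with_status_ok = requires_one_of(with_status)  # passes the rule
```

The second payload passes only because of the no-op `status` field, which is the workaround the reviewer suggested and which the fix below avoids by switching tools.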

Fix: replaced update-issue with three dedicated tools per the gh-aw safe-outputs reference: add-labels, remove-labels, assign-milestone. Each tool takes an explicit item_number / issue_number in the payload, so SCHEDULED_SWEEP can still hit multiple distinct issues per run.

Schema gotcha: target: "*" is supported on add-labels / remove-labels but not assign-milestone in v0.68.3 (field absent from its JSON schema). Compiler raises Unknown property: target if you set it. The runtime tool accepts issue_number directly, so multi-issue dispatch still works without it.

2. `allowed` field is strict-equality only, not glob (`5fe59094`)

Second run dropped 4 of the 7 decided labels: area/cli, area/docs-site, type/docs, priority/low. The verdict comment had every one of them. Verified against actions/setup/js/safe_output_validator.cjs:122:

validLabels = validLabels.filter(label => allowedLabels.includes(label));

allowed is strict Array.prototype.includes, not glob. Only blocked supports glob patterns (via matchesSimpleGlob). Every entry like theme/*, area/*, type/*, priority/* matched nothing.
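
The difference between the two matching modes is easy to reproduce. In this sketch `fnmatchcase` from the Python standard library stands in for gh-aw's `matchesSimpleGlob` (an assumption: the two are not claimed to be equivalent), and the allow-list is a made-up two-entry example.

```python
from fnmatch import fnmatchcase


def filter_strict(labels, allowed):
    """Keep labels that literally equal an allow-list entry."""
    return [label for label in labels if label in allowed]


def filter_glob(labels, patterns):
    """Keep labels that match any allow-list entry as a glob pattern."""
    return [label for label in labels if any(fnmatchcase(label, p) for p in patterns)]


allowed = ["area/*", "status/triaged"]
decided = ["area/cli", "status/triaged"]

strict_result = filter_strict(decided, allowed)
glob_result = filter_glob(decided, allowed)
```

Under strict equality `area/cli` is dropped because no entry is literally `area/cli`; only the glob variant keeps it.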

Fix: enumerated APM's full label taxonomy literally (3 themes + 14 areas + 8 types + 2 priorities + 5 status states + 3 community labels = 35 entries). status/needs-triage is intentionally omitted so only humans can apply that fast-path trigger -- the panel cannot inject its own re-trigger.

Documented in the prompt body so the agent flags any novel label in the verdict comment instead of silently dropping it.

3. Verdict <-> applied-state drift on milestone (`666925f2`)

Third run had verdict text saying "Milestone: 0.9.4" but the agent did not emit assign_milestone. No errors -- the agent simply omitted the call. This produced a confusing UX: maintainers reading the verdict would expect the milestone to be set, but it was not.

Fix: tightened Step 3 wording to require that emission and verdict text agree. The agent now MUST emit assign_milestone whenever the verdict comment names a milestone; the only legal way to skip is to omit milestone from the verdict entirely.
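
The fix in this PR is prompt wording, but the same invariant could also be checked after the fact. The following is a hypothetical post-run check, assuming the verdict contains a line of the form "Milestone: <name>" and that the emitted tool names are available as a list; neither assumption is confirmed by the workflow.

```python
import re


def milestone_drift(verdict_text, emitted_tools):
    """Return a description of verdict/emission drift, or None if consistent."""
    match = re.search(r"Milestone:\s*(\S+)", verdict_text)
    named = match.group(1) if match else None
    if named and "assign_milestone" not in emitted_tools:
        return f"verdict names milestone {named} but assign_milestone was not emitted"
    return None


verdict = "Labels: type/docs, priority/low\nMilestone: 0.9.4"
drift = milestone_drift(verdict, ["add_comment", "add_labels"])
clean = milestone_drift(verdict, ["add_comment", "add_labels", "assign_milestone"])
```

A run like the third iteration described above would produce a non-empty `drift` message, while a consistent run returns `None`.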

What the panel produced (final state on #955 and #956)

#955 (docs-quality, `priority/low`)

area/cli, area/docs-site, type/docs, status/accepted, priority/low,
good first issue, status/triaged
milestone: 0.9.4

Verdict identified the exact two-file scope (src/apm_cli/cli.py + docs/.../compilation/index.md), endorsed the author's proposed canonical phrasing, and left a courteous acknowledgement comment. Six personas weighed in (DevX UX, Supply Chain, OSS Growth, Python Architect, Doc Writer, CEO).

#956 (lockfile portability bug, `priority/high`)

theme/portability, area/docs-site, area/lockfile, type/bug,
status/accepted, priority/high, status/triaged
milestone: 0.9.4

Panel correctly raised this to theme/portability (cross-cutting label) and priority/high (vs the priority/low for #955), demonstrating the persona routing distinguishes severity correctly. status/needs-triage was consumed as designed.

Cosmetic note

When the activation step gates a run out (e.g., issue created with a non-trigger label), GitHub records the workflow run as a failure because the activation script exits non-zero by design. This is gh-aw's standard early-exit pattern, and it produces a misleading red badge on the Actions tab for legitimate skips. It is not a correctness issue; flagging it for follow-up in case it proves noisy in practice.

Token spend

Each panel invocation runs 3-6 personas plus arbitration. Single-issue dispatch / fast-path mode posts one comment, applies labels, sets milestone -- well under the 30-minute timeout. Daily cron sweep handles up to 10 issues per run, capped at add-comment.max: 12 and add-labels.max: 70. Empirically the four iterations on #955 + one iteration on #956 ran without rate-limit issues.

Residual concerns

Scenarios 2/3/4 will only get exercised by the daily cron once status/needs-triage issues exist organically. We will see them in the first sweep.
The cosmetic activation-gate failure described above is the only known UX wart.
The add-labels allow-list must be updated whenever APM's label taxonomy grows. This is documented in the workflow frontmatter.

The triage panel is operational on main and ready for organic traffic.

Validation conducted in run 24960282698 (initial bug discovery), 24960673185 (post-fix #1), 24960874890 (post-fix #2), 24961051684 (post-fix #3), and 24961194910 (scenario 5).

danielmeppiel · 2026-04-26T17:06:01Z

Triage Panel — architecture reference

Note

This is a self-contained reference for the system landed by this PR. If you maintain microsoft/apm and want issues to get triaged — or want to understand exactly what gets written to your repo and why — start here.

TL;DR

A daily scheduled sweep plus a one-click opt-in fast-path that runs the apm-triage-panel skill against open, untriaged, human-authored issues. The agent posts one synthesized verdict comment per issue and applies the panel-decided theme/*, area/*, type/*, priority/*, plus status/triaged labels and (if the issue has none today) a milestone. Every write is framed as an agentic proposal pending human ratification — maintainer edits are authoritative and never reverted.

Problem (WHY)

New issues sat untriaged for days. The apm-review-panel skill existed for PRs; issue intake had no analogue.
A naive issues: [opened, reopened] trigger is exploitable: any OSS user could open/close/reopen an issue indefinitely and burn LLM budget. Cost had to be bounded structurally.
Triage is multi-axis (theme + area + type + priority + milestone) and benefits from multiple specialist lenses, mirroring the shape of apm-review-panel. A single-prompt classifier is the wrong tool.
gh-aw safe-outputs is the only sanctioned write surface (read-only agent + scoped writes). Direct gh mutations from agent steps would bypass the audit trail.

Approach (WHAT)

Decision	Choice	Why
Trigger model	Daily `cron: 'daily'` sweep + opt-in `status/needs-triage` label + `workflow_dispatch`	Bounds cost (~300 runs/month ceiling); preserves a one-click maintainer escape hatch; no exploitable per-issue trigger
Panel surface	Reuse `apm-triage-panel` skill, identical orchestration shape to `apm-review-panel`	One mental model for both intake and review; 3 mandatory + 3 conditional persona lenses + CEO arbiter
Write surface	`safe-outputs` only (`add-comment`, `add-labels`, `remove-labels`, `assign-milestone`)	Read-only agent; every write is auditable in run logs; scoped tokens
Reversibility	Every label is human-revertable; only `status/needs-triage` is in the `remove-labels` allow-list	Humans apply, only humans remove (except the one trigger label the agent consumes)

Implementation (HOW)

File	Role
`.github/workflows/triage-panel.md`	Workflow source: trigger model, front-gate, concurrency, safe-outputs config, mode-routing prompt
`.github/workflows/triage-panel.lock.yml`	Compiled artifact (do not hand-edit; regenerated by `gh aw compile`)
`.apm/skills/apm-triage-panel/SKILL.md`	The triage panel skill: persona roster, routing topology, completeness gate, single-comment template

Three execution modes share Step 2 (run the panel) and Step 3 (apply decisions); they differ only in Step 1 (which issues to triage):

SCHEDULED_SWEEP — pulls up to 200 open issues, drops bots/locked/already-triaged/empty/spam, applies a per-author quota of 2/sweep, takes the oldest 10. Quiet sweep = no comment.
OPT_IN_RETRIAGE — single issue, triggered by a maintainer applying status/needs-triage. Treats existing human labels as authoritative; consumes the trigger label on completion.
MANUAL_DISPATCH — single issue via workflow_dispatch with optional issue_number. Useful for replay/debug.
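
The SCHEDULED_SWEEP selection can be sketched as a pure function over already-fetched issue metadata. This is a minimal illustration, not the workflow's code: `select_sweep_batch` and the dict field names are hypothetical, and applying the per-author quota after the oldest-first sort is an assumption about ordering.

```python
from collections import Counter

def select_sweep_batch(issues, per_author_quota=2, batch_size=10):
    """Pick the issue numbers for one sweep: filter, sort oldest-first, cap per author."""
    eligible = [
        issue for issue in issues
        if not issue["is_bot"]
        and not issue["locked"]
        and "status/triaged" not in issue["labels"]
        and issue["body"].strip()            # drop empty bodies
        and not issue.get("spam", False)     # spam flag computed elsewhere
    ]
    eligible.sort(key=lambda issue: issue["created_at"])  # ISO dates sort oldest-first
    taken = Counter()
    batch = []
    for issue in eligible:
        if taken[issue["author"]] >= per_author_quota:
            continue  # this author already used their quota for the sweep
        taken[issue["author"]] += 1
        batch.append(issue["number"])
        if len(batch) == batch_size:
            break
    return batch
```

An empty return value corresponds to the quiet sweep: nothing to triage, so no comment is posted.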

Defensive layers, in order:

Workflow-level front-gate (shell if): rejects non-status/needs-triage label events, bot authors, locked or closed issues — at zero LLM cost.
Concurrency group keyed on issue.number || inputs.issue_number || 'sweep' — prevents two panel runs racing on the same issue.
Body size cap of 64 KB before the body enters reasoning; oversized bodies are flagged in the verdict.
Spam-shape filter (>50 consecutive identical chars / >80% URLs / >70% repeated 3-char substring / <20 alphanumerics): silently drops without applying status/triaged so the issue stays in queue for human review.
BATCH_ALLOW_LIST is computed before any issue body is read, so prompt injection in a body cannot redirect a write to an unrelated issue.
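
For illustration, the spam-shape thresholds above could be checked along these lines. The sketch is heavily assumption-laden: the thread gives only the thresholds, so the URL share is read here as the fraction of characters inside `http(s)://` tokens, and the repeated-substring share as the fraction of 3-character windows taken by the single most common one. `looks_like_spam` is a hypothetical name.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
RUN_RE = re.compile(r"(.)\1{50,}", re.DOTALL)  # 51 or more identical consecutive chars

def looks_like_spam(body):
    """Return True when the body matches any of the four spam shapes."""
    if sum(ch.isalnum() for ch in body) < 20:
        return True  # too little real content
    if RUN_RE.search(body):
        return True  # long run of one character
    url_chars = sum(len(url) for url in URL_RE.findall(body))
    if url_chars / len(body) > 0.80:
        return True  # body is mostly links
    trigrams = Counter(body[i:i + 3] for i in range(len(body) - 2))
    top_count = trigrams.most_common(1)[0][1]
    return top_count / sum(trigrams.values()) > 0.70  # one 3-char pattern dominates
```

Per the list above, a positive result means the issue is skipped without `status/triaged`, so it stays in the queue for a human.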

Diagrams

Trigger fan-in into the shared three-step pipeline:

flowchart LR
  A1[issues: labeled] -->|status/needs-triage only| G[Front-gate shell step]
  A2[schedule: daily] --> M{Mode}
  A3[workflow_dispatch] --> M
  G --> M
  M -->|labeled event| R1[OPT_IN_RETRIAGE]
  M -->|cron OR dispatch w/o issue| R2[SCHEDULED_SWEEP]
  M -->|dispatch w/ issue_number| R3[MANUAL_DISPATCH]
  R1 --> S1[Step 1: gather candidates]
  R2 --> S1
  R3 --> S1
  S1 --> S2[Step 2: run apm-triage-panel skill]
  S2 --> S3[Step 3: emit verdict + apply labels/milestone]
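
The routing in the flowchart, together with the concurrency key from the defensive layers, can be written out as a small sketch. `resolve_mode` and `concurrency_key` are hypothetical helpers for illustration; the real routing lives in the workflow's front-gate and prompt, and only the event names (`issues`, `schedule`, `workflow_dispatch`) are GitHub's own.

```python
def resolve_mode(event_name, label=None, issue_number=None):
    """Map a triggering event to an execution mode, or None when the front-gate rejects it."""
    if event_name == "issues":
        # Only the fast-path trigger label may start a run.
        return "OPT_IN_RETRIAGE" if label == "status/needs-triage" else None
    if event_name == "workflow_dispatch" and issue_number is not None:
        return "MANUAL_DISPATCH"
    if event_name in ("schedule", "workflow_dispatch"):
        return "SCHEDULED_SWEEP"
    return None

def concurrency_key(event_issue_number=None, input_issue_number=None):
    """Mirror of issue.number || inputs.issue_number || 'sweep'."""
    return str(event_issue_number or input_issue_number or "sweep")
```

Two runs targeting the same issue share a key and therefore serialize, while a sweep falls back to the single `sweep` key.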

Per-issue Step 3 emission, showing the safe-output surface and the human-ratification contract:

sequenceDiagram
  autonumber
  participant Agent
  participant SO as safe-outputs
  participant Issue as GitHub issue
  participant Maint as Maintainer
  Agent->>SO: add-comment (verdict + ratification footer)
  SO->>Issue: post comment
  Agent->>SO: add-labels (theme/area/type/priority + status/triaged)
  SO->>Issue: apply labels (allow-list strict-equality)
  Agent->>SO: remove-labels (status/needs-triage if present)
  SO->>Issue: consume trigger label
  Agent->>SO: assign-milestone (only if issue had none)
  SO->>Issue: set milestone
  Note over Issue,Maint: Silence is approval. Any human edit is authoritative and not reverted.
  Maint->>Issue: optional override (edit label / milestone)
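
The emission order in the sequence diagram can be sketched as a function that builds the safe-output requests for one issue. The dict payload shape and `build_step3_outputs` are assumptions made for illustration; the actual tool schemas are defined by gh-aw, and allow-list filtering happens later in the safe-output handler, not here.

```python
def build_step3_outputs(issue, verdict_comment, decided_labels, decided_milestone=None):
    """Return the ordered safe-output requests for one triaged issue."""
    number = issue["number"]
    # Always add status/triaged; dict.fromkeys dedupes while keeping order.
    labels = list(dict.fromkeys([*decided_labels, "status/triaged"]))
    outputs = [
        {"type": "add_comment", "issue_number": number, "body": verdict_comment},
        {"type": "add_labels", "issue_number": number, "labels": labels},
    ]
    if "status/needs-triage" in issue["labels"]:
        # Consume the fast-path trigger so it cannot re-fire.
        outputs.append({"type": "remove_labels", "issue_number": number,
                        "labels": ["status/needs-triage"]})
    if decided_milestone and issue.get("milestone") is None:
        # Never overwrite a milestone a maintainer already set.
        outputs.append({"type": "assign_milestone", "issue_number": number,
                        "milestone": decided_milestone})
    return outputs
```

Nothing in this path ever removes a label other than the trigger, which is what keeps maintainer edits authoritative.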

Trade-offs

Daily latency, not realtime. A new issue waits up to ~24h for the sweep, or one click for the fast-path. Chosen over issues: [opened] because the latter has no cost ceiling and is exploitable. Today's manual baseline was days-to-never, so 24h is a strict improvement.
allowed: is strict-equality, not glob. Every legal label is enumerated literally in the workflow (35 entries). Adding a new taxonomy label means a one-line workflow edit + recompile. Surfaced explicitly because it's a footgun: unlisted labels are silently dropped by the safe-output handler.
Agent reads, never writes directly. All mutations flow through safe-outputs. Costs one extra hop; gains a fully auditable write trail and scoped tokens.
No automatic milestone overwrite. The agent only assigns a milestone when the issue has none. Existing milestones are a maintainer call. Slightly weaker triage on already-milestoned issues; eliminates a whole class of "agent moved my milestone" complaints.
Spam-shape detection is heuristic. False negatives reach the panel (cost: one panel run); false positives stay in the queue without status/triaged (cost: a maintainer must apply status/needs-triage to force a run). Tuned conservative on purpose.

Benefits

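Bounded LLM spend. The scheduled sweep has a hard ceiling of ~300 panel runs/month (10/day × 30) regardless of issue volume. The only other entry points (the status/needs-triage label and workflow_dispatch) require a maintainer action, so there is no per-issue trigger for an outside user to exploit.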
One-click priority override. Maintainers re-trigger triage on any issue by applying status/needs-triage — instant fast-path, no dispatch UI.
Auditable writes. Every label, milestone, and comment is recorded in workflow run logs with the originating issue number; breach of BATCH_ALLOW_LIST would be visible post-hoc.
Drop-in extensibility. New persona lenses are added by editing the skill (.apm/skills/apm-triage-panel/SKILL.md); no workflow change. New labels are added by extending allowed: in the workflow frontmatter.
Silence-is-approval contract prevents agent-vs-human label thrash. Maintainers stay in charge by default.
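
The spend ceiling is simple arithmetic, worked here as a sketch. The 30-50k tokens per panel run figure is the rough estimate from the PR description, not a measurement, and the constant names are made up for the example.

```python
ISSUES_PER_SWEEP = 10               # cap per daily sweep
SWEEPS_PER_MONTH = 30               # one sweep per day
TOKENS_PER_RUN = (30_000, 50_000)   # ASSUMPTION: rough per-run estimate (low, high)

def monthly_ceiling():
    """Return (max panel runs, low token bound, high token bound) for scheduled sweeps."""
    runs = ISSUES_PER_SWEEP * SWEEPS_PER_MONTH
    low, high = (runs * tokens for tokens in TOKENS_PER_RUN)
    return runs, low, high

runs, low_tokens, high_tokens = monthly_ceiling()
```

With these inputs the sweep tops out at 300 runs and roughly 9M to 15M tokens per month. The cap bounds the worst case, including spike days; it does not by itself lower average spend.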

Validation

Live-validation evidence is in the prior comment on this PR. Three bugs in the safe-outputs configuration were caught and fixed during validation; both the dispatch path (issue #955) and the status/needs-triage fast-path (issue #956) were verified end-to-end, with correct labels, milestone, and comment in their final states.

Compile check (run locally to verify the lock is in sync)

cd /path/to/microsoft-apm
gh aw compile .github/workflows/triage-panel.md
git diff --exit-code .github/workflows/triage-panel.lock.yml

How to use this — three workflows for maintainers

Default path (do nothing). Open issues are auto-swept once daily, oldest-first, capped at 10. You'll see a verdict comment with a proposed label/milestone set. Override anything you disagree with — the agent will not revert.
Need this triaged now. Apply the status/needs-triage label to any open, human-authored issue. The fast-path fires within minutes, posts the verdict, applies labels, then removes status/needs-triage (consuming the trigger).
Replay / debug a specific issue. Run the workflow manually via the Actions tab → "Triage Panel" → "Run workflow", optionally with issue_number=N. Without an issue number, dispatch runs the full sweep on demand.

To re-enroll an issue into the next daily sweep, remove the status/triaged label.

To extend the taxonomy, add the label in GitHub, then add it to safe-outputs.add-labels.allowed in .github/workflows/triage-panel.md and run gh aw compile. Unlisted labels are silently dropped by the safe-output handler — this is a feature (label allow-listing) not a bug.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(release): cut 0.9.4

  CHANGELOG entry for 0.9.4 covers all 7 PRs merged since v0.9.3:
  - #974 SKILL_BUNDLE day-0 install parity (Added)
  - #954 automate apm-triage-panel workflow (Added)
  - #970 python-architect mermaid classDiagram trap (Changed)
  - #911 REQUESTS_CA_BUNDLE TLS validation (Fixed)
  - #971 triage-panel project-sync dispatch (Fixed)
  - #910 CLI consistency cleanup (Fixed)
  - #958 issue templates label taxonomy (Fixed)
  - #953 docs auto-deploy after bot-cut releases (Fixed)

  Open milestone 0.9.4 issues (41) reassigned to 0.9.5.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): tighten 0.9.4 entries (so-what for developers)

  Refactor per Keep-a-Changelog spirit: lead with developer impact, trim agent-internals prose, group maintainer-only changes.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): add #660 install.sh air-gapped entry to 0.9.4

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 26, 2026 14:14

Copilot started reviewing on behalf of danielmeppiel April 26, 2026 14:14

Copilot AI reviewed Apr 26, 2026

danielmeppiel and others added 2 commits April 26, 2026 16:30

danielmeppiel merged commit 0cbe742 into main Apr 26, 2026
7 checks passed

danielmeppiel deleted the feat/triage-panel-automation branch April 26, 2026 15:30

This was referenced Apr 26, 2026

[VALIDATION-1] Inconsistent --dry-run docs between CLI help and compilation guide #955

Closed

[VALIDATION-5] Lockfile portability: absolute paths leak across machines #956

Closed

This was referenced Apr 26, 2026

fix(templates): use canonical taxonomy labels in issue templates #958

Merged

chore(release): cut 0.9.4 #978

Merged

danielmeppiel merged 4 commits into main from feat/triage-panel-automation

