feat(ci): automate apm-triage-panel on new and reopened issues #954

Merged
danielmeppiel merged 4 commits into main from feat/triage-panel-automation
Apr 26, 2026
@danielmeppiel danielmeppiel commented Apr 26, 2026

TL;DR

Adds an automated triage workflow that:

  • Runs the existing apm-triage-panel skill on a daily fuzzy schedule (gh-aw "daily" → ~12:49 UTC), processing up to 10 oldest-first untriaged open issues per sweep — hard cost ceiling.
  • Provides an opt-in fast path: maintainers apply status/needs-triage to trigger immediate re-triage on any specific issue (security reports, urgent items).
  • Treats every triage decision as agentic proposal pending human ratification — silence approves, any maintainer label edit always wins.
  • Recompiles all gh-aw workflows on v0.68.3 (was three locks at v0.52.1).

Problem (WHY)

The apm-triage-panel skill exists at .apm/skills/apm-triage-panel/SKILL.md but a maintainer has to invoke it by hand on every issue. Real data from this repo, last 30 days:

| Metric | Value |
|---|---|
| Issues opened | 200 |
| Mean / peak per day | 6.7 / 17 |
| Bot-authored | 19 |
| Got a theme/* label eventually | 10 / 200 (5%) |

Manual triage is severely under-resourced, so automation is the right call. But naive on: issues.opened triggers would mean ~200 panel runs/month with no ceiling: at ~30-50k tokens per panel run that is up to ~10M LLM tokens/month, and every spike day burns through quota.

Approach (WHAT)

Two cooperative paths plus manual dispatch:

                     event                          
                       |                            
    +------------------+------------------+         
    |                  |                  |         
issues:                schedule:          workflow_  
labeled                "daily" (~12:49Z)  dispatch   
(status/needs-                            (optional  
 triage only,                              issue#)   
 fast path)                                          
    |                  |                  |         
    v                  v                  v         
   on.steps         filter:             gh issue    
   front-gate       open + non-bot +    view #      
   (bot/locked      no status/triaged                
   /closed -> exit) -> oldest 10                    
    |                  |                  |         
    +------------------+------------------+         
                       |                            
                       v                            
              apm-triage-panel skill                
                       |                            
                       v                            
         safe-outputs.add-comment (max:12)          
              + safe-outputs.update-issue            
                                                    
        ----> human-ratification footer on every comment
        ----> apply theme/area/type/priority labels  
        ----> ADD status/triaged (skip-me signal)    
        ----> REMOVE status/needs-triage (consume trigger)
        ----> milestone only if absent (never overwrite)
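
The write rules at the bottom of the diagram can be sketched as a small state transition. This is an illustrative model only: the function name and argument shapes are assumptions, and the real workflow applies these rules through the agent prompt and safe-outputs rather than through code like this.

```python
def apply_triage_outcome(labels, milestone, proposed_labels, proposed_milestone):
    """Model the label/milestone state of one issue after a panel run.

    Hypothetical helper: names and shapes are illustrative, not gh-aw API.
    """
    new_labels = set(labels) | set(proposed_labels)
    new_labels.add("status/triaged")           # skip-me signal for later sweeps
    new_labels.discard("status/needs-triage")  # consume the fast-path trigger
    # A milestone is set only when the issue has none; it is never overwritten.
    new_milestone = milestone if milestone is not None else proposed_milestone
    return new_labels, new_milestone


labels, milestone = apply_triage_outcome(
    {"status/needs-triage", "type/bug"}, "0.9.3", ["theme/auth"], "0.9.4"
)
print(sorted(labels), milestone)
```

The sketch does not model the "human edits win" rule, which depends on the agent comparing its proposal against labels a maintainer has already changed.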

Lifecycle invariants (designed and documented in the agent prompt)

| Event | Behavior |
|---|---|
| Issue opened by human | Sweep picks up next day; panel triages, adds status/triaged |
| Issue opened by bot | Filtered out; no run |
| Maintainer applies status/needs-triage | Fast path fires within minutes; panel re-runs; refreshes labels; removes status/needs-triage; keeps status/triaged |
| Maintainer edits a label (changes/removes a theme/*) | No retrigger. Agent never reverts human edits on subsequent runs. Human always wins. |
| Maintainer removes status/triaged | Sweep re-picks the issue next day (treated as fresh untriaged) |
| Issue reopened | Treated as still-triaged unless status/triaged is removed; maintainer applies status/needs-triage to force re-triage |
| Issue locked / closed | Front-gate skips; sweep filters out |
| 17-issue spike day | Cap=10 oldest-first; tail rolls to tomorrow. Capacity 300/mo vs ~140/mo eligible → drains with margin |
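
The sweep's candidate selection (open, unlocked, non-bot, no status/triaged, oldest first, capped at 10) can be sketched as follows. The `Issue` dataclass and `select_sweep_candidates` are hypothetical names for illustration; in the PR this selection is described in the agent prompt, not implemented as Python.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Issue:
    # Hypothetical record; field names are assumptions for this sketch.
    number: int
    author_type: str      # "User" or "Bot"
    state: str            # "open" or "closed"
    locked: bool
    labels: frozenset
    created_at: datetime


def select_sweep_candidates(issues, cap=10):
    """Return up to `cap` eligible issues, oldest first."""
    eligible = [
        i for i in issues
        if i.state == "open"
        and not i.locked
        and i.author_type != "Bot"
        and "status/triaged" not in i.labels
    ]
    eligible.sort(key=lambda i: i.created_at)
    return eligible[:cap]
```

Anything beyond the cap stays eligible and is picked up by a later sweep, which is the "tail rolls to tomorrow" behavior in the table.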

Implementation (HOW)

Files

  • .github/workflows/triage-panel.md — gh-aw source (new)
  • .github/workflows/triage-panel.lock.yml — generated (new)
  • .github/workflows/{cli-consistency-checker,daily-doc-updater,daily-test-improver}.lock.yml — recompiled v0.52.1 → v0.68.3
  • .github/workflows/agentics-maintenance.yml — auto-regenerated by gh-aw v0.68.3

Verdict-comment footer (appended verbatim by the agent)

> **Triage status: agentic proposal pending human ratification.**
> Silence is approval. Maintainers can:
> - Override any label or milestone above by editing it directly --
>   human edits are authoritative and will not be reverted on
>   subsequent runs.
> - Re-trigger triage by applying the `status/needs-triage` label, or
>   by removing `status/triaged` to enroll the issue in the next
>   daily sweep.

This puts the human-in-the-loop contract on every issue, visible to contributor and maintainer alike.

Substrate choices

| Choice | Rationale |
|---|---|
| on: issues.labeled (filtered to status/needs-triage via on.steps:) | Mirror of pr-review-panel.md pattern. Pre-activation gate kills out-of-scope label events at zero cost. |
| schedule: cron 'daily' | gh-aw fuzzy distributor (resolved to 49 12 * * *). Avoids deterministic top-of-the-hour load spikes. |
| Drop issues.opened / issues.reopened subscriptions | Sweep handles within 24h; maintainer's status/needs-triage is the immediate-path escape hatch. Unbounded per-issue triggers would have no cost ceiling. |
| concurrency.group on issue number | Prevents fast-path and sweep racing on the same issue. |
| APM imports pinned to @main | Trusted-skill rule: a malicious issue could otherwise tamper with the triage prompt. Same rationale as pr-review-panel.md. |
| safe-outputs.update-issue (not direct gh issue edit) | Agent stays read-only; gh-aw mints scoped write only for the safe-outputs job. |

Cost shape

| Metric | Before (uncapped per-issue) | After (sweep + fast path) |
|---|---|---|
| Hard monthly ceiling | None | 300 sweep + ~5-20 fast-path = ≤320 panel runs |
| Typical monthly volume | ~200 (uncapped, would grow) | ~140-180 (drains the eligible queue) |
| Spike protection | None | Cap=10/day, tail rolls forward |
| Worst-case latency for opened issues | Minutes | ≤24h (≤48h on consecutive 10+ spike days) |
| Latency for explicit re-triage requests | n/a | Minutes (fast path) |
| Maintainer override mechanics | Implicit | Explicit and documented in every comment |
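
The arithmetic behind the ceiling can be checked with a few lines. The constants come from the figures quoted in this PR (10 issues per sweep, 30 sweeps a month, up to ~20 fast-path runs, ~30-50k tokens per run); the function names are illustrative, and the fast-path figure is the PR's estimate rather than an enforced limit.

```python
SWEEP_CAP_PER_DAY = 10
DAYS_PER_MONTH = 30
FAST_PATH_RUNS_MAX = 20            # upper end of the PR's ~5-20 estimate
TOKENS_PER_RUN = (30_000, 50_000)  # low / high estimate per panel run


def monthly_run_ceiling(sweep_cap=SWEEP_CAP_PER_DAY, days=DAYS_PER_MONTH,
                        fast_path_max=FAST_PATH_RUNS_MAX):
    """Sweep runs plus the assumed fast-path maximum."""
    return sweep_cap * days + fast_path_max


def token_range(runs, tokens_per_run=TOKENS_PER_RUN):
    """Return (low, high) monthly token estimates for a given run count."""
    low, high = tokens_per_run
    return runs * low, runs * high


print(monthly_run_ceiling())               # 320
print(token_range(200))                    # (6000000, 10000000)
print(token_range(monthly_run_ceiling()))  # (9600000, 16000000)
```

The capped ceiling in tokens is higher than the uncapped typical month; the point of the cap is that the worst case is now bounded, while typical volume (~140-180 runs) sits well below it.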

Trade-offs

  • Routine triage latency goes from "minutes" to "≤24h" vs the original per-issue subscription. Acceptable because today's manual baseline is already days-to-never (5% in 30 days), and the maintainer-applied status/needs-triage label is the always-immediate escape hatch for anything urgent.
  • Sweep cap of 10/day means a sustained 11+/day open rate would let the queue grow. Real data shows mean 6.7/day so margin is large; if the repo becomes far more popular this knob can be raised in a small follow-up.
  • Agent runs panel sequentially within one sweep (up to 10 invocations per run). This re-uses runner cold-start once but does not amortize LLM tokens across issues (the apm-triage-panel skill is single-issue by contract). A future "lighter classifier first, escalate to panel" two-tier model could cut tokens further; out of scope here.
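
The queue-growth trade-off can be illustrated with a toy backlog model. This is a simplification under stated assumptions: all of a day's eligible issues arrive before that day's sweep, and the arrival counts below are made up for illustration rather than taken from repo data.

```python
def simulate_backlog(arrivals_per_day, cap=10):
    """Return the untriaged backlog left after each daily sweep."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_day:
        backlog += arrivals
        backlog -= min(backlog, cap)  # the sweep processes at most `cap`
        history.append(backlog)
    return history


# One 17-issue spike day among 7-issue days: the tail drains over the next days.
print(simulate_backlog([7, 17, 7, 7]))  # [0, 7, 4, 1]
# A sustained 11/day rate: the backlog grows by one every day.
print(simulate_backlog([11] * 5))       # [1, 2, 3, 4, 5]
```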

Validation

$ gh aw compile --approve-updates
✓ .github/workflows/triage-panel.md (71.1 KB)
✓ Compiled 5 workflow(s): 0 error(s), 0 warning(s)

$ actionlint .github/workflows/triage-panel.lock.yml
2 'artifact_prefix' warnings -- pre-existing gh-aw codegen pattern
also present (4x) in pr-review-panel.lock.yml.

Security review (gh-aw safe-update baselines approved)

All flagged items are already in active use elsewhere in the repo:

  • microsoft/apm-action@v1.4.2 — same pin as pr-review-panel.lock.yml, daily-doc-updater.lock.yml, cli-consistency-checker.lock.yml.
  • GH_AW_PLUGINS_TOKEN — already configured at the repo level for the same imports.
  • CREATE_PR_PAT — already declared in daily-doc-updater.md and daily-test-improver.md sources; the v0.52.1 metadata schema simply didn't track it. No new external surface.

Pre-merge checklist

  • Create the status/triaged label:
    gh label create status/triaged \
      --repo microsoft/apm \
      --description "Initial agentic triage complete; pending maintainer ratification (silence = approval)." \
      --color FBCA04
    
    This is the "do not re-sweep me" signal the workflow depends on.

How to test (post-merge)

  1. Fast path smoke test — apply status/needs-triage to any open untriaged issue. Within ~1-2 min the panel should run, post a verdict, apply labels, add status/triaged, and remove status/needs-triage. Confirm project-sync.yml then picks up the theme/* label change and adds the issue to the project board.

  2. Sweep dry run: run gh aw run triage-panel (no input) before the first scheduled fire. Confirm the agent finds untriaged issues, processes ≤10 oldest, and skips bot-authored / locked / already-status/triaged ones.

  3. Out-of-scope label test — apply a theme/auth label directly to a fresh issue. The on.steps gate should drop the event before the agent runs (visible as a quick "skipping" step in the Actions log, no agent job).

  4. Human-edit-wins regression — after a sweep triages an issue, manually swap one of its theme/* labels for a different one. Re-trigger by applying status/needs-triage. Verify the new comment surfaces any disagreement as a recommendation but does not revert your edit.

Out of scope

  • Editing apm-triage-panel/SKILL.md (already shippable; last refactored in 50d24b0a).
  • Editing project-sync.yml (clean handoff via labels already works).
  • Two-tier classifier-then-panel triage model (future iteration if cost still too high).
  • Backfilling v0.9.3 docs site (separate item; #953, "fix(ci): deploy docs after bot-cut releases via workflow_call", fixes the future case).

Adds .github/workflows/triage-panel.md (gh-aw substrate) that auto-invokes
the existing apm-triage-panel skill on issue events:

  - issues opened
  - issues reopened
  - issues labeled with status/needs-triage (filtered via on.steps gate)
  - workflow_dispatch (manual fallback with issue_number input)

Mirrors the proven pattern from pr-review-panel.md:

  - Pre-activation label-name guard kills the pipeline at the cheapest
    point for out-of-scope label events (no runner cold-start, no agent
    spin-up).
  - APM imports pinned to microsoft/apm@main so the triage panel always
    runs against the trusted, already-reviewed skill.
  - Read-only agent permissions; writes go through gh-aw safe-outputs
    (add-comment max:2, update-issue target:*).
  - roles: [admin, maintainer, write] matches the trust model of
    applying / opening triageable issues.

Clean separation of concerns:

  - This workflow decides labels and milestone via the panel skill.
  - The existing project-sync.yml workflow then picks up theme/* label
    changes and syncs the issue to the PGS board column.
  - apm-triage-panel skill content (personas, rubric, output template)
    is untouched -- this PR only adds the trigger surface.

Validation:

  - gh aw compile triage-panel: 0 errors, 0 warnings.
  - actionlint on the generated lock: 2 artifact_prefix warnings that
    also appear (4x) in the existing pr-review-panel.lock.yml -- a
    known gh-aw compiler artifact, not from this change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 26, 2026 14:14
Three lock files (cli-consistency-checker, daily-doc-updater, daily-test-improver)
plus the auto-generated agentics-maintenance.yml were still pinned to gh-aw
compiler v0.52.1. pr-review-panel and the new triage-panel are already at
v0.68.3, so this aligns the entire repo on a single compiler version.

Ran from a clean checkout:

  gh aw compile --approve-updates

Approved baselines:

  - CREATE_PR_PAT secret -- already declared in daily-doc-updater.md and
    daily-test-improver.md sources; the v0.52.1 metadata format simply
    didn't track it. No new external surface.

actionlint findings on the regenerated files are all pre-existing gh-aw
codegen patterns (artifact_prefix property warnings, empty-string choice
option in agentics-maintenance.yml). None are introduced by the upgrade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Collaborator Author

Added second commit recompiling all gh-aw workflows on v0.68.3.

Scope of the upgrade: 3 stale lock files (cli-consistency-checker, daily-doc-updater, daily-test-improver) and the auto-generated agentics-maintenance.yml were still pinned to v0.52.1. pr-review-panel and the new triage-panel are already at v0.68.3 — this aligns the whole repo on one compiler version.

Security baseline approved: CREATE_PR_PAT secret was flagged by safe-update mode but is already declared in the daily-doc-updater.md and daily-test-improver.md sources; the v0.52.1 metadata schema simply didn't track it. No new external surface (no new actions, no new secrets beyond what's already in those source files).

actionlint: only pre-existing codegen patterns flagged (artifact_prefix, empty-string choice option in agentics-maintenance) — neither introduced by the upgrade.

Contributor

Copilot AI left a comment

Pull request overview

Adds a new Agentic Workflows (gh-aw) automation to run the existing apm-triage-panel skill on issue activity, post a synthesized triage comment, and apply the panel-decided labels/milestone via Safe Outputs.

Changes:

  • Add a new gh-aw source workflow to trigger triage on issues events (opened/reopened/labeled) and workflow_dispatch.
  • Add the compiled lock workflow generated by gh aw compile.
| File | Description |
|---|---|
| .github/workflows/triage-panel.md | New gh-aw workflow definition to orchestrate apm-triage-panel on issue events and use Safe Outputs for comments + issue updates. |
| .github/workflows/triage-panel.lock.yml | Compiled GitHub Actions workflow generated from the gh-aw source, including Safe Outputs tool schema and execution plumbing. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/triage-panel.lock.yml:623

  • Safe Outputs update_issue tool schema requires customValidation: requiresOneOf:status,title,body. If this validator is enforced, label/milestone-only updates (the stated intent of this workflow) will fail unless the agent also provides a no-op status/title/body. Either adjust the workflow/prompt to meet this constraint or configure Safe Outputs validation to allow label/milestone-only updates.
              "update_issue": {
                "defaultMax": 1,
                "fields": {
                  "assignees": {
                    "type": "array",
                    "itemType": "string",
                    "itemSanitize": true,
                    "itemMaxLength": 39
                  },
                  "body": {
                    "type": "string",
                    "sanitize": true,
                    "maxLength": 65000
                  },
                  "issue_number": {
                    "issueOrPRNumber": true
                  },
                  "labels": {
                    "type": "array",
                    "itemType": "string",
                    "itemSanitize": true,
                    "itemMaxLength": 128
                  },
                  "milestone": {
                    "optionalPositiveInteger": true
                  },
                  "operation": {
                    "type": "string",
                    "enum": [
                      "replace",
                      "append",
                      "prepend",
                      "replace-island"
                    ]
                  },
                  "repo": {
                    "type": "string",
                    "maxLength": 256
                  },
                  "status": {
                    "type": "string",
                    "enum": [
                      "open",
                      "closed"
                    ]
                  },
                  "title": {
                    "type": "string",
                    "sanitize": true,
                    "maxLength": 128
                  }
                },
                "customValidation": "requiresOneOf:status,title,body"
              }
  • Files reviewed: 5/6 changed files
  • Comments generated: 5

Comment thread .github/workflows/triage-panel.md Outdated
Comment on lines +54 to +59
if [ "$ACTION" = "labeled" ] && [ "$LABEL_NAME" = "status/needs-triage" ]; then
echo "Triggering label is 'status/needs-triage' -- proceeding."
exit 0
fi
echo "Event '$ACTION' / label '$LABEL_NAME' is out of scope; skipping."
exit 1
Copilot AI Apr 26, 2026

The pre-activation label gate exits with status 1 for out-of-scope issues:labeled events. That will mark the workflow run as failed for every non-status/needs-triage label applied to an issue (and potentially for label changes performed by this workflow), which can create significant Actions noise. Consider removing labeled from the trigger (keeping only opened/reopened + workflow_dispatch), or restructuring the gate so out-of-scope label events end in a neutral/green outcome while still preventing agent activation.

Comment thread .github/workflows/triage-panel.lock.yml Outdated
Comment on lines +459 to +461
cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_efa05e814fb58cf7_EOF'
{"add_comment":{"max":2},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"true"},"report_incomplete":{},"update_issue":{"allow_body":true,"max":1,"target":"*"}}
GH_AW_SAFE_OUTPUTS_CONFIG_efa05e814fb58cf7_EOF
Copilot AI Apr 26, 2026

Safe Outputs config enables update_issue.allow_body: true, which allows the agent to edit issue bodies via the write-sink even though the workflow intent is label/milestone-only updates. If the Safe Outputs framework supports disabling body/title edits for update_issue, it would be safer to do so here to reduce prompt-injection impact (issue bodies are untrusted input).

This issue also appears on line 570 of the same file.

fi
echo "Event '$ACTION' / label '$LABEL_NAME' is out of scope; skipping."
exit 1
roles: [admin, maintainer, write]
Copilot AI Apr 26, 2026

roles: [admin, maintainer, write] means the workflow only activates when the triggering actor has write access. That prevents auto-triage for issues opened/reopened by external contributors, which is the primary use case described in this PR. Drop the roles gate (or change the triggers to only run on maintainer-applied labels) so newly opened issues from non-members are actually processed.

Suggested change
roles: [admin, maintainer, write]

Comment thread .github/workflows/triage-panel.md Outdated
Comment on lines +7 to +9
# 1. issues: opened / reopened / labeled. We listen on plain `issues`
# (not `issues_target`) because issues -- unlike PRs -- don't have a
# fork-head untrusted code surface; the only untrusted input is the
Copilot AI Apr 26, 2026

The header comment says "issues_target" is an alternative event, but GitHub Actions does not have an issues_target trigger (unlike pull_request_target). This is likely to confuse future maintainers; please reword to just explain why plain issues is sufficient here.

Suggested change
# 1. issues: opened / reopened / labeled. We listen on plain `issues`
# (not `issues_target`) because issues -- unlike PRs -- don't have a
# fork-head untrusted code surface; the only untrusted input is the
# 1. issues: opened / reopened / labeled. Plain `issues` is sufficient
# here because issues -- unlike PRs -- do not introduce a fork-head
# untrusted code execution surface; the only untrusted input is the

Comment thread .github/workflows/triage-panel.md Outdated
Comment on lines +158 to +164
`safe-outputs.update-issue`. The `theme/*` label, if assigned,
will automatically trigger the existing `project-sync.yml`
workflow to add the issue to the appropriate PGS board column --
no extra action needed here.

Do not perform any other writes. Do not edit the issue body or title.
Do not close, reopen, lock, or assign the issue.
Copilot AI Apr 26, 2026

The prompt instructs the agent to apply only labels/milestone via safe-outputs.update-issue, but the compiled workflow's Safe Outputs tool schema requires at least one of status, title, or body (customValidation requiresOneOf:status,title,body). With the current instructions, label-only updates are likely to be rejected by the Safe Outputs validator and the workflow won't apply the panel's decisions. Update the instructions (or Safe Outputs config, if possible) so the agent includes a no-op status value (e.g., open) when updating labels/milestone, or otherwise satisfies the validator without changing title/body.

Suggested change
`safe-outputs.update-issue`. The `theme/*` label, if assigned,
will automatically trigger the existing `project-sync.yml`
workflow to add the issue to the appropriate PGS board column --
no extra action needed here.
Do not perform any other writes. Do not edit the issue body or title.
Do not close, reopen, lock, or assign the issue.
`safe-outputs.update-issue`, and include `status: open` in that
update payload as a validator no-op. This satisfies the Safe
Outputs schema requirement that at least one of `status`, `title`,
or `body` be present, while preserving the existing issue state and
avoiding any title/body edits. The `theme/*` label, if assigned,
will automatically trigger the existing `project-sync.yml`
workflow to add the issue to the appropriate PGS board column --
no extra action needed here.
Do not perform any other writes. Do not edit the issue body or title.
Do not close, reopen, lock, or assign the issue. When calling
`safe-outputs.update-issue` for labels/milestone, use `status:
open` only as the required no-op field for validator compliance.

danielmeppiel and others added 2 commits April 26, 2026 16:30
…ication framing

Reshapes the triage workflow trigger model to bound LLM cost without
sacrificing 'no genuine issue left untriaged'. Grounded in actual repo
data: 200 issues / 30 days mean 6.7/day, peak 17/day, 19 bot-authored,
and only 5% currently get a theme/* label -- so unbounded per-issue
runs would mean ~10M tokens/month with no ceiling.

Trigger model:

  1. SCHEDULED_SWEEP (gh-aw fuzzy daily, ~12:49 UTC): finds open
     human-authored issues lacking 'status/triaged', oldest first,
     processes up to 10 per run. Hard ceiling: 10 * 30 = 300 panel
     runs / month. Drains the typical queue (~140 eligible/month) with
     headroom for spike days.

  2. OPT_IN_RETRIAGE (issues: labeled, status/needs-triage): explicit
     fast-path for maintainers who need immediate triage, e.g. on a
     security report. Fires within minutes. Front-gate filters bots,
     locked, closed at zero cost.

  3. MANUAL_DISPATCH (workflow_dispatch with optional issue_number):
     replay / debugging.

Lifecycle invariants now explicitly designed and documented in the
agent prompt:

  - Bots are never triaged (front-gate + sweep filter).
  - Every human-authored open issue is triaged within 24h, or sooner
    if a maintainer applies status/needs-triage.
  - Maintainer label edits ALWAYS WIN: agent never reverts human label
    changes; the panel may surface disagreement in the verdict comment
    but never silently overrides.
  - status/triaged is the 'do not re-sweep me' signal. Removing it
    re-enrolls the issue in the next sweep. Applying status/needs-
    triage bypasses the sweep cap entirely (immediate fast-path).
  - status/needs-triage is consumed (removed) by the workflow after
    re-triage so the trigger doesn't loop.
  - Existing milestone is never overwritten -- panel only sets a
    milestone if none exists today.

Verdict comment now appends a verbatim 'agentic proposal pending human
ratification, silence is approval' footer that documents the override
mechanics inline with every triage comment, so contributors and
maintainers see the human-in-the-loop framing on the issue itself.

Concurrency group prevents double-triage on the same issue when fast
path and sweep race.

Compile: 0 errors, 0 warnings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four targeted hardenings address realistic abuse vectors that could
inflate panel cost beyond the per-month ceiling.

1. Body size cap (64 KB / 65536 chars) -- protects against a single
   adversarially-large issue body inflating per-run token cost from
   ~50k to >500k. Generous margin: typical APM PRD/design issues are
   <16 KB. Truncated bodies are flagged in the verdict so a maintainer
   can read the original.

2. Per-author quota in scheduled sweep (max 2 issues per distinct
   author per sweep) -- prevents a sock-puppet account from
   monopolizing daily triage capacity (10 slots/day). Excess issues
   from the same author roll to subsequent sweeps; they still get
   triaged eventually, just spread across days.

3. Output safety rails / batch allow-list -- explicit prompt-level
   rail that the agent computes its write allow-list
   (BATCH_ALLOW_LIST) from the workflow-controlled candidate
   selection BEFORE reading any issue body, so prompt-injection in a
   body cannot redirect writes to unrelated issues. Compensates for
   safe-outputs.update-issue.target: "*" being required for sweep
   mode (gh-aw does not support per-mode targets). Every workflow
   run logs every safe-output call for audit.

4. Spam-shape filter in scheduled sweep -- crude pre-panel filter
   skips issues whose bodies are >50 consecutive identical chars,
   >80% URLs, dominated by a single 3-char repeat, or have <20 chars
   of real content. Skipped issues stay in queue for manual review;
   maintainers can force a panel run via status/needs-triage.
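
Hardenings 2 and 4 can be sketched as plain functions. Everything here is an assumption for illustration: the issue dict shape, the function names, and in particular the reading of "dominated by a single 3-char repeat" (one 3-character chunk making up more than half of the body) and "real content" (non-whitespace characters). The PR implements these as prompt-level rules, not as this code.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def apply_author_quota(candidates, per_author=2, cap=10):
    """Pick up to `cap` issues, at most `per_author` per distinct author.

    `candidates` is assumed to be sorted oldest first already.
    """
    picked, seen = [], Counter()
    for issue in candidates:
        if seen[issue["author"]] >= per_author:
            continue  # excess issues roll to a later sweep
        picked.append(issue)
        seen[issue["author"]] += 1
        if len(picked) == cap:
            break
    return picked


def looks_like_spam(body):
    """Crude spam-shape check using the four thresholds named above."""
    text = body.strip()
    if sum(1 for ch in text if not ch.isspace()) < 20:
        return True   # fewer than 20 characters of real content
    if re.search(r"(.)\1{50,}", text, re.DOTALL):
        return True   # more than 50 consecutive identical characters
    if sum(len(u) for u in URL_RE.findall(text)) / len(text) > 0.8:
        return True   # more than 80% of the body is URLs
    chunks = [text[i:i + 3] for i in range(0, len(text) - 2, 3)]
    top_count = Counter(chunks).most_common(1)[0][1]
    return top_count / len(chunks) > 0.5  # one 3-char chunk dominates
```

Issues flagged by a filter like this are skipped rather than labeled, so they stay in the queue and a maintainer can still force a panel run with status/needs-triage.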

Threat model NOT addressed by this PR: open-close-reopen loop
(workflow does not subscribe to those events; cannot fire), label
manipulation by external users (gated by repo write permission),
fabricated bot login (we use user.type == Bot which is unspoofable
from user side).

Lock file unchanged because gh-aw embeds the prompt via
runtime-import, so .md edits propagate without recompile changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Collaborator Author

Hardening commit (f0e7abe) addresses the open-close-loop concern + adjacent abuse vectors.

What I checked first

The specific attack named ("open-close issue indefinitely to burn tokens") doesn't actually fire any triggers in this PR -- workflow subscribes to issues.labeled (gated to write+ via roles:) plus schedule plus workflow_dispatch. No opened / reopened / closed subscriptions. Open-close-reopen as an external OSS user = zero panel runs.

Real surfaces I found and hardened

| # | Surface | Hardening |
|---|---|---|
| 1 | Sock-puppet open spam (one actor floods 100 issues; sweep cap=10/day = 10 days of monopolized capacity) | Per-author quota: max 2 issues per distinct author per sweep |
| 2 | One issue with a 500 KB body inflates per-run tokens 10x | Body size cap: 64 KB / 65536 chars (4-8x typical APM issue), truncation flagged in verdict |
| 3 | Prompt-injection in body redirects writes to unrelated issues (update-issue.target: "*" is required by sweep) | Batch allow-list rail: agent computes BATCH_ALLOW_LIST from workflow-controlled candidate selection BEFORE reading any body; documented as a hard rail in the prompt; every safe-output call audited via run logs |
| 4 | Low-effort body spam | Spam-shape filter: skips bodies with >50 consecutive identical chars, >80% URLs, dominated by 3-char repeats, or <20 chars of real content; not auto-status/triaged so maintainers can force review |
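
Rows 2 and 3 describe checks that are easy to state in code. The sketch below shows the intended logic only; in this PR both are prompt-level rails that rely on agent discipline, and the function names and the write-request dict shape are hypothetical.

```python
MAX_BODY_CHARS = 65536  # 64 KB cap from the hardening list


def truncate_body(body, limit=MAX_BODY_CHARS):
    """Return (body, was_truncated); the flag lets the verdict mention it."""
    if len(body) <= limit:
        return body, False
    return body[:limit], True


def filter_writes(requested_writes, batch_allow_list):
    """Split write requests into those targeting allow-listed issues and the rest.

    The allow-list is assumed to be fixed before any issue body is read.
    """
    allowed = frozenset(batch_allow_list)
    accepted = [w for w in requested_writes if w["issue_number"] in allowed]
    rejected = [w for w in requested_writes if w["issue_number"] not in allowed]
    return accepted, rejected


accepted, rejected = filter_writes(
    [{"issue_number": 955, "labels": ["type/docs"]},
     {"issue_number": 1, "labels": ["priority/high"]}],
    batch_allow_list=[955, 956],
)
print(len(accepted), len(rejected))  # 1 1
```

Computing the allow-list before reading bodies matters because a body cannot then influence which issue numbers are considered legitimate targets.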

Why update-issue.target stays "*"

gh-aw doesn't support per-event/per-mode targets. Sweep mode legitimately needs to update N different issues per run; fast path needs only the triggering one. Splitting into two workflows would be over-engineering. Mitigation #3 (BATCH_ALLOW_LIST as agent-discipline rail + audit log) gets ~80% of the protection of target: "triggering" without the workflow split.

NOT in scope

  • Account-age / contribution heuristics (biases against new genuine contributors; GitHub's own abuse detection handles the low end)
  • Cumulative-runs circuit breaker (the existing 320-runs/month cap from sweep + fast-path bounds is already the ceiling)
  • Two-tier classifier+panel cost optimization (deferred to follow-up if 320/month is still too expensive in practice)

@danielmeppiel danielmeppiel merged commit 0cbe742 into main Apr 26, 2026
7 checks passed
@danielmeppiel danielmeppiel deleted the feat/triage-panel-automation branch April 26, 2026 15:30
@danielmeppiel
Collaborator Author

Live validation outcome

Ran two end-to-end scenarios against the merged workflow on main. Validation iterated four times on scenario 1 and once on scenario 5; three real bugs were caught and fixed in the process, all now landed on main.

Scenarios run

| # | Mode | Mock issue | What we tested | Result |
|---|---|---|---|---|
| 1 | workflow_dispatch | #955 (closed) | Genuine docs-quality triage on a priority/low issue | PASS after 4 iterations |
| 5 | issues: [labeled] (status/needs-triage) | #956 (closed) | Fast-path event-driven trigger | PASS on first attempt |

Both issues received: full panel verdict comment, complete label set (8 labels each, including theme/* where warranted), milestone 0.9.4, and the status/triaged "do not re-sweep me" signal. For #956, status/needs-triage was correctly removed (consumed) when the panel finished.

Scenarios 2 and 3 (spam-shape filter, empty-body filter) only exercise the SCHEDULED_SWEEP candidate selection logic and are not testable via workflow_dispatch -- they will run in the daily cron sweep and surface as observability data over time. Scenario 4 (oversized body) is a Step 1 in-prompt rail that operates identically across all three execution modes; covered indirectly by scenario 1.

Bugs caught during validation (all fixed on main)

1. update-issue cannot apply labels or milestones (f2fa33f9)

The first run posted a verdict comment but applied zero labels and no milestone. Inspection of agent_output.json revealed the validator rejected every write with:

update_issue requires at least one of: 'status', 'title', 'body' fields

gh-aw update-issue only accepts status / title / body -- pure label or milestone updates are rejected even though the JSON schema lists labels / milestone as accepted fields. Verified against gh-aw v0.68.3 source at pkg/workflow/safe_outputs_validation_config.go:135-148 (requiresOneOf:status,title,body constraint).
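
A minimal model of the requiresOneOf constraint shows why a labels-only payload is rejected. This is a sketch written for illustration, not the gh-aw Go implementation; the function names are hypothetical, and only the rule string and the error text are taken from the output quoted above.

```python
def parse_requires_one_of(rule):
    """Extract the field names from a 'requiresOneOf:a,b,c' rule string."""
    prefix = "requiresOneOf:"
    if not rule.startswith(prefix):
        raise ValueError(f"unsupported rule: {rule!r}")
    return rule[len(prefix):].split(",")


def validate_update_issue(payload, rule="requiresOneOf:status,title,body"):
    """Return None if the payload passes, else an error message."""
    required = parse_requires_one_of(rule)
    if any(payload.get(field) is not None for field in required):
        return None
    names = ", ".join(f"'{field}'" for field in required)
    return f"update_issue requires at least one of: {names} fields"


# Labels and milestone are listed in the schema, but they do not satisfy the rule.
print(validate_update_issue({"issue_number": 955, "labels": ["type/docs"]}))
print(validate_update_issue({"issue_number": 955, "status": "open"}))
```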

Fix: replaced update-issue with three dedicated tools per the gh-aw safe-outputs reference: add-labels, remove-labels, assign-milestone. Each tool takes an explicit item_number / issue_number in the payload, so SCHEDULED_SWEEP can still hit multiple distinct issues per run.

Schema gotcha: target: "*" is supported on add-labels / remove-labels but not assign-milestone in v0.68.3 (field absent from its JSON schema). Compiler raises Unknown property: target if you set it. The runtime tool accepts issue_number directly, so multi-issue dispatch still works without it.

2. allowed field is strict-equality only, not glob (5fe59094)

Second run applied only 3 of 7 decided labels. Missing: area/cli, area/docs-site, type/docs, priority/low. The verdict comment had every one of them. Verified against actions/setup/js/safe_output_validator.cjs:122:

validLabels = validLabels.filter(label => allowedLabels.includes(label));

allowed is strict Array.prototype.includes, not glob. Only blocked supports glob patterns (via matchesSimpleGlob). Every entry like theme/*, area/*, type/*, priority/* matched nothing.
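A small Python sketch of the same behaviour, for contrast. The `allowed_globs` list is an assumed reconstruction of the pre-fix config (the literal entries for status/accepted, status/triaged and good first issue are guesses); Python's `in` stands in for `Array.prototype.includes` and `fnmatchcase` for the glob matching that `allowed` does not perform.

```python
from fnmatch import fnmatchcase

# Assumed shape of the allow-list before the fix: glob-style entries mixed with literals.
allowed_globs = ["theme/*", "area/*", "type/*", "priority/*",
                 "status/accepted", "status/triaged", "good first issue"]

# Labels the panel decided on for #955.
decided = ["area/cli", "area/docs-site", "type/docs", "status/accepted",
           "priority/low", "good first issue", "status/triaged"]

# Strict membership: "area/cli" is not literally equal to "area/*", so it is filtered out.
strict_kept = [label for label in decided if label in allowed_globs]

# What glob matching would have kept (shown only for contrast).
glob_kept = [label for label in decided
             if any(fnmatchcase(label, pattern) for pattern in allowed_globs)]

dropped = [label for label in decided if label not in strict_kept]
```

Under strict membership only the literally listed labels survive; every label that relied on a `*` pattern lands in `dropped` without any error being raised.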

Fix: enumerated APM's full label taxonomy literally (3 themes + 14 areas + 8 types + 2 priorities + 5 status states + 3 community labels = 35 entries). status/needs-triage is intentionally omitted so only humans can apply that fast-path trigger -- the panel cannot inject its own re-trigger.

Documented in the prompt body so the agent flags any novel label in the verdict comment instead of silently dropping it.

3. Verdict <-> applied-state drift on milestone (666925f2)

Third run had verdict text saying "Milestone: 0.9.4" but the agent did not emit assign_milestone. No errors -- the agent simply omitted the call. This produced a confusing UX: maintainers reading the verdict would expect the milestone to be set, but it was not.

Fix: tightened Step 3 wording to require that emission and verdict text agree. The agent now MUST emit assign_milestone whenever the verdict comment names a milestone; the only legal way to skip is to omit milestone from the verdict entirely.
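The rule can also be checked mechanically after a run. The sketch below is a hypothetical post-hoc check, not part of the workflow: the `Milestone: <name>` line format is taken from the verdict text quoted above, while `milestone_drift` and the emitted-record shape (`type`, `milestone` keys) are assumptions made for illustration.

```python
import re

# Matches a verdict line such as "Milestone: 0.9.4".
MILESTONE_LINE = re.compile(r"^Milestone:\s*(\S+)", re.MULTILINE)


def milestone_drift(verdict, emitted):
    """Return a description of the drift, or None if verdict and emissions agree."""
    match = MILESTONE_LINE.search(verdict)
    if match is None:
        return None  # verdict names no milestone, so nothing is owed
    named = match.group(1)
    assigned = [e.get("milestone") for e in emitted
                if e.get("type") == "assign_milestone"]
    if named not in assigned:
        return f"verdict names milestone {named} but no matching assign_milestone was emitted"
    return None


verdict = "Priority: priority/low\nMilestone: 0.9.4\n"
drift_without_call = milestone_drift(verdict, [])
drift_with_call = milestone_drift(
    verdict, [{"type": "assign_milestone", "issue_number": 955, "milestone": "0.9.4"}]
)
```

Run against the third iteration's output, a check of this shape would have flagged the omission instead of leaving it to a maintainer to notice.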

What the panel produced (final state on #955 and #956)

#955 (docs-quality, priority/low)

area/cli, area/docs-site, type/docs, status/accepted, priority/low,
good first issue, status/triaged
milestone: 0.9.4

Verdict identified the exact two-file scope (src/apm_cli/cli.py + docs/.../compilation/index.md), endorsed the author's proposed canonical phrasing, and left a courteous acknowledgement comment. Six personas weighed in (DevX UX, Supply Chain, OSS Growth, Python Architect, Doc Writer, CEO).

#956 (lockfile portability bug, priority/high)

theme/portability, area/docs-site, area/lockfile, type/bug,
status/accepted, priority/high, status/triaged
milestone: 0.9.4

Panel correctly raised this to theme/portability (cross-cutting label) and priority/high (vs the priority/low for #955), demonstrating the persona routing distinguishes severity correctly. status/needs-triage was consumed as designed.

Cosmetic note

When the activation step gates a run out (e.g., issue created with a non-trigger label), GitHub records the workflow run as failure because the activation script exits non-zero by design. This is gh-aw's standard early-exit pattern and produces a misleading red badge on the Actions tab for legitimate skips. Not a correctness issue; flagging for follow-up if it becomes an annoyance signal in practice.

Token spend

Each panel invocation runs 3-6 personas plus arbitration. Single-issue dispatch / fast-path mode posts one comment, applies labels, sets milestone -- well under the 30-minute timeout. Daily cron sweep handles up to 10 issues per run, capped at add-comment.max: 12 and add-labels.max: 70. Empirically the four iterations on #955 + one iteration on #956 ran without rate-limit issues.
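For a rough sense of scale, the arithmetic behind the sweep ceiling can be written out. The per-run token range is the estimate from the PR description; the 30-day month is a simplification, and fast-path or manual-dispatch runs are not counted.

```python
# Figures taken from this PR; the 30-day month is a simplification.
ISSUES_PER_SWEEP = 10
SWEEPS_PER_MONTH = 30
TOKENS_PER_RUN = (30_000, 50_000)   # rough per-panel estimate from the PR description
ADD_COMMENT_MAX = 12
ADD_LABELS_MAX = 70

# Upper bound on panel runs and tokens from the daily sweep alone.
runs_per_month = ISSUES_PER_SWEEP * SWEEPS_PER_MONTH
tokens_low, tokens_high = (runs_per_month * t for t in TOKENS_PER_RUN)

# Headroom the safe-output caps leave when a sweep is full.
labels_per_issue = ADD_LABELS_MAX // ISSUES_PER_SWEEP
spare_comments = ADD_COMMENT_MAX - ISSUES_PER_SWEEP

print(f"{runs_per_month} sweep runs/month, {tokens_low:,}-{tokens_high:,} tokens")
print(f"{labels_per_issue} labels per issue, {spare_comments} spare comments per sweep")
```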

Residual concerns

  • Scenarios 2/3/4 will only get exercised by the daily cron once status/needs-triage issues exist organically. We will see them in the first sweep.
  • The cosmetic activation-gate failure noted above is the only known UX wart.
  • add-labels allowlist will need to be updated whenever APM's label taxonomy grows. Documented in the workflow frontmatter.

The triage panel is operational on main and ready for organic traffic.


Validation conducted in run 24960282698 (initial bug discovery), 24960673185 (post-fix #1), 24960874890 (post-fix #2), 24961051684 (post-fix #3), and 24961194910 (scenario 5).

@danielmeppiel
Collaborator Author

Triage Panel — architecture reference

Note

This is a self-contained reference for the system landed by this PR. If you maintain microsoft/apm and want issues to get triaged — or want to understand exactly what gets written to your repo and why — start here.

TL;DR

A daily scheduled sweep plus a one-click opt-in fast-path that runs the apm-triage-panel skill against open, untriaged, human-authored issues. The agent posts one synthesized verdict comment per issue and applies the panel-decided theme/*, area/*, type/*, priority/*, plus status/triaged labels and (if the issue has none today) a milestone. Every write is framed as an agentic proposal pending human ratification — maintainer edits are authoritative and never reverted.

Problem (WHY)

  • New issues sat untriaged for days. The review-panel skill existed for PRs; intake had no analogue.
  • A naive issues: [opened, reopened] trigger is exploitable: any OSS user could open/close/reopen an issue indefinitely and burn LLM budget. Cost had to be bounded structurally.
  • Triage is multi-axis (theme + area + type + priority + milestone) and benefits from multiple specialist lenses, mirroring the shape of apm-review-panel. A single-prompt classifier is the wrong tool.
  • gh-aw safe-outputs is the only sanctioned write surface (read-only agent + scoped writes). Direct gh mutations from agent steps would bypass the audit trail.

Approach (WHAT)

| Decision | Choice | Why |
|----------|--------|-----|
| Trigger model | Daily cron: 'daily' sweep + opt-in status/needs-triage label + workflow_dispatch | Bounds cost (~300 runs/month ceiling); preserves a one-click maintainer escape hatch; no exploitable per-issue trigger |
| Panel surface | Reuse apm-triage-panel skill, identical orchestration shape to apm-review-panel | One mental model for both intake and review; 3 mandatory + 3 conditional persona lenses + CEO arbiter |
| Write surface | safe-outputs only (add-comment, add-labels, remove-labels, assign-milestone) | Read-only agent; every write is auditable in run logs; scoped tokens |
| Reversibility | Every label is human-revertable; only status/needs-triage is in the remove-labels allow-list | Humans apply, only humans remove (except the one trigger label the agent consumes) |

Implementation (HOW)

| File | Role |
|------|------|
| .github/workflows/triage-panel.md | Workflow source: trigger model, front-gate, concurrency, safe-outputs config, mode-routing prompt |
| .github/workflows/triage-panel.lock.yml | Compiled artifact (do not hand-edit; regenerated by gh aw compile) |
| .apm/skills/apm-triage-panel/SKILL.md | The triage panel skill: persona roster, routing topology, completeness gate, single-comment template |

Three execution modes share Step 2 (run the panel) and Step 3 (apply decisions); they differ only in Step 1 (which issues to triage):

  • SCHEDULED_SWEEP — pulls up to 200 open issues, drops bots/locked/already-triaged/empty/spam, applies a per-author quota of 2/sweep, takes the oldest 10. Quiet sweep = no comment.
  • OPT_IN_RETRIAGE — single issue, triggered by a maintainer applying status/needs-triage. Treats existing human labels as authoritative; consumes the trigger label on completion.
  • MANUAL_DISPATCH — single issue via workflow_dispatch with optional issue_number. Useful for replay/debug.
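The SCHEDULED_SWEEP selection can be sketched in Python as below. This is a sketch under stated assumptions, not the workflow's prompt logic: the `Issue` dataclass, `select_candidates`, and the decision to apply the per-author quota while walking oldest-first are all hypothetical, the spam check is passed in as a callable, and the 200-issue fetch is not modelled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    number: int
    author: str
    created_at: str          # ISO 8601 in UTC, so string order equals chronological order
    is_bot: bool = False
    locked: bool = False
    labels: tuple = ()
    body: str = ""


def select_candidates(issues, looks_like_spam, per_author=2, batch=10):
    """Drop ineligible issues, then take the oldest `batch` subject to a per-author quota."""
    eligible = [
        i for i in issues
        if not i.is_bot and not i.locked
        and "status/triaged" not in i.labels
        and i.body.strip()
        and not looks_like_spam(i.body)
    ]
    picked, seen = [], Counter()
    for issue in sorted(eligible, key=lambda i: i.created_at):
        if seen[issue.author] >= per_author:
            continue  # author already has their quota in this sweep
        seen[issue.author] += 1
        picked.append(issue)
        if len(picked) == batch:
            break
    return picked


sample = [
    Issue(1, "alice", "2026-04-01T00:00:00Z", body="crash on install"),
    Issue(2, "alice", "2026-04-02T00:00:00Z", body="docs typo"),
    Issue(3, "alice", "2026-04-03T00:00:00Z", body="third report"),      # over quota
    Issue(4, "dependabot", "2026-04-01T12:00:00Z", is_bot=True, body="bump"),
    Issue(5, "bob", "2026-04-04T00:00:00Z", labels=("status/triaged",), body="done"),
    Issue(6, "carol", "2026-04-05T00:00:00Z", body="   "),                 # empty body
    Issue(7, "dave", "2026-04-06T00:00:00Z", body="lockfile is not portable"),
]
picked_numbers = [i.number for i in select_candidates(sample, lambda body: False)]
```

An empty result corresponds to the quiet sweep described above: nothing is picked, so nothing is posted.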

Defensive layers, in order:

  1. Workflow-level front-gate (shell if): rejects non-status/needs-triage label events, bot authors, locked or closed issues — at zero LLM cost.
  2. Concurrency group keyed on issue.number || inputs.issue_number || 'sweep' — prevents two panel runs racing on the same issue.
  3. Body size cap of 64 KB before the body enters reasoning; oversized bodies are flagged in the verdict.
  4. Spam-shape filter (>50 consecutive identical chars / >80% URLs / >70% repeated 3-char substring / <20 alphanumerics): silently drops without applying status/triaged so the issue stays in queue for human review.
  5. BATCH_ALLOW_LIST is computed before any issue body is read, so prompt injection in a body cannot redirect a write to an unrelated issue.
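Layers 3 and 4 can be sketched in Python as follows. The thresholds come from the list above; everything else is an assumption made for illustration: 64 KB is taken as 65,536 UTF-8 bytes, ">80% URLs" is read as the share of whitespace-separated tokens that are URLs, and ">70% repeated 3-char substring" is read as the share of all 3-character windows taken by the most common one. The real rails live in the prompt and may read these differently.

```python
import re
from collections import Counter

MAX_BODY_BYTES = 64 * 1024                 # assumption: KB means 1024 bytes
URL = re.compile(r"https?://\S+")


def exceeds_body_cap(body):
    """True when the body is larger than the cap and should be flagged in the verdict."""
    return len(body.encode("utf-8")) > MAX_BODY_BYTES


def looks_like_spam(body):
    """Heuristic spam-shape check; any single rule is enough to drop the issue."""
    # More than 50 consecutive identical characters (one char plus 50+ repeats).
    if re.search(r"(.)\1{50,}", body, re.DOTALL):
        return True
    # Fewer than 20 alphanumeric characters in total.
    if sum(ch.isalnum() for ch in body) < 20:
        return True
    # More than 80% of tokens are URLs (assumed reading).
    tokens = body.split()
    if tokens and sum(bool(URL.fullmatch(t)) for t in tokens) / len(tokens) > 0.8:
        return True
    # One 3-character window accounts for more than 70% of all windows (assumed reading).
    trigrams = Counter(body[i:i + 3] for i in range(len(body) - 2))
    if trigrams and trigrams.most_common(1)[0][1] / sum(trigrams.values()) > 0.7:
        return True
    return False
```

A body that trips `looks_like_spam` is skipped without status/triaged, so it stays in the queue for a human, as described in layer 4.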

Diagrams

Trigger fan-in into the shared three-step pipeline:

flowchart LR
  A1[issues: labeled] -->|status/needs-triage only| G[Front-gate shell step]
  A2[schedule: daily] --> M{Mode}
  A3[workflow_dispatch] --> M
  G --> M
  M -->|labeled event| R1[OPT_IN_RETRIAGE]
  M -->|cron OR dispatch w/o issue| R2[SCHEDULED_SWEEP]
  M -->|dispatch w/ issue_number| R3[MANUAL_DISPATCH]
  R1 --> S1[Step 1: gather candidates]
  R2 --> S1
  R3 --> S1
  S1 --> S2[Step 2: run apm-triage-panel skill]
  S2 --> S3[Step 3: emit verdict + apply labels/milestone]
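The fan-in above reduces to a small routing function. The sketch below uses the standard GitHub Actions event names; `route_mode` and `concurrency_key` are hypothetical helpers, the front-gate is assumed to have already rejected labeled events for any label other than status/needs-triage, and `concurrency_key` only mirrors the `issue.number || inputs.issue_number || 'sweep'` fallback chain (any prefix the real concurrency group uses is not shown).

```python
def route_mode(event_name, issue_number=None):
    """Map a trigger to one of the three execution modes."""
    if event_name == "issues":
        # Only reached for status/needs-triage label events (front-gate assumed).
        return "OPT_IN_RETRIAGE"
    if event_name == "workflow_dispatch" and issue_number is not None:
        return "MANUAL_DISPATCH"
    if event_name in ("schedule", "workflow_dispatch"):
        return "SCHEDULED_SWEEP"
    raise ValueError(f"unsupported event: {event_name}")


def concurrency_key(event_issue_number=None, input_issue_number=None):
    """First available issue number, else the shared 'sweep' key."""
    return str(event_issue_number or input_issue_number or "sweep")
```

Because every sweep shares the single `'sweep'` key, two sweeps cannot overlap, while runs for different issue numbers proceed independently.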

Per-issue Step 3 emission, showing the safe-output surface and the human-ratification contract:

sequenceDiagram
  autonumber
  participant Agent
  participant SO as safe-outputs
  participant Issue as GitHub issue
  participant Maint as Maintainer
  Agent->>SO: add-comment (verdict + ratification footer)
  SO->>Issue: post comment
  Agent->>SO: add-labels (theme/area/type/priority + status/triaged)
  SO->>Issue: apply labels (allow-list strict-equality)
  Agent->>SO: remove-labels (status/needs-triage if present)
  SO->>Issue: consume trigger label
  Agent->>SO: assign-milestone (only if issue had none)
  SO->>Issue: set milestone
  Note over Issue,Maint: Silence is approval. Any human edit is authoritative and not reverted.
  Maint->>Issue: optional override (edit label / milestone)
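One way to picture the BATCH_ALLOW_LIST guarantee behind these writes is as a filter on emitted operations. The sketch below is illustrative only: the source does not say how the allow-list is enforced, and `enforce_batch_allow_list` and the write-record shape are hypothetical.

```python
def enforce_batch_allow_list(writes, batch_allow_list):
    """Split emitted writes into those targeting an allowed issue and those that do not."""
    allowed = frozenset(batch_allow_list)
    kept = [w for w in writes if w["issue_number"] in allowed]
    rejected = [w for w in writes if w["issue_number"] not in allowed]
    return kept, rejected


# The allow-list is fixed from the candidate numbers before any body is read.
batch_allow_list = [955, 956]

writes = [
    {"type": "add_comment", "issue_number": 955},
    {"type": "add_labels", "issue_number": 956, "labels": ["type/bug"]},
    # A write an injected instruction might try to redirect to an unrelated issue.
    {"type": "add_labels", "issue_number": 12, "labels": ["status/accepted"]},
]
kept, rejected = enforce_batch_allow_list(writes, batch_allow_list)
```

Since the list is computed before any issue body enters reasoning, nothing written in a body can add a new issue number to it.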

Trade-offs

  • Daily latency, not realtime. A new issue waits up to ~24h for the sweep, or one click for the fast-path. Chosen over issues: [opened] because the latter has no cost ceiling and is exploitable. Today's manual baseline was days-to-never, so 24h is a strict improvement.
  • allowed: is strict-equality, not glob. Every legal label is enumerated literally in the workflow (35 entries). Adding a new taxonomy label means a one-line workflow edit + recompile. Surfaced explicitly because it's a footgun: unlisted labels are silently dropped by the safe-output handler.
  • Agent reads, never writes directly. All mutations flow through safe-outputs. Costs one extra hop; gains a fully auditable write trail and scoped tokens.
  • No automatic milestone overwrite. The agent only assigns a milestone when the issue has none. Existing milestones are a maintainer call. Slightly weaker triage on already-milestoned issues; eliminates a whole class of "agent moved my milestone" complaints.
  • Spam-shape detection is heuristic. False negatives reach the panel (cost: one panel run); false positives stay in the queue without status/triaged (cost: a maintainer must apply status/needs-triage to force a run). Tuned conservative on purpose.

Benefits

  1. Bounded LLM spend. Hard ceiling of ~300 runs/month (10/day × 30) regardless of issue volume; no per-issue trigger means no exploit surface.
  2. One-click priority override. Maintainers re-trigger triage on any issue by applying status/needs-triage — instant fast-path, no dispatch UI.
  3. Auditable writes. Every label, milestone, and comment is recorded in workflow run logs with the originating issue number; breach of BATCH_ALLOW_LIST would be visible post-hoc.
  4. Drop-in extensibility. New persona lenses are added by editing the skill (.apm/skills/apm-triage-panel/SKILL.md); no workflow change. New labels are added by extending allowed: in the workflow frontmatter.
  5. Silence-is-approval contract prevents agent-vs-human label thrash. Maintainers stay in charge by default.

Validation

Live-validation evidence is in the prior comment on this PR. Three bugs in safe-outputs configuration were caught and fixed during validation; both the dispatch path (issue #955) and the status/needs-triage fast-path (issue #956) verified end-to-end with correct labels, milestone, and comment in their final states.

Compile check (run locally to verify the lock is in sync)
cd /path/to/microsoft-apm
gh aw compile .github/workflows/triage-panel.md
git diff --exit-code .github/workflows/triage-panel.lock.yml

How to use this — three workflows for maintainers

  1. Default path (do nothing). Open issues are auto-swept once daily, oldest-first, capped at 10. You'll see a verdict comment with a proposed label/milestone set. Override anything you disagree with — the agent will not revert.
  2. Need this triaged now. Apply the status/needs-triage label to any open, human-authored issue. The fast-path fires within minutes, posts the verdict, applies labels, then removes status/needs-triage (consuming the trigger).
  3. Replay / debug a specific issue. Run the workflow manually via the Actions tab → "Triage Panel" → "Run workflow", optionally with issue_number=N. Without an issue number, dispatch runs the full sweep on demand.

To re-enroll an issue into the next daily sweep, remove the status/triaged label.

To extend the taxonomy, add the label in GitHub, then add it to safe-outputs.add-labels.allowed in .github/workflows/triage-panel.md and run gh aw compile. Unlisted labels are silently dropped by the safe-output handler — this is a feature (label allow-listing) not a bug.


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel added a commit that referenced this pull request Apr 27, 2026
* chore(release): cut 0.9.4

CHANGELOG entry for 0.9.4 covers all 7 PRs merged since v0.9.3:

- #974 SKILL_BUNDLE day-0 install parity (Added)
- #954 automate apm-triage-panel workflow (Added)
- #970 python-architect mermaid classDiagram trap (Changed)
- #911 REQUESTS_CA_BUNDLE TLS validation (Fixed)
- #971 triage-panel project-sync dispatch (Fixed)
- #910 CLI consistency cleanup (Fixed)
- #958 issue templates label taxonomy (Fixed)
- #953 docs auto-deploy after bot-cut releases (Fixed)

Open milestone 0.9.4 issues (41) reassigned to 0.9.5.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): tighten 0.9.4 entries (so-what for developers)

Refactor per Keep-a-Changelog spirit: lead with developer impact,
trim agent-internals prose, group maintainer-only changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(changelog): add #660 install.sh air-gapped entry to 0.9.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>