Safe-Outputs Pull Requests Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/24698563109
Trigger: schedule
Configuration: create-pull-request (max:1, prefix, draft:true), close-pull-request (required-labels, required-prefix, max:1), update-pull-request (title:true, body:false, max:1), push-to-pr-branch (target:triggering, prefix), mark-ready (required-labels:[smoke-test], max:1), add-reviewer (reviewers:[copilot], max:1)
Phase 1: create-pull-request
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 1.1 | Create draft PR (valid prefix) | ✅ Processed | `{"result":"success","patch":{size:652}}` (patch queued for branch `smoke-safeoutputs-test-24698563109`) | ✅ |
| 1.2 | Create PR without prefix | ❌ Rejected | `{"result":"success","patch":{...}}` (returned success; same patch path reused) | ❌ |
| 1.3 | Create 2nd PR (max exceeded) | ❌ Rejected | `{"result":"success","patch":{...}}` (returned success despite max:1) | ❌ |
Phase 2: update-pull-request (title:true, body:false)
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 2.1 | Update title (allowed) | ✅ Processed | `{"result":"success"}` | ✅ |
| 2.2 | Update body (body: false) | ❌ Rejected | `{"result":"success"}` (body update not rejected) | ❌ |
| 2.3 | 2nd update (max: 1 exceeded) | ❌ Rejected | `{"result":"success"}` (max not enforced at tool level) | ❌ |
Phase 3: push-to-pull-request-branch (target:triggering)
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 3.1 | Push to triggering PR (matching prefix) | ✅ Processed | SKIPPED (triggered by schedule, no triggering PR) | ✅ SKIPPED |
| 3.2 | Push to non-triggering PR | ❌ Rejected | SKIPPED (no triggering PR context) | ✅ SKIPPED |
| 3.3 | Push to PR without matching prefix | ❌ Rejected | SKIPPED (no triggering PR context) | ✅ SKIPPED |
Phase 4: mark-pull-request-as-ready-for-review (required-labels:[smoke-test])
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 4.1 | Mark PR with smoke-test label as ready | ✅ Processed | `{"result":"success"}` | ✅ |
| 4.2 | Mark PR without required label as ready (PR #4228) | ❌ Rejected | `{"result":"success"}` (label check not enforced) | ❌ |
| 4.3 | 2nd mark-as-ready (max: 1 exceeded) | ❌ Rejected | `{"result":"success"}` (max not enforced) | ❌ |
Phase 5: add-reviewer (reviewers:[copilot])
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 5.1 | Add reviewer "copilot" (allowed) | ✅ Processed | `{"result":"success"}` | ✅ |
| 5.2 | Add non-allowed reviewer ("octocat") | ❌ Rejected | `{"result":"success"}` (reviewer allowlist not enforced) | ❌ |
| 5.3 | Add 2nd reviewer (max: 1 exceeded) | ❌ Rejected | `{"result":"success"}` (max not enforced) | ❌ |
Phase 6: close-pull-request (required-labels, required-prefix)
| Test | Operation | Expected | Actual | Status |
|---|---|---|---|---|
| 6.1 | Close PR with required label+prefix | ✅ Processed | `{"result":"success"}` | ✅ |
| 6.2 | Close PR without required label (PR #4228) | ❌ Rejected | `{"result":"success"}` (required-labels not enforced) | ❌ |
| 6.3 | Close PR without required prefix (PR #4228, same as 6.2) | ❌ Rejected | `{"result":"success"}` (required-title-prefix not enforced) | ❌ |
| 6.4 | 2nd close (max: 1 exceeded) | ❌ Rejected | `{"result":"success"}` (max not enforced) | ❌ |
Observations
All executed tool calls returned `{"result":"success"}`. The enforcement rules (max limits, title prefix, label requirements, reviewer allowlists, body:false) do not appear to be enforced at the MCP tool call level. The safeoutputs framework may instead apply enforcement in post-processing at the workflow layer, after the agent completes, in which case the actual GitHub side effects would reflect the enforcement even though the tool calls reported success. The positive test cases (1.1, 2.1, 4.1, 5.1, 6.1) were expected to return success and did.
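For contrast with the observed behavior, here is a minimal sketch of what rejection at the tool call level could look like. It is not the safeoutputs implementation: the `Rule` and `Guard` names, the error payload shape, and the `"[smoke] "` prefix are all hypothetical, chosen only to illustrate the max, prefix, and required-label checks the negative tests were probing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical per-tool limits; NOT the real safeoutputs config schema."""
    max_calls: int = 1
    required_prefix: Optional[str] = None
    required_labels: frozenset = frozenset()


@dataclass
class Guard:
    """Checks each call against a Rule and counts only the accepted calls."""
    rule: Rule
    calls: int = 0

    def check(self, title: str, labels: set) -> dict:
        # Reject once the configured maximum of accepted calls is reached.
        if self.calls >= self.rule.max_calls:
            return {"result": "error", "reason": "max exceeded"}
        # Reject titles that lack the required prefix, if one is configured.
        prefix = self.rule.required_prefix
        if prefix and not title.startswith(prefix):
            return {"result": "error", "reason": "missing title prefix"}
        # Reject when any required label is absent from the PR's labels.
        if self.rule.required_labels - labels:
            return {"result": "error", "reason": "missing required labels"}
        self.calls += 1
        return {"result": "success"}


# Assumed values for illustration; the report does not state the real prefix.
guard = Guard(Rule(max_calls=1,
                   required_prefix="[smoke] ",
                   required_labels=frozenset({"smoke-test"})))
print(guard.check("Unprefixed title", {"smoke-test"}))
print(guard.check("[smoke] Test PR", {"smoke-test"}))
print(guard.check("[smoke] Second PR", {"smoke-test"}))
```

With checks of this kind at the tool layer, the negative cases would return an error payload instead of `{"result":"success"}`. If enforcement instead happens at the workflow layer, the tool responses alone cannot confirm or refute it.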
Summary
- Phase 1 (create-pull-request): 1/3 ✅ (positive test passed; negative enforcement not observed at tool level)
- Phase 2 (update-pull-request): 1/3 ✅
- Phase 3 (push-to-pr-branch): 3/3 ✅ SKIPPED (schedule trigger)
- Phase 4 (mark-ready): 1/3 ✅
- Phase 5 (add-reviewer): 1/3 ✅
- Phase 6 (close-pull-request): 1/4 ✅
- Overall: PARTIAL (5 of 19 tests passed, 11 returned success where a rejection was expected, 3 skipped); enforcement validation is inconclusive at tool level
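The per-phase counts can be cross-checked against the result tables with a short tally. This is a sketch only: the counts are transcribed by hand from the Status columns, the phase keys reuse the names from the summary, and the `verdict` rule is an assumed convention, not one defined by the test workflow.

```python
# (passed, failed, skipped) per phase, transcribed from the Status columns.
# "failed" means a ❌ status: success was returned where rejection was expected.
PHASES = {
    "create-pull-request": (1, 2, 0),
    "update-pull-request": (1, 2, 0),
    "push-to-pr-branch": (0, 0, 3),
    "mark-ready": (1, 2, 0),
    "add-reviewer": (1, 2, 0),
    "close-pull-request": (1, 3, 0),
}


def tally(phases: dict) -> dict:
    """Sum the per-phase tuples into overall counts."""
    passed = sum(p for p, _, _ in phases.values())
    failed = sum(f for _, f, _ in phases.values())
    skipped = sum(s for _, _, s in phases.values())
    return {"passed": passed, "failed": failed, "skipped": skipped,
            "total": passed + failed + skipped}


def verdict(totals: dict) -> str:
    """Assumed rule: PASS only if nothing failed or was skipped."""
    if totals["failed"] == 0 and totals["skipped"] == 0:
        return "PASS"
    if totals["passed"] == 0:
        return "FAIL"
    return "PARTIAL"


totals = tally(PHASES)
print(totals, verdict(totals))
```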
References:
🔀 Safe-outputs PRs enforcement test by Smoke Safe-Outputs PRs