Safe-Outputs Pull Requests Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/23723457485
Trigger: schedule
Configuration: create-pull-request (max:1, prefix, draft:true), close-pull-request (required-labels, required-prefix, max:1), update-pull-request (title:true, body:false, max:1), push-to-pr-branch (target:triggering, prefix), mark-ready (required-labels:[smoke-test], max:1), add-reviewer (reviewers:[copilot], max:1)
Note on enforcement behavior: All safe-outputs tool calls returned {"result":"success"}. Enforcement operates at a post-processing layer rather than at tool-call time. The created test PR will be visible once patches are processed. Negative test outcomes below reflect that the tool API did not explicitly reject the calls at invocation time.
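The split between call-time acknowledgement and later enforcement can be illustrated with a minimal sketch. This is not the gh-aw implementation: `Rule`, `record_call`, `post_process`, and the `[smoke] ` prefix are hypothetical names and values chosen for illustration, and only the `{"result": "success"}` response shape comes from this run.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    """Hypothetical per-tool limits; field names are illustrative only."""
    max_count: int = 1
    title_prefix: Optional[str] = None


@dataclass
class Outcome:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (item, reason) pairs


def record_call(log: list, item: dict) -> dict:
    """Call-time layer: store the request and always report success."""
    log.append(item)
    return {"result": "success"}


def post_process(log: list, rule: Rule) -> Outcome:
    """Post-processing layer: apply the rule to every recorded request."""
    outcome = Outcome()
    for item in log:
        if rule.title_prefix and not item["title"].startswith(rule.title_prefix):
            outcome.rejected.append((item, "missing prefix"))
        elif len(outcome.accepted) >= rule.max_count:
            outcome.rejected.append((item, "max exceeded"))
        else:
            outcome.accepted.append(item)
    return outcome


# Three calls shaped like tests 1.1, 1.2 and 1.3 (the prefix value is assumed).
log: list = []
responses = [
    record_call(log, {"title": "[smoke] valid draft PR"}),
    record_call(log, {"title": "PR without prefix"}),
    record_call(log, {"title": "[smoke] second PR"}),
]
outcome = post_process(log, Rule(max_count=1, title_prefix="[smoke] "))
```

Under these assumptions every call reports success to the caller, while `post_process` keeps only the first request. A test that inspects only the call responses therefore cannot tell whether the negative cases were rejected.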
Phase 1: create-pull-request
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 1.1 | Create draft PR (valid prefix) | ✅ Processed | Tool returned result:success, patch prepared for branch smoke-safeoutputs-test-23723457485 | ✅ |
| 1.2 | Create PR without prefix | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 1.3 | Create 2nd PR (max exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 2: update-pull-request (title:true, body:false)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 2.1 | Update title (allowed) | ✅ Processed | Tool returned result:success | ✅ |
| 2.2 | Update body (body: false) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 2.3 | 2nd update (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 3: push-to-pull-request-branch (target:triggering)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 3.1 | Push to triggering PR (matching prefix) | ✅ Processed | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.2 | Push to non-triggering PR | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.3 | Push to PR without matching prefix | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
Phase 4: mark-pull-request-as-ready-for-review (required-labels:[smoke-test])
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 4.1 | Mark PR with smoke-test label as ready | ✅ Processed | Tool returned result:success (targeted test PR from 1.1) | ✅ |
| 4.2 | Mark PR without required label (PR #2806) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 4.3 | 2nd mark-as-ready (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 5: add-reviewer (reviewers:[copilot])
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 5.1 | Add reviewer "copilot" (allowed) | ✅ Processed | Tool returned result:success | ✅ |
| 5.2 | Add non-allowed reviewer ("octocat") | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 5.3 | Add 2nd reviewer (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 6: close-pull-request (required-labels, required-prefix)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 6.1 | Close PR with required label+prefix | ✅ Processed | Tool returned result:success (targeted test PR from 1.1) | ✅ |
| 6.2 | Close PR without required label (PR #2806) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 6.3 | Close PR without required prefix | ❌ Rejected | SKIPPED - no suitable non-prefixed PR found besides PR #2806 (used in 6.2) | ✅ SKIPPED |
| 6.4 | 2nd close (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Summary
- Phase 1 (create-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 2 (update-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 3 (push-to-pr-branch): 3/3 ✅ SKIPPED (schedule trigger — no triggering PR)
- Phase 4 (mark-ready): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 5 (add-reviewer): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 6 (close-pull-request): 1/4 ✅, 1/4 ✅ SKIPPED (positive case passed; 6.3 skipped for lack of a suitable PR; negative enforcement not observed at tool-call level)
- Overall: FAIL. Safe-outputs enforcement did not reject negative test cases at the tool-call API level. All calls returned result:success. Enforcement operates at a post-processing layer not observable via tool API responses. This matches behavior observed in prior runs §23709127182 and §23698195525.
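The per-phase counts above can be re-derived from the Status column of each table. The sketch below is only a cross-check of that arithmetic: the `pass`/`fail`/`skip` encoding and the names `RESULTS`, `tally`, and `overall` are assumptions made for illustration and are not part of the test workflow.

```python
from collections import Counter

# Status column of each phase table, encoded as pass / fail / skip.
RESULTS = {
    "create-pull-request": ["pass", "fail", "fail"],
    "update-pull-request": ["pass", "fail", "fail"],
    "push-to-pr-branch": ["skip", "skip", "skip"],
    "mark-ready": ["pass", "fail", "fail"],
    "add-reviewer": ["pass", "fail", "fail"],
    "close-pull-request": ["pass", "fail", "skip", "fail"],
}


def tally(results: dict) -> dict:
    """Count pass, fail and skip outcomes for each phase."""
    return {phase: Counter(statuses) for phase, statuses in results.items()}


def overall(results: dict) -> str:
    """Report FAIL if any test in any phase failed, otherwise PASS."""
    failed = any("fail" in statuses for statuses in results.values())
    return "FAIL" if failed else "PASS"


per_phase = tally(RESULTS)
totals = sum(per_phase.values(), Counter())
verdict = overall(RESULTS)
```

With the statuses recorded in this run, the totals come to 5 passed, 10 failed, and 4 skipped across 19 tests, which is consistent with the overall FAIL verdict.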
References:
🔀 Safe-outputs PRs enforcement test by Smoke Safe-Outputs PRs