Safe-Outputs Pull Requests Enforcement Test Results
Run: https://github.com/github/gh-aw-mcpg/actions/runs/23709127182
Trigger: schedule
Configuration: create-pull-request (max:1, prefix, draft:true), close-pull-request (required-labels, required-prefix, max:1), update-pull-request (title:true, body:false, max:1), push-to-pr-branch (target:triggering, prefix), mark-ready (required-labels:[smoke-test], max:1), add-reviewer (reviewers:[copilot], max:1)
Note on enforcement behavior: All safe-outputs tool calls returned {"result":"success"}. Enforcement operates at a post-processing layer rather than at tool-call time. The created test PR will be visible once patches are processed. Negative test outcomes below reflect that the tool API did not explicitly reject the calls at invocation time.
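The split between call-time success and later enforcement can be pictured with a minimal sketch. This is not the actual safe-outputs implementation: the `Request` shape, the `call_tool` and `post_process` helpers, and the `[smoke] ` prefix value are all hypothetical and only illustrate why a negative case can still return success at invocation time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One recorded tool call (hypothetical shape, not the real safe-outputs schema)."""
    kind: str
    title: str


def call_tool(log: list, request: Request) -> dict:
    # Call time: the request is only recorded, so every call reports success.
    log.append(request)
    return {"result": "success"}


def post_process(log: list, kind: str, max_count: int, title_prefix: str) -> list:
    """Apply the limits afterwards; returns (request, accepted, reason) tuples."""
    decisions = []
    accepted = 0
    for request in log:
        if request.kind != kind:
            continue
        if not request.title.startswith(title_prefix):
            decisions.append((request, False, "missing prefix"))
        elif accepted >= max_count:
            decisions.append((request, False, "max exceeded"))
        else:
            accepted += 1
            decisions.append((request, True, "ok"))
    return decisions


# Three calls shaped like tests 1.1-1.3; the "[smoke] " prefix is an assumed value.
log = []
titles = ["[smoke] first", "no prefix", "[smoke] second"]
responses = [call_tool(log, Request("create-pull-request", t)) for t in titles]
decisions = post_process(log, "create-pull-request", max_count=1, title_prefix="[smoke] ")

for request, ok, reason in decisions:
    print(request.title, "->", "accepted" if ok else "rejected", f"({reason})")
```

Under this model, a test that inspects only `responses` sees three successes, while the rejections appear only in `decisions`.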
Phase 1: create-pull-request
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 1.1 | Create draft PR (valid prefix) | ✅ Processed | Tool returned result:success, patch prepared for branch smoke-safeoutputs-test-23709127182 | ✅ |
| 1.2 | Create PR without prefix | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 1.3 | Create 2nd PR (max exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 2: update-pull-request (title:true, body:false)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 2.1 | Update title (allowed) | ✅ Processed | Tool returned result:success (used PR #2761 as stand-in; test PR not yet visible in API) | ✅ |
| 2.2 | Update body (body: false) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 2.3 | 2nd update (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 3: push-to-pull-request-branch (target:triggering)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 3.1 | Push to triggering PR (matching prefix) | ✅ Processed | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.2 | Push to non-triggering PR | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.3 | Push to PR without matching prefix | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
Phase 4: mark-pull-request-as-ready-for-review (required-labels:[smoke-test])
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 4.1 | Mark PR with smoke-test label as ready | ✅ Processed | Tool returned result:success (targeted test PR via post-processing) | ✅ |
| 4.2 | Mark PR without required label (PR #2761) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 4.3 | 2nd mark-as-ready (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 5: add-reviewer (reviewers:[copilot])
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 5.1 | Add reviewer "copilot" (allowed) | ✅ Processed | Tool returned result:success | ✅ |
| 5.2 | Add non-allowed reviewer ("octocat") | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 5.3 | Add 2nd reviewer (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Phase 6: close-pull-request (required-labels, required-prefix)
| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 6.1 | Close PR with required label+prefix | ✅ Processed | Tool returned result:success (targeted test PR via post-processing) | ✅ |
| 6.2 | Close PR without required label (PR #2761) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 6.3 | Close PR without required prefix (PR #2750) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
| 6.4 | 2nd close (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | ❌ |
Summary
- Phase 1 (create-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 2 (update-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 3 (push-to-pr-branch): 3/3 ✅ SKIPPED (schedule trigger — no triggering PR)
- Phase 4 (mark-ready): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 5 (add-reviewer): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Phase 6 (close-pull-request): 1/4 ✅ (positive case passed; negative enforcement not observed at tool-call level)
- Overall: FAIL. Safe-outputs enforcement did not reject negative test cases at the tool-call API level. All calls returned result:success. Enforcement operates at a post-processing layer not observable via tool API responses. This matches behavior observed in prior run §23698195525.
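The per-phase counts in the summary follow directly from the Status columns. As a rough sketch, assuming a skipped test counts as passing (as the Phase 3 line does) and that any ❌ makes the overall result FAIL, the tally can be reproduced like this. The `statuses` dictionary is transcribed by hand from the tables; the `tally` helper is hypothetical.

```python
# Status cells per phase, transcribed from the result tables.
statuses = {
    "Phase 1 (create-pull-request)": ["✅", "❌", "❌"],
    "Phase 2 (update-pull-request)": ["✅", "❌", "❌"],
    "Phase 3 (push-to-pr-branch)": ["✅ SKIPPED", "✅ SKIPPED", "✅ SKIPPED"],
    "Phase 4 (mark-ready)": ["✅", "❌", "❌"],
    "Phase 5 (add-reviewer)": ["✅", "❌", "❌"],
    "Phase 6 (close-pull-request)": ["✅", "❌", "❌", "❌"],
}


def tally(cells):
    """Return (passed, total); a cell passes if it starts with a check mark."""
    passed = sum(1 for cell in cells if cell.startswith("✅"))
    return passed, len(cells)


summary = {phase: tally(cells) for phase, cells in statuses.items()}

# Assumption: the run passes only if every test in every phase passed.
overall = "PASS" if all(p == t for p, t in summary.values()) else "FAIL"

for phase, (passed, total) in summary.items():
    print(f"{phase}: {passed}/{total}")
print("Overall:", overall)
```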
References:
🔀 Safe-outputs PRs enforcement test by Smoke Safe-Outputs PRs