[smoke-safeoutputs] Smoke Safe-Outputs PRs: 23709127182 #2766

@github-actions

Safe-Outputs Pull Requests Enforcement Test Results

Run: https://github.com/github/gh-aw-mcpg/actions/runs/23709127182
Trigger: schedule
Configuration: create-pull-request (max:1, prefix, draft:true), close-pull-request (required-labels, required-prefix, max:1), update-pull-request (title:true, body:false, max:1), push-to-pr-branch (target:triggering, prefix), mark-ready (required-labels:[smoke-test], max:1), add-reviewer (reviewers:[copilot], max:1)

Note on enforcement behavior: All safe-outputs tool calls returned {"result":"success"}. Enforcement operates at a post-processing layer rather than at tool-call time. The created test PR will be visible once patches are processed. Negative test outcomes below reflect that the tool API did not explicitly reject the calls at invocation time.
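The split between call time and post-processing described in the note can be illustrated with a toy model. This is only a sketch under stated assumptions, not the gh-aw implementation: `Policy`, `SafeOutputQueue`, `call_tool`, `post_process`, and the `[test] ` prefix are hypothetical names chosen for illustration. It shows why a negative test sees `{"result":"success"}` even when the request is later dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical stand-in for one safe-output rule (e.g. max and prefix)."""
    max_count: int = 1
    required_prefix: str = ""


@dataclass
class SafeOutputQueue:
    """Toy model: requests are accepted at call time and checked later."""
    policy: Policy
    pending: list = field(default_factory=list)

    def call_tool(self, title: str) -> dict:
        # Call time: the request is only recorded, so nothing is rejected here.
        self.pending.append(title)
        return {"result": "success"}

    def post_process(self) -> list:
        # Post-processing: the policy is applied to the recorded requests.
        outcomes, applied = [], 0
        for title in self.pending:
            if not title.startswith(self.policy.required_prefix):
                outcomes.append((title, "rejected: missing prefix"))
            elif applied >= self.policy.max_count:
                outcomes.append((title, "rejected: max exceeded"))
            else:
                applied += 1
                outcomes.append((title, "processed"))
        return outcomes


# Hypothetical prefix; the report does not state the configured value.
queue = SafeOutputQueue(Policy(max_count=1, required_prefix="[test] "))
titles = ("[test] valid", "no prefix", "[test] second")
responses = [queue.call_tool(t) for t in titles]
outcomes = queue.post_process()
```

In this model every entry in `responses` is a success, while `outcomes` carries the actual enforcement decisions, so a test that inspects only the tool response cannot observe a rejection.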

Phase 1: create-pull-request

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 1.1 | Create draft PR (valid prefix) | ✅ Processed | Tool returned result:success, patch prepared for branch smoke-safeoutputs-test-23709127182 | |
| 1.2 | Create PR without prefix | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 1.3 | Create 2nd PR (max exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |

Phase 2: update-pull-request (title:true, body:false)

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 2.1 | Update title (allowed) | ✅ Processed | Tool returned result:success (used PR #2761 as stand-in; test PR not yet visible in API) | |
| 2.2 | Update body (body: false) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 2.3 | 2nd update (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |

Phase 3: push-to-pull-request-branch (target:triggering)

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 3.1 | Push to triggering PR (matching prefix) | ✅ Processed | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.2 | Push to non-triggering PR | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |
| 3.3 | Push to PR without matching prefix | ❌ Rejected | SKIPPED - no triggering PR (schedule run) | ✅ SKIPPED |

Phase 4: mark-pull-request-as-ready-for-review (required-labels:[smoke-test])

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 4.1 | Mark PR with smoke-test label as ready | ✅ Processed | Tool returned result:success (targeted test PR via post-processing) | |
| 4.2 | Mark PR without required label (PR #2761) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 4.3 | 2nd mark-as-ready (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |

Phase 5: add-reviewer (reviewers:[copilot])

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 5.1 | Add reviewer "copilot" (allowed) | ✅ Processed | Tool returned result:success | |
| 5.2 | Add non-allowed reviewer ("octocat") | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 5.3 | Add 2nd reviewer (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |

Phase 6: close-pull-request (required-labels, required-prefix)

| Test | Operation | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| 6.1 | Close PR with required label+prefix | ✅ Processed | Tool returned result:success (targeted test PR via post-processing) | |
| 6.2 | Close PR without required label (PR #2761) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 6.3 | Close PR without required prefix (PR #2750) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |
| 6.4 | 2nd close (max: 1 exceeded) | ❌ Rejected | Tool returned result:success (not rejected at call time) | |

Summary

  • Phase 1 (create-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
  • Phase 2 (update-pull-request): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
  • Phase 3 (push-to-pr-branch): 3/3 ✅ SKIPPED (schedule trigger — no triggering PR)
  • Phase 4 (mark-ready): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
  • Phase 5 (add-reviewer): 1/3 ✅ (positive case passed; negative enforcement not observed at tool-call level)
  • Phase 6 (close-pull-request): 1/4 ✅ (positive case passed; negative enforcement not observed at tool-call level)
  • Overall: FAIL. No negative test case was rejected at the tool-call API level; every call returned result:success. Enforcement operates at a post-processing layer that is not observable via tool API responses, which matches the behavior observed in prior run §23698195525.
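
The per-phase counts in this summary can be reproduced from the phase tables with a small tally. This is a minimal sketch, assuming a test passes when the expected outcome matches what was observed at call time and that skipped tests are counted separately; `ROWS`, `tally`, and `overall` are hypothetical names, and the rows are transcribed from the tables in this report.

```python
from collections import defaultdict

# (test id, expected outcome, outcome observed at call time)
ROWS = [
    ("1.1", "processed", "success"), ("1.2", "rejected", "success"),
    ("1.3", "rejected", "success"),
    ("2.1", "processed", "success"), ("2.2", "rejected", "success"),
    ("2.3", "rejected", "success"),
    ("3.1", "processed", "skipped"), ("3.2", "rejected", "skipped"),
    ("3.3", "rejected", "skipped"),
    ("4.1", "processed", "success"), ("4.2", "rejected", "success"),
    ("4.3", "rejected", "success"),
    ("5.1", "processed", "success"), ("5.2", "rejected", "success"),
    ("5.3", "rejected", "success"),
    ("6.1", "processed", "success"), ("6.2", "rejected", "success"),
    ("6.3", "rejected", "success"), ("6.4", "rejected", "success"),
]


def tally(rows):
    """Count passed, skipped, and total tests per phase."""
    counts = defaultdict(lambda: {"pass": 0, "skipped": 0, "total": 0})
    for test_id, expected, observed in rows:
        phase = test_id.split(".")[0]
        counts[phase]["total"] += 1
        if observed == "skipped":
            counts[phase]["skipped"] += 1
        elif (expected == "processed") == (observed == "success"):
            # Passes when a positive case succeeds or a negative case is rejected.
            counts[phase]["pass"] += 1
    return dict(counts)


summary = tally(ROWS)
# The run fails if any phase has a test that neither passed nor was skipped.
overall = "PASS" if all(
    c["pass"] + c["skipped"] == c["total"] for c in summary.values()
) else "FAIL"

for phase, c in sorted(summary.items()):
    print(f"Phase {phase}: {c['pass']}/{c['total']} passed, {c['skipped']} skipped")
print("Overall:", overall)
```

Because the tally only sees call-time responses, it shares the report's limitation: a negative case that is rejected later in post-processing still counts as a failure here.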

🔀 Safe-outputs PRs enforcement test by Smoke Safe-Outputs PRs

  • expires on Mar 29, 2026, 2:40 PM UTC
