
[no-ci] Use collaborator permission API for restricted-paths-guard trust check#1930

Open
rwgk wants to merge 11 commits into NVIDIA:main from rwgk:restricted-paths-guard_addl_collaborator_check

rwgk (Collaborator) commented Apr 16, 2026

The author_association field from the webhook event payload is unreliable for PRs originating from forks. Even when the author is an NVIDIA org member and an explicit collaborator with maintain permissions, fork PRs can receive CONTRIBUTOR instead of the expected MEMBER or COLLABORATOR (see comment below for investigation details).

This PR replaces the event payload's author_association with a direct query to the collaborator permission API (GET /repos/{owner}/{repo}/collaborators/{username}/permission), which returns the author's actual permission level regardless of whether the PR comes from a fork or a branch in the main repo.

Changes:

  • Query collaborator permission API to get authoritative permission level
  • Trust authors with admin, maintain, or write permissions
  • Treat 404 from the collaborator permission API as the expected non-collaborator case
  • Fail the workflow on other collaborator permission API errors so the guard does not apply labels based on an unknown result
  • Add contents: write permission (required for the collaborator API endpoint)
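A minimal sketch of the decision rules in the list above, written as a standalone Bash function. The name `classify_author` and its exit-code convention are hypothetical, not the workflow's actual code. The function takes the HTTP status and the `permission` value already extracted from the API response, so it runs without network access.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the workflow's actual code.
# Maps the collaborator permission API result to a guard decision:
#   prints "trusted"    -> no restricted-paths review label needed
#   prints "untrusted"  -> add the review label
#   returns 1           -> unknown result: fail the workflow, apply no label
classify_author() {
  local http_status=$1
  local permission=${2:-none}
  case "$http_status" in
    200)
      case "$permission" in
        admin|maintain|write) echo "trusted" ;;
        *) echo "untrusted" ;;  # triage, read, none
      esac
      ;;
    404)
      # Expected: the author is not a collaborator.
      echo "untrusted"
      ;;
    *)
      # Any other status is an unknown result; do not guess a label.
      echo "::error::Collaborator permission API returned HTTP $http_status" >&2
      return 1
      ;;
  esac
}

classify_author 200 maintain   # prints "trusted"
classify_author 404            # prints "untrusted"
```

According to GitHub's REST API documentation, the legacy `permission` field reports the maintain role as `write` (and triage as `read`), so matching `maintain` is harmless but redundant.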

Testing:

  • Temporarily switched trigger to pull_request to validate this branch's workflow
  • Confirmed the API returns the correct permission level for fork PRs

rwgk added 2 commits April 16, 2026 12:32
…iation

The webhook event payload's author_association field is unreliable for PRs
originating from forks: even if the author is an org member or explicit
collaborator with maintain/write permissions, fork PRs receive CONTRIBUTOR.

This change queries the collaborator permission API directly to get the
author's actual permission level (admin/maintain/write/triage/read/none),
which is authoritative regardless of whether the PR comes from a fork or
a branch in the main repo.

Requires contents:write permission to access the collaborator API endpoint.

Made-with: Cursor
This commit is for testing the collaborator permission check and must be
reverted before merge:

1. Changes trigger from pull_request_target to pull_request so this
   branch's workflow definition runs instead of main's.

2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the
   restricted-paths detection.

REVERT THIS COMMIT BEFORE MERGE.

Made-with: Cursor
rwgk added this to the cuda.bindings next milestone Apr 16, 2026
rwgk self-assigned this Apr 16, 2026
rwgk added the bug (Something isn't working), P0 (High priority - Must do!), and CI/CD (CI/CD infrastructure) labels Apr 16, 2026
copy-pr-bot (Contributor, bot) commented Apr 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk changed the title WIP: restricted-paths-guard_addl_collaborator_check [no-ci] WIP: restricted-paths-guard_addl_collaborator_check Apr 16, 2026
rwgk changed the title [no-ci] WIP: restricted-paths-guard_addl_collaborator_check [no-ci] Use collaborator permission API for restricted-paths-guard trust check Apr 16, 2026
rwgk (Collaborator, Author) commented Apr 16, 2026

Investigation: author_association inconsistency in webhook payloads

While investigating why mdboom's and cpcloud's PRs showed CONTRIBUTOR instead of the expected COLLABORATOR or MEMBER, I found that the author_association field in webhook event payloads is inconsistent even within a single day (April 16):

| Author | NVIDIA Org Member? | Collaborator Role | Webhook Payload | Current API |
|---|---|---|---|---|
| rwgk | Yes | maintain | COLLABORATOR | MEMBER |
| leofang | Yes | admin | MEMBER | MEMBER |
| mdboom | Yes | maintain | CONTRIBUTOR | MEMBER |
| cpcloud | Yes | maintain | CONTRIBUTOR | MEMBER |

What "Current API" means: Querying GET /repos/NVIDIA/cuda-python/pulls/{number} right now returns author_association: MEMBER for all four users. This is the live, authoritative value. However, the webhook event payload (github.event.pull_request.author_association) that triggered the workflow contained different values.

Conclusion: This doesn't appear to be a clean "behavior changed on date X" situation. The inconsistency within the same timeframe suggests either a GitHub-side caching issue, a race condition, or some subtle factor related to how/when each user was added as both an org member and a collaborator. The exact root cause is unclear without access to GitHub's internal state or audit logs.

The fix in this PR sidesteps the unreliable webhook payload entirely by querying the collaborator permission API directly (GET /repos/{owner}/{repo}/collaborators/{username}/permission), which returns the live, authoritative permission level.
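To make the table concrete, here is a hypothetical Bash comparison of the old author_association check and the new permission check, fed with the four rows above. It assumes that the permission check sees each author's collaborator role as listed in the table; `old_check` and `new_check` are illustrative names, not functions from the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical comparison of the old and new trust checks.

old_check() {  # decision based on the webhook's author_association
  case "$1" in
    COLLABORATOR|MEMBER|OWNER) echo "trusted" ;;
    *) echo "untrusted" ;;
  esac
}

new_check() {  # decision based on the collaborator permission level
  case "$1" in
    admin|maintain|write) echo "trusted" ;;
    *) echo "untrusted" ;;
  esac
}

# Columns: author, webhook author_association, collaborator role (from the table)
while read -r author assoc role; do
  printf '%-8s old=%-9s new=%s\n' "$author" "$(old_check "$assoc")" "$(new_check "$role")"
done <<'EOF'
rwgk COLLABORATOR maintain
leofang MEMBER admin
mdboom CONTRIBUTOR maintain
cpcloud CONTRIBUTOR maintain
EOF
```

mdboom and cpcloud come out untrusted under the old check and trusted under the new one, which is exactly the mismatch the table shows.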

rwgk (Collaborator, Author) commented Apr 16, 2026

Screenshot from testing with the temporary commit b814323:

[Screenshot 2026-04-16 at 12 58 43]

rwgk marked this pull request as ready for review April 16, 2026 20:00
rwgk requested a review from leofang April 16, 2026 20:03

-case "$AUTHOR_ASSOCIATION" in
-  COLLABORATOR|MEMBER|OWNER)
+case "$COLLABORATOR_PERMISSION" in
+  admin|maintain|write)

Review comment (Contributor):

I don't see any handling of the none case you're creating above.

Reply (Collaborator, Author):

It's explicit now: commit 45b8357

Reply (Contributor):

I meant: does it need to fail there? Or is none acceptable?

rwgk added 3 commits April 17, 2026 07:18
Address review feedback: explicitly handle the fallthrough case in the
permission check to make it clear that triage, read, none, and API errors
are not trusted signals.

Made-with: Cursor
This commit is for testing the collaborator permission check and must be
reverted before merge:

1. Changes trigger from pull_request_target to pull_request so this
   branch's workflow definition runs instead of main's.

2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the
   restricted-paths detection.

REVERT THIS COMMIT BEFORE MERGE.

Made-with: Cursor
rwgk (Collaborator, Author) commented Apr 17, 2026

Retested with pull_request trigger:

https://github.com/NVIDIA/cuda-python/actions/runs/24570019045?pr=1930

rwgk (Collaborator, Author) commented Apr 17, 2026

Copy-pasting a couple of very short prompts and agent responses. What would you do?

I think we should fail the workflow if we detect a response other than 200|404, no retry. I'm assuming it'll be rare, not worth the retry logic, but better than assigning the label just because there is an error in the API call.

 > I meant: does it need to fail there? Or is none acceptable?

  What's your take?


  My take: none is acceptable and shouldn't cause the workflow to fail.

  Reasoning:

  1. none has two meanings: Either the API call failed, or the user genuinely has no collaborator relationship (e.g., first-time external contributor).

  2. Both cases should require review: The workflow's purpose is to flag PRs from untrusted authors. If we can't confirm trust, requiring review is the correct conservative behavior.

  3. Failing would be too aggressive: External contributors will legitimately have none permission. We want their PRs flagged for review, not blocked by a workflow failure.

  4. The review label is the gate: The Needs-Restricted-Paths-Review label (and the separate review-gate workflow) is what blocks merge, not this workflow's success/failure.

  The current design correctly treats "unknown/none permission" the same as "known untrusted" - both result in adding the review label, which is the intended behavior.

  Want me to draft a reply to the reviewer explaining this?


  > 1. none has two meanings: Either the API call failed, or

  Is that something we could distinguish somehow?


  Yes, we can distinguish them. The API returns:
  • 200 with permission level → user is a collaborator
  • 404 → user is not a collaborator (legitimate external contributor)
  • 403/5xx/network error → actual failure

  We could capture the HTTP status and handle them differently:

  # gh api exits non-zero on HTTP errors; capture stdout and stderr together
  if COLLAB_RESPONSE=$(gh api "repos/$REPO/collaborators/$PR_AUTHOR/permission" 2>&1); then
    COLLABORATOR_PERMISSION=$(jq -r '.permission' <<<"$COLLAB_RESPONSE")
  elif [[ "$COLLAB_RESPONSE" == *"HTTP 404"* ]]; then
    COLLABORATOR_PERMISSION="none"  # Expected for external contributors
  else
    echo "::warning::Collaborator permission API failed: $COLLAB_RESPONSE"
    COLLABORATOR_PERMISSION="unknown"
  fi

  Options for handling actual API failures:
  1. Warn but continue (require review) - fail-safe, doesn't block PRs
  2. Fail the workflow - surfaces the issue but blocks PRs
  3. Retry once - handles transient failures

  My recommendation: warn but continue. This surfaces the issue in the workflow log while still applying the conservative "require
  review" behavior. The reviewer would see the warning if they check the workflow run.

  Want me to implement this distinction?

rwgk added 2 commits April 18, 2026 09:38
Treat 404 responses from the collaborator permission API as the expected
non-collaborator case, but fail the workflow for any other API error so
restricted-paths review labels are not added based on an unknown result.

Made-with: Cursor
rwgk (Collaborator, Author) commented Apr 18, 2026

> I think we should fail the workflow if we detect a response other than 200|404, no retry. I'm assuming it'll be rare, not worth the retry logic, but better than assigning the label just because there is an error in the API call.

@cpcloud I went ahead and did that: commit 500cb29

I also asked what it would take to add retries with exponential backoff. The sketch that came out is a lot of extra code (see separate comment). I still think API call failures will be very rare, and that it's better not to add that much code complexity.

rwgk (Collaborator, Author) commented Apr 18, 2026

Cursor response when asked about exponential backoff capped by a 60-second deadline


I can sketch this directly from the current workflow logic. The main change would be to wrap the collaborator-permission lookup in a small shell function that retries only for non-404 failures, with exponential backoff capped by a 60-second deadline.

Yes. If we wanted that behavior in .github/workflows/restricted-paths-guard.yml, I'd probably make just this one lookup more explicit and use curl instead of gh api, because curl makes 200 vs 404 vs transport/5xx errors much easier to distinguish cleanly.

It would look roughly like this:

COLLABORATOR_PERMISSION="not checked"
COLLABORATOR_PERMISSION_API_ERROR=""
COLLABORATOR_PERMISSION_ATTEMPTS=0

write_collaborator_permission_api_error() {
  echo "- **Collaborator permission API error**:"
  echo '```text'
  printf '%s\n' "$COLLABORATOR_PERMISSION_API_ERROR"
  echo '```'
}

fetch_collaborator_permission() {
  local deadline_epoch now_epoch remaining_seconds
  local response_file error_file http_status
  local sleep_seconds=1
  local attempt=0

  response_file=$(mktemp)
  error_file=$(mktemp)
  deadline_epoch=$(( $(date +%s) + 60 ))

  while :; do
    attempt=$((attempt + 1))
    COLLABORATOR_PERMISSION_ATTEMPTS="$attempt"
    : >"$error_file"
    : >"$response_file"  # drop any body left over from a previous attempt

    if http_status=$(
      curl -sS \
        -o "$response_file" \
        -w '%{http_code}' \
        -H "Authorization: Bearer $GH_TOKEN" \
        -H "Accept: application/vnd.github+json" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        "https://api.github.com/repos/$REPO/collaborators/$PR_AUTHOR/permission" \
        2>"$error_file"
    ); then
      case "$http_status" in
        200)
          COLLABORATOR_PERMISSION=$(jq -r '.permission' <"$response_file")
          rm -f "$response_file" "$error_file"
          return 0
          ;;
        404)
          # Expected terminal result: author is not a collaborator.
          COLLABORATOR_PERMISSION="none"
          rm -f "$response_file" "$error_file"
          return 0
          ;;
      esac
    else
      http_status="curl-error"
    fi

    COLLABORATOR_PERMISSION="unknown"
    COLLABORATOR_PERMISSION_API_ERROR=$(
      {
        printf 'Attempt: %s\n' "$attempt"
        printf 'HTTP status: %s\n' "$http_status"
        if [ -s "$error_file" ]; then
          printf '\nTransport error:\n'
          sed -n '1,20p' "$error_file"
        fi
        if [ -s "$response_file" ]; then
          printf '\nResponse body:\n'
          sed -n '1,40p' "$response_file"
        fi
      }
    )

    now_epoch=$(date +%s)
    remaining_seconds=$((deadline_epoch - now_epoch))
    if [ "$remaining_seconds" -le 0 ]; then
      rm -f "$response_file" "$error_file"
      return 1
    fi

    if [ "$sleep_seconds" -gt "$remaining_seconds" ]; then
      sleep_seconds="$remaining_seconds"
    fi
    sleep "$sleep_seconds"

    if [ "$sleep_seconds" -lt 16 ]; then
      sleep_seconds=$((sleep_seconds * 2))
      if [ "$sleep_seconds" -gt 16 ]; then
        sleep_seconds=16
      fi
    fi
  done
}

And then the call site becomes:

if [ "$TOUCHES_RESTRICTED_PATHS" = "true" ]; then
  if ! fetch_collaborator_permission; then
    echo "::error::Failed to inspect collaborator permission for $PR_AUTHOR after $COLLABORATOR_PERMISSION_ATTEMPTS attempt(s)."
    {
      echo "## Restricted Paths Guard Failed"
      echo ""
      echo "- **Error**: Failed to inspect collaborator permission before the 60 second deadline."
      echo "- **Author**: $PR_AUTHOR"
      echo "- **Collaborator permission**: $COLLABORATOR_PERMISSION"
      echo "- **Collaborator permission lookup attempts**: $COLLABORATOR_PERMISSION_ATTEMPTS"
      echo ""
      write_matching_restricted_paths
      echo ""
      write_collaborator_permission_api_error
      echo ""
      echo "Please retry this workflow. If the failure persists, inspect the collaborator permission API error above."
    } >> "$GITHUB_STEP_SUMMARY"
    exit 1
  fi

  case "$COLLABORATOR_PERMISSION" in
    admin|maintain|write)
      HAS_TRUSTED_SIGNAL=true
      LABEL_ACTION="not needed (collaborator permission is a trusted signal)"
      TRUSTED_SIGNALS="collaborator_permission:$COLLABORATOR_PERMISSION"
      ;;
    *)
      # triage, read, or none: not a trusted signal
      ;;
  esac
fi

Behavior-wise, that means:

  • 200: success, no retry
  • 404: success as none, no retry
  • anything else: retry with backoff 1, 2, 4, 8, 16, 16, ... until the 60-second deadline
  • deadline exceeded: fail the workflow
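The retry schedule described above can be checked with a small simulation. The hypothetical `backoff_schedule` function below replays the sleep logic of `fetch_collaborator_permission` under the simplifying assumption that every request fails instantly, and prints the sleep durations it would use.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of fetch_collaborator_permission's sleeps,
# assuming each failed request takes zero time.
backoff_schedule() {
  local deadline=$1 cap=$2
  local now=0 sleep_seconds=1 remaining
  local schedule=()
  while :; do
    remaining=$((deadline - now))
    if [ "$remaining" -le 0 ]; then
      break  # deadline reached: the real function returns 1 here
    fi
    if [ "$sleep_seconds" -gt "$remaining" ]; then
      sleep_seconds=$remaining  # never sleep past the deadline
    fi
    schedule+=("$sleep_seconds")
    now=$((now + sleep_seconds))
    if [ "$sleep_seconds" -lt "$cap" ]; then
      sleep_seconds=$((sleep_seconds * 2))  # exponential growth, capped
      if [ "$sleep_seconds" -gt "$cap" ]; then
        sleep_seconds=$cap
      fi
    fi
  done
  echo "${schedule[*]}"
}

backoff_schedule 60 16
```

With a 60-second deadline and a 16-second cap this prints `1 2 4 8 16 16 13`: seven sleeps, hence eight attempts, with the last sleep clamped to the 13 seconds remaining. Real requests take time, so actual runs would make fewer attempts.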

I haven't applied this version yet. If you want, I can put it into the branch as a real patch so you can compare it against the current simpler fail-fast version.

This commit is for testing the collaborator permission check and must be
reverted before merge:

1. Changes trigger from pull_request_target to pull_request so this
   branch's workflow definition runs instead of main's.

2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the
   restricted-paths detection.

REVERT THIS COMMIT BEFORE MERGE.

Made-with: Cursor
rwgk (Collaborator, Author) commented Apr 18, 2026

Retested with pull_request trigger:

https://github.com/NVIDIA/cuda-python/actions/runs/24609648527?pr=1930


Labels: bug (Something isn't working), CI/CD (CI/CD infrastructure), P0 (High priority - Must do!)


2 participants