[no-ci] Use collaborator permission API for restricted-paths-guard trust check#1930
[no-ci] Use collaborator permission API for restricted-paths-guard trust check#1930rwgk wants to merge 11 commits intoNVIDIA:mainfrom
Conversation
…iation The webhook event payload's author_association field is unreliable for PRs originating from forks: even if the author is an org member or explicit collaborator with maintain/write permissions, fork PRs receive CONTRIBUTOR. This change queries the collaborator permission API directly to get the author's actual permission level (admin/maintain/write/triage/read/none), which is authoritative regardless of whether the PR comes from a fork or a branch in the main repo. Requires contents:write permission to access the collaborator API endpoint. Made-with: Cursor
This commit is for testing the collaborator permission check and must be reverted before merge: 1. Changes trigger from pull_request_target to pull_request so this branch's workflow definition runs instead of main's. 2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the restricted-paths detection. REVERT THIS COMMIT BEFORE MERGE. Made-with: Cursor
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Investigation:
|
| Author | NVIDIA Org Member? | Collaborator Role | Webhook Payload | Current API |
|---|---|---|---|---|
| rwgk | Yes | maintain | COLLABORATOR |
MEMBER |
| leofang | Yes | admin | MEMBER |
MEMBER |
| mdboom | Yes | maintain | CONTRIBUTOR |
MEMBER |
| cpcloud | Yes | maintain | CONTRIBUTOR |
MEMBER |
What "Current API" means: Querying GET /repos/NVIDIA/cuda-python/pulls/{number} right now returns author_association: MEMBER for all four users. This is the live, authoritative value. However, the webhook event payload (github.event.pull_request.author_association) that triggered the workflow contained different values.
Conclusion: This doesn't appear to be a clean "behavior changed on date X" situation. The inconsistency within the same timeframe suggests either a GitHub-side caching issue, a race condition, or some subtle factor related to how/when each user was added as both an org member and a collaborator. The exact root cause is unclear without access to GitHub's internal state or audit logs.
The fix in this PR sidesteps the unreliable webhook payload entirely by querying the collaborator permission API directly (GET /repos/{owner}/{repo}/collaborators/{username}/permission), which returns the live, authoritative permission level.
This reverts commit b814323.
|
Screenshot from testing with the temporary commit b814323:
|
| case "$AUTHOR_ASSOCIATION" in | ||
| COLLABORATOR|MEMBER|OWNER) | ||
| case "$COLLABORATOR_PERMISSION" in | ||
| admin|maintain|write) |
There was a problem hiding this comment.
I don't see any handling of the none case you're creating above.
There was a problem hiding this comment.
I meant: does it need to fail there? Or is none acceptable?
Address review feedback: explicitly handle the fallthrough case in the permission check to make it clear that triage, read, none, and API errors are not trusted signals. Made-with: Cursor
This commit is for testing the collaborator permission check and must be reverted before merge: 1. Changes trigger from pull_request_target to pull_request so this branch's workflow definition runs instead of main's. 2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the restricted-paths detection. REVERT THIS COMMIT BEFORE MERGE. Made-with: Cursor
|
Retested with https://github.com/NVIDIA/cuda-python/actions/runs/24570019045?pr=1930 |
This reverts commit 8686adb.
|
Copy-pasting a couple very short prompts and agent responses. What would you do? I think we should fail the workflow if we detect a response other than 200|404, no retry. I'm assuming it'll be rare, not worth the retry logic, but better than assigning the label just because there is an error in the API call. |
Treat 404 responses from the collaborator permission API as the expected non-collaborator case, but fail the workflow for any other API error so restricted-paths review labels are not added based on an unknown result. Made-with: Cursor
@cpcloud I went ahead and did that: commit 500cb29 I also asked what it would take to add retries with exponential backoff. The sketch that came out is a lot of extra code (see separate comment). I'm still thinking API call failures will be very rare, and it's better not to add that much code complexity. |
|
Cursor response to asking about exponential backoff capped by a 60 second deadline I can sketch this directly from the current workflow logic. The main change would be to wrap the collaborator-permission lookup in a small shell function that retries only for non- Yes. If we wanted that behavior in It would look roughly like this: COLLABORATOR_PERMISSION="not checked"
COLLABORATOR_PERMISSION_API_ERROR=""
COLLABORATOR_PERMISSION_ATTEMPTS=0
write_collaborator_permission_api_error() {
echo "- **Collaborator permission API error**:"
echo '```text'
printf '%s\n' "$COLLABORATOR_PERMISSION_API_ERROR"
echo '```'
}
fetch_collaborator_permission() {
local deadline_epoch now_epoch remaining_seconds
local response_file error_file http_status
local sleep_seconds=1
local attempt=0
response_file=$(mktemp)
error_file=$(mktemp)
deadline_epoch=$(( $(date +%s) + 60 ))
while :; do
attempt=$((attempt + 1))
COLLABORATOR_PERMISSION_ATTEMPTS="$attempt"
: >"$error_file"
if http_status=$(
curl -sS \
-o "$response_file" \
-w '%{http_code}' \
-H "Authorization: Bearer $GH_TOKEN" \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"https://api.github.com/repos/$REPO/collaborators/$PR_AUTHOR/permission" \
2>"$error_file"
); then
case "$http_status" in
200)
COLLABORATOR_PERMISSION=$(jq -r '.permission' <"$response_file")
rm -f "$response_file" "$error_file"
return 0
;;
404)
# Expected terminal result: author is not a collaborator.
COLLABORATOR_PERMISSION="none"
rm -f "$response_file" "$error_file"
return 0
;;
esac
else
http_status="curl-error"
fi
COLLABORATOR_PERMISSION="unknown"
COLLABORATOR_PERMISSION_API_ERROR=$(
{
printf 'Attempt: %s\n' "$attempt"
printf 'HTTP status: %s\n' "$http_status"
if [ -s "$error_file" ]; then
printf '\nTransport error:\n'
sed -n '1,20p' "$error_file"
fi
if [ -s "$response_file" ]; then
printf '\nResponse body:\n'
sed -n '1,40p' "$response_file"
fi
}
)
now_epoch=$(date +%s)
remaining_seconds=$((deadline_epoch - now_epoch))
if [ "$remaining_seconds" -le 0 ]; then
rm -f "$response_file" "$error_file"
return 1
fi
if [ "$sleep_seconds" -gt "$remaining_seconds" ]; then
sleep_seconds="$remaining_seconds"
fi
sleep "$sleep_seconds"
if [ "$sleep_seconds" -lt 16 ]; then
sleep_seconds=$((sleep_seconds * 2))
if [ "$sleep_seconds" -gt 16 ]; then
sleep_seconds=16
fi
fi
done
}And then the call site becomes: if [ "$TOUCHES_RESTRICTED_PATHS" = "true" ]; then
if ! fetch_collaborator_permission; then
echo "::error::Failed to inspect collaborator permission for $PR_AUTHOR after $COLLABORATOR_PERMISSION_ATTEMPTS attempt(s)."
{
echo "## Restricted Paths Guard Failed"
echo ""
echo "- **Error**: Failed to inspect collaborator permission before the 60 second deadline."
echo "- **Author**: $PR_AUTHOR"
echo "- **Collaborator permission**: $COLLABORATOR_PERMISSION"
echo "- **Collaborator permission lookup attempts**: $COLLABORATOR_PERMISSION_ATTEMPTS"
echo ""
write_matching_restricted_paths
echo ""
write_collaborator_permission_api_error
echo ""
echo "Please retry this workflow. If the failure persists, inspect the collaborator permission API error above."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
fi
case "$COLLABORATOR_PERMISSION" in
admin|maintain|write)
HAS_TRUSTED_SIGNAL=true
LABEL_ACTION="not needed (collaborator permission is a trusted signal)"
TRUSTED_SIGNALS="collaborator_permission:$COLLABORATOR_PERMISSION"
;;
*)
# triage, read, or none: not a trusted signal
;;
esac
fiBehavior-wise, that means:
I haven't applied this version yet. If you want, I can put it into the branch as a real patch so you can compare it against the current simpler fail-fast version. |
This commit is for testing the collaborator permission check and must be reverted before merge: 1. Changes trigger from pull_request_target to pull_request so this branch's workflow definition runs instead of main's. 2. Adds a dummy change to cuda_bindings/pyproject.toml to trigger the restricted-paths detection. REVERT THIS COMMIT BEFORE MERGE. Made-with: Cursor
|
Retested with https://github.com/NVIDIA/cuda-python/actions/runs/24609648527?pr=1930 |
This reverts commit 2a019b6.

The
author_associationfield from the webhook event payload is unreliable for PRs originating from forks. Even when the author is an NVIDIA org member and an explicit collaborator with maintain permissions, fork PRs can receiveCONTRIBUTORinstead of the expectedMEMBERorCOLLABORATOR(see comment below for investigation details).This PR replaces the event payload's
author_associationwith a direct query to the collaborator permission API (GET /repos/{owner}/{repo}/collaborators/{username}/permission), which returns the author's actual permission level regardless of whether the PR comes from a fork or a branch in the main repo.Changes:
admin,maintain, orwritepermissions404from the collaborator permission API as the expected non-collaborator casecontents: writepermission (required for the collaborator API endpoint)Testing:
pull_requestto validate this branch's workflow