Skip to content

KAFKA-14995: Automate asf.yaml collaborators refresh#13842

Closed
stevenbooke wants to merge 4 commits intoapache:trunkfrom
stevenbooke:automate-asf-yaml-contributors-refresh
Closed

KAFKA-14995: Automate asf.yaml collaborators refresh#13842
stevenbooke wants to merge 4 commits intoapache:trunkfrom
stevenbooke:automate-asf-yaml-contributors-refresh

Conversation

@stevenbooke
Copy link
Copy Markdown
Contributor

Create Github Action workflow to run Python script which will automate the asf.yaml collaborators refresh.

Tested the workflow locally using https://github.com/nektos/act.

Committer Checklist (excluded from commit message)

  • [NA] Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@stevenbooke
Copy link
Copy Markdown
Contributor Author

Could someone review this PR and leave some feedback please?

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. That seems a nice approach to automate this task. I left a few comments and suggestions.

@@ -0,0 +1,24 @@
name: Refresh asf.yaml collaborators every 3 months
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have the Apache license at the top of the file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Comment thread refresh-collaborators.py
@@ -0,0 +1,44 @@
import os
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have the Apache license at the top of the file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wil do.

Comment thread refresh-collaborators.py Outdated
yaml_content = yaml.safe_load(file.decoded_content)

# Update 'github_whitelist' list
github_whitelist = refreshed_collaborators[:10] # New users to be added
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the length of refreshed_collaborators already 10? It's been assigned collaborators[:n] where n is 10.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix that.

Comment thread refresh-collaborators.py Outdated
updated_yaml = yaml.safe_dump(yaml_content)

# Commit and push the changes
commit_message = "Update .asf.yaml file with refreshed github_whitelist, and collaborators"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to prefix commit with either a Jira or MINOR. Maybe we can use MINOR: here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @stevenbooke , thanks for this script!

I just have one concern (see below)

Comment thread refresh-collaborators.py Outdated
### GET THE CONTRIBUTORS OF THE apache/kafka REPO ###
n = 10
repo = g.get_repo("apache/kafka")
contributors = repo.get_contributors()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about taking the top ten contributors by lifetime commits, rather than by commits from the last year (as in git shortlog --email --numbered --summary --since=2022-04-28), since we have some prolific contributors in the past who are no longer active.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @vvcephei, I understand your concern. I will make changes to account for this.

…ntributors by lifetime commits, rather than by commits from the last year (as in git shortlog --email --numbered --summary --since=2022-04-28), since we have some prolific contributors in the past who are no longer active.'
@mimaison
Copy link
Copy Markdown
Member

mimaison commented Jul 6, 2023

Thanks @stevenbooke for the updates. The changes seem fine to me. @vvcephei do you have further comments?

Also let's update apache/kafka-site#521 as we'll need to merge it first.

@stevenbooke
Copy link
Copy Markdown
Contributor Author

@mimaison Do you think we can we move forward and merge this PR?

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, I left a couple of comments.

Comment thread refresh-collaborators.py
contributors_login_to_commit_volume = {}
end_date = datetime.now()
start_date = end_date - timedelta(days=365)
for commit in repo.get_commits(since=start_date, until=end_date):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here repo points to apache/kafka-site. Shouldn't it point to apache/kafka instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. Will change.

Comment thread refresh-collaborators.py
end_date = datetime.now()
start_date = end_date - timedelta(days=365)
for commit in repo.get_commits(since=start_date, until=end_date):
login = commit.author.login
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

author can be None is the author is not a member of Github, see PyGithub/PyGithub#279.

We should handle this case as this happens from time to time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing that out. I will make the appropriate changes.

@mimaison
Copy link
Copy Markdown
Member

Also I inadvertently run the script and you can see the commit it generated: a1f6ab6

It's pretty different from the current file, so there are a few other issues. The jenkins section looks incorrect and it would be good to also keep the comments if possible.

@mimaison
Copy link
Copy Markdown
Member

I also wonder if the tool should open a pull request instead of directly merging to trunk. It would make it easier to run the script locally as you can just close an unwanted PR while a commit as to be reverted.

@stevenbooke
Copy link
Copy Markdown
Contributor Author

stevenbooke commented Jul 18, 2023

@mimaison As you have recommended, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated .asf.yaml file. The updated .asf.yaml file will be the same format as it was previously (it will retain the comments and only update the github_whitelist list and the collaborators list).

…an be 'None'. Ensure the updated '.asf.yaml' file retains the previous format ( retain the comments and only update the 'github_whitelist' list and the 'collaborators' list). Instead of directly merging to trunk, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated '.asf.yaml' file.
Comment thread refresh-collaborators.py
### GET THE NAMES OF THE KAFKA COMMITTERS FROM THE apache/kafka-site REPO ###
github_token = os.environ.get('GITHUB_TOKEN')
g = Github(github_token)
repo = g.get_repo("apache/kafka-site")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we retrieve the organization from the environment? That would allow running this action on forks too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would not be able to retrieve the organization from the environment for "apache/kafka-site" due to the fact that "At the start of each workflow run, GitHub automatically creates a unique GITHUB_TOKEN secret to use in your workflow. You can use the GITHUB_TOKEN to authenticate in a workflow run.

When you enable GitHub Actions, GitHub installs a GitHub App on your repository. The GITHUB_TOKEN secret is a GitHub App installation access token. You can use the installation access token to authenticate on behalf of the GitHub App installed on your repository. The token's permissions are limited to the repository that contains your workflow." See reference here.

We would only be able to retrieve the organization from the environment for "apache/kafka".

Comment thread refresh-collaborators.py Outdated
start_date = end_date - timedelta(days=365)
repo = g.get_repo("apache/kafka")
for commit in repo.get_commits(since=start_date, until=end_date):
if commit.author is None and commit.author.login is None:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be or instead of and, both conditions will never be true together

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, will change.

Comment thread refresh-collaborators.py Outdated

# Commit and push the changes
# Create a new branch for the changes
new_branch_name = "update-asf.yaml-github-whitelist-and-collaborators"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know whether/how these branches will be deleted if the PR is merged? Typically it's the author that can delete the branch. I'm wondering what happens the second time this runs, will it force push to the branch or fail?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look into this more in the coming days and post an update.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mimaison If the PR is merged the branch will not be deleted.

I have updated the script to check if branch already exists. If so, the script will commit the changes to the branch, and open a pull request with the updated .asf.yaml file. Otherwise, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated .asf.yaml file.

Here is an example of what happens if changes are made to a branch that already has a pull request opened and merged:

A pull request is opened for the branch.
The pull request is merged.
Changes are made to the branch.
Another pull request is opened for the branch.
The second pull request will be based on the latest commit on the branch, which will include all of the changes that have been made to the branch since the first pull request was merged.

…nges to the branch, and open a pull request with the updated '.asf.yaml' file. Otherwise, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated '.asf.yaml' file.
@mimaison
Copy link
Copy Markdown
Member

@stevenbooke Thanks for the updates and sorry for the delay. I took another look at this PR today.
When trying to run the script, I get the following error:

Traceback (most recent call last):
  File "/private/tmp/test/./refresh-collaborators.py", line 79, in <module>
    branch = repo.get_branch(branch=branch_name)
  File "/private/tmp/test/lib/python3.9/site-packages/github/Repository.py", line 1939, in get_branch
    headers, data = self._requester.requestJsonAndCheck(
  File "/private/tmp/test/lib/python3.9/site-packages/github/Requester.py", line 442, in requestJsonAndCheck
    return self.__check(
  File "/private/tmp/test/lib/python3.9/site-packages/github/Requester.py", line 487, in __check
    raise self.__createException(status, responseHeaders, data)
github.GithubException.GithubException: 404 {"message": "Branch not found", "documentation_url": "https://docs.github.com/rest/branches/branches#get-a-branch"}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/private/tmp/test/./refresh-collaborators.py", line 90, in <module>
    repo.create_git_ref(f"refs/heads/{branch_name}", repo.default_branch)
  File "/private/tmp/test/lib/python3.9/site-packages/github/Repository.py", line 1085, in create_git_ref
    headers, data = self._requester.requestJsonAndCheck(
  File "/private/tmp/test/lib/python3.9/site-packages/github/Requester.py", line 442, in requestJsonAndCheck
    return self.__check(
  File "/private/tmp/test/lib/python3.9/site-packages/github/Requester.py", line 487, in __check
    raise self.__createException(status, responseHeaders, data)
github.GithubException.GithubException: 422 {"message": "Invalid request.\n\nAt least 40 characters are required; only 5 were supplied.", "documentation_url": "https://docs.github.com/rest/git/refs#create-a-reference"}

Is this expected?

@github-actions
Copy link
Copy Markdown

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch)

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions Bot added the stale Stale PRs label Dec 11, 2023
@mimaison
Copy link
Copy Markdown
Member

@stevenbooke Do you intend to finish this PR?

@stevenbooke stevenbooke closed this by deleting the head repository May 3, 2024
@fonsdant
Copy link
Copy Markdown
Contributor

fonsdant commented Sep 7, 2024

@mimaison, @stevenbooke could I try to resume this work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants