KAFKA-14995: Automate asf.yaml collaborators refresh #13842
stevenbooke wants to merge 4 commits into apache:trunk from stevenbooke:automate-asf-yaml-contributors-refresh
|
Could someone review this PR and leave some feedback please? |
mimaison left a comment
Thanks for the PR. That seems like a nice approach to automate this task. I left a few comments and suggestions.
| @@ -0,0 +1,24 @@ | |||
| name: Refresh asf.yaml collaborators every 3 months | |||
We need to have the Apache license at the top of the file.
| @@ -0,0 +1,44 @@ | |||
| import os | |||
We need to have the Apache license at the top of the file.
| yaml_content = yaml.safe_load(file.decoded_content) | ||
|
|
||
| # Update 'github_whitelist' list | ||
| github_whitelist = refreshed_collaborators[:10] # New users to be added |
Isn't the length of refreshed_collaborators already 10? It's been assigned collaborators[:n] where n is 10.
| updated_yaml = yaml.safe_dump(yaml_content) | ||
|
|
||
| # Commit and push the changes | ||
| commit_message = "Update .asf.yaml file with refreshed github_whitelist, and collaborators" |
We tend to prefix commits with either a Jira or MINOR. Maybe we can use MINOR: here.
vvcephei left a comment
Hey @stevenbooke , thanks for this script!
I just have one concern (see below)
| ### GET THE CONTRIBUTORS OF THE apache/kafka REPO ### | ||
| n = 10 | ||
| repo = g.get_repo("apache/kafka") | ||
| contributors = repo.get_contributors() |
I'm worried about taking the top ten contributors by lifetime commits, rather than by commits from the last year (as in git shortlog --email --numbered --summary --since=2022-04-28), since we have some prolific contributors in the past who are no longer active.
Hello @vvcephei, I understand your concern. I will make changes to account for this.
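The concern above can be illustrated with a minimal sketch that ranks contributors by commits inside a recent window instead of over the repository's lifetime. It assumes the commits are already available as (login, timestamp) pairs; the function name `top_recent_contributors` and its parameters are hypothetical and are not taken from the PR's script.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_recent_contributors(commits, now, n=10, window_days=365):
    """Return up to n logins ordered by commit count within the window.

    commits is an iterable of (login, authored_at) pairs (assumed shape).
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        login for login, authored_at in commits if authored_at >= cutoff
    )
    return [login for login, _ in counts.most_common(n)]


if __name__ == "__main__":
    now = datetime(2023, 4, 28)
    history = (
        [("past-contributor", datetime(2015, 1, 1))] * 50
        + [("alice", datetime(2023, 3, 1))] * 3
        + [("bob", datetime(2023, 2, 1))]
    )
    print(top_recent_contributors(history, now))
```

In this made-up history, the contributor with 50 old commits falls outside the window and is not returned, which is the behaviour the `--since` flag gives `git shortlog`.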
…ntributors by lifetime commits, rather than by commits from the last year (as in git shortlog --email --numbered --summary --since=2022-04-28), since we have some prolific contributors in the past who are no longer active.'
|
Thanks @stevenbooke for the updates. The changes seem fine to me. @vvcephei do you have further comments? Also let's update apache/kafka-site#521 as we'll need to merge it first. |
|
@mimaison Do you think we can move forward and merge this PR? |
mimaison left a comment
Thanks for the updates, I left a couple of comments.
| contributors_login_to_commit_volume = {} | ||
| end_date = datetime.now() | ||
| start_date = end_date - timedelta(days=365) | ||
| for commit in repo.get_commits(since=start_date, until=end_date): |
Here repo points to apache/kafka-site. Shouldn't it point to apache/kafka instead?
Yes, you are correct. Will change.
| end_date = datetime.now() | ||
| start_date = end_date - timedelta(days=365) | ||
| for commit in repo.get_commits(since=start_date, until=end_date): | ||
| login = commit.author.login |
author can be None if the author is not a member of GitHub, see PyGithub/PyGithub#279.
We should handle this case as this happens from time to time.
Thank you for pointing that out. I will make the appropriate changes.
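One way to handle the missing-author case is sketched below. It assumes commit objects expose an `author` attribute that may be None and that has a `login` attribute otherwise, as described in the comment above; the helper name `count_commits_by_login` and the `SimpleNamespace` stand-ins are hypothetical and only mimic the attributes used here.

```python
from collections import Counter
from types import SimpleNamespace


def count_commits_by_login(commits):
    """Count commits per GitHub login, skipping commits with no linked account."""
    counts = Counter()
    for commit in commits:
        author = getattr(commit, "author", None)
        login = getattr(author, "login", None)
        if login is None:
            continue  # author is not a GitHub user, or has no login
        counts[login] += 1
    return counts


# Stand-in objects that mimic only the attributes used above (an assumption).
sample = [
    SimpleNamespace(author=SimpleNamespace(login="alice")),
    SimpleNamespace(author=None),
    SimpleNamespace(author=SimpleNamespace(login="alice")),
]
commit_volume = count_commits_by_login(sample)
```

Skipping such commits silently is a design choice; logging them instead would make it easier to see how many commits are left out of the ranking.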
|
Also I inadvertently ran the script and you can see the commit it generated: a1f6ab6. It's pretty different from the current file, so there are a few other issues. The |
|
I also wonder if the tool should open a pull request instead of directly merging to trunk. It would make it easier to run the script locally, as you can just close an unwanted PR, whereas a commit has to be reverted. |
|
@mimaison As you have recommended, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated |
…an be 'None'. Ensure the updated '.asf.yaml' file retains the previous format ( retain the comments and only update the 'github_whitelist' list and the 'collaborators' list). Instead of directly merging to trunk, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated '.asf.yaml' file.
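To illustrate why a load-and-dump round trip loses the file's comments while a targeted edit does not, here is a minimal standard-library sketch that rewrites only the items under one list key and leaves every other line untouched. It is not the PR's implementation: the helper name `replace_yaml_list` is hypothetical, and it assumes the key sits alone on its line and is followed directly by `- item` lines with no comments in between.

```python
import re
from typing import List


def replace_yaml_list(text: str, key: str, items: List[str]) -> str:
    """Replace the block-style list under `key`, keeping all other lines as-is."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        match = re.match(rf"^(\s*){re.escape(key)}:\s*$", line)
        out.append(line)
        i += 1
        if not match:
            continue
        # Assumption: items are indented two spaces deeper than the key.
        indent = match.group(1) + "  "
        # Drop the old items that directly follow the key.
        while i < len(lines) and re.match(r"^\s*- ", lines[i]):
            i += 1
        out.extend(f"{indent}- {item}" for item in items)
    return "\n".join(out) + "\n"


original = (
    "# Explanatory comment that must survive\n"
    "github:\n"
    "  collaborators:\n"
    "    - old-user-1\n"
    "    - old-user-2\n"
    "  other_setting: true\n"
)
updated = replace_yaml_list(original, "collaborators", ["new-user-1", "new-user-2"])
```

A comment-preserving YAML library would be more robust than line matching; the point of the sketch is only that the untouched lines, including comments, come through byte for byte.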
| ### GET THE NAMES OF THE KAFKA COMMITTERS FROM THE apache/kafka-site REPO ### | ||
| github_token = os.environ.get('GITHUB_TOKEN') | ||
| g = Github(github_token) | ||
| repo = g.get_repo("apache/kafka-site") |
Could we retrieve the organization from the environment? That would allow running this action on forks too.
We would not be able to retrieve the organization from the environment for "apache/kafka-site" due to the fact that "At the start of each workflow run, GitHub automatically creates a unique GITHUB_TOKEN secret to use in your workflow. You can use the GITHUB_TOKEN to authenticate in a workflow run.
When you enable GitHub Actions, GitHub installs a GitHub App on your repository. The GITHUB_TOKEN secret is a GitHub App installation access token. You can use the installation access token to authenticate on behalf of the GitHub App installed on your repository. The token's permissions are limited to the repository that contains your workflow." See reference here.
We would only be able to retrieve the organization from the environment for "apache/kafka".
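For the naming part of this question, a minimal sketch of reading the owner from the environment follows. It assumes the `GITHUB_REPOSITORY_OWNER` variable that GitHub Actions sets for a workflow run; the helper name `repo_full_name` and the `apache` fallback are hypothetical. It only builds the repository name and does not change what the token is allowed to access, which is the limitation described above.

```python
import os


def repo_full_name(repo_name, env=None, default_owner="apache"):
    """Build "<owner>/<repo_name>", taking the owner from the environment if set."""
    env = os.environ if env is None else env
    owner = env.get("GITHUB_REPOSITORY_OWNER") or default_owner
    return f"{owner}/{repo_name}"


# Passing an explicit mapping keeps the example independent of the real environment.
upstream = repo_full_name("kafka", env={})
fork = repo_full_name("kafka", env={"GITHUB_REPOSITORY_OWNER": "some-fork-owner"})
```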
| start_date = end_date - timedelta(days=365) | ||
| repo = g.get_repo("apache/kafka") | ||
| for commit in repo.get_commits(since=start_date, until=end_date): | ||
| if commit.author is None and commit.author.login is None: |
This should be or instead of and; both conditions will never be true together.
Correct, will change.
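The difference between the two operators can be shown with a small sketch. The function names and the `SimpleNamespace` stand-ins are hypothetical; they only mimic the `author` and `login` attributes used in the condition under review.

```python
from types import SimpleNamespace


def should_skip_with_and(commit):
    # When author is None, the right-hand side is still evaluated and
    # None.login raises AttributeError.
    return commit.author is None and commit.author.login is None


def should_skip_with_or(commit):
    # `or` short-circuits: when author is None, login is never accessed.
    return commit.author is None or commit.author.login is None


no_author = SimpleNamespace(author=None)
no_login = SimpleNamespace(author=SimpleNamespace(login=None))
normal = SimpleNamespace(author=SimpleNamespace(login="alice"))

try:
    should_skip_with_and(no_author)
    and_version_raised = False
except AttributeError:
    and_version_raised = True
```

With `or`, both the missing-author and the missing-login commits are skipped, and a commit with a normal author is kept.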
|
|
||
| # Commit and push the changes | ||
| # Create a new branch for the changes | ||
| new_branch_name = "update-asf.yaml-github-whitelist-and-collaborators" |
Do you know whether/how these branches will be deleted if the PR is merged? Typically it's the author that can delete the branch. I'm wondering what happens the second time this runs, will it force push to the branch or fail?
I will look into this more in the coming days and post an update.
@mimaison If the PR is merged the branch will not be deleted.
I have updated the script to check whether the branch already exists. If so, the script will commit the changes to the branch, and open a pull request with the updated .asf.yaml file. Otherwise, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated .asf.yaml file.
Here is an example of what happens if changes are made to a branch that already has a pull request opened and merged:
A pull request is opened for the branch.
The pull request is merged.
Changes are made to the branch.
Another pull request is opened for the branch.
The second pull request will be based on the latest commit on the branch, which will include all of the changes that have been made to the branch since the first pull request was merged.
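The branch check described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ensure_branch`, `BranchNotFound` and `FakeRepo` are hypothetical names, the in-memory repository stands in for a real GitHub client, and its `get_branch` and `create_git_ref` methods are modelled loosely on the kind of calls such a client offers.

```python
class BranchNotFound(Exception):
    """Hypothetical stand-in for the not-found error a GitHub client raises."""


class FakeRepo:
    """In-memory repository used only to make the sketch runnable."""

    def __init__(self):
        self.refs = {}

    def get_branch(self, name):
        ref = f"refs/heads/{name}"
        if ref not in self.refs:
            raise BranchNotFound(name)
        return self.refs[ref]

    def create_git_ref(self, ref, sha):
        self.refs[ref] = sha


def ensure_branch(repo, branch_name, base_sha):
    """Create the branch if it is missing; return True only when it was created."""
    try:
        repo.get_branch(branch_name)
        return False
    except BranchNotFound:
        repo.create_git_ref(ref=f"refs/heads/{branch_name}", sha=base_sha)
        return True


repo = FakeRepo()
branch = "update-asf.yaml-github-whitelist-and-collaborators"
first_run_created = ensure_branch(repo, branch, base_sha="abc123")
second_run_created = ensure_branch(repo, branch, base_sha="def456")
```

On the second run the existing branch is reused and keeps pointing at its earlier commit, which matches the behaviour described above where a later pull request builds on whatever is already on the branch.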
…nges to the branch, and open a pull request with the updated '.asf.yaml' file. Otherwise, the script will create a new branch for the changes, commit the changes to the new branch, and open a pull request with the updated '.asf.yaml' file.
|
@stevenbooke Thanks for the updates and sorry for the delay. I took another look at this PR today. Is this expected? |
|
This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. |
|
@stevenbooke Do you intend to finish this PR? |
|
@mimaison, @stevenbooke could I try to resume this work? |
Create Github Action workflow to run Python script which will automate the asf.yaml collaborators refresh.
Tested the workflow locally using https://github.com/nektos/act.