Add CI jobs and tooling to aid with tracking backtracking pip issues #21825
I think this one is ready for review and it should be really helpful in the future when analysing [...]

You can see an example of what happens here: https://github.com/potiuk/airflow/actions/runs/1903230054

If there is a sudden problem with backtracking, there is a "failure" handler in the Build Image step that will produce candidates of packages that have been updated in the last 1 day:

This will not only produce the list and information about those "candidates" but also a [...]

Also I improved something that we knew needs fixing (@ashb): cancelling the "waiting" job in case Building Image fails. Rather than finding and cancelling the job, I simply build and push empty images, which is properly detected by the "waiting for image" job:
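For readers who want a feel for the "updated in the last 1 day" filter described above, here is a minimal sketch in Python. It is not the script added in this PR: the function name `recently_updated`, the package names, and the hard-coded upload times are all assumptions made for illustration. In a real run the upload timestamps would have to come from the package index.

```python
from datetime import datetime, timedelta, timezone


def recently_updated(releases, now, max_age=timedelta(days=1)):
    """Return (package, version, uploaded) tuples uploaded within max_age of now.

    `releases` maps package name -> {version: upload datetime}.
    The result is sorted newest first.
    """
    cutoff = now - max_age
    candidates = [
        (name, version, uploaded)
        for name, versions in releases.items()
        for version, uploaded in versions.items()
        if cutoff <= uploaded <= now
    ]
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Hypothetical data: names, versions and timestamps are made up.
now = datetime(2022, 2, 26, 12, 0, tzinfo=timezone.utc)
releases = {
    "example-a": {
        "1.0.0": datetime(2022, 1, 10, 8, 0, tzinfo=timezone.utc),
        "1.1.0": datetime(2022, 2, 26, 3, 0, tzinfo=timezone.utc),
    },
    "example-b": {
        "2.3.0": datetime(2022, 2, 20, 9, 0, tzinfo=timezone.utc),
    },
}

for name, version, uploaded in recently_updated(releases, now):
    print(f"{name}=={version} uploaded {uploaded.isoformat()}")
```

Only releases uploaded inside the window are reported, so a release from a week ago is not flagged as a backtracking suspect.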
cc: @malthe
potiuk
left a comment
Fixing some typos
@dstandish -> not really an "easy" answer to your question ("what is wrong :)?") about the backtracking issues, but at least with this one we have convenient tooling and a "followable" approach on how to handle it next time it happens. We need to merge it, though, so that it starts tracking the changes whenever it happens :D
potiuk
left a comment
Fixed older names of commands.
This is a follow-up after the investigation of failing main builds (resulting in apache#21824 and tracked in pypa/pip#10924).

Handling of image build failures has also been improved as part of the PR. Instead of a separate "cancel" job (which did not really work anyway), we build and push empty images, and the empty images are handled in the "wait for images" job with an appropriate message.
Would love this one to be merged to catch potential problems early :)

Looks good to me – a bit unhappy that we have to add all this code in Airflow. It would be brilliant if it could be contributed to pip itself instead.

Something like this will come in pip soon enough, but we can move quicker to fix our specific need right now.
Push empty CI images to finish waiting jobs:
${{ matrix.python-version }}:${{ env.GITHUB_REGISTRY_PUSH_IMAGE_TAG }}"
if: failure() || cancelled()
run: ./scripts/ci/images/ci_push_empty_ci_images.sh
Why do we have this in place of the cancel-on-ci-build step?
[cancel] which did not really work anyway
Oh really?
(The script to find recently released deps looks useful, I'm just not sure why we made this change.)
The script did not work anyway. Not sure if you remember, but we discussed it and even tried to fix it some time ago, and it kept on cancelling other stuff (I can find the PR and the revert if you want).
The current script does not work and produces this:
Example here: https://github.com/apache/airflow/runs/5393691536?check_suite_focus=true#step:2:1
And since this is all about failing the image build, I figured it might be a good moment to fix it, and that pushing an empty image is a much better way of "communicating" with the script that waits for it. It's 100% accurate and FAR simpler.
But I can separate it out if you think it would be better.
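The empty-image handshake described in this comment can be sketched in Python as follows. This is only an illustration of the idea, not the actual CI scripts: how the real "wait for images" job recognises an empty image is not shown in this thread, so `ImageState`, `check_image` and `wait_for_image` are hypothetical names, and the registry lookup is replaced by a plain callable.

```python
import time
from enum import Enum


class ImageState(Enum):
    MISSING = "missing"  # nothing pushed yet, keep waiting
    EMPTY = "empty"      # placeholder pushed because the build failed or was cancelled
    READY = "ready"      # real image is available


def wait_for_image(check_image, attempts=60, delay=1.0, sleep=time.sleep):
    """Poll check_image() until the image is ready.

    Fails fast when the empty placeholder shows up, instead of
    waiting for a timeout or relying on a separate cancel job.
    """
    for _ in range(attempts):
        state = check_image()
        if state is ImageState.READY:
            return True
        if state is ImageState.EMPTY:
            raise RuntimeError("Empty image found: the image build failed or was cancelled.")
        sleep(delay)
    raise TimeoutError("Image did not appear in time.")


# Simulated registry responses (hypothetical): the build fails on the third poll.
states = iter([ImageState.MISSING, ImageState.MISSING, ImageState.EMPTY])
try:
    wait_for_image(lambda: next(states), delay=0)
except RuntimeError as err:
    print(err)
```

The point of the design is that the waiting side needs no knowledge of other workflow runs: the pushed placeholder itself carries the "build failed" signal.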
ashb
left a comment
One nit, LGTM otherwise
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Random "CI Job failed". Re-running.

Random issues (looking at them shortly).


