Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Feb 25, 2022

This is a follow-up after investigation done with failing main
builds (resulting in #21824 and tracked in
pypa/pip#10924)

Handling failure of image building has been also improved as part
of the PR. Instead of separate "cancel" job (which did not really
work anyway) we build and push empty images instead and the
empty images are handled in the "wait for images" job
with appropriate message.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk potiuk requested review from dstandish and uranusjr February 25, 2022 19:33
@potiuk potiuk force-pushed the add-tool-that-helps-with-backtracking-issues branch 3 times, most recently from e8c3d33 to bf4c98d Compare February 26, 2022 13:18
@potiuk potiuk requested a review from mik-laj as a code owner February 26, 2022 13:18
@potiuk potiuk force-pushed the add-tool-that-helps-with-backtracking-issues branch from bf4c98d to 7a7f962 Compare February 26, 2022 13:46
@potiuk
Copy link
Member Author

potiuk commented Feb 26, 2022

I think this one is ready for review and it should be really helpful in the future when analysing pip resolver issues.

You can see an example of what happens, here: https://github.com/potiuk/airflow/actions/runs/1903230054

If there is a sudden problem with backtracking, there is a "failure" handler in the Build Image step and will produce candidates of packages that have been updated in the last 1 day:

Screenshot 2022-02-26 at 14 47 16

This will not only produce the list and information about those "candidates" but also a pip install command that we should use to iterate in order to find the problem. Detailed instruction on what to do is added in this PR and should be avaliable under the link when we merge it.

Also I improved something that we knew needs fixing (@ashb) - cancelling the "waiting" job in case Building Image fails. Rather than finding and cancelling job, I simply build and push empty images, which is properly detected by the "waiting for image" job:

Screenshot 2022-02-26 at 14 48 32

@potiuk
Copy link
Member Author

potiuk commented Feb 26, 2022

@dstandish @uranusjr ^^

@potiuk
Copy link
Member Author

potiuk commented Feb 26, 2022

cc: @malthe

Copy link
Member Author

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing some typos

@potiuk potiuk force-pushed the add-tool-that-helps-with-backtracking-issues branch 2 times, most recently from 984b1b7 to ea35809 Compare February 26, 2022 15:11
@potiuk
Copy link
Member Author

potiuk commented Feb 26, 2022

@dstandish -> not really an "easy" answer to your question ("what is wrong :)?" with the backtracking issues, but at least with this one we have convenient tooling and "followable" approach on how ot handle it next time when it happens - but we need to merge it so that it will start tracking the changes whenever it happens :D

Copy link
Member Author

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed older name of commands.

@potiuk potiuk force-pushed the add-tool-that-helps-with-backtracking-issues branch from c2f70c7 to 856f1df Compare February 27, 2022 11:50
This is a follow-up after investigation done with failing main
builds (resulting in apache#21824 and tracked in
pypa/pip#10924)

Handling failure of image building has been also improved as part
of the PR. Instead of separate "cancel" job (which did not really
work anyway) we build and push empty images instead and the
empty images are handled in the "wait for images" job
with appropriate message.
@potiuk potiuk force-pushed the add-tool-that-helps-with-backtracking-issues branch from 856f1df to 99d2621 Compare February 27, 2022 11:52
@potiuk
Copy link
Member Author

potiuk commented Mar 1, 2022

Would love this one to be merged to catch potential problems early :)

@malthe
Copy link
Contributor

malthe commented Mar 2, 2022

Looks good to me – a bit unhappy that we have to add all this code in Airflow. It would be brilliant if it could be contributed to pip itself instead.

@ashb
Copy link
Member

ashb commented Mar 2, 2022

Something like this will come in pip soon enough, but we can move quicker to fix our specific need right now.

Push empty CI images to finish waiting jobs:
${{ matrix.python-version }}:${{ env.GITHUB_REGISTRY_PUSH_IMAGE_TAG }}"
if: failure() || cancelled()
run: ./scripts/ci/images/ci_push_empty_ci_images.sh
Copy link
Member

@ashb ashb Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this in place of the cancel-on-ci-build step?

[cancel] which did not really work anyway

Oh really?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Script to find recently released deps looks useful, I'm just not sure why we made this change)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script did not work anyway (not sure if you remember but we discussed it and even tried to fix it some time ago but it kept on cancelling other stuff (I can find PR and Revert if you want).

The current script does not work and produces this:

Screenshot 2022-03-02 at 18 58 09

Example here: https://github.com/apache/airflow/runs/5393691536?check_suite_focus=true#step:2:1

And since this is all about failing the image build, I figured it might be a good one to fix it and I figured that pushing empty image is much better way of "communication" with the script that waits for it. It's 100% accurate and FAR simpler.

But I can separate it out if you think it would be better.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, LGTM otherwise

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Mar 2, 2022
@github-actions
Copy link

github-actions bot commented Mar 2, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@potiuk
Copy link
Member Author

potiuk commented Mar 2, 2022

Random "CI Job failed" . Re-running.

@potiuk
Copy link
Member Author

potiuk commented Mar 3, 2022

Random issues (looking at them shortly).

@potiuk potiuk merged commit 0b9257b into apache:main Mar 3, 2022
@potiuk potiuk deleted the add-tool-that-helps-with-backtracking-issues branch March 3, 2022 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants