Improving jobs reliability #1458
Merged
cea8bdc to
7eb3d08
Compare
"build.py: Add retry to _download_file" works, and the build completed successfully.
ec1598a to
6fae1c0
Compare
mgalka
reviewed
Oct 10, 2022
mgalka
left a comment
Some remarks
In some cases reading the pod log might fail even though the pod built successfully. Since we verify the existence of the build files, the log retrieval state is not a reason to invalidate the k8s job. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
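A minimal sketch of the idea in this commit message, not the code from the patch: treat a failed log read as non-fatal and decide success from the presence of the build files. The names `job_succeeded`, `read_log` and `artifact_paths` are hypothetical and chosen only for illustration.

```python
import logging
import os


def job_succeeded(read_log, artifact_paths):
    """Return (ok, log_text); a failed log read does not affect ok.

    read_log: hypothetical zero-argument callable that fetches the pod log.
    artifact_paths: hypothetical list of files the build is expected to produce.
    """
    try:
        log_text = read_log()
    except Exception as exc:  # log retrieval is best-effort only
        logging.warning("Could not read pod log: %s", exc)
        log_text = None

    # Success is decided by the build files alone; an empty list is not success.
    ok = bool(artifact_paths) and all(os.path.isfile(p) for p in artifact_paths)
    return ok, log_text
```

With this shape, a job whose log could not be fetched still counts as successful when every expected file exists, and still fails when a file is missing.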
One of the most common failures is a 401 error from the core.list_namespaced_pod function. This error happens only on GKE, and when it occurs, many jobs often fail at the same time. Adding several retries with a config reload might fix this issue. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
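The retry-with-reload approach described here could look roughly like the sketch below. It is an illustration under assumptions, not the patch itself: `list_pods_with_retry`, `list_pods` and `reload_config` are hypothetical names, with `list_pods` standing in for a wrapper around `core.list_namespaced_pod` and `reload_config` for whatever reloads the cluster credentials. The sketch relies only on the exception exposing a `status` attribute, as the Kubernetes Python client's `ApiException` does.

```python
import time


def list_pods_with_retry(list_pods, reload_config, retries=3, delay=1.0):
    """Call list_pods(), reloading the config and retrying on HTTP 401.

    list_pods and reload_config are hypothetical zero-argument callables.
    Any error other than a 401 is re-raised immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    for attempt in range(1, retries + 1):
        try:
            return list_pods()
        except Exception as exc:
            # Only a 401 (expired or rejected credentials) is considered recoverable.
            if getattr(exc, "status", None) != 401:
                raise
            if attempt == retries:
                raise
            reload_config()
            time.sleep(delay)
```

Re-raising on the last attempt keeps the original 401 visible to the caller, so a persistent authentication problem still fails the job instead of looping forever.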
Another of the most common failures: _download_file attempts to fetch a file, gets an exception, and fails as a result. This needs proper logic to handle the exception and retry. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
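As a rough illustration of "handle the exception and retry", the sketch below uses only the standard library. It is not the `_download_file` implementation from build.py; `download_with_retry` and its parameters are hypothetical, and the `opener` argument exists only so the transport can be swapped out, for example in tests.

```python
import shutil
import time
import urllib.request


def download_with_retry(url, dest, retries=3, delay=1.0,
                        opener=urllib.request.urlopen):
    """Download url to dest, retrying on network errors.

    Returns the number of the attempt that succeeded. urllib's URLError and
    HTTPError, as well as socket timeouts, are all subclasses of OSError.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    for attempt in range(1, retries + 1):
        try:
            # Opening dest in "wb" mode truncates any partial earlier download.
            with opener(url) as response, open(dest, "wb") as out:
                shutil.copyfileobj(response, out)
            return attempt
        except OSError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
```

A fixed delay keeps the sketch short; a growing delay between attempts would be a reasonable alternative when the remote storage is overloaded.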
We already have retry code in wait.py, but it does not handle the urllib3.exceptions.MaxRetryError exception. This patch adds proper handling for it. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
6fae1c0 to
8246a4b
Compare
mgalka
approved these changes
Oct 11, 2022
After analyzing Jenkins job runs, we identified a few weak spots where a job might fail early even though it is still possible to recover and continue.
These patches address those spots in an attempt to improve the job execution success rate.
Fixes #1451
Fixes kernelci/kernelci-project#124
Fixes #1461
Fixes #1462