Fix Fargate logging for AWS system tests #31622
@ferruzzi who made most of the work here
You've reverted everything except adding the exception. Was that intentional?
@vincbeck can you share an example traceback that caused you to add tenacity to I am just digging into KPO logging a bit again, and finding that the whole thing has become extremely messy and complicated, and I'm trying to chip away at some of the mess. And as part of that, the behavior of consume_logs is such that it is already called in a loop if there's an error. So it's a bit odd to also wrap it with tenacity, a kind of mixing of two different retry strategies that makes it a bit confusing. Now what can be done.... Well please observe that
Hey @dstandish . I removed the tenacity wrapper around
Hey, thanks @vincbeck , kind of you to check that. Ok, I'll make a PR to remove it for now. And then if it comes back, we can revisit. Thanks
There are many overlapping layers and strategies of retrying in this area of code. It appears this particular layer may be unnecessary. See discussion starting at apache#31622 (comment).
Both system tests, example_eks_with_fargate_in_one_step and example_eks_with_fargate_profile, are failing for the same reason. When the operator EksPodOperator is used with get_logs=True, the operator tries to get logs once the pod has started. When doing so, 90% of the time it fails because of:
After investigation, it turns out there is a delay between when the pod starts and when the CSR is available, signed and approved. If you try to get logs with the command kubectl logs <pod-name> -n <namespace> when the CSR is not yet available, signed and approved, you'll get the exact same error. If you wait until the CSR is there, you'll get the logs.
Therefore, in order to fix it, I decided to just retry on ApiException, which is the exception we get in such a scenario.
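The retry idea described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the discussion indicates the actual change used tenacity, whereas this sketch uses a plain loop so that it runs without third-party packages. The ApiException class here is a local stand-in for kubernetes.client.exceptions.ApiException, and fetch_logs_with_retry, make_flaky_fetcher, and their parameters are hypothetical names introduced for the example.

```python
import time


class ApiException(Exception):
    """Local stand-in (assumption) for kubernetes.client.exceptions.ApiException."""


def fetch_logs_with_retry(fetch_logs, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch_logs() until it succeeds or the attempts run out.

    Only ApiException is retried; any other exception propagates immediately.
    The last ApiException is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch_logs()
        except ApiException:
            if attempt == attempts:
                raise
            # Give the CSR time to become available, signed and approved.
            sleep(delay_seconds)


def make_flaky_fetcher(failures_before_ready):
    """Hypothetical helper: simulates a pod whose logs are unavailable at first."""
    calls = {"count": 0}

    def fetch_logs():
        calls["count"] += 1
        if calls["count"] <= failures_before_ready:
            raise ApiException("logs not available yet")
        return "pod log line"

    return fetch_logs, calls


if __name__ == "__main__":
    fetch, calls = make_flaky_fetcher(failures_before_ready=3)
    logs = fetch_logs_with_retry(fetch, attempts=5, delay_seconds=0)
    print(logs, "after", calls["count"], "calls")
```

In a real operator the callable passed in would wrap the Kubernetes log request, and the delay and attempt count would need tuning against how long CSR approval typically takes on Fargate. As the later discussion notes, wrapping a call that is already retried in an outer loop mixes two retry strategies, so a wrapper like this only makes sense where no such loop exists.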