@paolo-moriello
Contributor

This is a follow-up to PR #39325.

With the above change, a tenacity retry mechanism was introduced while waiting for pod completion. For long-running tasks, the k8s credentials can expire while the task is still running, so we refresh the credentials and retry. However, this did not completely solve the issue because of stop_after_attempt(3): the job still fails when the credentials expire more than twice.

This PR attempts to fix this issue by:

  1. Removing the stop_after_attempt logic, so that the number of credential refreshes during a task is no longer capped.
  2. Still failing the job if the credentials are invalid after a refresh. We still want to make sure the job doesn't run forever if it keeps producing 401s after the refresh.
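
The two points above can be sketched roughly as follows. This is a stdlib-only illustration of the control flow, not the code in this PR (which relies on tenacity); `UnauthorizedError`, `wait_for_completion`, `read_pod` and `refresh_credentials` are hypothetical names chosen for the example, and a real loop would also sleep between polls.

```python
class UnauthorizedError(Exception):
    """Hypothetical stand-in for a Kubernetes API 401 response."""


def wait_for_completion(read_pod, refresh_credentials):
    """Poll until the pod reaches a terminal phase.

    There is no cap on the number of retries: every 401 triggers a
    credential refresh. The only way a 401 fails the job is when it
    comes back on the very first call after a refresh.
    """
    just_refreshed = False
    while True:
        try:
            phase = read_pod()
        except UnauthorizedError:
            if just_refreshed:
                # Fresh credentials were rejected too: give up rather than loop forever.
                raise
            refresh_credentials()
            just_refreshed = True
            continue
        # A successful read means the current credentials are valid again.
        just_refreshed = False
        if phase in ("Succeeded", "Failed"):
            return phase
```

With this shape, credentials may expire any number of times during a long-running task, while a 401 that persists right after a refresh still surfaces as a failure.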


@paolo-moriello paolo-moriello force-pushed the cncf-k8s-pod-operator-401 branch from f28c328 to 0c0787c Compare September 23, 2024 05:44
@paolo-moriello paolo-moriello force-pushed the cncf-k8s-pod-operator-401 branch from 0c0787c to 389a59a Compare September 23, 2024 12:56
@paolo-moriello paolo-moriello force-pushed the cncf-k8s-pod-operator-401 branch from 389a59a to 815d765 Compare September 26, 2024 12:21
@romsharon98 romsharon98 merged commit 7782050 into apache:main Sep 27, 2024
@paolo-moriello paolo-moriello deleted the cncf-k8s-pod-operator-401 branch September 30, 2024 10:55
joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
…e#42361)

* Never stop retrying 401s in k8s pod operator

* Try reading pod after refreshing credentials

* Never stop retrying until credentials are refreshed

* Linting

---------

Co-authored-by: pmoriello <paolo.moriello@zalando.ch>
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
…e#42361)

* Never stop retrying 401s in k8s pod operator

* Try reading pod after refreshing credentials

* Never stop retrying until credentials are refreshed

* Linting

---------

Co-authored-by: pmoriello <paolo.moriello@zalando.ch>
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues
