-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.3
What happened?
I'm chasing an issue involving a hard-to-reproduce issue with deferrable operators being "stuck" in Queued state.
In debugging, I noticed what seems to be a defficiency in the kubernetes_executor's override of cleanup_stuck_queued_tasks is not actually failing the instance.
What you think should happen instead?
Background: In debugging, I focused on the logs available at default log-level, which included
Marking task instance <TaskInstance: my_dag.my_group.my_task scheduled__2024-03-25T20:00:00+00:00 [queued]> stuck in queued as failed. If the task instance has available retries, it will be retried.
In reviewing the code printing this message, it seems clear that the expectation of a base_executor subclass is to fail the task-instances. For reference, the celery_executor does.
Problem: The kubernetes_executor does not.
Solution: It seems to me like it should.
How to reproduce
The true root cause of my issue is still a mystery to me. This is an attempt at fixing this safety net.
Operating System
debian
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==8.0.0
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct