Skip to content

kubernetes_executor does not properly fail task-instances in cleanup_stuck_queued_tasks #39078

@waldoppper

Description

@waldoppper

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.3

What happened?

I'm chasing an issue involving a hard-to-reproduce issue with deferrable operators being "stuck" in Queued state.

In debugging, I noticed what seems to be a defficiency in the kubernetes_executor's override of cleanup_stuck_queued_tasks is not actually failing the instance.

What you think should happen instead?

Background: In debugging, I focused on the logs available at default log-level, which included

Marking task instance <TaskInstance: my_dag.my_group.my_task scheduled__2024-03-25T20:00:00+00:00 [queued]> stuck in queued as failed. If the task instance has available retries, it will be retried.

In reviewing the code printing this message, it seems clear that the expectation of a base_executor subclass is to fail the task-instances. For reference, the celery_executor does.

Problem: The kubernetes_executor does not.

Solution: It seems to me like it should.

How to reproduce

The true root cause of my issue is still a mystery to me. This is an attempt at fixing this safety net.

Operating System

debian

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==8.0.0

Deployment

Other 3rd-party Helm chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions