Simplify the handle stuck in queued interface #43647
| DM = DagModel | ||
| RESCHEDULE_STUCK_IN_QUEUED_EVENT = "rescheduling stuck in queued" | ||
| STUCK_IN_QUEUED_EVENT = "stuck in queued" |
One reason I removed the "rescheduling" part is that at the point where you log this, you don't know that it's reschedulable; you only know that further down.
| As a compromise between always failing a stuck task and always rescheduling a stuck task (which could | ||
| lead to tasks being stuck in queued forever without informing the user), we have created the config | ||
| `[core] num_stuck_reschedules`. With this new configuration, an airflow admin can decide how | ||
| ``[scheduler] num_stuck_in_queued_retries``. With this new configuration, an airflow admin can decide how |
It's a scheduler setting, not a core one, and it's more a retry than a reschedule.
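To illustrate where the renamed option would live, here is a minimal sketch that parses a hypothetical `airflow.cfg` fragment with the standard-library `configparser`. The value `2` is an arbitrary example, and this is not how Airflow itself loads its configuration.

```python
import configparser

# Hypothetical airflow.cfg fragment; the value 2 is an arbitrary example.
CFG_TEXT = """
[scheduler]
num_stuck_in_queued_retries = 2
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

# The option is read from the [scheduler] section.
retries = parser.getint("scheduler", "num_stuck_in_queued_retries")

# Nothing is defined under [core] in this fragment.
in_core = parser.has_option("core", "num_stuck_in_queued_retries")
```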
| executor.fail(ti.key) | ||
| if not hasattr(executor, "cleanup_stuck_queued_tasks"): | ||
| continue | ||
| for ti in executor.cleanup_stuck_queued_tasks(tis=stuck_tis): |
The thing that still bothers me, @dimberman, is that it doesn't feel right to defer to the executor and only conditionally log if it "cleans up" the ti. We have already observed that it is stuck in queued, so why not log that?
I guess the problem is that we are logging the wrong event. The event is not that it is "stuck in queued" (which is an unconditional observation) but rather that it was requeued. That's the thing that conditionally happens.
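The distinction drawn in this comment can be sketched as follows. This is an illustration only: `FakeTI`, `EventLog`, `handle_stuck`, and the `REQUEUED_EVENT` string are hypothetical stand-ins, not Airflow classes or the PR's actual code. Only `STUCK_IN_QUEUED_EVENT` is taken from the diff.

```python
from dataclasses import dataclass, field

STUCK_IN_QUEUED_EVENT = "stuck in queued"  # event name from the diff
REQUEUED_EVENT = "requeued after stuck in queued"  # hypothetical event name


@dataclass
class FakeTI:
    """Stand-in for a task instance; not Airflow's TaskInstance."""
    key: str
    tries: int = 0


@dataclass
class EventLog:
    """Stand-in for wherever the scheduler records events."""
    events: list = field(default_factory=list)

    def record(self, ti, event):
        self.events.append((ti.key, event))


def handle_stuck(stuck_tis, max_retries, log):
    """Log the observation for every stuck ti, then requeue or fail it."""
    requeued, failed = [], []
    for ti in stuck_tis:
        # Unconditional: the ti has already been observed as stuck.
        log.record(ti, STUCK_IN_QUEUED_EVENT)
        if ti.tries < max_retries:
            ti.tries += 1
            # Conditional: logged only when the requeue actually happens.
            log.record(ti, REQUEUED_EVENT)
            requeued.append(ti)
        else:
            failed.append(ti)
    return requeued, failed
```

With this split, every stuck task instance gets a "stuck in queued" entry, and the second event appears only for those that were actually given another attempt.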
8c71f3f to d6d1caa
d6d1caa to 4021186
Proposed changes to #43520
This changes the signature of `cleanup_stuck_queued_tasks` such that it returns `Iterable[TaskInstance]` instead of `List[str]` (where the str is a repr).
What are the implications....
Old provider with new executor:
New provider with old executor:
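As a rough illustration of the signature change itself (not of either compatibility case listed above), the sketch below contrasts the two return types. All classes here are hypothetical stand-ins rather than Airflow's executors or `TaskInstance`. The `hasattr` guard and the `tis=` keyword mirror the diff hunk quoted earlier.

```python
from typing import Iterable, List


class FakeTI:
    """Stand-in for TaskInstance (not the Airflow class)."""

    def __init__(self, key: str):
        self.key = key

    def __repr__(self) -> str:
        return f"<TaskInstance: {self.key}>"


class OldStyleExecutor:
    """Returns reprs, as in the previous signature."""

    def cleanup_stuck_queued_tasks(self, tis: List[FakeTI]) -> List[str]:
        return [repr(ti) for ti in tis]


class NewStyleExecutor:
    """Returns the task instances themselves, as in the proposed signature."""

    def cleanup_stuck_queued_tasks(self, tis: List[FakeTI]) -> Iterable[FakeTI]:
        yield from tis


class NoCleanupExecutor:
    """An executor that does not implement the method at all."""


def cleaned_up(executor, stuck_tis):
    """Collect whatever the executor reports, skipping executors without the method."""
    if not hasattr(executor, "cleanup_stuck_queued_tasks"):
        return []
    return list(executor.cleanup_stuck_queued_tasks(tis=stuck_tis))
```

When the method yields objects, the caller can act on each one directly, for example via `ti.key` as in the `executor.fail(ti.key)` line from the diff. A repr string does not allow that.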