Add documentation for state mismatch between db and executor #41593
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,19 @@ | |
| Troubleshooting | ||
| =============== | ||
|
|
||
| Obscure scheduling failures | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| State mismatch between Airflow database and executor | ||
| ---------------------------------------------------- | ||
|
|
||
| This indicates that when the scheduler queried the Airflow database, it observed that the task instance had one status according to the Airflow metadata database, | ||
| but a different status according to the executor. A common example is when the query returned to the scheduler, the task instance was in the ``queued`` status, | ||
|
Contributor
What query?
Contributor
Author
I will clarify which query. |
||
| but the status according to the executor was ``running``. | ||
|
|
||
| This mismatch must have persisted for multiple attempts. When this happens, Airflow will not attempt to queue the task. It's possible that something has gone wrong | ||
|
Contributor
Does "attempts" here mean Airflow retries?
Contributor
Author
No. Good call, this needs to be clarified. |
||
| in the executor, and the task may need to be cleared. | ||
|
|
||
| Obscure task failures | ||
| ^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
|
|
||
I'd rather we not place this here.
Our goal should be shortening this doc, not adding more to it.
I'd rather we explain the mechanism of the scheduler-executor dynamic in the relevant section of the docs and then explain the limits and edge cases.

This particular log message ("could not queue task %s (still running after %d attempts)") is essentially useless. The troubleshooting page adds context that we can't fit into a log message. It'd be difficult to provide this context in a way users can find in the normal documentation.

But you need to consider that people land on the doc without hitting the message you are referring to.
I wonder if the message should instead refer to a GitHub discussion around the problem. I am not a fan of troubleshooting pages. If this is a bug, limitation, or problem we don't have a good solution for, then it's an open issue.
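
For readers following the thread, here is a minimal sketch of the behavior that the proposed section and the log message describe: a task is handed to the executor for queuing while the executor still tracks it as running, and after several scheduling passes the executor gives up. This is a toy model and not Airflow's executor code. `ToyExecutor`, `max_attempts`, and the task key are hypothetical names, and the limit of 3 passes is an assumption chosen for illustration. Only the log message text is taken from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Simplified stand-in for an executor; NOT Airflow's BaseExecutor."""

    max_attempts: int = 3  # hypothetical limit, chosen for illustration
    running: set = field(default_factory=set)
    queued: dict = field(default_factory=dict)
    attempts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def trigger_tasks(self) -> None:
        """Run one scheduling pass over the queued tasks."""
        for key in list(self.queued):
            if key not in self.running:
                # No mismatch: hand the task over and track it as running.
                del self.queued[key]
                self.attempts.pop(key, None)
                self.running.add(key)
                continue
            # Mismatch: the task was sent for queuing, but the executor
            # still tracks the same key as running.
            tries = self.attempts.get(key, 0) + 1
            self.attempts[key] = tries
            if tries < self.max_attempts:
                continue  # leave it queued and check again on the next pass
            # Give up: drop the task from the queue and record the error.
            self.log.append(
                "could not queue task %s (still running after %d attempts)"
                % (key, tries)
            )
            del self.queued[key]
            del self.attempts[key]


executor = ToyExecutor()
executor.running.add("my_dag.my_task")         # executor's view: running
executor.queued["my_dag.my_task"] = "command"  # scheduler's view: queued
for _ in range(3):
    executor.trigger_tasks()
print(executor.log)
```

In this sketch, "attempts" counts scheduling passes in which the mismatch was observed, not task retries, which is the distinction raised in the review comments. After the executor gives up, the task is no longer queued but is still tracked as running, which is why the proposed text says the task may need to be cleared.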