Add documentation for state mismatch between db and executor #41593
potiuk
left a comment
I like the approach. "Something wrong" is good enough, and it's great that this is actionable.
Troubleshooting
===============

Obscure scheduling failures
I'd rather we not place this here.
Our goal should be shortening this doc, not adding more to it.
I'd rather we explain the mechanism of the scheduler-executor dynamic in the relevant section of the docs and then explain the limits and edge cases.
This particular log message ("could not queue task %s (still running after %d attempts)") is essentially useless. The troubleshooting page adds context that we can't fit into a log message. It'd be difficult to provide this context in a way users can find in the normal documentation.
But you need to consider that people land in the doc without hitting the message you are referring to.
I wonder if the message should instead refer to the GitHub discussion around the problem. I am not a fan of troubleshooting pages. If this is a bug, limitation, or problem we don't have a good solution for, then it's an open issue.
----------------------------------------------------

This indicates that when the scheduler queried the Airflow database, it observed that the task instance had one status according to the Airflow metadata database,
but a different status according to the executor. A common example is when the query returned to the scheduler, the task instance was in the ``queued`` status,
What query?
I will clarify which query.
but a different status according to the executor. A common example is when the query returned to the scheduler, the task instance was in the ``queued`` status,
but the status according to the executor was ``running``.

This mismatch must have persisted for multiple attempts. When this happens, Airflow will not attempt to queue the task. It's possible that something has gone wrong
Attempts = Airflow retries?
no -- good call that this needs to be clarified
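To illustrate the distinction raised in this thread, here is a minimal sketch of how a queueing attempt counter could differ from task retries. It is a toy model, not Airflow's executor code: `ToyExecutor`, `try_queue`, and `MAX_ATTEMPTS` are hypothetical names, and the limit of 3 is an assumption chosen for the example. Only the log message text and the ``queued``/``running`` mismatch come from the discussion above.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # hypothetical limit for this sketch; not Airflow's actual value


@dataclass
class ToyExecutor:
    """Toy stand-in for an executor; this is NOT Airflow's BaseExecutor."""

    running: set = field(default_factory=set)     # keys the executor believes are running
    attempts: dict = field(default_factory=dict)  # key -> blocked queueing attempts so far
    log: list = field(default_factory=list)       # collected error messages

    def try_queue(self, key: str, db_state: str) -> bool:
        """Return True if the task was handed to the executor on this attempt."""
        if db_state != "queued":
            return False
        if key in self.running:
            # Mismatch: the database says "queued" but the executor says "running".
            # The counter tracks queueing attempts, not task retries.
            self.attempts[key] = self.attempts.get(key, 0) + 1
            if self.attempts[key] >= MAX_ATTEMPTS:
                self.log.append(
                    "could not queue task %s (still running after %d attempts)"
                    % (key, self.attempts[key])
                )
                del self.attempts[key]  # give up on this task
            return False
        # No mismatch: clear any stale counter and start the task.
        self.attempts.pop(key, None)
        self.running.add(key)
        return True


executor = ToyExecutor(running={"dag.task"})
for _ in range(MAX_ATTEMPTS):
    executor.try_queue("dag.task", "queued")
print(executor.log)
```

In this model the counter advances once per queueing attempt while the mismatch persists, independently of any retry setting on the task itself.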
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
This was discussed at length in this comment to #40468