Conversation

@RNHTTR
Contributor

@RNHTTR RNHTTR commented Aug 19, 2024

This was discussed at length in this comment to #40468

Member

@potiuk potiuk left a comment

I like the approach; "something wrong" is good enough, and it's great that this is actionable.

Troubleshooting
===============

Obscure scheduling failures
Contributor

@eladkal eladkal Aug 19, 2024

I'd rather we didn't place this here.

Our goal should be shortening this doc, not adding more stuff to it.

I'd rather we explain the mechanism of the scheduler-executor dynamic in the relevant section of the docs and then explain the limits and edge cases.

Contributor Author

This particular log message ("could not queue task %s (still running after %d attempts)") is essentially useless. The troubleshooting page adds context that we can't fit into a log message. It'd be difficult to provide this context in a way users can find in the normal documentation.

Contributor

But you need to consider that people land in the doc without hitting the message you are referring to.

I wonder if maybe the message should refer to a GitHub discussion around the problem. I am not a fan of troubleshooting pages. If this is a bug/limitation/problem we don't have a good solution for, then it's an open issue.

----------------------------------------------------

This indicates that when the scheduler queried the Airflow database, it observed that the task instance had one status according to the Airflow metadata database,
but a different status according to the executor. A common example is when the query returned to the scheduler, the task instance was in the ``queued`` status,
Contributor

What query?

Contributor Author

I will clarify which query.

but the status according to the executor was ``running``.
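
The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Airflow's scheduler code: `ToyExecutor`, `try_queue`, and `MAX_QUEUE_ATTEMPTS` are hypothetical names, the threshold of 3 is arbitrary, and "attempts" is modeled as repeated scheduler passes rather than task retries. The message format is the one quoted in the review comment above.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this is a toy model, not Airflow's scheduler code.
MAX_QUEUE_ATTEMPTS = 3  # assumed threshold, chosen only for illustration


@dataclass
class ToyExecutor:
    """Tracks the task keys this executor believes are still running."""
    running: set = field(default_factory=set)


def try_queue(task_key, db_state, executor, attempts):
    """Return (queued, message) for one scheduling pass over a task instance.

    `attempts` maps task_key -> number of passes that observed a mismatch.
    """
    # Mismatch: the database says "queued" but the executor says "running".
    if db_state == "queued" and task_key in executor.running:
        attempts[task_key] = attempts.get(task_key, 0) + 1
        if attempts[task_key] >= MAX_QUEUE_ATTEMPTS:
            msg = "could not queue task %s (still running after %d attempts)" % (
                task_key,
                attempts[task_key],
            )
            return False, msg
        return False, "mismatch seen, will retry"
    # No mismatch: hand the task to the executor and reset the counter.
    attempts.pop(task_key, None)
    executor.running.add(task_key)
    return True, "queued"


executor = ToyExecutor(running={"dag.task.run1"})
attempts = {}
# Three passes in a row observe the same mismatch for the same task instance.
results = [try_queue("dag.task.run1", "queued", executor, attempts) for _ in range(3)]
```

In this sketch the first two passes report a retry and the third gives up with the log message, which is the point at which a user would see the otherwise opaque error.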

This mismatch must have persisted for multiple attempts. When this happens, Airflow will not attempt to queue the task. It's possible that something has gone wrong
Contributor

Attempts = Airflow retries?

Contributor Author

No. Good call that this needs to be clarified.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Oct 15, 2024
@github-actions github-actions bot closed this Oct 22, 2024
Labels

area:Executors-core LocalExecutor & SequentialExecutor kind:documentation stale Stale PRs per the .github/workflows/stale.yml policy file
