Closed
2 changes: 1 addition & 1 deletion airflow/executors/base_executor.py
@@ -352,7 +352,7 @@ def trigger_tasks(self, open_slots: int) -> None:

# Otherwise, we give up and remove the task from the queue.
self.log.error(
-"could not queue task %s (still running after %d attempts).",
+"Could not queue task %s (still running after %d attempts). Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#state-mismatch-between-airflow-database-and-executor",
key,
attempt.total_tries,
)
13 changes: 13 additions & 0 deletions docs/apache-airflow/troubleshooting.rst
@@ -20,6 +20,19 @@
Troubleshooting
===============

Obscure scheduling failures
Contributor

@eladkal eladkal Aug 19, 2024

I'd rather we not place this here.

Our goal should be shortening this doc, not adding more to it.

I'd rather we explain the mechanism of the scheduler-executor dynamic in the relevant section of the docs and then explain the limits and edge cases.

Contributor Author

This particular log message ("could not queue task %s (still running after %d attempts)") is essentially useless. The troubleshooting page adds context that we can't fit into a log message. It'd be difficult to provide this context in a way users can find in the normal documentation.

Contributor

But you need to consider that people land in the doc without hitting the message you are referring to.

I wonder if the message should instead refer to a GitHub discussion around the problem. I am not a fan of troubleshooting pages. If this is a bug, limitation, or problem we don't have a good solution for, then it's an open issue.

^^^^^^^^^^^^^^^^^^^^^^^^^^^

State mismatch between Airflow database and executor
----------------------------------------------------

This indicates that when the scheduler queried the Airflow database, it observed that the task instance had one status according to the Airflow metadata database,
but a different status according to the executor. A common example is when the query returned to the scheduler, the task instance was in the ``queued`` status,
Contributor

What query?

Contributor Author

I will clarify which query.

but the status according to the executor was ``running``.

This mismatch must have persisted for multiple attempts. When this happens, Airflow will not attempt to queue the task. It's possible that something has gone wrong
Contributor

Attempts = Airflow retries?

Contributor Author

No. Good call that this needs to be clarified.

in the executor, and the task may need to be cleared.
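
The give-up behavior described above can be illustrated with a small, self-contained sketch. This is not Airflow's implementation: `ToyExecutor`, `trigger`, `max_attempts`, and the returned strings are all hypothetical names invented for illustration. The sketch assumes that an "attempt" is one repeated queueing check inside the executor (as the review thread notes, it is not an Airflow task retry), and that the executor tracks the tasks it believes are running in a simple set.

```python
from collections import defaultdict


class ToyExecutor:
    """Hypothetical stand-in for an executor; NOT Airflow's BaseExecutor."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts    # assumed limit, chosen for the example
        self.queued = set()                 # tasks the scheduler asked us to run
        self.running = set()                # tasks the executor believes are running
        self.attempts = defaultdict(int)    # queueing checks made per task key

    def trigger(self, key):
        """Try to start one queued task and report what happened."""
        if key in self.running:
            # Mismatch: the task was queued, but the executor still
            # considers it running.
            self.attempts[key] += 1
            if self.attempts[key] < self.max_attempts:
                return "retry-later"
            # The mismatch persisted across all attempts: give up and
            # remove the task from the queue.
            del self.attempts[key]
            self.queued.discard(key)
            return "dropped"
        # No mismatch: hand the task over for execution.
        self.attempts.pop(key, None)
        self.queued.discard(key)
        self.running.add(key)
        return "started"
```

In this toy model, a task that stays in `running` while it is repeatedly queued is dropped on the final attempt, which corresponds to the point at which the error message in the diff above would be logged.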

Obscure task failures
^^^^^^^^^^^^^^^^^^^^^
