@pankajkoti
Member

Using the feature built in #32646, when the scheduler marks
tasks stuck in queued as failed, send an explicit log message
describing the action to the task logs, so that users can
identify exactly why the task was marked failed in such a case.


@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Nov 26, 2023
@pankajkoti
Member Author

[Image: Screenshot 2023-11-26 at 12 23 20 PM]

@pankajkoti
Member Author

cc: @RNHTTR @vatsrahul1001

@pankajkoti pankajkoti added this to the Airflow 2.8.0 milestone Nov 26, 2023
"Marking task instance %s stuck in queued as failed. "
"If the task instance has available retries, it will be retried.",
ti,
ti=ti,
Contributor

Looks good, but we would lose this information if the user disables the feature. I think both the scheduler and the task should have the information for the time being.
Another thing that worries me is the performance implication, but in theory I don't think there would be many tasks stuck in the queued state.

Member Author

We won't lose this information. Even if the user disables the feature, the TCL would still log this information in the scheduler itself, using the call site's logger: https://github.com/apache/airflow/pull/32646/files#diff-fb48bd1344270ccbaadb60b2b7fbc5d74bb5440f908eedd384bd25ada648c05dR91

Yes, we can test the performance. If it hurts performance badly, we can disable the feature and still have the logs as mentioned above.
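
To illustrate the fallback described above, here is a minimal sketch of the idea, not the actual TCL code from #32646. `SketchContextLogger`, `ListHandler`, and the `enabled` flag are hypothetical names invented for this example; the sketch only assumes that the call site logger is always used and that the task log is written in addition when the feature is enabled.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps formatted messages in memory so the sketch is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class SketchContextLogger:
    """Hypothetical stand-in for the TCL; not Airflow's actual class."""

    def __init__(self, call_site_logger, task_handler, enabled=True):
        self.call_site_logger = call_site_logger
        self.task_handler = task_handler
        self.enabled = enabled

    def warning(self, msg, *args, ti):
        # Always log through the caller's own logger (here: the scheduler's).
        self.call_site_logger.warning(msg, *args)
        if not self.enabled:
            return
        # Only when the feature is enabled, also route the message to the task log.
        record = logging.LogRecord(
            name=f"task.{ti}", level=logging.WARNING, pathname="", lineno=0,
            msg=msg, args=args, exc_info=None,
        )
        self.task_handler.handle(record)


scheduler_handler, task_handler = ListHandler(), ListHandler()
scheduler_log = logging.getLogger("sketch.scheduler")
scheduler_log.addHandler(scheduler_handler)
scheduler_log.propagate = False  # keep the demo output out of the root logger

# With the feature disabled, only the scheduler-side logger receives the message.
tcl = SketchContextLogger(scheduler_log, task_handler, enabled=False)
tcl.warning("Marking task instance %s stuck in queued as failed.", "ti_1", ti="ti_1")
```

In this sketch, disabling the feature skips only the extra task-log write, so the scheduler log keeps the message either way.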

Contributor

Nice

Member Author

And the call site logger is set from the scheduler component while initialising the TCL instance here: https://github.com/apache/airflow/pull/32646/files#diff-b0491913f69327937706aea8fc77a71efeb979897898e405ade2b162ad862476R239

Contributor

It might be a good idea to log both separately. One reason is clarity. The other is that the information you need is slightly different in the two contexts. E.g. in the task instance log, the message doesn't need to reference the task instance details, because they're implied by the context. But these are things we can tweak later since everything is private.
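
A rough sketch of what logging the two separately could look like, assuming two plain `logging` loggers. The function `log_stuck_in_queued`, the logger names, and the wording of the task-side message are hypothetical and not taken from the PR.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps formatted messages in memory so the sketch is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_stuck_in_queued(scheduler_log, task_log, ti_key):
    # The scheduler log covers many task instances, so name the one affected.
    scheduler_log.warning(
        "Marking task instance %s stuck in queued as failed.", ti_key
    )
    # The task log already belongs to this task instance, so omit its details.
    task_log.warning(
        "Task was stuck in queued and has been marked as failed. "
        "If retries are available, it will be retried."
    )


def make_logger(name):
    """Build an isolated logger whose messages land in a ListHandler."""
    handler = ListHandler()
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.propagate = False
    return logger, handler


sched_log, sched_msgs = make_logger("sketch2.scheduler")
task_log, task_msgs = make_logger("sketch2.task")
log_stuck_in_queued(sched_log, task_log, "example_dag.example_task")
```

Keeping the two calls separate lets each message be tuned for its audience without affecting the other.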

Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>
@pankajkoti pankajkoti added the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label Nov 26, 2023
@pankajkoti pankajkoti closed this Nov 26, 2023
@pankajkoti pankajkoti reopened this Nov 26, 2023
@pankajkoti pankajkoti merged commit c7e1306 into apache:main Nov 26, 2023
@pankajkoti pankajkoti deleted the tcl-task-stuck-in-queued branch November 26, 2023 09:19
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Dec 5, 2023
3 participants