@uranusjr
Member

@uranusjr uranusjr commented Oct 11, 2024

More details to come. And tests.

When a ti worker goes too long without a heartbeat, the scheduler identifies it as a zombie and stops it from running any longer. However, the scheduler does not perform housekeeping correctly, and in some cases the ti's executor still considers the ti to be running.

This PR adds an explicit call to the executor to tell it the ti has been terminated and should be cleaned up.
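As a rough illustration of the idea only (not the actual Airflow code), the cleanup can be sketched with toy objects. `ToyExecutor`, `mark_terminated`, and `purge_zombies` are hypothetical names, and the 300-second threshold is an arbitrary assumption:

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor that tracks running ti keys."""

    running: set[str] = field(default_factory=set)
    failed: set[str] = field(default_factory=set)

    def mark_terminated(self, key: str) -> None:
        # Drop the key from the running set and record it as failed.
        self.running.discard(key)
        self.failed.add(key)


def purge_zombies(executor, last_heartbeat, now, threshold=300.0):
    """Return zombie keys and tell the executor to clean each one up."""
    zombies = [
        key
        for key in sorted(executor.running)
        # A ti with no recorded heartbeat is treated as a zombie here.
        if now - last_heartbeat.get(key, float("-inf")) > threshold
    ]
    for key in zombies:
        # The explicit call: without it the executor keeps the key as running.
        executor.mark_terminated(key)
    return zombies


executor = ToyExecutor(running={"ti_a", "ti_b"})
heartbeats = {"ti_a": 990.0, "ti_b": 100.0}
zombies = purge_zombies(executor, heartbeats, now=1000.0)
```

In this sketch only `ti_b` has exceeded the threshold, so it is the only key removed from `executor.running`.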

Some tests were modified to check that this cleanup performs as expected. Those tests were not actually set up correctly, likely due to changes from AIP-61: they mocked the wrong attribute and so failed to actually assert the behaviour. They have been fixed to mock ExecutorLoader instead, which is the canonical executor source after the AIP-61 implementation.
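To illustrate why the mock target matters, here is a self-contained sketch using `unittest.mock`. `ToyLoader`, `ToyScheduler`, and the `cleanup` method are hypothetical stand-ins, not Airflow's classes. The assumption is that the code under test resolves its executor through a loader, so only patching the loader lets the test observe the cleanup call:

```python
from unittest import mock


class ToyLoader:
    """Hypothetical stand-in for the canonical executor source."""

    @staticmethod
    def load(name):
        raise RuntimeError("real executors are unavailable in tests")


class ToyScheduler:
    def __init__(self):
        # Stale attribute that the purge path no longer consults;
        # mocking it would not affect purge() at all.
        self.executor = None

    def purge(self, key, executor_name):
        # The executor is resolved through the loader, per ti.
        executor = ToyLoader.load(executor_name)
        executor.cleanup(key)


scheduler = ToyScheduler()
fake_executor = mock.MagicMock()

# Patch the component the code actually uses, then assert on the result.
with mock.patch.object(ToyLoader, "load", return_value=fake_executor) as load:
    scheduler.purge("ti_b", "my_executor")

load.assert_called_once_with("my_executor")
fake_executor.cleanup.assert_called_once_with("ti_b")
```

Replacing `scheduler.executor` with a mock instead would leave `purge` calling the real loader, so a test written that way could not assert anything about the cleanup.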

@boring-cyborg boring-cyborg bot added area:Scheduler including HA (high availability) scheduler kind:documentation labels Oct 11, 2024
@uranusjr uranusjr force-pushed the remove-zombie-from-executor branch from 4c7c211 to acf738c Compare October 11, 2024 13:01
The tests were not set up correctly in the first place, likely a side
effect of the per-ti-executor configuration changes (AIP-61). The
tests are fixed to mock the correct component instead (ExecutorLoader),
and checks are added to assert ti cleanup in the executor.
@uranusjr uranusjr force-pushed the remove-zombie-from-executor branch from acf738c to d02e583 Compare October 15, 2024 08:07
@uranusjr
Member Author

Alright, things should be good now. I also identified some test setup issues when trying to add checks for this logic, and fixed them in this PR. See (edited) description above.

@uranusjr uranusjr marked this pull request as ready for review October 15, 2024 08:14
@uranusjr uranusjr force-pushed the remove-zombie-from-executor branch from 443349b to c60c35e Compare October 15, 2024 16:04
@uranusjr uranusjr merged commit 6549b17 into apache:main Oct 16, 2024
@uranusjr uranusjr deleted the remove-zombie-from-executor branch October 16, 2024 02:32
uranusjr added a commit to astronomer/airflow that referenced this pull request Oct 16, 2024
R7L208 pushed a commit to R7L208/airflow that referenced this pull request Oct 17, 2024
@utkarsharma2 utkarsharma2 added this to the Airflow 2.10.3 milestone Oct 23, 2024
@utkarsharma2 utkarsharma2 added the type:bug-fix Changelog: Bug Fixes label Oct 23, 2024
4 participants