@LeonardoIshida

@LeonardoIshida LeonardoIshida commented Nov 3, 2025

closes #57777, closes #57174, closes #53337

Problem

Currently, task timeouts are handled using SIGALRM within the task process itself. This approach is unreliable because:

  • SIGALRM can be blocked by other signals
  • Tasks can exceed their configured timeout without being terminated
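For readers unfamiliar with the in-process approach, here is a minimal, generic sketch of a SIGALRM-based timeout. It is not Airflow's actual code: `TaskTimeout`, `_on_alarm`, and `run_with_alarm` are hypothetical names used only for illustration. The sketch assumes a Unix system and must run in the main thread.

```python
import signal
import time


class TaskTimeout(Exception):
    """Hypothetical exception raised when the alarm fires."""


def _on_alarm(signum, frame):
    # Runs in the main thread once the interpreter regains control.
    raise TaskTimeout("task exceeded its timeout")


def run_with_alarm(func, timeout_seconds):
    """Run func() and raise TaskTimeout if it runs longer than timeout_seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_seconds)
    try:
        return func()
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

The weakness of this pattern is that enforcement depends on the task process itself: Python runs signal handlers only in the main thread, between bytecode instructions, so code stuck in a long-running C call may not react to the alarm until that call returns. Whether that is the exact failure behind the linked issues is not stated in this thread.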

Solution

This PR moves timeout enforcement from the task process to the supervisor:

  • Task process sends SetTaskExecutionTimeout message to supervisor before execution
  • Supervisor monitors task execution time using monotonic clock
  • Supervisor terminates task with SIGTERM when timeout is reached
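The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `timeout_seconds` field, the `TimeoutTracker` class, and the `supervise` loop are hypothetical, and the real message would arrive over the supervisor's IPC channel rather than through a direct method call.

```python
import signal
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class SetTaskExecutionTimeout:
    """Hypothetical message shape; the PR's actual fields are not shown in this thread."""

    timeout_seconds: float


class TimeoutTracker:
    """Tracks a deadline on the monotonic clock (unaffected by wall-clock changes)."""

    def __init__(self):
        self._deadline = None

    def handle(self, msg):
        # Record the deadline when the task announces its timeout.
        self._deadline = time.monotonic() + msg.timeout_seconds

    def expired(self, now=None):
        if self._deadline is None:
            return False
        now = time.monotonic() if now is None else now
        return now >= self._deadline


def supervise(proc, tracker, poll_interval=0.05):
    """Wait for proc; send SIGTERM if the deadline passes. Returns True on timeout."""
    while proc.poll() is None:
        if tracker.expired():
            proc.send_signal(signal.SIGTERM)
            proc.wait()
            return True
        time.sleep(poll_interval)
    return False
```

A caller would start the task as a child process, pass the received message to `tracker.handle(...)`, and then run `supervise(child, tracker)`. Because the check runs in the supervisor, it does not depend on the task process being able to service a signal handler.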

Changes

  • Added SetTaskExecutionTimeout message to IPC protocol
  • Added timeout tracking and checking in supervisor
  • Modified task runner to send timeout to supervisor


@boring-cyborg

boring-cyborg bot commented Nov 3, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@LeonardoIshida LeonardoIshida force-pushed the handle-timeouts branch 2 times, most recently from e4e0db4 to 20d35b8 Compare November 5, 2025 00:15
@LeonardoIshida LeonardoIshida force-pushed the handle-timeouts branch 2 times, most recently from f1c0858 to e4e9af9 Compare November 10, 2025 00:48
@LeonardoIshida LeonardoIshida requested a review from ashb November 10, 2025 16:19
@LeonardoIshida LeonardoIshida force-pushed the handle-timeouts branch 2 times, most recently from 70d47bd to 47b6145 Compare November 18, 2025 00:13
@LeonardoIshida LeonardoIshida requested a review from ashb November 18, 2025 00:14
@potiuk
Member

potiuk commented Nov 24, 2025

Hey @LeonardoIshida - are you going to continue working on it?

@potiuk
Member

potiuk commented Nov 24, 2025

There are some test failures that need at least a rebase, if not fixing.

@LeonardoIshida
Author

> Hey @LeonardoIshida - are you going to continue working on it?

Hi, @potiuk, I am just waiting for a new review. And I will be working on this until it gets merged.
About the tests, rebasing fixed it.

@potiuk
Member

potiuk commented Nov 24, 2025

I just ran your workflows (as a first time contributor your workflows need to be approved) - so it is yet to be seen if they are fixed. Generally getting the PR green is often a prerequisite for someone to take a look at it. And if you see failing tests - attempt to fix them (and as a first time contributor ping in the PR when you think you did - and rebased - so that the workflows can be run, rather than wait

@LeonardoIshida
Author

My bad, I did not know about that. I see that some tests are failing, so I am going to take a look at those. Thanks for the help, @potiuk.

@LeonardoIshida
Author

Hi, can someone approve my workflow again, please?

@potiuk
Member

potiuk commented Nov 25, 2025

did

@LeonardoIshida
Author

Hey guys, my final exams are kicking in and I am afraid I will not be able to continue working on this PR and issue.
I wanted to thank you guys for the support and patience.
Also, I want to apologize for not being able to solve this issue, I am sorry.

Development

Successfully merging this pull request may close these issues.

  • DagBag Processing SIGSEGV causes runaway tasks with LocalExecutor
  • Handle task timeouts (execution_timeout) at supervisor

4 participants