Ensure teardown tasks are executed when DAG run is set to failed #45530
shahar1
left a comment
Nice catch!
One edge case I thought about, though*: what if the corresponding setup task (if one exists) hasn't finished running yet?
For example, if you set the DAG run to failed before the cluster was created, delete_cluster should be skipped.
* Maybe the current architecture already handles this, but it's worth checking before merging this PR.
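To make the edge case concrete, here is a minimal plain-Python sketch (not Airflow code) of the rule that a teardown should only run when its setup succeeded. Airflow's teardown tasks default to the ALL_DONE_SETUP_SUCCESS trigger rule; the function name `teardown_decision` and the state strings below are hypothetical and chosen only for illustration. Whether the code path changed in this PR actually reaches that rule in the scenario above is exactly what would need checking.

```python
# Hypothetical model of "run teardown only if its setup succeeded".
# This is an illustration, not Airflow's implementation.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def teardown_decision(setup_state, work_states):
    """Return 'run', 'skip', or 'wait' for a teardown task."""
    all_states = [setup_state, *work_states]
    # The teardown waits until every upstream task has reached a final state.
    if any(state not in FINISHED for state in all_states):
        return "wait"
    # Clean up only when there is something to clean up.
    return "run" if setup_state == "success" else "skip"
```

Under this model, a create_cluster setup that was failed before it completed would lead to delete_cluster being skipped, while a successful create_cluster followed by a failed work task would still trigger delete_cluster.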
I added some docs, since the behavior changes slightly, to ensure it is properly documented.
…o failed (#45530) * Ensure teardown tasks are executed when DAG run is set to failed * Also handle the case of setting DAG to success * Add some documentation to behavior changes * Add some documentation to behavior changes (cherry picked from commit 1e8977a) Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
…o failed (#45530) (#45581) * [v2-10-test] Ensure teardown tasks are executed when DAG run is set to failed (#45530) * Ensure teardown tasks are executed when DAG run is set to failed * Also handle the case of setting DAG to success * Add some documentation to behavior changes * Add some documentation to behavior changes (cherry picked from commit 1e8977a) Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> * Remove type hints only working in Airflow 3 --------- Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> Co-authored-by: Jens Scheffler <jens.scheffler@de.bosch.com>
…che#45530) * Ensure teardown tasks are executed when DAG run is set to failed * Also handle the case of setting DAG to success * Add some documentation to behavior changes * Add some documentation to behavior changes
I just noticed this could potentially be a breaking change, but it doesn't look like something that needs a migration rule 🤔 I'd like to confirm: do we want to have such a rule?
I'd say it's a bugfix. Not every behaviour change is a breaking change. Every bugfix is a behavioural change in essence; the important thing is what the feature's intention is, and whether the change is "fixing" things or "changing the intention of how things should work". SemVer is all about intentions and not about whether behaviour changes or not, see https://semver.org/. In this case it seems reasonable to assume that this behaviour was the intention of the "teardown" feature, so technically it's a bugfix (and it seems it has already been backported to 2.10). It might require a newsfragment but, IMHO, that's more than enough.
For sure it does not need migration rules. No code change is needed. I also thought a bit about whether it is breaking and asked in #random on Slack for feedback, but nobody except @dstandish responded. After some time passed, I concluded it is really in line with the intent behind Setup/Teardown. The behaviour change is "just" the side effect that the DAG is not immediately failed when you set it to failed, as the teardown tasks need to be scheduled first.
So @jscheffl, we are using 2.7.2 and we see an issue where sometimes, when a task fails (I am guessing because the task is killed, possibly due to a pod crash), the failure callback is not called. Could you please tell me whether this issue was also fixed, or should I raise a new one?
This was fixed in 2.10.5, see the milestone flag on the right side. The release notes also help here.
Thanks for that, @jscheffl. I saw the release notes, and they say teardown will work when the state is changed manually to failed. But I was wondering whether it also covers the case where the failure comes from a kill event during a pod crash, which generates a SIGTERM?
This sounds like a different use case. You can test it, and if you think it is a bug, please open a new issue, as it seems to be unrelated. But please test with 2.11.0 or the most recent 3.0.
Sure @jscheffl, sounds good. I will test it and report back.

Related to Slack topic: https://apache-airflow.slack.com/archives/CCR6P6JRL/p1736440079894049
We noticed that if a DAG run is set to failed, all tasks are set to either failed or skipped. But if teardown tasks are used in a DAG, they are not executed. This can leave infrastructure or external dependencies without proper clean-up.
This PR changes the behavior and does NOT fail/skip teardown tasks if a DAG run is set to failed.
As a side effect, the DAG run might NOT be in the failed state right after the call. If it were set to failed immediately, the teardown tasks (even though they are not skipped/failed) would no longer be scheduled.
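The behavior described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the code from this PR: the `Task` class, the `mark_run_failed` function, and the state strings are hypothetical, and the rule "running tasks become failed, not-yet-started tasks become skipped" is an assumption about how the failed/skipped split is made.

```python
from dataclasses import dataclass

# Hypothetical final states for this illustration.
FINISHED = {"success", "failed", "skipped"}


@dataclass
class Task:
    task_id: str
    state: str  # e.g. "none", "running", "success", "failed", "skipped"
    is_teardown: bool = False


def mark_run_failed(tasks):
    """Simulate setting a DAG run to failed while sparing teardown tasks.

    Returns the resulting run state: "running" while teardown tasks are
    still pending, otherwise "failed".
    """
    for task in tasks:
        # Teardown tasks and already finished tasks are left untouched.
        if task.is_teardown or task.state in FINISHED:
            continue
        # Assumption: running tasks are failed, everything else is skipped.
        task.state = "failed" if task.state == "running" else "skipped"
    pending_teardowns = [
        t for t in tasks if t.is_teardown and t.state not in FINISHED
    ]
    # The run cannot be failed yet, or the teardowns would never be scheduled.
    return "running" if pending_teardowns else "failed"


tasks = [
    Task("create_cluster", "success"),
    Task("run_job", "running"),
    Task("notify", "none"),
    Task("delete_cluster", "none", is_teardown=True),
]
run_state = mark_run_failed(tasks)
```

In this example the run stays in a non-failed state after the call because delete_cluster still has to be scheduled, which is the side effect noted above.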