-
Notifications
You must be signed in to change notification settings - Fork 16.8k
Fix deadlock when mapped task with removed upstream is rerun #26518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ephraimbuddy
merged 3 commits into
apache:main
from
astronomer:fix-dynamic-task-rerun-deadlock
Sep 21, 2022
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we compare map index with number of upstream successes? that seems odd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes now you mention it this feels like it's going to break in some other cases.
Like what if there is 1 mapped upstream which is in the failed state, one in the removed state, this would erroneously remove it I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ephraimbuddy Could you take another look at this PR/case please?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case,
successeswill be0, also failed=1, so the condition will not be reached and the taskinstance will be marked asupstream_failed. Same thing when we have skipped task instances. The condition to mark the task instance as removed will not be reached.The condition for the task to be marked
removedis if we have someremovedtask instances andsuccessfultask instances,no failed,no skippedand the task is mapped. So if we get here, if themap_indexof the task instance is >= all successful task instances, it means the task instance upstream is removed because indexes go from -1 upwards, it's not possible to remove map_index 1 and still have map_index 3?If we have 5 mapped tasks(0,1,2,3,4), and we remove 2, we will have 3 mapped tasks(0,1,2). If these 3 are successful,(successes=3), then the removed are those greater than or equal to the map index 3(3,4).
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a task has multiple upstreams?
[a, b] >> mapped_task(list_gen)for instance?Edit:
[a, b] >> mapped_task.map(list_gen)for instance?And a is success, b is failure, and list_gen is reduced to only returning a single item?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would apply
airflow/airflow/ti_deps/deps/trigger_rule_dep.py
Lines 165 to 166 in 0c7b4cb
Line 169 is only satisfied if we have
removed,successes,nofailed, noskippedand mapped taskThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh good. This is important enough functionality (it's the very core of Airflow) that we should add atest cases covering things like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like it's covered here:
airflow/tests/models/test_taskinstance.py
Lines 1065 to 1216 in 7d6d182
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is removing this part altogether. It's not part of the deadlock issue but I feel that it's good to have stuff.
My reason is this:
If at first run upstream was 3 and downstream was 3 too. Upstream created the downstream. We have 3 -> 3 successes.
Then we reduce upstream to 2, meaning one task is removed and we clear and rerun the dag, without this part of the change, we will end up running all 3 of the downstreams: upstream (2 successful, 1 removed). Downstream(3 successful)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not this one specific line, they are all like this and that's the worry.
The test you highlighted doesn't use mapped tasks so I don't think it covers the case I highlighted. Edit: sorry, original example didn't have a map. Added that.