Fix waiting setup tasks when they are not a direct upstream by hussein-awala · Pull Request #33570 · apache/airflow

hussein-awala · 2023-08-20T20:26:25Z

This PR forces the TI to wait for all upstream setup tasks, even if they are not direct upstreams. Please check #33561 for more details.


dstandish · 2023-08-24T16:54:20Z

good catch. bummer we missed this one. will take a look here.

dstandish · 2023-08-24T17:04:29Z

airflow/ti_deps/deps/trigger_rule_dep.py


        upstream_done = done >= upstream
+        setup_done = (success_setup + skipped_setup + failed_setup) >= upstream_setup
+        is_tear_down = task.is_teardown or trigger_rule == TR.ALL_DONE_SETUP_SUCCESS


i don't think you need to check both is_teardown and trigger rule.
it should be sufficient to check the trigger rule.
and with that, it may be unnecessary to create this variable

dstandish · 2023-08-24T17:10:02Z

@hussein-awala i noticed there are some failing tests. is this ready for review or are you still working through some edge cases?

hussein-awala · 2023-08-24T20:52:17Z

It was supposed to be ready (I tested it and it worked perfectly as expected), but there seems to be a problem with the mapped operators (which I didn't test), and I didn't have time to check it. I'll resume working on it this weekend, and I'll draft it in the meantime.

dstandish · 2023-08-24T21:39:12Z

Cool, so one thing @hussein-awala ... I think in order to ensure consistent behavior, we must also add something like this to the validate_setup_teardown function:

            if task.is_setup:
                for down_task in task.downstream_list:
                    if not down_task.is_teardown and down_task.trigger_rule not in one_success_rules:
                        # this is required to ensure consistent clearing behavior when upstream
                        raise ValueError(
                            "Setup tasks must be followed with trigger rule ALL_SUCCESS or ONE_SUCCESS."
                        )

or perhaps better, do the check when setting upstream / downstream. The reason is that what you are doing (and necessarily so) is essentially checking that upstream setups are done and successful before a downstream task that is not directly connected may run. BUT, if a user puts a trigger rule such as one_failed on a task following a setup, this can create odd results: the task is supposed to run only when the setup fails, but when we clear that task and run it again, the setup succeeds, so the task should not run. It's very odd and nonsensical, but we should guard against it for consistency. If it's not clear I can try to create an example. We could perhaps relax this a bit to allow other trigger rules for work tasks not inside the scope of the setup.

Other point, on the topic of the scope of the setup: I think your logic for get_setups_only may be incorrect because it gets, I think, all upstream setups. But note that the scope of a setup can be constrained by adding a teardown, which means that anything that is not between the setup and its teardown is not assumed to require that setup. I think the easiest way to obtain just the "relevant" setups is to use get_upstreams_only_setups_and_teardowns and filter out the teardowns. This method restricts to just the setups and teardowns for which the task in question is "in scope".
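To make the "scope" idea concrete, here is a minimal sketch on a toy graph. The `Task` class, `_walk`, and `relevant_setups` are hypothetical stand-ins written for illustration; they are not Airflow's classes, and this is not how get_upstreams_only_setups_and_teardowns is implemented. The assumption is simply that a setup stays in scope for a task when the setup has no teardown, or when one of its teardowns is still downstream of that task.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for an operator; not the Airflow class."""

    task_id: str
    is_setup: bool = False
    is_teardown: bool = False
    downstream: list = field(default_factory=list, repr=False)
    upstream: list = field(default_factory=list, repr=False)

    def __rshift__(self, other):
        # Mimic the `a >> b` dependency syntax on the toy class.
        self.downstream.append(other)
        other.upstream.append(self)
        return other


def _walk(task, attr):
    """Yield every task reachable from `task` along `attr` edges, once each."""
    seen = set()
    stack = list(getattr(task, attr))
    while stack:
        current = stack.pop()
        if current.task_id in seen:
            continue
        seen.add(current.task_id)
        yield current
        stack.extend(getattr(current, attr))


def relevant_setups(task):
    """Return ids of upstream setups whose scope still covers `task`."""
    teardowns_ahead = {t.task_id for t in _walk(task, "downstream") if t.is_teardown}
    result = []
    for up in _walk(task, "upstream"):
        if not up.is_setup:
            continue
        own_teardowns = {t.task_id for t in up.downstream if t.is_teardown}
        # No teardown means unbounded scope; otherwise a teardown must lie ahead.
        if not own_teardowns or own_teardowns & teardowns_ahead:
            result.append(up.task_id)
    return sorted(result)


setup = Task("setup", is_setup=True)
work_inside = Task("work_inside")
teardown = Task("teardown", is_teardown=True)
work_after = Task("work_after")

setup >> work_inside >> teardown >> work_after
setup >> teardown

in_scope = relevant_setups(work_inside)
out_of_scope = relevant_setups(work_after)
```

In this toy graph, `work_inside` sits between the setup and its teardown, so the setup is relevant to it; `work_after` runs after the teardown, so it is not assumed to need the setup even though the setup is upstream of it.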

dstandish · 2023-08-26T15:26:51Z

i'm gonna collaborate on this with you if that's alright since time is short.
i will rebase and add a couple of changes, which of course you can revert if you have concerns. i'll try to make it obvious what each commit does.

dstandish · 2023-08-26T15:27:59Z

oh darn cannot rebase let's see if i can just add

dstandish · 2023-08-26T15:31:37Z

tests/models/test_taskinstance.py

+            ),
        ],
    )
    def test_check_task_dependencies(


i think that for some of these tests, where you've made substantial changes, we should just make a new test.

it's very important that we're confident that we're not breaking anything here, so rather than make such significant changes to the test, i think it's preferable to add a new one alongside. i'll try to do this right now.

It was my first thought, but I tried to avoid duplicating the code. But yeah +1

dstandish · 2023-08-26T15:42:11Z

tests/models/test_taskinstance.py

    #   successes, skipped, failed, upstream_failed, removed, done
    @pytest.mark.parametrize(
-        "trigger_rule, upstream_setups, upstream_states, flag_upstream_failed, expect_state, expect_passed",
+        "trigger_rule, direct_upstream_setups, indirect_upstream_setups, upstream_states,"


there are so many changes to tests it makes it a bit hard to review.

dstandish · 2023-08-26T15:46:15Z

airflow/ti_deps/deps/trigger_rule_dep.py

    done: int
    success_setup: int
    skipped_setup: int
+    failed_setup: int


until now, everything in this class has covered direct upstreams only. is failed_setup direct only or does it include indirect too? if it includes indirect too, that should probably be made clear through a more precise variable name. but perhaps better would be to avoid mixing direct and indirect in the same class if possible. perhaps we can just add the information through an optional argument in calculate or something. this would also make the diff easier to deal with.

the other issue is, there are more states than just "failed" that we need to account for. there is also upstream_failed, skipped, etc -- essentially anything other than success.

is failed_setup direct only or does it include indirect too?

yes it includes indirect too. The same for success_setup and skipped_setup

For me there are 4 different cases:

running (all the unfinished states) -> we should wait

success -> we can run the task

failed (failed or upstream_failed) -> fail

skipped with any fail -> we should skip

So failed here includes failed and upstream_failed:

        return _UpstreamTIStates(
            success=counter.get(TaskInstanceState.SUCCESS, 0),
            skipped=counter.get(TaskInstanceState.SKIPPED, 0),
            failed=counter.get(TaskInstanceState.FAILED, 0),
            upstream_failed=counter.get(TaskInstanceState.UPSTREAM_FAILED, 0),
            removed=counter.get(TaskInstanceState.REMOVED, 0),
            done=sum(counter.values()),
            success_setup=setup_counter.get(TaskInstanceState.SUCCESS, 0),
            skipped_setup=setup_counter.get(TaskInstanceState.SKIPPED, 0),
            failed_setup=setup_counter.get(TaskInstanceState.FAILED, 0)
            + setup_counter.get(TaskInstanceState.UPSTREAM_FAILED, 0),
        )
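The four cases listed above can be sketched as a small decision function. This is only an illustration: `setup_verdict` and the plain-string states are hypothetical, every non-terminal state is collapsed into a single `"unfinished"` marker, and the "skipped" case is read here as "some setup was skipped and none failed", which is an assumption about the intent.

```python
from collections import Counter

UNFINISHED = "unfinished"  # stand-in for every non-terminal state


def setup_verdict(setup_states):
    """Map the states of all relevant upstream setups to one decision."""
    counts = Counter(setup_states)
    if counts[UNFINISHED]:
        return "wait"  # at least one setup has not finished yet
    if counts["failed"] or counts["upstream_failed"]:
        return "fail"  # "failed" covers failed and upstream_failed
    if counts["skipped"]:
        return "skip"  # assumption: skipped setups, none failed
    return "run"  # every setup succeeded, or there are none


verdicts = {
    "still running": setup_verdict(["success", UNFINISHED]),
    "one failed": setup_verdict(["success", "failed"]),
    "one skipped": setup_verdict(["success", "skipped"]),
    "all good": setup_verdict(["success", "success"]),
}
```

The order of the checks matters: waiting takes priority over failing, so a failure is only reported once every setup has reached a terminal state.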

dstandish · 2023-08-26T15:50:36Z

ok yeah that worked... ok so yeah, pushed a few changes, mainly the only changes i made were with respect to making the diff easier to follow. added a few comments. will continue to review.

dstandish · 2023-08-26T15:52:26Z

airflow/ti_deps/deps/trigger_rule_dep.py

            return
-        if ti.task.trigger_rule == TR.ALWAYS:
+        if ti.task.trigger_rule == TR.ALWAYS and not setup_upstream_tasks:
+            # even with ALWAYS trigger rule, we still need to check setup tasks


i think always should probably keep the same behavior. the docs say "No dependencies at all, run this task at any time." it used to be called the "dummy" trigger rule. so i think we just keep the absolute short circuit here.

it's really a nonsensical rule. with it, the deps are just for show. no one should ever really use this trigger rule. perhaps it's there for testing 🤷
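A toy sketch of what "absolute short circuit" means here. The generator below is hypothetical and far simpler than the real dep class; the rule names are plain strings, and `upstream_setups_done` is an assumed flag standing in for the setup check discussed in this PR.

```python
def dep_statuses(trigger_rule, upstream_setups_done):
    """Yield (passed, reason) pairs for one task instance (toy version)."""
    if trigger_rule == "always":
        # Absolute short circuit: nothing about upstreams is inspected,
        # not even whether upstream setup tasks have finished.
        yield True, "The task had an always trigger rule set."
        return
    if not upstream_setups_done:
        yield False, "Waiting for upstream setup tasks."
        return
    yield True, "Upstream setup tasks are done."


always_result = list(dep_statuses("always", upstream_setups_done=False))
other_result = list(dep_statuses("all_success", upstream_setups_done=False))
```

With the short circuit kept in place, a task using the always rule passes even while its setups are still running, which matches the documented "no dependencies at all" behavior.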

dstandish · 2023-08-26T15:54:22Z

airflow/ti_deps/deps/trigger_rule_dep.py

            yield self._passing_status(reason="The task had a always trigger rule set.")
            return
-        yield from self._evaluate_trigger_rule(ti=ti, dep_context=dep_context, session=session)
+        yield from self._evaluate_trigger_rule(


given that there's no need to evaluate whether there are upstream setups for the purpose of TR.ALWAYS, i think we no longer need this new param setup_upstream_tasks in _evaluate_trigger_rule


uranusjr · 2023-08-30T08:53:18Z

airflow/models/dag.py

+                for down_task in task.downstream_list:
+                    if not down_task.is_teardown and down_task.trigger_rule != TriggerRule.ALL_SUCCESS:
+                        # this is required to ensure consistent clearing behavior when upstream
+                        raise ValueError("Setup tasks must be followed with trigger rule ALL_SUCCESS.")


Since we’ve drilled this deep, can this show what the offending task is for clarity?
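One way the error could name the offending task, as a minimal sketch. The `Task` dataclass and `validate_setup_downstream` are hypothetical stand-ins for illustration, and the message wording is only a suggestion.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for an operator; not the Airflow class."""

    task_id: str
    trigger_rule: str = "all_success"
    is_setup: bool = False
    is_teardown: bool = False
    downstream_list: list = field(default_factory=list, repr=False)


def validate_setup_downstream(tasks):
    """Raise if a non-teardown task right after a setup is not ALL_SUCCESS."""
    for task in tasks:
        if not task.is_setup:
            continue
        for down_task in task.downstream_list:
            if not down_task.is_teardown and down_task.trigger_rule != "all_success":
                raise ValueError(
                    f"Setup task {task.task_id!r} must be followed with trigger rule "
                    f"ALL_SUCCESS, but downstream task {down_task.task_id!r} uses "
                    f"{down_task.trigger_rule!r}."
                )


work = Task("work", trigger_rule="one_failed")
create_cluster = Task("create_cluster", is_setup=True, downstream_list=[work])

try:
    validate_setup_downstream([create_cluster, work])
    message = ""
except ValueError as err:
    message = str(err)
```

Including both task ids and the actual trigger rule lets the DAG author find the problem without reading the validation code.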

uranusjr · 2023-08-30T08:55:16Z

airflow/ti_deps/deps/trigger_rule_dep.py

+@dataclass
+class _UpstreamTIStates:


Is this just to have the default? Dataclass is pretty significantly slower and not really worthwhile here since the class is unpacked pretty much immediately. Better to stick to a named tuple.
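For reference, a named tuple can carry defaults too, so a default alone does not require a dataclass. The sketch below uses a hypothetical, shortened field list rather than the real `_UpstreamTIStates` definition.

```python
from typing import NamedTuple


class UpstreamStates(NamedTuple):
    """Shortened, hypothetical version of the states tuple."""

    success: int
    skipped: int
    failed: int
    done: int
    failed_setup: int = 0  # fields with defaults must come last


states = UpstreamStates(success=2, skipped=0, failed=1, done=3)

# The tuple is unpacked right away, as the review comment describes.
success, skipped, failed, done, failed_setup = states
```

Because the instance is an ordinary tuple, unpacking works positionally, and existing call sites that omit `failed_setup` keep working.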

uranusjr · 2023-08-30T09:08:57Z

airflow/ti_deps/deps/trigger_rule_dep.py

+        if ti.task.is_teardown:
+            setup_upstream_tasks = [task for task in ti.task.upstream_list if task.is_setup]
+        else:
+            setup_upstream_tasks = list(ti.task.get_upstreams_only_setups())


Suggested change

-        if ti.task.is_teardown:
-            setup_upstream_tasks = [task for task in ti.task.upstream_list if task.is_setup]
-        else:
-            setup_upstream_tasks = list(ti.task.get_upstreams_only_setups())
+        if ti.task.is_teardown:
+            setup_upstream_tasks = (task for task in ti.task.upstream_list if task.is_setup)
+        else:
+            setup_upstream_tasks = ti.task.get_upstreams_only_setups()

No need to build a list here as far as I can tell
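A generator works as long as the result is consumed exactly once and never tested for emptiness. A small sketch with plain dicts as hypothetical stand-ins for tasks shows the two behaviors worth keeping in mind before dropping the list:

```python
tasks = [
    {"task_id": "setup", "is_setup": True},
    {"task_id": "work", "is_setup": False},
]

# Lazy: nothing is filtered until the generator is iterated.
setups = (task for task in tasks if task["is_setup"])

first_pass = [task["task_id"] for task in setups]
second_pass = [task["task_id"] for task in setups]  # generator is already exhausted

# A generator object is always truthy, even when it would yield nothing.
empty = (task for task in tasks if task["task_id"] == "missing")
empty_is_truthy = bool(empty)
```

So the suggestion is safe only if the caller iterates `setup_upstream_tasks` a single time and does not rely on `if setup_upstream_tasks:` or `len()`.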

uranusjr · 2023-08-30T09:11:54Z

airflow/ti_deps/deps/trigger_rule_dep.py

+            elif not upstream_setup or setup_done:
+                # if there are no upstream setup tasks or all of them are done,
+                # and we haven't set a new state, then we can check the upstream tasks


I’d do

            elif upstream_setup and not setup_done:
                pass

and dedent the entire block below. (Not sure if I got the boolean negation right)
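The negation can be checked mechanically. By De Morgan's law, the opposite of `not upstream_setup or setup_done` is `upstream_setup and not setup_done`; the sketch below enumerates the combinations, assuming `upstream_setup` is a count (0 or positive) and `setup_done` is a boolean as in the diff.

```python
from itertools import product

checks = []
for upstream_setup, setup_done in product([0, 2], [False, True]):
    original = not upstream_setup or setup_done
    suggested_skip = bool(upstream_setup) and not setup_done
    # The suggested branch should fire exactly when the original one does not.
    checks.append(suggested_skip == (not original))

negation_is_right = all(checks)
```

All four combinations agree, so the inverted condition with `pass` followed by a dedented block behaves the same as the original branch.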

hussein-awala · 2023-09-10T10:36:55Z

Fixed by an alternative solution in #33903

Fix waiting setup tasks when they are not a direct upstream

b7833db

hussein-awala added the type:bug-fix Changelog: Bug Fixes label Aug 20, 2023

hussein-awala requested a review from dstandish August 20, 2023 20:26

hussein-awala requested a review from uranusjr as a code owner August 20, 2023 20:26

eladkal added this to the Airflow 2.7.1 milestone Aug 23, 2023

dstandish reviewed Aug 24, 2023

hussein-awala marked this pull request as draft August 24, 2023 20:52

hussein-awala added 2 commits August 26, 2023 01:25

- Fix a bug with mapped tasks

7ea522c

- update the method which find all upstream setup tasks - update some code according to code review

Merge branch 'main' into fix/setup_task_deps

b53a84d

hussein-awala added and then removed the "use public runners" label (makes sure that public runners are used even if committers create the PR; useful for testing) Aug 25, 2023

Fix a bug and failed tests

aed786a

hussein-awala marked this pull request as ready for review August 26, 2023 00:38

hussein-awala requested review from XD-DENG, ashb and kaxil as code owners August 26, 2023 00:38

dstandish added 2 commits August 26, 2023 08:28

reduce diffs by making setup_upstream_tasks optional

5ea28b6

if a task has no upstreams, there cannot be an upstream setup task, e…

4d7059c

…ven indirect

dstandish reviewed Aug 26, 2023

dstandish added 3 commits August 26, 2023 08:34

add indirect upstream tests as separate test

b7a4c61

Merge branch 'main' into fix/setup_task_deps

29412bc

remove new tests from existing test

45d5ae2

dstandish reviewed Aug 26, 2023

dstandish and others added 6 commits August 26, 2023 09:07

remove setup_upstream_tasks from evaluate_trigger_rule and restore TR…

8308a40

….ALWAYS behavior

remove comment

4101b25

fix trigger_rule_dep test

c85cd23

Split upstream and setup upstream in two lists to fix a bug in trigge…

f9262ec

…r rule dep check

Make direct setup task respect trigger rule

d01e802

Check setup tasks before the direct upstream tasks

e8cddfb

dstandish reviewed Aug 27, 2023

dstandish reviewed Aug 27, 2023

dstandish and others added 4 commits August 27, 2023 14:22

revert some changes to tests for easier review

32add9e

Apply suggestions from code review

8e21572

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>

fix unit tests and static checks

9695093

remove comment

2591dd3

ephraimbuddy modified the milestones: Airflow 2.7.1, Airflow 2.7.2 Aug 28, 2023

hussein-awala and others added 5 commits August 28, 2023 19:36

calculate number of setup tasks which we should wait for from TIs ins…

999dd99

…tead of tasks list

small simplification

9eddecb

simplify by using existing structure and convert to set

6cb5e5e

Optimise the perf by getting the count of the relevant TI instead of …

f6e3d56

…getting all tasks

Check if the new method fixes the tests

831933f

uranusjr reviewed Aug 30, 2023

dstandish mentioned this pull request Aug 30, 2023

Ensure that tasks wait for running indirect setup #33903

Merged

hussein-awala closed this Sep 10, 2023

ephraimbuddy removed this from the Airflow 2.7.2 milestone Oct 7, 2023
