[AIRFLOW-198] Implement 'only_run_latest' feature in BaseOperator #1562

PeterAttardo · 2016-05-31T20:22:58Z

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

https://issues.apache.org/jira/browse/AIRFLOW-198

criccomini · 2016-06-01T15:24:49Z

airflow/models.py

+        ).first()
+        if ti:
+            if self.state in State.runnable():
+                self.set_state(State.FUTURE_SUCCEEDED, session)


Having an is_ method change state is a little unintuitive. @bolkedebruin just did some scheduler work to clean up this kind of pattern. Can you split the is_ method from the set call, so that the state change is explicit rather than happening inside what appears to be a read method?

Agreed that it's not the cleanest, but I'm not sure how easily it can be extracted. It behaves very similarly to evaluate_trigger_rule(), which gets called from are_dependencies_met() in the same chain. Is there a work-in-progress branch anywhere that I could reference, showing how the state assignments in evaluate_trigger_rule() were brought out of the is_ stack? The other change I could make would be to add a flag similar to flag_upstream_failed, so that state would only change if the original caller explicitly passed the flag enabling it.

You can also just change the name of the method to reflect what is taking place.
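The separation requested in this thread can be sketched roughly as follows. This is only an illustration, not the PR's code: `TaskInstanceSketch`, `is_superseded`, `mark_future_succeeded`, `resolve`, and the `RUNNABLE` set are hypothetical names, and the state strings are assumptions rather than Airflow's actual `State` values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for Airflow's State constants (assumed values).
RUNNABLE = {None, "queued", "up_for_retry"}
FUTURE_SUCCEEDED = "future_succeeded"


@dataclass
class TaskInstanceSketch:
    """Hypothetical, simplified task instance used only for illustration."""
    state: Optional[str] = None

    def is_superseded(self, later_success_exists: bool) -> bool:
        # Pure read: reports the condition and never mutates state.
        return later_success_exists and self.state in RUNNABLE

    def mark_future_succeeded(self) -> None:
        # Explicit write: the only place where the state changes.
        self.state = FUTURE_SUCCEEDED


def resolve(ti: TaskInstanceSketch, later_success_exists: bool) -> bool:
    """Caller combines the check and the write, so the mutation is visible."""
    if ti.is_superseded(later_success_exists):
        ti.mark_future_succeeded()
        return True
    return False
```

With this shape, calling `is_superseded` repeatedly has no side effects, and the state only changes where the caller explicitly asks for it.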

criccomini · 2016-06-01T15:34:42Z

One thing I notice about this PR is that, when multiple DAG runs are pending, it doesn't appear to try to execute the most recent execution_date first. For example, if you have 10 DAG runs that all need to be run, and only_run_latest=True, then it makes sense to run the most recent one first and skip the other 9.

@bolkedebruin @jlowin @mistercrunch Do we want to introduce a new state for this (FUTURE_SUCCEEDED) or just use SKIPPED?

This PR should include some tests as well.
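The ordering suggested in this comment (run the most recent pending run, skip the rest) could look something like the sketch below. It is a standalone illustration under assumed names: `plan_latest_only` is hypothetical and does not correspond to any function in Airflow's scheduler.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def plan_latest_only(
    pending: List[datetime],
) -> Tuple[Optional[datetime], List[datetime]]:
    """Split pending execution dates into (date to run, dates to skip)."""
    if not pending:
        return None, []
    # Newest first: the head is the run to execute, the tail is skipped.
    ordered = sorted(pending, reverse=True)
    return ordered[0], ordered[1:]


# Ten pending daily runs, as in the example from the comment.
pending_runs = [datetime(2016, 6, 1) + timedelta(days=i) for i in range(10)]
to_run, to_skip = plan_latest_only(pending_runs)
```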

PeterAttardo · 2016-06-01T18:55:28Z

I considered the issue to be two related issues:

1. For any task B that depends on task A (marked as only_run_latest), the dependencies for an instance of B should be met if any instance of A has succeeded on or after the execution date.
2. Avoid scheduling multiple instances of a task that has been marked as only_run_latest, and prioritize the most recent execution date.

This PR only addresses the first of those. The second piece raised many more architectural questions and potential changes to the way Airflow schedules jobs. I didn't want to commit to a potentially large architectural change before seeing how the maintainers were thinking about designing for this feature.
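The first of the two issues, the dependency rule for a downstream task B, amounts to a check along these lines. This is a minimal sketch under assumptions: `upstream_satisfied` is a hypothetical helper, and the real PR expresses the condition as a database query rather than an in-memory scan.

```python
from datetime import datetime
from typing import Iterable


def upstream_satisfied(
    successful_dates: Iterable[datetime], execution_date: datetime
) -> bool:
    """True if any successful run of upstream task A is on or after execution_date."""
    return any(d >= execution_date for d in successful_dates)


# A succeeded once, for the 2016-06-03 execution date.
a_successes = [datetime(2016, 6, 3)]
```

Under this rule, instances of B dated 2016-06-03 or earlier would see their dependency on A as met, while an instance dated 2016-06-04 would still wait.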

mistercrunch · 2016-06-02T06:21:31Z

airflow/models.py

            TaskInstance.task_id.in_(task.downstream_task_ids),
            TaskInstance.execution_date == self.execution_date,
-            TaskInstance.state == State.SUCCESS,
+            TaskInstance.state == State.SUCCESS or TaskInstance.state == State.FUTURE_SUCCEEDED,


TaskInstance.state.in_((State.SUCCESS, State.FUTURE_SUCCEEDED))
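The reason for this suggestion is that Python evaluates `or` itself and returns one of its operands, so the disjunction never becomes part of the generated SQL. The toy classes below are not SQLAlchemy; `Expr` and `Column` are hypothetical stand-ins that only mimic the idea of comparisons building expression objects.

```python
class Expr:
    """Toy stand-in for a SQL expression object (not SQLAlchemy's class)."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __repr__(self) -> str:
        return self.sql


class Column:
    """Toy column whose comparisons build expressions instead of booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        return Expr(f"{self.name} = {other!r}")

    def in_(self, values) -> Expr:
        rendered = ", ".join(repr(v) for v in values)
        return Expr(f"{self.name} IN ({rendered})")


state = Column("state")

# Python's `or` returns a single operand; only one comparison survives.
with_or = (state == "success") or (state == "future_succeeded")

# `in_` builds one expression that covers both states.
with_in = state.in_(("success", "future_succeeded"))
```

In this toy the first comparison survives because plain objects are truthy. Which operand a real library keeps depends on how it defines truthiness for its expressions, but in either case only one of the two conditions reaches the query.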

mistercrunch · 2016-06-02T06:23:37Z

Before I go further in the review, I think this may conflict with #1525. @aoen?

aoen · 2016-06-02T06:39:41Z

Yes, it conflicts; thanks for finding this, Max. Under the new model (see my PR that Max linked, #1525) you would create a dependency class for future succeeded.

PeterAttardo · 2016-06-02T17:25:53Z

#1525 looks like it would allow this functionality to slot in nicely. I can circle back to this issue once #1525 has been merged and approach it in the new idiomatic way.

r39132 · 2016-09-28T00:25:56Z

This has been resolved by adding a LatestOnlyOperator in #1752.

Implement 'only_run_latest' feature in BaseOperator

aed7744

criccomini reviewed Jun 1, 2016

PeterAttardo added 2 commits June 1, 2016 12:54

Logging and docstring

5adc7fc

Merge remote-tracking branch 'upstream/master'

e051eb6

mistercrunch reviewed Jun 2, 2016

Fix query syntax

9862e1d

asfgit closed this in 5a8a448 Sep 28, 2016

alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017

closes apache#1562 *fixed by another pr*

383b533
