Skip to content

Conversation

@PeterAttardo
Copy link

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

).first()
if ti:
if self.state in State.runnable():
self.set_state(State.FUTURE_SUCCEEDED, session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an is_ method change state is a little unintuitive. @bolkedebruin just did some scheduler work to clean up this kind of pattern. Can you split the is_ method from the set call, so that change state is explicit, rather than done via what appears to be a read method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's not the cleanest, but I'm not sure how easily it can be extracted. It behaves very similarly to evaluate_trigger_rule() which gets called from are_dependencies_met() in the same chain. Is there a work in progress branch anywhere that shows how state assignments in evaluate_trigger_rule() were brought out of the is_ stack that I could reference? The other change I could make would be to add a flag similar to flag_upstream_failed so that state would only change if the original caller explicitly passed through the flag enabling it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also just change the name of the method to reflect what is taking place

@criccomini
Copy link
Contributor

One thing that I notice about this PR is that it doesn't appear to try to execute the most recent execution_date first in a case where there are multiple DAG runs that need to be run. For example, if you have 10 DAG runs that all need to be run, and only_run_latest=True, then it makes sense to run the most recent one first and skip the other 9.

@bolkedebruin @jlowin @mistercrunch Do we want to introduce a new state for this (FUTURE_SUCCEEDED) or just use SKIPPED?

This PR should include some tests as well.

@PeterAttardo
Copy link
Author

I considered the issue to be two related issues:

  1. For any task B, which depends on task A (marked as only_run_latest), the dependencies for an instance of B should be met if any instance of A has succeeded on or after the execution date.
  2. Avoid scheduling multiple instances of a task that has been marked as only_run_latest and prioritize the most recent execution date.

This PR only addresses the first of those. The second piece raised many more architectural questions and potential changes to the way Airflow schedules jobs. I didn't want to commit to a potentially large architectural change before seeing how the maintainers were thinking about designing for this feature.

TaskInstance.task_id.in_(task.downstream_task_ids),
TaskInstance.execution_date == self.execution_date,
TaskInstance.state == State.SUCCESS,
TaskInstance.state == State.SUCCESS or TaskInstance.state == State.FUTURE_SUCCEEDED,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskInstance.state.in_((State.SUCCESS, State.FUTURE_SUCCEEDED))

@mistercrunch
Copy link
Member

Before I go further in the review, I think this may conflict with #1525. @aoen?

@aoen
Copy link
Contributor

aoen commented Jun 2, 2016

Yes it conflicts, thanks for finding this Max. Under the new model (you can see in my PR that max linked #1525 ) you would create a dependency class for future succeeded.

@PeterAttardo
Copy link
Author

#1525 looks like it would allow this functionality to slot in nicely. I can circle back to this issue once #1525 has been merged and approach it in the new idiomatic way.

@r39132
Copy link
Contributor

r39132 commented Sep 28, 2016

This has been resolved by add a LatestOnlyOperator in #1752

@asfgit asfgit closed this in 5a8a448 Sep 28, 2016
alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants