AIRFLOW-124 Implement create_dagrun #1506

bolkedebruin · 2016-05-17T08:10:34Z

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

AIRFLOW-124

This PR forms the basis for the roadmap (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). It is one of the initial commits from master...bolkedebruin:AIRFLOW_SCHEDULER.

DAG.create_dagrun (please review: @aoen, @jlowin, @mistercrunch, @artwr )

This creates a DagRun from a DAG. It also creates the TaskInstances for the tasks known at instantiation time. Because the TaskInstances are created when the DagRun is instantiated, the deadlocks that were tested for will no longer occur (@jlowin, correct? Is a different test required?). For now, the visible consequence of having these TaskInstances already present is that they appear black in the tree view.
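To illustrate the idea of eager creation, here is a minimal sketch using toy classes. It is not the code in this PR: the `Dag`, `DagRun` and `TaskInstance` dataclasses, their fields, and the `"running"` default are simplified stand-ins assumed for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TaskInstance:
    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[str] = None  # None stands in for State.NONE


@dataclass
class DagRun:
    dag_id: str
    run_id: str
    execution_date: datetime
    state: str
    task_instances: List[TaskInstance] = field(default_factory=list)


@dataclass
class Dag:
    dag_id: str
    task_ids: List[str] = field(default_factory=list)

    def create_dagrun(self, run_id, execution_date, state="running"):
        """Create a run and, eagerly, one task instance per known task."""
        run = DagRun(self.dag_id, run_id, execution_date, state)
        for task_id in self.task_ids:
            # Each instance carries the dag_id of its run, so none is orphaned.
            run.task_instances.append(
                TaskInstance(self.dag_id, task_id, execution_date)
            )
        return run


dag = Dag("example", ["extract", "load"])
run = dag.create_dagrun("manual__1", datetime(2016, 5, 17))
```

In this sketch every task instance exists, with no state, as soon as the run exists, which mirrors why such instances would show up in the tree view before they have been scheduled.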

Tests in core.py were adjusted because they were supposedly creating a dagrun with tasks, while they were actually creating dagruns and orphaned TaskInstances (i.e. the dag_id did not match the dag_id of the dagrun). This was discussed with @artwr, who said these were remnants from the past.

While doing this I also fixed an issue in models.py where a dag was not set for a task when the task was added via dag.add_task (@aoen, @jlowin).

DagRun.find is a convenience function that returns the DagRuns for a given dag. It provides a single place for looking up dagruns.

DagRun.find does not cover all cases yet (e.g. multiple execution dates). My aim is to limit the ORM code needed in sub-functions that each implement something slightly differently, which creates compatibility issues.
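As a rough sketch of the "single place to find dagruns" idea, the function below builds one query from optional filters. It is an assumption-laden stand-in, not the PR's implementation: it uses the standard-library `sqlite3` module instead of the SQLAlchemy session, and the `dag_run` table layout is hypothetical. Only the filter names (`dag_id`, `run_id`, `state`, `external_trigger`) come from the signature under review.

```python
import sqlite3


def find(conn, dag_id, run_id=None, state=None, external_trigger=None):
    """Return rows for a dag, narrowed by whichever filters are given."""
    clauses = ["dag_id = ?"]
    params = [dag_id]
    if run_id is not None:
        clauses.append("run_id = ?")
        params.append(run_id)
    if state is not None:
        clauses.append("state = ?")
        params.append(state)
    if external_trigger is not None:
        clauses.append("external_trigger = ?")
        params.append(int(external_trigger))
    sql = (
        "SELECT dag_id, run_id, state, external_trigger FROM dag_run WHERE "
        + " AND ".join(clauses)
        + " ORDER BY run_id"
    )
    return conn.execute(sql, params).fetchall()


# Hypothetical in-memory table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run "
    "(dag_id TEXT, run_id TEXT, state TEXT, external_trigger INTEGER)"
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?, ?)",
    [
        ("example", "scheduled__1", "running", 0),
        ("example", "manual__1", "success", 1),
        ("other", "scheduled__1", "running", 0),
    ],
)
running = find(conn, "example", state="running")
```

Callers that need a different slice of runs pass a different filter rather than writing their own query, which is the compatibility benefit described above.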

criccomini · 2016-05-17T17:18:18Z

airflow/models.py

Should there only ever be one dag run with a given dag_id/execution_date/run_id? If so, I wonder whether we want to fail, or log grumpily, if we get more than one result back.

It refreshes itself, so no, we cannot have more than one. Maybe one() would be better.

criccomini · 2016-05-17T17:21:16Z

Couple of nits/questions, but LGTM overall. Take my opinion with a grain of salt, since my scheduler knowledge is unfortunately a bit lacking.

aoen · 2016-05-17T18:58:40Z

airflow/models.py

Nit: order of params is backwards (state should come before start_date), and conf/session are missing

This adds the create_dagrun function to DAG and the staticmethod DagRun.find. create_dagrun will create a dagrun including its tasks. By having taskinstances created at dagrun instantiation time, deadlocks that were tested for will not take place anymore. Tests have been adjusted accordingly. In addition, integrity has been improved by a bugfix to add_task of the BaseOperator that always assigns a Dag to a task if one is present. DagRun.find is a convenience function that returns the DagRuns for a given dag. It provides a single place for looking up dagruns.

jlowin · 2016-05-18T16:34:20Z

@bolkedebruin LGTM

aoen · 2016-05-18T17:38:39Z

LGTM after comment added

artwr · 2016-05-19T17:19:44Z

airflow/models.py

+
+    @staticmethod
+    @provide_session
+    def find(dag_id, run_id=None, state=None, external_trigger=None, session=None):


Thanks for encapsulating the logic by the way, this is amazing.

mistercrunch · 2016-05-20T16:53:15Z

LGTM except for the unnecessary if under SQLAlchemy's one() call
http://docs.sqlalchemy.org/en/rel_1_0/orm/query.html#sqlalchemy.orm.query.Query.one
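For context, SQLAlchemy's `Query.one()` raises `NoResultFound` when there are no rows and `MultipleResultsFound` when there is more than one, so a truthiness check on its return value adds nothing. The sketch below mimics that contract with a plain helper. The `one` function and the exception classes here are local stand-ins written for illustration, not SQLAlchemy itself.

```python
class NoResultFound(Exception):
    """Stand-in for the SQLAlchemy exception of the same name."""


class MultipleResultsFound(Exception):
    """Stand-in for the SQLAlchemy exception of the same name."""


def one(rows):
    """Return the single element of rows, or raise if the count is not one."""
    rows = list(rows)
    if not rows:
        raise NoResultFound("no row was found")
    if len(rows) > 1:
        raise MultipleResultsFound("more than one row was found")
    return rows[0]


# The caller can use the result directly: reaching this line means
# exactly one row came back, so no follow-up `if result:` is needed.
result = one(["dagrun-1"])
```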

bolkedebruin · 2016-05-21T10:27:05Z

@mistercrunch thanks. There is one operational regression (also heads up @aoen, @jlowin, @artwr) caused by the eager creation of task instances. In its current form the scheduler evaluates all State.NONE task instances, so with eager creation there are more of them and evaluation takes longer.

I'm addressing this issue in a follow up PR (almost done/ready for review).

criccomini reviewed May 17, 2016

aoen reviewed May 17, 2016

bolkedebruin force-pushed the dag_run branch from 80e0120 to cb56289 Compare May 18, 2016 10:31

asfgit merged commit cb56289 into apache:master May 18, 2016

artwr reviewed May 19, 2016

criccomini mentioned this pull request Jun 15, 2016

[AIRFLOW-184] Add mark success to CLI #1590

Closed

sunank200 mentioned this pull request Feb 22, 2024

Refactor dataset class inheritance #37590

Merged
