-
Notifications
You must be signed in to change notification settings - Fork 16.4k
[AIRFLOW-5172] Add choice of interval edge scheduling #5787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -426,8 +426,8 @@ def manage_slas(self, dag, session=None): | |
| if isinstance(task.sla, timedelta): | ||
| dttm = dag.following_schedule(dttm) | ||
| while dttm < timezone.utcnow(): | ||
| following_schedule = dag.following_schedule(dttm) | ||
| if following_schedule + task.sla < timezone.utcnow(): | ||
| period_end = dag.period_end(dttm) | ||
| if period_end + task.sla < timezone.utcnow(): | ||
| session.merge(SlaMiss( | ||
| task_id=ti.task_id, | ||
| dag_id=ti.dag_id, | ||
|
|
@@ -595,7 +595,7 @@ def create_dag_run(self, dag, session=None): | |
| # one period before, so that timezone.utcnow() is AFTER | ||
| # the period end, and the job can be created... | ||
| now = timezone.utcnow() | ||
| next_start = dag.following_schedule(now) | ||
| next_start = dag.period_end(now) | ||
| last_start = dag.previous_schedule(now) | ||
| if next_start <= now: | ||
| new_start = last_start | ||
|
|
@@ -648,7 +648,7 @@ def create_dag_run(self, dag, session=None): | |
| if dag.schedule_interval == '@once': | ||
| period_end = next_run_date | ||
| elif next_run_date: | ||
| period_end = dag.following_schedule(next_run_date) | ||
| period_end = dag.period_end(next_run_date) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it seems there is an inconsistency in the language here. The function is called The name That's why the code checks that But language of this PR is in conflict with that. The parameter It seems that what So in this bit of code, we probably don't even need And this: But elsewhere, it seems that
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nomenclature has absolutely been a pain in this change. The entire vocabulary around scheduling seems to have exploded, and I can't find a clear definition for all the terms. This is well outside the scope of this change, but having well defined terms for things like: period_start_ts - Timestamp of the start of the period to be processed Many of the current names, like start_date, are datetimes, not dates. It's also not immediately obvious what "start" means: is it period_start_ts, or execution_ts? I'm happy to change the functions to be whatever is popular, but I think there'd be real value in defining, clearly, and in a single spot, all of the timestamps around a particular DAG's execution, giving them meaningful names, and consolidating on that.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah i'm not saying you are responsible for all confusing naming, just suggesting not to add to it ;)
And I think it is definitely in scope to change the local |
||
|
|
||
| # Don't schedule a dag beyond its end_date (as specified by the dag param) | ||
| if next_run_date and dag.end_date and next_run_date > dag.end_date: | ||
|
|
@@ -1214,7 +1214,7 @@ def _process_dags(self, dagbag, dags, tis_out): | |
| if not dag.is_subdag: | ||
| dag_run = self.create_dag_run(dag) | ||
| if dag_run: | ||
| expected_start_date = dag.following_schedule(dag_run.execution_date) | ||
| expected_start_date = dag.period_end(dag_run.execution_date) | ||
| if expected_start_date: | ||
| schedule_delay = dag_run.start_date - expected_start_date | ||
| Stats.timing( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the default to be false as this is the intuitive / common understanding on how Airflow scheduling works.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is designed to have zero impact to current behaviours, to avoid surprises for devops everywhere and to increase the chance that the PR could be accepted for the next 1.x.x stable release.
Would it make more sense to leave this PR as is, and then submit a second PR to make the default False at a later date, for 2.0?