[AIRFLOW-198] Implement latest_only_operator #1752
Conversation
Current coverage is 64.70% (diff: 0.00%)

@@             master    #1752     diff @@
==========================================
  Files           128      129       +1
  Lines          9630     9656      +26
  Methods           0        0
  Messages          0        0
  Branches         0        0
==========================================
- Hits           6249     6248       -1
- Misses         3381     3408      +27
  Partials          0        0
There's an extra ' here.
Yep... it's causing Landscape.io to show an issue. Our rule is that Landscape must be green for merge. Since both Travis and Landscape take a long time to run, you can also run them on your fork.
Please include unit tests.
Have you tested this? I'm curious about context['ti'].start_date. Why not use datetime.now()? What is the start_date of the TaskInstance set to?
TaskInstance.run sets start_date to datetime.now(): https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1185
There are also tests using freezegun provided with the PR.
The reason to use start_date instead of datetime.now() is that start_date gets stored in the metadata DB, so we can see it in the UI.
Check out https://gist.github.com/r39132/ad9601e5d056f655936c0832ca772d6d
start_date won't work. The problem is that it's fixed and doesn't change with your tasks, especially if it's passed via default_args, as shown in https://gist.github.com/r39132/2e6e8353d42852cdb8f7717a3b1998b9
In other words, for most people it's set once, at the time the DAG is first parsed.
Also, the computation of left and right doesn't seem right.
For an hourly DAG, a run with context['execution_date'] of noon will execute at 1 PM. The left_window would be noon, the right_window would be 1 PM, and now would be a few seconds past 1 PM.
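For readers following along, here is a minimal sketch of the kind of schedule-window check being debated; it shifts the window forward by one interval so that "a few seconds past 1 PM" falls inside the noon run's window. The class name and details are illustrative assumptions, not the diff under review:

```python
import datetime
import logging

from airflow.models import BaseOperator


class LatestOnlySketch(BaseOperator):
    """Illustrative sketch: let downstream tasks run only for the latest schedule interval."""

    def execute(self, context):
        now = datetime.datetime.now()
        # For an hourly DAG, the run with execution_date == noon starts shortly after
        # 1 PM; following_schedule(noon) == 1 PM and following_schedule(1 PM) == 2 PM,
        # so "now" falls inside (left_window, right_window] only for the latest run.
        left_window = context['dag'].following_schedule(context['execution_date'])
        right_window = context['dag'].following_schedule(left_window)
        if not left_window < now <= right_window:
            logging.info('Not the latest execution date; downstream tasks should be skipped.')
            # ... mark downstream task instances as SKIPPED here ...
```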
You're absolutely right; airflow time is still super confusing at times.
Can you add this to an example DAG? Also, can you provide a screenshot of it running in the example DAG, so others get a sense of how it works and how to use it? Finally, please update the docs by adding a section for this new behavior, perhaps under https://airflow.incubator.apache.org/concepts.html#additional-functionality -s
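As a hedged illustration of how such an example DAG might look once added (the dag_id, schedule, and import path here are assumptions based on this thread, not the PR's actual files):

```python
import datetime as dt

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.latest_only_operator import LatestOnlyOperator

# Hypothetical example DAG: 'task1' runs only for the most recent schedule
# interval because it sits downstream of the latest-only gate.
dag = DAG(
    dag_id='example_latest_only',
    schedule_interval=dt.timedelta(hours=4),
    start_date=dt.datetime(2016, 9, 20),
)

latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
task1 = DummyOperator(task_id='task1', dag=dag)
task1.set_upstream(latest_only)
```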
Force-pushed 924cfdf to 76f5213
Docs updated. Example provided.
@gwax I have not been able to get your example to work. Making it run every hour, it got stuck in the first dagrun.
I was also seeing the slow execution, but I've been having enough
After removing the
Done.
r39132 left a comment:
Minor changes needed
There will be a dag_id name conflict with the other example.
Perhaps call this one latest_only_with_trigger.
docs/concepts.rst
The name here should match the dag name in the examples... so latest_only_with_trigger. You should change the image name too.
@gwax after you make the changes above, it should be good to merge.
Force-pushed 247f4a0 to d7b7f1f
@r39132: Done.
@gwax thx. Now the only issue is that all of the tests are failing.
Simple fix for this: update the expected number of example DAGs in the test. The reason for this check, I believe, is to detect silent failures when importing examples.
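A rough sketch of the kind of guard being referred to; the test name, the constant, and its value are illustrative assumptions rather than the real test code:

```python
from airflow.models import DagBag

# Assumed value for illustration only; the expected count would need to be
# bumped whenever a new example DAG is added.
EXPECTED_NUM_EXAMPLE_DAGS = 18


def test_example_dags_import():
    # Loading the bundled examples surfaces silent import failures and
    # verifies the expected number of example DAGs.
    dagbag = DagBag(include_examples=True)
    assert not dagbag.import_errors, dagbag.import_errors
    assert len(dagbag.dags) == EXPECTED_NUM_EXAMPLE_DAGS
```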
Sigh... done.
+1
@gwax the problem with the "slowness of the scheduler" and the other issues that you and I noticed are being tracked in another JIRA. The workaround for it is to add
Looks good and works for me. I have 2 sample DAGs of my own; here are some screenshots. For each DAG, the code is shown, and the tree view shows that it skipped 639 runs until the latest one, which it did execute.
@gwax Thank you for this contribution. It works well and I will use it in production in a few days!
• If my
• My second point is about manually triggering with a defined
So this operator makes it impossible to manually trigger a DAG. Do you have any workaround?
I tried using this operator in production; however, I found some issues. For context, I'm using the CeleryExecutor. I noticed in my logs for a single DAG run that the run appeared to be re-enqueued many times. What's interesting is that the DAG status was reporting as success; however, when I checked my Celery broker's queue, there appeared to have been thousands of tasks enqueued. Will attempt to do some more investigation.
@sudowork Wow! Very interesting! Please open a JIRA for this. @gwax FYI! Also, @btallman, since you use trigger_dag, refer to @poulainv's comment above.
@poulainv FYI, if an issue like this occurs in the future and you don't get a response from the original PR submitter by commenting on the PR, send an email to dev@airflow.incubator.apache.org and someone else might be able to fix it.
@r39132 Done! https://issues.apache.org/jira/browse/AIRFLOW-648 Let me know if there's more info you'd like me to provide.
@sudowork Thx. I won't be able to take that bug on... maybe @gwax can. I'm fixing https://issues.apache.org/jira/browse/AIRFLOW-649 <-- @poulainv
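For context, a hedged sketch of one way AIRFLOW-649 could be handled, letting manually triggered runs through instead of skipping them; the dag_run/external_trigger check is an assumption based on this discussion, not the committed fix:

```python
import logging


def should_skip_downstream(context):
    """Hypothetical helper for a latest-only gate."""
    dag_run = context.get('dag_run')
    if dag_run is not None and dag_run.external_trigger:
        # Runs created via `airflow trigger_dag` are always allowed through,
        # so the operator no longer blocks manual triggering.
        logging.info('Externally triggered run; not skipping downstream tasks.')
        return False
    # ... otherwise fall back to the schedule-window check sketched earlier ...
    return True
```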
@r39132 thanks for handling this!
I'll see if I can find some time over Thanksgiving.
@poulainv I ran into this use case today of needing to manually trigger a DAG that has the LatestOnlyOperator in it.
Thanks @r39132! Just tested and verified it's working for me.
Great!
Cool, thanks!
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-198
Testing Done:
- Local testing of DAG operation with LatestOnlyOperator
- Unit test added
Closes apache#1752 from gwax/latest_only