AIP-82 Send asset change event when trigger fires #44369
On trigger restart or reassignment to another triggerer process, the coroutine is cancelled and the check for the trigger timeout raises a traceback (seen when trying this out locally with the sample DAG below and Ctrl+C to stop the triggerer). Checking for `task_instance` on the trigger:

```diff
diff --git a/airflow/jobs/triggerer_job_runner.py b/airflow/jobs/triggerer_job_runner.py
index 8c226334f7..d7e5dbc6b1 100644
--- a/airflow/jobs/triggerer_job_runner.py
+++ b/airflow/jobs/triggerer_job_runner.py
@@ -641,7 +641,7 @@ class TriggerRunner(threading.Thread, LoggingMixin):
                 self.triggers[trigger_id]["events"] += 1
                 self.events.append((trigger_id, event))
         except asyncio.CancelledError:
-            if timeout := trigger.task_instance.trigger_timeout:
+            if timeout := (trigger.task_instance and trigger.task_instance.trigger_timeout):
                 timeout = timeout.replace(tzinfo=timezone.utc) if not timeout.tzinfo else timeout
                 if timeout < timezone.utcnow():
                     self.log.error("Trigger cancelled due to timeout")
```

Sample DAG:

```python
from __future__ import annotations

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.triggers.file import FileTrigger
from airflow.sdk.definitions.asset import Asset

trigger = FileTrigger(filepath="/tmp/a")
asset = Asset("test_asset_1", watchers=[trigger])

with DAG(
    dag_id="file_trigger_timeout",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    schedule=[asset],
) as dag:
    t1 = EmptyOperator(task_id="t1")
```
Edit: Just saw #42514 to handle cleanup, so my message might not be valid. Please ignore if not needed. Thanks.
Nice! Thanks for the catch! I'll apply that.
I think it is fine here because, when I read the examples you provided, you cancel an external job (external to Airflow) if the task instance is not in a deferred state. All the logic implemented here is specific to deferrable operators and should not overlap with this feature. At first I thought the triggers were being cleaned up, but here it is external jobs, so I don't see it overlapping. Triggers would be another story. But thanks for the heads-up!

#42514 is being resolved as part of this PR, therefore the logic handling the trigger clean-up is done in this PR (at least from my perspective). So if you think something is missing or off, please call it out :)
It should be fixed now.
gopidesupavan left a comment:
Overall LGTM :)
One question: do we have to rethink the triggerer `default_capacity` config? With the new event-driven scheduling, it shares the trigger capacity config, so is it okay to have a single config for regular triggers and event-driven triggers?
I think so, and to me it makes sense. The only issue with this approach is that triggers from event-driven scheduling can max out the triggerer (if many, many triggers are used to update assets). Event-driven scheduling would become a noisy neighbour for deferrable tasks. That'd mean that no deferrable task could run unless some of the triggers used to update the assets (all those defined in DAGs) are removed.
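To make that concern concrete, here is a hypothetical sketch (asset names and file paths are invented for illustration): each watcher defined this way occupies one slot of the triggerer's shared capacity, the same pool that deferrable operators' triggers draw from.

```python
# Hypothetical sketch: many asset watchers, each consuming one triggerer slot.
# Asset names and file paths are invented for illustration.
from airflow.sdk.definitions.asset import Asset
from airflow.triggers.file import FileTrigger

# 500 assets, each watched by its own FileTrigger.
watched_assets = [
    Asset(f"watched_asset_{i}", watchers=[FileTrigger(filepath=f"/tmp/watched_{i}")])
    for i in range(500)
]
# Each asset would then be used as a DAG `schedule=[...]`. Because these
# watchers and deferrable tasks share the same triggerer capacity
# (`default_capacity`), 500 watchers leave correspondingly fewer slots for
# deferred task instances.
```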
There was a feature request and PR open to assign a specific queue to triggers and the triggerer, so that a specific triggerer instance listens to specific types of triggers, but it's not active. Maybe that feature could be useful here once available, to run asset-related triggers in a separate triggerer instance rather than as part of the normal triggers.
I tested this feature locally, and it will be useful at work for certain use cases once we upgrade. I am looking forward to how the "infinite scheduling" part is handled in the future, as noted in the AIP, which will further improve usability for us. Thanks @vincbeck.
Sure, that makes sense. I've heard from a couple of people of two or three instances where they run thousands of triggers in their data pipelines, and I'm sure these people will leverage this awesome event-driven feature effectively once it's up. I just wanted to bring up that point here. :)
I agree, that could indeed be useful for that feature. This is definitely something we can do later based on feedback/comments from users.
Thanks for testing it!!
And thanks for doing it :) All this feedback and these points are very useful when implementing features, so thank you for doing it :)
Resolves #42513 and #42514.
The purpose of this PR is to send events to assets when some watchers are associated with them. Example:
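(A minimal sketch mirroring the sample DAG shared earlier in this thread; the asset name, DAG id, and watched file path are illustrative.)

```python
from __future__ import annotations

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.triggers.file import FileTrigger
from airflow.sdk.definitions.asset import Asset

# The FileTrigger is the watcher: when it fires (the watched file shows up),
# an asset event is sent for `my_asset`, which in turn schedules this DAG.
trigger = FileTrigger(filepath="/tmp/my_file")
asset = Asset("my_asset", watchers=[trigger])

with DAG(
    dag_id="asset_watcher_example",
    start_date=datetime(2021, 1, 1),
    catchup=False,
    schedule=[asset],
) as dag:
    EmptyOperator(task_id="downstream_task")
```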
As part of AIP-82, it is now possible to associate `watchers` with an asset. These watchers are triggers. By associating a trigger with an asset, the goal is to update the asset whenever the trigger fires. This PR handles that part: when a trigger fires, it updates its associated assets. I also updated the logic behind how the triggers are cleaned up by the triggerer: triggers that are associated with an asset last as long as the association between the trigger and the asset is defined in the DAG.