WIP/Idea: Pass task output as outlet to dataset trigger params #37888
jscheffl wants to merge 1 commit into apache:main
FYI @jedcunningham @uranusjr WDYT? (I thought it would be more complex, but walking through the code it looks quite simple...)
airflow/jobs/scheduler_job_runner.py
Outdated
This feels a bit too heavy-handed, but I like the idea of passing in event extras as the downstream DAG run parameters. (Should we use conf or params for this?)
The code was just an idea, not fully thought through. When you say "heavy-handed", what do you mean? Too brute-force, so that the user does not know what comes out? Or do you mean we need a better merging mechanism? Or some hook to inject a custom merging strategy? Or just "code is ugly" :-D
Background: I would assume that in 90% of cases a single DAG triggers a dataset. There might be cases where multiple events come together to trigger a run. In such cases we need to merge the extras. I'd assume that most of the time this is conflict-free, but you never know. "Last property wins" might be an acceptable behavior when collecting events; if users feel there are too many conflicts, the individual extras can also be produced conflict-free by using distinct keys.
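To make the "last property wins" idea concrete, here is a minimal sketch in plain Python. It does not use the Airflow API; the function name `merge_event_extras` and the example keys and values are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def merge_event_extras(extras: Iterable[Mapping[str, Any] | None]) -> dict[str, Any]:
    """Merge the extras of several dataset events into one dict.

    Events are processed in the order given; when two events carry the
    same key, the value from the later event overwrites the earlier one.
    """
    merged: dict[str, Any] = {}
    for extra in extras:
        # Events without an extra contribute nothing to the result.
        merged.update(extra or {})
    return merged


# Illustrative data only: three events, the second one has no extra.
events = [
    {"path": "s3://bucket/a.csv", "rows": 10},
    None,
    {"rows": 25, "source": "dag_b"},
]
merged = merge_event_extras(events)
```

Here `rows` ends up with the value from the last event, while `path` and `source` pass through untouched because their keys do not collide. Producers that want to avoid overwrites would simply use distinct keys.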
conf vs. params:
Yes, params and dag_run.conf should somehow be merged. I believe this is a leftover in the API from the past. conf is the dict that is used to trigger a DAG. The conf is persisted as a blob with the DagRun.
During runtime the conf is available in the context as a dict, representing 1:1 the conf used to trigger. No validation. Just a dict.
params, in contrast, have default values; conf sets values on top, and the result is JSON validated.
Both params and conf are available in the context and can be used. I believe that mid-term we should deprecate the usage of conf in the DAG and consolidate on the (more and better functional) params. But as of today params only exist during runtime.
@hussein-awala made an attempt here but it did not make it to the finish line: #29174
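As a rough illustration of the conf/params distinction described above (defaults first, conf layered on top, result validated), here is a sketch in plain Python. It is not Airflow's implementation: `PARAM_SPECS` and `resolve_params` are hypothetical names, and a simple type check stands in for real JSON Schema validation.

```python
from __future__ import annotations

from typing import Any

# Hypothetical parameter declarations: name -> (default value, expected type).
PARAM_SPECS: dict[str, tuple[Any, type]] = {
    "retries": (3, int),
    "target": ("staging", str),
}


def resolve_params(conf: dict[str, Any]) -> dict[str, Any]:
    """Overlay the trigger conf on the declared defaults, then validate."""
    # Start from the declared defaults.
    resolved = {name: default for name, (default, _) in PARAM_SPECS.items()}
    # The conf used to trigger the run sets values on top of the defaults.
    resolved.update(conf)
    # Stand-in for JSON validation: check the type of each declared param.
    for name, (_, expected) in PARAM_SPECS.items():
        if not isinstance(resolved[name], expected):
            raise ValueError(f"param {name!r} must be of type {expected.__name__}")
    return resolved


resolved = resolve_params({"retries": 5})
```

The raw conf stays an unvalidated dict, whereas the resolved params carry defaults and are rejected when a value does not fit. Keys in conf that are not declared pass through unchecked in this sketch; that is a simplification of the sketch, not a statement about Airflow's behavior.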
Force-pushed from cb3514a to f00f806.
My organization messed up the Airflow repo fork, and the data is gone. I will need to re-open the PR later, once it is recovered :-(
Repo at Bosch was restored, re-opening discussion :-D
Ufff
Another PR supersedes this.
This PR is a WIP proposal to resolve feature request #37810
NOTE: It is just a code preview, therefore WIP.
Idea:
extra (if not provided in Dataset reference)
extra use this as params
extra to the data triggered DAG as params
Open items:
closes: #37810