-
Notifications
You must be signed in to change notification settings - Fork 16.4k
infer multiple_output from return type annotation #10349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
|
TODO
|
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support also set? Probably it would be awesome to support any iterable... WDYT? @casassg @evgenyshulman @jonathanshir
|
Thanks @AndersonReyes for taking interest in this issue! This looks good but we need to adjust this logic of XCom storage: airflow/airflow/operators/python.py Lines 243 to 254 in 382c101
Also the crucial part of this change is allowing users to reference those returned values before executing a task. For example this should work: @task
def task1() -> Tuple[str, int]:
return "magic", 42
@task
def task2(a: str, b: int) -> None:
pass
a, b = task1()
task2(a, b)And this means that we have to implement some smart But I think we can limit the scope of this PR to just resolve the |
|
I'm thinking For handling tuples in execute, the easiest thing would be to make the key the position of each iterable item. And be able to access by index Maybe? Hmmm or just add entire tuple to default XCOM key and using that . I think to stay consistent if it's a tuple just store each item with key as position , and to get item to pass the index in |
|
I agree that's the simplest approach to use is something like |
|
I can't quite figure a clean unpacking of the xcomargs without knowing the size of the output in advance. Right now i have this which is not clean, infer the number of outputs from the typing and pass that to the _PythonOperator but still brainstorming on how to do that unpacking or if something else got a solution will prob leave this for another pr. and the iter for XcomArg |
airflow/models/xcom_arg.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Implements xcomresult['some_result_key' | |
| Implements xcomresult['some_result_key'] |
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self.xcom_push(context, str(i), value) | |
| self.xcom_push(context, f"return_value_{i}", value) |
How about something more informative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha fixed that, but wouldn't you want to index xcomarg though? like output[0] vs output["return_value_0"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But when you think about it anywhere you have to index each item to pass to another task is better to take a generic iterable instead and just pass entire container output instead of each individual item so I see your point
|
Regarding supporting any My 2 cents on adding tuple support: Makes code quite more complex over not enough high value. Finally, should we add a custom class to make this more explicit? aka you need to import |
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about using this parenthesis here. Does this not fit in a single line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah that's just left over noise, originally had complex check before I found the is_container func existed. I'll clean it up
|
not sure whats up with the one test case getting exit 137 but cant tell if its me or github actions |
|
Quarantined tests do timoeout from time to time. |
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| multiple_outputs: Union[None, bool] = None, | |
| multiple_outputs: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh almost missed this good catch
I agree. Let's just add |
|
Note: You will need to update tests. Now only |
|
makes sense ill make the updates 👍 |
tests/operators/test_python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add back one of these tests to make sure it's not inferred?
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
@casassg @dimberman @turbaszek I pretty much forgot about this since it's crazy at work and I'm sure ya got your own stuff happening also but is there anything else to do for this PR? Or has it been implemented elsewhere and I can just close this one? |
|
@AndersonReyes we will need a note in documentation expelling what types will be treated as multiple_outpus and how users should use type hinting |
docs/tutorial_taskflow_api.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Not, If you manually set the ``multiple_outputs`` parameter the inference is disabled and | |
| Note, if you manually set the ``multiple_outputs`` parameter the inference is disabled and |
Or did you have something else in your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah definitely a typo
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sig = signature(python_callable).return_annotation | |
| ttype = getattr(sig, "__origin__", None) | |
| if sig != inspect.Signature.empty and ttype in (dict, Dict): | |
| multiple_outputs = True | |
| sig = signature(python_callable).return_annotation | |
| ttype = getattr(sig, "__origin__", None) | |
| multiple_outputs = sig != inspect.Signature.empty and ttype in (dict, Dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be able to skip this line? None has boolean value of False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do the patch suggestion you have above, removing the if statement would set it to false anyways so yeah the line is not needed
turbaszek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @AndersonReyes ! 👏
|
The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it! |
|
@AndersonReyes could you rebase onto latest master please? |
36ff36c to
d82bf07
Compare
yessir, squashed first to avoid merge conflicts 14 times for each commit lol |
|
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
d82bf07 to
f34fdc1
Compare
Good thinking 😄 |
|
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
|
nvm rebased one more time the static check readme error was fixed on master |
f34fdc1 to
3b24347
Compare
|
Hello. Is this something we really want for 2.0.0rc1? If not - can someone set the right milestone? Or maybe rebase and merge since it is approved already :) ? |
|
Awesome work, congrats on your first merged pull request! |
closes: #8996
using type hints to infer multiple outputs when using task decorator
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.