infer multiple_output from return type annotation #10349

AndersonReyes · 2020-08-16T00:47:33Z

closes: #8996

Use type hints to infer multiple_outputs when using the @task decorator.

boring-cyborg · 2020-08-16T00:47:35Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://apache-airflow-slack.herokuapp.com/

AndersonReyes · 2020-08-16T01:42:24Z

TODO

Forgot to handle functions without a return type annotation. Need to check whether sig is Signature.empty.
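A minimal sketch of that check, using only the standard library: `inspect.signature(...).return_annotation` returns the sentinel `inspect.Signature.empty` when a function declares no return type, so unannotated callables can simply keep the explicit `multiple_outputs` value. The helper name `has_return_annotation` is hypothetical.

```python
import inspect
from typing import Callable, Dict


def has_return_annotation(func: Callable) -> bool:
    """Return True if ``func`` declares a return type annotation."""
    # Signature.empty is the sentinel inspect uses for "no annotation".
    return inspect.signature(func).return_annotation is not inspect.Signature.empty


def annotated() -> Dict[str, int]:
    return {"a": 1}


def unannotated():
    return {"a": 1}


print(has_return_annotation(annotated))    # True
print(has_return_annotation(unannotated))  # False
```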

turbaszek · 2020-08-16T04:10:56Z

airflow/operators/python.py

Should we also support set? It would probably be awesome to support any iterable... WDYT? @casassg @evgenyshulman @jonathanshir

turbaszek · 2020-08-16T04:26:32Z

Thanks @AndersonReyes for taking an interest in this issue! This looks good, but we need to adjust this XCom storage logic:

airflow/airflow/operators/python.py, lines 243 to 254 in 382c101:

if not self.multiple_outputs:
    return return_value
if isinstance(return_value, dict):
    for key in return_value.keys():
        if not isinstance(key, str):
            raise AirflowException('Returned dictionary keys must be strings when using '
                                   f'multiple_outputs, found {key} ({type(key)}) instead')
    for key, value in return_value.items():
        self.xcom_push(context, key, value)
else:
    raise AirflowException(f'Returned output was type {type(return_value)} expected dictionary '
                           'for multiple_outputs')
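For reference, the same storage rule can be exercised outside Airflow. In this sketch the hypothetical `push(key, value)` callback stands in for `self.xcom_push(context, key, value)` and `ValueError` stands in for `AirflowException`; returning the value at the end is an assumption about the surrounding method, which the excerpt does not show.

```python
from typing import Any, Callable, Dict


def store_outputs(return_value: Any, multiple_outputs: bool,
                  push: Callable[[str, Any], None]) -> Any:
    """Mirror of the excerpt: optionally unroll a dict into one XCom per key."""
    if not multiple_outputs:
        # Without multiple_outputs the value goes to the default XCom key.
        return return_value
    if not isinstance(return_value, dict):
        raise ValueError(f"Returned output was type {type(return_value)} "
                         "expected dictionary for multiple_outputs")
    # Validate every key first, so a bad key means nothing is pushed.
    for key in return_value:
        if not isinstance(key, str):
            raise ValueError("Returned dictionary keys must be strings when using "
                             f"multiple_outputs, found {key} ({type(key)}) instead")
    for key, value in return_value.items():
        push(key, value)
    return return_value  # assumption: the whole dict is also returned


pushed: Dict[str, Any] = {}
store_outputs({"a": 1, "b": "x"}, True, pushed.__setitem__)
print(pushed)  # {'a': 1, 'b': 'x'}
```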

Also, the crucial part of this change is allowing users to reference those returned values before executing a task. For example, this should work:

from typing import Tuple

from airflow.operators.python import task


@task
def task1() -> Tuple[str, int]:
    return "magic", 42


@task
def task2(a: str, b: int) -> None:
    pass


a, b = task1()
task2(a, b)

This means that we have to implement a smart __iter__ for the XComArg class, because that is what invoking task1() returns. Otherwise we get this:

    a, b = task1()
ValueError: too many values to unpack (expected 2)

But I think we can limit the scope of this PR to just inferring multiple_outputs.
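Why the message says "too many values" rather than "cannot unpack": XComArg supports `['key']` access (`__getitem__`) but defines no `__iter__`, so Python falls back to the legacy sequence protocol and calls `__getitem__(0)`, `__getitem__(1)`, ... until `IndexError`. If `__getitem__` never raises, unpacking into two names finds a third value and fails. `FakeXComArg` below is a hypothetical stand-in, not Airflow's class.

```python
class FakeXComArg:
    """Hypothetical stand-in: supports ['key'] access but defines no __iter__."""

    def __init__(self, key: str = "return_value") -> None:
        self.key = key

    def __getitem__(self, item):
        # Never raises IndexError, so the sequence-iteration fallback never stops.
        return FakeXComArg(key=str(item))


message = ""
try:
    a, b = FakeXComArg()
except ValueError as err:
    message = str(err)
print(message)  # starts with "too many values to unpack"
```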

AndersonReyes · 2020-08-16T13:58:45Z

I'm thinking that for handling tuples in execute, the easiest thing would be to make the key the position of each item, and maybe to access items by index? Hmm, or just add the entire tuple under the default XCom key and use that.

I think, to stay consistent, if it's a tuple we should store each item with its position as the key. To get a single item you would pass the index to XComArg, but you could also retrieve all of them at once using the default XCom key. What do you think?

turbaszek · 2020-08-17T08:02:22Z

I agree that the simplest approach is to use something like return_value_0, return_value_1, ... as keys for tuples.
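A minimal sketch of that keying scheme, assuming a tuple (or list) return value is pushed one item at a time; `positional_xcom_keys` is a hypothetical helper, and in the operator each pair would be handed to `xcom_push`.

```python
from typing import Any, Dict, Sequence


def positional_xcom_keys(values: Sequence[Any], prefix: str = "return_value") -> Dict[str, Any]:
    """Map each item of a tuple/list return value to a key like 'return_value_0'."""
    return {f"{prefix}_{i}": value for i, value in enumerate(values)}


print(positional_xcom_keys(("magic", 42)))
# {'return_value_0': 'magic', 'return_value_1': 42}
```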

AndersonReyes · 2020-08-18T02:24:24Z

I can't quite figure out a clean way to unpack the XComArgs without knowing the size of the output in advance. Right now I have this, which is not clean: infer the number of outputs from the type annotation and pass that to _PythonFunctionalOperator. I'm still brainstorming how to do the unpacking (or whether someone else has a solution), so I'll probably leave it for another PR.

import functools
import inspect
from inspect import signature
from typing import Callable, Optional, Tuple, TypeVar, cast

from airflow.exceptions import AirflowException
from airflow.models.xcom_arg import XComArg
from airflow.utils.helpers import is_container

T = TypeVar("T", bound=Callable)

# _PythonFunctionalOperator is defined in this module (airflow/operators/python.py).


def _infer_multiple_outputs(
    python_callable: Optional[Callable] = None,
    n_outputs: Optional[int] = None,
    multiple_outputs: bool = False,
) -> Tuple[bool, Optional[int]]:
    """
    Try to infer multiple outputs and number of outputs from typing.
    This is a hack really and only works for tuples.
    """
    if not python_callable:
        return multiple_outputs, n_outputs

    sig = signature(python_callable).return_annotation
    # e.g. Tuple[str, int].__origin__ is tuple; plain classes have no __origin__
    ttype = getattr(sig, "__origin__", None)

    if sig != inspect.Signature.empty and is_container(ttype):
        multiple_outputs = True

        # see if we can infer the number of outputs (fixed-length tuples only)
        type_args = sig.__args__
        if not n_outputs and ttype in (Tuple, tuple) and Ellipsis not in type_args:
            n_outputs = len(type_args)

    return multiple_outputs, n_outputs


def task(
    python_callable: Optional[Callable] = None,
    multiple_outputs: bool = False,
    n_outputs: Optional[int] = None,
    **kwargs
) -> Callable[[T], T]:
    """
    Python operator decorator. Wraps a function into an Airflow operator.
    Accepts kwargs for operator kwarg. Can be reused in a single DAG.

    :param python_callable: Function to decorate
    :type python_callable: Optional[Callable]
    :param multiple_outputs: if set, function return value will be
        unrolled to multiple XCom values. List/Tuples will unroll to xcom values
        with index as key. Dict will unroll to xcom values with keys as XCom keys.
        Defaults to False.
    :type multiple_outputs: bool
    :param n_outputs: number of outputs; inferred from a fixed-length Tuple
        return annotation when not given.
    :type n_outputs: Optional[int]
    """

    multiple_outputs, n_outputs = _infer_multiple_outputs(
        python_callable=python_callable, n_outputs=n_outputs, multiple_outputs=multiple_outputs)

    def wrapper(f: T):
        """
        Python wrapper to generate PythonFunctionalOperator out of simple python functions.
        Used for Airflow functional interface
        """
        _PythonFunctionalOperator.validate_python_callable(f)
        kwargs.setdefault('task_id', f.__name__)

        @functools.wraps(f)
        def factory(*args, **f_kwargs):
            op = _PythonFunctionalOperator(python_callable=f, op_args=args, op_kwargs=f_kwargs,
                                           multiple_outputs=multiple_outputs, n_outputs=n_outputs,
                                           **kwargs)
            return XComArg(op)
        return cast(T, factory)
    if callable(python_callable):
        return wrapper(python_callable)
    elif python_callable is not None:
        raise AirflowException('No args allowed while using @task, use kwargs instead')
    return wrapper

And the __iter__ for XComArg:

    def __iter__(self):
        return iter(XComArg(operator=self.operator, key=str(i)) for i in range(self.operator._n_outputs))
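The arity inference above leans on `typing` internals: for `Tuple[str, int]`, `__origin__` is `tuple` and `__args__` is `(str, int)`, while a variable-length `Tuple[int, ...]` carries `Ellipsis` in its arguments. This standalone sketch expresses the same idea with the public `typing.get_origin` and `typing.get_args` helpers (Python 3.8+) rather than dunder attributes; `infer_tuple_arity` is a hypothetical name.

```python
import inspect
from typing import Callable, Dict, Optional, Tuple, get_args, get_origin


def infer_tuple_arity(func: Callable) -> Optional[int]:
    """Return the item count of a fixed-length Tuple return annotation, else None."""
    annotation = inspect.signature(func).return_annotation
    if annotation is inspect.Signature.empty or get_origin(annotation) is not tuple:
        return None
    args = get_args(annotation)
    # Tuple[int, ...] is variable-length, so its size is unknown in advance.
    if Ellipsis in args:
        return None
    return len(args)


def fixed() -> Tuple[str, int]:
    return "magic", 42


def variadic() -> Tuple[int, ...]:
    return (1, 2, 3)


def mapping() -> Dict[str, int]:
    return {"a": 1}


print(infer_tuple_arity(fixed))  # 2
```

With a known arity, an `__iter__` like the one above can yield exactly that many references; without one, unpacking cannot be resolved before the task runs.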

turbaszek · 2020-08-18T07:32:25Z

airflow/models/xcom_arg.py

Suggested change:

- Implements xcomresult['some_result_key'
+ Implements xcomresult['some_result_key']

turbaszek · 2020-08-18T07:33:44Z

airflow/operators/python.py

Suggested change:

- self.xcom_push(context, str(i), value)
+ self.xcom_push(context, f"return_value_{i}", value)

How about something more informative?

Gotcha, fixed that. But wouldn't you want to index XComArg, though? Like output[0] vs. output["return_value_0"].

But when you think about it, anywhere you would have to index each item to pass it to another task, it's better for that task to take a generic iterable and just receive the entire container output instead of each individual item, so I see your point.

casassg · 2020-08-18T15:48:37Z

Regarding supporting any iter. We discussed that on the previous PR: #8962 (comment)

My 2 cents on adding tuple support: it makes the code considerably more complex for not enough value.

Finally, should we add a custom class to make this more explicit? I.e., you would need to import something like RetDict from airflow.ttypes, or something like that.

casassg · 2020-08-18T15:44:11Z

airflow/operators/python.py

Not sure about using parentheses here. Does this not fit on a single line?

Nah, that's just leftover noise. I originally had a more complex check before I found that the is_container function existed. I'll clean it up.

tests/operators/test_python.py

AndersonReyes · 2020-08-19T03:38:47Z

Not sure what's up with the one test case exiting with code 137; I can't tell if it's me or GitHub Actions.

potiuk · 2020-08-19T16:45:01Z

Quarantined tests do time out from time to time.

casassg · 2020-08-19T17:57:29Z

airflow/operators/python.py

Suggested change:

- multiple_outputs: Union[None, bool] = None,
+ multiple_outputs: Optional[bool] = None,

Oh, almost missed this. Good catch!

tests/operators/test_python.py

turbaszek · 2020-08-26T14:33:49Z

My 2 cents on adding tuple support: it makes the code considerably more complex for not enough value.

I agree. Let's just add __iter__ that will raise a meaningful error just in case someone tries unpacking.
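A minimal sketch of that guard. `ReferenceSketch` is a hypothetical stand-in for XComArg, and the exception type and message are illustrative, not Airflow's actual wording. Defining `__iter__` also disables the `__getitem__` fallback that produces the confusing "too many values to unpack" error.

```python
class ReferenceSketch:
    """Hypothetical stand-in for XComArg: key access works, unpacking does not."""

    def __init__(self, key: str = "return_value") -> None:
        self.key = key

    def __getitem__(self, item: str) -> "ReferenceSketch":
        return ReferenceSketch(key=item)

    def __iter__(self):
        # Fail fast with an explanation instead of iterating via __getitem__.
        raise TypeError(
            "This reference cannot be unpacked; return a dict with multiple_outputs "
            "and access items by key, e.g. ref['a']."
        )


ref = ReferenceSketch()
print(ref["a"].key)  # a
try:
    a, b = ref
except TypeError as err:
    print(err)
```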

casassg · 2020-08-26T17:19:32Z

Note: you will need to update the tests. Now only a Dict-annotated return should set multiple_outputs, since tuple, set, and similar types are not supported.

AndersonReyes · 2020-08-26T17:48:11Z

Makes sense, I'll make the updates 👍

casassg · 2020-08-28T01:03:14Z

tests/operators/test_python.py

Can we add back one of these tests to make sure it's not inferred?

stale · 2020-10-12T06:01:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AndersonReyes · 2020-10-29T14:06:29Z

@casassg @dimberman @turbaszek I pretty much forgot about this since it's crazy at work and I'm sure ya got your own stuff happening also but is there anything else to do for this PR? Or has it been implemented elsewhere and I can just close this one?

turbaszek · 2020-11-04T13:17:16Z

@AndersonReyes we will need a note in the documentation explaining which types will be treated as multiple_outputs and how users should use type hinting.

turbaszek · 2020-11-11T22:22:45Z

docs/tutorial_taskflow_api.rst

Suggested change:

- Not, If you manually set the ``multiple_outputs`` parameter the inference is disabled and
+ Note, if you manually set the ``multiple_outputs`` parameter the inference is disabled and

Or did you have something else in mind?

Nah, definitely a typo.

turbaszek · 2020-11-11T22:24:27Z

airflow/operators/python.py

Suggested change:

- sig = signature(python_callable).return_annotation
- ttype = getattr(sig, "__origin__", None)
- if sig != inspect.Signature.empty and ttype in (dict, Dict):
-     multiple_outputs = True
+ sig = signature(python_callable).return_annotation
+ ttype = getattr(sig, "__origin__", None)
+ multiple_outputs = sig != inspect.Signature.empty and ttype in (dict, Dict)
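The suggested rule can be tried outside Airflow. This sketch wraps the suggested lines in a hypothetical `infer_multiple_outputs` function. One consequence worth noting for the docs: a parameterized `Dict[str, int]` has `__origin__ == dict` and qualifies, but a bare `dict` annotation has no `__origin__` attribute, so with this check it is not inferred.

```python
import inspect
from typing import Callable, Dict, Tuple


def infer_multiple_outputs(python_callable: Callable) -> bool:
    """Sketch of the suggested rule: only dict-typed return annotations qualify."""
    sig = inspect.signature(python_callable).return_annotation
    ttype = getattr(sig, "__origin__", None)
    return sig != inspect.Signature.empty and ttype in (dict, Dict)


def returns_dict() -> Dict[str, int]:
    return {"a": 1}


def returns_tuple() -> Tuple[str, int]:
    return "magic", 42


def returns_bare_dict() -> dict:
    return {"a": 1}


def unannotated():
    return {"a": 1}


print(infer_multiple_outputs(returns_dict))       # True
print(infer_multiple_outputs(returns_tuple))      # False
print(infer_multiple_outputs(returns_bare_dict))  # False: plain dict has no __origin__
print(infer_multiple_outputs(unannotated))        # False
```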

turbaszek · 2020-11-11T22:25:43Z

airflow/operators/python.py

Can we skip this line? None has a boolean value of False.

I'll apply the patch you suggested above. Removing the if statement would set it to False anyway, so yeah, the line is not needed.

turbaszek

Great work @AndersonReyes ! 👏

github-actions · 2020-11-12T09:53:27Z

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

turbaszek · 2020-11-13T18:58:10Z

@AndersonReyes could you rebase onto latest master please?

AndersonReyes · 2020-11-13T22:35:11Z

@AndersonReyes could you rebase onto latest master please?

yessir, squashed first to avoid merge conflicts 14 times for each commit lol

github-actions · 2020-11-13T23:06:11Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

turbaszek · 2020-11-13T23:28:14Z

squashed first to avoid merge conflicts 14 times for each commit lol

Good thinking 😄

github-actions · 2020-11-13T23:51:11Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

AndersonReyes · 2020-11-14T00:02:54Z

Never mind, I rebased one more time; the static-check README error was fixed on master.

potiuk · 2020-12-07T08:59:45Z

Hello. Is this something we really want for 2.0.0rc1? If not - can someone set the right milestone? Or maybe rebase and merge since it is approved already :) ?

airflow/operators/python.py

boring-cyborg · 2020-12-09T14:45:18Z

Awesome work, congrats on your first merged pull request!

AndersonReyes force-pushed the AIP-31 branch from 9a65ec9 to 7fcc805 August 16, 2020 03:07

turbaszek self-requested a review August 16, 2020 03:44

turbaszek reviewed Aug 16, 2020

turbaszek added the AIP-31 Task Flow API for nicer DAG definition label Aug 16, 2020

turbaszek reviewed Aug 18, 2020

casassg reviewed Aug 18, 2020

casassg reviewed Aug 19, 2020

kaxil added this to the Airflow 2.0.0 milestone Aug 24, 2020

AndersonReyes requested a review from turbaszek August 24, 2020 12:01

casassg reviewed Aug 28, 2020

AndersonReyes force-pushed the AIP-31 branch from bc836b4 to 1a1d179 August 28, 2020 01:24

turbaszek requested a review from dimberman August 28, 2020 05:05

stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 12, 2020

stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 29, 2020

casassg approved these changes Nov 10, 2020

turbaszek reviewed Nov 11, 2020

turbaszek approved these changes Nov 12, 2020

github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 12, 2020

AndersonReyes force-pushed the AIP-31 branch from 36ff36c to d82bf07 November 13, 2020 22:28

AndersonReyes force-pushed the AIP-31 branch from d82bf07 to f34fdc1 November 13, 2020 23:16

AndersonReyes force-pushed the AIP-31 branch from f34fdc1 to 3b24347 November 14, 2020 00:03

ashb reviewed Dec 7, 2020

airflow/operators/python.py (outdated, resolved)

ashb modified the milestones: Airflow 2.0.0rc1, Airflow 2.1 Dec 7, 2020

infer multiple outputs from dict annotations

046f519

ashb force-pushed the AIP-31 branch from 3b24347 to 046f519 December 9, 2020 14:44

ashb merged commit 1d91ca7 into apache:master Dec 9, 2020

ashb modified the milestones: Airflow 2.1, Airflow 2.0.0rc1 Dec 9, 2020

AndersonReyes deleted the AIP-31 branch December 18, 2020 10:03

MatrixManAtYrService mentioned this pull request May 12, 2021

Support Tuple[foo, bar] type hints on @task decorated functions #15813

Open

josh-fell mentioned this pull request Dec 29, 2021

Add sensor decorator #20530

Closed
