Skip to content

Conversation

@bolkedebruin
Copy link
Contributor

dict return values from functions were automatically unrolled in separate values. This meant that the default of multiple_outputs was changing based on the return type of the function.

This is confusing as it requires the user to be aware of this and rather un-pythonic (explicit over implicit) and makes conversion from plain python functions to Airflow tasks harder as unit tests would suddenly fail.

Closes: #27819

@uranusjr


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

dict return values from functions were automatically unrolled
in separate values. This meant that the default of `multiple_outputs`
was changing based on the return type of the function.

This is confusing as it requires the user to be aware of this and
rather un-pythonic (explicit over implicit) and makes conversion from
plain python functions to Airflow tasks harder as unit tests would suddenly
fail.

Closes: apache#27819
@uranusjr
Copy link
Member

I don’t think we can just remove this outright, there’s no tell who are depending on this. The best we can do is to fail the DAG when multiple_outputs is not explicitly supplied with a type annotation is detected, but even that is technically breaking compatibility.

@bolkedebruin
Copy link
Contributor Author

It was added in #19965 . That's just 11 months ago, but I havent figured out what release. So it is the choice of going through possibly a bit of pain for some if they made use of this functionality (which wasn't standard before. It is interesting that we just implemented it as it was a change of behavior before that!) or to go through a whole deprecation cycle. It is not backwards compatible in any case. In other words if you want to retain the old behavior you will need to update your DAGs.

I do think the impact is relatively low and we should change it asap before it gets more widespread. It requires typing which isn't truly integrated with everyone's workflow (heck we arent even consistent ourselves in our examples) and it requires the type to be dict. So it does not do every annotation which you seemed to imply above.

So I see three options:

  1. Remove outright in 2.5 (preferred imho, lowish impact on integrity easily fixed)
  2. Break on detecting dict and no explicit value for multiple_outputs, remove later (better on integrity)
  3. Deprecation cycle: log warning and remove later (long cycle, what is real benefit?)

Typically I would opt for 3 for a large change, 2 for a change that is easy to fix but with some impact and 1) if little impact.

My assessment is little impact so I would opt for 1, but maybe 2 is better.

cc @ashb wdyt?

@potiuk
Copy link
Member

potiuk commented Nov 22, 2022

I think we have more and more similar discussion and this convinces me more and more that we shuld agree how we should assess what constituces as backwards compatibility (it's not as obvious and straightforward as one might think).

More details and my reasoning here: #27067 (comment). I will start a devlist discussion.

@uranusjr
Copy link
Member

It was added in #19965

The current implementation was added in #19965, but the behaviour dates way back than that, as I mentioned in the issue. The PR you linked only refactored the code.

@potiuk
Copy link
Member

potiuk commented Nov 22, 2022

Discussion about potential approach we might take for similar cases started in devlist: https://lists.apache.org/thread/1by8ko8jrrp1xwxt5781bwn2tokxjodl

@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Nov 22, 2022

It was added in #19965

The current implementation was added in #19965, but the behaviour dates way back than that, as I mentioned in the issue. The PR you linked only refactored the code.

Apologies, yes it was refactored. As it exists for longer I'd opt for option 2 (pending the discussion)

@bolkedebruin
Copy link
Contributor Author

@uranusjr I've updated the code to fail explictily if assumed implicit. It's ugly though :-)

Comment on lines +304 to +306
raise AttributeError(
"multiple_outputs was not set and will not implicitly unroll dict for return values"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeError feels wrong; maybe something like RuntimeError or a custom exception class is more suitable. I’d also want the message to be clearer (tell the user “set multiple_outputs=True is to retain the old behaviour”).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the new DagWarning mechanism is even better for this. It would still allow the DAG to parse (i.e. not breaking people’s existing setups) but instruct the user to fix things.

Copy link
Contributor Author

@bolkedebruin bolkedebruin Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is an AttributeError. The attribute multiple_outputs was not set, so we don't know what to do. An AirflowException is very generic.

If we are going for option 2 I don't think the DAG should parse and in that case I think the error message should not refer to the old behavior as that implies knowing what that is. What you are describing seems more like a deprecation warning though and thus option 3?

@github-actions
Copy link

github-actions bot commented Jan 8, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 8, 2023
@github-actions github-actions bot closed this Jan 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core kind:documentation stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TaskFlow unexpectedly unwinds 'dict'

3 participants