Remove auto unroll for dict values from taskflow #27826
dict return values from functions were automatically unrolled into separate values. This meant that the default of `multiple_outputs` changed based on the return type of the function. This is confusing because the user has to be aware of it, it is rather un-Pythonic (explicit is better than implicit), and it makes converting plain Python functions to Airflow tasks harder, as unit tests would suddenly fail. Closes: apache#27819
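To make the implicit behaviour concrete, here is a minimal sketch in plain Python. It is not Airflow's implementation: `infer_multiple_outputs` and `push_outputs` are hypothetical helpers, the returned mapping is only a toy model of what a task would store, and the sketch assumes the default is derived from the function's return annotation.

```python
import typing
from typing import Any, Callable, Dict, Optional


def infer_multiple_outputs(func: Callable[..., Any]) -> bool:
    """Hypothetical helper: True when the return annotation is a dict type."""
    return_type = typing.get_type_hints(func).get("return")
    if return_type is None:
        return False
    # Dict[str, int] has origin `dict`; a bare `dict` has no origin.
    origin = typing.get_origin(return_type) or return_type
    return isinstance(origin, type) and issubclass(origin, dict)


def push_outputs(
    func: Callable[[], Any], multiple_outputs: Optional[bool] = None
) -> Dict[str, Any]:
    """Toy model of task outputs: one entry per key, or a single entry."""
    if multiple_outputs is None:
        # The implicit default under discussion: it depends on the annotation.
        multiple_outputs = infer_multiple_outputs(func)
    result = func()
    if multiple_outputs:
        return dict(result)
    return {"return_value": result}


def plain():
    return {"a": 1}


def annotated() -> Dict[str, int]:
    return {"a": 1}


print(push_outputs(plain))      # {'return_value': {'a': 1}}
print(push_outputs(annotated))  # {'a': 1}
```

Both functions return the same dict, yet the stored outputs differ purely because of the annotation, which is the surprise the description refers to.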
I don’t think we can just remove this outright; there’s no telling who is depending on this. The best we can do is to fail the DAG when
It was added in #19965. That's just 11 months ago, but I haven't figured out in which release. So the choice is between possibly a bit of pain for those who made use of this functionality (which wasn't standard before; it is interesting that we just implemented it, as it was itself a change of behavior!) and going through a whole deprecation cycle. It is not backwards compatible in any case: if you want to retain the old behavior, you will need to update your DAGs. I do think the impact is relatively low, and we should change it ASAP before it becomes more widespread. It requires So I see three options:
Typically I would opt for 3 for a large change, 2 for a change that is easy to fix but has some impact, and 1 if there is little impact. My assessment is little impact, so I would opt for 1, but maybe 2 is better. cc @ashb wdyt?
I think we are having more and more discussions like this one, and it convinces me more and more that we should agree on how to assess what constitutes backwards compatibility (it's not as obvious and straightforward as one might think). More details and my reasoning here: #27067 (comment). I will start a devlist discussion.
A discussion about the approach we might take for similar cases has started on the devlist: https://lists.apache.org/thread/1by8ko8jrrp1xwxt5781bwn2tokxjodl
@uranusjr I've updated the code to fail explicitly where the value would previously have been assumed implicitly. It's ugly though :-)
    raise AttributeError(
        "multiple_outputs was not set and will not implicitly unroll dict for return values"
    )
AttributeError feels wrong; maybe something like RuntimeError or a custom exception class is more suitable. I’d also want the message to be clearer (tell the user “set multiple_outputs=True to retain the old behaviour”).
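As a rough sketch of this suggestion (not code from the PR), a dedicated exception with a more actionable message could look like the following. The names `MultipleOutputsNotSetError` and `resolve_multiple_outputs` are hypothetical, and `returns_dict` stands in for however the decorator detects a dict return type.

```python
from typing import Optional


class MultipleOutputsNotSetError(RuntimeError):
    """Hypothetical exception: a dict is returned but multiple_outputs is unset."""


def resolve_multiple_outputs(
    multiple_outputs: Optional[bool], returns_dict: bool
) -> bool:
    """Return the effective setting, failing loudly instead of guessing."""
    if multiple_outputs is not None:
        # An explicit choice always wins.
        return multiple_outputs
    if returns_dict:
        raise MultipleOutputsNotSetError(
            "multiple_outputs was not set and dict return values are not "
            "unrolled implicitly; set multiple_outputs=True to retain the "
            "old behaviour"
        )
    return False
```

Subclassing `RuntimeError` keeps the failure catchable by generic handlers while still giving callers a specific type to match on.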
I wonder if the new DagWarning mechanism is even better for this. It would still allow the DAG to parse (i.e. not breaking people’s existing setups) but instruct the user to fix things.
I think it is an AttributeError. The attribute multiple_outputs was not set, so we don't know what to do. An AirflowException is very generic.
If we are going for option 2 I don't think the DAG should parse and in that case I think the error message should not refer to the old behavior as that implies knowing what that is. What you are describing seems more like a deprecation warning though and thus option 3?
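For contrast with a hard failure, the warning-style path mentioned here could be sketched as below. This uses Python's standard `warnings` module as a stand-in; it is not Airflow's DagWarning mechanism, and `resolve_with_warning` and `returns_dict` are hypothetical names.

```python
import warnings
from typing import Optional


def resolve_with_warning(
    multiple_outputs: Optional[bool], returns_dict: bool
) -> bool:
    """Keep the old implicit behaviour for now, but tell the user to be explicit."""
    if multiple_outputs is not None:
        return multiple_outputs
    if returns_dict:
        warnings.warn(
            "multiple_outputs was not set; implicitly unrolling dict return "
            "values is deprecated, set multiple_outputs explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        return True  # old behaviour retained during the deprecation period
    return False
```

The trade-off discussed in the thread shows up directly: this version keeps existing DAGs parsing, whereas raising an exception forces an immediate fix.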
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.