-
Notifications
You must be signed in to change notification settings - Fork 16.4k
DataflowTemplatedJobStartOperator fix overwriting of location with default value, when a region is provided.
#31082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataflowTemplatedJobStartOperator fix overwriting of location with default value, when a region is provided.
#31082
Conversation
when a region is provided.
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| dataflow_default_options: dict[str, Any] | None = None, | ||
| parameters: dict[str, str] | None = None, | ||
| location: str = DEFAULT_DATAFLOW_LOCATION, | ||
| location: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a breaking change or a bug fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not breaking as the default location is assigned in hook when it is passed as null and there is no region defined (as far as I understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same concern, I think this needs to be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is bug fix. The additional tests are provided - the case is valid but will be broken without the fix. The default value is not necessary on the operator level, since it is being passed to the hook anyway (and it has default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also additional level of protection can be provided at _fallback_variable_parameter method, but looks redundant since the default value is caught from the DataflowHook.start_template_dataflow definition itself. Please let me know if it is useful - I can publish the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was for clarification of this change in the release notes.
I have no objection either way
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
DataflowTemplatedJobStartOperator fix overwriting of location with default value, when a region is provided.
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Fix overwriting of location with default value, when a region is provided.
TL/DR
When the
DataflowTemplateOperatoris being migrated from Airflow 1 to Airflow 2, ifregionproperty is setup with nolocationpassing - Fallback anyway throws a message:The mutually exclusive parameter
locationandregionkey invariablesparameter are both present. Please remove one.New unit tests of operator might explain better - they are not going to pass without the fix.
Details
The
DataflowTemplatedJobStartOperatorhas a parameterlocationwith non empty default value. This operator is using theDataflowHookobject.DataflowHookmakes a check: iflocationandregionare setup together at the same time, than throws the exception (like above). SinceDataflowTemplatedJobStartOperator'slocationhas a default value, it will always pass non-zero value to theDataflowHook.At the same time
DataflowHookhas the same default value for itslocationparameter, soDataflowTemplatedJobStartOperatorcan safely pass None value (Special unit test is also added, to make sureDataflowHookwill use default value whenlocationandregionare not provided at the same time).Notices
The same fallback
_fallback_to_location_from_variablesis applied to 4 methods (at the moment of writing):BeamRunJavaPipelineOperator,DataflowTemplatedJobStartOperatorandDataflowCreateJavaJobOperatorwith explicitly no passinglocationandregionat the same time)So, since other places are not affected, the change can be pretty small and focused on
DataflowTemplatedJobStartOperatoronly.P.S. Looks like some other people had the same problem, for example: https://stackoverflow.com/questions/70254851/dataflowtemplateoperator-job-failing-on-cloud-composer-after-upgrading-to-airflo