-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Support service_account_email pipeline option for Dataflow jobs in Python or Go
#24980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
service_account_email attr for Dataflow jobs in Python or Goservice_account_email pipeline option for Dataflow jobs in Python or Go
|
@pabloem Can you look at it? |
|
Can you please fix static checks? |
| :param service_account: Run the Java job as a specific service account, instead of the default Compute Engine service account. | ||
| See: https://cloud.google.com/dataflow/docs/reference/pipeline-options#java | ||
| :param service_account_email: Run the Python or Go job as a specific service account for Python or Go, instead of the default Compute Engine service account. | ||
| See: https://cloud.google.com/dataflow/docs/reference/pipeline-options#python or https://cloud.google.com/dataflow/docs/reference/pipeline-options#go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wonder if it doesn't make sense to put these into a single parameter rather than two separate ones? The documentation is quite clear, so it may not be a big deal - but ideally users only have to care about one parameter?
I understand that complicates the logic downstream, but maybe it's better to complicate downstream than enable users to misconfigure/be confused?
also, sorry about the delay ^^'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How well does the API handle unknown keys? If unknown keys are simply ignored, we could probably just always set both keys. Otherwise the implementation might be pretty complicated and not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once again my apologies for missing this - I am not sure about how the API deals with this. If not, I am happy to move forward with the way it is...
|
Any comments/feedbcack @adlersantos ? Are you still working on it (it needs at least reabase and likely the static checks fixed) |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
PR #23078 introduced the
service_accountparameter for Dataflow jobs, but this parameter only works for jobs written in Java. See Java pipeline options for Dataflow.To support Dataflow jobs written in Python or Go, the parameter to use is
service_account_email. See Python pipeline options and Go pipeline options for Dataflow.Additional references:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.