Conversation

@adlersantos adlersantos commented Jul 11, 2022

PR #23078 introduced the service_account parameter for Dataflow jobs, but this parameter only works for jobs written in Java. See Java pipeline options for Dataflow.

To support Dataflow jobs written in Python or Go, the parameter to use is service_account_email. See Python pipeline options and Go pipeline options for Dataflow.
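
The language-dependent choice of option key can be summarized in a short sketch. The helper name and structure below are illustrative assumptions for this example, not the Airflow provider's actual implementation; only the option names `service_account` and `service_account_email` come from the PR description and the linked Dataflow docs:

```python
# Illustrative sketch only: pick the Dataflow pipeline-option key that
# matches the SDK language. The helper name is an assumption for this
# example, not code from the Airflow Google provider.

def service_account_option(sdk_language: str, account: str) -> dict:
    """Return a pipeline-option dict keyed for the given SDK language."""
    if sdk_language.lower() == "java":
        # Java pipelines read the `service_account` option.
        return {"service_account": account}
    # Python and Go pipelines read `service_account_email` instead.
    return {"service_account_email": account}
```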

Additional references:



@adlersantos adlersantos requested a review from turbaszek as a code owner July 11, 2022 21:28
@boring-cyborg boring-cyborg bot added area:providers provider:Apache provider:google Google (including GCP) related issues labels Jul 11, 2022

boring-cyborg bot commented Jul 11, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide, and consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it is a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@adlersantos adlersantos changed the title Support service_account_email attr for Dataflow jobs in Python or Go Support service_account_email pipeline option for Dataflow jobs in Python or Go Jul 11, 2022
@mik-laj
Member

mik-laj commented Jul 17, 2022

@pabloem Can you look at it?

@eladkal
Contributor

eladkal commented Aug 3, 2022

Can you please fix static checks?

Run flake8.............................................................................Failed
- hook id: run-flake8
- exit code: 1

airflow/providers/google/cloud/operators/dataflow.py:127:111: E501 line too long (130 > 110 characters)
airflow/providers/google/cloud/operators/dataflow.py:129:111: E501 line too long (161 > 110 characters)
airflow/providers/google/cloud/operators/dataflow.py:130:111: E501 line too long (157 > 110 characters)

Comment on lines +127 to +130
:param service_account: Run the Java job as a specific service account,
    instead of the default Compute Engine service account.
    See: https://cloud.google.com/dataflow/docs/reference/pipeline-options#java
:param service_account_email: Run the Python or Go job as a specific service
    account, instead of the default Compute Engine service account.
    See: https://cloud.google.com/dataflow/docs/reference/pipeline-options#python
    or https://cloud.google.com/dataflow/docs/reference/pipeline-options#go
Member


I just wonder whether it would make sense to put these into a single parameter rather than two separate ones. The documentation is quite clear, so it may not be a big deal, but ideally users should only have to care about one parameter.

I understand that complicates the logic downstream, but maybe it is better to complicate things downstream than to let users misconfigure or be confused.

also, sorry about the delay ^^'

Member


How well does the API handle unknown keys? If unknown keys are simply ignored, we could probably just always set both keys. Otherwise the implementation might be pretty complicated and not worth it.
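
If the API did ignore unknown keys, the "always set both keys" idea could be sketched as a hypothetical helper. This is a sketch under that unconfirmed assumption, not the provider's code, and the helper name is invented for illustration:

```python
def with_service_account(options: dict, account: str) -> dict:
    """Set both key variants so the right one applies regardless of SDK.

    This only works if the Dataflow API silently ignores the key it does
    not recognize for a given SDK language -- an open question in this
    thread, not a confirmed behavior.
    """
    merged = dict(options)  # don't mutate the caller's dict
    merged["service_account"] = account        # read by Java pipelines
    merged["service_account_email"] = account  # read by Python/Go pipelines
    return merged
```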

Member


Once again, my apologies for missing this. I am not sure how the API deals with unknown keys. If it doesn't ignore them, I am happy to move forward with the way it is...

@potiuk
Member

potiuk commented Aug 23, 2022

Any comments/feedback @adlersantos? Are you still working on it? (It needs at least a rebase and likely the static checks fixed.)

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Oct 15, 2022
@github-actions github-actions bot closed this Oct 20, 2022


6 participants