@kennknowles
Member

This is a simple first step to fix #30634

The V1 worker container is wholly independent of the V2 SDK harness container. This change makes them very obviously separate codepaths.

The prior changes that tried to deprecate the --workerHarnessContainerImageUrl option were made in error. The two options are analogous but not synonymous.

Service-side changes are probably also necessary, since we are likely passing a V1 worker container image URL in the --sdkHarnessContainerImageUrl option, which is incorrect.

Since V1 does not support custom container usage, this is not a breaking change. There is no supported usage of the V2 parameter for V1.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @kennknowles, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the DataflowRunner to explicitly differentiate between container images used for V1 workers and V2 SDK harnesses. The change improves the clarity and correctness of how container images are managed within Dataflow, ensuring that the appropriate image is used for each pipeline version and addressing a previous misinterpretation of related configuration options.

Highlights

  • Container Image Handling Separation: The pull request cleanly separates the handling of container images for Dataflow V1 workers and V2 SDK harnesses. This clarifies the distinct roles of these container types, addressing previous ambiguities where the --workerHarnessContainerImageUrl and --sdkHarnessContainerImageUrl options were incorrectly treated as synonymous.
  • Introduction of Dedicated Methods: New, explicit methods have been introduced in DataflowRunner.java to retrieve the appropriate container image URLs: getV1WorkerContainerImageForJob for V1 workers and getV2SdkHarnessContainerImageForJob for V2 SDK harnesses. This replaces a single, ambiguous getContainerImageForJob method.
  • Correction of Deprecation: The @Deprecated annotations and associated comments for getWorkerHarnessContainerImage and setWorkerHarnessContainerImage in DataflowPipelineWorkerPoolOptions.java have been removed. This acknowledges that the V1 worker container image option is still relevant and distinct from the V2 SDK harness image option.
  • No Breaking Changes for V1: The changes do not introduce breaking changes for V1, as V1 does not support custom container usage, and there is no supported usage of the V2 parameter for V1.
  • Improved Code Clarity and Maintainability: By making the container image handling explicit and separate, the codebase becomes easier to understand and maintain, reducing potential for misconfiguration and paving the way for future service-side adjustments.
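
As a minimal sketch of the separation described in the highlights above, the following models the two lookups as independent methods. Only the method names getV1WorkerContainerImageForJob and getV2SdkHarnessContainerImageForJob come from this pull request. The Options class, its fields, and the default image strings are hypothetical stand-ins, and the bodies are an assumption about the general shape, not the actual DataflowRunner code.

```java
import java.util.Optional;

/** Illustrative only: models V1 and V2 container image selection as separate codepaths. */
final class ContainerImageSelection {

  /** Hypothetical stand-in for the pipeline options; not the real Dataflow options interface. */
  static final class Options {
    final String workerHarnessContainerImage; // V1 worker image override, may be null
    final String sdkContainerImage; // V2 SDK harness image override, may be null

    Options(String workerHarnessContainerImage, String sdkContainerImage) {
      this.workerHarnessContainerImage = workerHarnessContainerImage;
      this.sdkContainerImage = sdkContainerImage;
    }
  }

  // Placeholder defaults, assumed for this example only.
  static final String DEFAULT_V1_WORKER_IMAGE = "example.invalid/v1-worker:default";
  static final String DEFAULT_V2_SDK_HARNESS_IMAGE = "example.invalid/v2-sdk-harness:default";

  /** Reads only the V1 option; the V2 option is never consulted. */
  static String getV1WorkerContainerImageForJob(Options options) {
    return Optional.ofNullable(options.workerHarnessContainerImage)
        .orElse(DEFAULT_V1_WORKER_IMAGE);
  }

  /** Reads only the V2 option; the V1 option is never consulted. */
  static String getV2SdkHarnessContainerImageForJob(Options options) {
    return Optional.ofNullable(options.sdkContainerImage)
        .orElse(DEFAULT_V2_SDK_HARNESS_IMAGE);
  }

  public static void main(String[] args) {
    // Only the V2 image is customized; the V1 lookup still falls back to its own default.
    Options options = new Options(null, "example.invalid/custom-sdk:1");
    System.out.println(getV1WorkerContainerImageForJob(options));
    System.out.println(getV2SdkHarnessContainerImageForJob(options));
  }
}
```

The point of the shape is that setting one option can never leak into the other lookup, which is the property the previous single getContainerImageForJob method did not make obvious.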

@kennknowles
Member Author

R: @Abacn

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@kennknowles kennknowles force-pushed the df-container-images branch 5 times, most recently from 52b658d to 0b5d216 Compare September 15, 2025 14:42
Contributor

@Abacn Abacn left a comment

Thanks, just had one comment.

  // it is a v1 or v2 job submission.
  RunnerApi.Environment defaultEnvironmentForDataflow =
-     Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+     Environments.createDockerEnvironment(v2SdkHarnessContainerImageURL);
Contributor

Are the comments on these lines (L1284-1290) still relevant, or do they need an update?

Member Author

Good catch. Removed the comment and named the variable to make it obvious.

@kennknowles
Member Author

The integration tests are now green. I haven't changed anything except the comment I think, so I'll merge.

@kennknowles kennknowles merged commit 2b65f46 into apache:master Sep 17, 2025
23 of 29 checks passed
@kennknowles kennknowles deleted the df-container-images branch September 17, 2025 16:19
@kennknowles
Member Author

Internal tests didn't catch anything but I do think this has caused the service to not use the specified image. This is actually as expected, but we need to roll back and add tests.

https://console.cloud.google.com/dataflow/jobs/us-central1/2025-09-18_12_07_20-4196960529059100076?project=apache-beam-testing is a job from #34902 which clearly has --sdkContainerImage=us.gcr.io/apache-beam-testing/java-postcommit-it/java:20250918190048 (image), but the SDK container pulled is simply from the repo (image).

@Abacn
Contributor

Abacn commented Sep 19, 2025

What do you mean by "this is actually as expected"? From this example job, it looks like the sdkContainerImage pipeline option is broken after this PR. Do you mean this exposed a bug on the service side?

Yeah, we should probably revert it for now.

@kennknowles
Member Author

I don't remember what I meant when I typed "this is expected" 🤷

But yes, this is a bug on the service side. Or, if the portable pipeline environment did not change the container, it could be an SDK bug. I actually didn't fully decode the pipeline proto to see what was in the ParDoPayload.
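
The triage reasoning above can be sketched as a small decision function. This is a hedged illustration, not Beam or Dataflow code: the ImageMismatchTriage class, the Verdict enum, and the triage method are hypothetical, and the three image strings are assumed to have been read by hand from the job's options, the submitted pipeline proto, and the worker's logs. The "pulled" image in main is a placeholder, since the thread does not state its exact name.

```java
/** Illustrative triage: at which layer did the requested SDK container image get lost? */
final class ImageMismatchTriage {

  enum Verdict { NO_MISMATCH, LIKELY_SDK_BUG, LIKELY_SERVICE_BUG }

  /**
   * @param requestedImage value passed via --sdkContainerImage
   * @param imageInPipelineProto image recorded in the submitted pipeline's environment
   * @param imagePulledByWorker image the worker actually pulled
   */
  static Verdict triage(
      String requestedImage, String imageInPipelineProto, String imagePulledByWorker) {
    if (requestedImage.equals(imagePulledByWorker)) {
      return Verdict.NO_MISMATCH;
    }
    // The SDK never wrote the requested image into the pipeline it submitted.
    if (!requestedImage.equals(imageInPipelineProto)) {
      return Verdict.LIKELY_SDK_BUG;
    }
    // The proto carried the right image, so it was dropped after submission.
    return Verdict.LIKELY_SERVICE_BUG;
  }

  public static void main(String[] args) {
    String requested = "us.gcr.io/apache-beam-testing/java-postcommit-it/java:20250918190048";
    String pulled = "example.invalid/default-sdk-image"; // placeholder, not the real image name
    // Assumes the proto was decoded and found to contain the requested image.
    System.out.println(triage(requested, requested, pulled));
  }
}
```

Decoding the pipeline proto supplies the middle argument, which is the piece of evidence that separates the two explanations.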


Successfully merging this pull request may close these issues.

[Bug]: Dataflow worker container resolved to legacy runner label if not explicitly disable/enable runner v2 in 2.54.0+.dev
