@emilymye
Contributor

This rewrites the container guide as we move to launch custom containers for the Dataflow runner. Two methods should be highlighted: using the released Docker Hub images as a base image, and building from the Beam source itself.


@emilymye force-pushed the customcontainerdocs branch 2 times, most recently from 7259ff9 to c9c75ab, on November 24, 2020 22:27
Contributor

This seems unfortunate. WDYT about Dataflow using this flag as well in the future?

Contributor

I think in the future there is a plan for Portable runner to support Dataflow endpoint, not sure how far on the roadmap that is.

@tvalentyn Can you send me more information on this plan if you have any?

Contributor

+1 I'd like to know about this as well since we shouldn't go to GA with custom containers until the flags are settled.

Contributor

@ananvay @angoenka @robertwb may have input on this plan.

Contributor

Regardless of the plans for using the Portability API for submitting jobs, we should standardize on a single flag for specifying the image across runners. It could be argued that environment_config is not the best flag for this; maybe we should take it to the list?

> Regardless of the plans for using the Portability API for submitting jobs, we should standardize on a single flag for specifying the image across runners. It could be argued that environment_config is not the best flag for this; maybe we should take it to the list?

+1 I agree that standardizing the flag is a worthwhile improvement, but it's out of scope for this PR.

Contributor

Standardizing the flag should be done before we document and advertise this feature.

Contributor Author

Our plan was to keep these as is for preview and standardize for GA. This PR is also editing existing documentation, which already used --environment_config. I can send an email to the dev list about the flags.

Contributor

If we're supporting old SDKs, we have to support the old flag anyway, so we should go ahead and get this in and standardize going forward.
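For readers following the flag discussion, here is a minimal sketch of the inconsistency being debated. It assumes, from the Beam 2.25-era documentation rather than from this thread, that portable runners take the image through `--environment_type=DOCKER` with `--environment_config`, while Dataflow takes it through `--worker_harness_container_image` together with the `use_runner_v2` experiment. The helper name `container_flags` and the image name are hypothetical.

```python
from typing import List


def container_flags(runner: str, image: str) -> List[str]:
    """Return the flags that point a pipeline at a custom container image.

    Flag names are assumptions based on Beam 2.25-era docs; check the docs
    for the SDK version you actually use.
    """
    if runner == "DataflowRunner":
        # Assumed: Dataflow uses its own image flag and requires Runner v2.
        return [
            "--runner=DataflowRunner",
            "--experiments=use_runner_v2",
            f"--worker_harness_container_image={image}",
        ]
    # Assumed: portable runners use the environment_config flag discussed above.
    return [
        f"--runner={runner}",
        "--environment_type=DOCKER",
        f"--environment_config={image}",
    ]


# Hypothetical image name, for illustration only.
image = "gcr.io/my-project/beam-custom:latest"
for runner in ("PortableRunner", "DataflowRunner"):
    print(runner, container_flags(runner, image))
```

A single standardized flag, as proposed in this thread, would let the two branches collapse into one.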

Comment on lines +216 to +241
Contributor

Please put this into a code block or reformat.

Contributor Author

The {{< highlight class="runner-X" >}} shortcode formats it into code blocks, tabbed by runner type; see the existing page https://beam.apache.org/documentation/runtime/environments/#testing-customized-images

Contributor

Got it. I went to 'view file' in GitHub and it only showed the rendered markdown which made each of these H1 titles. Is there a way to preview?

The best way is to serve the website from source: ./gradlew :website:serveWebsite

Contributor

You can also view the details of Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") to find the staged version of the website. You can rerun the test if the link's gone stale.

@emilymye force-pushed the customcontainerdocs branch from 07a91f0 to 96739ba on November 30, 2020 23:07
@tvalentyn
Contributor

(some of my comments got hidden under Show Resolved due to new push, hope you can still find them :) )

@emilymye requested a review from tvalentyn on November 30, 2020 23:27
Contributor

@tvalentyn left a comment

LGTM, feel free to ping me when this is ready to merge.

@emilymye requested a review from ibzib on December 4, 2020 22:01
@emilymye force-pushed the customcontainerdocs branch from c9c883f to e1702d8 on December 7, 2020 22:15
@ibzib left a comment

LGTM.

It's exciting to finally have custom containers on Dataflow publicly documented.

Contributor

@rosetn left a comment

Thanks for rewriting this!

Contributor

Does "Navigate to the root directory where you've installed your local copy of the Beam SDK." work?

Contributor Author

Tried to make it clearer by adding some environment variable setup to the instructions. I was mostly worried that "installed" might be confused with package installation (i.e. where pip installed Beam).

Contributor

Can you add an anchor to this so I can link to it in cloud docs please?

Contributor Author

@emilymye, Dec 9, 2020

added #running-pipelines

Contributor

Typo here--I also think you can just call this section "Troubleshooting" or "Considerations"

@emilymye force-pushed the customcontainerdocs branch from 4244d56 to b7a4fb6 on December 15, 2020 19:28
Contributor

@rosetn left a comment

One small typo but otherwise LGTM

```
COPY /src/path/to/file /dest/path/to/file/
```

This `Dockerfile`: uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image.
Contributor

Extra colon after Dockerfile
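The full `Dockerfile` under review is truncated in this view, so the following is only a sketch of the shape the reviewed sentence describes: a prebuilt SDK base image, one added environment variable, and one copied file. The base image name and the `2.25.0` tag come from the quoted text; the helper `render_dockerfile` and the variable `MY_FILE_NAME` are hypothetical, not taken from the guide.

```python
from typing import Dict, List, Tuple


def render_dockerfile(
    python_version: str,
    sdk_version: str,
    env: Dict[str, str],
    copies: List[Tuple[str, str]],
) -> str:
    """Render a Dockerfile that extends a released Beam Python SDK image."""
    # Base image naming follows the Docker Hub repository quoted above.
    lines = [f"FROM apache/beam_python{python_version}_sdk:{sdk_version}"]
    for name, value in env.items():
        lines.append(f"ENV {name}={value}")
    for src, dest in copies:
        lines.append(f"COPY {src} {dest}")
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    python_version="3.7",
    sdk_version="2.25.0",
    env={"MY_FILE_NAME": "my_file.txt"},  # hypothetical variable
    copies=[("/src/path/to/file", "/dest/path/to/file/")],
)
print(dockerfile)
```

Generating the file from parameters like this makes it easier to keep the image tag in step with the SDK version the pipeline uses, which is one plausible reason to script it; writing the three lines by hand works just as well.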

@tvalentyn merged commit f8d326e into apache:master on Dec 17, 2020
dxichen pushed a commit to linkedin/beam that referenced this pull request Aug 9, 2021