Remove SparkPipelineOptionsFactory, SparkStreamingPipelineOptionsFactory #167
|
R: @amitsela |
|
@amitsela: |
|
@lukecwik I guess SparkPipelineOptionsFactory.create() was a wrapper for PipelineOptionsFactory.as(SparkPipelineOptions.class). @tgroh It makes sense that if we provide a PipelineOptionsFactory.as wrapper for the Spark runner, we should set the SparkRunner there, but then we should do the same in SparkStreamingPipelineOptionsFactory.create(). So it's either letting the user choose the correct options, as @lukecwik suggests: Or, as @tgroh suggests: As the model writers, what is the correct way for a runner implementor to choose? How do we want users to use this? Should they explicitly state the runner, or should choosing the options provide the correct runner? |
|
Generally Pipeline authors should not have to use any runner-specific classes during pipeline construction or submission; so users should call neither Pipeline authors are responsible for selecting a runner, currently at or before the time of |
|
+1 for Thomas Groh's response.
|
|
I can see your point, and it's true for PipelineOptionsFactory.fromArgs(String[]), so I guess it's +1 from me as well. |
|
I've pushed a change to remove the factories, and fix up all of the tests in which they were used. If this commit is merged, the first in this PR should be discarded. |
|
@tgroh I don't quite understand your last comment. Perhaps you could just rebase to exactly what you propose to merge? |
|
@kennknowles done. |
1fbe48e to 2f5be67
|
+1 for removing the factories as well |
2f5be67 to 95472e0
Pipeline authors should generally not use any runner-specific classes; instead, they should select the runner and its configuration through the PipelineOptionsFactory.fromArgs() method. The runner can then obtain the appropriately typed PipelineOptions class as required and do any necessary validation. Failing this, they should use the provided PipelineOptions#as() method to acquire the appropriately typed options. If required, users should construct SparkPipelineOptions via PipelineOptionsFactory.as(SparkPipelineOptions.class).
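To illustrate why a runner-specific factory adds little, here is a toy, standard-library-only model of the "parse once, view as any typed interface" pattern described above. It is a sketch under stated assumptions, not Beam's implementation: MiniOptions, MiniSparkOptions, MiniOptionsFactory, and the flag names are hypothetical stand-ins for PipelineOptions, SparkPipelineOptions, and PipelineOptionsFactory.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Beam classes.
interface MiniOptions {
    String getRunner();
    <T extends MiniOptions> T as(Class<T> view);
}

// A runner-specific view: same backing values, one extra typed getter.
interface MiniSparkOptions extends MiniOptions {
    String getSparkMaster();
}

final class MiniOptionsFactory {
    private MiniOptionsFactory() {}

    // Parse "--name=value" flags into one shared map and expose the generic view.
    static MiniOptions fromArgs(String... args) {
        Map<String, String> values = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            values.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return view(MiniOptions.class, values);
    }

    // Build a typed view over the shared map with a dynamic proxy.
    static <T extends MiniOptions> T view(Class<T> type, Map<String, String> values) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            String name = method.getName();
            if (method.getDeclaringClass() == Object.class) {
                // toString/hashCode/equals are answered by the backing map.
                return method.invoke(values, methodArgs);
            }
            if (name.equals("as")) {
                @SuppressWarnings("unchecked")
                Class<? extends MiniOptions> target =
                        (Class<? extends MiniOptions>) methodArgs[0];
                // A new view shares the same map, so no values are lost.
                return view(target, values);
            }
            if (name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0) {
                // getSparkMaster -> "sparkMaster"
                String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                return values.get(key);
            }
            throw new UnsupportedOperationException(name);
        };
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }
}
```

In this model the pipeline author only ever calls MiniOptionsFactory.fromArgs("--runner=SparkRunner", "--sparkMaster=local[2]"); the runner then calls options.as(MiniSparkOptions.class) when it needs its own typed getters, so no Spark-specific factory is involved.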
95472e0 to 11ba2b9
|
Rebased on top of beam. If there aren't any more comments, this is ready to merge, pending jenkins |
|
LGTM |
The SparkPipelineOptionsFactory should ensure that the returned options
have Spark as the PipelineRunner.
Use the SparkPipelineOptionsFactory in the Spark TfIdf test.
Without using the SparkPipelineRunner explicitly, the Pipeline run with
the SparkRunner may have an unexpected graph due to runner-specific
interceptions of the Pipeline#apply method.