[BEAM-189] The Spark runner uses valueInEmptyWindow which causes values to be dropped #179
…n, and add per-value (non-RDD) windowing functions
…we provide in README
R: @tomwhite and also anyone from the thread in beam-dev ("question about windowed values") - @kennknowles / @tgroh / @robertwb
SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
options.setRunner(SparkPipelineRunner.class);
Pipeline p = Pipeline.create(options);
PCollection<String> inputWords = p.apply(Create.of(WORDS)).setCoder(StringUtf8Coder
Not particularly relevant to the content of your change, but the recommended way to set the coder here would be Create.of(WORDS).withCoder(StringUtf8Coder.of()).
Separately, it isn't entirely necessary: it is a bit of a hack, but the methods in the default Create implementation that infer coders based on the values are made public just so that runner overrides can still invoke them, as the DataflowPipelineRunner does.
I hope the new whole-graph analysis will make it so individual runners no longer deal with this, but until then since it is not too hard to do, you might consider doing it for the Spark runner.
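To make the coder-inference idea above concrete, here is a minimal sketch in plain Java. It is not the Beam or Dataflow SDK code: `CoderInferenceSketch`, `inferCoder`, and `resolveCoder` are hypothetical names, and coders are represented only by their names as strings. It assumes inference works by checking that all elements share one runtime class and looking that class up in a registry, with an explicitly supplied coder (the `withCoder(...)` case) taking precedence.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for value-based coder inference; not the Beam API.
class CoderInferenceSketch {
    // Maps an element class to the name of the coder assumed to encode it.
    private static final Map<Class<?>, String> DEFAULT_CODERS = new HashMap<>();
    static {
        DEFAULT_CODERS.put(String.class, "StringUtf8Coder");
        DEFAULT_CODERS.put(Integer.class, "VarIntCoder");
        DEFAULT_CODERS.put(Long.class, "VarLongCoder");
    }

    // Infers a coder name from the runtime class shared by all elements.
    // Throws if the list is empty, contains null, mixes classes,
    // or uses a class with no registered coder.
    static String inferCoder(List<?> elements) {
        if (elements.isEmpty()) {
            throw new IllegalArgumentException("Cannot infer a coder from no elements; set one explicitly");
        }
        Class<?> elementClass = null;
        for (Object element : elements) {
            if (element == null) {
                throw new IllegalArgumentException("Cannot infer a coder from a null element");
            }
            if (elementClass == null) {
                elementClass = element.getClass();
            } else if (!elementClass.equals(element.getClass())) {
                throw new IllegalArgumentException("Elements have differing classes; set a coder explicitly");
            }
        }
        String coder = DEFAULT_CODERS.get(elementClass);
        if (coder == null) {
            throw new IllegalArgumentException("No default coder registered for " + elementClass.getName());
        }
        return coder;
    }

    // An explicitly supplied coder wins; otherwise fall back to inference.
    static String resolveCoder(List<?> elements, String explicitCoder) {
        return explicitCoder != null ? explicitCoder : inferCoder(elements);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hi there", "hi", "hi sue bob");
        System.out.println(resolveCoder(words, null));                            // inferred from the values
        System.out.println(resolveCoder(Arrays.asList(1, "two"), "CustomCoder")); // explicit coder, no inference
    }
}
```

In this sketch the explicit path never inspects the values, which is why setting the coder up front stays valid for inputs where inference would fail, such as empty or mixed-type collections.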
+1 from me
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.