[docs] update README Runner section #29
README.md
Outdated
| 2. The `DataflowPipelineRunner` submits the pipeline to the [Google Cloud Dataflow](http://cloud.google.com/dataflow/). | ||
| 3. The `SparkPipelineRunner` runs the pipeline on an Apache Spark cluster. See the code that will be donated at [cloudera/spark-dataflow](https://github.com/cloudera/spark-dataflow). | ||
| 4. The `FlinkPipelineRunner` runs the pipeline on an Apache Flink cluster. See the code that will be donated at [dataArtisans/flink-dataflow](https://github.com/dataArtisans/flink-dataflow). | ||
| Beam supports executing programs on multiple distributed processing backends (runners). It currently includes the following Runners: |
I think of "distributed processing backends" as being different from "runners". A runner is the component that takes a Beam pipeline and submits it to the "processing backend", but doesn't include it. E.g., FlinkPipelineRunner vs. Flink backend.
Suggested: "Beam supports executing programs on multiple distributed processing backends. Currently, it supports the following:"
Yes absolutely. Would be nice to explain the notion of a Runner as well. How about changing it to "Beam supports executing programs on multiple distributed processing backends through Runners. Currently, the following Runners are supported:".
Let's use PipelineRunner instead of Runner.
CC: @francesperry Nice!
LGTM.
Thanks. I've updated the pull request with your suggestions. Will merge this later on.
Primitive transforms have no sub-transforms.
* [BEAM-8549] Do not use keyed operator state for checkpoint buffering

  The current buffer logic for items emitted during checkpointing is faulty in the sense that the buffer is partitioned on the output keys of the operator. The key may be changed or even dropped. Thus, the original key partitioning will not be maintained, which will cause checkpointing to fail. An alternative solution would be BEAM-6733 / apache#9652, but this change keeps the current buffering logic in place. The output buffer may now always be redistributed round-robin upon restoring from a checkpoint. Note that this is fine because no assumption can be made about the distribution of output elements of a DoFn operation.

* [BEAM-8566] Fix checkpoint buffering when another bundle is started during checkpointing

  As part of a checkpoint, the current bundle is finalized. When the bundle is finalized, the watermark, currently held back, may also be progressed, which can cause the start of another bundle. When a new bundle is started, any to-be-buffered items from the previous bundle for the pending checkpoint may be emitted. This should not happen. This only affects portable pipelines, where we have to hold back the watermark due to the asynchronous processing of elements.

* [BEAM-8566] Do not swallow execution errors during checkpointing

  If a bundle fails to finalize before creating a checkpoint, the failure may be swallowed and just considered a checkpointing error. This breaks the execution flow and exactly-once guarantees.
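The round-robin redistribution mentioned in BEAM-8549 can be illustrated with a minimal sketch. This is not the Flink runner's code: the function name `redistribute_round_robin` and its parameters are hypothetical, and the sketch only shows the idea that buffered output elements are dealt out to subtasks in turn, without consulting any key.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def redistribute_round_robin(buffered: Sequence[T], num_subtasks: int) -> List[List[T]]:
    """Deal buffered elements out to subtasks in turn, ignoring any key.

    Hypothetical helper for illustration only; it is not part of Beam or Flink.
    """
    if num_subtasks <= 0:
        raise ValueError("num_subtasks must be positive")
    # One empty partition per subtask that exists after the restore.
    partitions: List[List[T]] = [[] for _ in range(num_subtasks)]
    for index, element in enumerate(buffered):
        # The element's position, not its key, decides where it lands.
        partitions[index % num_subtasks].append(element)
    return partitions


if __name__ == "__main__":
    # Five buffered elements restored onto two subtasks.
    print(redistribute_round_robin(["a", "b", "c", "d", "e"], 2))
```

Because the assignment depends only on position, the result stays valid even if the operator changed or dropped the key of its output elements, which is the situation the commit message describes.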
* Implement colors shortcode
* Migrate integrations, logos, policies, presentation, twitter and youtube pages
* Migrate contact and person pages
* fixup! Migrate contact and person pages
Do not parse the JSON coming from the mods; instead, treat the values as Strings. This avoids parsing pitfalls when dealing with complex Spanner types.
…e-from-Coder refactor: remove `Coder<type E>` association type