Conversation

@mxm
Contributor

@mxm mxm commented Mar 7, 2016

No description provided.

README.md Outdated
2. The `DataflowPipelineRunner` submits the pipeline to the [Google Cloud Dataflow](http://cloud.google.com/dataflow/).
3. The `SparkPipelineRunner` runs the pipeline on an Apache Spark cluster. See the code that will be donated at [cloudera/spark-dataflow](https://github.com/cloudera/spark-dataflow).
4. The `FlinkPipelineRunner` runs the pipeline on an Apache Flink cluster. See the code that will be donated at [dataArtisans/flink-dataflow](https://github.com/dataArtisans/flink-dataflow).
Beam supports executing programs on multiple distributed processing backends (runners). It currently includes the following Runners:
Member

I think of "distributed processing backends" as being different from "runners". A runner is the component that takes a Beam pipeline and submits it to the "processing backend", but doesn't include it. E.g., FlinkPipelineRunner vs. Flink backend.

Suggested: "Beam supports executing programs on multiple distributed processing backends. Currently, it supports the following:"

Contributor Author

Yes, absolutely. It would be nice to explain the notion of a Runner as well. How about changing it to "Beam supports executing programs on multiple distributed processing backends through Runners. Currently, the following Runners are supported:"?

Member

Let's use PipelineRunner instead of Runner.
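
To make the distinction drawn in this thread concrete, here is a minimal, self-contained Python sketch. It does not use the Beam SDK: `Pipeline`, `InMemoryBackend`, and `InMemoryPipelineRunner` are hypothetical names invented purely for illustration. It only shows the separation of roles described above, where the runner translates and submits a pipeline and the backend executes it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of element-wise steps."""
    steps: List[Callable[[int], int]] = field(default_factory=list)

    def apply(self, fn: Callable[[int], int]) -> "Pipeline":
        self.steps.append(fn)
        return self


class InMemoryBackend:
    """Hypothetical processing backend: executes jobs and knows nothing about Pipeline."""

    def __init__(self) -> None:
        self.submitted_jobs = 0

    def execute(self, job: Callable[[List[int]], List[int]], data: List[int]) -> List[int]:
        self.submitted_jobs += 1
        return job(data)


class InMemoryPipelineRunner:
    """Hypothetical runner: translates a Pipeline into a backend job and submits it."""

    def __init__(self, backend: InMemoryBackend) -> None:
        self.backend = backend

    def run(self, pipeline: Pipeline, data: List[int]) -> List[int]:
        # Translation step: turn the pipeline's steps into one callable job.
        def job(elements: List[int]) -> List[int]:
            for step in pipeline.steps:
                elements = [step(e) for e in elements]
            return elements

        # Submission step: hand the translated job to the backend.
        return self.backend.execute(job, data)


backend = InMemoryBackend()
runner = InMemoryPipelineRunner(backend)
pipeline = Pipeline().apply(lambda x: x + 1).apply(lambda x: x * 2)
result = runner.run(pipeline, [1, 2, 3])
```

In this toy model the same `Pipeline` could be handed to a different runner targeting a different backend without changing the pipeline itself, which is the property the README wording is trying to convey.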

@davorbonaci
Member

CC: @francesperry

Nice!

@davorbonaci
Member

LGTM.

@mxm
Contributor Author

mxm commented Mar 8, 2016

Thanks. I've updated the pull request with your suggestions. Will merge this later on.

@asfgit asfgit closed this in 5a817d1 Mar 8, 2016
@mxm mxm deleted the README branch March 8, 2016 16:28
echauchot added a commit to echauchot/beam that referenced this pull request May 12, 2017
cosmoskitten pushed a commit to cosmoskitten/beam that referenced this pull request Jun 16, 2017
query3: Use GlobalWindow to comply with the State/Timer APIs (issue apache#7). Use timer for personState expiration in GlobalWindow (issue apache#29). Add trigger to GlobalWindow

query12: Replace Count.perKey by Count.perElement (issue apache#34)
asfgit pushed a commit that referenced this pull request Aug 23, 2017
query3: Use GlobalWindow to comply with the State/Timer APIs (issue #7). Use timer for personState expiration in GlobalWindow (issue #29). Add trigger to GlobalWindow

query12: Replace Count.perKey by Count.perElement (issue #34)
lukecwik referenced this pull request in lukecwik/incubator-beam Mar 22, 2018
Primitive transforms have exactly zero sub-transforms, never more.
tvalentyn pushed a commit to tvalentyn/beam that referenced this pull request May 15, 2018
mxm added a commit to mxm/beam that referenced this pull request Jan 16, 2020
* [BEAM-8549] Do not use keyed operator state for checkpoint buffering

The current buffer logic for items emitted during checkpointing is faulty in the
sense that the buffer is partitioned on the output keys of the operator. The key
may be changed or even be dropped. Thus, the original key partitioning will not
be maintained which will cause checkpointing to fail.

An alternative solution would be BEAM-6733 / apache#9652, but this change keeps the
current buffering logic in place. The output buffer may now always be
redistributed round-robin upon restoring from a checkpoint. Note that this is
fine because no assumption can be made about the distribution of output elements
of a DoFn operation.

* [BEAM-8566] Fix checkpoint buffering when another bundle is started during checkpointing

As part of a checkpoint, the current bundle is finalized. When the bundle is
finalized, the watermark, currently held back, may also be progressed which can
cause the start of another bundle. When a new bundle is started, any
to-be-buffered items from the previous bundle for the pending checkpoint may be
emitted. This should not happen.

This only affects portable pipelines, where we have to hold back the watermark
due to the asynchronous processing of elements.

* [BEAM-8566] Do not swallow execution errors during checkpointing

If a bundle fails to finalize before a checkpoint is created, the failure may be
swallowed and treated as a mere checkpointing error. This breaks the execution
flow and the exactly-once guarantees.
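
The round-robin redistribution mentioned in the first item of this commit message can be illustrated with a small sketch. This is not the Flink runner's actual code: `redistribute_round_robin` and `num_subtasks` are hypothetical names, and the sketch only assumes that buffered output elements are reassigned by position rather than by key when state is restored.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def redistribute_round_robin(buffered: Sequence[T], num_subtasks: int) -> List[List[T]]:
    """Assign buffered elements to subtasks by position, ignoring any element key."""
    if num_subtasks <= 0:
        raise ValueError("num_subtasks must be positive")
    partitions: List[List[T]] = [[] for _ in range(num_subtasks)]
    for index, element in enumerate(buffered):
        # Element i goes to subtask i mod num_subtasks.
        partitions[index % num_subtasks].append(element)
    return partitions


# Five buffered elements restored onto two parallel subtasks.
restored = redistribute_round_robin(["a", "b", "c", "d", "e"], 2)
```

Because the assignment depends only on position, it stays valid even if the operator changed or dropped the keys of its output, which is the failure mode the commit message describes for key-partitioned buffering.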
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Apr 30, 2020
* Implement colors shortcode

* Migrate integrations, logos, policies, presentation, twitter and youtube pages

* Migrate contact and person pages

* fixup! Migrate contact and person pages
hengfengli referenced this pull request in hengfengli/beam Mar 21, 2022
Do not parse the JSON coming from the mods; instead, treat it as Strings.
This avoids pitfalls in parsing when dealing with complex Spanner types.
sjvanrossum pushed a commit to sjvanrossum/beam that referenced this pull request May 17, 2023
…e-from-Coder

refactor: remove `Coder<type E>` association type
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
2 participants