This repository was archived by the owner on Apr 7, 2025. It is now read-only.

@mxm mxm commented Nov 4, 2019

The current buffer logic for items emitted during checkpointing is faulty: the
buffer is partitioned on the output keys of the operator. An operator may change
the key or drop it entirely, so the original key partitioning is not maintained,
which causes checkpointing to fail.

An alternative solution would be BEAM-6733 / apache#9652, but this change keeps the
current buffering logic in place. The output buffer can now always be
redistributed round-robin upon restoring from a checkpoint. This is fine because
no assumption can be made about the distribution of the output elements of a
DoFn operation.
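
To illustrate the round-robin redistribution described above, here is a minimal sketch in plain Python. It is a toy model, not the Flink runner's code: the function name `redistribute_round_robin` and the `parallelism` parameter are hypothetical, and the real change relies on the runner's own state handling.

```python
from typing import Any, List, Sequence


def redistribute_round_robin(buffered: Sequence[Any], parallelism: int) -> List[List[Any]]:
    """Deal buffered elements out to `parallelism` subtasks in turn.

    Hypothetical helper: the elements' keys are ignored entirely, because
    the operator may have changed or dropped them.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    partitions: List[List[Any]] = [[] for _ in range(parallelism)]
    for index, element in enumerate(buffered):
        partitions[index % parallelism].append(element)
    return partitions


# Buffered outputs restored from a checkpoint, spread over two subtasks.
buffered_outputs = ["a", "b", "c", "d", "e"]
restored = redistribute_round_robin(buffered_outputs, parallelism=2)
```

No element is lost or duplicated; only the assignment to subtasks changes, which is acceptable when nothing downstream depends on how DoFn outputs are distributed.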


mxm added 3 commits November 7, 2019 15:37
…ted during checkpointing

As part of a checkpoint, the current bundle is finalized. Finalizing the bundle
may also advance the watermark, which is currently held back, and that can
start another bundle. When a new bundle is started, any to-be-buffered items
from the previous bundle for the pending checkpoint may be emitted. This should
not happen.

This only affects portable pipelines, where we have to hold back the watermark
due to the asynchronous processing of elements.
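
The intended behavior can be sketched with a toy operator in plain Python. Everything here is an assumption made for illustration: the class `BufferingOperator`, its method names, and the choice to release the buffer on a completion notification are hypothetical and do not mirror the runner's actual classes.

```python
from typing import Any, List


class BufferingOperator:
    """Toy model of an operator that buffers output while a checkpoint is pending."""

    def __init__(self) -> None:
        self.checkpoint_pending = False
        self.buffer: List[Any] = []   # outputs held back for the pending checkpoint
        self.emitted: List[Any] = []  # outputs already sent downstream

    def output(self, element: Any) -> None:
        if self.checkpoint_pending:
            self.buffer.append(element)
        else:
            self.emitted.append(element)

    def start_bundle(self) -> None:
        # Guard: starting a new bundle must not flush items that are
        # buffered for a checkpoint that is still pending.
        if not self.checkpoint_pending:
            self._flush()

    def snapshot_state(self, finalize_outputs: List[Any]) -> List[Any]:
        self.checkpoint_pending = True
        # Finalizing the bundle may produce output ...
        for element in finalize_outputs:
            self.output(element)
        # ... and an advancing watermark may start another bundle.
        self.start_bundle()
        return list(self.buffer)  # buffered items become part of the checkpoint

    def notify_checkpoint_complete(self) -> None:
        self.checkpoint_pending = False
        self._flush()

    def _flush(self) -> None:
        self.emitted.extend(self.buffer)
        self.buffer.clear()


op = BufferingOperator()
op.output("x")
state = op.snapshot_state(["y", "z"])
emitted_during_checkpoint = list(op.emitted)
op.notify_checkpoint_complete()
```

Without the guard in `start_bundle`, the items `"y"` and `"z"` would be emitted before the snapshot is taken and would be missing from the checkpointed state.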
If a bundle fails to finalize before creating a checkpoint, the failure may be
swallowed and treated as a mere checkpointing error. This breaks the execution
flow and exactly-once guarantees.
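
One way to picture the difference is a snapshot routine that surfaces a bundle failure as its own error instead of letting it pass as an ordinary checkpoint failure. This is a hedged sketch in plain Python; `BundleFinalizationError`, `snapshot`, and the two callbacks are hypothetical names, and the actual change may propagate the failure differently.

```python
from typing import Callable, List


class BundleFinalizationError(RuntimeError):
    """Hypothetical error type: the bundle itself failed, not the checkpoint."""


def snapshot(finalize_bundle: Callable[[], None], write_checkpoint: Callable[[], None]) -> None:
    try:
        finalize_bundle()
    except Exception as exc:
        # Re-raise as a processing failure so the caller cannot mistake it
        # for a checkpoint that merely needs to be retried.
        raise BundleFinalizationError("bundle failed to finalize before checkpoint") from exc
    # Only reached when the bundle finalized cleanly.
    write_checkpoint()


def failing_finalize() -> None:
    raise ValueError("element processing failed")


written: List[str] = []
try:
    snapshot(failing_finalize, lambda: written.append("checkpoint"))
    outcome = "checkpoint written"
except BundleFinalizationError as err:
    outcome = "job failure: " + str(err)
```

In this sketch no checkpoint is written after a failed bundle, so a later restore cannot start from state that silently skipped the failed elements.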
@mxm mxm force-pushed the release-2.16.0-lyft-checkpointing branch from 573db12 to 9ba8b29 Compare November 7, 2019 14:38
The default passed from the Kafka consumer wrapper for start_from_timestamp_millis is "None".
@mxm mxm merged commit ebc3c72 into release-2.16.0-lyft Nov 8, 2019
@mxm mxm deleted the release-2.16.0-lyft-checkpointing branch November 13, 2019 13:00