@mxm mxm commented Nov 5, 2019

The current buffer logic for items emitted during checkpointing is faulty: the
buffer is partitioned on the output keys of the operator, but the operator may
change the key or drop it entirely. The original key partitioning is then no
longer maintained, which causes checkpointing to fail.
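
To make the failure mode concrete, here is a minimal sketch of how a changed key can map to a different partition than the one the element arrived on. It is a simplification under stated assumptions: `KeyPartitionMismatch` and `partitionFor` are hypothetical names, and the plain `hashCode` modulo stands in for the runner's real key-to-partition assignment, which is more involved.

```java
// Hypothetical illustration; not code from the Flink runner.
class KeyPartitionMismatch {

  /** Simplified stand-in for key-based partitioning: hash of the key modulo parallelism. */
  static int partitionFor(String key, int parallelism) {
    return Math.floorMod(key.hashCode(), parallelism);
  }

  public static void main(String[] args) {
    int parallelism = 2;
    String inputKey = "a";  // key the element was routed by
    String outputKey = "b"; // key after the operator rewrote it

    // The element is processed on the partition owning the input key ...
    System.out.println("input key partition:  " + partitionFor(inputKey, parallelism));
    // ... but a buffer keyed by the output key belongs to another partition.
    System.out.println("output key partition: " + partitionFor(outputKey, parallelism));
  }
}
```

With these two keys and a parallelism of 2, the partitions differ, which is the situation in which a buffer keyed by the output key no longer matches the partition that holds it.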

An alternative solution would be BEAM-6733 / #9652, but this change keeps the
current buffering logic in place. The output buffer can now always be
redistributed round-robin when restoring from a checkpoint. This is fine
because no assumption can be made about the distribution of the output
elements of a DoFn operation.
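
As a rough sketch of what round-robin redistribution means here, the snippet below deals a list of buffered elements out to a given number of parallel subtasks, ignoring keys altogether. It is only an illustration: `RoundRobinRedistribution` and `redistribute` are hypothetical names, and the actual change relies on the runner's state handling rather than on code like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not code from the Flink runner.
class RoundRobinRedistribution {

  /** Deals the buffered elements out to `parallelism` subtasks, one element at a time. */
  static <T> List<List<T>> redistribute(List<T> buffered, int parallelism) {
    if (parallelism <= 0) {
      throw new IllegalArgumentException("parallelism must be positive");
    }
    List<List<T>> partitions = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      partitions.add(new ArrayList<>());
    }
    for (int i = 0; i < buffered.size(); i++) {
      // Element i goes to subtask i mod parallelism, regardless of its key.
      partitions.get(i % parallelism).add(buffered.get(i));
    }
    return partitions;
  }

  public static void main(String[] args) {
    List<String> buffered = Arrays.asList("a", "b", "c", "d", "e");
    System.out.println(redistribute(buffered, 2)); // [[a, c, e], [b, d]]
    System.out.println(redistribute(buffered, 3)); // [[a, d], [b, e], [c]]
  }
}
```

Because the assignment depends only on an element's position in the buffer, it stays valid whether the operator changed the key, dropped it, or the parallelism differs after restore.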

Backport of #9980.

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@mxm mxm requested review from Ardagan and tweise November 5, 2019 11:57

mxm commented Nov 5, 2019

Run Python2_PVR_Flink PreCommit

mxm commented Nov 5, 2019

Run Java PreCommit

@mxm mxm merged commit ea01a98 into apache:release-2.17.0 Nov 5, 2019