[BEAM-8549] Do not use keyed operator state for checkpoint buffering #9980

mxm · 2019-11-04T11:26:47Z

The current buffer logic for items emitted during checkpointing is faulty
because the buffer is partitioned on the output keys of the operator. An
operator may change the key or even drop it. Thus, the original key partitioning
is not maintained, which causes checkpointing to fail.

An alternative solution would be BEAM-6733 / #9652, but this change keeps the
current buffering logic in place. With this change, the output buffer may be
redistributed round-robin upon restoring from a checkpoint. Note that this is
fine because no assumption can be made about the distribution of output elements
of a DoFn operation.
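
To illustrate the redistribution described above, here is a minimal plain-Java sketch. It does not use the runner's state API and is not the code in this change; the class name `RoundRobinRestore` and the method `redistribute` are hypothetical. It only shows that buffered output elements can be dealt out to the restored operator instances in turn, without looking at any key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the runner's implementation.
public class RoundRobinRestore {

    // Deals the buffered elements from all checkpointed instances out to
    // `newParallelism` restored instances, one element at a time.
    // No key is consulted, so a changed or dropped output key cannot
    // break the assignment.
    static <T> List<List<T>> redistribute(List<List<T>> checkpointed, int newParallelism) {
        List<List<T>> restored = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            restored.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> buffer : checkpointed) {
            for (T element : buffer) {
                restored.get(next).add(element);
                next = (next + 1) % newParallelism;
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        // Two instances held buffered output when the checkpoint was taken.
        List<List<String>> checkpointed =
            Arrays.asList(Arrays.asList("a", "b", "c"), Arrays.asList("d"));
        // Restore with three instances.
        System.out.println(redistribute(checkpointed, 3));
    }
}
```

Every buffered element ends up on exactly one restored instance, which is all that is needed here since downstream operators cannot rely on where a DoFn's output elements originate.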

mxm · 2019-11-04T12:36:28Z

Unrelated failure in org.apache.beam.sdk.io.FileIOTest.testMatchWatchForNewFiles.

tweise

This makes sense. There is no reason why the output should be partitioned in the same way. Would also explain why we only noticed this issue now, since our tests always use the same key (or don't produce output).

mxm force-pushed the BEAM-8549 branch from 8f40643 to c81b5f4 Compare November 4, 2019 11:48

mxm requested a review from tweise November 4, 2019 12:36

tweise approved these changes Nov 4, 2019

mxm merged commit 9f56639 into apache:master Nov 5, 2019

mxm mentioned this pull request Nov 5, 2019

[BEAM-8549] Do not use keyed operator state for checkpoint buffering #9993

Merged
