mxm commented May 12, 2020

We had a couple of PRs in which we wanted to remove the buffering of bundle
output during checkpointing: #7940 and #9652. Ultimately, we didn't merge either
of them because we weren't sure how the change would affect checkpoint
performance.

As a better migration path, this introduces a pipeline option that switches
from the default behavior (buffering bundle output during checkpointing) to
finishing the bundle and flushing all data before checkpointing.
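
To make the difference between the two modes concrete, here is a minimal toy model in Python. It is only a sketch and not the runner code: `ToyOperator` and the flag name `finish_bundle_before_checkpoint` are hypothetical names made up for illustration, and the real operator handles state, timers, and barriers that are ignored here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyOperator:
    """Hypothetical stand-in for an operator that processes elements in bundles."""

    # Hypothetical flag; models the new pipeline option (off by default).
    finish_bundle_before_checkpoint: bool = False
    # Output produced by the currently open bundle, not yet sent downstream.
    pending: List[str] = field(default_factory=list)
    # Output that has already been flushed downstream.
    emitted: List[str] = field(default_factory=list)

    def process(self, element: str) -> None:
        # Bundle output is held back until the bundle is finished.
        self.pending.append(element.upper())

    def finish_bundle(self) -> None:
        # Flush everything the open bundle produced, then start fresh.
        self.emitted.extend(self.pending)
        self.pending.clear()

    def snapshot(self) -> List[str]:
        """Return the bundle output that would have to be stored in the checkpoint."""
        if self.finish_bundle_before_checkpoint:
            # New mode: finish the bundle and flush before the checkpoint is taken.
            self.finish_bundle()
        # Default mode: whatever is still pending is buffered in the checkpoint.
        return list(self.pending)
```

With the flag off, `snapshot()` returns the pending bundle output, so it ends up in the checkpoint. With the flag on, the output is flushed first and the checkpoint carries no buffered bundle output.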

mxm commented May 12, 2020

Run Python Load Tests ParDo Flink Streaming

mxm commented May 12, 2020

Run Python Load Tests ParDo Flink Streaming

mxm force-pushed the pre-snapshot-barrier branch from 4e66ad3 to cfb6f32 May 15, 2020 16:23

mxm commented May 15, 2020

Had to rebase due to merge conflicts with the new website.

mxm commented May 15, 2020

Run Python Load Tests ParDo Flink Streaming

mxm force-pushed the pre-snapshot-barrier branch from cfb6f32 to 638614a May 15, 2020 17:54

mxm commented May 15, 2020

Based on the load test results, I'm not changing the default behavior here. Instead, flushing the bundle output before a checkpoint can be turned on optionally.

mxm force-pushed the pre-snapshot-barrier branch from 638614a to 9bac874 May 15, 2020 17:57
mxm force-pushed the pre-snapshot-barrier branch from 9bac874 to 401f213 May 15, 2020 18:02

mxm commented May 18, 2020

Run Python2_PVR_Flink PreCommit

mxm commented May 18, 2020

After more test runs, the effect of finishing the bundle before the checkpoint looks marginal to non-existent. However, we will have to run more experiments before making it the default. In the meantime, the option will be useful for testing.

mxm merged commit 7c80ecb into apache:master May 18, 2020