Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling #7269
Merged
stevenzwu merged 2 commits into apache:master on Apr 6, 2023
hililiwei reviewed on Apr 4, 2023
stevenzwu reviewed on Apr 4, 2023
@@ -126,8 +133,9 @@ public void snapshotState(StateSnapshotContext context) throws Exception {
    globalStatisticsState.add(globalStatistics);
Contributor
I just realized one thing that I missed in the last PR; it can be addressed in a separate PR. We don't want to use Kryo or Java serialization for DataStatistics. We need a stable, versioned serializer (e.g. SimpleVersionedSerializer). You can find an example in IcebergEnumeratorStateSerializer.
You can find more context in #1698.
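To illustrate the kind of stable, versioned serialization the comment asks for, here is a minimal sketch in plain Java. It is not the serializer from Iceberg or from the follow-up PR. `VersionedSerializer` is a hypothetical local stand-in that mirrors the shape of Flink's `SimpleVersionedSerializer` (`getVersion`, `serialize`, `deserialize(version, bytes)`) so the example compiles without Flink on the classpath, and `MapStatisticsSerializer` assumes the statistics are a simple key-to-count map, which the PR does not state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the shape of Flink's SimpleVersionedSerializer.
interface VersionedSerializer<T> {
  int getVersion();

  byte[] serialize(T obj) throws IOException;

  T deserialize(int version, byte[] serialized) throws IOException;
}

// Hypothetical serializer; assumes statistics are a key -> count map.
class MapStatisticsSerializer implements VersionedSerializer<Map<String, Long>> {
  private static final int VERSION = 1;

  @Override
  public int getVersion() {
    return VERSION;
  }

  @Override
  public byte[] serialize(Map<String, Long> stats) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      // Explicit layout: entry count, then (key, count) pairs.
      out.writeInt(stats.size());
      for (Map.Entry<String, Long> entry : stats.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeLong(entry.getValue());
      }
    }
    return bytes.toByteArray();
  }

  @Override
  public Map<String, Long> deserialize(int version, byte[] serialized) throws IOException {
    // The version lets a later format change stay readable by newer code.
    if (version != VERSION) {
      throw new IOException("Unknown version: " + version);
    }
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
      int size = in.readInt();
      Map<String, Long> stats = new HashMap<>();
      for (int i = 0; i < size; i++) {
        String key = in.readUTF();
        long count = in.readLong();
        stats.put(key, count);
      }
      return stats;
    }
  }
}
```

Because the byte layout is written out by hand and tagged with a version, state written by one release can still be read after the class changes, which is the property that generic Kryo or Java serialization does not guarantee.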
Contributor
Author
I will use a follow-up PR to address the serialization.
…er to convert DataStatisticsEvent to string
3c068e5 to 48ac122
stevenzwu approved these changes on Apr 5, 2023
Contributor
@hililiwei do you have more comments for this PR?
Contributor
Thanks @yegangy0718 for the contribution and @hililiwei for the review.
ericlgoodman pushed a commit to ericlgoodman/iceberg that referenced this pull request on Apr 12, 2023
…nator and receive aggregated data statistics from coordinator for smart shuffling (apache#7269)
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request on May 9, 2023
…nator and receive aggregated data statistics from coordinator for smart shuffling (apache#7269)
This PR is created as part of issue #6303 and project https://github.com/apache/iceberg/projects/27
In this PR, we implement the logic in DataStatisticsOperator to send local data statistics to the coordinator and receive aggregated data statistics from the coordinator.
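The round trip described above can be pictured with a small plain-Java sketch. This is not the code from the PR: `SketchCoordinator` and `SketchStatisticsOperator` are hypothetical names, the statistics are assumed to be key-to-count maps, and the exchange is a direct method call, whereas the real operator and coordinator exchange events (the PR mentions a DataStatisticsEvent) rather than calling each other synchronously.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical coordinator: merges per-subtask key counts into a global view.
class SketchCoordinator {
  private final Map<String, Long> global = new HashMap<>();

  // Receives one subtask's local statistics and returns a copy of the aggregate so far.
  Map<String, Long> handleLocalStatistics(Map<String, Long> local) {
    local.forEach((key, count) -> global.merge(key, count, Long::sum));
    return new HashMap<>(global);
  }
}

// Hypothetical operator: counts keys locally, reports them at a checkpoint
// boundary, and keeps the aggregated statistics it receives back.
class SketchStatisticsOperator {
  private final SketchCoordinator coordinator;
  private Map<String, Long> localStatistics = new HashMap<>();
  private Map<String, Long> globalStatistics = new HashMap<>();

  SketchStatisticsOperator(SketchCoordinator coordinator) {
    this.coordinator = coordinator;
  }

  // Called once per record; only the key matters for the statistics.
  void processElement(String key) {
    localStatistics.merge(key, 1L, Long::sum);
  }

  // Sends the local statistics, stores the aggregated reply, and starts a fresh local window.
  void reportAtCheckpoint() {
    globalStatistics = coordinator.handleLocalStatistics(localStatistics);
    localStatistics = new HashMap<>();
  }

  Map<String, Long> globalStatistics() {
    return globalStatistics;
  }
}
```

With two operator instances sharing one coordinator, each instance only sees its own keys locally, but after reporting it holds the combined counts, which is the information a shuffle needs to balance traffic across subtasks.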