Skip to content

Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling#7269

Merged
stevenzwu merged 2 commits intoapache:masterfrom
yegangy0718:20230402-publish-data-statistics-from-operator-to-coordinator
Apr 6, 2023
Merged

Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling#7269
stevenzwu merged 2 commits intoapache:masterfrom
yegangy0718:20230402-publish-data-statistics-from-operator-to-coordinator

Conversation

@yegangy0718
Copy link
Copy Markdown
Contributor

This PR is created as part of issue #6303 and project https://github.com/apache/iceberg/projects/27

In this PR, we implement the logic in DataStatisticsOperator to send local data statistics to the coordinator and receive aggregated data statistics from the coordinator.

…coordinator and receive aggregated data statistics from coordinator for smart shuffling
@github-actions github-actions Bot added the flink label Apr 3, 2023
@yegangy0718 yegangy0718 changed the title Flink: Data statistics operator sends local data statistics event to coordinator and receive aggregated data statistics from coordinator for smart shuffling Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling Apr 3, 2023
@stevenzwu stevenzwu self-requested a review April 3, 2023 15:54
@@ -126,8 +133,9 @@ public void snapshotState(StateSnapshotContext context) throws Exception {
globalStatisticsState.add(globalStatistics);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized one thing that I missed from last PR. It can be addressed with a separate PR. We don't want to use Kryo Java serialization for the DataStatistics. We need a stable parser (E.g. SimpleVersionedSerializer). You can find some example from IcebergEnumeratorStateSerializer.

You can find some more context from #1698.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use a follow-up PR to address the serialization.

…er to convert dATAstatisticsEvent to string
@yegangy0718 yegangy0718 force-pushed the 20230402-publish-data-statistics-from-operator-to-coordinator branch from 3c068e5 to 48ac122 Compare April 4, 2023 23:08
@stevenzwu
Copy link
Copy Markdown
Contributor

@hililiwei do you have more comments for this PR?

Copy link
Copy Markdown
Contributor

@hililiwei hililiwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@stevenzwu stevenzwu merged commit a648533 into apache:master Apr 6, 2023
@stevenzwu
Copy link
Copy Markdown
Contributor

thanks @yegangy0718 for the contribution and @hililiwei for the review

ericlgoodman pushed a commit to ericlgoodman/iceberg that referenced this pull request Apr 12, 2023
…nator and receive aggregated data statistics from coordinator for smart shuffling (apache#7269)
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
…nator and receive aggregated data statistics from coordinator for smart shuffling (apache#7269)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

3 participants