Data: Add partition stats writer and reader #11216

ajantha-bhat · 2024-09-26T17:00:06Z

Introduce APIs to write the partition stats into files in table default format using Iceberg internal writers and readers.

PartitionStatisticsFile partitionStatisticsFile =
        PartitionStatsHandler.computeAndWriteStatsFile(testTable);

testTable.updatePartitionStatistics().setPartitionStatistics(partitionStatisticsFile).commit();

core/src/main/java/org/apache/iceberg/data/PartitionStatsRecord.java

ajantha-bhat · 2024-09-27T01:52:39Z

core/src/test/java/org/apache/iceberg/TestTables.java

+      Schema schema,
+      PartitionSpec spec,
+      int formatVersion,
+      Map<String, String> properties) {


There was no option to pass the table properties before.
I needed to pass a different file format for the parameterized test.

Why not add the parameter to the old create method and call the new method from the old one?

Like:

public static TestTable create(
    File temp, String name, Schema schema, PartitionSpec spec, SortOrder sortOrder, int formatVersion) {
  return create(temp, name, schema, spec, SortOrder.unsorted(), formatVersion, ImmutableMap.of());
}

public static TestTable create(
    File temp, String name, Schema schema, PartitionSpec spec, int formatVersion, Map<String, String> properties) {

I followed the same style that was used when MetricsReporter was added.

> Why not adding the parameter to the old create

It is public, so I would need to modify all the callers.
But we can extract a private method that serves as a helper for all these public methods. I can do that in a follow-up to keep the changes in this PR minimal.

I think I don't really understand your answer here 😢

I'm not suggesting just simply adding the parameter.
What I'm suggesting is to create a new constructor with the extra parameter, but in the old constructor we should remove the implementation and call the new constructor with an empty property map. This way we will not have duplicated code.

FYI, at some point I was considering introducing a Builder class for these TestTables because there are so many different create() functions now. Once this PR is merged, I think I'll revisit this plan since we have yet another version of these functions.

A builder is definitely better. For now, I just extracted the common code into a private function to keep the changes in this PR minimal.
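The delegation pattern discussed in this thread can be sketched without Iceberg. This is a minimal illustration, not the actual TestTables code: `TableFactorySketch`, `TableDef`, and `createTable` here are hypothetical names, and the parameter lists are shortened. The old public overload keeps its signature, the new overload adds the properties map, and both forward to one private implementation.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for TestTables; none of these names are Iceberg APIs.
final class TableFactorySketch {
  // Minimal value type describing what was created.
  static final class TableDef {
    final String name;
    final int formatVersion;
    final Map<String, String> properties;

    TableDef(String name, int formatVersion, Map<String, String> properties) {
      this.name = name;
      this.formatVersion = formatVersion;
      this.properties = properties;
    }
  }

  private TableFactorySketch() {}

  // Old public overload: signature unchanged, so existing callers still compile.
  static TableDef create(String name, int formatVersion) {
    return createTable(name, formatVersion, Collections.emptyMap());
  }

  // New public overload that accepts table properties.
  static TableDef create(String name, int formatVersion, Map<String, String> properties) {
    return createTable(name, formatVersion, properties);
  }

  // Single private implementation shared by every public overload.
  private static TableDef createTable(
      String name, int formatVersion, Map<String, String> properties) {
    return new TableDef(name, formatVersion, Map.copyOf(properties));
  }
}
```

Because the body lives in one place, adding another optional parameter later only touches the private method and one new overload.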

ajantha-bhat · 2024-09-27T02:00:04Z

@aokolnychyi: This PR is ready. But as we discussed previously, this PR wraps the PartitionStats into a Record as the writers cannot work with Iceberg internal objects yet.

I will explore adding the internal writers for Parquet and ORC, similar to #11108.
If we fail to have it ready by 1.7.0, I think it makes sense to merge this PR and introduce the optimized writer in the next version by deprecating this writer.

ajantha-bhat · 2024-10-23T16:18:50Z

@RussellSpitzer: It would be good to have this in 1.7.0.
I have been waiting a month for a review.

aokolnychyi · 2024-10-24T04:34:52Z

I think we should try to use "internal" writers. @rdblue added "internal" readers recently.

Any guidance on how to add a writer, @rdblue? We can start with Avro for now. We will also need such readers/writers for Parquet.

ajantha-bhat · 2024-10-24T08:39:46Z

@aokolnychyi, @rdblue:

I already tried a POC for internal writers on another branch: c209bc9

The problems:
a) I am using PartitionData instead of Record for the partition value, but the PartitionData get() method wraps the byte array in a byte buffer. That is a problem for the internal writers, which expect byte[]. I didn't feel like introducing a new class instead of PartitionData just for this.

b) Also, using PartitionData in StructLikeMap does not work well. Some keys are missing from the map (it looks like an issue in the equals() logic). If I use Record, it is fine.

Maybe in the next version we can have an optimized writer and reader (without a converter, using the internal readers and writers).
For the end user it doesn't make any difference, as new readers can also read the old partition stats Parquet file and old readers can read the new partition stats Parquet file. So, can we merge this?

core/src/main/java/org/apache/iceberg/PartitionStats.java

core/src/test/java/org/apache/iceberg/TestTables.java

data/src/test/java/org/apache/iceberg/data/TestPartitionStatsHandler.java

RussellSpitzer · 2024-10-28T18:24:06Z

Moving out of 1.7.0 since we still have a bit of discussion here

ajantha-bhat · 2024-11-19T15:31:28Z

@RussellSpitzer: I have added the assertion for the partition type as you suggested and replied to #11216 (comment). Do you have any more comments on this PR?

aokolnychyi · 2024-11-20T20:57:57Z

I had a conversation with @rdblue today about internal writers. Ryan should have a bit of time to help/guide.
I will check the current implementation today too.

core/src/main/java/org/apache/iceberg/PartitionStats.java

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

jbonofre · 2024-11-28T06:55:13Z

@RussellSpitzer @aokolnychyi I'm reviewing the stale PRs, and this one has been open for months. Do we have a way to move forward? I can do a new review, but at the end of the day it won't help with the merge (as only committers can merge PRs).

ajantha-bhat · 2025-02-20T05:01:00Z

Rebased and PR is ready.

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

pvary

Left one last comment, but otherwise looks good to me.
I would wait for @gaborkaszab to approve too, as he seems genuinely interested. He is OOO this week, but we can merge next week.

ajantha-bhat · 2025-02-20T12:18:09Z

@pvary: Thanks a lot for the review and approval.
Yeah, we can wait until next week for @gaborkaszab. By then @aokolnychyi may also review this if he has time.

gaborkaszab

Thanks for waiting for me! I went through the PR one more time.

gaborkaszab · 2025-02-24T14:54:37Z

core/src/test/java/org/apache/iceberg/TestTables.java

-            schema, spec, sortOrder, temp.toString(), ImmutableMap.of(), formatVersion));
-
-    return new TestTable(ops, name);
+    return createTable(temp, name, schema, spec, formatVersion, ImmutableMap.of(), sortOrder, null);


You moved the implementation of these create functions into a common place, which is a nice refactor. However, I see one create() function above at L55 that still has this same implementation body, which seems redundant. Can't you also replace that with a call to the common implementation?

This function came from the latest rebase. Someone added it because they wanted to pass properties.

Updated it.

gaborkaszab · 2025-02-24T15:05:23Z

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

+        return Parquet.writeData(outputFile)
+            .schema(dataSchema)
+            .createWriterFunc(InternalWriter::create)
+            .overwrite()


Thanks! Shouldn't we also remove the .overwrite() call for the writer here? If we assume the stats files have unique names, we shouldn't allow overwriting anything that shares the same name.

gaborkaszab · 2025-02-24T15:06:43Z

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

+                .metadataFileLocation(
+                    fileFormat.addExtension(
+                        String.format(
+                            Locale.ROOT, "partition-stats-%d-%s", snapshotId, UUID.randomUUID()))));


I think this is the way to go. However, we should raise awareness that updating both the table stats and the partition stats this way can produce orphan files in the table folder. I think we should keep pinging the ongoing conversation on dev@ about this, because with this approach we could easily flood the folder with orphan files, and these files will remain when we drop the table.

Currently this is the same as how other stats (Puffin files) are handled. I saw your reply on the mailing list. Let's see what others think. We can normalize that behavior for all stats. There is no need to wait or block this PR for that.

I agree, we shouldn't block this PR on the orphaned stats files issue, as it already exists. I just wanted to raise awareness that we should continue that discussion on dev@ to figure out a solution, if there is one.

Fair enough. Let's move forward and discuss on dev@

gaborkaszab · 2025-02-24T15:07:38Z

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

+        return Avro.writeData(outputFile)
+            .schema(dataSchema)
+            .createWriterFunc(org.apache.iceberg.avro.InternalWriter::create)
+            .overwrite()


See above: stats file names are expected to be unique, so we shouldn't overwrite any existing files.

I had followed the pattern of the other writers. Removed it now.

I agree with keeping it aligned with the pattern of the other writers.
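The idea behind dropping `.overwrite()` can be illustrated with plain `java.nio`. This is only a sketch under assumptions: `UniqueStatsFileSketch`, `statsFileName`, and `writeNew` are hypothetical helpers, a local directory stands in for the table's metadata location, and the extension is appended by hand rather than through `fileFormat.addExtension`. The name uses the same `partition-stats-%d-%s` pattern as the diff above, and `StandardOpenOption.CREATE_NEW` makes the write fail instead of replacing an existing file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper; not part of PartitionStatsHandler.
final class UniqueStatsFileSketch {
  private UniqueStatsFileSketch() {}

  // Snapshot id plus a random UUID, so two computations never share a name.
  static String statsFileName(long snapshotId, String extension) {
    return String.format(
        Locale.ROOT, "partition-stats-%d-%s.%s", snapshotId, UUID.randomUUID(), extension);
  }

  // CREATE_NEW throws FileAlreadyExistsException rather than overwriting.
  static Path writeNew(Path dir, String fileName, byte[] content) throws IOException {
    Path target = dir.resolve(fileName);
    try (OutputStream out =
        Files.newOutputStream(target, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      out.write(content);
    }
    return target;
  }
}
```

With unique names, a collision indicates a bug, so failing loudly is arguably safer than silently replacing a file that a committed snapshot may still reference.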

gaborkaszab · 2025-02-25T08:09:02Z

core/src/main/java/org/apache/iceberg/SetPartitionStatistics.java

  @Override
  public UpdatePartitionStatistics setPartitionStatistics(PartitionStatisticsFile file) {
-    Preconditions.checkArgument(null != file, "partition statistics file must not be null");
+    if (file == null) {


Shouldn't we update the comment on the interface to make clear that a null param results in a no-op?

I checked the implementation of SetStatistics.setStatistics() again, and apparently there is no special handling for null inputs. I am a bit hesitant now, because passing a null stats file there would result in an NPE when calling Optional.of(statisticsFile).
I'm not sure now whether a no-op is the way to go here or whether the Precondition was better. Sorry for the back and forth. What do you think?

I checked the Trino and Spark Puffin implementations; the caller makes sure it is not null:

IcebergMetadata#finishStatisticsCollection() in Trino
ComputeTableStatsSparkAction#execute() in Spark

An API-level NPE is not a good idea. I am OK with a no-op, as it avoids extra null handling for the callers.
So, I don't think we need to revert again.

I actually think it was reasonable to throw an NPE and be consistent with UpdateStatistics. There is no problem with NPE as long as the error message is descriptive.

We went back and forth a little on this.

For end users, a no-op avoids an extra null check when the stats are null, so I went ahead with the no-op.
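The two behaviors weighed in this thread can be put side by side in a small sketch. Everything here is hypothetical (`NullHandlingSketch`, `StatsFile`, `setStrict`, `setLenient`); it is not the SetPartitionStatistics implementation. The strict variant uses `Objects.requireNonNull`, which throws an NPE with a descriptive message, whereas the original code used `Preconditions.checkArgument`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical update object; not the Iceberg SetPartitionStatistics class.
final class NullHandlingSketch {
  // Hypothetical stand-in for a partition statistics file reference.
  static final class StatsFile {
    final long snapshotId;

    StatsFile(long snapshotId) {
      this.snapshotId = snapshotId;
    }
  }

  private final Map<Long, StatsFile> pending = new HashMap<>();

  // Option 1: reject null early with a descriptive message.
  NullHandlingSketch setStrict(StatsFile file) {
    Objects.requireNonNull(file, "partition statistics file must not be null");
    pending.put(file.snapshotId, file);
    return this;
  }

  // Option 2: treat null as a no-op so callers need no null check.
  NullHandlingSketch setLenient(StatsFile file) {
    if (file == null) {
      return this;
    }
    pending.put(file.snapshotId, file);
    return this;
  }

  int pendingCount() {
    return pending.size();
  }
}
```

The lenient form is convenient when the stats computation may legitimately return nothing, while the strict form surfaces caller bugs sooner. The thread settled on the no-op.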

gaborkaszab

Thanks @ajantha-bhat !

jbonofre · 2025-02-25T16:09:39Z

I did the review and it looks good to me. Thanks !

ajantha-bhat · 2025-02-27T06:45:33Z

@RussellSpitzer, @aokolnychyi: Any more comments? The others have approved the changes.

ajantha-bhat · 2025-02-28T12:14:00Z

Just rebased the PR, as there was a conflict from #12419 in TestTables.

pvary · 2025-02-28T13:22:18Z

Thanks for all your work, @ajantha-bhat!
Thanks also to all of the reviewers!

ajantha-bhat · 2025-02-28T14:46:23Z

Thanks, everyone, for reviewing and merging it.
This was a long-awaited and very impactful feature. Hive, Trino, and Dremio are waiting for it to be shipped.

aokolnychyi

Great to see this in, sorry it took me so long to get back. I left some comments to consider. Thanks, @ajantha-bhat!

aokolnychyi · 2025-03-13T00:34:56Z

core/src/main/java/org/apache/iceberg/SetPartitionStatistics.java

  @Override
  public UpdatePartitionStatistics setPartitionStatistics(PartitionStatisticsFile file) {
-    Preconditions.checkArgument(null != file, "partition statistics file must not be null");
+    if (file == null) {


I actually think it was reasonable to throw an NPE and be consistent with UpdateStatistics. There is no problem with NPE as long as the error message is descriptive.

aokolnychyi · 2025-03-13T00:39:20Z

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

+
+    private final int id;
+
+    Column(int id) {


I wonder whether this enum is actually required. Why not simply define NestedField constants? We need those to create a schema anyway?

I added it for two reasons:

a) I didn't want to use hardcoded strings or a string constant for each field name and field ID while preparing the NestedField, hence using the name from the enum.
b) I also thought an enum is cleaner than multiple NestedField constants.

Do you see any drawback to using this enum in terms of readability or performance?

I think @aokolnychyi suggests using constants for the fields instead of the field names.
I checked the codebase, and this is the typical pattern.
See:

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetadataColumns.java

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestEntry.java

https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/DataFile.java

I agree that we should follow the same pattern.

@ajantha-bhat: Could you please create a PR for it?

I can't predefine the field for

Types.NestedField partition = NestedField.required(1, Column.PARTITION.name(), unifiedPartitionType);

because the unifiedPartitionType is a variable.

For the other fields I can define the NestedField, but for this one I can't, so the code won't be uniform. I still feel the enum is cleaner, but I can change it if you really want me to.

OK. We have similar non-uniform code in DataFile for this case as well:

iceberg/api/src/main/java/org/apache/iceberg/DataFile.java

Line 123 in fcea78f

static StructType getType(StructType partitionType) {

I will raise a PR to handle it in a similar way.
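The shape agreed on here, constants for fixed fields plus a function for the field whose type varies per table, can be sketched without Iceberg types. This is an illustration only: `StatsSchemaSketch` and `Field` are hypothetical, types are represented as plain strings, and the field names and IDs other than the partition field's ID 1 are chosen for the example.

```java
import java.util.List;

// Hypothetical schema holder; Field is a stand-in for Types.NestedField.
final class StatsSchemaSketch {
  static final class Field {
    final int id;
    final String name;
    final String type;

    Field(int id, String name, String type) {
      this.id = id;
      this.name = name;
      this.type = type;
    }
  }

  // Fields whose type never changes can be predefined constants.
  static final Field SPEC_ID = new Field(2, "spec_id", "int");
  static final Field DATA_RECORD_COUNT = new Field(3, "data_record_count", "long");

  private StatsSchemaSketch() {}

  // The partition field depends on the table's unified partition type,
  // so it is built per call, similar in spirit to DataFile.getType(partitionType).
  static List<Field> schema(String unifiedPartitionType) {
    Field partition = new Field(1, "partition", unifiedPartitionType);
    return List.of(partition, SPEC_ID, DATA_RECORD_COUNT);
  }
}
```

Only the first field is rebuilt per table; the constants are shared, which keeps IDs and names defined in exactly one place.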

aokolnychyi · 2025-03-13T00:43:14Z

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java

+  }
+
+  private static PartitionStats recordToPartitionStats(StructLike record) {
+    PartitionStats stats =


@ajantha-bhat, could you confirm whether CloseableIterable<StructLike> returned by the data reader reuses the same object? If so, we don't have to create a new PartitionStats object for every record.

> CloseableIterable returned by the data reader reuses the same object? If so, we don't have to create a new PartitionStats object for every record

By default it doesn't. While adding the internal readers, I also added support for reuseContainers, so we can enable it.

I am not sure about enabling it here, as this is an end-user API. If the user wants to keep stats for more than one partition at once, as in our test case where we build a list from this iterable (instead of consuming entries one by one), we cannot reuse the container. Hence, I didn't enable it.
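The hazard described in this reply is easy to reproduce in isolation. The sketch below is not the Iceberg reader: `ReuseContainerSketch`, `Row`, `rows`, and `collectThenRead` are hypothetical, and an iterator over a `long[]` stands in for the data reader. When the iterator hands out one shared object, a caller that collects the elements into a list before reading them only sees the last value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demonstration of container reuse; not an Iceberg reader.
final class ReuseContainerSketch {
  // Mutable record container, as a reader with reuse enabled would hand out.
  static final class Row {
    long value;
  }

  private ReuseContainerSketch() {}

  // Returns an iterator that either reuses one Row or allocates one per element.
  static Iterator<Row> rows(long[] values, boolean reuse) {
    return new Iterator<Row>() {
      private final Row shared = new Row();
      private int next = 0;

      @Override
      public boolean hasNext() {
        return next < values.length;
      }

      @Override
      public Row next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Row row = reuse ? shared : new Row();
        row.value = values[next++];
        return row;
      }
    };
  }

  // Mimics a caller that first builds a list and only reads it afterwards.
  static List<Long> collectThenRead(Iterator<Row> iterator) {
    List<Row> kept = new ArrayList<>();
    iterator.forEachRemaining(kept::add);
    List<Long> values = new ArrayList<>();
    for (Row row : kept) {
      values.add(row.value);
    }
    return values;
  }
}
```

For the input `{1, 2, 3}`, collecting with reuse enabled yields `[3, 3, 3]`, while the non-reusing iterator yields `[1, 2, 3]`. Reuse is only safe when each element is fully consumed or copied before the next call to `next()`.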

ajantha-bhat · 2025-03-20T10:25:51Z

@deniskuzZ: Hi, is Hive interested in computing these partition stats incrementally instead of reading all the manifests at once? If so, I am planning to add a new API to compute stats incrementally: it finds the last snapshot that had partition stats, computes the manifests added between that snapshot and the current snapshot, and computes new stats by combining the old stats with the stats from the new manifests.
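The combining step of that proposal can be sketched as a per-partition merge. This is a rough illustration under assumptions, not the API that was later added: `IncrementalStatsSketch`, `Stats`, and `merge` are hypothetical, partitions are keyed by a plain string, and only additive counters are modeled (no handling of deleted files).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical incremental merge; not the PartitionStatsHandler API.
final class IncrementalStatsSketch {
  // Two additive counters stand in for the full set of partition stats.
  static final class Stats {
    final long recordCount;
    final int fileCount;

    Stats(long recordCount, int fileCount) {
      this.recordCount = recordCount;
      this.fileCount = fileCount;
    }

    Stats plus(Stats other) {
      return new Stats(recordCount + other.recordCount, fileCount + other.fileCount);
    }
  }

  private IncrementalStatsSketch() {}

  // previous: stats read from the last stats file.
  // delta: stats computed only from manifests added since that snapshot.
  static Map<String, Stats> merge(Map<String, Stats> previous, Map<String, Stats> delta) {
    Map<String, Stats> result = new HashMap<>(previous);
    delta.forEach((partition, stats) -> result.merge(partition, stats, Stats::plus));
    return result;
  }
}
```

Partitions that appear only in the delta are added as new entries, and untouched partitions are carried over unchanged, so the cost scales with the new manifests rather than with the whole table.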

ajantha-bhat marked this pull request as draft September 26, 2024 17:00

github-actions bot added core data labels Sep 26, 2024

ajantha-bhat mentioned this pull request Sep 26, 2024

Data: Add a util to read write partition stats #10176

Closed

ajantha-bhat force-pushed the stats_writer branch 2 times, most recently from 941505a to 05a80f6 Compare September 27, 2024 01:43

ajantha-bhat commented Sep 27, 2024

core/src/main/java/org/apache/iceberg/data/PartitionStatsRecord.java (outdated)

ajantha-bhat added this to the Iceberg 1.7.0 milestone Sep 27, 2024

ajantha-bhat marked this pull request as ready for review September 27, 2024 02:00

ajantha-bhat requested a review from aokolnychyi September 27, 2024 02:00

ajantha-bhat mentioned this pull request Oct 16, 2024

Partition stats task tracker #8450

Closed

13 tasks

RussellSpitzer reviewed Oct 25, 2024

core/src/main/java/org/apache/iceberg/PartitionStats.java (outdated)

core/src/main/java/org/apache/iceberg/PartitionStats.java (outdated)

core/src/test/java/org/apache/iceberg/TestTables.java

data/src/test/java/org/apache/iceberg/data/TestPartitionStatsHandler.java (outdated)

RussellSpitzer modified the milestones: Iceberg 1.7.0, Iceberg 2.0.0 Oct 28, 2024

ajantha-bhat force-pushed the stats_writer branch from 05a80f6 to ee3b273 Compare November 19, 2024 15:23

aokolnychyi reviewed Nov 21, 2024

core/src/main/java/org/apache/iceberg/PartitionStats.java (outdated)

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java (outdated)

ajantha-bhat modified the milestones: Iceberg 2.0.0, Iceberg 1.8.0 Nov 22, 2024

ajantha-bhat force-pushed the stats_writer branch 2 times, most recently from 7fafa91 to 113dc7d Compare February 20, 2025 03:46

pvary reviewed Feb 20, 2025

data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java (outdated)

pvary approved these changes Feb 20, 2025

gaborkaszab reviewed Feb 24, 2025

gaborkaszab reviewed Feb 25, 2025

ajantha-bhat force-pushed the stats_writer branch from 2551fc8 to 3003a5d Compare February 25, 2025 11:48

github-actions bot added the API label Feb 25, 2025

gaborkaszab approved these changes Feb 25, 2025

jbonofre approved these changes Feb 25, 2025

ajantha-bhat added this to the Iceberg 1.9.0 milestone Feb 28, 2025

ajantha-bhat added 6 commits February 28, 2025 17:13

Data: Add partition stats writer and reader

fea4984

Address comments

61916d2

Use FileFormat.fromFileName

26bb1da

handle leftover comments

7b25c0b

inline function

a976659

Clean up

4c5ff33

ajantha-bhat force-pushed the stats_writer branch from 3003a5d to 4c5ff33 Compare February 28, 2025 12:12

pvary merged commit e230f5d into apache:main Feb 28, 2025
43 checks passed

aokolnychyi reviewed Mar 13, 2025

ajantha-bhat mentioned this pull request Mar 17, 2025

Data: Refactor PartitionStatsHandler #12550

Merged

deniskuzZ mentioned this pull request Apr 7, 2025

Core: Support incremental compute for partition stats #12629

Merged
