Skip to content

Conversation

@ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented Sep 26, 2024

Introduce APIs to write the partition stats into files in table default format using Iceberg internal writers and readers.

PartitionStatisticsFile partitionStatisticsFile =
        PartitionStatsHandler.computeAndWriteStatsFile(testTable);

testTable.updatePartitionStatistics().setPartitionStatistics(partitionStatisticsFile).commit();

Schema schema,
PartitionSpec spec,
int formatVersion,
Map<String, String> properties) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no option to pass the table properties before.
Needed to pass different file format for paramterized test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding the parameter to the old create method and call the new method from the old one?

Like:

  public static TestTable create(
      File temp,
      String name,
      Schema schema,
      PartitionSpec spec,
      SortOrder sortOrder,
      int formatVersion) {
    return create(temp, name, schema, spec, SortOrder.unsorted(), formatVersion, ImmutableMap.of());
  }

  public static TestTable create(
      File temp,
      String name,
      Schema schema,
      PartitionSpec spec,
      int formatVersion,
      Map<String, String> properties) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed a similar style when they added MetricsReporter,

Why not adding the parameter to the old create

It is public. So, need to modify all the callers.
But we can refactor a private method that can be helper to all these public method. I can do it in a follow up to keep minimal changes for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't really understand your answer here 😢

I'm not suggesting just simply adding the parameter.
What I'm suggesting is to create a new constructor with the extra parameter, but in the old constructor we should remove the implementation and call the new constructor with an empty property map. This way we will not have duplicated code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, at some point I was considering introducing a Builder class for these TestTables because there are som many different create() functions now. Once this PR is merged, I think I'll re-visit this plan since we have yet another version of these functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builder method is definitely better. I just extracted common code into a private function as of now to keep minimal changes in the PR.

@ajantha-bhat ajantha-bhat added this to the Iceberg 1.7.0 milestone Sep 27, 2024
@ajantha-bhat
Copy link
Member Author

@aokolnychyi: This PR is ready. But as we discussed previously, this PR wraps the PartitionStats into a Record as the writers cannot work with Iceberg internal objects yet.

I will explore adding the internal writers for Parquet and Orc. Similar to #11108.
If we fail to have it ready by 1.7.0, I think it makes sense to merge this PR and introduce the optimized writer in the next version by deprecating this writer.

@ajantha-bhat ajantha-bhat marked this pull request as ready for review September 27, 2024 02:00
@ajantha-bhat ajantha-bhat mentioned this pull request Oct 16, 2024
13 tasks
@ajantha-bhat
Copy link
Member Author

@RussellSpitzer: It would be good to have this in 1.7.0.
I am waiting from a month for a review.

@aokolnychyi
Copy link
Contributor

I think we should try to use "internal" writers. @rdblue added "internal" readers recently.

Any guidance on how to add a writer, @rdblue? We can start with Avro for now. We will also need such readers/writers for Parquet.

@ajantha-bhat
Copy link
Member Author

ajantha-bhat commented Oct 24, 2024

@aokolnychyi, @rdblue:

I already tried POC for internal writers on another branch,
c209bc9

The problems:
a) I am using PartitionData instead of Record for partition value, but the PartitionData get() method wraps the byte array to the byte buffer, which is a problem for internal writers, they expect byte[]. So, I didn't felt like using a new class instead of PartitionData just for this.

b) Also, Using partitionData in StructLikeMap is not working fine. Some keys are missing in the map (looks like equals() logic), If I use Record, it is fine.

Maybe in the next version we can have optimized writer and reader (without converter using internal reader and writers).
For end user it doesn't make any difference as new readers can also read the old partition stats parquet file and old readers can read the new partition stats parquet file. So, can we merge this?

@RussellSpitzer
Copy link
Member

Moving out of 1.7.0 since we still have a bit of discussion here

@ajantha-bhat
Copy link
Member Author

ajantha-bhat commented Nov 19, 2024

@RussellSpitzer: I have added the Assertion for Partition type as you suggested and replied to #11216 (comment), do you have anymore comments for this PR?

@aokolnychyi
Copy link
Contributor

I had a conversation with @rdblue today about internal writers. Ryan should have a bit of time to help/guide.
I will check the current implementation today too.

@jbonofre
Copy link
Member

@RussellSpitzer @aokolnychyi I'm reviewing the stale PRs, and this one is open for month. Do we have a way to move forward ? I can do a new review, but at the end of the day, it won't help for the merge (as only committers can merge PR).

@ajantha-bhat ajantha-bhat force-pushed the stats_writer branch 2 times, most recently from 7fafa91 to 113dc7d Compare February 20, 2025 03:46
@ajantha-bhat
Copy link
Member Author

Rebased and PR is ready.

Copy link
Contributor

@pvary pvary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one last comment, but otherwise looks good to me.
I would wait @gaborkaszab to approve too, as he seems genuinely interested. He is OOO for this week, but next week we can merge

@ajantha-bhat
Copy link
Member Author

@pvary: Thanks a lot for the review and approval.
Yeah, we can wait till next week for @gaborkaszab. By that time @aokolnychyi may also review this if he gets time.

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for waiting for me! I went through the PR one more time.

schema, spec, sortOrder, temp.toString(), ImmutableMap.of(), formatVersion));

return new TestTable(ops, name);
return createTable(temp, name, schema, spec, formatVersion, ImmutableMap.of(), sortOrder, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You moved the implementation of these create functions into a common place that is a nice refactor. However, I see one create() function above at L55 that still has this same implementation body that seems redundant. Can't you also replace that with a function call to the common implementation?

Copy link
Member Author

@ajantha-bhat ajantha-bhat Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function came from latest rebase. Someone added it as they wanted to pass properties.

Updated it.

return Parquet.writeData(outputFile)
.schema(dataSchema)
.createWriterFunc(InternalWriter::create)
.overwrite()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Shouldn't we also remove the .overwrite() call for the writer here? In case we assume the stat files have unique name we shouldn't allow overwriting anything that shares the same name.

.metadataFileLocation(
fileFormat.addExtension(
String.format(
Locale.ROOT, "partition-stats-%d-%s", snapshotId, UUID.randomUUID()))));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the way to follow, however, we should somehow raise awareness that the approach to update both the table stats and partition stats can produce orphan files within the table folder. I think we should keep pinging the ongoing conversation on dev@ about this because with this approach we could easily flood the folder with orphan files and when we drop the table these files will remain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is same as how other stats are supported (puffin files). I saw your reply on mailing list. Lets see what others think. We can normalize that behavior for all stats. No need to wait or block this PR for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we shouldn't block this PR with the orphaned stat files issue as it is an existing one already. I just wanted to raise awareness that we should continue that discussion on dev@ to to figure out a solution if there is any.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Let's move forward and discuss on dev@

return Avro.writeData(outputFile)
.schema(dataSchema)
.createWriterFunc(org.apache.iceberg.avro.InternalWriter::create)
.overwrite()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: stat file names are expected to be unique, we shouldn't overwrite any existing files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed other writers pattern. Removed it now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep it aligned with other writers pattern.

@Override
public UpdatePartitionStatistics setPartitionStatistics(PartitionStatisticsFile file) {
Preconditions.checkArgument(null != file, "partition statistics file must not be null");
if (file == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we update the comment on the interface to be clear that a null param results a noop?

I checked the implementation again for SetStatistics.setStatistics() and apparently there is no special handling for null inputs. I became a bit hesitant now because there giving a null stat file as param would result in an NPE when calling the Optional.of(statisticsFile).
Not sure now if noop is the way to go here or the Precondition was better. Sorry to this back and forth. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked Trino & Spark Puffin implementation, caller is making sure it is not null.

IcebergMetadata#finishStatisticsCollection() in Trino
ComputeTableStatsSparkAction#execute() in spark

API level NPE is not a good idea. I am ok with no-op as it avoids extra null handling for the callers.
So, I don't think we need to revert back again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it was reasonable to throw an NPE and be consistent with UpdateStatistics. There is no problem with NPE as long as the error message is descriptive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We went little back and forth on this.

For the end users, it avoids extra null check if the stats are null by going with no-op. So, went ahead with no-op

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ajantha-bhat !

@jbonofre
Copy link
Member

I did the review and it looks good to me. Thanks !

@ajantha-bhat
Copy link
Member Author

@RussellSpitzer, @aokolnychyi: Anymore comments? others have approved the changes.

@ajantha-bhat ajantha-bhat added this to the Iceberg 1.9.0 milestone Feb 28, 2025
@ajantha-bhat
Copy link
Member Author

Just rebased the PR as there was a conflict from #12419 in TestTables

@pvary pvary merged commit e230f5d into apache:main Feb 28, 2025
43 checks passed
@pvary
Copy link
Contributor

pvary commented Feb 28, 2025

Thanks for all your work @ajantha-bhat!
Also thanks for all of the reviewers!

@ajantha-bhat
Copy link
Member Author

Thanks everyone for the review and merging it.
This was a long awaited and super impactful feature. Hive, Trino, Dremio is waiting for it to be shipped.

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this in, sorry it took me so long to get back. I left some comments to consider. Thanks, @ajantha-bhat!

@Override
public UpdatePartitionStatistics setPartitionStatistics(PartitionStatisticsFile file) {
Preconditions.checkArgument(null != file, "partition statistics file must not be null");
if (file == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it was reasonable to throw an NPE and be consistent with UpdateStatistics. There is no problem with NPE as long as the error message is descriptive.


private final int id;

Column(int id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this enum is actually required. Why not simply define NestedField constants? We need those to create a schema anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added it for two reasons,

a) I didn't want to use hardcoded strings or string constant for each field name, field id while preparing NestedField, hence using the name from enum.
b) Plus I thought enum is cleaner than multiple nested fields constants.

Do you see any drawback for using this enum in terms of readability or performance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't predefine the field for

Types.NestedField partition = NestedField.required(1, Column.PARTITION.name(), unifiedPartitionType);

because the unifiedPartitionType is a variable.

For other fields, I can define the NestedField, but for this I can't. So, it won't become uniform. I still feel Enum is cleaner. But I can modify if you guys really want it to be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. We have similar non-uniform code in DataFile also for this kind.

static StructType getType(StructType partitionType) {

I will raise a PR to have it similar way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

private static PartitionStats recordToPartitionStats(StructLike record) {
PartitionStats stats =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajantha-bhat, could you confirm whether CloseableIterable<StructLike> returned by the data reader reuses the same object? If so, we don't have to create a new PartitionStats object for every record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloseableIterable returned by the data reader reuses the same object? If so, we don't have to create a new PartitionStats object for every record

By default it doesn't. While adding the internal readers, I have added the support for reuseContainers too. So, we can enable it.

I am not sure about enabling it here as it is an end user API, if the user want to keep stats for more than one partition at once, like our testcase where we prepare a list from this iterable (instead of consuming one by one), we cannot reuse the container. Hence, I didn't enable it.

@ajantha-bhat
Copy link
Member Author

ajantha-bhat commented Mar 20, 2025

@deniskuzZ : Hi, Is Hive intersted in the incremental compute of these partition stats instead of reading all the manifests at once? If so, I am planning to add a new API to incrementally compute stats (it checks the last snapshot that had partition stats and compute the difference manifest from that snapshot to current snapshot and compute new stats by reading the old stats + new stats from new manifests)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants