Skip to content

Conversation

@zratkai
Copy link
Contributor

@zratkai zratkai commented Mar 5, 2025

HIVE-28801 Iceberg: Refactor HMS table parameter setting to be able to reuse

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zratkai ,

Could you give me a bit more details for the motivation of this PR? You mention code reusal in the description, but I saw that it's simply a refactor from Iceberg's perspective, and I also checked the Hive PR popping up here, but that doesn't add any code reusal either. So I guess, the motivation is rather code refactor.

In general, I think that having a util class that helps producing table properties relevant for HMS from some inputs makes sense. However, with the current PR I have the impressions that the work was just partially made. I saw some remaining functionality in HiveOperationsBase, and also in HiveViewOperations that also take care of HMS related table properties. I think with a refactor like this we should go all-in. Otherwise, it won't be very intuitive what responsibilities this new converter class has and what is left that are done outside the class.

So in general the idea would make sense, but I'm not comfortable with the implementation TBH.

@zratkai
Copy link
Contributor Author

zratkai commented Mar 11, 2025

@gaborkaszab here is the origin of the PR:
apache/hive#5628 (comment)

@zratkai
Copy link
Contributor Author

zratkai commented Mar 21, 2025

@pvary could you please review?

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the PR in general. I added some comments for better clarity on this code for future readers.

Copy link
Contributor

@wypoon wypoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small suggestions and test fixes.

@zratkai
Copy link
Contributor Author

zratkai commented Mar 31, 2025

@pvary could you please review? I addressed all your comments.

Copy link
Contributor

@pvary pvary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left my final comments.
LGTM +1

Thanks @zratkai for your persistence!

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Peter's suggestions, the PR looks fine for me. Thanks for this work, @zratkai !

@zratkai
Copy link
Contributor Author

zratkai commented Apr 3, 2025

@pvary thanks for the review! Fixed your comments.

Comment on lines 53 to 57
private static final Map<String, String> ICEBERG_TO_HMS_TRANSLATION =
ImmutableMap.of(
// gc.enabled in Iceberg and external.table.purge in Hive are meant to do the same things
// but with different names
GC_ENABLED, "external.table.purge");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested this before and I'll suggest it again:

I think that it would be valuable to preserve a small part of the javadoc for HiveTableOperations.translateToIcebergProp which is removed in this PR, and add it as a comment before this static field:
"Provides key translation where necessary between Iceberg and HMS props. This translation is needed because some properties control the same behaviour but are named differently in Iceberg and Hive."
Then the inner comment about gc.enabled and external.table.purge is not needed, as it would be clear.

At the moment, there is only one property that needs translation, and the inner comment explains why we currently do this translation. It is conceivable that in future, more properties may need translation. Having a comment explaining the reason for the Map is better than an inner comment explaining why gc.enabled needs translation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description was for the translateToIcebergProp() method, which is not used, so it is deleted. This is why this is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zratkai. You are missing the point. The point is that
"Provides key translation where necessary between Iceberg and HMS props. This translation is needed because some properties control the same behaviour but are named differently in Iceberg and Hive."
is an appropriate, relevant, and informative description of the Map here. What is the purpose of the Map? It is exactly this.

You may think, "I just want to move some chunks of code around; why am I being made to make all these other changes?"
To my mind, a refactor is an opportunity to improve the usability and the readability of code, not just a rearrangement of code. And I'm not making you do anything. I understand someone may not accept all suggestions offered. However, I offer you my time and attention, and make only suggestions that I believe are constructive, and take the time to explain my reasoning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR based on @wypoon's suggestion

@pvary pvary merged commit 369fe89 into apache:main Apr 7, 2025
42 checks passed
@pvary
Copy link
Contributor

pvary commented Apr 7, 2025

Merged to main.
Thanks @zratkai for the PR and the persistence to enhance it! Please next time be more respectful for reviewers. They often spend their own personal time on reviewing changes.
Thanks @wypoon, @gaborkaszab for the reviews!

@pvary pvary changed the title HIVE-28801 Iceberg: Refactor HMS table parameter setting to be able to reuse Hive: Refactor HMS table parameter setting to be able to reuse Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants