-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-12679: Allow users to be able to specify an implementation of IMetaStoreClient via HiveConf #5955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-12679: Allow users to be able to specify an implementation of IMetaStoreClient via HiveConf #5955
Conversation
| private static final Logger LOG = LoggerFactory.getLogger(HiveMetaStoreClientFactory.class); | ||
|
|
||
| public static IMetaStoreClient newClient(Configuration conf, boolean allowEmbedded) throws MetaException { | ||
| String mscClassName = MetastoreConf.getVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_CLASS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use MetastoreConf.getClass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll modify it to use getClass once HIVE-20189 is merged.
| METASTORE_CLIENT_CLASS("metastore.client.class", | ||
| "hive.metastore.client.class", | ||
| "org.apache.hadoop.hive.metastore.client.ThriftHiveMetaStoreClient", | ||
| "The name of MetaStoreClient class that implements the IMetaStoreClient interface."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moomindani Please feel free to give us a suggestion if you have any thoughts.
Regarding the design decision about MetaStoreClientFactory:
This patch does not support a customizable MSC factory as the original patch did; instead, it supports a customizable MSC. In other words, this patch introduces metastore.client.class, not metastore.client.factory.class.
I made this change not because I preferred this approach, but because I followed the majority opinion in the discussion #4444 (review). Personally, I think both approaches are valid and have their merits, and I do not have a strong opinion among them. I would appreciate it if you could share your thoughts on this decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM, I have no concerns around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern here: if we specify the client via the configuration, it means each time we are allowed to query one type of metadata only. For example, we set the client to Glue for the tables stored in Glue, if we need the Hive tables later on in the same session, we should reset the client to Hive, and just think loud, if the SQL contains the Hive and Glue table?
From my point of view, I would suggest the pattern(the database or table) or the catalog routed way to choose which the client will use to obtain the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view, I would suggest the pattern(the database or table) or the catalog routed way to choose which the client will use to obtain the metadata.
I believe this is somewhat similar to the design proposed in HIVE-28879 (Federated Catalog).
From my perspective, I see HIVE-12679 as a milestone for both HIVE-28658 (Iceberg REST Catalog, #5628) and HIVE-28879. So, I'm fine with moving directly to HIVE-28879 or pursuing any other approach, as long as we aim to support third-party data catalogs. However, since this ticket predates my contributions and has many watchers, I want to ask for and respect others' opinions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngsg maybe METASTORE_CLIENT_IMPL, similar to RAW_STORE_IMPL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a new property to configure those mappings? e.g, metastore.client.meta.class.mappings=catalog:client_classname,catalog.db(or db_pattern):client_classname,catalog.db.table:client_classname
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need multiple client impl for the same catalog (aka Snowflake, Glue, HMS, Rest, etc)
Instead, I’d suggest using a simple comma-separated list. Later, that would be persisted to the backend database during the Catalog registration (HIVE-26227)
METASTORE_CLIENT_IMPL("metastore.client.impl", catalog:client_classname)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should address this in HIVE-28879, since we also need to provide connection details per catalog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngsg maybe
METASTORE_CLIENT_IMPL, similar toRAW_STORE_IMPL?
I changed METASTORE_CLINT_CLASS to METASTORE_CLIENT_IMPL and metastore.client.class to metastore.client.impl.
how about a new property to configure those mappings? e.g,
metastore.client.meta.class.mappings=catalog:client_classname,catalog.db(or db_pattern):client_classname,catalog.db.table:client_classname
Regarding the mapping, if there were no ThriftHiveMetaStoreClient-related logic in HiveMetaStoreClient, I think I could implement a mapping from a catalog to the actual client. However, due to that logic, I don't currently have a clear design to support it. Let me think more about a better design and share my thoughts once I’ve summarized them.
Another thought I had is that if we eventually implement HIVE-28879, we may not need this configuration key to specify either a single class or a mapping, since the relevant information would be stored in the MetaStore backend database. Given that we likely have enough time before the next release, I'm unsure about the feasibility of introducing a temporary configuration key that won't appear in any release. What do you think about keeping this patch as a fallback option in case HIVE-28879 isn't ready in time for the next release, and proceeding directly with HIVE-28879 for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought I had is that if we eventually implement HIVE-28879, we may not need this configuration key to specify either a single class or a mapping, since the relevant information would be stored in the MetaStore backend database.
Totally agree + the proposed mapping might require connection details as well. I think we should go ahead and merge this PR, and start thinking on a more generic solution like HIVE-28879
|
HIVE-20189 is merged, @ngsg could you please rebase |
605e7a6 to
fe11b25
Compare
...rc/main/java/org/apache/hadoop/hive/metastore/client/builder/HiveMetaStoreClientBuilder.java
Outdated
Show resolved
Hide resolved
...ore/metastore-client/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
Outdated
Show resolved
Hide resolved
...ore/metastore-client/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
Outdated
Show resolved
Hide resolved
...ore/metastore-client/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
Show resolved
Hide resolved
|
deniskuzZ
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
This seems good to go? if yes, @ngsg mind hitting the merge button? |
|
@ayushtkn, thanks for the reminder. I'll merge it shortly. |
|
Thanks all for the review! I have read the guide and tried to follow the documented process for merging. Since this is my first time merging, I would appreciate it if someone could check whether I've made any mistakes I might have overlooked. |



What changes were proposed in this pull request?
This patch aims to support configurable IMetaStoreClient.
This patch has been submitted three times previously. Austin originally proposed HIVE-12679 and submitted the initial patch in 2016. @moomindani later resubmitted it in #1402, and then @okumin took it over in #4444.
Why are the changes needed?
In some scenarios, users may want to run Hive as a general-purpose execution engine on top of a third-party data catalog. For instance, AWS Glue Data Catalog is an example of such usage, and there is also an open ticket to implement an Iceberg REST catalog (HIVE-28658). As noted in the previous PR (#4444), more than 40 people are watching HIVE-12679, which I believe reflects strong demand for using Hive in this more flexible architecture. Therefore, I believe it is worth completing this patch to support external data catalogs in Hive.
Does this PR introduce any user-facing change?
No, users can use the existing HMS client without any changes in their configuration files.
For the documentation fix, I will update hive-site once this patch is accepted and merged.
How was this patch tested?
This patch includes two simple tests loading MetaStoreClient via
HiveMetaStoreClientFactory.Note for those familiar with the earlier patch:
Regarding the design decision about MetaStoreClientFactory:
This patch does not support a customizable MSC factory as the original patch did; instead, it supports a customizable MSC. In other words, this patch introduces
metastore.client.class, notmetastore.client.factory.class.I made this change not because I preferred this approach, but because I followed the majority opinion in the discussion here. Personally, I think both approaches are valid and have their merits, and I do not have a strong opinion among them. I would appreciate it if you could share your thoughts on this decision.
Regarding RetryingMetaStoreClient and other classes that instantiate MetaStoreClient (link):
After resolving HIVE-27473, Hive now creates ThriftHiveMetaStoreClient only in
Hive.javaandHiveMetaStoreClient. Therefore, I believe it is sufficient to replace these two call sites with the newHiveMetaStoreClientFactory.