Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 22 additions & 6 deletions ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@
import org.apache.hadoop.hive.metastore.api.SourceTable;
import org.apache.hadoop.hive.metastore.api.UpdateTransactionalStatsRequest;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
import org.apache.hadoop.hive.ql.ddl.table.AlterTableType;
import org.apache.hadoop.hive.ql.io.HdfsUtils;
import org.apache.hadoop.hive.metastore.HiveMetaException;
Expand All @@ -123,7 +124,6 @@
import org.apache.hadoop.hive.metastore.HiveMetaStoreUtils;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.PartitionDropOptions;
import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient;
import org.apache.hadoop.hive.metastore.SynchronizedMetaStoreClient;
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.Warehouse;
Expand Down Expand Up @@ -236,6 +236,7 @@
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hive.common.util.HiveVersionInfo;
import org.apache.thrift.TException;
import org.apache.thrift.TApplicationException;
Expand Down Expand Up @@ -5700,11 +5701,26 @@ public HiveMetaHook getHook(
}
};

if (conf.getBoolVar(ConfVars.METASTORE_FASTPATH)) {
return new SessionHiveMetaStoreClient(conf, hookLoader, allowEmbedded);
} else {
return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
SessionHiveMetaStoreClient.class.getName(), allowEmbedded);
return createMetaStoreClientFactory(conf)
Copy link
Member

@deniskuzZ deniskuzZ Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we avoid new factory object creation on every new client request?

.createMetaStoreClient(conf, hookLoader, allowEmbedded, metaCallTimeMap);
}

private static HiveMetaStoreClientFactory createMetaStoreClientFactory(HiveConf conf) throws
MetaException {
String metaStoreClientFactoryClassName = MetastoreConf.getVar(conf,
MetastoreConf.ConfVars.METASTORE_CLIENT_FACTORY_CLASS);
Copy link
Member

@deniskuzZ deniskuzZ Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of providing a custom factory, you need to supply the custom client impl (SessionHiveMetaStoreClient, RetryingMetaStoreClient, etc). Please take a look at https://github.com/apache/hive/pull/4257/files#diff-6561f3987ba0c11e6a8998efcdc862d3d3340d4babbe003ae8da98b1e4020faf
cc @wecharyu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! The code will be clearer if we instantiate the client impl in a single factory, maybe just named HiveMetaStoreClientFactory .

Copy link
Contributor Author

@okumin okumin Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ @wecharyu Thanks for your opinions. Let me clarify your suggestions first. This is my understanding.

  • IHMSHandler: The essential interface. HMSHandler is the primary and only implementation for now
  • AbstractHMSHandlerProxy: The interface of dynamic proxy for IHMHandler. RetryingHMSHandler is the primary implementation and it becomes configurable thanks to HIVE-27284: Make HMSHandler proxy pluggable #4257
  • HMSHandlerProxyFactory: It has one static method, getProxy(Configuration, IMSHandler, boolean), to wrap the given IHMSHandler with the configured AbstractHMSHandlerProxy

I would also say the relation between IHMSHandler and RetryingHMSHandler is similar to between IMetaStoreClient and RetryingMetaStoreClient. But the purpose is a bit different. We'd like to make pluggable not RetryingMetaStoreClient but IMetaStoreClient here. Also, it is not evident that a custom IMetaStoreClient needs a dynamic proxy(e.g. I think at least the one for AWS Data Catelog may not need RetryingMetaStoreClient or another proxy because AWS SDK can provide more purpose-built helpers).
So, if we want to satisfy our requirements, we have to make both IMetaStoreClient and the proxy(e.g. NopProxy for AWS Glue Catalog) configurable. I think it is acceptable but I wonder if we should configure two parameters, maybe metastore.client.class and metastore.client.proxy.class, in order to generate one IMetaStoreClient.
As to another aspect, the current patch and hive.metastore.client.factory.class are already and unfortunately accepted by several services... Looks like it is used in AWS Glue, Amazon EMR, and Databricks. My company is also using it. We might not need to take care of those ones too much, but they and their users might be a little confused.

I don't have strong opinions, and to be honest, I could be misunderstanding your suggestion. Please feel free to give me your thoughts. I am willing to follow that advice!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need metastore.client.class parameter, and make a static method like newClient(Configuration conf) in HiveMetaStoreClientFactory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think there are 3 existing patterns.

  • hive.metastore.fastpath=true and metastore.client.class is default
    • We use SessionHiveMetaStoreClient without RetryingMetaStoreClient
  • hive.metastore.fastpath=false and metastore.client.class is default
    • We use SessionHiveMetaStoreClient with RetryingMetaStoreClient
  • hive.metastore.fastpath=false and metastore.client.class is a custom one
    • We use the custom client without RetryingMetaStoreClient. RetryingMetaStoreClient is tightly coupled with Thrift-based SessionHiveMetaStoreClient

I have not come up with a very smart way to resolve the combination... It could be likely to be like this but I'd say there is a better way.

String className = MetastoreConf.getVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_FACTORY_CLASS);
if (conf.getBoolVar(ConfVars.METASTORE_FASTPATH) || className != "SessionHiveMetaStoreClient") {
  return HiveMetaStoreClientFactory.create(...);
} else {
  return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
      SessionHiveMetaStoreClient.class.getName(), allowEmbedded);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify the match pattern as follows?

String className = MetastoreConf.getVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_CLASS);
if (conf.getBoolVar(ConfVars.METASTORE_FASTPATH)) {
  return HiveMetaStoreClientFactory.create(...);
} else {
  return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap, className, allowEmbedded);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two thoughts.

  • I think we have to explicitly disable hive.metastore.fastpath when configuring metastore.client.class since RetryingMetaStoreClient is tightly coupled with SessionHiveMetaStoreClient
  • I guess people in the world already depend on hive.metastore.client.factory.class since we have failed to merge the patch for 7 years... This is the biggest headache


try {
Class<? extends HiveMetaStoreClientFactory> factoryClass =
conf.getClassByName(metaStoreClientFactoryClassName)
.asSubclass(HiveMetaStoreClientFactory.class);
return ReflectionUtils.newInstance(factoryClass, conf);
} catch (Exception e) {
String errorMessage = String.format(
"Unable to instantiate a metastore client factory %s due to: %s",
metaStoreClientFactoryClassName, e);
LOG.error(errorMessage, e);
throw new MetaException(errorMessage);
}
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.hive.ql.metadata;

import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaHookLoader;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;

/**
* Abstract factory that defines an interface for other factories that produce concrete
* MetaStoreClient objects.
*
*/
public interface HiveMetaStoreClientFactory {

/**
* A method for producing IMetaStoreClient objects.
*
* The implementation returned by this method must throw a MetaException if allowEmbedded = true
* and it does not support embedded mode.
*
* @param conf
* Hive Configuration.
* @param hookLoader
* Hook for handling events related to tables.
* @param allowEmbedded
* Flag indicating the implementation must run in-process, e.g. for unit testing or
* "fast path".
* @param metaCallTimeMap
* A container for storing entry and exit timestamps of IMetaStoreClient method
* invocations.
* @return IMetaStoreClient An implementation of IMetaStoreClient.
* @throws MetaException if this method fails to create IMetaStoreClient
*/
IMetaStoreClient createMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader,
boolean allowEmbedded, ConcurrentHashMap<String, Long> metaCallTimeMap) throws MetaException;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.hive.ql.metadata;

import static com.google.common.base.Preconditions.checkNotNull;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
import org.apache.hadoop.hive.metastore.HiveMetaHookLoader;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;

/**
* Default MetaStoreClientFactory for Hive which produces SessionHiveMetaStoreClient objects.
*
*/
public final class SessionHiveMetaStoreClientFactory implements HiveMetaStoreClientFactory {

@Override
public IMetaStoreClient createMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader,
boolean allowEmbedded,
ConcurrentHashMap<String, Long> metaCallTimeMap) throws MetaException {

checkNotNull(conf, "conf cannot be null!");
checkNotNull(hookLoader, "hookLoader cannot be null!");
checkNotNull(metaCallTimeMap, "metaCallTimeMap cannot be null!");

if (conf.getBoolVar(ConfVars.METASTORE_FASTPATH)) {
return new SessionHiveMetaStoreClient(conf, hookLoader, allowEmbedded);
} else {
return RetryingMetaStoreClient.getProxy(conf, hookLoader, metaCallTimeMap,
SessionHiveMetaStoreClient.class.getName(), allowEmbedded);
}
}
}
39 changes: 39 additions & 0 deletions ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
package org.apache.hadoop.hive.ql.metadata;

import static org.apache.hadoop.hive.metastore.Warehouse.DEFAULT_DATABASE_NAME;
import static org.junit.Assert.assertThat;
import static org.hamcrest.CoreMatchers.instanceOf;

import java.io.OutputStream;
import java.util.ArrayList;
Expand All @@ -38,6 +40,7 @@
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.PartitionDropOptions;
import org.apache.hadoop.hive.metastore.Warehouse;
Expand Down Expand Up @@ -953,6 +956,42 @@ private String getFileCheckSum(FileSystem fileSystem, Path p) throws Exception {
return "";
}

@Test
public void testLoadingHiveMetaStoreClientFactory() throws Throwable {
String factoryClassName = SessionHiveMetaStoreClientFactory.class.getName();
HiveConf conf = new HiveConf();
MetastoreConf.setVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_FACTORY_CLASS, factoryClassName);
// Make sure we instantiate the embedded version
// so the implementation chosen is SessionHiveMetaStoreClient, not a retryable version of it.
conf.setBoolVar(ConfVars.METASTORE_FASTPATH, true);
// The current object was constructed in setUp() before we got here
// so clean that up so we can inject our own dummy implementation of IMetaStoreClient
Hive.closeCurrent();
Hive hive = Hive.get(conf);
IMetaStoreClient tmp = hive.getMSC();
assertNotNull("getMSC() failed.", tmp);
assertThat("Invalid default client implementation created.", tmp,
instanceOf(SessionHiveMetaStoreClient.class));
}

@Test
public void testLoadingInvalidHiveMetaStoreClientFactory() throws Throwable {
// Intentionally invalid class
String factoryClassName = String.class.getName();
HiveConf conf = new HiveConf();
MetastoreConf.setVar(conf, MetastoreConf.ConfVars.METASTORE_CLIENT_FACTORY_CLASS, factoryClassName);
// The current object was constructed in setUp() before we got here
// so clean that up so we can inject our own dummy implementation of IMetaStoreClient
Hive.closeCurrent();
Hive hive = Hive.get(conf);
try {
hive.getMSC();
fail("getMSC() was expected to throw MetaException.");
} catch (Exception e) {
assertTrue("getMSC() failed, which IS expected.", true);
}
}

// shamelessly copied from Path in hadoop-2
private static final String SEPARATOR = "/";
private static final char SEPARATOR_CHAR = '/';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,7 @@ public String toString() {
ConfVars.KERBEROS_PRINCIPAL,
ConfVars.USE_THRIFT_SASL,
ConfVars.METASTORE_CLIENT_AUTH_MODE,
ConfVars.METASTORE_CLIENT_FACTORY_CLASS,
ConfVars.METASTORE_CLIENT_PLAIN_USERNAME,
ConfVars.CACHE_PINOBJTYPES,
ConfVars.CONNECTION_POOLING_TYPE,
Expand Down Expand Up @@ -1609,6 +1610,10 @@ public enum ConfVars {
" and password. Any other value is ignored right now but may be used later."
+ "If JWT- Supported only in HTTP transport mode. If set, HMS Client will pick the value of JWT from "
+ "environment variable HMS_JWT and set it in Authorization header in http request"),
METASTORE_CLIENT_FACTORY_CLASS("metastore.client.factory.class",
"hive.metastore.client.factory.class",
"org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory",
"The name of the factory class that produces objects implementing the IMetaStoreClient interface."),
METASTORE_CLIENT_ADDITIONAL_HEADERS("metastore.client.http.additional.headers",
"hive.metastore.client.http.additional.headers", "",
"Comma separated list of headers which are passed to the metastore service in the http headers"),
Expand Down