Skip to content

Conversation

@sokui
Copy link
Contributor

@sokui sokui commented Mar 11, 2022

What changes were proposed in this pull request?

  1. we can control to disable JVM network address cache to be enabled or not2.
  2. OM can use hostname (ozone-om-0) instead of fqdn (ozone-om-o.ozone-om-service) to resolve itself's hostname during the launching time.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5919

How was this patch tested?

Test it in k8s production with kerberos enabled. works well. In the launching time OM can resolve itself with the hostname.

@kerneltime
Copy link
Contributor

cc @sodonnel

@sokui sokui force-pushed the HDDS-5919-OM-fqdn branch from 1de7b7c to 9a1efdb Compare March 30, 2022 19:42
import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED_DEFAULT;

/**
* FQDN related utils.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more context as to why this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do . Thanks

/**
* Tests for {@link FlexibleFQDNResolution} class.
*/
public class TestFlexibleFQDNResolution {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work only in k8s environment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By turning on ozone.flexible.fqdn.resolution.enabled, it should work for both traditional servers and k8s based servers. Of course, the traditional servers do not require it. You can turn it off when deploying to the traditional servers

@adoroszlai adoroszlai self-requested a review May 4, 2022 16:52
@adoroszlai adoroszlai changed the title HDDS-5919: In kubernettes om HA has circular dependency on the servic… HDDS-5919: In kubernetes OM HA has circular dependency on the service availability May 4, 2022
@adoroszlai adoroszlai changed the title HDDS-5919: In kubernetes OM HA has circular dependency on the service availability HDDS-5919. In kubernetes OM HA has circular dependency on the service availability May 4, 2022
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sokui for working on this.

Comment on lines 451 to 460
public static final String OZONE_FLEXIBLE_FQDN_RESOLUTION_ENABLED =
"ozone.flexible.fqdn.resolution.enabled";
public static final boolean OZONE_FLEXIBLE_FQDN_RESOLUTION_ENABLED_DEFAULT =
false;

public static final String OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED =
"ozone.jvm.network.address.cache.enabled";
public static final boolean OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED_DEFAULT =
true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to group these config keys by adding a common prefix (after ozone.):

  • ozone.network.flexible.fqdn.resolution.enabled
  • ozone.network.jvm.address.cache.enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 72 to 75
if (addr.getAddress() == null) {
return false;
}
return NetUtils.isLocalAddress(addr.getAddress());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: store value locally and simplify condition.

Suggested change
if (addr.getAddress() == null) {
return false;
}
return NetUtils.isLocalAddress(addr.getAddress());
InetAddress addr = addr.getAddress();
return addr != null && NetUtils.isLocalAddress(addr.getAddress());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 236 to 238
if (!flexibleFqdnResolutionEnabled && addr.isUnresolved()
|| flexibleFqdnResolutionEnabled
&& !isAddressHostNameLocal(addr)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to create a helper method to avoid repeating this condition.

Suggested change
if (!flexibleFqdnResolutionEnabled && addr.isUnresolved()
|| flexibleFqdnResolutionEnabled
&& !isAddressHostNameLocal(addr)) {
if (isUnresolved(addr, conf)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 244 to 248
if (!isPeer && (!flexibleFqdnResolutionEnabled
&& !addr.isUnresolved()
&& ConfUtils.isAddressLocal(addr)
|| flexibleFqdnResolutionEnabled
&& isAddressHostNameLocal(addr))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for this condition, something like this:

Suggested change
if (!isPeer && (!flexibleFqdnResolutionEnabled
&& !addr.isUnresolved()
&& ConfUtils.isAddressLocal(addr)
|| flexibleFqdnResolutionEnabled
&& isAddressHostNameLocal(addr))) {
if (!isPeer && isAddressLocal(addr, conf)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

* the FQDN [pod_name].[service_name] is not resolvable at the service
* starting time.
*/
public final class FlexibleFQDNResolution {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this class to a bit more generic Ozone-specific network utility.

  1. Rename to OzoneNetUtils (or similar)
  2. Move to a non-ha-specific package, e.g. org/apache/hadoop/ozone/util or org/apache/hadoop/hdds/utils.
  3. Move isAddressLocal from ConfUtils to this class.
  4. Also add the suggested new isUnresolved and isAddressLocal methods in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 155 to 156
FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired(
new OzoneConfiguration());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think S3 Gateway and Recon could benefit from the same setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this to both S3G and Recon

Comment on lines 55 to 56
FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired(
new OzoneConfiguration());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a new OzoneConfiguration, can we place this call in commonInit? Or is that too late?

private void commonInit() {
conf = createOzoneConfiguration();
TracingUtil.initTracing("OzoneManager", conf);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. I think I tried that before. It is too late

Comment on lines 60 to 61
FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired(
new OzoneConfiguration());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about commonInit().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@adoroszlai
Copy link
Contributor

hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/ha/ConfUtils.java
 23: Unused import - org.apache.hadoop.net.NetUtils.
 26: Unused import - java.net.InetAddress.
 27: Unused import - java.net.InetSocketAddress.

@adoroszlai
Copy link
Contributor

Thanks @sokui for updating the patch. LGTM.

@adoroszlai adoroszlai requested a review from JacksonYao287 May 8, 2022 09:25
@adoroszlai
Copy link
Contributor

@GeorgeJahad @xBis7 can you please take a look at this Kubernetes-related fix?

@GeorgeJahad
Copy link
Contributor

I'll have some time later today @adoroszlai


// Get host name.
String hostname = scmAddress.getAddress().getHostName();
String hostname = scmAddress.getHostName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot exactly remember. I think I got some cert issue when deploying to k8s. But when I just tested this, it seems both of these correctly return the hostname. I cannot remember now why I have this change. Are you OK to keep my change, or you think I should revert it? Please let me know. Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with keeping it, I think it does the same thing. I was just wondering.

* @param addr a FQDN address
* @return The address of host name
*/
public static InetSocketAddress getAddressWithHostName(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: for me, the OM changes would be easier to understand if this method were called "getAddressWithHostNameLocal()", (which would parallel the "isAddressHostNameLocal()" method above.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@GeorgeJahad
Copy link
Contributor

lgtm

@adoroszlai
Copy link
Contributor

@kerneltime would you like to take another look?
@JacksonYao287 would you like to take a look?

If not, I would like to merge this.

@JacksonYao287
Copy link
Contributor

LGTM +1

@adoroszlai adoroszlai merged commit f131de5 into apache:master May 14, 2022
@adoroszlai
Copy link
Contributor

Thanks @sokui for the patch, @GeorgeJahad, @JacksonYao287, @kerneltime for the review.

errose28 added a commit to errose28/ozone that referenced this pull request May 20, 2022
* master: (96 commits)
  HDDS-6738. Migrate tests with rules in hdds-server-framework to JUnit5 (apache#3415)
  HDDS-6650. S3MultipartUpload support update bucket usedNamespace. (apache#3404)
  HDDS-6491. Support FSO keys in getExpiredOpenKeys (apache#3226)
  HDDS-6596. EC: Support ListBlock from CoordinatorDN (apache#3410)
  HDDS-6737. Migrate parameterized tests in hdds-server-framework to JUnit5 (apache#3414)
  HDDS-6660: EC: Add the DN side Reconstruction Handler class. (apache#3399)
  HDDS-6750. Migrate simple tests in hdds-server-scm to JUnit5 (apache#3417)
  HDDS-6749. SCM includes itself as peer in addSCM request (apache#3413)
  HDDS-6657. Improve Ozone integrated Ranger configuration instructions (apache#3365)
  HDDS-6742. Audit operation category mismatch (apache#3407)
  HDDS-6748. Intermittent timeout in TestECBlockReconstructedInputStream#testReadDataWithUnbuffer (apache#3416)
  HDDS-6731. Migrate simple tests in hdds-server-framework to JUnit5 (apache#3412)
  HDDS-5919. In kubernetes OM HA has circular dependency on service availability (apache#3185)
  HDDS-6730. Migrate tests in hdds-tools to JUnit5 (apache#3402)
  HDDS-6630. Explicitly remove node after being chosen (apache#3332)
  HDDS-6560. Add general Grafana dashboard (apache#3285)
  HDDS-6704. EC: ReplicationManager - create version of ContainerReplicaCounts applicable to EC (apache#3405)
  HDDS-6680. Pre-Finalize behaviour for Bucket Layout Feature. (apache#3377)
  HDDS-6619. Add freon command to run r/w mix workload using ObjectStore APIs (apache#3383)
  HDDS-6734. ozone admin pipeline list CLI is not backward compatible (apache#3406)
  ...

Conflicts:
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java
hadoop-hdds/interface-server/src/main/proto/SCMRatisProtocol.proto
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMDBDefinition.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStoreImpl.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants