HDDS-5919. In kubernetes OM HA has circular dependency on the service availability #3185
cc @sodonnel |
1de7b7c to 9a1efdb
| import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED_DEFAULT; | ||
|
|
||
| /** | ||
| * FQDN related utils. |
Please add more context as to why this is needed.
Will do. Thanks.
| /** | ||
| * Tests for {@link FlexibleFQDNResolution} class. | ||
| */ | ||
| public class TestFlexibleFQDNResolution { |
Would this work only in k8s environment?
With ozone.flexible.fqdn.resolution.enabled turned on, it should work for both traditional servers and k8s-based servers. Traditional servers do not require it, so you can leave it turned off when deploying to them.
adoroszlai left a comment
Thanks @sokui for working on this.
| public static final String OZONE_FLEXIBLE_FQDN_RESOLUTION_ENABLED = | ||
| "ozone.flexible.fqdn.resolution.enabled"; | ||
| public static final boolean OZONE_FLEXIBLE_FQDN_RESOLUTION_ENABLED_DEFAULT = | ||
| false; | ||
|
|
||
| public static final String OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED = | ||
| "ozone.jvm.network.address.cache.enabled"; | ||
| public static final boolean OZONE_JVM_NETWORK_ADDRESS_CACHE_ENABLED_DEFAULT = | ||
| true; | ||
|
|
I think it would be useful to group these config keys by adding a common prefix (after ozone.):
- ozone.network.flexible.fqdn.resolution.enabled
- ozone.network.jvm.address.cache.enabled
updated
| if (addr.getAddress() == null) { | ||
| return false; | ||
| } | ||
| return NetUtils.isLocalAddress(addr.getAddress()); |
Nit: store value locally and simplify condition.
| if (addr.getAddress() == null) { | |
| return false; | |
| } | |
| return NetUtils.isLocalAddress(addr.getAddress()); | |
| InetAddress inetAddr = addr.getAddress(); | |
| return inetAddr != null && NetUtils.isLocalAddress(inetAddr); |
updated
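A minimal, standard-library-only sketch of the null-safe shape suggested above. `LocalAddressSketch` and its `isLocalAddress` method are hypothetical stand-ins for Hadoop's `NetUtils.isLocalAddress`, written here only so the example runs without Hadoop on the classpath; the port number is arbitrary.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

final class LocalAddressSketch {
  // Hypothetical stand-in for NetUtils.isLocalAddress: true for wildcard or
  // loopback addresses, or for addresses bound to a local network interface.
  static boolean isLocalAddress(InetAddress ip) {
    if (ip.isAnyLocalAddress() || ip.isLoopbackAddress()) {
      return true;
    }
    try {
      return NetworkInterface.getByInetAddress(ip) != null;
    } catch (SocketException e) {
      return false;
    }
  }

  // Store the resolved address once, then combine the null check and the
  // locality check in a single expression.
  static boolean isAddressLocal(InetSocketAddress addr) {
    InetAddress ip = addr.getAddress(); // null when addr is unresolved
    return ip != null && isLocalAddress(ip);
  }

  public static void main(String[] args) {
    InetSocketAddress loopback =
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862);
    InetSocketAddress unresolved =
        InetSocketAddress.createUnresolved("om-0.om-service", 9862);
    System.out.println(isAddressLocal(loopback));
    System.out.println(isAddressLocal(unresolved));
  }
}
```

Using a distinct local variable name avoids shadowing the `addr` parameter, and the locality check receives the `InetAddress` itself rather than its raw bytes.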
| if (!flexibleFqdnResolutionEnabled && addr.isUnresolved() | ||
| || flexibleFqdnResolutionEnabled | ||
| && !isAddressHostNameLocal(addr)) { |
I think it would be nice to create a helper method to avoid repeating this condition.
| if (!flexibleFqdnResolutionEnabled && addr.isUnresolved() | |
| || flexibleFqdnResolutionEnabled | |
| && !isAddressHostNameLocal(addr)) { | |
| if (isUnresolved(addr, conf)) { |
updated
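One possible shape for the suggested `isUnresolved` helper, as a sketch only. The review proposes the signature `isUnresolved(addr, conf)`; to stay free of Hadoop dependencies, this version takes the flag as a plain boolean and the host-name check as a `Predicate` (both are assumptions, as is the class name `UnresolvedCheckSketch`).

```java
import java.net.InetSocketAddress;
import java.util.function.Predicate;

final class UnresolvedCheckSketch {
  // Equivalent to:
  //   !flexible && addr.isUnresolved() || flexible && !hostNameLocal(addr)
  static boolean isUnresolved(boolean flexibleFqdnResolutionEnabled,
      InetSocketAddress addr, Predicate<InetSocketAddress> hostNameLocal) {
    return flexibleFqdnResolutionEnabled
        ? !hostNameLocal.test(addr)
        : addr.isUnresolved();
  }

  public static void main(String[] args) {
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("om-0.om-service", 9862);
    // Flag off: falls back to the plain isUnresolved() check.
    System.out.println(isUnresolved(false, addr, a -> true));
    // Flag on: the result depends only on the host-name locality check.
    System.out.println(isUnresolved(true, addr, a -> true));
  }
}
```

Writing the condition as a ternary makes the two branches mutually exclusive at a glance, which the original mixed `&&`/`||` expression did not.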
| if (!isPeer && (!flexibleFqdnResolutionEnabled | ||
| && !addr.isUnresolved() | ||
| && ConfUtils.isAddressLocal(addr) | ||
| || flexibleFqdnResolutionEnabled | ||
| && isAddressHostNameLocal(addr))) { |
Also for this condition, something like this:
| if (!isPeer && (!flexibleFqdnResolutionEnabled | |
| && !addr.isUnresolved() | |
| && ConfUtils.isAddressLocal(addr) | |
| || flexibleFqdnResolutionEnabled | |
| && isAddressHostNameLocal(addr))) { | |
| if (!isPeer && isAddressLocal(addr, conf)) { |
updated
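The local-address condition can be collapsed the same way. This is a sketch under assumptions: the review's proposed signature is `isAddressLocal(addr, conf)`, whereas this version takes the flag as a boolean and both locality checks as `Predicate` parameters so it compiles without Hadoop; `AddressLocalSketch` is a hypothetical name.

```java
import java.net.InetSocketAddress;
import java.util.function.Predicate;

final class AddressLocalSketch {
  // Equivalent to:
  //   !flexible && !addr.isUnresolved() && resolvedLocal(addr)
  //     || flexible && hostNameLocal(addr)
  static boolean isAddressLocal(boolean flexibleFqdnResolutionEnabled,
      InetSocketAddress addr,
      Predicate<InetSocketAddress> resolvedLocal,
      Predicate<InetSocketAddress> hostNameLocal) {
    return flexibleFqdnResolutionEnabled
        ? hostNameLocal.test(addr)
        : !addr.isUnresolved() && resolvedLocal.test(addr);
  }

  public static void main(String[] args) {
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("om-0.om-service", 9862);
    // Flag off: an unresolved address is never treated as local.
    System.out.println(isAddressLocal(false, addr, a -> true, a -> true));
    // Flag on: only the host-name check decides.
    System.out.println(isAddressLocal(true, addr, a -> false, a -> true));
  }
}
```

The caller then reduces to `!isPeer && isAddressLocal(...)`, as in the suggestion above.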
| * the FQDN [pod_name].[service_name] is not resolvable at the service | ||
| * starting time. | ||
| */ | ||
| public final class FlexibleFQDNResolution { |
I would turn this class into a more generic, Ozone-specific network utility:
- Rename to OzoneNetUtils (or similar).
- Move to a non-ha-specific package, e.g. org/apache/hadoop/ozone/util or org/apache/hadoop/hdds/utils.
- Move isAddressLocal from ConfUtils to this class.
- Also add the suggested new isUnresolved and isAddressLocal methods in this class.
updated
| FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired( | ||
| new OzoneConfiguration()); |
I think S3 Gateway and Recon could benefit from the same setting.
Added this to both S3G and Recon.
| FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired( | ||
| new OzoneConfiguration()); |
Instead of creating a new OzoneConfiguration, can we place this call in commonInit? Or is that too late?
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManagerStarter.java
Lines 166 to 168 in a3f5021
| private void commonInit() { | |
| conf = createOzoneConfiguration(); | |
| TracingUtil.initTracing("OzoneManager", conf); |
Yes. I think I tried that before, and it is too late.
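For context, a sketch of what disabling the JVM address cache can look like using the standard `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. The body is an assumption about the approach, not a copy of the patch, and it takes a plain boolean instead of an `OzoneConfiguration`. As far as I know, the JVM reads these properties when its address cache policy is first initialized, which would be consistent with the "too late" observation, although the thread does not state the reason.

```java
import java.security.Security;

final class DnsCacheSketch {
  // Assumed behavior: when the cache is disabled by configuration, set both
  // positive and negative DNS cache TTLs to 0 so every lookup hits the resolver.
  static void disableJvmNetworkAddressCacheIfRequired(boolean cacheEnabled) {
    if (!cacheEnabled) {
      Security.setProperty("networkaddress.cache.ttl", "0");
      Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }
  }

  public static void main(String[] args) {
    // Call this as early as possible, before any host name is resolved.
    disableJvmNetworkAddressCacheIfRequired(false);
    System.out.println(Security.getProperty("networkaddress.cache.ttl"));
  }
}
```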
| FlexibleFQDNResolution.disableJvmNetworkAddressCacheIfRequired( | ||
| new OzoneConfiguration()); |
Same comment here about commonInit().
same as above
|
|
Thanks @sokui for updating the patch. LGTM. |
|
@GeorgeJahad @xBis7 can you please take a look at this Kubernetes-related fix? |
|
I'll have some time later today @adoroszlai |
|
|
||
| // Get host name. | ||
| String hostname = scmAddress.getAddress().getHostName(); | ||
| String hostname = scmAddress.getHostName(); |
Is this change necessary?
I cannot remember exactly. I think I hit a certificate issue when deploying to k8s. But when I tested this just now, both calls correctly returned the hostname, and I can no longer remember why I made this change. Are you OK with keeping it, or do you think I should revert it? Please let me know. Thanks.
I'm OK with keeping it; I think it does the same thing. I was just wondering.
| * @param addr a FQDN address | ||
| * @return The address of host name | ||
| */ | ||
| public static InetSocketAddress getAddressWithHostName( |
Nit: for me, the OM changes would be easier to understand if this method were called "getAddressWithHostNameLocal()", which would parallel the "isAddressHostNameLocal()" method above.
updated
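To illustrate the idea behind this method: in Kubernetes the FQDN [pod_name].[service_name] may not be resolvable at start-up, while the bare pod name is. The sketch below keeps only the first label of the host name. The body is an assumption, not the patch's implementation; it returns an unresolved address so the example performs no DNS lookup, and it does not special-case IP literals.

```java
import java.net.InetSocketAddress;

final class HostNameLocalSketch {
  // Returns an address that keeps the port but uses only the first DNS label,
  // e.g. "om-0.om-service" becomes "om-0". Single-label hosts are returned as-is.
  static InetSocketAddress getAddressWithHostNameLocal(InetSocketAddress addr) {
    String fqdn = addr.getHostString(); // no reverse lookup
    int dot = fqdn.indexOf('.');
    if (dot <= 0) {
      return addr;
    }
    return InetSocketAddress.createUnresolved(
        fqdn.substring(0, dot), addr.getPort());
  }

  public static void main(String[] args) {
    InetSocketAddress fqdn =
        InetSocketAddress.createUnresolved("om-0.om-service", 9862);
    System.out.println(getAddressWithHostNameLocal(fqdn).getHostString());
  }
}
```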
|
lgtm |
|
@kerneltime would you like to take another look? If not, I would like to merge this. |
|
LGTM +1 |
|
Thanks @sokui for the patch, @GeorgeJahad, @JacksonYao287, @kerneltime for the review. |
* master: (96 commits)
  HDDS-6738. Migrate tests with rules in hdds-server-framework to JUnit5 (apache#3415)
  HDDS-6650. S3MultipartUpload support update bucket usedNamespace. (apache#3404)
  HDDS-6491. Support FSO keys in getExpiredOpenKeys (apache#3226)
  HDDS-6596. EC: Support ListBlock from CoordinatorDN (apache#3410)
  HDDS-6737. Migrate parameterized tests in hdds-server-framework to JUnit5 (apache#3414)
  HDDS-6660: EC: Add the DN side Reconstruction Handler class. (apache#3399)
  HDDS-6750. Migrate simple tests in hdds-server-scm to JUnit5 (apache#3417)
  HDDS-6749. SCM includes itself as peer in addSCM request (apache#3413)
  HDDS-6657. Improve Ozone integrated Ranger configuration instructions (apache#3365)
  HDDS-6742. Audit operation category mismatch (apache#3407)
  HDDS-6748. Intermittent timeout in TestECBlockReconstructedInputStream#testReadDataWithUnbuffer (apache#3416)
  HDDS-6731. Migrate simple tests in hdds-server-framework to JUnit5 (apache#3412)
  HDDS-5919. In kubernetes OM HA has circular dependency on service availability (apache#3185)
  HDDS-6730. Migrate tests in hdds-tools to JUnit5 (apache#3402)
  HDDS-6630. Explicitly remove node after being chosen (apache#3332)
  HDDS-6560. Add general Grafana dashboard (apache#3285)
  HDDS-6704. EC: ReplicationManager - create version of ContainerReplicaCounts applicable to EC (apache#3405)
  HDDS-6680. Pre-Finalize behaviour for Bucket Layout Feature. (apache#3377)
  HDDS-6619. Add freon command to run r/w mix workload using ObjectStore APIs (apache#3383)
  HDDS-6734. ozone admin pipeline list CLI is not backward compatible (apache#3406)
  ...
Conflicts:
  hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java
  hadoop-hdds/interface-server/src/main/proto/SCMRatisProtocol.proto
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMDBDefinition.java
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStoreImpl.java
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
What changes were proposed in this pull request?
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5919
How was this patch tested?
Tested in a production k8s deployment with Kerberos enabled; it works well. At launch time, OM can resolve itself by hostname.