Skip to content

Conversation

@debiswal
Copy link
Contributor

What changes were proposed in this pull request?

Adding HTTP & HTTPS Port for SCM UI

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7817

Patch Tested

Manually

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;

import static org.apache.hadoop.hdds.protocol.DatanodeDetails.Port.Name.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
 71: Using the '.*' form of import should be avoided - org.apache.hadoop.hdds.protocol.DatanodeDetails.Port.Name.*.

https://github.com/debiswal/ozone/actions/runs/4184593284/jobs/7250464577#step:6:435

Copy link
Contributor Author

@debiswal debiswal Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was unintentional though, IDE has imported anyway have Removed it .

@adoroszlai
Copy link
Contributor

@fapifta Upgrade acceptance test is failing due to:

dn1_1    | 2023-02-16 09:02:55,670 [main] ERROR ozone.HddsDatanodeService: Exception in HddsDatanodeService.
dn1_1    | java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hdds.protocol.DatanodeDetails.Port.Name.HTTP
dn1_1    | 	at java.base/java.lang.Enum.valueOf(Enum.java:240)
dn1_1    | 	at org.apache.hadoop.hdds.protocol.DatanodeDetails$Port$Name.valueOf(DatanodeDetails.java:785)
dn1_1    | 	at org.apache.hadoop.ozone.container.common.helpers.DatanodeIdYaml.readDatanodeIdFile(DatanodeIdYaml.java:99)
dn1_1    | 	at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.readDatanodeDetailsFrom(ContainerUtils.java:171)
dn1_1    | 	at org.apache.hadoop.ozone.HddsDatanodeService.initializeDatanodeDetails(HddsDatanodeService.java:467)
dn1_1    | 	at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:221)
dn1_1    | 	at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:207)
dn1_1    | 	at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:175)
dn1_1    | 	at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:92)

So this is a backwards incompatible change.

@fapifta
Copy link
Contributor

fapifta commented Feb 21, 2023

@adoroszlai
it seems that this problem occurs with the DNs during the test in the phase when the cluster is downgraded after an upgrade.
The DataNodeIDYaml class reads the DataNodeDetails from the DataNode's yaml file, and it runs into a port that is not recognized by the downgraded version.

This though should mean that the change in HDDS-5480 (#2452) is also backward incompatible, but it seems that the port information so far is not saved in the yaml file, as even if the ports are set in the DataNodeDetails, the DataNodeDetails is not persisted to the yaml file after the ports are set.
Three things seems to persist the DataNodeDetails:

  1. InitDatanodeState
  2. SetNodeOperationalStateCommandHandler
  3. and the certificate client when a new certificate is persisted to the DN

I am unsure about the initialization phases of DN, but it seems that in a cluster 1. runs before the ports are set, 2. does run only when a node is decommissioned or offlined, and after that when it is recommissioned, while 3. runs also before the ports are set.

In this change though the HTTP and HTTPS ports are set before the InitDatanode state runs, and saved, and that is why we run into the problem. However this way there are a set of events that cause backward incompatibility with all the ports that were added after V0_PORTS (upgrade -> decomm/offline a node -> recomm the node -> downgrade).

I think we might need to solve this similarly as it was solved for the client, but now we need to introduce the handling of unknown ports in the DatanodeIdYaml load logic instead of the client. As this was not discovered earlier the incompatibility remains for the REPLICATION, RATIS_ADMIN, RATIS_SERVER ports in the mentioned scenario, while we can handle it for the RATIS_DATASTREAM port as it was introduced after 1.3.0.

@adoroszlai if you agree with my analisys, then I will set up the corresponding JIRAs to fix and handle these, but would not like to run ahead, and do it without consensus on how to solve the problem, and until that @debiswal we should keep this one open, as we need to find out how we want to handle the compatibility aspect of adding these new ports.

@adoroszlai
Copy link
Contributor

As this was not discovered earlier the incompatibility remains for the REPLICATION, RATIS_ADMIN, RATIS_SERVER ports in the mentioned scenario,

Those were introduced in 1.1.0. I don't think we support downgrade to any version before that.

CC @errose28

@errose28
Copy link
Contributor

errose28 commented Mar 6, 2023

Right @adoroszlai. When this change was made on the datanode, it broke downgrade from 1.1.0 to 1.0.0. Since the upgrade framework was still WIP at that time and downgrade not officially supported, we decided not to support downgrade from 1.1.0 to 1.0.0 and allow the change. Now that we support downgrades this will need to be done in a backwards compatible way.

@debiswal
Copy link
Contributor Author

debiswal commented Mar 7, 2023

closing this PR due to incompatibility in downgrade/upgrade system

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants