Skip to content

Conversation

@slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Aug 1, 2024

What changes were proposed in this pull request?

When viewing roles of OM/SCM via CLI, a lengthy string is typically displayed, which lacks user-friendly formatting. We aim to improve its presentation. I have added a table option to allow displaying this data in a tabular format.

OM

om-current/bin/ozone admin om roles

om32 : FOLLOWER (bigdata-ozone-online32)
om30 : FOLLOWER (bigdata-ozone-online30)
om31 : LEADER (bigdata-ozone-online31)

om-current/bin/ozone admin om roles --table

+------------------------------------------------------+
|                 Ozone Manager Roles                  |
+---------------------------------+---------+----------+
|            Host Name            | Node ID |   Role   |
+---------------------------------+---------+----------+
| bigdata-ozone-online32          |  om32   | FOLLOWER |
| bigdata-ozone-online30          |  om30   | FOLLOWER |
| bigdata-ozone-online31          |  om31   |  LEADER  |
+---------------------------------+---------+----------+

SCM

scm-current/bin/ozone admin scm roles

bigdata-ozone-online31:9894:FOLLOWER:61b1c8e5-da40-4567-8a17-96a0234ba14e:100.3.197.98
bigdata-ozone-online32:9894:LEADER:e428ca07-b2a3-4756-bf9b-a4abb033c7d1:100.3.192.89
bigdata-ozone-online30:9894:FOLLOWER:41f90734-b3ee-4284-ad96-40a286654952:100.3.196.51

scm-current/bin/ozone admin scm roles --table

+---------------------------------------------------------------------------------------------------------------+
|                                        Storage Container Manager Roles                                        |
+---------------------------------+------------+----------+--------------------------------------+--------------+
|            Host Name            | Ratis Port | Role     |                 Node ID              | Host Address |
+---------------------------------+------------+----------+--------------------------------------+--------------+
| bigdata-ozone-online31          |    9894    | FOLLOWER | 61b1c8e5-da40-4567-8a17-96a0234ba14e | 100.3.197.98 |
| bigdata-ozone-online32          |    9894    |  LEADER  | e428ca07-b2a3-4756-bf9b-a4abb033c7d1 | 100.3.192.89 |
| bigdata-ozone-online30          |    9894    | FOLLOWER | 41f90734-b3ee-4284-ad96-40a286654952 | 100.3.196.51 |
+---------------------------------+------------+----------+--------------------------------------+--------------+

What is the link to the Apache JIRA

JIRA: HDDS-11268. [CLI] Improve CLI Display OM/SCM Roles.

How was this patch tested?

To validate this in a production environment.

@errose28
Copy link
Contributor

errose28 commented Aug 2, 2024

Thanks @slfan1989 the table looks nice. In the description the Role and Node ID headers are swapped. I wonder if it would be better to make this format the "verbose" output and trigger it with a --verbose flag, which is more standard across CLI tools. I don't have a strong opinion on this though.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Aug 4, 2024

Thanks @slfan1989 the table looks nice. In the description the Role and Node ID headers are swapped. I wonder if it would be better to make this format the "verbose" output and trigger it with a --verbose flag, which is more standard across CLI tools. I don't have a strong opinion on this though.

Thank you for your attention to this pr! I have tried to output verbose information using --verbose, but encountered some issues. ozone admin om does not yet support recognizing the verbose option.

Usage: ozone admin om [-hV] [--verbose] [-conf=<configurationPath>]
                      [-D=<String=String>]... [COMMAND]
Ozone Manager specific admin operations
      -conf=<configurationPath>

  -D, --set=<String=String>

  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
      --verbose   More verbose output. Show the stack trace of the errors.

[hadoop@bigdata-ozone-online32 ~/om-current/bin]$ ./ozone admin om --verbose
Incomplete command

After completing this pr, I can submit a new pr to add support for the verbose option in ozone admin om/scm, including support for tabular-style output.

@adoroszlai
Copy link
Contributor

Thanks @slfan1989 for the patch.

Please check acceptance test failure:

List om roles as TABLE with OM service ID passed                      | FAIL |
No keyword with name 'Assert Leader Present in TABLE' found. Did you try using keyword 'Assert Leader Present' and forgot to use enough whitespace between keyword and arguments?
--
List om roles as TABLE without OM service ID passed                   | FAIL |
No keyword with name 'Assert Leader Present in TABLE' found. Did you try using keyword 'Assert Leader Present' and forgot to use enough whitespace between keyword and arguments?
--
Om-Roles :: Smoke test for listing om roles.                          | FAIL |
6 tests, 4 passed, 2 failed

https://github.com/apache/ozone/actions/runs/10236952464/job/28319907708?pr=7016#step:6:9

@slfan1989
Copy link
Contributor Author

@errose28 @adoroszlai Can you help me review this pr again? Thank you very much!

Should Contain ${output_without_id_passed} no Ozone Manager service ID specified

List om roles as TABLE with OM service ID passed
${output_without_id_passed} = Execute ozone admin om roles --table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 , this should be "output_with_id_passed". BTW, the first command "ozone admin om roles --table" doesn't have service ID set, why it's under this "with OM service ID passed" category?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for helping review the code. The issue was due to my oversight, and I have updated the unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi Can you help review this pr again? thank you very much!

Copy link
Contributor

@whbing whbing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for working on this. Overall looks good, left a few comments.

@whbing
Copy link
Contributor

whbing commented Aug 27, 2024

Seems some related ci failures, such as org.apache.hadoop.ipc.RemoteException(com.google.protobuf.UninitializedMessageException): Message missing required fields: scmHaEnabled.

@slfan1989 From my point, maybe we don't need to make so many changes. How about simplifying the codes similar to the —json ? cc @adoroszlai @ChenSammi

@ChenSammi
Copy link
Contributor

@slfan1989 , thanks for improving the patch. For this scmHaEnabled, can you explain a little bit about its purpose?


// Determine which header to use based on whether Ratis is enabled or not.
if (isRatisEnabled) {
formattingCLIUtils.addHeaders(RATIS_SCM_ROLES_HEADER);
Copy link
Contributor Author

@slfan1989 slfan1989 Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whbing @ChenSammi

@slfan1989 , thanks for improving the patch. For this scmHaEnabled, can you explain a little bit about its purpose?

Thank you very much for reviewing this PR! I’d like to explain why I added the scmRatisEnable:

When using --table to output SCM information, we need to first determine the length of the table headers.

For non-HA SCMs, we will display it in the following format:

+-----------------------------------------------+
|        Storage Container Manager Roles        | 
+---------------------------------+-------------+
|            Host Name            |     Port    | 
+---------------------------------+-------------+

For HA SCMs, we will display it in the following format:

+---------------------------------------------------------------------------------------------------------------+
|                                        Storage Container Manager Roles                                        |
+---------------------------------+------------+----------+--------------------------------------+--------------+
|            Host Name            | Ratis Port | Role     |                 Node ID              | Host Address |
+---------------------------------+------------+----------+--------------------------------------+--------------+

I need a clear method to determine if SCM has Ratis enabled.

I am considering two approaches:

Approach 1: Use the ratisRole length of the returned data for judgment.
Approach 2: Add a return attribute that explicitly indicates whether Ratis is being used.

From my perspective, Approach 2 is better, as the code looks clearer.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Aug 28, 2024

Seems some related ci failures, such as org.apache.hadoop.ipc.RemoteException(com.google.protobuf.UninitializedMessageException): Message missing required fields: scmHaEnabled.

@slfan1989 From my point, maybe we don't need to make so many changes. How about simplifying the codes similar to the —json ? cc @adoroszlai @ChenSammi

@whbing Thank you for your question! I believe using a clear method to indicate whether SCM is using Ratis would be better. While a JSON output approach is also feasible, determining whether Ratis is used based on the size of the length might not be very reliable.

@slfan1989 slfan1989 force-pushed the HDDS-11268 branch 2 times, most recently from 4845692 to e7b81f3 Compare September 1, 2024 02:51
@slfan1989 slfan1989 requested a review from whbing September 1, 2024 05:08
* while `false` indicates that it is in STANDALONE mode.
* @throws IOException an I/O exception of some sort has occurred.
*/
boolean isSCMRatisEnable() throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSCMRatisEnable -> isScmRatisEnable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing the code! I will improve this part of the code.

ScmInfo.Builder builder = new ScmInfo.Builder()
.setClusterId(resp.getClusterId())
.setScmId(resp.getScmId())
.setScmRatisEnabled(resp.getScmRatisEnabled())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here to be backward compatible, we should first check whether the response has "scmRatisEnabled" field. If it doesn't have, which means the SCM server is an old version server, then we need to set this ScmRatisEnabled field in other way, for example, the size of RolesList.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here to be backward compatible, we should first check whether the response has "scmRatisEnabled" field.

Thank you for the reminder! It's indeed very necessary. I will check if ScmRatisEnabled exists.

If it doesn't have, which means the SCM server is an old version server, then we need to set this ScmRatisEnabled field in other way, for example, the size of RolesList.

We use the size of RolesList to determine if Ratis is enabled. If RolesList.size > 1, we consider Ratis enabled. If RolesList.size <= 1, we consider Ratis not enabled ?

Copy link
Contributor

@ChenSammi ChenSammi Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one case that SCM ratis is ended, but only one SCM instance in the raft group. In this case, we can check whether the role string has LEADER or FOLLOWER key word, if neither key words exist, then ratis is not enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion! I will continue to improve the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi Could you review this PR again? Thank you very much!

.setRatisPeerRoles(resp.getPeerRolesList());

return builder.build();
// By default, we assume that SCM Ratis is not enabled.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have considered whether the ScmRatisEnabled field exists in the proto. If the field is not present, we determine whether Ratis is enabled based on whether peerRolesList includes leader or follower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi Kindly ping, Can you review this PR again? Thank you very much!

@slfan1989 slfan1989 requested a review from ChenSammi September 18, 2024 11:58
Copy link
Contributor

@whbing whbing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for update. Overall looks good. Some comments left.

Should Not Be Equal ${leader} ${EMPTY}
Assert Leader Present in TABLE
[Arguments] ${output}
Should Match Regexp ${output} \|\s+om\d+\s+\|\s+LEADER\s+\|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. om host is a user-defined string and does not necessarily start with om, right ?
  2. NIT: string don't look aligned.
  3. I verified this RE, and it seems that any format will pass. So, this RE looks like it needs to be changed.
    You can install robot in python and execute it locally:
    om-roles-test.robot :
*** Settings ***
Documentation       Smoke test for listing om roles.
Library             OperatingSystem
Test Timeout        5 minutes

*** Keywords ***
Assert Leader Present in TABLE
     [Arguments]                     ${output}
     Should Match Regexp             ${output}      \|\s+om\d+\s+\|\s+LEADER\s+\|

*** Test Cases ***

List om roles as TABLE without OM service ID passed
    ${result}=        Set Variable       | host32 | om32 | OTHER
    Should Be Equal   ${result}          | host32 | om32 | OTHER
    Assert Leader Present in TABLE  ${result}

run: robot om-roles-test.robot

==============================================================================
Om-Roles-Test :: Smoke test for listing om roles.
==============================================================================
List om roles as TABLE without OM service ID passed                   | PASS |
------------------------------------------------------------------------------
Om-Roles-Test :: Smoke test for listing om roles.                     | PASS |
1 test, 1 passed, 0 failed
==============================================================================

Copy link
Contributor Author

@slfan1989 slfan1989 Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

om host is a user-defined string and does not necessarily start with om, right ?

This regex matches strings containing "om" followed by digits, specifically formatted with "LEADER" surrounded by pipe delimiters.

image

If we replace it with other characters (like "scm"), it should not match.

image

NIT: string don't look aligned.

This should be fixed, and I will modify the format issue.

I verified this RE, and it seems that any format will pass. So, this RE looks like it needs to be changed.
You can install robot in python and execute it locally:

Sorry, I don't think there's an issue with this regex.

Test1 regex test

The regex validation indicates that, based on the provided example, it cannot match the regex.

image

I tested the regex functionality in VSCode, and everything is working fine. I'm not sure if Robot has issues recognizing the regular expression, so I changed it to ensure that it works properly with Robot.

==============================================================================
Om-Roles                                                                      
==============================================================================
List om roles as TABLE without OM service ID passed                   | FAIL |
'| host32 | om32 | OTHER' does not match '\|.*LEADER.*'
------------------------------------------------------------------------------
Om-Roles                                                              | FAIL |
1 test, 0 passed, 1 failed
==============================================================================

Test Passed.

Assert Leader Present in TABLE
     [Arguments]                     ${output}
     Should Match Regexp             ${output}                     \\|.*LEADER.*

*** Test Cases ***

List om roles as TABLE without OM service ID passed
    ${result}=        Set Variable      | host32 | om32 | LEADER
    Log               ${result}
    Assert Leader Present in TABLE  ${result}
==============================================================================
Om-Roles                                                                      
==============================================================================
List om roles as TABLE without OM service ID passed                   | PASS |
------------------------------------------------------------------------------
Om-Roles                                                              | PASS |
1 test, 1 passed, 0 failed
==============================================================================

@slfan1989 slfan1989 requested a review from whbing September 23, 2024 05:00
@slfan1989
Copy link
Contributor Author

@whbing @ChenSammi Could you please help review this PR again? Thank you very much!

* This is used to check if 'leader' or 'follower' exists,
* in order to confirm whether we have enabled Ratis.
*/
private final List<String> scmRatisRolesToCheck = Arrays.asList("leader", "follower");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable scmRatisRolesToCheck is only referenced in one place, and I'm not sure if defining it as a global variable is a good idea. Additionally, from my understanding, using static might be better if it needs to be used. @ChenSammi Please help give some suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@errose28 @adoroszlai May I take up some of your time to ask for some advice? Thank you very much!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whbing How should I modify this? Should I change the variable to uppercase and make it static?

@slfan1989
Copy link
Contributor Author

@whbing @ChenSammi This PR is a small feature, but I believe it is valuable. We have already deployed it internally for 2 months, and I use this feature whenever switching SCM or OM. It should also make the information display more user-friendly for other users. If there are no major issues, I hope we can merge this PR into the master branch.

cc: @errose28 @adoroszlai

@slfan1989
Copy link
Contributor Author

@adoroszlai @errose28 @whbing @ChenSammi

I'm sorry to disturb everyone, but I really need this pr to assist with some subsequent work, such as #7266 (HDDS-11463. Track and display failed DataNode storage locations in SCM). I think we should provide a way to present this in a table format. Currently, I'm not sure what issues this PR has.

@slfan1989 slfan1989 requested a review from whbing October 3, 2024 08:29
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for the patch. I have some minor code comments, otherwise looks good.

private final String clusterId;
private final String scmId;
private final List<String> peerRoles;
private final Boolean scmRatisEnabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally Boolean instead of boolean? Since its value is set in the constructor from a boolean parameter, it doesn't seem to be ever set to null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for helping review the code! I will improve this part of the code.

}

// In STANDALONE mode, output only the Host Name and Port information.
if (roleItems.length == 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that roleItems.length matches the value expected according to isRatisEnabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your revision looks great. We can simplify it by removing the array size check, since we'll standardize it to formattingCLIUtils.addLine(roleItems);.

slfan1989 and others added 3 commits October 3, 2024 20:32
…/scm/GetScmRatisRolesSubcommand.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
…/scm/GetScmRatisRolesSubcommand.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@slfan1989
Copy link
Contributor Author

@adoroszlai Could you help review this PR again? Thank you very much!

@slfan1989 slfan1989 requested a review from adoroszlai October 4, 2024 06:42
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for updating the patch. Two previous minor items still apply, but no need to push another commit just for these.

@ChenSammi @whbing do you have any other comments?

@adoroszlai adoroszlai changed the title HDDS-11268. [CLI] Improve CLI Display OM/SCM Roles. HDDS-11268. Add --table mode for OM/SCM Roles CLI Oct 4, 2024
slfan1989 and others added 2 commits October 4, 2024 16:04
…/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
…/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@slfan1989
Copy link
Contributor Author

@adoroszlai Thank you very much for reviewing the code. Friendly ping. @whbing @ChenSammi

@whbing
Copy link
Contributor

whbing commented Oct 4, 2024

Sorry for the late reply due to the holiday. I'll have a look ASAP.

@whbing
Copy link
Contributor

whbing commented Oct 4, 2024

Thanks for @slfan1989 for the update, +1.

@slfan1989
Copy link
Contributor Author

@adoroszlai Thank you very much for reviewing this PR!

Regarding whbing, he/she has agreed to this PR.

Regarding ChenSammi , he/she may have been quite busy recently and hasn’t responded to this PR in the past month. If he/she has any feedback later, I’m willing to make improvements to the code at any time.

Can we merge this PR into the master branch? so that I can continue working on HDDS-11463(#7266) as discussed with Ethan.

@adoroszlai adoroszlai merged commit eb26677 into apache:master Oct 5, 2024
@adoroszlai
Copy link
Contributor

Thanks @slfan1989 for the patch, @ChenSammi, @errose28, @whbing for the review.

@slfan1989
Copy link
Contributor Author

@adoroszlai Thank you again for your help with this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants