New `admin check` command structure implementation #4806

kevinrr888 · 2024-08-13T19:48:29Z

This is part of #4687
Changes:

Added new admin check command
admin check list prints the checks that can be run along with their descriptions and dependencies
admin check run runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped
Current impl of admin check run does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an OK status
New IT AdminCheckIT
- This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be
  verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented.
No existing functionality was changed

Questions:

Are new classes and packages in suitable locations?
Is the way I went about implementing the dummy checks in AdminCheckIT okay? Mostly concerned if changing the real static CHECK_RUNNERS field is okay (changes it during the test but asserts that it is reset after the test completes).

(Part of apache#4687) - Added new `admin check` command - `admin check list` prints the checks that can be run along with their descriptions and dependencies - `admin check run` runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped - Current impl of `admin check run` does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an `OK` status - New IT AdminCheckIT - This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented. - No existing functionality was changed

kevinrr888 · 2024-08-13T19:49:24Z

Some example outputs and a sketch of the current dependency tree as support:

Dependency tree sketch:

        SYSTEM_CONFIG
              |
              V
        ROOT_METADATA
              |
              V
          ROOT_TABLE
         /          \
   METADATA_TABLE  SYSTEM_FILES
         |
         V
     USER_FILES

Sample outputs:

$ accumulo admin check list

Check Name           | Description                                                                      | Depends on          
------------------------------------------------------------------------------------------------------------------------
SYSTEM_CONFIG        | Validate the system config stored in ZooKeeper                                   |                     
ROOT_METADATA        | Checks integrity of the root tablet metadata stored in ZooKeeper                 | SYSTEM_CONFIG       
ROOT_TABLE           | Scans all the tablet metadata stored in the root table and checks integrity      | ROOT_METADATA       
METADATA_TABLE       | Scans all the tablet metadata stored in the metadata table and checks integrity  | ROOT_TABLE          
SYSTEM_FILES         | Checks that files in system tablet metadata exist in DFS                         | ROOT_TABLE          
USER_FILES           | Checks that files in user tablet metadata exist in DFS                           | METADATA_TABLE      
------------------------------------------------------------------------------------------------------------------------

$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK
Running check SYSTEM_FILES
Check SYSTEM_FILES completed with status OK
Running check USER_FILES
Check USER_FILES completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
SYSTEM_FILES         | OK                  
USER_FILES           | OK                  
--------------------------------------------------

$ accumulo admin check run -p ".*table"

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
--------------------------------------------------

In this example, I made ROOT_TABLE fail when it was run and attempted to run all the checks

$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status FAILED

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | FAILED              
METADATA_TABLE       | SKIPPED_DEPENDENCY_FAILED
SYSTEM_FILES         | SKIPPED_DEPENDENCY_FAILED
USER_FILES           | SKIPPED_DEPENDENCY_FAILED
--------------------------------------------------

ddanielr · 2024-08-14T15:21:59Z

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java

+    }
+  }
+
+  private static void listChecks() {


Is there a reason you didn't want to use a logger here?

The admin commands print using System.out which is what I stuck with. I'm not really sure how using a logger here would work... It wouldn't print to the console when something like accumulo admin check run is executed from the shell as far as I'm aware. It may also be harder to format nicely using a logger. I'm not sure what would be gained from using a logger. If we want time to be included, we can just manually include that.

If we want the output to go to a file, we can just redirect the output there.

ddanielr · 2024-08-14T15:22:31Z

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java

+
+  private static void
+      printChecksResults(TreeMap<CheckCommand.Check,CheckCommand.CheckStatus> checkStatus) {
+    System.out.println();


Same logger question here

kevinrr888 · 2024-08-14T18:01:05Z

Closing this to be replaced by #4807.
As suggested by @ctubbsii, making these changes target main

kevinrr888 self-assigned this Aug 13, 2024

kevinrr888 added this to the 4.0.0 milestone Aug 13, 2024

kevinrr888 requested a review from keith-turner August 13, 2024 19:58

ctubbsii linked an issue Aug 14, 2024 that may be closed by this pull request

Create an admin command that performs a comprehensive check for problems #4687

Closed

ddanielr reviewed Aug 14, 2024

View reviewed changes

kevinrr888 closed this Aug 14, 2024

kevinrr888 mentioned this pull request Aug 14, 2024

New admin check command structure implementation #4807

Merged

ctubbsii removed this from the 4.0.0 milestone Aug 15, 2024

kevinrr888 deleted the elasticity-feature-4687 branch November 1, 2024 15:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New `admin check` command structure implementation #4806

New `admin check` command structure implementation #4806

Uh oh!

kevinrr888 commented Aug 13, 2024

Uh oh!

kevinrr888 commented Aug 13, 2024 •

edited

Loading

Uh oh!

ddanielr Aug 14, 2024

Uh oh!

kevinrr888 Aug 14, 2024

Uh oh!

kevinrr888 Aug 14, 2024

Uh oh!

ddanielr Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

New admin check command structure implementation #4806

New admin check command structure implementation #4806

Uh oh!

Conversation

kevinrr888 commented Aug 13, 2024

Uh oh!

kevinrr888 commented Aug 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Some example outputs and a sketch of the current dependency tree as support:

Sample outputs:

Uh oh!

ddanielr Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

kevinrr888 Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

kevinrr888 Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

ddanielr Aug 14, 2024

Choose a reason for hiding this comment

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

New `admin check` command structure implementation #4806

New `admin check` command structure implementation #4806

kevinrr888 commented Aug 13, 2024 •

edited

Loading