New `admin check` command structure implementation #4807

kevinrr888 · 2024-08-14T17:59:03Z

This is part of #4687
Changes:

Added new admin check command
admin check list prints the checks that can be run along with their descriptions and dependencies
admin check run runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped
Current impl of admin check run does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an OK status
New IT AdminCheckIT
- This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be
  verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented.
No existing functionality was changed

Questions:

Are new classes and packages in suitable locations?
Is the way I went about implementing the dummy checks in AdminCheckIT okay? Mostly concerned if changing the real static CHECK_RUNNERS field is okay (changes it during the test but asserts that it is reset after the test completes).

(Part of apache#4687) - Added new `admin check` command - `admin check list` prints the checks that can be run along with their descriptions and dependencies - `admin check run` runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped - Current impl of `admin check run` does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an `OK` status - New IT AdminCheckIT - This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented. - No existing functionality was changed

kevinrr888 · 2024-08-14T17:59:29Z

Some example outputs and a sketch of the current dependency tree as support:

Dependency tree sketch:

        SYSTEM_CONFIG
              |
              V
        ROOT_METADATA
              |
              V
          ROOT_TABLE
         /          \
   METADATA_TABLE  SYSTEM_FILES
         |
         V
     USER_FILES

Sample outputs:

$ accumulo admin check list

Check Name           | Description                                                                      | Depends on          
------------------------------------------------------------------------------------------------------------------------
SYSTEM_CONFIG        | Validate the system config stored in ZooKeeper                                   |                     
ROOT_METADATA        | Checks integrity of the root tablet metadata stored in ZooKeeper                 | SYSTEM_CONFIG       
ROOT_TABLE           | Scans all the tablet metadata stored in the root table and checks integrity      | ROOT_METADATA       
METADATA_TABLE       | Scans all the tablet metadata stored in the metadata table and checks integrity  | ROOT_TABLE          
SYSTEM_FILES         | Checks that files in system tablet metadata exist in DFS                         | ROOT_TABLE          
USER_FILES           | Checks that files in user tablet metadata exist in DFS                           | METADATA_TABLE      
------------------------------------------------------------------------------------------------------------------------

$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK
Running check SYSTEM_FILES
Check SYSTEM_FILES completed with status OK
Running check USER_FILES
Check USER_FILES completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
SYSTEM_FILES         | OK                  
USER_FILES           | OK                  
--------------------------------------------------

$ accumulo admin check run -p ".*table"

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
--------------------------------------------------

In this example, I made ROOT_TABLE fail when it was run and attempted to run all the checks

$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status FAILED

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | FAILED              
METADATA_TABLE       | SKIPPED_DEPENDENCY_FAILED
SYSTEM_FILES         | SKIPPED_DEPENDENCY_FAILED
USER_FILES           | SKIPPED_DEPENDENCY_FAILED
--------------------------------------------------

kevinrr888 · 2024-08-14T18:02:49Z

This was created as a replacement for #4806 which targeted elasticity. This PR targets main

kevinrr888 · 2024-08-14T18:04:17Z

Relevant review comment from old PR:
#4806 (comment)

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java

kevinrr888 · 2024-08-22T16:36:55Z

@keith-turner question:
Which of these would be the correct way to handle something like this:

$ accumulo admin check run SYSTEM_CONFIG USER_FILES
when SYSTEM_CONFIG fails

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | FAILED                  
ROOT_METADATA        | FILTERED_OUT        
ROOT_TABLE           | FILTERED_OUT        
METADATA_TABLE       | FILTERED_OUT        
SYSTEM_FILES         | FILTERED_OUT        
USER_FILES           | OK                  
--------------------------------------------------

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | FAILED                  
ROOT_METADATA        | FILTERED_OUT        
ROOT_TABLE           | FILTERED_OUT        
METADATA_TABLE       | FILTERED_OUT        
SYSTEM_FILES         | FILTERED_OUT        
USER_FILES           | SKIPPED_DEPENDENCY_FAILED                  
--------------------------------------------------

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | FAILED                  
ROOT_METADATA        | SKIPPED_DEPENDENCY_FAILED         
ROOT_TABLE           | SKIPPED_DEPENDENCY_FAILED         
METADATA_TABLE       | SKIPPED_DEPENDENCY_FAILED         
SYSTEM_FILES         | SKIPPED_DEPENDENCY_FAILED         
USER_FILES           | SKIPPED_DEPENDENCY_FAILED                  
--------------------------------------------------

keith-turner · 2024-08-22T19:29:12Z

@kevinrr888 they all seems like good ways to handle the situation. In the examples you posted it seems to come down to

Do we only look at the status of direct dependencies or do we analyze transitive dependencies?
When a check is filtered out and its dependency failed, what should it status be?

Personally I would go with the option that is the most straightforward to implement an maintain. Option 1 or 3 seem like they are the most straightforward to implement because they do not do transitive analysis. Seems like option 3 chooses SKIPPED_DEPENDENCY_FAILED when its also filtered out. I lean towards option 3.

- Added FILTERED_OUT status for checks that are not specified to run - When checks are specified, no longer run their dependencies first - Switched from static maps for the fields (CHECK_DESCRIPTION, CHECK_DEPENDENCIES, CHECK_RUNNERS) to instead be declared in the enum constructor. - This also meant refactoring AdminCheckIT

kevinrr888 · 2024-08-27T20:42:29Z

Addressed review comments:

Added FILTERED_OUT status for checks that are not specified to run
When checks are specified, no longer run their dependencies first
Switched from static maps for the fields (CHECK_DESCRIPTION, CHECK_DEPENDENCIES, CHECK_RUNNERS) to instead be declared in the enum constructor.
- This also meant refactoring AdminCheckIT

AdminCheckIT does not have many tests yet to avoid too much refactoring if further structural changes are needed. Will add more after implementation is approved.
Re #4807 (comment): I went with option 3 in b9f6d5d

keith-turner

This foundation looks really good. I think we can merge this as long as we have blocker issues about creating some of the actual checks.

test/src/main/java/org/apache/accumulo/test/AdminCheckIT.java

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java

kevinrr888 · 2024-08-29T19:44:24Z

Once this is merged, I can create the follow on issue for adding the functionality for these checks

kevinrr888 added this to the 3.1.0 milestone Aug 14, 2024

kevinrr888 mentioned this pull request Aug 14, 2024

New admin check command structure implementation #4806

Closed

kevinrr888 self-assigned this Aug 14, 2024

ctubbsii linked an issue Aug 15, 2024 that may be closed by this pull request

Create an admin command that performs a comprehensive check for problems #4687

Closed

keith-turner reviewed Aug 19, 2024

View reviewed changes

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java Outdated Show resolved Hide resolved

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java Outdated Show resolved Hide resolved

keith-turner reviewed Aug 19, 2024

View reviewed changes

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java Outdated Show resolved Hide resolved

dlmarion changed the base branch from main to 3.1 August 26, 2024 11:59

formatting (removed unused import)

cb2e8c8

keith-turner approved these changes Aug 29, 2024

View reviewed changes

test/src/main/java/org/apache/accumulo/test/AdminCheckIT.java Show resolved Hide resolved

server/base/src/main/java/org/apache/accumulo/server/util/Admin.java Outdated Show resolved Hide resolved

more tests, changed the regex param from boolean to string

3bd6ce4

ctubbsii mentioned this pull request Sep 4, 2024

Run FindCompactionTmpFiles utility via accumulo Admin #4465

Open

keith-turner approved these changes Sep 14, 2024

View reviewed changes

kevinrr888 merged commit 33cff13 into apache:3.1 Sep 16, 2024

This was referenced Sep 16, 2024

admin check run command functionality #4892

Open

Create an admin command that performs a comprehensive check for problems #4687

Closed

kevinrr888 deleted the feature-4687 branch November 1, 2024 15:02

kevinrr888 mentioned this pull request Dec 10, 2024

AdminCheckIT needs to test failure cases #5163

Closed

ctubbsii modified the milestones: 3.1.0, 4.0.0 Mar 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New `admin check` command structure implementation #4807

New `admin check` command structure implementation #4807

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevinrr888 commented Aug 22, 2024

Uh oh!

keith-turner commented Aug 22, 2024

Uh oh!

kevinrr888 commented Aug 27, 2024

Uh oh!

keith-turner left a comment

Uh oh!

Uh oh!

Uh oh!

kevinrr888 commented Aug 29, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

New admin check command structure implementation #4807

New admin check command structure implementation #4807

Uh oh!

Conversation

kevinrr888 commented Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Some example outputs and a sketch of the current dependency tree as support:

Sample outputs:

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

kevinrr888 commented Aug 14, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevinrr888 commented Aug 22, 2024

Uh oh!

keith-turner commented Aug 22, 2024

Uh oh!

kevinrr888 commented Aug 27, 2024

Uh oh!

keith-turner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kevinrr888 commented Aug 29, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

New `admin check` command structure implementation #4807

New `admin check` command structure implementation #4807