Skip to content

Conversation

@kevinrr888
Copy link
Member

This is part of #4687
Changes:

  • Added new admin check command
  • admin check list prints the checks that can be run along with their descriptions and dependencies
  • admin check run runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped
  • Current impl of admin check run does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an OK status
  • New IT AdminCheckIT
    • This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be
      verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented.
  • No existing functionality was changed

Questions:

  1. Are new classes and packages in suitable locations?
  2. Is the way I went about implementing the dummy checks in AdminCheckIT okay? Mostly concerned if changing the real static CHECK_RUNNERS field is okay (changes it during the test but asserts that it is reset after the test completes).

(Part of apache#4687)
- Added new `admin check` command
- `admin check list` prints the checks that can be run along with their descriptions and dependencies
- `admin check run` runs all the checks or a specified list of checks (explicitly provide each check, provide a regex of the checks to run, or just run all if no further args are provided). The dependencies are run first and if a checks dependency fails, the check is skipped
- Current impl of `admin check run` does not do any actual work besides printing that the check is running (to ensure correct run order) and returning an `OK` status
- New IT AdminCheckIT
	- This IT includes a verion of the Checks where no work is actually done (right now, this is equivalent to the actual implementation). This is done so correct run order, correct checks run, etc. can be
	verified without actually running the checks which may take a long time. More tests can be added later to test the actual check functionality when that is implemented.
- No existing functionality was changed
@kevinrr888
Copy link
Member Author

kevinrr888 commented Aug 13, 2024

Some example outputs and a sketch of the current dependency tree as support:

Dependency tree sketch:

        SYSTEM_CONFIG
              |
              V
        ROOT_METADATA
              |
              V
          ROOT_TABLE
         /          \
   METADATA_TABLE  SYSTEM_FILES
         |
         V
     USER_FILES

Sample outputs:

$ accumulo admin check list

Check Name           | Description                                                                      | Depends on          
------------------------------------------------------------------------------------------------------------------------
SYSTEM_CONFIG        | Validate the system config stored in ZooKeeper                                   |                     
ROOT_METADATA        | Checks integrity of the root tablet metadata stored in ZooKeeper                 | SYSTEM_CONFIG       
ROOT_TABLE           | Scans all the tablet metadata stored in the root table and checks integrity      | ROOT_METADATA       
METADATA_TABLE       | Scans all the tablet metadata stored in the metadata table and checks integrity  | ROOT_TABLE          
SYSTEM_FILES         | Checks that files in system tablet metadata exist in DFS                         | ROOT_TABLE          
USER_FILES           | Checks that files in user tablet metadata exist in DFS                           | METADATA_TABLE      
------------------------------------------------------------------------------------------------------------------------
$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK
Running check SYSTEM_FILES
Check SYSTEM_FILES completed with status OK
Running check USER_FILES
Check USER_FILES completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
SYSTEM_FILES         | OK                  
USER_FILES           | OK                  
--------------------------------------------------
$ accumulo admin check run -p ".*table"

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status OK
Running check METADATA_TABLE
Check METADATA_TABLE completed with status OK

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | OK                  
METADATA_TABLE       | OK                  
--------------------------------------------------

In this example, I made ROOT_TABLE fail when it was run and attempted to run all the checks

$ accumulo admin check run

Running check SYSTEM_CONFIG
Check SYSTEM_CONFIG completed with status OK
Running check ROOT_METADATA
Check ROOT_METADATA completed with status OK
Running check ROOT_TABLE
Check ROOT_TABLE completed with status FAILED

Check Name           | Status              
--------------------------------------------------
SYSTEM_CONFIG        | OK                  
ROOT_METADATA        | OK                  
ROOT_TABLE           | FAILED              
METADATA_TABLE       | SKIPPED_DEPENDENCY_FAILED
SYSTEM_FILES         | SKIPPED_DEPENDENCY_FAILED
USER_FILES           | SKIPPED_DEPENDENCY_FAILED
--------------------------------------------------

@kevinrr888 kevinrr888 self-assigned this Aug 13, 2024
@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Aug 13, 2024
}
}

private static void listChecks() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you didn't want to use a logger here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The admin commands print using System.out which is what I stuck with. I'm not really sure how using a logger here would work... It wouldn't print to the console when something like accumulo admin check run is executed from the shell as far as I'm aware. It may also be harder to format nicely using a logger. I'm not sure what would be gained from using a logger. If we want time to be included, we can just manually include that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the output to go to a file, we can just redirect the output there.


private static void
printChecksResults(TreeMap<CheckCommand.Check,CheckCommand.CheckStatus> checkStatus) {
System.out.println();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same logger question here

@kevinrr888
Copy link
Member Author

Closing this to be replaced by #4807.
As suggested by @ctubbsii, making these changes target main

@kevinrr888 kevinrr888 closed this Aug 14, 2024
@ctubbsii ctubbsii removed this from the 4.0.0 milestone Aug 15, 2024
@kevinrr888 kevinrr888 deleted the elasticity-feature-4687 branch November 1, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Create an admin command that performs a comprehensive check for problems

3 participants