@kevinrr888
Member

This PR:

  • Moves existing checks (checkTablets and the fate check for dangling locks) into the appropriate new admin check command
  • Adds new checks
  • Adds new tests in AdminCheckIT
  • SYSTEM_CONFIG now checks for
    • valid locked table/namespace ids (the locked table/namespaces exist)
    • locked table/namespaces are associated with a fate op
  • ROOT_METADATA now checks for
    • offline tablets
    • missing "columns"
    • invalid "columns"
  • ROOT_TABLE now checks for
    • offline tablets
    • tablets for the metadata table have no holes, the first tablet has a valid (null) prev end row, and the last tablet has a valid (null) end row
    • missing columns
    • invalid columns
  • METADATA_TABLE now checks for
    • offline tablets
    • tablets for user tables (and scanref) have no holes, the first tablet has a valid (null) prev end row, and the last tablet has a valid (null) end row
    • missing columns
    • invalid columns
  • SYSTEM_FILES now checks for
    • missing system files
  • USER_FILES now checks for
    • missing user files
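
The "no holes" check listed above for ROOT_TABLE and METADATA_TABLE can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `TabletHoleCheck`, `TabletRange`, and `findProblems` are hypothetical names, rows are simplified to strings, and the input is assumed to already be sorted by end row with the null (last) end row at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Accumulo code: validates that a table's tablets
// cover the whole row space without gaps.
class TabletHoleCheck {

  // A tablet covers the row range (prevEndRow, endRow].
  // A null prevEndRow means "from the start"; a null endRow means "to the end".
  static final class TabletRange {
    final String prevEndRow;
    final String endRow;

    TabletRange(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  // Returns a description of each problem found; an empty list means the
  // tablets form one contiguous range. Assumes the list is sorted by end row.
  static List<String> findProblems(List<TabletRange> tablets) {
    List<String> problems = new ArrayList<>();
    if (tablets.isEmpty()) {
      problems.add("no tablets found");
      return problems;
    }

    // The first tablet must start at the beginning of the row space.
    TabletRange first = tablets.get(0);
    if (first.prevEndRow != null) {
      problems.add("first tablet has non-null prev end row: " + first.prevEndRow);
    }

    // The last tablet must extend to the end of the row space.
    TabletRange last = tablets.get(tablets.size() - 1);
    if (last.endRow != null) {
      problems.add("last tablet has non-null end row: " + last.endRow);
    }

    // Each tablet must begin exactly where the previous one ended.
    for (int i = 1; i < tablets.size(); i++) {
      String expectedPrev = tablets.get(i - 1).endRow;
      String actualPrev = tablets.get(i).prevEndRow;
      if (expectedPrev == null || !expectedPrev.equals(actualPrev)) {
        problems.add("hole before tablet " + i + ": previous tablet ends at "
            + expectedPrev + " but prev end row is " + actualPrev);
      }
    }
    return problems;
  }
}
```

In this sketch, the tablets `(null, "g"]`, `("g", "m"]`, `("m", null]` produce no problems, while dropping the middle tablet produces a single hole report.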

There are still quite a few checks that need to be added (mentioned in #4687), and probably more beyond those. This is a first, starting PR for these checks; more checks will be added in follow-ons. Tests for the FAILING cases of these checks are also still to do.

Part of #4892

@kevinrr888 kevinrr888 self-assigned this Oct 8, 2024
@kevinrr888 kevinrr888 added this to the 3.1.0 milestone Oct 8, 2024
Contributor

@keith-turner keith-turner left a comment

@kevinrr888 these changes look good. I was running the changes locally and noticed that the accumulo check-server-config command exists at the top level. I was wondering if that should be rolled into this check command; if it should, it could be added to the checklist on #4892.

@kevinrr888
Member Author

> @kevinrr888 these changes look good. I was running the changes locally and noticed that the accumulo check-server-config command exists at the top level. I was wondering if that should be rolled into this check command; if it should, it could be added to the checklist on #4892.

Thanks, I had missed that. Added to #4892. I also noticed accumulo check-compaction-config, which I added to the checklist as well. Do you think this should be included too?

@kevinrr888
Member Author

Changes in 11f1f76:

  • `System.out` -> `log.trace/warn` to avoid flooding output with unnecessary/detailed info. The most important info (e.g., output of the `admin check list` command, and the final run status table from the `admin check run` command) is still printed to stdout. Problems found are now logged at warn instead of stdout. Detailed, non-error info is logged at trace.
  • Created new check `Check.TABLE_LOCKS` which ensures that table and namespace locks are valid and are associated with a FATE op. This depends on SYSTEM_CONFIG, which seemed like the most fitting location for it in the dependency tree.
  • New check `assertNoOtherChecksRan()` in `AdminCheckIT` which ensures only the expected checks ran
  • A few misc review changes: `MetadataCheckRunner` code improved to only fetch required columns when scanning, object creation moved outside of a loop
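
To illustrate what it means for one check to depend on another, here is a minimal, self-contained sketch. Only the names `SYSTEM_CONFIG` and `TABLE_LOCKS` come from this PR; the class `CheckDependencySketch`, the `Status` values, the `runWithDependencies` method, and the policy of skipping a check when a dependency fails are all assumptions made for illustration and may not match how `admin check run` actually behaves.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not Accumulo code: checks declared with their
// dependencies, and a runner that runs dependencies first.
class CheckDependencySketch {

  enum Check {
    SYSTEM_CONFIG,
    // Assumed from the PR text: TABLE_LOCKS depends on SYSTEM_CONFIG.
    TABLE_LOCKS(SYSTEM_CONFIG);

    private final List<Check> dependencies;

    Check(Check... dependencies) {
      this.dependencies = List.of(dependencies);
    }

    List<Check> dependencies() {
      return dependencies;
    }
  }

  // Hypothetical outcome values for this sketch.
  enum Status {
    OK, FAILED, SKIPPED_DEPENDENCY_FAILED
  }

  // Runs the dependencies of a check before the check itself, recording every
  // outcome in results. checkBody stands in for the real check logic and
  // returns true when the check passes.
  static Status runWithDependencies(Check check, Predicate<Check> checkBody,
      Map<Check, Status> results) {
    Status cached = results.get(check);
    if (cached != null) {
      return cached; // already ran as someone else's dependency
    }

    Status status = Status.OK;
    for (Check dependency : check.dependencies()) {
      if (runWithDependencies(dependency, checkBody, results) != Status.OK) {
        // Assumed policy: do not run a check whose dependency did not pass.
        status = Status.SKIPPED_DEPENDENCY_FAILED;
      }
    }

    if (status == Status.OK) {
      status = checkBody.test(check) ? Status.OK : Status.FAILED;
    }
    results.put(check, status);
    return status;
  }
}
```

With this shape, requesting `TABLE_LOCKS` also records a result for `SYSTEM_CONFIG`, and a failure in `SYSTEM_CONFIG` means the `TABLE_LOCKS` body is never invoked.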

Let me know how these changes look. Tests for failing checks are still TODO: tests should be added to ensure that everything that is supposed to be checked fails when it is expected to. I imagine this will take quite a bit of time, and the changes are already pretty large, so I'm thinking it might be best to leave that for a follow-on.

@kevinrr888 kevinrr888 merged commit c0979fd into apache:3.1 Dec 10, 2024
@kevinrr888 kevinrr888 deleted the 3.1-feature-4892 branch December 10, 2024 18:28
asfgit pushed a commit that referenced this pull request Dec 16, 2024
* Fix unused import introduced in #5127
* Remove dead code for `--fixFiles` in Admin checks left by #4957
* Fix unclosed resource warning introduced in #5148
@ctubbsii ctubbsii modified the milestones: 3.1.0, 4.0.0 Mar 13, 2025