Skip to content

Conversation

@Tejaskriya
Copy link
Contributor

@Tejaskriya Tejaskriya commented May 9, 2025

What changes were proposed in this pull request?

Add a verifier that checks the container state for all replicas. This can detect keys which are mapped to a container in a few bad states:
Container in [DELETED, DELETING] states
Replicas in [DELETED, UNHEALTHY, INVALID] states

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12595

How was this patch tested?

Manual testing, added to existing robot test and integration test.

bash-4.2$ ozone debug replicas verify --container-state cli-debug-volume/cli-debug-bucket -o ./temp1.json
{
  "keys" : [ ],
  "pass" : true
}

bash-4.2$ ozone debug replicas verify --container-state cli-debug-volume/cli-debug-bucket -o ./temp2.json --all-results
{
  "keys" : [ {
    "volumeName" : "cli-debug-volume",
    "bucketName" : "cli-debug-bucket",
    "name" : "testfile",
    "blocks" : [ {
      "containerID" : 1,
      "blockID" : 115816896921600001,
      "replicas" : [ {
        "datanode" : {
          "uuid" : "26b55e91-6d75-40d5-a22f-a0ebf54aaab2",
          "hostname" : "ozone-datanode-5.ozone_default"
        },
        "checks" : [ {
          "type" : "replicaState",
          "completed" : true,
          "pass" : true,
          "failures" : [ ]
        } ],
        "replicaIndex" : 0
      } ]
    }, {
      "containerID" : 2,
      "blockID" : 115816896921600002,
      "replicas" : [ {
        "datanode" : {
          "uuid" : "a0a033f9-b4f4-436a-a927-58f0167d7489",
          "hostname" : "ozone-datanode-3.ozone_default"
        },
        "checks" : [ {
          "type" : "replicaState",
          "completed" : true,
          "pass" : true,
          "failures" : [ ]
        } ],
        "replicaIndex" : 0
      } ]
    } ],
    "pass" : true
  } ],
  "pass" : true
}

In case of failure:

bash-4.2$ ozone debug replicas verify --container-state vol1/bb1/k1 -o ~/
{
  "keys" : [ {
    "volumeName" : "vol1",
    "bucketName" : "bb1",
    "name" : "k1",
    "blocks" : [ {
      "containerID" : 1,
      "blockID" : 115816896921600001,
      "replicas" : [ {
        "datanode" : {
          "uuid" : "29957c01-6c43-4918-92cd-f3e21176c058",
          "hostname" : "ozonesecure-ha-datanode2-1.ozonesecure-ha_ozone_net"
        },
        "checks" : [ {
          "type" : "replicaState",
          "completed" : true,
          "pass" : true,
          "failures" : [ ]
        } ],
        "replicaIndex" : 0
      }, {
        "datanode" : {
          "uuid" : "aadb8d06-fb64-4c35-a0e4-c90b30b3edfd",
          "hostname" : "ozonesecure-ha-datanode3-1.ozonesecure-ha_ozone_net"
        },
        "checks" : [ {
          "type" : "replicaState",
          "completed" : true,
          "pass" : true,
          "failures" : [ ]
        } ],
        "replicaIndex" : 0
      }, {
        "datanode" : {
          "uuid" : "6de2703f-770b-4ab2-add7-40921d997efb",
          "hostname" : "ozonesecure-ha-datanode1-1.ozonesecure-ha_ozone_net"
        },
        "checks" : [ {
          "type" : "replicaState",
          "completed" : true,
          "pass" : false,
          "failures" : [ {
            "message" : "ContainerID 1 does not exist"
          } ]
        } ],
        "replicaIndex" : 0
      } ]
    } ],
    "pass" : false
  } ],
  "pass" : false
}

@Tejaskriya Tejaskriya changed the title Hdds 12595 container verifier HDDS-12595. Add verifier for container replica states May 9, 2025
@ivandika3 ivandika3 added the tools Tools that helps with debugging label May 10, 2025
@Tejaskriya Tejaskriya marked this pull request as ready for review May 12, 2025 06:53
@Tejaskriya Tejaskriya requested a review from dombizita May 12, 2025 06:54
@Tejaskriya
Copy link
Contributor Author

@sarvekshayr Requesting for a review!

Copy link
Contributor

@sarvekshayr sarvekshayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Tejaskriya for working on this!

To align better with the check type and the corresponding verifier class, it might be clearer to name the flag --replica-state instead of --container-state or vice versa. What do you think?

Also, the pass field doesn't clearly indicate what the check is validating or what a failure represents. It would be helpful to explain in the description as to what qualifies as a "pass" for this check, so the results are easier to interpret.

Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the command output, for the replicas that aren't in bad health, can we display the container state as well, like,
"containerState": "OPEN"

@adoroszlai adoroszlai marked this pull request as draft May 13, 2025 15:53
@adoroszlai
Copy link
Contributor

Please check acceptance test failure.

@Tejaskriya Tejaskriya marked this pull request as ready for review May 14, 2025 04:16
@Tejaskriya
Copy link
Contributor Author

@aryangupta1998 we can do that, but as this command is meant to operate on large number of keys, the output will become more bulky with the addition of 1 more field. Also as the aim is to get only the bad replicas and not to fetch more info on all replicas, I am not sure if this aligns with the aim

cc @dombizita what do you suggest?

Copy link
Contributor

@sarvekshayr sarvekshayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the patch. Overall changes LGTM.

Could you also include the failure output in the description?

Copy link
Contributor

@dombizita dombizita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @Tejaskriya! Overall the code changes are looking good to me. Two things that were already brought up before:

  • Output: I think it's better to keep it simple and only show the containers that are [DELETED, UNHEALTHY, INVALID]. It's great that you updated the CLI description, I think it's straightforward this way.
  • Name of the check/flag: I understand the reason why you changed it to --replica-state from --container-state, but maybe I'm more in favour with the container state. The main command looks like this ozone debug replicas verify, I think after this --replica-state loses the information that this verifier will check on the container states. Also we may rename the check type to containerState too. I think it won't be misleading, as each block replica will have their own section in the json. Please let me know what do you think of this!

One idea: what if we go with "container-availability" or "container-health"? That kind of describes what does this verifier actually checks for and maybe the "failures" part fits better too. I'm still not sure, naming things are always hard :)

Could you please add an example in the PR description where the keys are ratis 3 replicated and there is an error somewhere? Thank you!

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @Tejaskriya. With regards to the above discussion, IMO --container-state and a class called ContainerStateVerifier makes the most sense, since this is also checking SCM state which does not pertain to the replicas.


client = xceiverClientManager.acquireClientForReadData(pipeline);
response = ContainerProtocolCalls
.readContainer(client, containerId, encodedToken);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the container replica is not found (for example, the container was deleted), I believe we will get a StorageContainerException with result code CONTAINER_NOT_FOUND. We need to display this in the json and count it as a check failure instead of an incomplete check due to an exception. For now can you manually test this case? We will create fault injection tests for these commands once HDDS-12715/#8209 is complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we do not get the result code in the exception at the client side. We are getting this: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 1 does not exist
So I am checking the exception message for "ContainerID" and "does not exist", and marking this as failed instead of incomplete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a sample in the PR description for this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we file a Jira for this and add a comment to this part of the code linking to the Jira? Ideally we should depend on protocol aspects like result codes instead of error messages for determining what went wrong, but I'm ok leaving this workaround in if we have documented it as an area that needs to be fixed.

@Tejaskriya Tejaskriya force-pushed the HDDS-12595_containerVerifier branch from 20627a0 to b3694d2 Compare May 19, 2025 14:42
@Tejaskriya
Copy link
Contributor Author

The cache with 1 million entries is taking about 43MB (tested how much memory the cache is occupying in a full cache scenario)
@errose28 I think this should be okay then?

@Tejaskriya Tejaskriya marked this pull request as draft May 19, 2025 17:58
@errose28
Copy link
Contributor

Awesome, staying in the 10s of MBs for a one-off client command like this is good. 1 million containers should allow us to read 15PB of data before any cache evictions best case (1 million containers * 5GB per container * 3x replication of each container).

Should we make this cache size configurable? We could add a --container-cache-size flag, but we would need to print a warning that it will be ignored if it is used without the --container-state check.

@Tejaskriya Tejaskriya force-pushed the HDDS-12595_containerVerifier branch from 4386521 to 2789207 Compare May 29, 2025 04:47
@Tejaskriya Tejaskriya marked this pull request as ready for review May 29, 2025 10:04
@Tejaskriya Tejaskriya requested review from dombizita and errose28 May 29, 2025 10:04
Copy link
Contributor

@dombizita dombizita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the patch @Tejaskriya!

@Tejaskriya
Copy link
Contributor Author

Thanks @dombizita
@aryangupta1998, would you like to take another look?

Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Tejaskriya for updating the patch!
Overall LGTM, some minor comments inline!

@Tejaskriya
Copy link
Contributor Author

Thanks for the review @aryangupta1998 , I have addressed all of the comments.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments after a quick read through. With the other +1s I think we're good to go.

@Tejaskriya Tejaskriya requested a review from aryangupta1998 June 3, 2025 06:17
Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @Tejaskriya, LGTM!

@Tejaskriya Tejaskriya merged commit 3c25e7d into apache:master Jun 3, 2025
42 checks passed
@Tejaskriya
Copy link
Contributor Author

Thanks for the reviews @aryangupta1998 @errose28 @dombizita @sarvekshayr !

hevinhsu pushed a commit to hevinhsu/ozone that referenced this pull request Jun 3, 2025
aswinshakil added a commit to aswinshakil/ozone that referenced this pull request Jun 9, 2025
…239-container-reconciliation

Commits: 80 commits
5e273a4 HDDS-12977. Fail build on dependency problems (apache#8574)
5081ba2 HDDS-13034. Refactor DirectoryDeletingService to use ReclaimableDirFilter and ReclaimableKeyFilter (apache#8546)
e936e4d HDDS-12134. Implement Snapshot Cache lock for OM Bootstrap (apache#8474)
31d13de HDDS-13165. [Docs] Python client developer guide. (apache#8556)
9e6955e HDDS-13205. Bump common-custom-user-data-maven-extension to 2.0.3 (apache#8581)
750b629 HDDS-13203. Bump Bouncy Castle to 1.81 (apache#8580)
ba5177e HDDS-13202. Bump build-helper-maven-plugin to 3.6.1 (apache#8579)
07ee5dd HDDS-13204. Bump awssdk to 2.31.59 (apache#8582)
e1964f2 HDDS-13201. Bump jersey2 to 2.47 (apache#8578)
81295a5 HDDS-13013. [Snapshot] Add metrics and tests for snapshot operations. (apache#8436)
b3d75ab HDDS-12976. Clean up unused dependencies (apache#8521)
e0f08b2 HDDS-13179. rename-generated-config fails on re-compile without clean (apache#8569)
f388317 HDDS-12554. Support callback on completed reconfiguration (apache#8391)
c13a3fe HDDS-13154 Link more Grafana dashboard json files to the Observability user doc (apache#8533)
2a761f7 HDDS-11967. [Docs]DistCP Integration in Kerberized environment. (apache#8531)
81fc4c4 HDDS-12550. Use DatanodeID instead of UUID in NodeManager CommandQueue. (apache#8560)
2360af4 HDDS-13169. Intermittent failure in testSnapshotOperationsNotBlockedDuringCompaction (apache#8553)
f19789d HDDS-13170. Reclaimable filter should always reclaim entries when buckets and volumes have already been deleted (apache#8551)
315ef20 HDDS-13175. Leftover reference to OM-specific trash implementation (apache#8563)
902e715 HDDS-13159. Refactor KeyManagerImpl for getting deleted subdirectories and deleted subFiles (apache#8538)
46a93d0 HDDS-12817. Addendum rename ecIndex to replicaIndex in chunkinfo output (apache#8552)
19b9b9c HDDS-13166. Set pipeline ID in BlockExistenceVerifier to avoid cached pipeline with different node (apache#8549)
b3ff67c HDDS-13068. Validate Container Balancer move timeout and replication timeout configs (apache#8490)
7a7b9a8 HDDS-13139. Introduce bucket layout flag in freon rk command (apache#8539)
3c25e7d HDDS-12595. Add verifier for container replica states (apache#8422)
6d59220 HDDS-13104. Move auditparser acceptance test under debug (apache#8527)
8e8c432 HDDS-13071. Documentation for Container Replica Debugger Tool (apache#8485)
0e8c8d4 HDDS-13158. Bump junit to 5.13.0 (apache#8537)
8e552b4 HDDS-13157. Bump exec-maven-plugin to 3.5.1 (apache#8534)
168f690 HDDS-13155. Bump jline to 3.30.4 (apache#8535)
cc1e4d1 HDDS-13156. Bump awssdk to 2.31.54 (apache#8536)
3bfb7af HDDS-13136. KeyDeleting Service should not run for already deep cleaned snapshots (apache#8525)
006e691 HDDS-12503. Compact snapshot DB before evicting a snapshot out of cache (apache#8141)
568b228 HDDS-13067. Container Balancer delete commands should not be sent with an expiration time in the past (apache#8491)
53673c5 HDDS-11244. OmPurgeDirectoriesRequest should clean up File and Directory tables of AOS for deleted snapshot directories (apache#8509)
07f4868 HDDS-13099. ozone admin datanode list ignores --json flag when --id filter is used (apache#8500)
08c0ab8 HDDS-13075. Fix default value in description of container placement policy configs (apache#8511)
58c87a8 HDDS-12177. Set runtime scope where missing (apache#8513)
10c470d HDDS-12817. Add EC block index in the ozone debug replicas chunk-info (apache#8515)
7027ab7 HDDS-13124. Respect config hdds.datanode.use.datanode.hostname when reading from datanode (apache#8518)
b8b226c HDDS-12928. datanode min free space configuration (apache#8388)
fd3d70c HDDS-13026. KeyDeletingService should also delete RenameEntries (apache#8447)
4c1c6cf HDDS-12714. Create acceptance test framework for debug and repair tools (apache#8510)
fff80fc HDDS-13118. Remove duplicate mockito-core dependency from hdds-test-utils (apache#8508)
10d5555 HDDS-13115. Bump awssdk to 2.31.50 (apache#8505)
360d139 HDDS-13017. Fix warnings due to non-test scoped test dependencies (apache#8479)
1db1cca HDDS-13116. Bump jline to 3.30.3 (apache#8504)
322ca93 HDDS-13025. Refactor KeyDeletingService to use ReclaimableKeyFilter (apache#8450)
988b447 HDDS-5287. Document S3 ACL classes (apache#8501)
64bb29d HDDS-12777. Use module-specific name for generated config files (apache#8475)
54ed115 HDDS-9210. Update snapshot chain restore test to incorporate snapshot delete. (apache#8484)
87dfa5a HDDS-13014. Improve PrometheusMetricsSink#normalizeName performance (apache#8438)
7cdc865 HDDS-13100. ozone admin datanode list --json should output a newline at the end (apache#8499)
9cc4194 HDDS-13089. [snapshot] Add an integration test to verify snapshotted data can be read by S3 SDK client (apache#8495)
cb9867b HDDS-13065. Refactor SnapshotCache to return AutoCloseSupplier instead of ReferenceCounted (apache#8473)
a88ff71 HDDS-10979. Support STANDARD_IA S3 storage class to accept EC replication config (apache#8399)
6ec8f85 HDDS-13080. Improve delete metrics to show number of timeout DN command from SCM (apache#8497)
3bb8858 HDDS-12378. Change default hdds.scm.safemode.min.datanode to 3 (apache#8331)
0171bef HDDS-13073. Set pipeline ID in checksums verifier to avoid cached pipeline with different node (apache#8480)
5c7726a HDDS-11539. OzoneClientCache `@PreDestroy` is never called (apache#8493)
a8ed19b HDDS-13031. Implement a Flat Lock resource in OzoneManagerLock (apache#8446)
e9e8b30 HDDS-12935. Support unsigned chunked upload with STREAMING-UNSIGNED-PAYLOAD-TRAILER (apache#8366)
7590268 HDDS-13079. Improve logging in DN for delete operation. (apache#8489)
435fe7e HDDS-12870. Fix listObjects corner cases (apache#8307)
eb5dabd HDDS-12926. Remove *.tmp.* exclusion in DU (apache#8486)
eeb98c7 HDDS-13030. Snapshot Purge should unset deep cleaning flag for next 2 snapshots in the chain (apache#8451)
6bf121e HDDS-13032. Support proper S3OwnerId representation (apache#8478)
5d1b43d HDDS-13076. Refactor OzoneManagerLock class to rename Resource class to LeveledResource (apache#8482)
bafe6d9 HDDS-13064. [snapshot] Add test coverage for SnapshotUtils.isBlockLocationInfoSame() (apache#8476)
7035846 HDDS-13040. Add user doc highlighting the difference between Ozone ACL and S3 ACL. (apache#8457)
1825cdf HDDS-13049. Deprecate VolumeName & BucketName in OmKeyPurgeRequest and prevent Key version purge on Block Deletion Failure (apache#8463)
211c76c HDDS-13060. Change NodeManager.addDatanodeCommand(..) to use DatanodeID (apache#8471)
f410238 HDDS-13061. Add test for key ACL operations without permission (apache#8472)
d1a2f48 HDDS-13057. Increment block delete processed transaction counts regardless of log level (apache#8466)
0cc6fcc HDDS-13043. Replace != with assertNotEquals in TestSCMContainerPlacementRackAware (apache#8470)
e1c779a HDDS-13051. Use DatanodeID in server-scm. (apache#8465)
35e1126 HDDS-13042. [snapshot] Add future proofing test cases for unsupported file system API (apache#8458)
619c05d HDDS-13008. Exclude same SST files when calculating full snapdiff (apache#8423)
21b49d3 HDDS-12965. Fix warnings about "used undeclared" dependencies (apache#8468)
8136119 HDDS-13048. Create new module for Recon integration tests (apache#8464)

Conflicts:
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeManager.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tools Tools that helps with debugging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants