HDDS-10963. Implement a headBlocks API on the Datanode #6774
Conversation
errose28
left a comment
Hi @xichen01, I just took a quick look at this and have some questions; maybe you could fill in the details.
- What is the overall goal?
I'm not sure I understand the overall feature being implemented here. Is the goal that for a given key in the Ozone manager, you can verify that all the blocks are present without actually reading the data? Please add more details to HDDS-10962 explaining the use and plan for this tool since this will provide context for changes related to it.
- There looks to be overlap with other features
I think there may be some overlap with container reconciliation here (HDDS-10239). In particular that change will require an API to pull the merkle/checksum tree of a container and its blocks. This is eventually consistent since it is computed by the scanner, but it avoids any locking or iteration by simply returning a pre-computed proto. See the abstract/blueprint proposed in the doc and HDDS-10376.
Additionally, ensuring that disk and DB content match is the job of the scanner. It runs on each volume of each datanode and is therefore a much more efficient way to catch and fix consistency issues than a request that a client must make for every container. The scanner also accounts for nuances like container state, block deletion that modifies disk and DB in multiple steps, and optimistic locking. I don't see handling for these in this change, and re-implementing them seems like duplicate work. At first glance, the large iterations done under a read lock in this implementation also look concerning; the scanner is designed to avoid exactly that.
- The "head request" terminology is kind of confusing in this context
Head-type requests usually return just the metadata about the objects being requested. In this case the proposed API is sort of inverted: if you do a "head" of a block, it tells you the block is not present, instead of telling you it is present and returning its metadata (see the sketch below).
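To make the naming point concrete, here is a minimal Java sketch of the two semantics. The interface, method names, and the BlockMetadata type are hypothetical, chosen only to illustrate the contrast; they are not the PR's actual signatures.

```java
import java.util.List;

// Hypothetical sketch, not the actual Ozone datanode API.
interface HeadSemanticsSketch {

  // Conventional HEAD semantics: return the block's metadata (length,
  // chunk count, ...) without transferring any block data.
  BlockMetadata headBlock(long containerId, long localBlockId);

  // Semantics described in this PR: check a batch of blocks and report
  // which ones are absent, i.e. the call answers "is it missing?"
  // rather than "what is its metadata?".
  List<Long> headBlocks(long containerId, List<Long> localBlockIds);
}

// Placeholder metadata type for the sketch.
class BlockMetadata {
  long length;
  int chunkCount;
}
```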
try {
  KeyValueContainerData containerData = (KeyValueContainerData) container.getContainerData();
  if (!containerData.getLayoutVersion().equals(ContainerLayoutVersion.FILE_PER_BLOCK)) {
    throw new UnsupportedEncodingException("Not support Container Layout " + containerData.getLayoutVersion());
I'm not sure this is the correct exception type to use. The Javadoc says UnsupportedEncodingException is for character encodings. We probably want to throw some type of StorageContainerException instead.
Corrected to UnsupportedOperationException.
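For reference, a hedged sketch of how the StorageContainerException suggestion could look. The use of ContainerProtos.Result.UNSUPPORTED_REQUEST as the result code is an assumption, and the snippet is a fragment meant to mirror the quoted diff rather than a complete method.

```java
// Assumed imports for this fragment:
// import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
// import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;

// Reject containers that are not FILE_PER_BLOCK with a StorageContainerException
// carrying a result code, instead of UnsupportedEncodingException.
if (!containerData.getLayoutVersion().equals(ContainerLayoutVersion.FILE_PER_BLOCK)) {
  throw new StorageContainerException(
      "Unsupported container layout: " + containerData.getLayoutVersion(),
      ContainerProtos.Result.UNSUPPORTED_REQUEST);
}
```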
@errose28 Thanks for your detailed response.

What is the overall goal?
The goal is to eventually implement a command-line tool that can quickly find missing keys.

What is the context?
There are two scenarios that require this tool. We can't guarantee that there won't be other causes of data loss in the future, so we need a tool that can quickly scan the entire cluster and confirm that no keys are missing.

There looks to be overlap with other features
"Find missing keys", "container reconciliation", and the "Datanode scanner" have different purposes.

The scanner also accounts for nuances like container state, block deletion that modifies disk and DB in multiple steps
I think "find missing keys" doesn't need to take things like container state into account, because its purpose is to find keys whose block replicas are completely missing; as long as any replica of a key exists, the key is not missing from the cluster and can still be recovered in some way.

Does "find missing keys" require reading and checking the data?
No. "Find missing keys" only verifies existence, not correctness; correctness is covered by the Datanode scanner and checksums.

The "head request" terminology is kind of confusing in this context
Makes sense, I think we can change the name of the API; maybe we can rename it.
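To illustrate the existence-only check described above, here is a small self-contained Java sketch. The file-naming convention (`<chunksDir>/<localID>.block`) and the class/method names are assumptions for illustration only, not the actual FILE_PER_BLOCK layout code in Ozone.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "find missing blocks" check: it only tests
// whether each block replica file exists and never reads block data.
public final class MissingBlockCheck {

  static List<Long> findMissingBlocks(String chunksDir, List<Long> localIds) {
    List<Long> missing = new ArrayList<>();
    for (long id : localIds) {
      // Assumed naming convention for FILE_PER_BLOCK replicas.
      Path blockFile = Paths.get(chunksDir, id + ".block");
      if (!Files.exists(blockFile)) {
        missing.add(id);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Example usage with made-up values.
    List<Long> ids = List.of(1L, 2L, 3L);
    System.out.println(findMissingBlocks("/data/hdds/.../chunks", ids));
  }
}
```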
@xichen01 Based on comments here and in Jira, I think a design doc is needed, which the community can discuss: https://ozone.apache.org/docs/edge/design/ozone-enhancement-proposals.html
@adoroszlai @errose28 I have raised a design doc for this PR in #7548.
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
What changes were proposed in this pull request?
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10963
How was this patch tested?
Unit tests.