
Conversation

@aswinshakil
Member

@aswinshakil aswinshakil commented Sep 15, 2023

What changes were proposed in this pull request?

This change adds an exclusive size for each snapshot. PR #5175 addresses the Reference Size calculation for each snapshot. I have linked a PDF in the JIRA explaining the design and have summarized it below for future reference. A snapshot has four sizes:

  • Reference Size - Explained in HDDS-9159. [Snapshot] Implement snapshot disk usage: Referenced size #5175
  • Total Size - The sum of the shared size and exclusive size of the snapshot. This is simply the bucket size at the time the snapshot was taken.
  • Shared Size - Data shared between snapshots. This size cannot be attributed to a single snapshot, since the data is shared between two or more snapshots.
  • Exclusive Size - Data held only by that snapshot.

Calculations:

Total Size

We can get the Total Size from omBucketInfo.getUsedBytes() in the snapshot. This gives the total size used by the snapshot.

Note: This gives the size of the bucket after replication.

Shared Size

Once we have the Total Size and Exclusive Size, we can calculate the Shared Size:
Shared Size = Total Size - Exclusive Size

Exclusive Size

To calculate the Exclusive Size for the current snapshot, check the next snapshot's deletedTable: if a deleted key is referenced in the current snapshot but not referenced in the previous snapshot, then that key is exclusive to the current snapshot.
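The check above can be sketched in Java. This is an illustrative toy, not the actual Ozone implementation: the class, method name, and the map/set inputs standing in for the deletedTable scan are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only (not Ozone code): a key found in the NEXT
// snapshot's deletedTable is exclusive to the CURRENT snapshot when the
// current snapshot references it but the previous snapshot does not.
public class ExclusiveSizeSketch {
  static long exclusiveSize(Map<String, Long> nextSnapshotDeletedTable,
                            Set<String> currentSnapshotKeys,
                            Set<String> previousSnapshotKeys) {
    long exclusive = 0;
    for (Map.Entry<String, Long> e : nextSnapshotDeletedTable.entrySet()) {
      if (currentSnapshotKeys.contains(e.getKey())
          && !previousSnapshotKeys.contains(e.getKey())) {
        exclusive += e.getValue();  // attributed to this snapshot alone
      }
    }
    return exclusive;
  }
}
```

With the Total Size taken from omBucketInfo.getUsedBytes(), the Shared Size then follows as Total Size minus this result.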

An example is discussed in the attached Doc in the JIRA.

Things to do in follow-up PRs

  • Expand deletedDirTable for all the snapshots.
  • Update the code to accommodate files moved to deletedTable and calculate the Exclusive Size.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7743

How was this patch tested?

Added new unit tests and integration tests.

@aswinshakil aswinshakil self-assigned this Sep 15, 2023
@aswinshakil aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Sep 15, 2023
@smengcl smengcl changed the title HDDS-7743. [Snapshot] Implement snapshot disk usage. HDDS-7743. [Snapshot] Implement snapshot disk usage: Exclusive size and more Sep 16, 2023
repeated SnapshotSize snapshotSize = 1;
}

message SnapshotSize {
Contributor

Suggested change
message SnapshotSize {
message SnapshotProperty {

Contributor
@smengcl smengcl Sep 18, 2023

Actually, we can use SnapshotInfo directly if this is generalized into SetProperty.

Only those fields that are set in SnapshotInfo will be updated. We should ask the caller to pass in the complete SnapshotInfo, including all existing fields, in its desired state.

Member Author

We can generalize this, but we shouldn't send the SnapshotInfo as-is. Imagine this case:

  1. I get the SnapshotInfo in KeyDeletingService while the snapshot is still active.
  2. Someone then deletes the snapshot, so the snapshot metadata says SnapshotDeleted.
  3. When I now write back the SnapshotInfo, it still says the snapshot is active. Even though the snapshot was deleted, this flow makes it active again and it will never be GCed. The same applies to all the other metadata in the snapshot.

Your thoughts on this?
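The race in steps 1-3 can be mimicked with a toy read-modify-write sequence. This is illustrative Java only; the class, enum, and AtomicReference here are hypothetical stand-ins, not Ozone's actual SnapshotInfo or snapshot table.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the lost update: writing back a stale full snapshot record
// resurrects the ACTIVE status after a concurrent delete.
public class LostUpdateSketch {
  enum Status { ACTIVE, DELETED }

  static Status raceOutcome() {
    AtomicReference<Status> snapshotTable = new AtomicReference<>(Status.ACTIVE);
    Status staleCopy = snapshotTable.get();   // 1. service reads the record: ACTIVE
    snapshotTable.set(Status.DELETED);        // 2. a user deletes the snapshot
    snapshotTable.set(staleCopy);             // 3. stale write-back: ACTIVE again
    return snapshotTable.get();               // the snapshot will never be GCed
  }
}
```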

Contributor

@aswinshakil Yup, that can be a problem. But I'm worried that refactoring would be much harder later on (when we implement this and want to merge update-SnapshotInfo operations).

Can we use a snapshotTableLock to synchronize the operations (e.g. snapshot delete and snapshot update), so that the case you listed above can't happen?

The current approach relies on the single-threaded execution of Ratis transactions, and this can also become a hidden problem when we enable parallel Ratis transactions at some point. So a SNAPSHOT_LOCK, for example inside OMSnapshotUpdateSizeRequest#validateAndUpdateCache, would still be required, just like the VOLUME_LOCK/BUCKET_LOCK that we use in volume/bucket/key operations.

Member Author

I agree we need a snapshot lock in general for these operations. I will create another JIRA to address this issue. I was just discussing it with @hemantk-12 as well. Yes, we should not rely only on Ratis to serialize requests for the snapshot.

metadataManager.getSnapshotInfoTable().addCacheEntry(
new CacheKey<>(snapshotKey),
CacheValue.get(trxnLogIndex, snapshotInfo));
updatedSnapInfos.put(snapshotKey, snapshotInfo);
Contributor
@smengcl smengcl Sep 18, 2023

We could just reuse snapshotSizeList instead of copying the whole list over again into updatedSnapInfos if we use SnapshotInfo in the proto message directly. Just remove the invalid entries in snapshotSizeList using an iterator (and log an error).

Contributor
@hemantk-12 hemantk-12 left a comment

Thanks for the patch @aswinshakil.
Apart from the inline suggestions, I have a couple of questions.

  1. The exclusive size won't be available if there is only one snapshot, for example if someone has a job that creates a snapshot, deletes the previous one, and only maintains one snapshot at a time. Is that the correct understanding, and is it acceptable for the product?
  2. The exclusive size will only be calculated after KeyDeletingService runs, which means it will be calculated only eventually if there are millions of keys in the deletedTable.

}

message SetSnapshotPropertyRequest {
repeated SnapshotProperty snapshotProperty = 1;
Contributor

I'm not a fan of batch APIs, for a couple of reasons.

  • Error handling: For a batch request, if one of the operations fails, the whole request fails and throws an error, which can leave things in an inconsistent state.
  • Performance: Batch APIs don't give accurate numbers for server-side load handling, latency, and other performance metrics.

Not sure how much it matters in this case.

Member Author

I initially thought of making a single API call per request, but for 1000 snapshots we would be sending 1000 requests. We are only updating the snapshot metadata, so I see no harm in sending batch requests. But I'm open to changing it.

Contributor

@hemantk-12

For error handling, the caller has to handle any errors in the response carefully when some of the requests fail and some succeed.

Performance-wise, I'd say we just need to count the metrics at a finer granularity. For a SetSnapshotPropertyRequest that contains 100 snapshot updates, the metric could be increased by 100 rather than 1.

Contributor

@smengcl

  • My concern is about server-side error handling, e.g. in OMSnapshotSetPropertyRequest and OMSnapshotPurgeRequest: if something fails in the loop, the state will be partially updated. It won't be a problem if we handle each entry individually.

  • Regarding performance, it is not just about the request-count metric; that one is easy compared to latency and failure metrics. E.g. the latency for a batch of 1000 would be different from that of a batch of 1 or 2. You could argue that we can use the average, but it won't give the correct picture.

Again, as I said earlier, these concerns might not be valid in this case because this API will only be used internally by KeyDeletingService.

private final AtomicBoolean suspended;
private final Map<String, Long> exclusiveSizeList;
private final Map<String, Long> exclusiveReplicatedSizeList;
private final Set<String> completedExclusiveSizeList;
Contributor
nit: I think this could be a local variable. I see you are clearing it after updating the snapshot info with the exclusive size, but someone might remove that line by mistake, and no test will catch it unless it explicitly checks completedExclusiveSizeList after the run.

Member Author

The reason for keeping it global is keyLimitPerTask. It is not guaranteed that the entire snapshot is processed in one run; if we keep it local, it will be cleared in the next iteration. I will do a follow-up PR where we continue the size calculation from where we stopped in the previous iteration because of keyLimitPerTask.

@aswinshakil
Member Author

Thanks for the review @hemantk-12. I will update the PR shortly with the suggestions. For the questions:

  1. Correct, the last snapshot will not have an exclusive size. For the last snapshot, we would have to calculate it using the Active DB's deletedTable. The Active DB is constantly changing, and we would be iterating through the deletedTable every time KeyDeletingService runs. That takes the deletedTable lock, and we would be holding it constantly even when there is nothing to update.
  2. Yes, it will be calculated eventually, not immediately when the snapshot is taken.

Contributor
@hemantk-12 hemantk-12 left a comment

LGTM.

@aswinshakil
Member Author

For some weird reason, the test case testSnapshotExclusiveSize() is failing. I'll fix it before merging this PR.

@hemantk-12 hemantk-12 merged commit 1b33900 into apache:master Oct 4, 2023
@hemantk-12
Contributor

Thanks for the patch @aswinshakil and @smengcl for review.


Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517
