HDDS-7743. [Snapshot] Implement snapshot disk usage: Exclusive size and more #5301
Conversation
```proto
  repeated SnapshotSize snapshotSize = 1;
}

message SnapshotSize {
```
Suggested change:

```diff
-message SnapshotSize {
+message SnapshotProperty {
```
Actually, we can use SnapshotInfo directly if this is generalized into SetProperty. We should ask the caller to pass in the complete SnapshotInfo, including all existing fields, in its desired state.
We can generalize this, but we shouldn't send the SnapshotInfo as-is. Imagine this case:

- I get the SnapshotInfo in KeyDeletingService while the snapshot is still active.
- Now someone deletes the snapshot, so the snapshot metadata says the snapshot is deleted.
- Now when I update the SnapshotInfo, it still says the snapshot is active. Even though the snapshot was deleted, this flow would mark it active again and it would never be GCed. The same applies to all the other metadata in the snapshot.

Your thoughts on this?
@aswinshakil Yup, that can be a problem. But I'm worried that refactoring would be much harder later on (when we implement this and want to merge update-SnapshotInfo operations).

Can we use a snapshot table lock to synchronize the operations (e.g. snapshot delete and snapshot update), so that the case you listed above won't happen?

The current approach relies on the single-threaded execution of Ratis transactions, which can also become a hidden problem when we enable parallel Ratis transactions at some point. So a SNAPSHOT_LOCK, for example inside OMSnapshotUpdateSizeRequest#validateAndUpdateCache, would still be required, just like the VOLUME_LOCK/BUCKET_LOCK we use in volume/bucket/key operations.
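A minimal, self-contained sketch of the kind of per-snapshot write lock being discussed here; SnapshotLockTable and its names are hypothetical illustrations, not the actual OzoneManagerLock API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical per-snapshot lock table (illustration only). */
final class SnapshotLockTable {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  /** Lazily creates one lock per snapshot table key. */
  ReentrantReadWriteLock forKey(String snapshotTableKey) {
    return locks.computeIfAbsent(snapshotTableKey,
        k -> new ReentrantReadWriteLock());
  }
}

// Both the delete path and the size-update path would take the write lock
// for the same snapshot key, then re-read SnapshotInfo under the lock:
//
//   ReentrantReadWriteLock lock = lockTable.forKey(snapshotKey);
//   lock.writeLock().lock();
//   try {
//     // re-read SnapshotInfo; bail out if status is no longer ACTIVE
//   } finally {
//     lock.writeLock().unlock();
//   }
```

Re-reading under the lock is what closes the window described above, where a stale ACTIVE status could overwrite a concurrent delete.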
I agree we need a snapshot lock in general for these operations. I will create another JIRA to address this issue; I was just discussing it with @hemantk-12 as well. Yes, we should not rely only on Ratis to serialize requests for the snapshot.
```java
metadataManager.getSnapshotInfoTable().addCacheEntry(
    new CacheKey<>(snapshotKey),
    CacheValue.get(trxnLogIndex, snapshotInfo));
updatedSnapInfos.put(snapshotKey, snapshotInfo);
```
We could just re-use snapshotSizeList instead of copying the whole list over again into updatedSnapInfos, if we use SnapshotInfo in the proto message directly. Just remove the invalid entries from snapshotSizeList using an iterator (and log an error); a sketch follows below.
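A small self-contained sketch of that iterator-based pruning, with illustrative names; note that protobuf repeated fields are immutable, so the list would first have to be copied into a mutable one:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class InvalidEntryPruner {
  /** Removes entries whose snapshot key is unknown, logging each removal. */
  static void pruneInvalid(List<String> snapshotKeys, Set<String> validKeys) {
    Iterator<String> it = snapshotKeys.iterator();
    while (it.hasNext()) {
      String key = it.next();
      if (!validKeys.contains(key)) {
        // In the real request handler this would be LOG.error(...)
        System.err.println("Invalid snapshot entry, removing: " + key);
        it.remove();
      }
    }
  }
}
```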
Thanks for the patch @aswinshakil.
Apart from the inline suggestions, I have a couple of questions:

- Exclusive size won't be available if there is only one snapshot. For example, someone may have a job that creates a snapshot, deletes the previous one, and so maintains only one snapshot at a time. Is that understanding correct, and is it acceptable from a product perspective?
- Exclusive size will only be calculated after a KeyDeletingService run, which means it will only eventually be calculated when there are millions of keys in the deleted table.
```proto
}

message SetSnapshotPropertyRequest {
  repeated SnapshotProperty snapshotProperty = 1;
```
I'm not a fan of a batch API, for a couple of reasons:

- Error handling: for a batch request, if one of the operations fails, the whole request fails and throws an error, which can leave things in an inconsistent state.
- Performance: batch APIs don't give correct numbers for server-side load handling, latency, and other performance metrics.

Not sure how much it matters in this case.
I initially thought of making a single API call for each request, but for 1000 snapshots we would be sending 1000 requests. We are only updating snapshot metadata, so I see no harm in sending batch requests. But I'm open to changing it.
For error handling, the caller has to handle any error in the response carefully when some of the requests fail and some succeed.

Performance-wise, I'd say we just need to count the metrics at a finer granularity. For a SetSnapshotPropertyRequest that contains 100 snapshot updates, the metric could be increased by 100 rather than 1, as in the sketch below.
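An illustration of that finer-grained accounting; the class and field names here are hypothetical, not actual OM metrics:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metrics holder: a batch carrying N snapshot updates bumps
// the counter by N rather than by 1.
final class SnapshotSetPropertyMetrics {
  private final AtomicLong numSnapshotPropertyUpdates = new AtomicLong();

  void recordBatch(int batchSize) {
    numSnapshotPropertyUpdates.addAndGet(batchSize);
  }

  long get() {
    return numSnapshotPropertyUpdates.get();
  }
}
```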
- My concern is about server-side error handling, e.g. in OMSnapshotSetPropertyRequest and OMSnapshotPurgeRequest: if something fails in the loop, the state will be partially updated. It won't be a problem if we handle each entry individually.
- Regarding performance, it is not just about the request-count metric; that one is easy compared to latency and failure metrics. E.g. latency for a batch of 1000 would be different from a batch of 1 or 2. You can argue that we can use an average, but it won't give a correct picture.

Again, as I said earlier, these concerns might not be valid in this case because this API will be used only by the internal KeyDeletingService.
```java
private final AtomicBoolean suspended;
private final Map<String, Long> exclusiveSizeList;
private final Map<String, Long> exclusiveReplicatedSizeList;
private final Set<String> completedExclusiveSizeList;
```
nit: I think this could be a local variable. I see you are clearing it after updating the snapshot info with the exclusive size, but someone might remove that line by mistake and no test will catch it unless it explicitly checks completedExclusiveSizeList after the run.
The reason for keeping it global is keyLimitPerTask: it is not guaranteed that the entire snapshot is processed in one run, and if we kept it local it would be cleared in the next iteration. I will be doing a follow-up PR where we continue the size calculation from where we stopped during the previous iteration because of keyLimitPerTask. A sketch of the idea is below.
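A rough sketch of that idea, with illustrative names rather than the actual KeyDeletingService fields:

```java
import java.util.HashMap;
import java.util.Map;

// Partial sums must survive across service runs because each run may stop
// early at keyLimitPerTask; a snapshot's total is flushed and cleared only
// once its deleted keys have been fully processed.
final class ExclusiveSizeAccumulator {
  private final Map<String, Long> exclusiveSizes = new HashMap<>();

  /** Called per processed deleted key; may span many service runs. */
  void add(String snapshotKey, long bytes) {
    exclusiveSizes.merge(snapshotKey, bytes, Long::sum);
  }

  /** Called once the snapshot's deleted keys are fully processed. */
  long finishAndClear(String snapshotKey) {
    Long total = exclusiveSizes.remove(snapshotKey);
    return total == null ? 0L : total;
  }
}
```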
Thanks for the review @hemantk-12. I will be updating the PR shortly with the suggestions. For the questions,
hemantk-12 left a comment:
LGTM.
For some weird reason, the test case …

Thanks for the patch @aswinshakil and @smengcl for the review.
HDDS-7743. [Snapshot] Implement snapshot disk usage: Exclusive size and more (apache#5301) Change-Id: I89d1ec9f3a521e15069fa53de8b1236c7036691f
What changes were proposed in this pull request?
The change proposes to add exclusive size for a snapshot. PR #5175 addresses Reference Size calculation for each snapshot. I have linked a PDF in the JIRA explaining the design, and I have summarized it below for future reference. A snapshot has four sizes.
Calculations:
Total Size

We can get the Total Size from the bucket info, omBucketInfo.getUsedBytes(), in the snapshot. This gives the total size used by the snapshot. Note: this gives the size of the bucket after replication.
Shared Size

Once we have the Total Size and the Exclusive Size, we can calculate the Shared Size:

Shared Size = Total Size - Exclusive Size

For example, with a Total Size of 10 GB and an Exclusive Size of 2 GB, the Shared Size is 8 GB.

Exclusive Size

To calculate the Exclusive Size for the current snapshot, check the next snapshot's deletedTable: if a deleted key is referenced in the current snapshot and not referenced in the previous snapshot, then that key is exclusive to the current snapshot. An example is discussed in the doc attached to the JIRA; a simplified sketch of the rule is below.
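A simplified, self-contained sketch of that rule, using stand-in types rather than the actual OM classes:

```java
import java.util.List;

final class ExclusiveSizeSketch {
  /** Stand-in for a deletedTable entry (illustration only). */
  record DeletedKey(String name, long dataSize) { }

  /** Stand-in for a snapshot's key-table lookup. */
  interface SnapshotView {
    boolean references(DeletedKey key);
  }

  static long exclusiveSize(List<DeletedKey> nextSnapshotDeletedTable,
      SnapshotView current, SnapshotView previous) {
    long exclusive = 0;
    for (DeletedKey key : nextSnapshotDeletedTable) {
      // Referenced by the current snapshot but not by the previous one:
      // the key's space is freed only when this snapshot is deleted.
      if (current.references(key)
          && (previous == null || !previous.references(key))) {
        exclusive += key.dataSize();
      }
    }
    return exclusive;
  }
}
```

The replicated variant would run the same loop while accumulating replicated sizes, feeding exclusiveReplicatedSizeList.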
Things to do in follow-up PRs

- Handle the deletedDirTable for all the snapshots.
- Continue from where we stopped in the deletedTable and calculate the Exclusive Size.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7743
How was this patch tested?
Added new unit tests and an integration test.