
Conversation

@aswinshakil
Member

@aswinshakil aswinshakil commented Sep 15, 2023

What changes were proposed in this pull request?

This change adds an exclusive size for each snapshot. PR #5175 addresses the Reference Size calculation for each snapshot. I have linked a PDF in the JIRA explaining the design and have summarized it below for future reference. A snapshot has four sizes:

  • Reference Size - Explained in HDDS-9159. [Snapshot] Implement snapshot disk usage: Referenced size #5175
  • Total Size - The sum of the shared size and exclusive size of the snapshot. This is simply the bucket size at the time the snapshot was taken.
  • Shared Size - Data shared between snapshots. This size cannot be attributed to a single snapshot, since the data is shared between two or more snapshots.
  • Exclusive Size - Data held only by that snapshot.

Calculations:

Total Size

We can get the Total Size from omBucketInfo.getUsedBytes() in the snapshot. This gives the total size used by the snapshot.

Note: This gives the size of the bucket after replication.

Shared Size

Once we have the Total Size and Exclusive Size, we can calculate the Shared Size:
Shared Size = Total Size - Exclusive Size

Exclusive Size

To calculate the Exclusive Size for the current snapshot, check the next snapshot's deletedTable: if a deleted key is referenced in the current snapshot but not referenced in the previous snapshot, then that key is exclusive to the current snapshot.
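The check above can be sketched in Java. This is an illustrative toy, not the actual Ozone implementation: the class, method name, and the map/set inputs standing in for the deletedTable scan are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only (not Ozone code): a key found in the NEXT
// snapshot's deletedTable is exclusive to the CURRENT snapshot when the
// current snapshot references it but the previous snapshot does not.
public class ExclusiveSizeSketch {
  static long exclusiveSize(Map<String, Long> nextSnapshotDeletedTable,
                            Set<String> currentSnapshotKeys,
                            Set<String> previousSnapshotKeys) {
    long exclusive = 0;
    for (Map.Entry<String, Long> e : nextSnapshotDeletedTable.entrySet()) {
      if (currentSnapshotKeys.contains(e.getKey())
          && !previousSnapshotKeys.contains(e.getKey())) {
        exclusive += e.getValue();  // attributed to this snapshot alone
      }
    }
    return exclusive;
  }
}
```

With the Total Size taken from omBucketInfo.getUsedBytes(), the Shared Size then follows as Total Size minus this result.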

An example is discussed in the attached Doc in the JIRA.

Things to do in follow-up PRs

  • Expand deletedDirTable for all the snapshots.
  • Update the code to accommodate files moved to deletedTable and calculate the Exclusive Size.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7743

How was this patch tested?

Added new unit tests and integration tests.

@aswinshakil aswinshakil self-assigned this Sep 15, 2023
@aswinshakil aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Sep 15, 2023
@smengcl smengcl changed the title HDDS-7743. [Snapshot] Implement snapshot disk usage. HDDS-7743. [Snapshot] Implement snapshot disk usage: Exclusive size and more Sep 16, 2023
repeated SnapshotSize snapshotSize = 1;
}

message SnapshotSize {
Contributor

Suggested change
message SnapshotSize {
message SnapshotProperty {

Contributor
@smengcl smengcl Sep 18, 2023

Actually, we can use SnapshotInfo directly if this is generalized into SetProperty.

Only those fields that are set in SnapshotInfo will be updated. We should ask the caller to pass in the complete SnapshotInfo, including all existing fields, in its desired state.

Member Author

We can generalize this, but we shouldn't send the SnapshotInfo as-is. Imagine this case:

  1. I get the SnapshotInfo in KeyDeletingService while the snapshot is still active.
  2. Someone then deletes the snapshot, so the snapshot metadata says SnapshotDeleted.
  3. When I now write back the SnapshotInfo, it still says the snapshot is active. Even though the snapshot was deleted, this flow makes it active again and it will never be GCed. The same applies to all the other metadata in the snapshot.

Your thoughts on this?
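The race in steps 1-3 can be mimicked with a toy read-modify-write sequence. This is illustrative Java only; the class, enum, and AtomicReference here are hypothetical stand-ins, not Ozone's actual SnapshotInfo or snapshot table.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the lost update: writing back a stale full snapshot record
// resurrects the ACTIVE status after a concurrent delete.
public class LostUpdateSketch {
  enum Status { ACTIVE, DELETED }

  static Status raceOutcome() {
    AtomicReference<Status> snapshotTable = new AtomicReference<>(Status.ACTIVE);
    Status staleCopy = snapshotTable.get();   // 1. service reads the record: ACTIVE
    snapshotTable.set(Status.DELETED);        // 2. a user deletes the snapshot
    snapshotTable.set(staleCopy);             // 3. stale write-back: ACTIVE again
    return snapshotTable.get();               // the snapshot will never be GCed
  }
}
```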

Contributor

@aswinshakil Yup, that can be a problem. But I'm worried that refactoring would be much harder later on (when we implement this and want to merge update-SnapshotInfo operations).

Can we use a snapshotTableLock to synchronize the operations (e.g. snapshot delete and snapshot update), so that the case you listed above can't happen?

The current approach relies on the single-threaded execution of Ratis transactions, and this can also become a hidden problem when we enable parallel Ratis transactions at some point. So a SNAPSHOT_LOCK, for example inside OMSnapshotUpdateSizeRequest#validateAndUpdateCache, would still be required, just like the VOLUME_LOCK/BUCKET_LOCK that we use in volume/bucket/key operations.

Member Author

I agree we need a snapshot lock in general for these operations. I will create another JIRA to address this issue. I was just discussing it with @hemantk-12 as well. Yes, we should not rely only on Ratis to serialize requests for the snapshot.

metadataManager.getSnapshotInfoTable().addCacheEntry(
new CacheKey<>(snapshotKey),
CacheValue.get(trxnLogIndex, snapshotInfo));
updatedSnapInfos.put(snapshotKey, snapshotInfo);
Contributor
@smengcl smengcl Sep 18, 2023

We could just reuse snapshotSizeList instead of copying the whole list over again into updatedSnapInfos if we use SnapshotInfo in the proto message directly. Just remove the invalid entries in snapshotSizeList using an iterator (and log an error).

Contributor
@hemantk-12 hemantk-12 left a comment

Thanks for the patch @aswinshakil.
Apart from the inline suggestions, I have a couple of questions.

  1. The exclusive size won't be available if there is only one snapshot, for example if someone has a job that creates a snapshot, deletes the previous one, and only maintains one snapshot at a time. Is that the correct understanding, and is it acceptable for the product?
  2. The exclusive size will only be calculated after KeyDeletingService runs, which means it will be calculated only eventually if there are millions of keys in the deletedTable.

}

message SetSnapshotPropertyRequest {
repeated SnapshotProperty snapshotProperty = 1;
Contributor

I'm not a fan of batch APIs, for a couple of reasons.

  • Error handling: For a batch request, if one of the operations fails, the whole request fails and throws an error, which can leave things in an inconsistent state.
  • Performance: Batch APIs don't give accurate numbers for server-side load handling, latency, and other performance metrics.

Not sure how much it matters in this case.

Member Author

I initially thought of making a single API call per request, but for 1000 snapshots we would be sending 1000 requests. We are only updating the snapshot metadata, so I see no harm in sending batch requests. But I'm open to changing it.

Contributor

@hemantk-12

For error handling, the caller has to handle any errors in the response carefully when some of the requests fail and some succeed.

Performance-wise, I'd say we just need to count the metrics at a finer granularity. For a SetSnapshotPropertyRequest that contains 100 snapshot updates, the metric could be increased by 100 rather than 1.

Contributor

@smengcl

  • My concern is about server-side error handling, e.g. in OMSnapshotSetPropertyRequest and OMSnapshotPurgeRequest: if something fails in the loop, the state will be partially updated. It won't be a problem if we handle each entry individually.

  • Regarding performance, it is not just about the request-count metric; that one is easy compared to latency and failure metrics. E.g. the latency for a batch of 1000 would be different from that of a batch of 1 or 2. You could argue that we can use the average, but it won't give the correct picture.

Again, as I said earlier, these concerns might not be valid in this case because this API will only be used internally by KeyDeletingService.

private final AtomicBoolean suspended;
private final Map<String, Long> exclusiveSizeList;
private final Map<String, Long> exclusiveReplicatedSizeList;
private final Set<String> completedExclusiveSizeList;
Contributor
nit: I think this could be a local variable. I see you are clearing it after updating the snapshot info with the exclusive size, but someone might remove that line by mistake, and no test will catch it unless it explicitly checks completedExclusiveSizeList after the run.

Member Author

The reason for keeping it global is keyLimitPerTask. It is not guaranteed that the entire snapshot is processed in one run; if we keep it local, it will be cleared in the next iteration. I will do a follow-up PR where we continue the size calculation from where we stopped in the previous iteration because of keyLimitPerTask.

@aswinshakil
Member Author

Thanks for the review @hemantk-12. I will update the PR shortly with the suggestions. For the questions:

  1. Correct, the last snapshot will not have an exclusive size. For the last snapshot, we would have to calculate it using the Active DB's deletedTable. The Active DB is constantly changing, and we would be iterating through the deletedTable every time KeyDeletingService runs. That takes the deletedTable lock, and we would be holding it constantly even when there is nothing to update.
  2. Yes, it will be calculated eventually, not immediately when the snapshot is taken.

Contributor
@hemantk-12 hemantk-12 left a comment

LGTM.

@aswinshakil
Member Author

For some weird reason, the test case testSnapshotExclusiveSize() is failing. I'll fix it before merging this PR.

@hemantk-12 hemantk-12 merged commit 1b33900 into apache:master Oct 4, 2023
@hemantk-12
Contributor

Thanks for the patch @aswinshakil and @smengcl for review.


Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517
