HDDS-8128. Deduplicate the ops in RDBBatchOperation. #4424
Conversation
Galsza left a comment:
@szetszwo Thank you for working on this patch. I've left a few comments.
Please fix the batchSize resetting; the rest are nitpicks.
    public class RDBBatchOperation implements BatchOperation {
      static final Logger LOG = LoggerFactory.getLogger(RDBBatchOperation.class);

      static void debug(Supplier<String> message) {
I think this method should be private. Are we going to use it anywhere else?
Sure.
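For reference, such Supplier-based helpers are a common pattern for deferring message construction until debug logging is actually enabled; the method body is not shown in the excerpt above, so the following is an assumption (written as private, per the suggestion):

    // Common shape of a lazy debug helper; the actual body is not shown
    // in the diff excerpt, so this sketch is an assumption.
    private static void debug(Supplier<String> message) {
      if (LOG.isDebugEnabled()) {
        LOG.debug(message.get()); // build the string only when debug is on
      }
    }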
    class PutOpCache {
      class FamilyCache {
        private final ColumnFamily family;
        /** A (key -> value) map. */
Suggested change: replace

    /** A (key -> value) map. */

with

    /**
     * The map used to store the key and value byte array pairs.
     * <p>
     * {@link ByteArray} is used for the keys to avoid incorrect map
     * operations.
     */
Thanks for the suggestion. Will update it.
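For context: a raw byte[] cannot serve as a HashMap key because arrays inherit identity-based equals() and hashCode() from Object, so two arrays with identical contents are treated as different keys. A minimal sketch of a wrapper like the ByteArray referenced above (hypothetical code; the actual class in Ozone may differ):

    import java.util.Arrays;

    // Hypothetical sketch; the real ByteArray class in Ozone may differ.
    final class ByteArray {
      private final byte[] array;
      private final int hash; // safe to cache: the contents are not mutated here

      ByteArray(byte[] array) {
        this.array = array;
        this.hash = Arrays.hashCode(array);
      }

      byte[] get() {
        return array;
      }

      @Override
      public int hashCode() {
        return hash;
      }

      @Override
      public boolean equals(Object obj) {
        return this == obj
            || (obj instanceof ByteArray
                && Arrays.equals(this.array, ((ByteArray) obj).array));
      }
    }

With such a wrapper, putting the same key contents twice overwrites the earlier map entry, which is exactly the behavior the deduplication relies on.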
      }

      void put(byte[] key, byte[] value) {
        batchSize += key.length + value.length;
If batchSize is not reset anywhere this is going to overflow. Also, could this overflow even with a reset included? (Will cached put operations ever reach 2 gigs in size?)
Good point. We should use long for sizes.
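To illustrate the concern with a self-contained demo (not code from the patch): an int accumulator silently wraps once the running total crosses Integer.MAX_VALUE (about 2 GB), while a long does not.

    // Self-contained overflow demo; not code from the patch.
    public class BatchSizeOverflowDemo {
      public static void main(String[] args) {
        int intSize = 0;
        long longSize = 0;
        final int entrySize = 1 << 20; // pretend each cached put is 1 MiB

        for (int i = 0; i < 3072; i++) { // 3 GiB total, past Integer.MAX_VALUE
          intSize += entrySize;
          longSize += entrySize;
        }
        System.out.println("int  accumulator: " + intSize);  // -1073741824 (wrapped)
        System.out.println("long accumulator: " + longSize); // 3221225472
      }
    }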
      }

      /** A (family name -> {@link FamilyCache}) map. */
      private final Map<String, FamilyCache> map = new HashMap<>();
I suggest renaming the variable to nameToCacheMap for easier understanding.
Sure.
Galsza left a comment:
Thank you for the changes, looks good to me.
The force-push was to sync with master since …

Thanks for the info. Sync with …
@adoroszlai , thanks for the advice. I tend to prefer rebase over merge since it is easier to see the new code. I agree that when everything works fine, rebase and merge do not matter. When something does not work, rebase makes it easier to trace the problem.

How does the reviewer see what changes were made to the PR since the previous round of review, i.e. to address the review comments?
If we rebase but do not squash, the comments in GitHub may still work. Also, I usually keep the old branch https://github.com/szetszwo/ozone/tree/HDDS-8128b . Reviewers may download the patches and then run diff locally (this is like posting patches on the JIRA).
Indeed, a previous comment still shows in https://github.com/apache/ozone/pull/4424/files . The other comments became outdated. I guess it is because the code has changed; it has nothing to do with merge or rebase.
When posting patches to Jira, old ones are usually kept. Force-pushing is like keeping only the latest patch. Keeping old branches around helps, but you have to let others know about them, and even then it's still not as streamlined as incremental commits. You are right that comments are still visible after a force-push. However, it is hard to see the state of the code they were referring to and how the code was changed in response to them. https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review
For merging, the downloaded patch file could be very messy since it may include merged code from other commits. This is another reason I prefer rebase over merge.
Since the old branch is still around, we can still see the newer commits addressing the comments.
duongkame left a comment:
Thanks @szetszwo for the change. This looks like a nice improvement and maybe RocksDB should've done it internally. I only put a few minor comments inline.
@duongkame , thanks a lot for reviewing this! I have just pushed a new commit addressing your comments.
@szetszwo Did we see the disk usage go up because of the RocksDB WAL, or did we see a redundant SST flush happen?

WAL.
What changes were proposed in this pull request?
The OM RocksDB uses a lot of disk space.
This PR adds a cache in RDBBatchOperation for caching the put ops. When there are multiple ops with the same key, a later op overwrites the previous op in the cache, so the previous op is never written to RocksDB. In a test, the cache reduced the disk usage in the OM RocksDB for a single upload with 1000 parts from 755 MB to 23 MB.
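A minimal sketch of the caching idea (hypothetical names, reusing the ByteArray wrapper sketched earlier in this thread; not the actual patch code):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.BiConsumer;

    // Hypothetical sketch of the put-op deduplication; not the actual patch.
    class DedupPutCacheSketch {
      private final Map<ByteArray, byte[]> ops = new HashMap<>();

      void put(byte[] key, byte[] value) {
        // A later put on the same key replaces the earlier one in the cache,
        // so the earlier op is never written to RocksDB.
        ops.put(new ByteArray(key), value);
      }

      // On commit, each key is written to the real batch at most once.
      void commit(BiConsumer<byte[], byte[]> writeToBatch) {
        ops.forEach((k, v) -> writeToBatch.accept(k.get(), v));
      }
    }

For a multipart upload that repeatedly rewrites the same OM key (e.g. once per part), only the final value reaches the batch, which matches the 755 MB to 23 MB reduction reported above.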
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8128
How was this patch tested?
This is an optimization, which the existing tests already cover. It has also been tested manually with ozone-1.3...szetszwo:ozone:multipartUpload.