Conversation

@szetszwo
Contributor

@szetszwo szetszwo commented Mar 20, 2023

What changes were proposed in this pull request?

OM rocksdb uses a lot of space.

This PR adds a cache in RDBBatchOperation for the put ops. When multiple ops share the same key, a later op overwrites the earlier one in the cache, so the earlier op is never written to RocksDB. In a test, the cache reduced the OM RocksDB disk usage of a single upload with 1000 parts from 755MB to 23MB.
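The deduplication idea can be sketched as follows. This is a minimal illustration, not the actual RDBBatchOperation code; the class and method names here are invented for the example:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the dedup cache: later puts for the same key overwrite
 * earlier ones, so only the final value reaches RocksDB on commit.
 */
public class PutCacheSketch {
  /** Wrap byte[] so equals/hashCode compare contents, not references. */
  static final class Key {
    private final byte[] bytes;
    Key(byte[] bytes) { this.bytes = bytes; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  private final Map<Key, byte[]> cache = new HashMap<>();

  void put(byte[] key, byte[] value) {
    // Overwrites any previous op cached for the same key content.
    cache.put(new Key(key), value);
  }

  int size() { return cache.size(); }

  public static void main(String[] args) {
    PutCacheSketch c = new PutCacheSketch();
    byte[] key = {1, 2, 3};
    for (int i = 0; i < 1000; i++) {
      c.put(key.clone(), ("part-" + i).getBytes());
    }
    System.out.println(c.size()); // prints 1: only the last put survives
  }
}
```

This mirrors why a multipart upload with 1000 parts, repeatedly rewriting the same keys, shrinks to a single final write per key.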

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8128

How was this patch tested?

This is an optimization, so the existing tests already cover it. Also tested manually with ozone-1.3...szetszwo:ozone:multipartUpload.

Contributor

@Galsza Galsza left a comment

@szetszwo Thank you for working on this patch. I've left a few comments.

Please fix the batchSize resetting, the rest are nitpicks.

public class RDBBatchOperation implements BatchOperation {
static final Logger LOG = LoggerFactory.getLogger(RDBBatchOperation.class);

static void debug(Supplier<String> message) {
Contributor

I think this method should be private. Are we going to use it anywhere else?

Contributor Author

Sure.

class PutOpCache {
class FamilyCache {
private final ColumnFamily family;
/** A (key -> value) map. */
Contributor

Suggested change
/** A (key -> value) map. */
/**
* The map used to store the key and value byte array pairs.
* <p>
* {@link ByteArray} is used for the keys to avoid incorrect map
* operations.
*/

Contributor Author

Thanks for the suggestion. Will update it.

}
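The wrapping suggested above matters because a raw byte[] uses identity-based equals/hashCode, so a plain HashMap keyed by byte[] cannot deduplicate keys with equal contents. A minimal sketch of the difference (the ByteArray wrapper here is illustrative, standing in for the one referenced in the review):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Shows why raw byte[] keys break HashMap dedup: two equal-content
// arrays are distinct objects, so they land in separate entries.
public class ByteArrayKeyDemo {
  static final class ByteArray {
    private final byte[] bytes;
    ByteArray(byte[] bytes) { this.bytes = bytes; }
    @Override public boolean equals(Object o) {
      return o instanceof ByteArray
          && Arrays.equals(bytes, ((ByteArray) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  public static void main(String[] args) {
    Map<byte[], String> raw = new HashMap<>();
    raw.put(new byte[]{1, 2}, "first");
    raw.put(new byte[]{1, 2}, "second");
    System.out.println(raw.size()); // prints 2: dedup fails

    Map<ByteArray, String> wrapped = new HashMap<>();
    wrapped.put(new ByteArray(new byte[]{1, 2}), "first");
    wrapped.put(new ByteArray(new byte[]{1, 2}), "second");
    System.out.println(wrapped.size()); // prints 1: second put overwrites
  }
}
```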

void put(byte[] key, byte[] value) {
batchSize += key.length + value.length;
Contributor

If batchSize is not reset anywhere this is going to overflow. Also, could this overflow even with a reset included? (Will cached put operations ever reach 2 gigs in size?)

Contributor Author

Good point. We should use long for sizes.

}
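The overflow concern above can be demonstrated with a small sketch (the per-entry sizes are invented for illustration): an int accumulator silently wraps past Integer.MAX_VALUE (~2 GB), while a long one does not.

```java
// Illustrates why batchSize should be a long: summing cached put sizes
// in an int wraps around once the total exceeds ~2 GB.
public class BatchSizeOverflowDemo {
  public static void main(String[] args) {
    int intSize = 0;
    long longSize = 0;
    int entry = 1 << 20; // assume 1 MB per cached put
    for (int i = 0; i < 4096; i++) { // 4 GB total
      intSize += entry;
      longSize += entry;
    }
    System.out.println(intSize);  // prints 0: wrapped around, useless
    System.out.println(longSize); // prints 4294967296: correct total
  }
}
```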

/** A (family name -> {@link FamilyCache}) map. */
private final Map<String, FamilyCache> map = new HashMap<>();
Contributor

I suggest renaming the variable to nameToCacheMap for easier understanding.

Contributor Author

Sure.

Contributor

@Galsza Galsza left a comment

Thank you for the changes, looks good to me.

@szetszwo szetszwo changed the title HDDS-8128. OM rocksdb uses a lot of space. HDDS-8128. Deduplicate the ops in RDBBatchOperation. Mar 22, 2023
@szetszwo
Contributor Author

The force-push was to sync with master, since TestEndPoint.testTmpDirCleanup will fail without #4445 (a value misused as a key). The values (misused as keys) in the tests are the same, so this change removes the duplicates and causes the failure.

@adoroszlai
Contributor

force-pushed was to sync with master

Thanks for the info. Sync with master can also be done by merging instead of rebasing. The commits in the PR will be squashed anyway, so linearity doesn't matter.

@szetszwo
Contributor Author

@adoroszlai , thanks for the advice. I tend to prefer rebase over merge since it makes the new code easier to see. I agree that when everything works fine, rebase vs. merge does not matter; when something does not work, rebase seems to make the problem easier to trace.

@adoroszlai
Contributor

I tend to prefer rebase over merge since it is easier to see the new code.

How does the reviewer see what changes were made to the PR since the previous round of review, i.e. to address the review comments?

@szetszwo
Contributor Author

If we rebase but do not squash, the comments in GitHub may still work.

Also, I usually keep the old branch https://github.com/szetszwo/ozone/tree/HDDS-8128b . Reviewers may download the patches and run the diff locally (this is like posting patches on the JIRA).

@szetszwo
Contributor Author

Indeed, a previous comment still shows in https://github.com/apache/ozone/pull/4424/files .

The other comments became outdated. I guess it is because the code has changed; it has nothing to do with merge or rebase.

@adoroszlai
Contributor

When posting patches to Jira, old ones are usually kept. Force-pushing is like keeping only the latest patch. Keeping around old branches helps, but you have to let others know about them, and even then it's still not as streamlined as incremental commits.

You are right that comments are still visible after force-push. However, it is hard to see the state of code they were referring to and how code was changed in response to them.

https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

@szetszwo
Contributor Author

For merging, the downloaded patch file could be very messy since it may include merged code from other commits. This is another reason I prefer rebase over merge.

... However, it is hard to see the state of code they were referring to and how code was changed in response to them.

Since the old branch is still around, we still can see the newer commits addressing the comments.

Contributor

@duongkame duongkame left a comment

Thanks @szetszwo for the change. This looks like a nice improvement, and maybe RocksDB should've done it internally. I only left a few minor comments inline.

@szetszwo
Contributor Author

@duongkame , thanks a lot for reviewing this! I just pushed a new commit addressing your comments.

@szetszwo szetszwo merged commit 386d1be into apache:master Mar 25, 2023
@swamirishi
Contributor

@szetszwo Did we see the disk usage go up because of the RocksDB WAL, or did we see a redundant SST flush happen?

@szetszwo
Contributor Author

WAL
