KAFKA-10433 Reuse the ByteBuffer in validating compressed records #9220
chia7712 wants to merge 1 commit into apache:trunk from
Conversation
@chia7712 Thanks for the PR. The intent is good, but I think the approach should be a bit different. As it happens, I have implemented this other approach. Would you be OK if I submit that as a PR and we can compare?
Please feel free to submit another PR. We all love to see a better solution :)
Closed as there is a better approach (#9229).
Use a caching `BufferSupplier` per request handler thread so that decompression buffers are cached if supported by the underlying `CompressionType`. This achieves a similar outcome as #9220, but with less contention. We introduce a `RequestLocal` class to make it easier to introduce new request-scoped stateful instances (one example we discussed previously was an `ActionQueue` that could be used to avoid some of the complex group coordinator locking).

This is a small win for zstd (no synchronization or soft references) and a more significant win for lz4. In particular, it reduces allocations significantly when the number of partitions is high. The decompression buffer size is typically 64 KB, so a produce request with 1000 partitions results in 64 MB of allocations even if each produce batch is small (likely, when there are so many partitions).

I did a quick producer perf local test with 5000 partitions, 1 KB record size, 1 broker, lz4 and ~0.5 for the producer compression rate metric.

Before this change:

> 20000000 records sent, 346314.349535 records/sec (330.27 MB/sec), 148.33 ms avg latency, 2267.00 ms max latency, 115 ms 50th, 383 ms 95th, 777 ms 99th, 1514 ms 99.9th.

After this change:

> 20000000 records sent, 431956.113259 records/sec (411.95 MB/sec), 117.79 ms avg latency, 1219.00 ms max latency, 99 ms 50th, 295 ms 95th, 440 ms 99th, 662 ms 99.9th.

That's a 25% throughput improvement, and p999 latency was reduced to less than half (in this test). Default arguments will be removed in a subsequent PR.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
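The per-thread caching described in the commit message can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the actual `RequestLocal`/`BufferSupplier` classes in the Kafka codebase (the class and method names here are invented for illustration): each request handler thread owns its own cache, so buffers are reused across decompressions without any synchronization.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simplified sketch of request-handler-thread-confined buffer
// caching; the real code wraps a caching BufferSupplier inside RequestLocal.
final class ThreadConfinedBufferCache {
    // One cache per thread: no locks, no contention between handler threads.
    private static final ThreadLocal<ThreadConfinedBufferCache> PER_THREAD =
            ThreadLocal.withInitial(ThreadConfinedBufferCache::new);

    private final Deque<ByteBuffer> cache = new ArrayDeque<>();

    static ThreadConfinedBufferCache current() {
        return PER_THREAD.get();
    }

    // Return a cached buffer if one is large enough, otherwise allocate.
    ByteBuffer get(int capacity) {
        ByteBuffer cached = cache.pollFirst();
        if (cached != null && cached.capacity() >= capacity) {
            cached.clear();
            return cached;
        }
        return ByteBuffer.allocate(capacity);
    }

    // Hand the buffer back so the next decompression on this thread reuses it.
    void release(ByteBuffer buffer) {
        cache.addFirst(buffer);
    }
}
```

Because the cache is confined to the calling thread, `get`/`release` need no synchronization or soft references, which is where the win over a shared pool comes from.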
@chia7712: We filed KAFKA-15674 to revisit this. The current implementation (#9229) passes a thread-unsafe data structure around to multiple places such as the ReplicaManager, GroupCoordinator, TxnCoordinator, etc. Each of those places has logic to add callbacks. If one is not careful (as in KAFKA-15674), the thread-unsafe data structure could be captured in those callbacks, which leads to subtle concurrency issues. It is probably safer to replace the thread-unsafe data structure with the thread-safe one you had in this PR. Would you be interested in resurrecting this PR? If there is no significant performance impact, we could just take the approach in this PR. cc @ijuma
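The hazard described above can be made concrete with a small, hypothetical illustration (the names below are invented, not Kafka code): a callback that closes over request-scoped, thread-confined state may later be executed by a different thread, silently breaking the confinement assumption that made the unsynchronized cache safe.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: request-scoped state captured in a callback escapes
// the thread it was confined to, e.g. when a delayed operation is completed
// by another thread.
final class CallbackEscapeDemo {
    static boolean callbackRanOnOwnerThread() {
        long ownerThreadId = Thread.currentThread().getId();
        StringBuilder requestScopedState = new StringBuilder(); // not thread-safe

        AtomicLong callbackThreadId = new AtomicLong();
        Runnable callback = () -> {
            requestScopedState.append("completed"); // touches confined state
            callbackThreadId.set(Thread.currentThread().getId());
        };

        // Simulate the callback being completed by some other thread.
        Thread other = new Thread(callback);
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        return callbackThreadId.get() == ownerThreadId;
    }
}
```

Here the callback runs on a different thread than the one owning the state, so any unsynchronized mutation inside it is a latent data race — which is the class of bug KAFKA-15674 describes.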
issue: https://issues.apache.org/jira/browse/KAFKA-10433
This is a hot method, so reusing the ByteBuffer can significantly reduce memory usage if the compression type supports a BufferSupplier.
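To make the allocation saving concrete, here is a hypothetical, self-contained illustration (not the actual validation code) comparing a fresh 64 KB decompression buffer per batch against reusing one buffer across batches, as a BufferSupplier enables:

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of why buffer reuse matters in this hot path:
// validating N compressed batches with a fresh 64 KB buffer each allocates
// N buffers, while reuse needs only one.
final class AllocationDemo {
    static int allocationsWithoutReuse(int batches) {
        int allocations = 0;
        for (int i = 0; i < batches; i++) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // fresh per batch
            allocations++;
        }
        return allocations;
    }

    static int allocationsWithReuse(int batches) {
        int allocations = 0;
        ByteBuffer cached = null; // stands in for the supplier's cached buffer
        for (int i = 0; i < batches; i++) {
            ByteBuffer buffer;
            if (cached != null) {
                cached.clear(); // reuse the previously released buffer
                buffer = cached;
            } else {
                buffer = ByteBuffer.allocate(64 * 1024);
                allocations++;
            }
            cached = buffer; // "release" back for the next batch
        }
        return allocations;
    }
}
```

With 1000 batches this is 1000 allocations (about 64 MB) versus a single one, matching the arithmetic in the #9229 commit message.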
experiment