KAFKA-10433 Reuse the ByteBuffer in validating compressed records #9220

Closed

chia7712 wants to merge 1 commit into apache:trunk from chia7712:KAFKA-10433

Conversation

@chia7712 (Member) commented Aug 25, 2020

issue: https://issues.apache.org/jira/browse/KAFKA-10433

This is a hot method, so reusing the ByteBuffer can substantially reduce memory allocation when the compression type supports BufferSupplier.

experiment

  • duration: 1 minute
  • compression: LZ4
  • workload: 10 producers -> 1 broker
  • memory usage of byte array allocation: 4.52 GB -> 0.68 GB
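To illustrate the reuse pattern this PR targets, here is a minimal, self-contained sketch of a caching buffer supplier. The class and its internals are hypothetical (Kafka's real `BufferSupplier` lives in `org.apache.kafka.common.utils` and is more elaborate), but the get/release contract is the same idea: a released buffer is handed back on the next request instead of allocating a fresh array.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical minimal sketch of a caching buffer supplier. Callers get() a
// buffer, use it for decompression, then release() it so the next get() can
// reuse the same allocation instead of allocating a new byte array.
public class CachingBufferSupplier {
    private final ArrayDeque<ByteBuffer> cache = new ArrayDeque<>();

    // Return a cached buffer if one is large enough, else allocate a new one.
    public ByteBuffer get(int capacity) {
        ByteBuffer cached = cache.peekFirst();
        if (cached != null && cached.capacity() >= capacity) {
            cache.pollFirst();
            cached.clear();
            return cached;
        }
        return ByteBuffer.allocate(capacity);
    }

    // Hand the buffer back to the cache for later reuse.
    public void release(ByteBuffer buffer) {
        buffer.clear();
        cache.addFirst(buffer);
    }

    public static void main(String[] args) {
        CachingBufferSupplier supplier = new CachingBufferSupplier();
        ByteBuffer first = supplier.get(64 * 1024);   // decompression-sized buffer
        supplier.release(first);
        ByteBuffer second = supplier.get(64 * 1024);  // reuses the released buffer
        System.out.println("reused: " + (first == second));
    }
}
```

With per-batch buffers this cache turns one allocation per batch into one allocation per supplier lifetime, which is where the 4.52 GB → 0.68 GB reduction above comes from.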

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ijuma (Member) commented Aug 25, 2020

@chia7712 Thanks for the PR. The intent is good, but I think the approach should be a bit different. As it happens, I have implemented this other approach. Would you be OK if I submit that as a PR and we can compare?

@chia7712 (Member, Author) commented

> Would you be OK if I submit that as a PR and we can compare?

Please feel free to submit another PR. We would all love to see a better solution :)

@ijuma (Member) commented Aug 29, 2020

@chia7712 here's what I had in mind:
#9229

What do you think?

@chia7712 (Member, Author) commented

Closing, as there is a better approach (#9229).

@chia7712 chia7712 closed this Aug 30, 2020
ijuma added a commit that referenced this pull request May 30, 2021
Use a caching `BufferSupplier` per request handler thread so that
decompression buffers are cached if supported by the underlying
`CompressionType`. This achieves a similar outcome as #9220, but
with less contention.

We introduce a `RequestLocal` class to make it easier to introduce
new request scoped stateful instances (one example we discussed
previously was an `ActionQueue` that could be used to avoid
some of the complex group coordinator locking).

This is a small win for zstd (no synchronization or soft references) and
a more significant win for lz4. In particular, it reduces allocations
significantly when the number of partitions is high. The decompression
buffer size is typically 64 KB, so a produce request with 1000 partitions
results in 64 MB of allocations even if each produce batch is small (likely,
when there are so many partitions).

I did a quick local producer perf test with 5000 partitions, 1 KB record size,
1 broker, lz4 and ~0.5 for the producer compression rate metric:

Before this change:
> 20000000 records sent, 346314.349535 records/sec (330.27 MB/sec),
148.33 ms avg latency, 2267.00 ms max latency, 115 ms 50th, 383 ms 95th, 777 ms 99th, 1514 ms 99.9th.

After this change:
> 20000000 records sent, 431956.113259 records/sec (411.95 MB/sec),
117.79 ms avg latency, 1219.00 ms max latency, 99 ms 50th, 295 ms 95th, 440 ms 99th, 662 ms 99.9th.

That's a 25% throughput improvement and p999 latency was reduced to
under half (in this test).

Default arguments will be removed in a subsequent PR.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
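The per-thread caching described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not Kafka's actual code: the names `RequestLocal` and `BufferSupplier` follow the commit's description, but the real classes differ, and this sketch uses a plain `ThreadLocal` to stand in for "one instance per request handler thread".

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hedged sketch of the approach in the commit: a non-thread-safe caching
// supplier that is safe only because each request handler thread owns its
// own instance, so no synchronization or contention is involved.
public class RequestLocalSketch {

    static class BufferSupplier {
        private final ArrayDeque<ByteBuffer> cache = new ArrayDeque<>();

        ByteBuffer get(int capacity) {
            ByteBuffer b = cache.pollFirst();
            // Too-small cached buffers are simply discarded and reallocated.
            return (b != null && b.capacity() >= capacity) ? b : ByteBuffer.allocate(capacity);
        }

        void release(ByteBuffer buffer) {
            buffer.clear();
            cache.addFirst(buffer);
        }
    }

    // One supplier per thread: buffers are reused across requests handled by
    // the same thread, with zero cross-thread contention.
    private static final ThreadLocal<BufferSupplier> REQUEST_LOCAL =
            ThreadLocal.withInitial(BufferSupplier::new);

    static BufferSupplier current() {
        return REQUEST_LOCAL.get();
    }

    public static void main(String[] args) {
        BufferSupplier supplier = current();
        ByteBuffer b = supplier.get(64 * 1024); // typical decompression buffer size
        supplier.release(b);
        System.out.println("same per-thread supplier: " + (current() == supplier));
    }
}
```

The trade-off discussed later in this thread follows directly from this design: the supplier itself is not thread-safe, so it must never escape its owning thread, e.g. by being captured in a callback that runs elsewhere.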
@junrao (Contributor) commented Oct 27, 2023

@chia7712: We filed KAFKA-15674 to revisit this. The current implementation (#9229) passes a thread-unsafe data structure around to multiple places, such as the ReplicaManager, GroupCoordinator, TxnCoordinator, etc. Each of those places has logic to add callbacks. If one is not careful (as in KAFKA-15674), the thread-unsafe data structure can end up captured in those callbacks, which leads to subtle concurrency issues.

It's probably safer to replace the thread-unsafe data structure with the thread-safe one you had in this PR. Would you be interested in resurrecting this PR? If there is no significant performance impact, we could just take the approach in this PR.

cc @ijuma
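The thread-safe alternative junrao refers to could look roughly like the sketch below: a shared lock-free pool that any thread may call, so accidentally capturing it in a callback that runs on another thread is no longer a correctness hazard, only a potential performance cost. This is an illustration, not the code from the original PR; the class name and `poolSize` helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hedged sketch of a thread-safe buffer supplier backed by a lock-free queue.
// Any thread may get() and release() buffers; visibility and ordering are
// handled by ConcurrentLinkedQueue, at the cost of some contention.
public class ThreadSafeBufferSupplier {
    private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();

    public ByteBuffer get(int capacity) {
        ByteBuffer b = pool.poll();              // lock-free dequeue
        if (b != null && b.capacity() >= capacity) {
            b.clear();
            return b;
        }
        return ByteBuffer.allocate(capacity);    // miss (or too small): allocate
    }

    public void release(ByteBuffer buffer) {
        buffer.clear();
        pool.offer(buffer);                      // lock-free enqueue
    }

    int poolSize() {
        return pool.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeBufferSupplier supplier = new ThreadSafeBufferSupplier();
        // Get and release a buffer on a different thread; with a thread-unsafe
        // supplier this pattern would be a subtle bug, here it is fine.
        Thread t = new Thread(() -> supplier.release(supplier.get(64 * 1024)));
        t.start();
        t.join();
        System.out.println("pooled buffers: " + supplier.poolSize());
    }
}
```

The design choice is the same trade-off weighed in this thread: the per-thread `RequestLocal` approach avoids contention entirely but demands care about where the instance flows, while a shared thread-safe pool is forgiving about usage at a modest synchronization cost.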

