Skip to content

MINOR: Reduce sends created by SendBuilder#9619

Merged
hachikuji merged 4 commits intoapache:trunkfrom
hachikuji:improve-send-builder
Nov 19, 2020
Merged

MINOR: Reduce sends created by SendBuilder#9619
hachikuji merged 4 commits intoapache:trunkfrom
hachikuji:improve-send-builder

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji commented Nov 18, 2020

In #9401, I observed some strange latency behavior when testing locally (on macos). The behavior seemed to be caused by slightly different write behavior as a result of the new SendBuilder class. After the change, we were making several calls down to both GatheringByteChannel.write(ByteBuffer[]) as well as WritableByteChannel.write(ByteBuffer) whereas previously there would be just one path through the former. The difference is clearly seen in the flame graphs posted here: #9401 (comment).

This patch changes the grouping of Send objects created by SendBuilder to try and restore the previous write behavior. Rather than creating a single ByteBufferSend for each ByteBuffer, we attempt to group consecutive buffers into a single ByteBufferSend. I confirmed that did indeed address the latency issue I observed using the same test setup with a single broker that was discussed in #9401.

bin/kafka-topics.sh --create --topic foo --replication-factor 1 --partitions 10 --bootstrap-server $BROKER
bin/kafka-producer-perf-test.sh --topic foo --num-records 250000000 --throughput -1  --record-size 256 --producer-props bootstrap.servers=$BROKER

Here are the results:

Before KAFKA-9628:
250000000 records sent, 1398546.630342 records/sec (341.44 MB/sec), 39.71 ms avg latency, 424.00 ms max latency, 16 ms 50th, 137 ms 95th, 194 ms 99th, 270 ms 99.9th.

Trunk:
250000000 records sent, 1413619.374502 records/sec (345.12 MB/sec), 87.72 ms avg latency, 2058.00 ms max latency, 64 ms 50th, 247 ms 95th, 429 ms 99th, 790 ms 99.9th.                                             

This patch:
250000000 records sent, 1469723.691946 records/sec (358.82 MB/sec), 12.72 ms avg latency, 320.00 ms max latency, 2 ms 50th, 83 ms 95th, 162 ms 99th, 256 ms 99.9th.                                                

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji
Copy link
Copy Markdown
Contributor Author

cc @chia7712 @dajac

Copy link
Copy Markdown
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji +1 to this improvement.

just curious. Why the "big" latency can be made without this improvement. I run the perf on my machine and can't get such big different latency. Is MultiRecordsSend a bad pattern to our code?

Comment thread clients/src/main/java/org/apache/kafka/common/protocol/SendBuilder.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/protocol/SendBuilder.java Outdated
@hachikuji
Copy link
Copy Markdown
Contributor Author

@chia7712 To be honest, I'm not sure. I think it makes sense in general to try and batch the writes together to reduce number of calls to the kernel in any case, but I am not sure why it has such a big impact locally. I also didn't see anything like this when I was testing in on linux, but I didn't feel great about the difference anyway.

@chia7712
Copy link
Copy Markdown
Member

+1 to last commit

Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice optimization!

Comment thread clients/src/main/java/org/apache/kafka/common/protocol/SendBuilder.java Outdated
Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the update.

@hachikuji hachikuji merged commit 6054837 into apache:trunk Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants