
Conversation

Contributor
@codelipenghui codelipenghui commented Jul 1, 2022

Motivation

After running the test with a partitioned topic (a partitioned topic with only 1 partition) and 4 IO threads:

bin/pulsar-perf produce test -r 500000 -s 1 -o 10000 -threads 2
bin/pulsar-perf consume test -q 100000 -ioThreads 4

The consumer showed very poor performance:

Profiling started

2022-07-01T23:03:13,230+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received:  216223 msg --- 21517.245  msg/s --- 0.164 Mbit/s  --- Latency: mean: 2702.319 ms - med: 2644 - 95pct: 4594 - 99pct: 4758 - 99.9pct: 4844 - 99.99pct: 4854 - Max: 4854
2022-07-01T23:03:23,494+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received:  812896 msg --- 58004.006  msg/s --- 0.443 Mbit/s  --- Latency: mean: 9826.540 ms - med: 9992 - 95pct: 12905 - 99pct: 13210 - 99.9pct: 13284 - 99.99pct: 13288 - Max: 13288
2022-07-01T23:03:33,506+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 1548826 msg --- 73501.942  msg/s --- 0.561 Mbit/s  --- Latency: mean: 18000.596 ms - med: 18038 - 95pct: 22170 - 99pct: 22501 - 99.9pct: 22585 - 99.99pct: 22593 - Max: 22594
2022-07-01T23:03:43,519+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 2271634 msg --- 72190.579  msg/s --- 0.551 Mbit/s  --- Latency: mean: 26907.162 ms - med: 26914 - 95pct: 30770 - 99pct: 31196 - 99.9pct: 31290 - 99.99pct: 31297 - Max: 31298
2022-07-01T23:03:53,527+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 2852575 msg --- 58021.315  msg/s --- 0.443 Mbit/s  --- Latency: mean: 35547.844 ms - med: 35503 - 95pct: 39662 - 99pct: 40044 - 99.9pct: 40120 - 99.99pct: 40129 - Max: 40130
2022-07-01T23:04:03,539+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 3317622 msg --- 46424.278  msg/s --- 0.354 Mbit/s  --- Latency: mean: 44615.969 ms - med: 44597 - 95pct: 48807 - 99pct: 49149 - 99.9pct: 49218 - 99.99pct: 49224 - Max: 49225
2022-07-01T23:04:13,554+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 3715533 msg --- 39746.763  msg/s --- 0.303 Mbit/s  --- Latency: mean: 53487.733 ms - med: 53445 - 95pct: 57427 - 99pct: 58195 - 99.9pct: 58413 - 99.99pct: 58425 - Max: 58443

image

consumer01.html.txt

Use batch receive internally in MultiTopicsConsumerImpl to fix the performance issue.
After this PR:

Profiling started

2022-07-04T18:11:26,136+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 113768036 msg --- 1033960.241  msg/s --- 7.888 Mbit/s  --- Latency: mean: 26.907 ms - med: 11 - 95pct: 186 - 99pct: 333 - 99.9pct: 345 - 99.99pct: 346 - Max: 347
2022-07-04T18:11:36,148+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 123437199 msg --- 965982.869  msg/s --- 7.370 Mbit/s  --- Latency: mean: 17.216 ms - med: 11 - 95pct: 81 - 99pct: 100 - 99.9pct: 202 - 99.99pct: 231 - Max: 242
Profiling started
2022-07-04T18:11:46,163+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 133784660 msg --- 1033280.877  msg/s --- 7.883 Mbit/s  --- Latency: mean: 29.536 ms - med: 11 - 95pct: 169 - 99pct: 194 - 99.9pct: 198 - 99.99pct: 200 - Max: 200
2022-07-04T18:11:56,176+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 143800200 msg --- 1000179.779  msg/s --- 7.631 Mbit/s  --- Latency: mean: 10.326 ms - med: 10 - 95pct: 17 - 99pct: 23 - 99.9pct: 28 - 99.99pct: 29 - Max: 30
2022-07-04T18:12:06,192+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 153796349 msg --- 998201.268  msg/s --- 7.616 Mbit/s  --- Latency: mean: 12.046 ms - med: 9 - 95pct: 29 - 99pct: 35 - 99.9pct: 36 - 99.99pct: 37 - Max: 39

image

consumer2.html.txt

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 1, 2022
@codelipenghui codelipenghui self-assigned this Jul 1, 2022
@codelipenghui codelipenghui added this to the 2.11.0 milestone Jul 1, 2022
@codelipenghui codelipenghui added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages component/client-java release/2.8.4 release/2.10.2 release/2.9.4 and removed release/2.8.4 labels Jul 1, 2022
@codelipenghui codelipenghui requested a review from merlimat July 1, 2022 16:55
.maxNumBytes(-1)
.timeout(0, TimeUnit.MILLISECONDS)
.build();
configurationData.setBatchReceivePolicy(internalBatchReceivePolicy);
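For context, the lines above are the tail of a builder chain. Assuming the public Pulsar BatchReceivePolicy builder API, the internal policy being constructed looks roughly like this sketch (the maxNumMessages(-1) line is an assumption, since that part of the chain is cut off above):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.BatchReceivePolicy;

// Sketch of the internal batch receive policy. With a zero timeout,
// batchReceiveAsync() returns whatever is already buffered rather than
// waiting for more messages. (maxNumMessages(-1) is an assumption; the
// diff above only shows the maxNumBytes and timeout calls.)
BatchReceivePolicy internalBatchReceivePolicy = BatchReceivePolicy.builder()
        .maxNumMessages(-1)   // no per-batch message count limit
        .maxNumBytes(-1)      // no per-batch byte size limit
        .timeout(0, TimeUnit.MILLISECONDS)
        .build();
configurationData.setBatchReceivePolicy(internalBatchReceivePolicy);
```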
Contributor

There are other 2 places creating ConsumerImpl in this class.

Contributor Author

@Jason918 Fixed.

@codelipenghui codelipenghui requested a review from Jason918 July 2, 2022 12:02
private void receiveMessageFromConsumer(ConsumerImpl<T> consumer) {
consumer.receiveAsync().thenAcceptAsync(message -> {
CompletableFuture<List<Message<T>>> messagesFuture;
if (consumer.numMessagesInQueue() >= 10) {
Member

Why 10?

Contributor Author

Just to make sure it's worth using the batch receive API. If fewer than 10 messages are in the internal consumer's queue, the batch receive operation would still create an ArrayList for little benefit. And I think we'd better not introduce a new configuration option here, to avoid making the client configuration more complex.
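The threshold decision described above can be modeled with a small self-contained sketch (illustrative names and a plain queue, not the actual Pulsar client code): drain the buffered messages as one batch only when at least 10 are queued, otherwise take a single message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the >= 10 check: batch-drain only when enough
// messages are buffered to justify the per-batch ArrayList allocation;
// otherwise fall back to taking a single message (like receiveAsync()).
public class BatchThreshold {
    static final int BATCH_RECEIVE_THRESHOLD = 10;

    static <T> List<T> receive(Queue<T> incoming) {
        if (incoming.size() >= BATCH_RECEIVE_THRESHOLD) {
            // Batch path: drain everything currently buffered in one list.
            List<T> batch = new ArrayList<>(incoming.size());
            T msg;
            while ((msg = incoming.poll()) != null) {
                batch.add(msg);
            }
            return batch;
        }
        // Single-message path.
        List<T> single = new ArrayList<>(1);
        if (!incoming.isEmpty()) {
            single.add(incoming.poll());
        }
        return single;
    }
}
```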

protected abstract void updateAutoScaleReceiverQueueHint();

protected boolean hasEnoughMessagesForBatchReceive() {
if (batchReceivePolicy.getTimeoutMs() <= 0) {
Contributor

This looks like a breaking change: if a user sets this to 0, batchReceive() would immediately return a Messages object containing no messages when there is nothing to consume.

Contributor Author

@Jason918 Fixed, using 1ms instead.

private void receiveMessageFromConsumer(ConsumerImpl<T> consumer) {
consumer.receiveAsync().thenAcceptAsync(message -> {
CompletableFuture<List<Message<T>>> messagesFuture;
if (consumer.numMessagesInQueue() >= 10) {
Contributor

Since it is not configurable, should we test out a good threshold? Is 10 the best value?

Member

Can we add a new batchReceive implementation that directly takes as many messages as possible from incomingMessages? That way you wouldn't need to check the size via numMessagesInQueue.

Something like just running this logic:

                MessagesImpl<T> messages = getNewMessagesImpl();
                Message<T> msgPeeked = incomingMessages.peek();
                while (msgPeeked != null && messages.canAdd(msgPeeked)) {
                    Message<T> msg = incomingMessages.poll();
                    if (msg != null) {
                        messageProcessed(msg);
                        if (!isValidConsumerEpoch(msg)) {
                            msgPeeked = incomingMessages.peek();
                            continue;
                        }
                        Message<T> interceptMsg = beforeConsume(msg);
                        messages.add(interceptMsg);
                    }
                    msgPeeked = incomingMessages.peek();
                }

Member

Seeing your latest changes: ah, you just call batch receive directly.

@codelipenghui codelipenghui force-pushed the penghui/improve-multi-topic-consumer-performance branch from a950955 to 258593b Compare July 4, 2022 14:18
// Call receiveAsync() if the incoming queue is not full. Because this block is run with
// thenAcceptAsync, there is no chance for recursion that would lead to stack overflow.
receiveMessageFromConsumer(consumer);
receiveMessageFromConsumer(consumer, messages.size() > 0);
Contributor

If receiveMessageFromConsumer is entered from this line and messages.size() == 0 in this cycle, is there any chance batch receive will be called in the future?

Contributor Author

Yes. If messages.size() == 0, the current round of receiving messages from the internal consumer will use consumer.receiveAsync(). Once the internal consumer has new incoming messages, the next round will use batchReceiveAsync() again.
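The round-based fallback described above can be sketched with a self-contained model (illustrative names, plain queue, not the real MultiTopicsConsumerImpl code): an empty round falls back to a single receive, and the round after a successful receive batches again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the alternation: when the previous round yielded
// messages, the next round batch-drains the buffer; when it yielded nothing,
// the next round takes at most one message (like a plain receiveAsync()).
public class ReceiveRounds {
    static <T> List<T> round(Queue<T> incoming, boolean lastRoundHadMessages) {
        List<T> out = new ArrayList<>();
        if (lastRoundHadMessages) {
            // Batch path: drain everything currently buffered.
            T msg;
            while ((msg = incoming.poll()) != null) {
                out.add(msg);
            }
        } else if (!incoming.isEmpty()) {
            // Single-receive fallback after an empty round.
            out.add(incoming.poll());
        }
        return out;
    }
}
```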

… consumer with more than one IO thread

### Motivation

After running the test with a partitioned topic and 4 IO threads.

```
bin/pulsar-perf produce test -r 500000 -s 1 -mk random -o 10000 -threads 2
bin/pulsar-perf consume test -q 100000 -ioThreads 4
```

The consumer showed very poor performance:

```
Profiling started

2022-07-01T23:03:13,230+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received:  216223 msg --- 21517.245  msg/s --- 0.164 Mbit/s  --- Latency: mean: 2702.319 ms - med: 2644 - 95pct: 4594 - 99pct: 4758 - 99.9pct: 4844 - 99.99pct: 4854 - Max: 4854
2022-07-01T23:03:23,494+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received:  812896 msg --- 58004.006  msg/s --- 0.443 Mbit/s  --- Latency: mean: 9826.540 ms - med: 9992 - 95pct: 12905 - 99pct: 13210 - 99.9pct: 13284 - 99.99pct: 13288 - Max: 13288
2022-07-01T23:03:33,506+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 1548826 msg --- 73501.942  msg/s --- 0.561 Mbit/s  --- Latency: mean: 18000.596 ms - med: 18038 - 95pct: 22170 - 99pct: 22501 - 99.9pct: 22585 - 99.99pct: 22593 - Max: 22594
2022-07-01T23:03:43,519+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 2271634 msg --- 72190.579  msg/s --- 0.551 Mbit/s  --- Latency: mean: 26907.162 ms - med: 26914 - 95pct: 30770 - 99pct: 31196 - 99.9pct: 31290 - 99.99pct: 31297 - Max: 31298
2022-07-01T23:03:53,527+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 2852575 msg --- 58021.315  msg/s --- 0.443 Mbit/s  --- Latency: mean: 35547.844 ms - med: 35503 - 95pct: 39662 - 99pct: 40044 - 99.9pct: 40120 - 99.99pct: 40129 - Max: 40130
2022-07-01T23:04:03,539+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 3317622 msg --- 46424.278  msg/s --- 0.354 Mbit/s  --- Latency: mean: 44615.969 ms - med: 44597 - 95pct: 48807 - 99pct: 49149 - 99.9pct: 49218 - 99.99pct: 49224 - Max: 49225
2022-07-01T23:04:13,554+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 3715533 msg --- 39746.763  msg/s --- 0.303 Mbit/s  --- Latency: mean: 53487.733 ms - med: 53445 - 95pct: 57427 - 99pct: 58195 - 99.9pct: 58413 - 99.99pct: 58425 - Max: 58443
```

Use batch receive internally in MultiTopicsConsumerImpl to fix the performance issue.
After this PR:

```
Profiling started

2022-07-02T00:15:33,291+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 91938234 msg --- 500221.875  msg/s --- 3.816 Mbit/s  --- Latency: mean: 17.755 ms - med: 18 - 95pct: 23 - 99pct: 28 - 99.9pct: 34 - 99.99pct: 36 - Max: 36
2022-07-02T00:15:43,308+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 96929487 msg --- 498127.011  msg/s --- 3.800 Mbit/s  --- Latency: mean: 27.666 ms - med: 18 - 95pct: 80 - 99pct: 98 - 99.9pct: 104 - 99.99pct: 105 - Max: 106
2022-07-02T00:15:53,328+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 101955467 msg --- 501660.867  msg/s --- 3.827 Mbit/s  --- Latency: mean: 19.226 ms - med: 18 - 95pct: 32 - 99pct: 59 - 99.9pct: 66 - 99.99pct: 67 - Max: 68
2022-07-02T00:16:03,356+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 106959191 msg --- 499143.020  msg/s --- 3.808 Mbit/s  --- Latency: mean: 132.371 ms - med: 28 - 95pct: 474 - 99pct: 505 - 99.9pct: 511 - 99.99pct: 515 - Max: 515
2022-07-02T00:16:13,378+0800 [main] INFO  org.apache.pulsar.testclient.PerformanceConsumer - Throughput received: 111982006 msg --- 501017.414  msg/s --- 3.822 Mbit/s  --- Latency: mean: 21.249 ms - med: 18 - 95pct: 53 - 99pct: 74 - 99.9pct: 81 - 99.99pct: 83 - Max: 84
```
@codelipenghui codelipenghui force-pushed the penghui/improve-multi-topic-consumer-performance branch from 258593b to 07280ff Compare July 5, 2022 03:29
@codelipenghui codelipenghui merged commit bdda1eb into apache:master Jul 5, 2022
@codelipenghui codelipenghui deleted the penghui/improve-multi-topic-consumer-performance branch July 5, 2022 06:31
codelipenghui added a commit that referenced this pull request Jul 10, 2022
…th more than one IO thread (#16336)

(cherry picked from commit bdda1eb)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jul 11, 2022
…th more than one IO thread (apache#16336)

(cherry picked from commit bdda1eb)
(cherry picked from commit 1649ef4)
wuxuanqicn pushed a commit to wuxuanqicn/pulsar that referenced this pull request Jul 14, 2022
congbobo184 pushed a commit that referenced this pull request Nov 10, 2022
…th more than one IO thread (#16336)

(cherry picked from commit bdda1eb)
@congbobo184 congbobo184 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Nov 10, 2022
congbobo184 pushed a commit that referenced this pull request Nov 26, 2022
…th more than one IO thread (#16336)

(cherry picked from commit bdda1eb)
Labels

cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-not-needed Your PR changes do not impact docs release/2.9.4 release/2.10.2 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
