KAFKA-12410 KafkaApis ought to group fetch data before generating fet… #10269

Closed

chia7712 wants to merge 10 commits into apache:trunk from chia7712:KAFKA-12410

Conversation

chia7712 (Member) commented Mar 5, 2021

The fetch data generated by KafkaApis is re-grouped when it is converted to a FetchResponse. That re-grouping is unnecessary, since KafkaApis can keep the fetch data in a grouped collection from the start. The other main changes are listed below.

  1. remove FetchResponse#of
  2. remove useless constructor from FetchResponse
  3. remove PartitionIterator
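The core idea — collecting fetch data per topic in encounter order, so the response layer never has to re-group it — can be sketched as below. The types `PartitionData` and `TopicResponse` here are simplified, hypothetical stand-ins for Kafka's generated `FetchResponseData` classes, not the real ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedFetchSketch {
    // Simplified stand-ins for the real FetchResponseData types.
    record PartitionData(int partitionIndex, int recordBytes) {}
    record TopicResponse(String topic, List<PartitionData> partitions) {}

    // Group a flat, ordered (topic -> partition data) sequence into per-topic
    // responses. LinkedHashMap preserves the order in which topics are first
    // encountered, which is already the shape the response needs, so no
    // second grouping pass is required later.
    static List<TopicResponse> group(List<Map.Entry<String, PartitionData>> flat) {
        Map<String, List<PartitionData>> byTopic = new LinkedHashMap<>();
        for (Map.Entry<String, PartitionData> e : flat) {
            byTopic.computeIfAbsent(e.getKey(), t -> new ArrayList<>()).add(e.getValue());
        }
        List<TopicResponse> out = new ArrayList<>(byTopic.size());
        byTopic.forEach((topic, parts) -> out.add(new TopicResponse(topic, parts)));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, PartitionData>> flat = List.of(
            Map.entry("logs", new PartitionData(0, 100)),
            Map.entry("metrics", new PartitionData(0, 50)),
            Map.entry("logs", new PartitionData(1, 200)));
        List<TopicResponse> grouped = group(flat);
        System.out.println(grouped.size());                     // 2 topics
        System.out.println(grouped.get(0).partitions().size()); // "logs" has 2 partitions
    }
}
```

Keeping the data in this grouped form is what lets the PR drop `FetchResponse#of` and `PartitionIterator`: the flat-map-to-grouped conversion no longer happens on the response path.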

JMH Tests

  • cpu: intel i9-10900
  • ram: 64 GB
  • IncrementalFetchContextBenchmark

trunk (#10291)

IncrementalFetchContextBenchmark.getResponseSize                                                 avgt   10        18.006 ±       1.749   ms/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.alloc.rate                                  avgt   10       564.158 ±      54.754  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.alloc.rate.norm                             avgt   10  11190518.325 ±      48.355    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Eden_Space                         avgt   10       564.717 ±     102.891  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Eden_Space.norm                    avgt   10  11202996.185 ± 1780869.078    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Old_Gen                            avgt   10         0.001 ±       0.004  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Old_Gen.norm                       avgt   10        17.825 ±      85.222    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Survivor_Space                     avgt   10        12.259 ±      19.367  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Survivor_Space.norm                avgt   10    252405.775 ±  410781.330    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.count                                       avgt   10        48.000                counts
IncrementalFetchContextBenchmark.getResponseSize:·gc.time                                        avgt   10      8665.000                    ms
IncrementalFetchContextBenchmark.updateAndGenerateResponseData                                   avgt   10        16.580 ±       1.551   ms/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.alloc.rate                    avgt   10       581.410 ±      54.691  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.alloc.rate.norm               avgt   10  10649362.111 ±      69.365    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Eden_Space           avgt   10       569.026 ±      86.815  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Eden_Space.norm      avgt   10  10441914.443 ± 1589284.501    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Survivor_Space       avgt   10        11.383 ±      22.870  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Survivor_Space.norm  avgt   10    205891.249 ±  413155.900    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.count                         avgt   10        45.000                counts
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.time                          avgt   10      8692.000                    ms

patch (603da4d)

IncrementalFetchContextBenchmark.getResponseSize                                                 avgt   10        15.883 ±       1.696   ms/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.alloc.rate                                  avgt   10       621.511 ±      66.191  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.alloc.rate.norm                             avgt   10  10823170.429 ±      68.899    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Eden_Space                         avgt   10       615.643 ±     108.319  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Eden_Space.norm                    avgt   10  10747008.602 ± 1840451.381    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Survivor_Space                     avgt   10         4.642 ±      10.201  MB/sec
IncrementalFetchContextBenchmark.getResponseSize:·gc.churn.G1_Survivor_Space.norm                avgt   10     86612.407 ±  193293.062    B/op
IncrementalFetchContextBenchmark.getResponseSize:·gc.count                                       avgt   10        49.000                counts
IncrementalFetchContextBenchmark.getResponseSize:·gc.time                                        avgt   10      9855.000                    ms
IncrementalFetchContextBenchmark.updateAndGenerateResponseData                                   avgt   10        16.457 ±       1.462   ms/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.alloc.rate                    avgt   10       601.353 ±      53.868  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.alloc.rate.norm               avgt   10  10865891.101 ±      60.561    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Eden_Space           avgt   10       597.302 ±      82.682  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Eden_Space.norm      avgt   10  10813118.583 ± 1536423.202    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Survivor_Space       avgt   10         2.892 ±       8.490  MB/sec
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.churn.G1_Survivor_Space.norm  avgt   10     54084.799 ±  159225.286    B/op
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.count                         avgt   10        48.000                counts
IncrementalFetchContextBenchmark.updateAndGenerateResponseData:·gc.time                          avgt   10      8639.000                    ms

Performance Tests

  • cpu: intel i9-10900
  • ram: 64 GB
  • script: benchmark_test.py::Benchmark.test_consumer_throughput
  • loops: 30

case 0: +2.760595128 %

{
  "compression_type": "none",
  "security_protocol": "PLAINTEXT",
  "interbroker_security_protocol": "PLAINTEXT"
}
  • TRUNK: 259.228705 MB/sec
  • PATCH: 266.38496 MB/sec

case 1: -0.6257537776 %

{
  "compression_type": "snappy",
  "security_protocol": "PLAINTEXT",
  "interbroker_security_protocol": "PLAINTEXT"
}
  • TRUNK: 364.161605 MB/sec
  • PATCH: 361.88285 MB/sec
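The percentage deltas reported for each case are simply the patch throughput relative to trunk. A minimal sketch of the arithmetic (class and method names are illustrative):

```java
public class ThroughputDelta {
    // Percentage change of PATCH relative to TRUNK: (patch - trunk) / trunk * 100.
    static double pct(double trunk, double patch) {
        return (patch - trunk) / trunk * 100.0;
    }

    public static void main(String[] args) {
        // case 0 (no compression): reproduces the reported +2.760595128 %
        System.out.printf("case 0: %+.9f %%%n", pct(259.228705, 266.38496));
        // case 1 (snappy): reproduces the reported -0.6257537776 %
        System.out.printf("case 1: %+.10f %%%n", pct(364.161605, 361.88285));
    }
}
```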

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712 chia7712 requested a review from ijuma March 5, 2021 05:49
// Iterator that goes over the given partition map and selects partitions that need to be included in the response.
// If updateFetchContextAndRemoveUnselected is set to true, the fetch context will be updated for the selected
// partitions and also remove unselected ones as they are encountered.
private class PartitionIterator(val iter: FetchSession.RESP_MAP_ITER,
chia7712 (Member, Author) commented:

The iterator is unnecessary, since we have to generate a list collection anyway in order to calculate the message size.


def partitionsToLogString(partitions: util.Collection[TopicPartition]): String =
FetchSession.partitionsToLogString(partitions, isTraceEnabled)
def partitionsToLogString(topics: FetchSession.RESP_MAP): String = {
chia7712 (Member, Author) commented:

This method is only used for DEBUG-level logging, so it should be fine to iterate through the whole collection.
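The level-guarded logging pattern referred to here can be sketched as follows; the method name and summary format are illustrative, not Kafka's exact implementation. The point is that the O(n) string construction only happens when the verbose level is actually enabled.

```java
import java.util.List;

public class LogGuardSketch {
    // Building the full partition listing costs an O(n) pass over the
    // collection; when tracing is off, return only a cheap summary so the
    // iteration cost is never paid on the hot path.
    static String partitionsToLogString(List<String> partitions, boolean traceEnabled) {
        if (!traceEnabled) {
            return partitions.size() + " partition(s)";
        }
        return "(" + String.join(", ", partitions) + ")";
    }

    public static void main(String[] args) {
        List<String> parts = List.of("topic-0", "topic-1");
        System.out.println(partitionsToLogString(parts, false)); // cheap summary
        System.out.println(partitionsToLogString(parts, true));  // full listing
    }
}
```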

// the callback for processing a fetch response, invoked before throttling
def processResponseCallback(responsePartitionData: Seq[(TopicPartition, FetchPartitionData)]): Unit = {
val partitions = new util.LinkedHashMap[TopicPartition, FetchResponseData.PartitionData]
val topicResponses = new util.ArrayList[FetchResponseData.FetchableTopicResponse]()
chia7712 (Member, Author) commented:

This is the main purpose of this PR: KafkaApis keeps the data grouped.

@dajac dajac self-requested a review March 5, 2021 06:52
dajac (Member) commented Mar 5, 2021

Nice PR! I will take a look at it on Monday.

dajac (Member) left a review comment:

I've left a few comments. I haven't been able to read the whole PR yet.

Four outdated comment threads on core/src/main/scala/kafka/server/KafkaApis.scala
ijuma (Member) commented Mar 5, 2021

Thanks for the PR. Can you check the perf impact of these changes?

chia7712 (Member, Author) commented Mar 8, 2021

Thanks for the PR. Can you check the perf impact of these changes?

sure. will add benchmark results tomorrow.

Comment thread core/src/main/scala/kafka/server/KafkaApis.scala
chia7712 (Member, Author) commented Mar 9, 2021

@ijuma The results of the performance tests are attached. They do not show an obvious performance regression. Will run more tests tomorrow.

ijuma (Member) commented Mar 9, 2021

@chia7712 I was thinking about the jmh microbenchmarks that stress fetch, fetch session and so on.

chia7712 (Member, Author) commented Mar 9, 2021

I was thinking about the jmh microbenchmarks that stress fetch, fetch session and so on.

will copy that.

chia7712 (Member, Author) commented:

@ijuma the JMH results for the fetch session are attached. I tried to write a JMH benchmark that stresses fetch as well, but KafkaApis.handleFetchRequest is hard to benchmark with JMH; it would require a lot of changes ...

jolshan (Member) commented Mar 16, 2021

@chia7712 is there any work in progress for a KafkaApis.handleFetchRequest test? I suspect it would be similar but maybe a bit harder than what I did for the LeaderAndIsr version #10071 (trading replicamanager for fetchmanager, etc). This benchmark would be helpful for #9944 as you could probably guess :)

chia7712 (Member, Author) commented:

is there any work in progress for a KafkaApis.handleFetchRequest test? I suspect it would be similar but maybe a bit harder than what I did for the LeaderAndIsr version #10071 (trading replicamanager for fetchmanager, etc). This benchmark would be helpful for #9944 as you could probably guess :)

this PR is blocked by #9944. This PR (and other related issues) aim to remove all extra collection creation by using auto-generated data. In #9944 we have to create a lot of collections to handle the topic id in fetch request. Hence, I need to rethink the value (and approach) of this PR :)

jolshan (Member) commented Mar 17, 2021

@chia7712 I'm rewriting #9944 to use the autogenerated structures based on this PR. Just pushed a version that simplifies the unresolved topic ID handling. I tried to make it easier to build the fetch response using the data object. Going to try to build the response using the data object in most places today and I can push that version as soon as I can.

chia7712 (Member, Author) commented:

I'm rewriting #9944 to use the autogenerated structures based on this PR. Just pushed a version that simplifies the unresolved topic ID handling. I tried to make it easier to build the fetch response using the data object. Going to try to build the response using the data object in most places today and I can push that version as soon as I can.

sounds good. If the new approach is very different from #9944, please open a new PR in order to compare them :)

jolshan (Member) commented Mar 19, 2021

sounds good. If the new approach is very different from #9944, please open a new PR in order to compare them :)

@chia7712 I've updated the code. I think this is the direction we want so I didn't open a PR.
Biggest changes to the structure are in these commits:
5a0a6d6
a0b2bc9
ace90b0

The idea is that FetchSession can now generate a list of the unresolvedTopics' FetchResponseData.FetchableTopicResponse. Hopefully from there, it is not too difficult to combine with your approach here.

But let me know if it's hard to read. I can open a new one and revert the changes on the old.

@chia7712 chia7712 closed this Mar 25, 2024
@chia7712 chia7712 deleted the KAFKA-12410 branch March 25, 2024 15:21