KAFKA-9628 Replace Produce request/response with automated protocol #9401

Merged
hachikuji merged 15 commits into apache:trunk from chia7712:KAFKA-9628-1 on Nov 18, 2020

@chia7712
Member

@chia7712 chia7712 commented Oct 9, 2020

issue: https://issues.apache.org/jira/browse/KAFKA-9628

Benchmark

  1. loop 30 times
  2. calculate average

kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput

@cluster(num_nodes=5)
@parametrize(acks=-1, topic=TOPIC_REP_THREE)

  • +0.3144915325 %
  • 28.08766667 -> 28.1715625 (mb_per_sec)

@cluster(num_nodes=5)
@matrix(acks=[1], topic=[TOPIC_REP_THREE], message_size=[100000],compression_type=["none"], security_protocol=['PLAINTEXT'])

  • +4.220730323 %
  • 157.145 -> 163.7776667 (mb_per_sec)

@cluster(num_nodes=7)
@parametrize(acks=1, topic=TOPIC_REP_THREE, num_producers=3)

  • +5.996241145%
  • 57.64166667 -> 61.098 (mb_per_sec)

@cluster(num_nodes=5)
@parametrize(acks=1, topic=TOPIC_REP_THREE)

  • +0.3979572536%
  • 44.05833333 -> 44.23366667 (mb_per_sec)

@cluster(num_nodes=5)
@parametrize(acks=1, topic= TOPIC_REP_ONE)

  • +2.228235226%
  • 69.23266667 -> 70.77533333 (mb_per_sec)
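The percentages above follow from the averaged throughput of the baseline and the patched build. A minimal sketch of that arithmetic, assuming the per-run samples are simply averaged (the class and method names are illustrative and not part of the Kafka test suite):

```java
import java.util.Arrays;

final class ThroughputComparison {
    // Arithmetic mean of the per-run throughput samples (MB/sec).
    static double average(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative change from the baseline to the patched build, in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Averages quoted above for acks=1, message_size=100000.
        double before = 157.145;
        double after = 163.7776667;
        System.out.println("change (%): " + percentChange(before, after));
    }
}
```

A positive value means the patched build moved more data per second than the baseline.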

JMH results

In short, most operations regress because we have to convert the data to protocol data. This cost is unavoidable (as with the other requests/responses) until we use protocol data directly.

JMH for ProduceRequest

  1. construction regression:
    • 281.474 -> 454.935 ns/op
    • 1216.000 -> 1888.000 B/op
  2. toErrorResponse regression:
    • 41.942 -> 107.528 ns/op
    • 296.000 -> 576.000 B/op
  3. toStruct improvement:
    • 255.185 -> 90.728 ns/op
    • 864.000 -> 304.000 B/op

BEFORE

Benchmark                                                                        Mode  Cnt     Score    Error   Units
ProducerRequestBenchmark.constructorErrorResponse                                avgt   15    41.942 ±  0.036   ns/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.alloc.rate                 avgt   15  6409.263 ±  5.478  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.alloc.rate.norm            avgt   15   296.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Eden_Space        avgt   15  6416.420 ± 76.071  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Eden_Space.norm   avgt   15   296.331 ±  3.539    B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Old_Gen           avgt   15     0.002 ±  0.002  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Old_Gen.norm      avgt   15    ≈ 10⁻⁴             B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.count                      avgt   15   698.000           counts
ProducerRequestBenchmark.constructorErrorResponse:·gc.time                       avgt   15   378.000               ms
ProducerRequestBenchmark.constructorProduceRequest                               avgt   15   281.474 ±  3.286   ns/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.alloc.rate                avgt   15  3923.868 ± 46.303  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.alloc.rate.norm           avgt   15  1216.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Eden_Space       avgt   15  3923.375 ± 59.568  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Eden_Space.norm  avgt   15  1215.844 ± 11.184    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Old_Gen          avgt   15     0.004 ±  0.001  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Old_Gen.norm     avgt   15     0.001 ±  0.001    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.count                     avgt   15   515.000           counts
ProducerRequestBenchmark.constructorProduceRequest:·gc.time                      avgt   15   279.000               ms
ProducerRequestBenchmark.constructorStruct                                       avgt   15   255.185 ±  0.069   ns/op
ProducerRequestBenchmark.constructorStruct:·gc.alloc.rate                        avgt   15  3074.889 ±  0.823  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.alloc.rate.norm                   avgt   15   864.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Eden_Space               avgt   15  3077.737 ± 31.537  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Eden_Space.norm          avgt   15   864.800 ±  8.823    B/op
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Old_Gen                  avgt   15     0.003 ±  0.001  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Old_Gen.norm             avgt   15     0.001 ±  0.001    B/op
ProducerRequestBenchmark.constructorStruct:·gc.count                             avgt   15   404.000           counts
ProducerRequestBenchmark.constructorStruct:·gc.time                              avgt   15   214.000               ms

AFTER

Benchmark                                                                        Mode  Cnt     Score    Error   Units
ProducerRequestBenchmark.constructorErrorResponse                                avgt   15   107.528 ±  0.270   ns/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.alloc.rate                 avgt   15  4864.899 ± 12.132  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.alloc.rate.norm            avgt   15   576.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Eden_Space        avgt   15  4868.023 ± 61.943  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Eden_Space.norm   avgt   15   576.371 ±  7.331    B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Old_Gen           avgt   15     0.005 ±  0.001  MB/sec
ProducerRequestBenchmark.constructorErrorResponse:·gc.churn.G1_Old_Gen.norm      avgt   15     0.001 ±  0.001    B/op
ProducerRequestBenchmark.constructorErrorResponse:·gc.count                      avgt   15   639.000           counts
ProducerRequestBenchmark.constructorErrorResponse:·gc.time                       avgt   15   339.000               ms
ProducerRequestBenchmark.constructorProduceRequest                               avgt   15   454.935 ±  0.332   ns/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.alloc.rate                avgt   15  3769.014 ±  2.767  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.alloc.rate.norm           avgt   15  1888.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Eden_Space       avgt   15  3763.407 ± 31.530  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Eden_Space.norm  avgt   15  1885.190 ± 15.594    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Old_Gen          avgt   15     0.004 ±  0.001  MB/sec
ProducerRequestBenchmark.constructorProduceRequest:·gc.churn.G1_Old_Gen.norm     avgt   15     0.002 ±  0.001    B/op
ProducerRequestBenchmark.constructorProduceRequest:·gc.count                     avgt   15   494.000           counts
ProducerRequestBenchmark.constructorProduceRequest:·gc.time                      avgt   15   264.000               ms
ProducerRequestBenchmark.constructorStruct                                       avgt   15    90.728 ±  0.695   ns/op
ProducerRequestBenchmark.constructorStruct:·gc.alloc.rate                        avgt   15  3043.140 ± 23.246  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.alloc.rate.norm                   avgt   15   304.000 ±  0.001    B/op
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Eden_Space               avgt   15  3047.251 ± 59.638  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Eden_Space.norm          avgt   15   304.404 ±  5.034    B/op
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Old_Gen                  avgt   15     0.003 ±  0.001  MB/sec
ProducerRequestBenchmark.constructorStruct:·gc.churn.G1_Old_Gen.norm             avgt   15    ≈ 10⁻⁴             B/op
ProducerRequestBenchmark.constructorStruct:·gc.count                             avgt   15   400.000           counts
ProducerRequestBenchmark.constructorStruct:·gc.time                              avgt   15   205.000               ms

JMH for ProduceResponse

  1. construction regression:
    • 3.293 -> 303.226 ns/op
    • 24.000 -> 1848.000 B/op
  2. toStruct improvement:
    • 825.889 -> 311.725 ns/op
    • 2208.000 -> 896.000 B/op

BEFORE

Benchmark                                                                          Mode  Cnt     Score    Error   Units
ProducerResponseBenchmark.constructorProduceResponse                               avgt   15     3.293 ±  0.004   ns/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.alloc.rate                avgt   15  6619.731 ±  9.075  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.alloc.rate.norm           avgt   15    24.000 ±  0.001    B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Eden_Space       avgt   15  6618.648 ±  0.153  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Eden_Space.norm  avgt   15    23.996 ±  0.033    B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Old_Gen          avgt   15     0.003 ±  0.002  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Old_Gen.norm     avgt   15    ≈ 10⁻⁵             B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.count                     avgt   15   720.000           counts
ProducerResponseBenchmark.constructorProduceResponse:·gc.time                      avgt   15   383.000               ms
ProducerResponseBenchmark.constructorStruct                                        avgt   15   825.889 ±  0.638   ns/op
ProducerResponseBenchmark.constructorStruct:·gc.alloc.rate                         avgt   15  2428.000 ±  1.899  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.alloc.rate.norm                    avgt   15  2208.000 ±  0.001    B/op
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Eden_Space                avgt   15  2430.196 ± 55.894  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Eden_Space.norm           avgt   15  2210.001 ± 51.009    B/op
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Old_Gen                   avgt   15     0.003 ±  0.001  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Old_Gen.norm              avgt   15     0.002 ±  0.001    B/op
ProducerResponseBenchmark.constructorStruct:·gc.count                              avgt   15   319.000           counts
ProducerResponseBenchmark.constructorStruct:·gc.time                               avgt   15   166.000               ms

AFTER

Benchmark                                                                          Mode  Cnt     Score    Error   Units
ProducerResponseBenchmark.constructorProduceResponse                               avgt   15   303.226 ±  0.517   ns/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.alloc.rate                avgt   15  5534.940 ±  9.439  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.alloc.rate.norm           avgt   15  1848.000 ±  0.001    B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Eden_Space       avgt   15  5534.046 ± 51.849  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Eden_Space.norm  avgt   15  1847.710 ± 18.105    B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Old_Gen          avgt   15     0.007 ±  0.001  MB/sec
ProducerResponseBenchmark.constructorProduceResponse:·gc.churn.G1_Old_Gen.norm     avgt   15     0.002 ±  0.001    B/op
ProducerResponseBenchmark.constructorProduceResponse:·gc.count                     avgt   15   602.000           counts
ProducerResponseBenchmark.constructorProduceResponse:·gc.time                      avgt   15   318.000               ms
ProducerResponseBenchmark.constructorStruct                                        avgt   15   311.725 ±  3.132   ns/op
ProducerResponseBenchmark.constructorStruct:·gc.alloc.rate                         avgt   15  2610.602 ± 25.964  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.alloc.rate.norm                    avgt   15   896.000 ±  0.001    B/op
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Eden_Space                avgt   15  2613.021 ± 42.965  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Eden_Space.norm           avgt   15   896.824 ± 11.331    B/op
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Old_Gen                   avgt   15     0.003 ±  0.001  MB/sec
ProducerResponseBenchmark.constructorStruct:·gc.churn.G1_Old_Gen.norm              avgt   15     0.001 ±  0.001    B/op
ProducerResponseBenchmark.constructorStruct:·gc.count                              avgt   15   343.000           counts
ProducerResponseBenchmark.constructorStruct:·gc.time                               avgt   15   194.000               ms

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712 chia7712 changed the title from "KAFKA-9628 Replace Produce request with automated protocol" to "KAFKA-9628 Replace Produce request/response with automated protocol" Oct 9, 2020
@chia7712 chia7712 marked this pull request as draft October 9, 2020 08:46
@chia7712 chia7712 marked this pull request as ready for review October 9, 2020 16:24
@chia7712
Member Author

@hachikuji Could you please take a look?

Contributor

@hachikuji hachikuji left a comment

Thanks for the patch. Left a few small comments.

Comment thread clients/src/main/java/org/apache/kafka/common/requests/ProduceRequest.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/requests/ProduceResponse.java Outdated
Comment thread clients/src/main/resources/common/message/ProduceRequest.json Outdated
Comment thread clients/src/main/resources/common/message/ProduceRequest.json Outdated
Comment thread clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java Outdated
@hachikuji
Contributor

@chia7712 One thing that would be useful is running the producer-performance test, just to make sure the performance is in line. Might be worth checking flame graphs as well.

@chia7712
Member Author

chia7712 commented Nov 1, 2020

One thing that would be useful is running the producer-performance test, just to make sure the performance is in line. Might be worth checking flame graphs as well.

The cost of conversion is tiny relative to the whole I/O path, so it seems to me that JMH is better suited to this patch. There are two new JMH benchmarks, one for the request and one for the response. They measure construction, toStruct, and the hot method. The results are attached to the description. @hachikuji Please take a look, thanks!

@chia7712 chia7712 force-pushed the KAFKA-9628-1 branch 3 times, most recently from 643fc46 to 2bdede5 November 3, 2020 10:48
@lbradstreet
Contributor

@chia7712 @hachikuji for the ProduceResponse handling, is this the overall broker side regression since you need both the construction and toStruct?

construction regression: 3.293 -> 580.099 ns/op
toStruct improvement: 825.889 -> 318.530 ns/op

overall response: 3.293+825.889 (old) = 829.182 vs 898.629 (new)

Could you please also provide an analysis of the garbage generation using gc.alloc.rate.norm?

@chia7712
Member Author

chia7712 commented Nov 4, 2020

@lbradstreet Thanks for your response.

is this the overall broker side regression since you need both the construction and toStruct?

You are right. It seems to me the way to fix the regression is for the server to use the automatic protocol response rather than the wrapped response. However, that could make for a big patch, so it would be better to address it in another PR. (BTW, the fetch protocol has a similar issue: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/requests/FetchResponse.java#L281)

Could you please also provide an analysis of the garbage generation using gc.alloc.rate.norm?

construction regression:

  • 3.293 -> 580.099 ns/op
  • 24.000 -> 2776.000 B/op

toStruct improvement:

  • 825.889 -> 318.530 ns/op
  • 2208.000 -> 896.000 B/op

We can reduce the regression (in construction) by replacing the stream APIs with a for-loop. However, I prefer the stream APIs since they are more readable, and the true solution is to use the auto-generated protocols on the server side.
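Since the broker pays for both construction and toStruct when it sends a response, the two figures quoted in this comment can be added to compare the whole path. A small sketch of that sum, using only the numbers above (the `Cost` type is a hypothetical helper, not a Kafka or JMH class):

```java
final class ResponsePathCost {
    // Hypothetical helper: time and allocation for one benchmarked step.
    static final class Cost {
        final double nsPerOp;
        final double bytesPerOp;

        Cost(double nsPerOp, double bytesPerOp) {
            this.nsPerOp = nsPerOp;
            this.bytesPerOp = bytesPerOp;
        }

        // Cost of running this step followed by the other one.
        Cost plus(Cost other) {
            return new Cost(nsPerOp + other.nsPerOp, bytesPerOp + other.bytesPerOp);
        }
    }

    public static void main(String[] args) {
        // construction + toStruct, before the patch
        Cost before = new Cost(3.293, 24.0).plus(new Cost(825.889, 2208.0));
        // construction + toStruct, with the patch as of this comment
        Cost after = new Cost(580.099, 2776.0).plus(new Cost(318.530, 896.0));

        System.out.println("before: " + before.nsPerOp + " ns/op, " + before.bytesPerOp + " B/op");
        System.out.println("after:  " + after.nsPerOp + " ns/op, " + after.bytesPerOp + " B/op");
    }
}
```

With these inputs the combined path goes from roughly 829 ns/op and 2232 B/op to roughly 899 ns/op and 3672 B/op, so at this point in the review the allocation increase is larger than the time increase.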

Contributor

@hachikuji hachikuji left a comment

This is looking pretty good. Left a few more comments.

Comment thread clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/requests/ProduceRequest.java Outdated
Contributor

Not required, but this would be easier to follow up if we had some helpers.

Member Author

Pardon me, why is it not required?

Contributor

Oh, I was just emphasizing that it is a matter of taste. It's up to you if you agree or not.

Comment thread clients/src/main/java/org/apache/kafka/common/requests/ProduceResponse.java Outdated
Comment thread clients/src/main/resources/common/message/ProduceRequest.json Outdated
Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated
Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated
Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated
@ijuma
Member

ijuma commented Nov 8, 2020

Can we summarize the regression here for a real world workload?

@chia7712
Member Author

chia7712 commented Nov 9, 2020

Can we summarize the regression here for a real world workload?

@ijuma I have attached the benchmark results to the description. I will run more benchmark loops later.

@hachikuji
Contributor

For what it's worth, I think we'll get back whatever we lose here by taking Struct out of the serialization path.

@chia7712
Member Author

@hachikuji @ijuma @lbradstreet Could you take a look? There are some follow-ups which can get back the performance we lose here and I'd like to work on them as soon as possible :)

Member

@dajac dajac left a comment

@chia7712 Thanks for the PR. I have left a few suggestions.

Member

I wonder if we could avoid all of this by having the Sender create TopicProduceData directly. It seems that the Sender creates partitionRecords right before calling the builder: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L734. So we may be able to directly construct the expected data structure there.

Member Author

@chia7712 chia7712 Nov 12, 2020

Nice suggestion.

Could I address this in a follow-up? I have filed JIRAs (KAFKA-10696 ~ KAFKA-10698) to have Sender use the auto-generated protocol directly.

Member Author

Oh, the JIRAs I created do not cover this issue. I have opened a new ticket (https://issues.apache.org/jira/browse/KAFKA-10709).

Comment on lines 227 to 242
Member

It seems that we could create ProduceResponseData based on data. This avoids the cost of the group-by operation and the cost of constructing partitionSizes. That should bring the benchmark inline with what we had before. Would this work?

Member Author

I used data to generate ProduceResponseData. However, data may be null when creating ProduceResponseData. That is to say, it requires an if-else to handle null data in getErrorResponse. That seems a bit ugly to me, so I am not sure whether it is worth doing.

Member

As we care about performance here, I wonder if we should avoid the stream API.

Another trick would be to turn TopicProduceResponse in the ProduceResponse schema into a map by setting "mapKey": true for the topic name. This would allow us to iterate over responses, get or create the TopicProduceResponse for the topic, and add the PartitionProduceResponse to it.

It may be worth trying different implementations to compare their performance.
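The get-or-create loop described in this comment can be sketched without any Kafka classes. In the example below `PartitionResult` is a hypothetical stand-in for a per-partition response, and a `LinkedHashMap` stands in for the keyed collection that a "mapKey" field would provide; it illustrates the single-pass grouping only, not the generated ProduceResponseData API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupByTopicSketch {
    // Hypothetical stand-in for one partition's result; not a Kafka class.
    static final class PartitionResult {
        final String topic;
        final int partition;
        final short errorCode;

        PartitionResult(String topic, int partition, short errorCode) {
            this.topic = topic;
            this.partition = partition;
            this.errorCode = errorCode;
        }
    }

    // Groups results by topic in one pass with a plain loop, keeping first-seen topic order.
    static Map<String, List<PartitionResult>> groupByTopic(List<PartitionResult> results) {
        Map<String, List<PartitionResult>> byTopic = new LinkedHashMap<>();
        for (PartitionResult result : results) {
            // Get or create the per-topic bucket, then append the partition to it.
            byTopic.computeIfAbsent(result.topic, t -> new ArrayList<>()).add(result);
        }
        return byTopic;
    }

    public static void main(String[] args) {
        List<PartitionResult> results = Arrays.asList(
            new PartitionResult("foo", 0, (short) 0),
            new PartitionResult("bar", 0, (short) 0),
            new PartitionResult("foo", 1, (short) 0));
        groupByTopic(results).forEach((topic, partitions) ->
            System.out.println(topic + " -> " + partitions.size() + " partition(s)"));
    }
}
```

Compared with a stream-based group-by, this form avoids building the stream pipeline and collector for each response, which is the overhead the comment is concerned about; whether it is faster in practice is something the JMH benchmarks would have to confirm.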

Member Author

It may be worth trying different implementations to compare their performance.

As we all care about performance, I'm OK with saying goodbye to the stream API :)

Member Author

I have addressed your suggestion and it does improve the performance.

@chia7712 chia7712 self-assigned this Nov 12, 2020
@chia7712 chia7712 force-pushed the KAFKA-9628-1 branch 2 times, most recently from 1b7cc44 to 1e36d8a November 16, 2020 06:12
@chia7712
Member Author

The last commit borrows some improvements from #9563.

@chia7712
Member Author

@hachikuji @ijuma @lbradstreet @dajac I have updated the perf results. The last commit reduces the regression. Please take a look.

@hachikuji
Contributor

hachikuji commented Nov 16, 2020

Posting allocation flame graphs from the producer before and after this patch:

[Two screenshots: producer allocation flame graphs, before and after this patch]

So we succeeded in getting rid of the extra allocations in the network layer!

I generated these graphs using the producer performance test writing to a topic with 10 partitions on a cluster with a single broker.

> bin/kafka-producer-perf-test.sh --topic foo --num-records 250000000 --throughput -1  --record-size 256 --producer-props bootstrap.servers=localhost:9092

Member

@dajac dajac left a comment

LGTM, pending Jenkins. Thanks for the PR!

Comment thread clients/src/test/java/org/apache/kafka/common/requests/RequestResponseTest.java Outdated
Contributor

I don't think it should be ignorable. Transactional requests require this in order to authorize.

Member Author

It seems to me ignorable should be true in order to keep the behavior consistent. With ignorable=false, setting a value for TransactionalId can cause an UnsupportedVersionException if the version is smaller than 3. The previous code (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/requests/ProduceRequest.java#L286) does not throw such an exception.

Contributor

The previous code probably relied on the range checking of the message format to imply support here. My point is that the request is doomed to fail if it holds transactional data and we drop the transactionalId. So we may as well fail fast.
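The trade-off in this thread can be illustrated with a small sketch. It assumes, as the discussion states, that versions below 3 cannot carry a transactionalId; the method, the constant and the exception type below are hypothetical stand-ins and not the generated Kafka serialization code:

```java
final class IgnorableFieldSketch {
    // Hypothetical stand-in for Kafka's UnsupportedVersionException.
    static final class UnsupportedVersionSketch extends RuntimeException {
        UnsupportedVersionSketch(String message) {
            super(message);
        }
    }

    // Per the discussion above, versions below 3 cannot carry a transactionalId.
    static final short FIRST_VERSION_WITH_TRANSACTIONAL_ID = 3;

    // Returns the transactionalId that would be written at the given version.
    static String transactionalIdToWrite(short version, String transactionalId, boolean ignorable) {
        if (version >= FIRST_VERSION_WITH_TRANSACTIONAL_ID || transactionalId == null) {
            return transactionalId; // supported version, or nothing to lose
        }
        if (ignorable) {
            return null; // silently dropped; a transactional request would then fail later
        }
        // Fail fast instead of sending a request that is doomed to fail.
        throw new UnsupportedVersionSketch(
            "Cannot write a non-default transactionalId at version " + version);
    }

    public static void main(String[] args) {
        System.out.println(transactionalIdToWrite((short) 3, "txn-1", false));
        System.out.println(transactionalIdToWrite((short) 2, "txn-1", true));
        try {
            transactionalIdToWrite((short) 2, "txn-1", false);
        } catch (UnsupportedVersionSketch e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the field marked ignorable the old-version write succeeds but loses the id, while the non-ignorable setting surfaces the problem at serialization time, which is the fail-fast behavior argued for here.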

Member Author

So we may as well fail fast.

That makes sense. I will revert this change.

Member

Indeed, that makes sense. My bad!

@hachikuji
Contributor

Great work @chia7712! With this and #9547, we have converted all of the protocols, which was a huge community effort!

gardnervickers added a commit to gardnervickers/kafka-1 that referenced this pull request Nov 18, 2020
OffsetForLeaderEpoch and Produce are not yet generated RPCs, but will be once apache#9401 and apache#9547 are merged.
@hachikuji hachikuji merged commit 30bc21c into apache:trunk Nov 18, 2020
@chia7712 chia7712 deleted the KAFKA-9628-1 branch March 25, 2024 15:22
5 participants