KAFKA-10794 Replica leader election is too slow in the case of too many partitions #9675
Thanks.
@huxihx Please help me review the code, thanks.
@Montyleo Is the failed test related to this PR?
Is there an existing ticket? If not, could you file a JIRA to log it? Also, you can assign the ticket to yourself (I have given you the permission) if you have free cycles to trace it. I will merge this PR tomorrow if there is no objection.
OK, I have no objection. I have created a JIRA to log it: https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-10797?filter=allissues. I'll trace it in my local environment.
@Montyleo Thanks for your contribution!
@chia7712 Can the patch resolve the issue? As far as I can tell, the only difference is whether controllerContext.allPartitions is invoked once or once per partition. Please correct me if I am wrong. Thanks.
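Why "once vs. once per partition" matters can be shown with a rough cost model. It rests on two simplifying assumptions, not on measurements: each call to controllerContext.allPartitions rebuilds a set over all $N$ partitions at cost $cN$, and the election path checks each of up to $N$ partitions against that set at cost $d$ per lookup.

$$
T_{\text{before}}(N) = N\,(cN + d) = cN^2 + dN,
\qquad
T_{\text{after}}(N) = cN + dN = (c + d)\,N
$$

Under these assumptions the ratio $T_{\text{before}}/T_{\text{after}} = (cN + d)/(c + d)$ grows linearly with $N$. The change turns quadratic work into linear work rather than saving a constant factor, which is why it shows up mainly on clusters with many partitions.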
@lqjack Good question!
@Montyleo It seems to me the optimization in this PR is good enough. However, it would be better to show the improvement from this patch in your environment.
Hi lqjack, thanks for your question.
… slow in the case of too many partitions (apache#9675)
Co-authored-by: limengmonty <limengmonty@didichuxing.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>


There are more than 6000 topics and 300 brokers in my Kafka cluster, and we frequently run kafka-preferred-replica-election.sh to rebalance the cluster. However, the rebalance process takes too much time and CPU, as shown in the picture below.
We found that the function controllerContext.allPartitions is invoked too many times (once per partition rather than once per election).
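The pattern behind the fix is hoisting an expensive, loop-invariant computation out of a per-partition loop. Below is a minimal Python sketch for brevity, not Kafka's actual Scala controller code. `ControllerContext`, `all_partitions`, `known_partitions_slow` and `known_partitions_fast` are hypothetical stand-ins. The sketch assumes that `allPartitions` rebuilds the full partition set on every call. It counts calls instead of measuring CPU time.

```python
from dataclasses import dataclass


@dataclass
class ControllerContext:
    """Hypothetical stand-in for the controller's metadata cache.

    partition_assignments maps topic -> {partition_id: replica_list}.
    """
    partition_assignments: dict
    all_partitions_calls: int = 0

    def all_partitions(self):
        # Assumption: the full set is rebuilt on every call, so each call
        # costs O(total number of partitions).
        self.all_partitions_calls += 1
        return {
            (topic, partition)
            for topic, partitions in self.partition_assignments.items()
            for partition in partitions
        }


def known_partitions_slow(ctx, requested):
    # Rebuilds the full set once per requested partition: O(N^2) overall.
    return [tp for tp in requested if tp in ctx.all_partitions()]


def known_partitions_fast(ctx, requested):
    # Builds the set once and reuses it for every lookup: O(N) overall.
    all_tps = ctx.all_partitions()
    return [tp for tp in requested if tp in all_tps]
```

For example, with topics `a` (partitions 0 and 1) and `b` (partition 0) and four requested partitions, `known_partitions_slow` calls `all_partitions` four times. `known_partitions_fast` calls it once and returns the same result.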
The JIRA link is https://issues.apache.org/jira/browse/KAFKA-10794