
KAFKA-14505; [6/N] Avoid rescheduling callback in request thread #15176

Merged: dajac merged 2 commits into apache:trunk from dajac:KAFKA-14505-6 on Jan 31, 2024

Conversation

@dajac dajac commented Jan 11, 2024

This patch removes the extra hop via the request thread when the new group coordinator verifies a transaction. Prior to this change, the ReplicaManager would automatically re-schedule the callback onto a request thread. However, the new group coordinator does not need this, as it already schedules the write on its own thread. With this patch, the decision whether or not to re-schedule on a request thread is left to the caller.
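To make the idea concrete, here is a minimal, self-contained sketch in Scala (toy types and executors, not Kafka code) of the pattern described above: the verification API invokes a plain callback, and each caller decides whether a hop to a request handler thread is needed.

```scala
import java.util.concurrent.Executors

object CallerDecidesSketch {
  // Stand-ins for the request handler thread pool and the coordinator runtime's event loop.
  private val requestHandlerPool = Executors.newFixedThreadPool(2)
  private val coordinatorEventLoop = Executors.newSingleThreadExecutor()

  // The shared API: it simply invokes the callback on whatever thread verification completes.
  def startVerification(callback: Either[String, Long] => Unit): Unit =
    callback(Right(42L)) // pretend verification produced some guard token

  def main(args: Array[String]): Unit = {
    // Produce path: the caller re-schedules the callback onto a request handler thread.
    startVerification(result => requestHandlerPool.execute(() => println(s"produce path: $result")))

    // New group coordinator: no extra hop; it schedules its follow-up work on its own event loop.
    startVerification(result => coordinatorEventLoop.execute(() => println(s"coordinator path: $result")))

    requestHandlerPool.shutdown()
    coordinatorEventLoop.shutdown()
  }
}
```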

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

  baseSequence: Int,
  requestLocal: RequestLocal,
- callback: (Errors, RequestLocal, VerificationGuard) => Unit
+ callback: Either[Errors, VerificationGuard] => Unit
Member Author

I had to change the callback because KafkaRequestHandler.wrapAsyncCallback only supports wrapping unary functions.
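For illustration, a minimal sketch of that constraint, assuming a wrapper shape like the one below (it is a stand-in, not the real KafkaRequestHandler.wrapAsyncCallback): a wrapper that defers a callback to another thread can only carry one value, so the three-argument callback is collapsed into a unary Either[Errors, VerificationGuard] => Unit.

```scala
import org.apache.kafka.common.protocol.Errors
import org.apache.kafka.storage.internals.log.VerificationGuard

object UnaryCallbackSketch {
  // Assumed wrapper shape: it can only re-schedule a function of a single argument,
  // because the deferred invocation has to carry exactly one value.
  def wrapUnary[T](fn: T => Unit, reschedule: Runnable => Unit): T => Unit =
    t => reschedule(() => fn(t))

  // Collapsing (Errors, VerificationGuard) into Either keeps the callback unary,
  // so it stays compatible with such a wrapper.
  val callback: Either[Errors, VerificationGuard] => Unit = {
    case Left(error)  => println(s"verification failed: $error")
    case Right(guard) => println(s"verification guard: $guard")
  }

  def main(args: Array[String]): Unit = {
    // Defer the unary callback to a throwaway thread via the wrapper.
    val wrapped = wrapUnary(callback, runnable => new Thread(runnable).start())
    wrapped(Right(VerificationGuard.SENTINEL))
  }
}
```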

@dajac dajac added the KIP-848 The Next Generation of the Consumer Rebalance Protocol label Jan 11, 2024
@dajac dajac requested a review from jolshan January 11, 2024 15:33

dajac commented Jan 11, 2024

@artemlivshits @jolshan Could you please take a look at this one when you get a chance? Let me know what you think.

- preAppendErrors.getOrElse(topicPartition, Errors.NONE),
- newRequestLocal,
- verificationGuards.getOrElse(topicPartition, VerificationGuard.SENTINEL))
+ def generalizedCallback(results: Map[TopicPartition, Either[Errors, VerificationGuard]]): Unit = {
Contributor

I think we could do the translation from preAppendErrors, newRequestLocal, and verificationGuards here; then we'd avoid propagating the changes all the way to the replication layer.

Member Author

Yeah, I was debating this. The reasoning for not doing it here is that @jolshan may also use maybeStartTransactionVerificationForPartitions from the core path in conjunction with wrapAsyncCallback, so she will also need a unary callback. @jolshan Is my understanding correct? Otherwise, we could limit the change to this method.

Member

Please take a look at my refactor PR; I have done this to some extent.
I'd prefer not to overhaul it again (as I did after the previous group coordinator change).
Hopefully it makes this work easier too.


Member Author

Ack. I will take a look at #15087.

- newRequestLocal,
- verificationGuards.getOrElse(topicPartition, VerificationGuard.SENTINEL))
+ def generalizedCallback(results: Map[TopicPartition, Either[Errors, VerificationGuard]]): Unit = {
+   callback(results.getOrElse(topicPartition, Right(VerificationGuard.SENTINEL)))
Contributor

This logic is just a translation of the current implementation (so it's not introducing anything new), but is it expected that we might not get a result for the requested topicPartition? Should we log a warning, so that we know we're hitting some unexpected code path?

Member Author

It should not happen, but who knows. Let me log an error and fail the callback if it ever happens.
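A hedged sketch of that handling (the method name, captured values, and the placeholder error code are assumptions for illustration; the merged code may differ):

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.protocol.Errors
import org.apache.kafka.storage.internals.log.VerificationGuard

object MissingResultSketch {
  // `topicPartition` and `callback` stand in for the values captured by the real
  // per-partition callback; Errors.INVALID_TXN_STATE is only a placeholder error code.
  def complete(
    topicPartition: TopicPartition,
    results: Map[TopicPartition, Either[Errors, VerificationGuard]],
    callback: Either[Errors, VerificationGuard] => Unit
  ): Unit = {
    results.get(topicPartition) match {
      case Some(result) => callback(result)
      case None =>
        // Log loudly and fail the callback instead of silently falling back to the sentinel guard.
        System.err.println(s"Expected a verification result for $topicPartition but found none.")
        callback(Left(Errors.INVALID_TXN_STATE))
    }
  }
}
```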

  producerEpoch: Short,
  requestLocal: RequestLocal,
- callback: (Map[TopicPartition, Errors], RequestLocal, Map[TopicPartition, VerificationGuard]) => Unit
+ callback: mutable.Map[TopicPartition, Either[Errors, VerificationGuard]] => Unit
Contributor

My understanding is that once we refactor these changes, this function could be called either from the GC code path (which may not care about requestLocal) or from the core data path, which needs requestLocal because the callback may be called immediately in this thread's context.

Member Author

This is correct. The requestLocal is not required within this function, though. If the caller uses wrapAsyncCallback, it will get the correct one to use.
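A toy sketch of that point (stand-in types; the wrapper shape is an assumption, not the real API): the wrapper closes over the caller's RequestLocal and supplies it when the wrapped callback eventually runs, so the verification function itself never needs to see it.

```scala
object RequestLocalSketch {
  // Toy stand-in for Kafka's RequestLocal.
  final case class RequestLocal(name: String)

  // Assumed wrapper shape: it captures the caller's RequestLocal and hands it to the
  // two-argument function when the callback fires. (The real wrapper also re-schedules
  // onto a request handler thread.)
  def wrapAsyncCallback[T](fn: (RequestLocal, T) => Unit, requestLocal: RequestLocal): T => Unit =
    t => fn(requestLocal, t)

  def main(args: Array[String]): Unit = {
    val requestLocal = RequestLocal("caller-request-local")

    // The verification function only ever sees this unary callback...
    val callback: String => Unit =
      wrapAsyncCallback[String]((local, result) => println(s"$result handled with $local"), requestLocal)

    // ...so it can invoke the callback without knowing anything about RequestLocal.
    callback("verification-result")
  }
}
```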

- postVerificationCallback
+ // Wrap the callback to be handled on an arbitrary request handler thread
+ // when transaction verification is complete.
+ KafkaRequestHandler.wrapAsyncCallback(postVerificationCallback, requestLocal)
Contributor

It's interesting to note (I don't think we need to change anything) that we'll now have a production code path (and not just unit tests) where the wrapped callback can be called on the same request thread, going through the optimized code path where we call the callback directly.
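A toy illustration of that behaviour (not the real wrapper; the thread-local check below is an assumption about its shape): if the wrapped callback fires while already on a request handler thread, it is invoked directly instead of being re-enqueued.

```scala
object SameThreadSketch {
  // Marks whether the current thread is (pretend) a request handler thread.
  private val onRequestThread = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  def wrap[T](fn: T => Unit, enqueue: Runnable => Unit): T => Unit = { t =>
    if (onRequestThread.get()) fn(t)   // optimized path: already on a request thread, call directly
    else enqueue(() => fn(t))          // otherwise hop onto a request handler thread
  }

  def main(args: Array[String]): Unit = {
    val wrapped = wrap[String](s => println(s"handled $s"), r => new Thread(r).start())
    wrapped("not on a request thread") // enqueued
    onRequestThread.set(true)
    wrapped("on a request thread")     // called directly
  }
}
```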

Member

Why are we calling this here? I thought we wanted to avoid this wrap here and only do it for produce requests.

Member Author

We still need this in the old coordinator.

Member

Oh sorry. I guess I was just confused because I didn't see it in the replica manager flow (for produce).


dajac commented Jan 29, 2024

@jolshan I reworked the PR based on #15087. It is quite different from the previous one. Please take a look when you get a chance.

Comment thread on core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala (outdated)
Comment thread on core/src/main/scala/kafka/server/ReplicaManager.scala

dajac commented Jan 30, 2024

Thanks @jolshan. I have addressed your comments.

@dajac dajac requested a review from jolshan January 30, 2024 08:28

@jolshan jolshan left a comment


thanks!

@dajac dajac merged commit 6dd517d into apache:trunk Jan 31, 2024
@dajac dajac deleted the KAFKA-14505-6 branch January 31, 2024 07:27
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
KAFKA-14505; [6/N] Avoid rescheduling callback in request thread (apache#15176)

This patch removes the extra hop via the request thread when the new group coordinator verifies a transaction. Prior to it, the ReplicaManager would automatically re-schedule the callback to a request thread. However, the new group coordinator does not need this as it already schedules the write into its own thread. With this patch, the decision to re-schedule on a request thread or not is left to the caller.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
Phuc-Hong-Tran pushed a commit to Phuc-Hong-Tran/kafka that referenced this pull request Jun 6, 2024
