KAFKA-15987: Refactor ReplicaManager code for transaction verification by jolshan · Pull Request #15087 · apache/kafka

jolshan · 2023-12-28T23:53:17Z

I originally did some refactors in #14774, but we decided to keep the changes minimal since the ticket was a blocker. Here are those refactors:

Removed separate append paths so that produce, group coordinator, and other append paths all call appendRecords
appendRecords has been simplified
Removed unneeded error conversions from the verification code, since the group coordinator and produce paths convert errors differently; also removed the test for those conversions
Fixed incorrect capital param name in KafkaRequestHandler
Updated ReplicaManager test to handle transaction verification appends separately. (I may revise this)

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

This patch wires the transaction verification in the new group coordinator. It basically calls the verification path before scheduling the write operation. If the verification fails, the error is returned to the caller. Note that the patch uses `appendForGroup`. I suppose that we will move away from using it when #15087 is merged. Reviewers: Justine Olshan <jolshan@confluent.io>
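The "verify first, then schedule the write" ordering described in that commit message can be sketched as below. This is only an illustration under stated assumptions: `verifyThenWrite`, `verify`, and `scheduleWrite` are hypothetical names, errors are plain strings, and the call is synchronous, whereas the real coordinator code is asynchronous and uses Kafka's own types.

```scala
object VerifyThenWriteSketch {
  // Hypothetical helper: `verify` returns Some(error) when transaction
  // verification fails and None when it succeeds. `scheduleWrite` stands in
  // for scheduling the write operation and is only invoked after a
  // successful verification.
  def verifyThenWrite[R](verify: () => Option[String],
                         scheduleWrite: () => R): Either[String, R] =
    verify() match {
      case Some(error) => Left(error)           // returned to the caller, no write scheduled
      case None        => Right(scheduleWrite()) // verification passed, proceed with the write
    }
}
```

The point of the sketch is the ordering: a failed verification short-circuits, so the write is never scheduled.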

dajac

@jolshan Thanks for the PR. Overall, the patch looks pretty good. I left a few initial comments. I still need to re-read appendRecords; I will do that tomorrow.

dajac · 2024-01-18T15:06:40Z

-      // Map transaction coordinator errors to known errors for the response
-      val convertedErrors = verificationErrors.map { case (tp, error) =>
-        error match {
-          case Errors.CONCURRENT_TRANSACTIONS |
-            Errors.COORDINATOR_LOAD_IN_PROGRESS |
-            Errors.COORDINATOR_NOT_AVAILABLE |
-            Errors.NOT_COORDINATOR => tp -> Errors.NOT_ENOUGH_REPLICAS
-          case _ => tp -> error
-        }
-
-      }


For my understanding: we remove this here, add it back in handleProduceAppend, and rely on the existing conversion in the group coordinator. Did I get it right? In the group coordinator, we don't handle CONCURRENT_TRANSACTIONS, I think. I need to double check.

We have separate handling for produce requests and txn offset commit requests.

for produce:

    case Errors.INVALID_TXN_STATE =>
      Some(error.exception("Partition was not added to the transaction"))
    case Errors.CONCURRENT_TRANSACTIONS |
         Errors.COORDINATOR_LOAD_IN_PROGRESS |
         Errors.COORDINATOR_NOT_AVAILABLE |
         Errors.NOT_COORDINATOR =>
      Some(new NotEnoughReplicasException(
        s"Unable to verify the partition has been added to the transaction. Underlying error: ${error.toString}"))
    case _ => None

for txn offset commit:

    error match {
      case Errors.UNKNOWN_TOPIC_OR_PARTITION |
           Errors.NOT_ENOUGH_REPLICAS |
           Errors.NOT_ENOUGH_REPLICAS_AFTER_APPEND =>
        Errors.COORDINATOR_NOT_AVAILABLE
      case Errors.NOT_LEADER_OR_FOLLOWER |
           Errors.KAFKA_STORAGE_ERROR =>
        Errors.NOT_COORDINATOR
      case Errors.MESSAGE_TOO_LARGE |
           Errors.RECORD_LIST_TOO_LARGE |
           Errors.INVALID_FETCH_SIZE =>
        Errors.INVALID_COMMIT_OFFSET_SIZE
      // We may see INVALID_TXN_STATE or INVALID_PID_MAPPING here due to transaction verification.
      // They can be returned without mapping to a new error.
      case other => other
    }

But yes, we simply pass through concurrent txns which will be fatal to the client.
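To make the difference between the two paths concrete, here is a minimal, self-contained sketch. It assumes a stand-in `Errors` type (not Kafka's real `org.apache.kafka.common.protocol.Errors`), hypothetical function names `forProduce` and `forTxnOffsetCommit`, and a simplified produce path that returns an error code instead of building an exception. The message-size cases from the txn offset commit mapping are omitted for brevity.

```scala
object VerificationErrorSketch {
  // Stand-in for Kafka's Errors enum, limited to the values needed here so
  // the sketch compiles without Kafka on the classpath.
  sealed trait Errors
  object Errors {
    case object INVALID_TXN_STATE extends Errors
    case object CONCURRENT_TRANSACTIONS extends Errors
    case object COORDINATOR_LOAD_IN_PROGRESS extends Errors
    case object COORDINATOR_NOT_AVAILABLE extends Errors
    case object NOT_COORDINATOR extends Errors
    case object NOT_ENOUGH_REPLICAS extends Errors
    case object NOT_ENOUGH_REPLICAS_AFTER_APPEND extends Errors
    case object UNKNOWN_TOPIC_OR_PARTITION extends Errors
    case object NOT_LEADER_OR_FOLLOWER extends Errors
    case object KAFKA_STORAGE_ERROR extends Errors
  }

  // Produce path (simplified): coordinator-side verification errors are
  // reported as NOT_ENOUGH_REPLICAS, the code behind NotEnoughReplicasException.
  def forProduce(error: Errors): Errors = error match {
    case Errors.CONCURRENT_TRANSACTIONS |
         Errors.COORDINATOR_LOAD_IN_PROGRESS |
         Errors.COORDINATOR_NOT_AVAILABLE |
         Errors.NOT_COORDINATOR => Errors.NOT_ENOUGH_REPLICAS
    case other => other
  }

  // Txn offset commit path: only log-append errors are remapped, so
  // verification errors such as CONCURRENT_TRANSACTIONS pass through unchanged.
  def forTxnOffsetCommit(error: Errors): Errors = error match {
    case Errors.UNKNOWN_TOPIC_OR_PARTITION |
         Errors.NOT_ENOUGH_REPLICAS |
         Errors.NOT_ENOUGH_REPLICAS_AFTER_APPEND => Errors.COORDINATOR_NOT_AVAILABLE
    case Errors.NOT_LEADER_OR_FOLLOWER |
         Errors.KAFKA_STORAGE_ERROR => Errors.NOT_COORDINATOR
    case other => other
  }
}
```

Under these assumptions, `forProduce(Errors.CONCURRENT_TRANSACTIONS)` yields `NOT_ENOUGH_REPLICAS`, while `forTxnOffsetCommit(Errors.CONCURRENT_TRANSACTIONS)` returns the error untouched, which is the pass-through behavior discussed in this thread.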

dajac · 2024-01-18T15:13:16Z


-  @ParameterizedTest
-  @EnumSource(value = classOf[Errors], names = Array("NOT_COORDINATOR", "CONCURRENT_TRANSACTIONS", "COORDINATOR_LOAD_IN_PROGRESS", "COORDINATOR_NOT_AVAILABLE"))
-  def testMaybeVerificationErrorConversions(error: Errors): Unit = {


Don't we need to keep this one, since we still have those conversions, just in a different place now?

We have the test above in this file.

We also have one for the GroupCoordinator.

dajac

LGTM, thanks @jolshan!

apache#15087) I originally did some refactors in apache#14774, but we decided to keep the changes minimal since the ticket was a blocker. Here are those refactors:

* Removed separate append paths so that produce, group coordinator, and other append paths all call appendRecords
* AppendRecords has been simplified
* Removed unneeded error conversions in verification code since group coordinator and produce path convert errors differently, removed test for that
* Fixed incorrect capital param name in KafkaRequestHandler
* Updated ReplicaManager test to handle produce appends separately when transactions are used.

Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>

chia7712 · 2025-08-24T20:04:49Z

+            hasCustomErrorMessage = customException.isDefined
+          )
+      }
+      val entriesWithoutErrorsPerPartition = entriesPerPartition.filter { case (key, _) => !errorResults.contains(key) }


Sorry for raising a question on this merged PR. I just have one small concern: in this non-transaction path, errorResults is always empty, so there is no need to recreate a map collection, since entriesWithoutErrorsPerPartition is identical to entriesPerPartition. Did I misunderstand anything? If not, I can file a minor PR to improve it.

Yeah, we could probably only do this filtering if the errorResults is nonEmpty.

I opened #20410 to address this improvement.
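A minimal sketch of the suggested guard, assuming plain immutable Scala maps and a hypothetical helper name `entriesWithoutErrors`; it illustrates the idea from this thread and is not necessarily how #20410 implements it.

```scala
object FilterSketch {
  // Skip the filter, and the map allocation it implies, when there are no
  // verification errors; otherwise drop the partitions that failed verification.
  def entriesWithoutErrors[K, V, E](entriesPerPartition: Map[K, V],
                                    errorResults: Map[K, E]): Map[K, V] =
    if (errorResults.isEmpty) entriesPerPartition
    else entriesPerPartition.filter { case (key, _) => !errorResults.contains(key) }
}
```

With an empty `errorResults`, the same map instance is returned, so the non-transactional path pays nothing for the check.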

jolshan added 16 commits November 15, 2023 17:45

Redo verification path

e894d84

Fix build issues

2a52318

Fix tests

865078b

Merge branch 'trunk' of github.com:apache/kafka into kafka-15784

a20f238

Fix test failures

9bac9a9

Rewrite GroupCoordinator and GroupMetadataManager to handle checks for txnOffsetCommits

b4a920b

Merge branch 'trunk' of github.com:apache/kafka into kafka-15784

05c0d28

Update comments and method names

7bc6d06

Fix style issues and passing verification guards

a471f04

Clean up GroupCoordinator, GroupMetadataManger, and ReplicaManager code

7c01682

remove package private scoping that is not needed

965ec39

Merge commit '965ec39460324d77d55c90771f23ac0c9c2ad70b' into kafka-15987

54197cb

clean ups

e214171

cleanups

4b944d5

Test fixes

95480af

Merge branch 'trunk' of github.com:apache/kafka into kafka-15987

1f41590

hachikuji reviewed Jan 3, 2024

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated

Hide extra parameter away

7ae321c

dajac mentioned this pull request Jan 8, 2024

KAFKA-14505; [4/N] Wire transaction verification #15142

Merged

3 tasks

jolshan added 2 commits January 8, 2024 15:45

Small format fix

e1fa7d6

Merge branch 'trunk' of github.com:apache/kafka into kafka-15987

9d2ba2e

jolshan marked this pull request as ready for review January 8, 2024 23:59

Merge branch 'trunk' of github.com:apache/kafka into kafka-15987

5518d80

jolshan mentioned this pull request Jan 17, 2024

KAFKA-14505; [6/N] Avoid recheduling callback in request thread #15176

Merged

3 tasks

dajac reviewed Jan 18, 2024

Addressing comments

3034e52

dajac approved these changes Jan 25, 2024

jolshan merged commit 5eb8201 into apache:trunk Jan 26, 2024

chia7712 reviewed Aug 24, 2025
