KAFKA-15987: Refactor ReplicaManager code for transaction verification#15087
jolshan merged 21 commits into apache:trunk from
Conversation
…r txnOffsetCommits
This patch wires the transaction verification in the new group coordinator. It basically calls the verification path before scheduling the write operation. If the verification fails, the error is returned to the caller. Note that the patch uses `appendForGroup`. I suppose that we will move away from using it when #15087 is merged. Reviewers: Justine Olshan <jolshan@confluent.io>
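The flow described above (verify the transaction first, schedule the write only on success, and return any verification error to the caller) can be sketched as follows. This is a hedged illustration: `verifyTransaction` and `scheduleWrite` are stand-in names, not the real group coordinator or `ReplicaManager` API.

```scala
// Illustrative sketch of verify-then-write, not the actual broker code.
object VerifyThenWrite {
  type Err = String

  def appendWithVerification(
      verifyTransaction: () => Option[Err], // Some(error) when verification fails
      scheduleWrite: () => Unit,            // the write operation to schedule
      onError: Err => Unit                  // how the error is returned to the caller
  ): Unit =
    verifyTransaction() match {
      case Some(error) => onError(error)  // verification failed: surface the error
      case None        => scheduleWrite() // verification passed: schedule the write
    }
}
```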
```scala
// Map transaction coordinator errors to known errors for the response
val convertedErrors = verificationErrors.map { case (tp, error) =>
  error match {
    case Errors.CONCURRENT_TRANSACTIONS |
         Errors.COORDINATOR_LOAD_IN_PROGRESS |
         Errors.COORDINATOR_NOT_AVAILABLE |
         Errors.NOT_COORDINATOR => tp -> Errors.NOT_ENOUGH_REPLICAS
    case _ => tp -> error
  }
}
```
For my understanding: we remove this here, add it back in handleProduceAppend, and rely on the conversion in the group coordinator. Did I get it right? In the group coordinator, we don't handle CONCURRENT_TRANSACTIONS, I think. I need to double check.
We have separate handling for produce requests and txn offset commit requests.

For produce:

```scala
case Errors.INVALID_TXN_STATE => Some(error.exception("Partition was not added to the transaction"))
case Errors.CONCURRENT_TRANSACTIONS |
     Errors.COORDINATOR_LOAD_IN_PROGRESS |
     Errors.COORDINATOR_NOT_AVAILABLE |
     Errors.NOT_COORDINATOR => Some(new NotEnoughReplicasException(
       s"Unable to verify the partition has been added to the transaction. Underlying error: ${error.toString}"))
case _ => None
```

For txn offset commit:

```scala
error match {
  case Errors.UNKNOWN_TOPIC_OR_PARTITION
       | Errors.NOT_ENOUGH_REPLICAS
       | Errors.NOT_ENOUGH_REPLICAS_AFTER_APPEND =>
    Errors.COORDINATOR_NOT_AVAILABLE
  case Errors.NOT_LEADER_OR_FOLLOWER
       | Errors.KAFKA_STORAGE_ERROR =>
    Errors.NOT_COORDINATOR
  case Errors.MESSAGE_TOO_LARGE
       | Errors.RECORD_LIST_TOO_LARGE
       | Errors.INVALID_FETCH_SIZE =>
    Errors.INVALID_COMMIT_OFFSET_SIZE
  // We may see INVALID_TXN_STATE or INVALID_PID_MAPPING here due to transaction verification.
  // They can be returned without mapping to a new error.
  case other => other
}
```
But yes, we simply pass through concurrent txns, which will be fatal to the client.
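For illustration, the two conversion tables quoted above can be modeled as pure functions. This is a hedged sketch over a stand-in error type (the real code uses `org.apache.kafka.common.protocol.Errors` and wraps produce-path results in exceptions), not the broker code itself:

```scala
// Stand-in error model; the real broker uses org.apache.kafka.common.protocol.Errors.
sealed trait KErr
case object ConcurrentTransactions extends KErr
case object CoordinatorLoadInProgress extends KErr
case object CoordinatorNotAvailable extends KErr
case object NotCoordinator extends KErr
case object NotEnoughReplicas extends KErr
case object NotEnoughReplicasAfterAppend extends KErr
case object UnknownTopicOrPartition extends KErr
case object NotLeaderOrFollower extends KErr
case object KafkaStorageError extends KErr
case object MessageTooLarge extends KErr
case object RecordListTooLarge extends KErr
case object InvalidFetchSize extends KErr
case object InvalidCommitOffsetSize extends KErr
case object InvalidTxnState extends KErr

object ErrorConversions {
  // Produce path: coordinator-availability errors surface as retriable NOT_ENOUGH_REPLICAS.
  def convertProduceError(e: KErr): KErr = e match {
    case ConcurrentTransactions | CoordinatorLoadInProgress |
         CoordinatorNotAvailable | NotCoordinator => NotEnoughReplicas
    case other => other
  }

  // Txn offset commit path: replication/storage errors map to coordinator errors,
  // size errors map to INVALID_COMMIT_OFFSET_SIZE, and verification errors
  // (e.g. INVALID_TXN_STATE, CONCURRENT_TRANSACTIONS) pass through unchanged.
  def convertTxnOffsetCommitError(e: KErr): KErr = e match {
    case UnknownTopicOrPartition | NotEnoughReplicas | NotEnoughReplicasAfterAppend =>
      CoordinatorNotAvailable
    case NotLeaderOrFollower | KafkaStorageError =>
      NotCoordinator
    case MessageTooLarge | RecordListTooLarge | InvalidFetchSize =>
      InvalidCommitOffsetSize
    case other => other
  }
}
```

Note how CONCURRENT_TRANSACTIONS is remapped on the produce path but passes through on the txn offset commit path, matching the pass-through behavior described here.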
```scala
@ParameterizedTest
@EnumSource(value = classOf[Errors], names = Array("NOT_COORDINATOR", "CONCURRENT_TRANSACTIONS", "COORDINATOR_LOAD_IN_PROGRESS", "COORDINATOR_NOT_AVAILABLE"))
def testMaybeVerificationErrorConversions(error: Errors): Unit = {
```
Don't we need to keep this one, as we still have those conversions, just in a different place now?
We have the test above in this file.
We also have one for the GroupCoordinator.
KAFKA-15987: Refactor ReplicaManager code for transaction verification (apache#15087)

I originally did some refactors in apache#14774, but we decided to keep the changes minimal since the ticket was a blocker. Here are those refactors:

* Removed separate append paths so that produce, group coordinator, and other append paths all call appendRecords
* AppendRecords has been simplified
* Removed unneeded error conversions in verification code since group coordinator and produce path convert errors differently; removed the test for that
* Fixed incorrect capital param name in KafkaRequestHandler
* Updated ReplicaManager test to handle produce appends separately when transactions are used

Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
```scala
        hasCustomErrorMessage = customException.isDefined
      )
    }
    val entriesWithoutErrorsPerPartition = entriesPerPartition.filter { case (key, _) => !errorResults.contains(key) }
```
Sorry for raising a question on this merged PR. I just have one small concern: in this non-transaction path, errorResults is always empty, so there is no need to recreate a map collection, since entriesWithoutErrorsPerPartition is identical to entriesPerPartition. Did I misunderstand anything? If not, I can file a minor PR to improve it.
Yeah, we could probably only do this filtering if the errorResults is nonEmpty.
I opened #20410 to address this improvement.
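The suggested improvement can be sketched as a small helper. This is a minimal illustration of the idea over a plain Scala `Map`, not the actual patch:

```scala
// Sketch: only rebuild the map when there are verification errors, so the
// common non-transactional path (errorResults empty) reuses
// entriesPerPartition as-is instead of copying it.
object FilterGuard {
  def withoutErrors[K, V](entriesPerPartition: Map[K, V], errorResults: Map[K, _]): Map[K, V] =
    if (errorResults.isEmpty) entriesPerPartition // no copy needed
    else entriesPerPartition.filter { case (key, _) => !errorResults.contains(key) }
}
```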