KAFKA-15784: Ensure atomicity of in memory update and write when transactionally committing offsets (#14774)
Conversation
@junrao Thanks for taking a look. I was just rewriting the code to make this clearer. I will take a look at @artemlivshits and your questions about locking now. I wasn't sure if this PR was trying to remove locks -- I think we want to address that as a follow-up?
(topicIdPartition, Errors.NOT_COORDINATOR)
}
responseCallback(commitStatus)
group.inLock {
For the group.inLocks in this method -- we should always be holding the group lock when we call storeOffsets from doTxnCommitOffsets and doCommitOffsets. What do we think about removing these locks and stating that the group lock must be held when calling this method? (And/or just wrapping the method in a lock.)
I don't think we can just wrap this method in the lock to show the proper intent -- the intent is that the lock must already be held by the caller, because the caller does some validation under the lock as well, and the atomicity of that validation needs to be preserved across the local write. Note this is not a correctness issue (the lock is already taken outside, so any locking pattern works); it's a comment about how to make the code easier to maintain. The atomicity requirement is absolutely non-obvious and required a lot of effort from multiple people to figure out, and I think this effort is not reflected in this change in any way -- the code got rearranged into a form that makes it work, but the underlying issue (unclear and confusing atomicity invariants) is not addressed.
So I'd do 3 things:
- Remove explicit locking.
- Add a comment in the Java doc stating a requirement that this function must be called under the lock.
- Add a comment near the appendForGroup call that we rely on it not returning until the local append is done to preserve atomicity protected by the lock.
If it wasn't soon-to-be-dead code I'd probably do more with naming conventions and asserts, but in this situation adding proper comments should be good and easy.
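The convention suggested above -- document that the caller must hold the lock, and optionally assert it -- could look like the following hedged sketch. This is illustrative Java (the real code is Scala and uses `group.inLock`); `GroupStore`, `storeOffsetsLocked`, and `commit` are hypothetical names, not the actual Kafka API.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the "lock must be held by the caller" convention:
// the inner method documents (and cheaply asserts) the invariant instead of
// re-acquiring the lock itself.
class GroupStore {
    final ReentrantLock groupLock = new ReentrantLock();
    int committedOffset = -1;

    /**
     * Must be called while holding {@code groupLock}: the caller validates
     * state under the same lock, and that validation plus this update must
     * stay atomic.
     */
    void storeOffsetsLocked(int offset) {
        // Runtime check of the documented invariant.
        if (!groupLock.isHeldByCurrentThread())
            throw new IllegalStateException("groupLock must be held by the caller");
        committedOffset = offset;
    }

    void commit(int offset) {
        groupLock.lock();
        try {
            // ... validation under the lock ...
            storeOffsetsLocked(offset);
        } finally {
            groupLock.unlock();
        }
    }
}
```

The `isHeldByCurrentThread` check is the Java analogue of the "asserts" mentioned above; in soon-to-be-dead code, a Javadoc comment alone may be the pragmatic choice.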
Ok makes sense.
The only thing that made me wonder about keeping the locking is that this method is used in unit tests without the locking on the outside. I wasn't sure if the best solution is to put locks around those calls or not. I would hope that the tests don't rely on locking, given there really should only be one thread running the tests and the ReplicaManager is mocked (so no async appends), but I would need to double-check.
* @param verificationGuards the mapping from topic partition to verification guards if transaction verification is used
* @param preAppendErrors the mapping from topic partition to LogAppendResult for errors that occurred before appending
*/
def appendForGroup(timeout: Long,
Can we generalize the name? I don't see any logic specific to offsets or groups.
I planned to take this out after the refactor. It is only used for appendForGroup to minimize the diff.
It will be unified with appendRecords in the refactor. I can leave a comment referencing https://issues.apache.org/jira/browse/KAFKA-15987 if that helps.
Makes sense. But any harm having a more general name for now? It might be a week or two before the refactor gets checked in.
I guess I just didn't see anything else using it before I refactored and wanted to make the usage clear.
I didn't want to name it appendRecords so as not to cause conflicts with that flow. What name were you thinking?
We should also add a comment that would reflect 2 points:
- (For the maintainer of this code) -- this code must not return until the local write is done; it is an important invariant that the callers rely upon. Otherwise it looks like a generic async call that can return and continue asynchronously at any point. This way, if an additional async stage is required in this function before the local write is complete, the maintainer would know to hunt down all usages of this function and figure out the correct action.
- (For the caller of this code) -- a quick example of the full workflow of how the caller should use this method: call maybeStartTransactionVerificationForPartition with a callback that would call this method.
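The caller workflow described in the second bullet -- start verification and pass a callback that performs the append once verification completes -- could be sketched as below. This is hypothetical Java for illustration only (the real code is Scala); `maybeStartVerification`, `appendForGroup`, and the string-typed error/guard stand in for the actual Kafka signatures.

```java
import java.util.function.BiConsumer;

// Hypothetical sketch of the suggested workflow: verification runs first,
// then its completion callback performs the append.
class VerificationFlow {
    static String log = "";

    // Stands in for maybeStartTransactionVerificationForPartition: invokes
    // the callback with (error, verificationGuard) when verification finishes.
    static void maybeStartVerification(BiConsumer<String, String> onComplete) {
        onComplete.accept("NONE", "guard-1");
    }

    // Stands in for appendForGroup: must not return until the local write is
    // done, so the caller's lock still covers the in-memory update.
    static void appendForGroup(String guard) {
        log += "appended-with-" + guard;
    }

    static void commitTxnOffsets() {
        maybeStartVerification((error, guard) -> {
            if (error.equals("NONE"))
                appendForGroup(guard); // local write completes before return
            else
                log += "error:" + error;
        });
    }
}
```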
> call maybeStartTransactionVerificationForPartition with a callback that would call this method
This will change when I do the refactor since this will become appendRecords where we only call maybeStartTransactionVerificationForPartition(s) if the append requires transaction verification. I can add the comment now, but it will be changed in the refactor (https://issues.apache.org/jira/browse/KAFKA-15987)
Adding comment in the refactor should be fine.
If we keep the name, how about we add a check to ensure that the write is for __consumer_offsets? Also, we can drop the internalTopicsAllowed and appendOrigin arguments since they will be implicit.
recordValidationStatsCallback: Map[TopicPartition, RecordValidationStats] => Unit = _ => (),
requestLocal: RequestLocal = RequestLocal.NoCaching,
actionQueue: ActionQueue = null,
verificationGuards: Map[TopicPartition, VerificationGuard] = Map.empty,
What is the difference between passing no verification guard and passing VerificationGuard.SENTINEL?
I believe checking if the map is empty is a shortcut for skipping verification. That doesn't really matter for the offset change but does for the produce flow.
When we get to the log layer, if we don't have an entry in the map, we do a getOrElse and return the sentinel.
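The lookup described here means an absent entry and an explicit SENTINEL behave the same at the log layer. A minimal illustrative sketch in Java (the real code is Scala's `Option`/`getOrElse`; `GuardLookup` and the string guards are hypothetical):

```java
import java.util.Map;

// Hypothetical sketch of the guard lookup: a missing entry (or an entirely
// empty map) falls through to the SENTINEL guard.
class GuardLookup {
    static final String SENTINEL = "SENTINEL"; // stand-in for VerificationGuard.SENTINEL

    static String guardFor(Map<String, String> verificationGuards, String topicPartition) {
        // Mirrors the getOrElse at the log layer.
        return verificationGuards.getOrDefault(topicPartition, SENTINEL);
    }
}
```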
}
}

appendForGroup(group, records, requestLocal, putCacheCallback, verificationGuards)
What error do we expect if the guard check fails during write?
We expect INVALID_TXN_STATE which is fatal. I think we previously discussed this and decided it was ok for old clients.
We have logic in the createPutCacheCallback to convert the error code returned in TxnOffsetCommit. Do we want to add a case for INVALID_TXN_STATE?
We considered this on the first PR but decided that the abortable/retriable errors were not specific enough.
From KIP-890
Return Abortable Error for TxnOffsetCommitRequests
Instead of INVALID_TXN_STATE and INVALID_PID_MAPPING we considered using UNKNOWN_MEMBER_ID which is abortable. However, this is not a clear message and is not guaranteed to be abortable on non-Java clients. Since we can't specify a message in the response, we thought it would be better to just send the actual (but fatal) errors.
Ok. Mainly I was considering whether we should have an explicit case for this so that it is clearly intentional. What do you think?
I can do that. I just thought that all the ones there were ones that were changed. There are also errors that are returned but not mapped (i.e., coordinator_not_available).
We can also include InvalidPidMapping if we do want to map errors.
We could just include a comment under the default case to emphasize that no mapping is expected for these error codes?
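The suggestion above -- map the codes that need translation and document under the default case that the rest pass through intentionally -- could look like this hedged sketch. Illustrative Java only; the real conversion lives in createPutCacheCallback in Scala, and the specific mapping shown is an assumption for illustration, not the full real table.

```java
// Hypothetical sketch of the TxnOffsetCommit error-code conversion, with a
// comment under the default case making the pass-through intentional.
class TxnOffsetCommitErrors {
    static String mapError(String error) {
        switch (error) {
            case "UNKNOWN_TOPIC_OR_PARTITION":
                return "COORDINATOR_NOT_AVAILABLE"; // illustrative mapping only
            default:
                // INVALID_TXN_STATE, INVALID_PRODUCER_ID_MAPPING, etc. are
                // intentionally NOT mapped: the actual (fatal) error is more
                // informative to clients than a generic abortable one
                // (see the KIP-890 discussion above).
                return error;
        }
    }
}
```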
}
}

groupManager.replicaManager.maybeStartTransactionVerificationForPartition(
The access to replicaManager here suggests that we should probably be going through GroupMetadataManager. We could expose a wrapped maybeStartTransactionVerificationForPartition from GroupMetadataManager instead. That might also help us encapsulate the error conversion a little better.
I was just about to push my change before I saw this comment. I will address this comment tomorrow.
I took a look at the tests and the only ones that looked suspicious were the mirrorIntegration tests. Note, although the last build had a failure for a version, the previous build succeeded and only included a minor code change. Given the nature of the change and the failure, I believe this is non-blocking. I ran system tests and noticed an issue with
…sactionally committing offsets (#14774)

Rewrote the verification flow to pass a callback to execute after verification completes. For the TxnOffsetCommit, we will call doTxnCommitOffsets. This allows us to do offset validations post verification.

I've reorganized the verification code and group coordinator code to make these code paths clearer. The followup refactor (https://issues.apache.org/jira/browse/KAFKA-15987) will further clean up the produce verification code.

Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
)
}

def maybeStartTransactionVerificationForPartitions(
I am doing a refactor PR so I can address your comments there. https://issues.apache.org/jira/browse/KAFKA-15987
* This method should not return until the write to the local log is completed because updating offsets requires updating
* the in-memory and persisted state under a lock together.
*
* Noted that all pending delayed check operations are stored in a queue. All callers to ReplicaManager.appendRecords()
It's weird to refer to ReplicaManager.appendRecords() here since this method is appendForGroup.
This will be fixed in the refactor. I plan to get rid of appendForGroup and unify in a single appendRecords method.
// Produce requests (only requests that require verification) should only have one batch per partition in "batches" but check all just to be safe.
val transactionalBatches = records.batches.asScala.filter(batch => batch.hasProducerId && batch.isTransactional)
transactionalBatches.foreach(batch => transactionalProducerIds.add(batch.producerId))

private def sendInvalidRequiredAcksResponse(entries: Map[TopicPartition, MemoryRecords],
Should we reuse this method in appendRecords?
Another change for the refactor. Jason asked me to keep the diff here to a minimum, but I plan to unify the code in the refactor. (He asked me to revert these changes for this PR)
if (delayedProduceRequestRequired(requiredAcks, allEntries, allResults)) {
debug("Produce to local log in %d ms".format(time.milliseconds - sTime))

val allResults = localProduceResults
Could we just get rid of allResults and just use localProduceResults?
}
}

private def buildProducePartitionStatus(
Could we reuse buildProducePartitionStatus in appendEntries?
All of these will be covered in the refactor. I didn't touch the produce flow to minimize the diff and cause minimal confusion when reviewing.
)
}

private def maybeAddDelayedProduce(
Could we reuse maybeAddDelayedProduce in appendEntries?
*
* When the verification returns, the callback will be supplied the error if it exists or Errors.NONE.
* If the verification guard exists, it will also be supplied. Otherwise the SENTINEL verification guard will be returned.
* This guard can not be used for verification and any appends that attempt to use it will fail.
requestLocal
)

addPartitionsToTxnManager.get.verifyTransaction(
Hmm, addPartitionsToTxnManager could be empty in tests. Should we change addPartitionsToTxnManager.get to addPartitionsToTxnManager.foreach like in the original code?
Thanks for pointing this out. I will fix this in the followup.
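The difference between `.get` and `.foreach` discussed here can be shown with Java's Optional, which plays the same role as Scala's Option (an illustrative sketch; `TxnManagerCall` and the string-typed manager are hypothetical stand-ins):

```java
import java.util.Optional;

// Hypothetical sketch of the fix: Optional.get() throws NoSuchElementException
// when the manager is absent (as it can be in tests), while ifPresent -- the
// analogue of Scala's Option.foreach -- simply skips the call.
class TxnManagerCall {
    static String log = "";

    static void verify(Optional<String> addPartitionsToTxnManager) {
        // Instead of addPartitionsToTxnManager.get().verifyTransaction(...):
        addPartitionsToTxnManager.ifPresent(mgr -> log += "verified-by-" + mgr);
    }
}
```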
#15087)

I originally did some refactors in #14774, but we decided to keep the changes minimal since the ticket was a blocker. Here are those refactors:

* Removed separate append paths so that produce, group coordinator, and other append paths all call appendRecords
* AppendRecords has been simplified
* Removed unneeded error conversions in verification code since group coordinator and produce path convert errors differently, removed test for that
* Fixed incorrect capital param name in KafkaRequestHandler
* Updated ReplicaManager test to handle produce appends separately when transactions are used.

Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>