KAFKA-8002: Replica reassignment to new log dir may not complete if f…#6346
KAFKA-8002: Replica reassignment to new log dir may not complete if f…#6346hachikuji merged 3 commits intoapache:trunkfrom
Conversation
…uture and current replicas segment files have different base offsets This patch fixes a bug in log dir reassignment where Partition.maybeReplaceCurrentWithFutureReplica would compare the entire LogEndOffsetMetadata of each replica to determine whether the reassignment has completed. If the active segments of both replicas have different base segments (for example, if the current replica had previously been cleaned and the future replica rolled segments at different points), the reassignment will never complete. The fix is to compare only the LogEndOffsetMetadata.messageOffset for each replica. Tested with a unit test that simulates the compacted current replica case.
|
retest this please |
hachikuji
left a comment
There was a problem hiding this comment.
Thanks, looks good overall. Left a minor suggestion.
| futureLocalReplica match { | ||
| case Some(futureReplica) => | ||
| if (replica.logEndOffset == futureReplica.logEndOffset) { | ||
| if (replica.logEndOffset.messageOffset == futureReplica.logEndOffset.messageOffset) { |
There was a problem hiding this comment.
One thing I was discussing with @apovzner is whether we should consider renaming logEndOffset. The bug probably would never have occurred if we had a better name. Maybe we should have two methods:
def logEndOffset: Long
def logEndOffsetMetadata: LogOffsetMetadataThere was a problem hiding this comment.
I like the suggestion of having logEndOffsetMetadata method. Although, need to be careful not to miss any calls that would need to be renamed to
logEndOffsetMetadata (and get to the same type of bug we found here). I guess one way to make sure this does not happen is to first change the name of logEndOffset to
logEndOffsetMetadata, make sure to compile, and only after that add
logEndOffset that returns Long.
| } | ||
|
|
||
| @Test | ||
| // Verify that replacement works when the replicas have the same log end offset but different base offsets in the |
There was a problem hiding this comment.
nit: Move this comment above annotation?
|
Thanks for the PR! Looks good -- I don't have anything to add to Jason's comments. |
…fset return the message offset
hachikuji
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the fix!
…nt segment base offset (#6346) This patch fixes a bug in log dir reassignment where Partition.maybeReplaceCurrentWithFutureReplica would compare the entire LogEndOffsetMetadata of each replica to determine whether the reassignment has completed. If the active segments of both replicas have different base segments (for example, if the current replica had previously been cleaned and the future replica rolled segments at different points), the reassignment will never complete. The fix is to compare only the LogEndOffsetMetadata.messageOffset for each replica. Tested with a unit test that simulates the compacted current replica case. Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>
… has different segment base offset (apache#6346) This patch fixes a bug in log dir reassignment where Partition.maybeReplaceCurrentWithFutureReplica would compare the entire LogEndOffsetMetadata of each replica to determine whether the reassignment has completed. If the active segments of both replicas have different base segments (for example, if the current replica had previously been cleaned and the future replica rolled segments at different points), the reassignment will never complete. The fix is to compare only the LogEndOffsetMetadata.messageOffset for each replica. Tested with a unit test that simulates the compacted current replica case. backported to 2.0-li Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>
* apache/trunk: KAFKA-7880:Naming worker thread by task id (apache#6275) improve some logging statements (apache#6078) KAFKA-7312: Change broker port used in testMinimumRequestTimeouts and testForceClose KAFKA-7997: Use automatic RPC generation in SaslAuthenticate KAFKA-8002; Log dir reassignment stalls if future replica has different segment base offset (apache#6346) KAFKA-3522: Add TimestampedKeyValueStore builder/runtime classes (apache#6152) HOTFIX: add igore import to streams_upgrade_test MINOR: ConsumerNetworkClient does not need to send the remaining requests when the node is not ready (apache#6264) KAFKA-7922: Return authorized operations in describe consumer group responses (KIP-430 Part-1) KAFKA-7918: Inline generic parameters Pt. III: in-memory window store (apache#6328) KAFKA-8012; Ensure partitionStates have not been removed before truncating. (apache#6333) KAFKA-8011: Fix for race condition causing concurrent modification exception (apache#6338) KAFKA-7912: Support concurrent access in InMemoryKeyValueStore (apache#6336) MINOR: Skip quota check when replica is in sync (apache#6344) HOTFIX: Change header back to http instead of https to path license header test (apache#6347) MINOR: fix release.py script (apache#6317) MINOR: Remove types from caching stores (apache#6331) MINOR: Improve logging for alter log dirs (apache#6302) MINOR: state.cleanup.delay.ms default is 600,000 ms (10 minutes). (apache#6345) MINOR: disable Streams system test for broker upgrade/downgrade (apache#6341)
…nt segment base offset (apache#6346) This patch fixes a bug in log dir reassignment where Partition.maybeReplaceCurrentWithFutureReplica would compare the entire LogEndOffsetMetadata of each replica to determine whether the reassignment has completed. If the active segments of both replicas have different base segments (for example, if the current replica had previously been cleaned and the future replica rolled segments at different points), the reassignment will never complete. The fix is to compare only the LogEndOffsetMetadata.messageOffset for each replica. Tested with a unit test that simulates the compacted current replica case. Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>
…uture and current replicas segment files have different base offsets
This patch fixes a bug in log dir reassignment where Partition.maybeReplaceCurrentWithFutureReplica would compare the entire LogEndOffsetMetadata of each replica to determine whether the reassignment has completed. If the active segments of both replicas have different base segments (for example, if the current replica had previously been cleaned and the future replica rolled segments at different points), the reassignment will never complete. The fix is to compare only the LogEndOffsetMetadata.messageOffset for each replica. Tested with a unit test that simulates the compacted current replica case.
Committer Checklist (excluded from commit message)