KAFKA-14240; Validate kraft snapshot state on startup #12653
hachikuji merged 5 commits into apache:trunk
jsancio
left a comment
Thanks for the change, @hachikuji.
// If the log start offset is not 0, then we must have a snapshot which covers the
// initial state up to the current log start offset.
if (log.logStartOffset > 0) {
  val latestSnapshotId = snapshots.lastOption.map(_._1)
  if (!latestSnapshotId.exists(snapshotId => snapshotId.offset >= log.logStartOffset)) {
    throw new IllegalStateException("Inconsistent snapshot state: there must be a snapshot " +
      s"at an offset larger then the current log start offset ${log.logStartOffset}, but the " +
      s"latest snapshot is ${latestSnapshotId}")
  }
}
What do you think about moving this check to recoverSnapshots or before calling recoverSnapshots?
recoverSnapshots makes changes to the log dir. Kafka should probably validate the log dir before making changes to it?
I guess a specific case where doing this prior to recoverSnapshots might help is if we have a deleted file corresponding to a snapshot in the range we are looking for.
Yes. recoverSnapshots doesn't make a log dir invalid but it may make it "more" invalid by deleting more snapshots.
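The ordering suggested in this thread (validate first, then let recovery touch the log dir) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in the PR: `SnapshotValidationSketch`, `SnapshotId`, and `validateSnapshotState` are hypothetical names, and the plain `Long` and `Seq` arguments stand in for the real log and snapshot listing.

```scala
object SnapshotValidationSketch {
  // Hypothetical stand-in for a snapshot identifier (not Kafka's own type).
  final case class SnapshotId(offset: Long, epoch: Int)

  // Throws if the log starts past offset 0 and no snapshot covers the missing prefix.
  // Meant to be called before any step that may delete snapshot files, so that the
  // check sees the directory as it was found on startup.
  def validateSnapshotState(logStartOffset: Long, snapshotIds: Seq[SnapshotId]): Unit = {
    if (logStartOffset > 0) {
      // Pick the snapshot with the highest offset (ties broken by epoch).
      val latestSnapshotId = snapshotIds.sortBy(id => (id.offset, id.epoch)).lastOption
      if (!latestSnapshotId.exists(_.offset >= logStartOffset)) {
        throw new IllegalStateException(
          s"Inconsistent snapshot state: log start offset is $logStartOffset, " +
            s"but the latest snapshot is $latestSnapshotId")
      }
    }
  }
}
```

With this shape, a caller would run `validateSnapshotState` on the snapshot IDs it found on disk and only then invoke whatever cleanup it performs, so a failure leaves the directory untouched for inspection.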
Force-pushed from 76ea1b0 to 421b9dd
jsancio
left a comment
Great tests. LGTM after one minor comment.
Force-pushed from df09abf to 9d3b891
2 out of 3 builds failed (terminated) without results. I've re-run it: https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12653/7/
Test failures look unrelated. I've triggered one more build to be on the safe side.
All 3 builds failed in the re-run build: https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12653/8/
Sigh.
2 out of 3 builds failed again in the latest build.
We should prevent the metadata log from initializing in a known bad state. If the log start offset of the first segment is greater than 0, then there must be a snapshot at an offset greater than or equal to it in order to ensure that the initialized state is complete.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
…eptember 2022)
`Jenkinsfile` was the only conflict and we ignore the changes since they are not relevant to the Confluent build.
* apache-github/3.3: (61 commits)
  KAFKA-14214: Introduce read-write lock to StandardAuthorizer for consistent ACL reads. (apache#12628)
  KAFKA-14243: Temporarily disable unsafe downgrade (apache#12664)
  KAFKA-14240; Validate kraft snapshot state on startup (apache#12653)
  KAFKA-14233: disable testReloadUpdatedFilesWithoutConfigChange first to fix the build (apache#12658)
  KAFKA-14238; KRaft metadata log should not delete segment past the latest snapshot (apache#12655)
  KAFKA-14156: Built-in partitioner may create suboptimal batches (apache#12570)
  MINOR: Adds KRaft versions of most streams system tests (apache#12458)
  MINOR; Add missing li end tag (apache#12640)
  MINOR: Mention that kraft is production ready in upgrade notes (apache#12635)
  MINOR: Add upgrade note regarding the Strictly Uniform Sticky Partitioner (KIP-794) (apache#12630)
  KAFKA-14222; KRaft's memory pool should always allocate a buffer (apache#12625)
  KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits (apache#12626)
  KAFKA-14196; Do not continue fetching partitions awaiting auto-commit prior to revocation (apache#12603)
  KAFKA-14215; Ensure forwarded requests are applied to broker request quota (apache#12624)
  MINOR; Remove end html tag from upgrade (apache#12605)
  Remove the html end tag from upgrade.html
  KAFKA-14205; Document how to replace the disk for the KRaft Controller (apache#12597)
  KAFKA-14203 Disable snapshot generation on broker after metadata errors (apache#12596)
  KAFKA-14216: Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc (apache#12617)
  KAFKA-14217: app-reset-tool.html should not show --zookeeper flag that no longer exists (apache#12618)
  ...