Skip to content

KAFKA-12332: Error partitions from topics with invalid IDs in LISR requests#10143

Merged
hachikuji merged 4 commits intoapache:trunkfrom
jolshan:KAFKA-12332
Feb 19, 2021
Merged

KAFKA-12332: Error partitions from topics with invalid IDs in LISR requests#10143
hachikuji merged 4 commits intoapache:trunkfrom
jolshan:KAFKA-12332

Conversation

@jolshan
Copy link
Copy Markdown
Member

@jolshan jolshan commented Feb 17, 2021

Changes how invalid IDs are handled in LeaderAndIsr requests. The ID check now occurs before leader epoch. If the ID exists and is invalid, the partition is ignored and a new INCONSISTENT_TOPIC_ID error is returned in the response.

This error should be rare, but if it occurs, it signals the need for manual intervention.

Added tests for this behavior.
I also plan to rerun the benchmark from #10071 to ensure there are no regressions with this change.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated

// If we found an invalid ID, we don't need to check the leader epoch
if (!invalidId) {
if (requestLeaderEpoch > currentLeaderEpoch) {
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a minor thing, but we can avoid this nesting by restructuring the checks a little bit. For example, it would be a good idea to have a helper in Partition which encapsulates the update of the topicId state in Log. Maybe something like this:

class Partition {
  // Update topicid if necessary. 
  // Return false if the update failed because the topicId is inconsistent
  def maybeUpdateTopicId(topicId: Uuid): Boolean
}

// in ReplicaManager
if (!partition.maybeUpdateTopicId(requestTopicId)) {
  error(...)
  responseMap.put(topicPartition, Errors.UNKNOWN_TOPIC_ID)
} else if (requestLeaderEpoch > currentLeaderEpoch) {
...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was struggling with this since I didn't think it was the best solution. I'll take a look at the version you have here. Would we want to ever update the topic ID though? Maybe just a method to check if they are equal.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmmm I see how this could be used on the path for actually setting it. I'll think of a good name

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, if log is None, does it make sense to error here? If it is None, we are unable to check the topic ID

Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
Comment thread core/src/main/scala/kafka/cluster/Partition.scala Outdated
Comment thread core/src/main/scala/kafka/server/ReplicaManager.scala Outdated
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I had one minor nitpick. Can you update the KIP for the new error code? I think it's worth sending a message to the vote thread as well to see if anyone has any concerns.

@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Feb 18, 2021

@hachikuji I was wondering if I should do that. I think I've been having issues with mailer daemon trying to send to DISCUSS so I'll send to VOTE this time.

Comment thread core/src/main/scala/kafka/cluster/Partition.scala Outdated
@jolshan
Copy link
Copy Markdown
Member Author

jolshan commented Feb 19, 2021

Ran tests locally with newest code from trunk. Here are the tests that failed:

ConsumerBounceTest.testClose()
ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()
DynamicConnectionQuotaTest.testDynamicListenerConnectionCreationRateQuota()

These are tests I've found to be flaky running locally in the past.

@hachikuji hachikuji merged commit d030dc5 into apache:trunk Feb 19, 2021
hachikuji pushed a commit that referenced this pull request Feb 19, 2021
…quests (#10143)

Changes how invalid IDs are handled in LeaderAndIsr requests. The ID check now occurs before leader epoch. If the ID exists and is invalid, the partition is ignored and a new `INCONSISTENT_TOPIC_ID` error is returned in the response.

Reviewers: Jason Gustafson <jason@confluent.io>
hachikuji pushed a commit that referenced this pull request Apr 1, 2021
…ReplicaManager (#10282)

KIP-516 introduced partition.metadata file to persist the topic ID on the broker. It is created through handling the LeaderAndIsrRequest in ReplicaManager. (See #10143 for the code path.) RaftReplicaManager was missing the analogue code path for Kip-500 code. Like in ReplicaManager, RaftReplicaManager will now check the partition.metadata file when handling metadata records.

However, since we know that all raft topics will have topic IDs, we can simply set the ID in the log upon the log's creation.
Updated the ReplicaManager path to do the same on newly created topics.

There are also some tweaks to the checking logic to better handle the scenario when the log exists but is not yet associated to Partition (for example, upon startup after a shutdown).

Tests added to ensure the file is created and that the correct error is thrown when the id is inconsistent.
Added tests for creating the log with the new topic ID parameter.

Also adds a few methods to get topic ID from MetadataImageBuilder as this is the most convenient way to get topic ID from RaftReplicaManager.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
wyuka pushed a commit to wyuka/kafka that referenced this pull request Dec 15, 2021
…ReplicaManager (apache#10282)

KIP-516 introduced partition.metadata file to persist the topic ID on the broker. It is created through handling the LeaderAndIsrRequest in ReplicaManager. (See apache#10143 for the code path.) RaftReplicaManager was missing the analogue code path for Kip-500 code. Like in ReplicaManager, RaftReplicaManager will now check the partition.metadata file when handling metadata records.

However, since we know that all raft topics will have topic IDs, we can simply set the ID in the log upon the log's creation.
Updated the ReplicaManager path to do the same on newly created topics.

There are also some tweaks to the checking logic to better handle the scenario when the log exists but is not yet associated to Partition (for example, upon startup after a shutdown).

Tests added to ensure the file is created and that the correct error is thrown when the id is inconsistent.
Added tests for creating the log with the new topic ID parameter.

Also adds a few methods to get topic ID from MetadataImageBuilder as this is the most convenient way to get topic ID from RaftReplicaManager.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants