KAFKA-8237; Untangle TopicDeleteManager and add test cases by hachikuji · Pull Request #6588 · apache/kafka

hachikuji · 2019-04-17T00:48:24Z

The controller maintains a bunch of state across ControllerContext, PartitionStateMachine, ReplicaStateMachine, and TopicDeletionManager. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. I have simply moved it all into ControllerContext and removed many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in ControllerContext to make sure that state is maintained consistently.

Additionally, I have created several mock objects to enable easier testing: MockReplicaStateMachine and MockPartitionStateMachine. These have simplified logic for updating the current state. This is used to create some new test cases for TopicDeletionManager. I found that I had to change the valid previous state of ReplicaDeletionIneligible to include OfflineReplica to get the new tests to pass, but please make sure there is not a bug in the test case.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

jsancio

Thanks for the PR @hachikuji. This looks great and it is for sure an improvement.

Feel free to ignore my renaming suggestions.

jsancio · 2019-04-17T22:04:41Z

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.controller


Very cool. I was going to do something similar. I need this for my changes as I am planning to change the signature to some of these methods. I think I'll wait until we merge this PR.

jsancio · 2019-04-17T22:05:19Z

+
+  def leaderForReassign(partition: TopicPartition,
+                        leaderIsrAndControllerEpoch: LeaderIsrAndControllerEpoch,
+                        controllerContext: ControllerContext): ElectionResult = {


Same there. private[this]?

jsancio · 2019-04-17T22:05:44Z

+
+  def leaderForPreferredReplica(partition: TopicPartition,
+                                leaderIsrAndControllerEpoch: LeaderIsrAndControllerEpoch,
+                                controllerContext: ControllerContext): ElectionResult = {


Same here. private[this]?

jsancio · 2019-04-17T22:06:08Z


  private val stateChangeLogger = new StateChangeLogger(config.brokerId, inControllerContext = true, None)
  val controllerContext = new ControllerContext
+  var controllerChannelManager: ControllerChannelManager = _


jsancio

LGTM

jsancio · 2019-04-18T21:35:59Z

+  def leaderForOffline(partition: TopicPartition,
+                       leaderIsrAndControllerEpochOpt: Option[LeaderIsrAndControllerEpoch],
+                       uncleanLeaderElectionEnabled: Boolean,
+                       controllerContext: ControllerContext): ElectionResult = {


Yeah, I am not sure here either. I think in some cases we have unnecessary "batching" APIs or method signatures given that we batch things in the ControllerChannelManager. Maybe we can work toward that code structure in the controller. Where internally we deal with one partition, replica, topic, broker, etc at a time and perform batching at the edges (IO inputs and outputs).

hachikuji · 2019-04-20T06:38:00Z

retest this please

junrao

@hachikuji : Thanks for the cleanup patch. Great improvement. A few comments below.

junrao · 2019-04-23T00:39:31Z

    brokerIds.filter(_ >= 0).foreach { brokerId =>
+      def callback(stopReplicaResponse: AbstractResponse): Unit = {
+        if (deletePartition)
+          controller.eventManager.put(controller.TopicDeletionStopReplicaResponseReceived(stopReplicaResponse, brokerId))


We want to be a bit careful here. Partition reassignment will send StopReplicaRequest with deletePartition=true too. In that case, we don't want to trigger the topic deletion logic.

I added a check for whether the topic is undergoing deletion. Let me know if that seems sufficient. It would be nice to avoid the awkward callback through the ReplicaStateMachine if possible.

junrao · 2019-04-23T01:36:03Z

+    assertEquals(Set("foo"), controllerContext.topicsWithDeletionStarted)
+    assertEquals(Set("foo"), controllerContext.topicsIneligibleForDeletion)
+    assertEquals(onlineReplicas, controllerContext.replicasInState("foo", ReplicaDeletionSuccessful))
+    assertEquals(offlineReplicas, controllerContext.replicasInState("foo", OfflineReplica))


This is an existing issue. At this stage, it seems that offlineReplicas should remain in ReplicaDeletionIneligible.

Note I created https://issues.apache.org/jira/browse/KAFKA-8283 for this and the other issue below.

junrao · 2019-04-23T01:41:23Z

+    assertEquals(offlineReplicas, controllerContext.replicasInState("foo", ReplicaDeletionIneligible))
+    assertEquals(onlineReplicas, controllerContext.replicasInState("foo", ReplicaDeletionStarted))
+
+    // Broker 2 is restarted. The offline replicas remain ineligable


Yes, this does seem to be an existing issue. It seems that deletionManager.resumeDeletionForTopics should bring offlineReplicas to ReplicaDeletionStarted.

Ack. I will file a JIRA and we can address this separately.

junrao

@hachikuji : Thanks for the updated PR. LGTM. Just one more comment below.

junrao · 2019-04-23T20:57:24Z

+  def addStopReplicaRequestForBrokers(brokerIds: Seq[Int], topicPartition: TopicPartition, deletePartition: Boolean): Unit = {
    brokerIds.filter(_ >= 0).foreach { brokerId =>
+      def callback(stopReplicaResponse: AbstractResponse): Unit = {
+        if (deletePartition && controllerContext.isTopicQueuedUpForDeletion(topicPartition.topic))


When there is a pending partition reassignment task, a deleted topic will be queued up but not executed until the reassignment completes. So, we probably want to further check that isTopicIneligibleForDeletion(topic) is false.

junrao

@hachikuji : Thanks for the new patch. A couple of more comments below.

junrao · 2019-04-23T22:05:08Z

    brokerIds.filter(_ >= 0).foreach { brokerId =>
+      def callback(stopReplicaResponse: AbstractResponse): Unit = {
+        if (deletePartition)
+          controller.eventManager.put(controller.StopAndDeleteReplicaResponseReceived(stopReplicaResponse, brokerId))


This is actually trickier than I thought. In sendRequestsToBrokers(), we only group replicas with no deletion and no callback in a single StopReplicaRequest. By always including a callback, it disables batching in StopReplicaRequest.

junrao · 2019-04-23T22:06:12Z

+
+      // Here we are only handling partitions which are part of topic deletion
      val responseMap = stopReplicaResponse.responses.asScala
+        .filterKeys(tp => controllerContext.isTopicQueuedUpForDeletion(tp.topic))


We probably also want to check that the topic is eligible for deletion.

I have a little trouble following the state machine. I think we will transition to ReplicaDeletionIneligible if either we get a StopReplica response with an error or if the replica fails during deletion. I think it would be possible for a pending StopReplica response to be received after the broker had been marked offline, so I think the suggestion makes sense. Just to be clear though, this seems to be an existing problem?

I was concerned about the case when a topic is queued for deletion behind a pending partition reassignment. In that case, the pending reassignment could send StopReplicaRequest with deletePartition=true. In the current logic, there is no callback in that case. So, the replicas won't be transitioned to ReplicaDeletionIneligible or ReplicaDeletionSuccessful until the topic is eligible for deletion. I was just trying to preserve that logic now that we add a callback as long as deletePartition=true.

Got it. That makes sense.

@junrao Unfortunately, we have to skip the topic ineligible check in order to continue to completeReplicaDeletion while another replica is offline. Some test cases are failing because of this. I think we should be installing a callback anyway on reassignment so that we can retry, but we can leave this for a future patch. For now, I will see if I can find a nice way to preserve existing logic.

junrao

@hachikuji : Thanks for the latest PR. LGTM

ijuma · 2019-04-24T02:23:57Z

I noticed a discussion about callbacks and batching. See #6515

hachikuji · 2019-04-24T06:27:48Z

@ijuma Thanks for the link. I think this patch achieves the same effect. I solve it by removing the callback parameter and checking the deletion flag directly.

junrao · 2019-04-24T20:55:30Z

+          val currentState = controllerContext.replicaState(replica)
+          logSuccessfulTransition(replicaId, replica.topicPartition, currentState, ReplicaDeletionStarted)
+          controllerContext.putReplicaState(replica, ReplicaDeletionStarted)
+          val topicDeletionInProgress = controllerContext.isTopicQueuedUpForDeletion(replica.topicPartition.topic)


I think we need to check if a topic is eligible for deletion. A topic can be queued up for deletion but is blocked behind a pending partition reassignment, which will also transition old replicas to ReplicaDeletionStarted at some point.

The problem seems to be that we are using the topicsIneligibleForDeletion set for two unrelated purposes. We use it to track topics which are awaiting reassignment before deletion and we also use it to indicate topics which have had some replica deletion failure. The problem is that we may transition a replica to ReplicaDeletionStarted in both cases. It's becoming tempting to add back that nasty callback, but let me try one more time to salvage this.

junrao

@hachikuji : Thanks for the latest update. LGTM

* ak/trunk: (42 commits) KAFKA-8134: `linger.ms` must be a long KAFKA-7779; Avoid unnecessary loop iteration in leastLoadedNode (apache#6081) MINOR: Update Gradle to 5.4.1 and update its plugins (apache#6436) MINOR: improve Session expiration notice (apache#6618) KAFKA-8029: In memory session store (apache#6525) MINOR: In-memory stores cleanup (apache#6595) KAFKA-7862 & KIP-345 part-one: Add static membership logic to JoinGroup protocol (apache#6177) KAFKA-8254: Pass Changelog as Topic in Suppress Serdes (apache#6602) KAFKA-7903: automatically generate OffsetCommitRequest (apache#6583) KAFKA-8291 : System test fix (apache#6637) MINOR: Do not log retriable offset commit exceptions as errors (apache#5904) MINOR: Fix log message error of loadTransactionMetadata (apache#6571) MINOR: Fix 404 security features links (apache#6634) MINOR: Remove an unnecessary character from broker's startup log MINOR: Make LogCleaner.shouldRetainRecord more readable (apache#6590) MINOR: Remove implicit return statement (apache#6629) KAFKA-8237; Untangle TopicDeleteManager and add test cases (apache#6588) KAFKA-8227 DOCS Fixed missing links duality of streams tables (apache#6625) MINOR: reformat settings.gradle to be more readable (apache#6621) MINOR: Correct RestServerTest formatting ... Conflicts: build.gradle settings.gradle

The controller maintains state across `ControllerContext`, `PartitionStateMachine`, `ReplicaStateMachine`, and `TopicDeletionManager`. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. This patch moves it all into `ControllerContext` and removes many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in `ControllerContext` to make sure that state is maintained consistently. Additionally, this patch adds several mock objects to enable easier testing: `MockReplicaStateMachine` and `MockPartitionStateMachine`. These have simplified logic for updating the current state. This is used to create some new test cases for `TopicDeletionManager`. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>

KAFKA-8237; Untangle TopicDeleteManager and add test cases

cd05e99

hachikuji commented Apr 17, 2019

View reviewed changes

Comment thread core/src/main/scala/kafka/controller/ControllerChannelManager.scala

jsancio reviewed Apr 17, 2019

View reviewed changes

Add additional deletion test case and address comments

f45aba5

hachikuji changed the title ~~KAFKA-8237 [WIP]; Untangle TopicDeleteManager and add test cases~~ KAFKA-8237; Untangle TopicDeleteManager and add test cases Apr 18, 2019

Do not use option type in updateBrokerMetadata

4b85b48

jsancio reviewed Apr 18, 2019

View reviewed changes

Declare type of liveBrokerIdAndEpochs

64b3dc7

junrao reviewed Apr 23, 2019

View reviewed changes

Address comments

9e6496c

hachikuji force-pushed the KAFKA-8237 branch from dc83ca5 to 9e6496c Compare April 23, 2019 20:14

junrao approved these changes Apr 23, 2019

View reviewed changes

Check topics undergoing deletion after receiving StopReplica response

ee54087

hachikuji force-pushed the KAFKA-8237 branch from 585d68f to ee54087 Compare April 23, 2019 21:46

junrao reviewed Apr 23, 2019

View reviewed changes

Do not provide a response callback if partition is not being deleted

dd9c53b

junrao approved these changes Apr 24, 2019

View reviewed changes

ijuma reviewed Apr 24, 2019

View reviewed changes

Comment thread core/src/main/scala/kafka/controller/ElectionUtil.scala Outdated

hachikuji added 2 commits April 24, 2019 00:07

Only install stop replica callback when topic is being deleted

178eaaa

Rename ElectionUtil to Election

e58ac48

junrao reviewed Apr 24, 2019

View reviewed changes

Only provide StopReplica callback if deletion has begun

6b68fbe

junrao approved these changes Apr 24, 2019

View reviewed changes

hachikuji merged commit 17c8016 into apache:trunk Apr 25, 2019

hachikuji mentioned this pull request May 11, 2019

KAFKA-8171: Set callback to null in addStopReplicaRequestForBrokers when replica … #6515

Closed

3 tasks

Conversation

hachikuji commented Apr 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Committer Checklist (excluded from commit message)

Uh oh!

Uh oh!

jsancio left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jsancio left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hachikuji commented Apr 20, 2019

Uh oh!

junrao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

junrao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

junrao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hachikuji Apr 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

junrao left a comment

Choose a reason for hiding this comment

Uh oh!

ijuma commented Apr 24, 2019

Uh oh!

hachikuji commented Apr 17, 2019 •

edited

Loading

hachikuji Apr 24, 2019 •

edited

Loading