
KAFKA-8179: Part 7, cooperative rebalancing in Streams#7386

Merged
guozhangwang merged 18 commits into apache:trunk from ableegoldman:KIP-429-streams-pt2
Oct 7, 2019

Conversation

@ableegoldman (Member) commented Sep 25, 2019

Key improvements with this PR:

  • tasks will remain available for IQ during a rebalance (but not during restore)
  • continue restoring and processing standby tasks during a rebalance
  • continue processing active tasks during rebalance until the RecordQueue is empty*
  • only revoked tasks must be suspended/closed
  • StreamsPartitionAssignor tries to return tasks to their previous consumers within a client

*but do not try to commit, for now (pending KAFKA-7312)
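The core idea behind these bullets can be illustrated with a small stdlib-only sketch (partition names and the helper are hypothetical, not the actual consumer/Streams API): under the cooperative protocol, only the difference between the previously owned partitions and the new assignment is revoked, rather than revoking everything up front.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only -- not the real consumer code.
public class CooperativeRevocation {
    // Under EAGER, everything in prevOwned would be revoked at the start of a
    // rebalance. Under COOPERATIVE, only partitions that left the assignment
    // are revoked, so tasks for retained partitions stay available (e.g. for IQ).
    public static Set<String> partitionsToRevoke(Set<String> prevOwned, Set<String> newAssignment) {
        Set<String> revoked = new TreeSet<>(prevOwned);
        revoked.removeAll(newAssignment);
        return revoked;
    }
}
```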

@ableegoldman ableegoldman changed the title KAFKA-8179: Part 7, cooperative rebalancing in Streams KAFKA-8179: Part 7, cooperative rebalancing in Streams [WIP] Sep 25, 2019
Member Author

Don't let the SubscriptionState wipe out existing partition state (including uncommitted offsets) when the new assignment comes in

Contributor

Nice find! I think we did not hit this since we always revoke everything before :P

Member Author

Allow PARTITIONS_REVOKED to transition to itself (but callback is a no-op if no new partitions have been revoked)

@ableegoldman ableegoldman changed the title KAFKA-8179: Part 7, cooperative rebalancing in Streams [WIP] KAFKA-8179: Part 7, cooperative rebalancing in Streams Sep 26, 2019
Member Author

These methods were duplicates

Member Author

We removed the key from the map above and stored its contents in expectedCount; we should use that in the message

@ableegoldman (Member Author)

Jenkins seems to be having some issues today but all tests pass locally, kicked off system tests here: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3031/

@ableegoldman (Member Author)

@bbejeck @vvcephei

Member Author

If the subscription came from a client following the cooperative protocol, it will have embedded ownedPartitions instead of activeTasks

Member Author

Added a new way to generate assignments that falls back to the old (interleaved) assignment if we can't avoid giving away a previously owned partition and triggering a second rebalance. I tried to comment it sufficiently well to explain what it's doing, but please give it a look and lmk if anything doesn't make sense

Contributor

The comments lgtm.

@ableegoldman (Member Author)

Still debugging the streams_eos_test:StreamsEosTest but all others pass: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3042/

Jenkins failures cleaned up, retest this please

Member Author

It's actually fine to call standby.commit during a rebalance; I just wasn't sure how that would work with the metrics (it would screw with commit-latency to sometimes be recording the latency of active + standby commits, and sometimes just standby). Maybe we could still commit the standbys but pretend we didn't call commit (i.e., not record anything)? Or would that screw up the metrics even more @guozhangwang

Member Author

On a somewhat related note, I notice we don't actually call commit on restoring tasks but we do on standbys. Is there a particular reason for this? What is there to commit for a standby task but not for a restoring one?


As long as we have separate tags for standby and primary tasks commit, I don't think the metrics will be messed up.

Member Author

I don't think we do, though -- currently we just call commitSensor.record(intervalCommitLatency / (double) committed, now); where the commitLatency is for committing all tasks

@bbejeck bbejeck added the streams label Oct 1, 2019
Member Author

During version probing rebalances we used to just give all clients back their old assignments, except for "future" consumers who would get an empty assignment since we can't interpret their subscriptions. But we can now determine their previous active tasks by the ownedPartitions so we can now put them in their assignment.

But actually we might as well just do our best to generate a "real" (not version probing) assignment during this rebalance, so that on the second rebalance we will not have to revoke any partitions and trigger a third rebalance. So I consolidated the assignment generation into a separate method that both computeNewAssignment and versionProbingAssignment can use

Member Author

There might be kind of an ugly merge/rebase ahead once this version probing bugfix PR is merged, but I think the overall behavior will remain the same


nice cleanup!


What's this trace for?

Member Author

I thought it might be helpful to see that we are not committing some of what we've processed because we can't commit during a rebalance. But I don't think it's absolutely necessary, and I can take it out if you don't think it adds much?

Contributor

I think it's okay to leave it as TRACE, as practically we do not turn on TRACE that frequently.



We would like to restore this test if we choose to keep standby tasks metrics.

Member Author

Removed the test because we no longer suspend standbys (we still have a test for recording metrics on task close)


Why change here?

Member Author

The processor used here keeps track of numRecordsProcessed and outputs it whenever it's a multiple of 100. It also resets in init, so the eos test watches for a rebalance and then looks for numRecordsProcessed to hit 500 again, knowing it should have started over from 0.
But the whole point of cooperative rebalancing is that we may not have needed to revoke/reinitialize this task during a rebalance, so the counter may not be reset after a rebalance.

@guozhangwang (Contributor) left a comment

Finished one pass.


Contributor

Nice cleanup! In #5501 we moved the close / abort logic into the caller suspend function, and we should actually remove the parameter in that PR but overlooked it.

Contributor

+100

Contributor

This is a meta comment: maybe another (equally hacky?) way to do this is to expose the MemberState from ConsumerMetadata (we are exposing this in KIP-447). Then in Streams, we can check that state after each poll call -- remember that state can only change within the poll call -- and depending on that we can decide whether or not to commit.

As for now I think this way is fine, cannot really think of a better way that does not change public APIs.


Contributor

It's a bit of an anti-pattern to use null and "" to indicate two different sentinel cases: the partition is owned by a different client, or the task is new. I think there's a better way that saves us from redundantly iterating through taskPartitions: e.g. we let it return a Collection<String> (and rename it to previousConsumers), which is the union of the claimed owners of the partitions; then in the caller we can treat the cases where 1) it's a singleton, 2) it's empty, 3) it's plural, differently.

Member Author

Haha I knew this would get put down in the PR review but didn't get around to cleaning it up in time...thanks for the suggestion, that is much better 😄

Contributor

Related to my other comment: since now allOwnedPartitions is just a union of clientOwnedPartitions I think it is not needed anymore inside the assign function.

Member Author

I believe we still need allOwnedPartitions here because we need to distinguish between partitions which are not owned within this client vs partitions which are not owned by anyone in any client. ie just because no one in the client claims a partition as owned doesn't mean it is safe to give that partition/task away, since it might still be owned by someone in the group and need to be revoked

Contributor

My read was that the keysets of Map<TopicPartition, String> clientOwnedPartitions and Set<TopicPartition> allOwnedPartitions are the same, but now I realize the former is only for a single client (as with my other comment above). Now this logic makes sense.

Member Author

It's hard to give good self-explanatory names to all your variables 😉 Let me know if you think of any names that better explain what allOwnedPartitions is supposed to be

Contributor

Why this can happen? We checked !newTasks.isEmpty() before already right?

Member Author

We have a loop inside the main while (!newTasks.isEmpty()) loop, where we are actively removing things from newTasks. So, we might have polled the last task in the while (consumerIt.hasNext()) loop before we get back to the outer loop and check newTasks.isEmpty
This code is actually pretty much the same code as in the interleaveTasksByGroupId method. I'll see if it can be moved out into a shared method.

Contributor

nit: I think this variable can be reduced to a local one since it's only needed inside computeNewAssignment: initialize it to false before the loop over the clients. Then let giveTasksBackToConsumers return an empty map if it needs to fail fast (maybe rename it to tryGiveTasksBackToConsumers). And the caller becomes:

if (rebalanceRequired || state.ownedPartitions().isEmpty())
    assignment = interleaveTasksByGroupId
else if ((assignment = giveTasksBackToConsumers).equals(empty))
    assignment = interleaveTasksByGroupId
    rebalanceRequired = true

WDYT?
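For what it's worth, the suggested control flow can be sketched as a small stdlib-only helper (the names follow the discussion above -- the real StreamsPartitionAssignor signatures differ, and here the two candidate assignments are passed in precomputed just to keep the example self-contained):

```java
import java.util.List;
import java.util.Map;

// Hedged sketch of the fallback logic, not the actual assignor code.
public class AssignmentFallback {
    // Returns the sticky candidate when allowed and non-empty; otherwise
    // falls back to the interleaved assignment (which, in the real assignor,
    // would also mean a follow-up rebalance is required).
    public static Map<String, List<Integer>> chooseAssignment(
            boolean rebalanceRequired,
            Map<String, List<Integer>> stickyCandidate,   // empty map == "failed fast"
            Map<String, List<Integer>> interleaved) {
        if (rebalanceRequired || stickyCandidate.isEmpty()) {
            return interleaved;
        }
        return stickyCandidate;
    }
}
```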

Member Author

Sounds good

Contributor

I think the test coverage is not sufficient with the augmented logic here, we should at least test the following code path:

  1. among clients, with EAGER set we still try to honor stickiness based on user metadata (this may be covered already).
  2. among clients, with COOPERATIVE we still try to honor stickiness based on owned partitions.
  3. within a client, we try to reassign tasks back to consumers if the prev consumers are fixed, and load balance is not violated.
  4. within a client, if we cannot satisfy 3), we interleave.
  5. within a client, upon version probing we interleave.

Member Author

Yeah I was in the middle of extending the tests when I found the version probing bug -- I'll get back to them once I've rebased

@ableegoldman ableegoldman force-pushed the KIP-429-streams-pt2 branch 2 times, most recently from e92caf1 to b97e8ea Compare October 3, 2019 02:18
@ableegoldman (Member Author) commented Oct 3, 2019

We can potentially be in PARTITIONS_REVOKED with cooperative rebalancing so we don't want to just block doing nothing during the rebalance -- not sure if it's worth polling for zero since this is rare with cooperative, or some other time < pollTime? cc/ @guozhangwang

Contributor

In EAGER, during the PARTITIONS_REVOKED state we would not return any data from consumer anyways; In future COOPERATIVE, even if we can return some data during the rebalance, the transition of PARTITIONS_REVOKED -> PARTITIONS_ASSIGNED would happen in a single consumer.poll call, and only very rarely we would stay in PARTITION_REVOKED after consumer.poll if the subscription changed. So I think poll with zero sounds good to me.

Also note that in my PR for returning data in the middle of a rebalance, we still pass in non-zero timeout for finding the coordinator so that we are ensured to have one round-trip at least within that call.


For my own education, why do we need to use TreeMap as the underlying return struct?

Member Author

The motivation is to avoid the random order of consumers in HashMap, so that we hopefully end up with a similar task -> consumer assignment in subsequent rebalances.
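A minimal stdlib illustration of that point (the consumer ids are hypothetical): a TreeMap iterates its keys in sorted order regardless of insertion order, whereas HashMap iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Deterministic consumer ordering: sorted iteration means subsequent
// rebalances walk the consumers in the same order, which helps produce
// a similar task -> consumer mapping each time.
public class DeterministicOrder {
    public static List<String> consumerOrder(SortedMap<String, Integer> consumersToCapacity) {
        return new ArrayList<>(consumersToCapacity.keySet());
    }
}
```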


For my own education, what's the benefit of initializing a linked list vs a general array list?

Member Author

I didn't choose this, so I can only guess, but probably so we can easily/efficiently poll and remove the last element until empty?


How about state.activeTaskCount() / consumers.size() + 1?

Member Author

Doesn't work when numConsumers evenly divides numTasks
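To spell that out with a tiny sketch (helper names are made up for illustration): the intended value is the ceiling of numTasks / numConsumers, and the +1 formula over-counts exactly when the division is exact.

```java
// ceilingDiv is the standard integer-ceiling idiom; naivePlusOne is the
// formula suggested in the comment above, shown only to illustrate the
// off-by-one when numConsumers evenly divides numTasks.
public class MaxTasksPerConsumer {
    public static int ceilingDiv(int numTasks, int numConsumers) {
        return (numTasks + numConsumers - 1) / numConsumers;
    }

    public static int naivePlusOne(int numTasks, int numConsumers) {
        return numTasks / numConsumers + 1;
    }
}
```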


Sounds good


I don't quite follow the logic here. If this partition is owned by the current owner, then it should appear in allOwnedPartitions, which means we expect a count of 2 in the final result of previousConsumers.
Outside, though, we check for this set to be > 1 and early terminate -- is that true?

Member Author

If this partition is owned by a consumer in this client, then currentPartitionConsumer != null and currentPartitionConsumer gets added to the previousConsumers set. We don't even check allOwnedPartitions in that case since it's in an "else if"
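A stdlib-only sketch of the branch being discussed (partition/consumer names and the "other-client" marker are hypothetical): the owner within this client wins, and allOwnedPartitions is only consulted in the else-if.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PreviousConsumersSketch {
    // For each partition of a task: if a consumer in this client owns it,
    // record that consumer; otherwise, if anyone else in the group owns it,
    // record a marker so the caller sees a plural result and can bail out.
    public static Set<String> previousConsumers(Iterable<String> taskPartitions,
                                                Map<String, String> clientOwnedPartitions,
                                                Set<String> allOwnedPartitions) {
        final Set<String> owners = new TreeSet<>();
        for (final String partition : taskPartitions) {
            final String currentPartitionConsumer = clientOwnedPartitions.get(partition);
            if (currentPartitionConsumer != null) {
                owners.add(currentPartitionConsumer);     // owned within this client
            } else if (allOwnedPartitions.contains(partition)) {
                owners.add("other-client");               // owned elsewhere in the group
            }
        }
        return owners;
    }
}
```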


I think the logic here is not very intuitive, as we are doing a double loop. What we could do is loop through all the unfilled consumers and fetch the unfulfilled amount of new tasks to feed them. This could also reduce the number of loop cycles we have to go through.

Member Author

Well, the goal here is to interleave the tasks of the same groupId (ie subtopology) across different consumers as much as possible, since some subtopologies might be quite heavy while others are quite light. (This is what we do in interleaveTasksByGroupId also.)
It is a little unintuitive, I agree, so let me know if you have any suggestions for clearer code and/or comments. But if it's any consolation,
a) we probably aren't looping through that many remaining consumers and/or tasks
b) we may be doing a nested loop, but within the inner loop we are removing things from the outer loop. So, in the end we are actually only looping over newTasks and hitting each task in it once
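The interleaving goal can be sketched with stdlib types (task ids modeled as "group_partition" strings; this is a simplification for illustration, not the actual interleaveTasksByGroupId implementation): with tasks sorted by (groupId, partition), a round-robin stride spreads each subtopology's tasks across different consumers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InterleaveSketch {
    // sortedTasks must be sorted by (groupId, partition) so that adjacent
    // tasks of the same subtopology land on different consumers.
    public static Map<String, List<String>> interleave(List<String> sortedTasks, List<String> consumers) {
        final Map<String, List<String>> assignment = new TreeMap<>();
        for (final String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int i = 0; i < sortedTasks.size(); i++) {
            assignment.get(consumers.get(i % consumers.size())).add(sortedTasks.get(i));
        }
        return assignment;
    }
}
```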


I see your point, could you reflect your goal as meta comments for interleaveTasksByGroupId? It's a bit hard to interpret the goal just by reading this code.

Member Author

Sure, I'll try to shore up the explanation for interleaveTasksByGroupId and then refer to that here

Contributor

Hey @ableegoldman, I think you need to add the missing "upgrade from" versions to the configuredMetadataVersion method below as well. Otherwise, creating Streams will just throw an exception when you pass (for example) UPGRADE_FROM_23.

Member Author

Oops, thanks, good catch! That reminds me I still need to add a test that actually configures the assignor using these.

Contributor

Also, not sure if it makes sense to have a test that you can actually construct and start Streams with all versions of "upgrade.from", which would have caught this.

Member Author

Well no, I just added a test that uses UPGRADE_FROM_23 (and one that doesn't)

Member Author

Which would have caught that. I'm wondering if it actually makes more sense to not throw an exception in the default case and just check beforehand whether it's a valid config (ie is it contained in some "valid config values" set, which we may actually already have somewhere in StreamsConfig?)
It's not really practical to have to add new UPGRADE_FROM versions to an increasing number of places in the code or else get an exception thrown. Some cleanup to think about once this is over

Contributor

Agreed on the cleanup front. The current code makes sense when you just have one usage. As soon as we add a second, we probably need a different strategy.

@ableegoldman (Member Author)

Doing another run of system tests, but the initial run passed (except for the already flaky test_broker_type_bounce): http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-10-03--001.1570083170--ableegoldman--KIP-429-streams-pt2--e92caf1/report.html

@ableegoldman (Member Author)

Another (this time completely) green run of the system tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3142/

@guozhangwang (Contributor)

Test failures come from connect and this one:

SmokeTestDriverIntegrationTest.shouldWorkWithRebalance

Not sure if it is consistent, need to re-run.

@guozhangwang (Contributor) left a comment

Made another pass on the PR, thanks for the added unit tests.



AssignedPartition(final TaskId taskId,
final TopicPartition partition) {
final TopicPartition partition) {
Contributor

nit: how about just put these two in one line? We usually only use multi-lines if there are 3+ parameters.

Member Author

Yeah must have changed that by mistake, will revert

private Map<String, Assignment> errorAssignment(final Map<UUID, ClientMetadata> clientsMetadata,
final String topic,
final int errorCode) {
final String topic,
Contributor

Is it intentional to not align parameters in the function definition?

Member Author

Nope, IDE must have auto-indented some things (wrongly)

if (repartitionTopicMetadata.get(sourceTopicName).numberOfPartitions().isPresent()) {
numPartitionsCandidate = repartitionTopicMetadata.get(sourceTopicName).numberOfPartitions().get();
if (repartitionTopicMetadata.get(sourceTopicName).numberOfPartitions()
.isPresent()) {
Contributor

Is this suggested by IDE? :P I feel it is not necessary but if I do not feel strongly either.

Member Author

I guess a bunch of the indentation got messed up somehow, I agree this is not necessary

") prevActiveTasks: (" + prevActiveTasks +
") prevStandbyTasks: (" + prevStandbyTasks +
") prevAssignedTasks: (" + prevAssignedTasks +
") prevOwnedPartitions: (" + ownedPartitions.keySet() +
Contributor

nit: prevOwnedPartitionsByConsumerId

Contributor

In that case, should we check in state.addPreviousActiveTasks that the task ids were not added in other client metadata? I might be paranoid here but what if client A claims (cooperative) its ownership of partition1 which maps to task1, while client B (eager) encodes task1 as its prev owned tasks with empty owned partitions?


// If the partition is new to this consumer but is still owned by another, remove from the assignment
// until it has been revoked and can safely be reassigned according the COOPERATIVE protocol
if (newPartitionForConsumer && allOwnedPartitions.contains(partition)) {
Contributor

Is it possible that a task maps to multiple partitions, while only some of them have old owners, would that cause us to encode a partial task here?

Member Author

Ah, I see what you mean. We should collect the assignedPartitions for a single task and only add them to the assignedPartitions list at the end, if all can be safely assigned


// If this consumer previously owned more tasks than it has capacity for, some must be revoked
if (assignments.get(consumer).size() >= maxTasksPerClient) {
return Collections.emptyMap();
Contributor

nit: add a debug entry as well?

previousStandbyTaskAssignment.put(prevAssignedTask, new HashSet<>());
}
for (final TaskId prevAssignedTask : clientState.getValue().prevStandbyTasks()) {
previousStandbyTaskAssignment.computeIfAbsent(prevAssignedTask, t -> new HashSet<>());
Contributor

Nice.

return info;
}

private void assertEquivalentAssignment(final Map<String, List<TaskId>> thisAssignment,
Contributor

Can we just rely on the equals function of map and list here? I think it checks for "exact equality", while here otherAssignment being a super-set of thisAssignment can still pass.

Member Author

Well we also check for the size to be the same, but this could be improved I guess. I'm not sure we can use the map equals because the values are not necessarily equal until sorted.

But we could sort them and then use list equality instead.
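Something like the following stdlib-only sketch (the helper name is hypothetical): sort each consumer's task list, after which plain Map/List equality becomes an exact check, so a superset assignment can no longer pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentEquality {
    // Copies each value list and sorts it, so two assignments can be
    // compared with Map.equals regardless of per-consumer task order.
    public static Map<String, List<String>> normalized(Map<String, List<String>> assignment) {
        final Map<String, List<String>> result = new TreeMap<>();
        assignment.forEach((consumer, tasks) -> {
            final List<String> sorted = new ArrayList<>(tasks);
            Collections.sort(sorted);
            result.put(consumer, sorted);
        });
        return result;
    }
}
```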

Contributor

SG.

@guozhangwang (Contributor)

The new Jenkins failures are due to timing out after 270 min (2.12 and 2.11) and a connect integration test (2.13). And there are no overlapping test failures in consecutive runs.

Will merge to trunk now.

@guozhangwang guozhangwang merged commit d88f104 into apache:trunk Oct 7, 2019
guozhangwang pushed a commit that referenced this pull request Oct 7, 2019
Key improvements with this PR:

* tasks will remain available for IQ during a rebalance (but not during restore)
* continue restoring and processing standby tasks during a rebalance
* continue processing active tasks during rebalance until the RecordQueue is empty*
* only revoked tasks must be suspended/closed
* StreamsPartitionAssignor tries to return tasks to their previous consumers within a client
* but do not try to commit, for now (pending KAFKA-7312)


Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang (Contributor)
Contributor

Also cherry-picked to 2.4

ableegoldman added a commit to ableegoldman/kafka that referenced this pull request Nov 13, 2019
ijuma added a commit to ijuma/kafka that referenced this pull request Apr 28, 2020
…t-for-generated-requests

* apache-github/trunk:
  KAFKA-8932; Add tag for CreateTopicsResponse.TopicConfigErrorCode (KIP-525) (apache#7464)
  KAFKA-8944: Fixed KTable compiler warning. (apache#7393)
  KAFKA-8964: Rename tag client-id for thread-level metrics and below (apache#7429)
  MINOR: remove unused imports in Streams system tests (apache#7468)
  KAFKA-7190; Retain producer state until transactionalIdExpiration time passes (apache#7388)
  KAFKA-8983; AdminClient deleteRecords should not fail all partitions unnecessarily (apache#7449)
  MINOR: Modified Exception handling for KIP-470 (apache#7461)
  KAFKA-7245: Deprecate WindowStore#put(key, value) (apache#7105)
  KAFKA-8179: Part 7, cooperative rebalancing in Streams (apache#7386)
  KAFKA-8985; Add flexible version support to inter-broker APIs (apache#7453)
  MINOR: Bump version to 2.5.0-SNAPSHOT (apache#7455)
@mjsax mjsax added the kip Requires or implements a KIP label Jun 12, 2020
@ableegoldman ableegoldman deleted the KIP-429-streams-pt2 branch June 26, 2020 22:38

Labels

kip Requires or implements a KIP streams
