KAFKA-6145: KIP-441 Build state constrained assignment from balanced one by ableegoldman · Pull Request #8497 · apache/kafka

ableegoldman · 2020-04-16T05:10:35Z

John's awesome TaskAssignorConvergenceTest revealed some issues with the current assignor, which he nailed down as being due to the state constrained and balanced assignments not converging.

One way to get an assignment that is as close to the balanced assignment as possible while still being state constrained is of course to start with the balanced assignment, and move tasks around as necessary to satisfy the state constraint. With this basic approach, the convergence test is passing.
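For readers skimming the thread, here is a rough sketch of that approach. It is not the code in this PR: `StateConstrainedSketch`, `constrain`, and the use of plain strings for task and client ids are hypothetical simplifications, and warmup replicas are ignored entirely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; the real assignor works on ClientState and TaskId objects.
final class StateConstrainedSketch {

    // Starts from the balanced assignment (task -> client) and only moves a task
    // when its balanced owner is not caught up and some other client is.
    static Map<String, String> constrain(final Map<String, String> balanced,
                                         final Map<String, Set<String>> caughtUpClientsByTask) {
        final Map<String, String> result = new HashMap<>(balanced);
        for (final Map.Entry<String, String> entry : balanced.entrySet()) {
            final String task = entry.getKey();
            final Set<String> caughtUp =
                caughtUpClientsByTask.getOrDefault(task, Collections.emptySet());
            if (!caughtUp.isEmpty() && !caughtUp.contains(entry.getValue())) {
                // Deterministic pick for the sketch: the smallest client id.
                // A real assignor would presumably weigh client load here.
                result.put(task, new TreeSet<>(caughtUp).first());
            }
        }
        return result;
    }
}
```

Tasks with no caught-up client anywhere stay where the balanced assignment put them, so the result only differs from the balanced one where the state constraint forces it to.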

This PR also includes some semi-orthogonal refactoring, most significantly the removal of the assignment maps; we now just immediately assign tasks to the ClientState rather than first sticking them in an intermediate map.

Also moves ValidClientsByTaskLoadQueue to its own file.

Apologies for the length of this PR due to the above, but it didn't seem reasonable to do things the wrong way in the parts I changed, just so they could be undone in a followup PR along with the other parts.

ableegoldman · 2020-04-17T18:28:33Z

I just moved this class to its own file from HATA, with one main change: we now just pass in the criteria for considering a client a valid candidate for a task.
The original criterion was that the client has no other version of this task already, but now we are flexible enough to use other validation criteria (e.g., that the client is caught up on this task).
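To illustrate what "passing in the criteria" could look like, here is a minimal sketch under assumptions. `ValidCandidatePollSketch` and `pollValid` are made-up names, and the generic `BiPredicate` parameter stands in for whatever functional type the real class uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiPredicate;

// Hypothetical helper, not the PR's ValidClientsByTaskLoadQueue.
final class ValidCandidatePollSketch {

    // Returns the first client in queue order that passes the caller-supplied criterion
    // for the task, or null if none does. The returned client is removed from the queue;
    // clients that were skipped are put back.
    static <C, T> C pollValid(final PriorityQueue<C> clientsByLoad,
                              final T task,
                              final BiPredicate<C, T> validClientCriteria) {
        final List<C> skipped = new ArrayList<>();
        C result = null;
        while (!clientsByLoad.isEmpty()) {
            final C candidate = clientsByLoad.poll();
            if (validClientCriteria.test(candidate, task)) {
                result = candidate;
                break;
            }
            skipped.add(candidate);
        }
        clientsByLoad.addAll(skipped);
        return result;
    }
}
```

A caller could pass a "has no other version of this task" check in one place and a "is caught up on this task" check in another, without touching the queue logic.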

I'd just like to say what an awesome tool for optimization this class is. Kudos to you and @cadonna .

I do not remember having contributed to this awesomeness. It is all @ableegoldman 's merit.

Ah, sorry about that @ableegoldman ; I wasn't able (or was too lazy) to follow the git praise trail through the class movement. Well, kudos to you, then. :)

vvcephei · 2020-04-17T18:41:09Z

test this please

vvcephei

Dropping in my first (partial pass) batch of review comments, so I can kick off the tests again for you. I'm actively continuing with my review...

vvcephei · 2020-04-17T19:31:49Z

Suggested change

-        final PriorityQueue<UUID> queue = new PriorityQueue<>(
-            (client, other) -> {
-                final double clientTaskLoad = clientStates.get(client).taskLoad();
-                final double otherTaskLoad = clientStates.get(other).taskLoad();
-                if (clientTaskLoad < otherTaskLoad) {
-                    return -1;
-                } else if (clientTaskLoad > otherTaskLoad) {
-                    return 1;
-                } else {
-                    return client.compareTo(other);
-                }
-            });
+        final PriorityQueue<UUID> queue = new PriorityQueue<>(
+            Comparator.comparingDouble(k -> clientStates.get(k).taskLoad())
+        );

Upon second reading, this does the same thing, right?

Almost, we want to fall back to comparing the actual UUIDs if the taskLoad happens to be equal

Ah, thanks.
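For reference, the tie-break can also be kept with the `Comparator` combinators. This is only a sketch of that alternative, not what the PR does: `TaskLoadComparatorSketch` is a made-up class, and a plain `Map<UUID, Double>` of loads stands in for `clientStates.get(k).taskLoad()`.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.UUID;

// Hypothetical sketch; the load map replaces the ClientState lookup used in the PR.
final class TaskLoadComparatorSketch {

    // Orders clients by ascending task load and falls back to the UUID's natural order on ties.
    static Comparator<UUID> byTaskLoadThenId(final Map<UUID, Double> taskLoads) {
        return Comparator
            .<UUID>comparingDouble(id -> taskLoads.get(id))
            .thenComparing(Comparator.naturalOrder());
    }

    static PriorityQueue<UUID> newQueue(final Map<UUID, Double> taskLoads) {
        return new PriorityQueue<>(byTaskLoadThenId(taskLoads));
    }
}
```

Without the `thenComparing` step, two clients with equal load compare as equal and the order in which the queue returns them is unspecified, which is the difference pointed out above.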

vvcephei · 2020-04-17T21:09:32Z

test this please

vvcephei · 2020-04-17T21:09:42Z

test this please

vvcephei · 2020-04-17T21:09:49Z

test this please

ableegoldman · 2020-04-17T21:56:38Z

This test now produces different assignments depending on which task assignor is used. Since the only thing it's verifying is the actual assignment, and that's not really the responsibility of the StreamsPartitionAssignor anyway, I thought it made the most sense to just remove it.

Should we instead adapt the test to verify that it produces a valid assignment for mixed instances during version probing? Or is that already covered?

I don't mean to totally cop out on this, but I think we should do this in a followup PR. I'll make a ticket and assign it to myself for later so I can't escape, but I don't even think it's worth marking it @Ignore for now.
Tbh we should have removed it a while ago, rather than changing it over time to become its useless self today. It's a long history, and I'm mostly responsible, but just looking ahead the question now is: what do we even want to validate? The task assignor has no knowledge of version probing, and the partition assignor is not responsible for the task assignment (whereas it used to be with version probing, hence this test). What we should do is validate the inputs are being assembled sensibly during version probing.
Anyways this will be really difficult to do just based on the final partition assignment, and even harder to distinguish a real failure from an unrelated one. So I'd propose to kick this into the future, when we embed the actual assignor class in the configs instead of this flag, and then pass in a VersionProbingClientStatesValidatingAssignor or whatever...SG?

Probably a much longer answer than you ever wanted, but this test has been haunting me over many PRs 👀

vvcephei · 2020-04-17T22:03:33Z

test this please

vvcephei · 2020-04-17T22:03:42Z

test this please

vvcephei · 2020-04-17T22:44:13Z

test this please

vvcephei · 2020-04-17T22:44:19Z

test this please

vvcephei · 2020-04-17T22:44:26Z

test this please

vvcephei · 2020-04-18T13:11:36Z

test this please

vvcephei · 2020-04-18T13:11:55Z

test this please

ableegoldman · 2020-04-18T21:04:20Z

One build failed with unrelated RocksDBTimestampedStoreTest.shouldVerifyThatMetricsGetMeasurementsFromRocksDB (should actually be fixed already by #8510)

vvcephei

Ok @ableegoldman , I finally got the whole review in! Unfortunately, there are at least a few things that should be addressed before we can merge it.

Thanks so much for this fix!

vvcephei · 2020-04-18T20:16:24Z

Unless I missed something, it's possible for tasksToCaughtUpClients not to contain a task, which would give us an NPE. Can we either add a test and handle the case or assert it here with an IllegalStateException, so we don't have to chase down an NPE later?

Well, by definition every task in here must have at least one caught-up client. I'll add the IllegalStateException

I might have dropped the thread of logic here. Why is that by definition? It looks like all we know about the task is that it's not caught up on the destination client. Why do we think it's caught up on some other client?

I think we've hit the same source of confusion as in the other thread. But anything in taskMovements, and in fact any task that has an associated TaskMovement object must have at least one caught-up client. If it didn't, we wouldn't be creating a warmup task for it; that's just a normal standby. A warmup replica always implies there is an active version elsewhere on a caught-up client

Yep, you're right. I'm on the same page now. So the only risk is that the code changes elsewhere and breaks your invariant.

I'll leave it to you whether you want to check the invariant and throw an exception or just let it be an NPE if that happens.

Yeah I am a bit worried about protecting against future changes (very much including those by myself a few months from now). I have a thought about how to enforce things a bit better, let's see where this goes...

Alright I decided to just push the validation into the TaskMovement constructor, and skip the check here
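A sketch of what constructor-level validation might look like, under assumptions: `TaskMovementSketch` is a hypothetical stand-in, and its fields and the exception message are not taken from the PR's `TaskMovement`.

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for the PR's TaskMovement; field names and types are assumptions.
final class TaskMovementSketch {
    final String task;
    final UUID destination;
    final Set<UUID> caughtUpClients;

    TaskMovementSketch(final String task, final UUID destination, final Set<UUID> caughtUpClients) {
        // Enforce the invariant where the object is created, so a violation surfaces as a
        // descriptive IllegalStateException rather than a NullPointerException later on.
        if (caughtUpClients == null || caughtUpClients.isEmpty()) {
            throw new IllegalStateException(
                "Cannot create a movement for task " + task + " without a caught-up client");
        }
        this.task = task;
        this.destination = destination;
        this.caughtUpClients = caughtUpClients;
    }
}
```

With the check in the constructor, every code path that later reads the caught-up clients of a movement can rely on the set being non-empty.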

vvcephei · 2020-04-18T20:48:47Z

I'd just like to say what an awesome tool for optimization this class is. Kudos to you and @cadonna .

vvcephei · 2020-04-18T21:01:35Z

Should we instead adapt the test to verify that it produces a valid assignment for mixed instances during version probing? Or is that already covered?

vvcephei · 2020-04-18T21:03:01Z

cadonna

@ableegoldman Thank you for the PR

Here is my feedback.

cadonna · 2020-04-20T13:13:44Z

I love it when a comment gets killed by a meaningful method name!

cadonna · 2020-04-20T14:03:43Z

Q: I do not understand why we need uniqueClients here? Would it not suffice to check for clientsByTaskLoad.contains(client)?

I think it's just a computer-sciencey matter of principle. clientsByTaskLoad is a linear collection, so every offer would become O(n) if we did a contains call on it every time. Right now, it's only O(n) when we need to remove the prior record for the same client, and O(log(n)) otherwise.

Does it really matter? I'm not sure.

Got it!

That means, we get O(n) for all cases where we first poll() and then offer() the same clients because those clients are contained in uniqueClients, i.e.:

HighAvailabilityTaskAssignor@155

HighAvailabilityTaskAssignor@131

TaskMovement@94

ValidClientsByTaskLoadQueue@76

ValidClientsByTaskLoadQueue@88

Those are the majority of the calls to offer() and offerAll(). Additionally, the last two occurrences in the list are called in each call to poll(). In poll(), if the top does not satisfy the criteria it is added to invalidPolledClients which then is added with offerAll(). For each element of invalidPolledClients the whole queue clientsByTaskLoad is scanned, since each element is contained in uniqueClients but not in clientsByTaskLoad. This results in O(n^2).

AFAIU, we need the uniqueness check because of TaskMovement@99.

If we update uniqueClients also in poll(), we would avoid O(n^2) for poll() and restrict O(n) to the case at TaskMovement@99.

Does it really matter?

I'm not sure either. A performance test would be the only way to tell.

Gah! You're right. We should also remove the client from uniqueClients when we poll.

@cadonna you're right, I forgot to remove from uniqueClients in poll. Good catch
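The bookkeeping being discussed could be sketched roughly as follows. This is a simplified illustration, not the PR's `ValidClientsByTaskLoadQueue`: `UniqueQueueSketch` is a hypothetical class and the validity criteria are left out.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch of a priority queue that holds each client at most once.
final class UniqueQueueSketch<C> {
    private final PriorityQueue<C> clientsByLoad;
    private final Set<C> uniqueClients = new HashSet<>();

    UniqueQueueSketch(final Comparator<C> comparator) {
        clientsByLoad = new PriorityQueue<>(comparator);
    }

    // O(log n) for a client that is not queued; the O(n) remove only runs
    // when the set says the client is already in the queue.
    void offer(final C client) {
        if (!uniqueClients.add(client)) {
            clientsByLoad.remove(client);
        }
        clientsByLoad.offer(client);
    }

    // Keeps the set in sync with the queue, so that re-offering a polled
    // client does not trigger a pointless linear scan.
    C poll() {
        final C client = clientsByLoad.poll();
        if (client != null) {
            uniqueClients.remove(client);
        }
        return client;
    }

    int size() {
        return clientsByLoad.size();
    }
}
```

If `poll()` did not update the set, every poll-then-offer cycle would scan the whole queue for a client that is no longer in it, which is the quadratic behavior described above.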

cadonna · 2020-04-20T15:27:22Z

I do not remember having contributed to this awesomeness. It is all @ableegoldman 's merit.

cadonna · 2020-04-20T16:41:58Z

Q: Why do we even care at all whether the task was running on the client? What if we just assign a real stand-by task if we have a spare one?

cadonna · 2020-04-21T10:53:14Z

+                                                     final ClientState destinationClientState,
+                                                     final AtomicInteger remainingWarmupReplicas,
+                                                     final Map<TaskId, Integer> tasksToRemainingStandbys) {
+        if (destinationClientState.previousAssignedTasks().contains(task) && tasksToRemainingStandbys.get(task) > 0) {


This question from my previous review went unnoticed (or you simply did not care ;-)).

Q: Why do we even care at all whether the task was running on the client? What if we just assign a real stand-by task if we have a spare one?

I think I answered this already. We're trying not to decrease the overall availability the standbys are providing, which could happen if we drop a caught-up standby in order to warm up an empty node. We can certainly do better than what we do now, which is not very efficient in terms of task movement, but I think it's good enough for this PR.

vvcephei · 2020-04-21T16:17:00Z

Test this, please.

vvcephei · 2020-04-21T16:22:57Z

Test this, please.

vvcephei · 2020-04-21T16:23:45Z

Test this, please.

vvcephei

The only outstanding thing I see is that we need to remove the client from the priority queue in poll

vvcephei · 2020-04-21T19:23:10Z

test this please

vvcephei · 2020-04-21T19:24:26Z

test this please

vvcephei · 2020-04-21T22:08:35Z

Unrelated Java 11 failures:
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQueryAllStalePartitionStores

ableegoldman changed the title ~~KAFKA-6145: KIP-441 Build state constrained assignment from balanced one~~ [WIP] KAFKA-6145: KIP-441 Build state constrained assignment from balanced one Apr 16, 2020

ableegoldman commented Apr 17, 2020

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/TaskMovement.java Outdated

ableegoldman commented Apr 17, 2020

vvcephei mentioned this pull request Apr 17, 2020: KAFKA-6145: KIP-441: Add test scenarios to ensure rebalance convergence #8475 (Merged, 3 tasks)

ableegoldman changed the title ~~[WIP] KAFKA-6145: KIP-441 Build state constrained assignment from balanced one~~ KAFKA-6145: KIP-441 Build state constrained assignment from balanced one Apr 17, 2020

ableegoldman force-pushed the KIP-441-redo-assignors-to-converge branch from 26f0557 to 82d4541 April 17, 2020 20:44

vvcephei reviewed Apr 17, 2020

ableegoldman force-pushed the KIP-441-redo-assignors-to-converge branch from 69a3a9c to 596f228 April 17, 2020 21:54

ableegoldman commented Apr 17, 2020

vvcephei reviewed Apr 18, 2020

cadonna reviewed Apr 20, 2020

ableegoldman added 7 commits April 20, 2020 15:26

42b300e fix bugs in previousAssignmentIsValid
22a6707 random convergence test is passing
957ee75 remove unused classes
c2ba04e clean up TaskMovementTest
73aa71a fix broken HATAT test
ae3aebf fix remaining HATAT tests
1e5cddb checkstyle

ableegoldman added 5 commits April 20, 2020 15:26

e3271af fix up SPAT tests
e0071fa first set of github reviews
8ebc11c remove ignore annotation from convergence tests
2709ba0 remove unused ignore import
864a539 first set of github review comments

ableegoldman force-pushed the KIP-441-redo-assignors-to-converge branch from e1920fe to 864a539 April 21, 2020 03:31

67d20fd add tests, helper method for ClientState in tests

cadonna reviewed Apr 21, 2020

ableegoldman added 3 commits April 21, 2020 12:14

2ab532f github review prop
7500412 queue fix
2a6dc0b wrap in method

vvcephei approved these changes Apr 21, 2020

vvcephei merged commit 5c548e5 into apache:trunk Apr 21, 2020

ableegoldman deleted the KIP-441-redo-assignors-to-converge branch June 26, 2020 22:39
