
[KAFKA-6730] Simplify state store recovery#4901

Closed
ConcurrencyPractitioner wants to merge 13 commits into apache:trunk from
ConcurrencyPractitioner:KAFKA-6730

Conversation

@ConcurrencyPractitioner
Contributor

No description provided.

@ConcurrencyPractitioner
Contributor Author

@mjsax Once the end offsets are fetched, how would you check whether we have fully restored? I still call restorePartition as-is, because I am currently not sure how to correctly check the end offsets other than the way restorePartition already provides.

@ConcurrencyPractitioner
Contributor Author

On another note, I have figured out how the check should be performed. So this should be ready for review!

@mjsax mjsax requested review from guozhangwang and mjsax April 20, 2018 08:41
@mjsax mjsax added the streams label Apr 20, 2018
@mjsax
Member

mjsax commented Apr 20, 2018

\cc @bbejeck @vvcephei

@mjsax
Member

mjsax commented Apr 20, 2018

@ConcurrencyPractitioner Thanks for the PR -- test failures seem to be related. Can you have a look before we review?

@ConcurrencyPractitioner
Contributor Author

Thanks Matthias. However, I am not too sure how to simplify this any further, specifically how to avoid the check that a Task has migrated. It will probably take some time.

@ConcurrencyPractitioner
Contributor Author

@mjsax I will probably need some help on the logic.

@ConcurrencyPractitioner
Contributor Author

@mjsax The tests now pass.

@ConcurrencyPractitioner
Contributor Author

@mjsax This is ready for review.

}

private ConsumerRecords<byte[], byte[]> mergeRecords(Set<ConsumerRecords<byte[], byte[]>> allRecords) {
final Map<TopicPartition, List<ConsumerRecord<byte[], byte[]>>> mergedRecords = new HashMap<>();
Contributor

Nit: this logic isn't much more complicated than what you're doing to build allRecords in the first place. Maybe just do this inline?

final ConsumerRecords<byte[], byte[]> allRecords = restoreConsumer.poll(10);
final Set<ConsumerRecords<byte[], byte[]>> allRecords = new HashSet<>();
while (true) {
final ConsumerRecords<byte[], byte[]> records = restoreConsumer.poll(10);
Contributor

Just curious; why poll for 10ms in particular?

Contributor Author

When I looked through the code, it was previously set to 10, so I kept that value to follow the earlier version as closely as possible.

Contributor

@vvcephei vvcephei left a comment

One question and one nit, otherwise it looks fine to me.

Member

@bbejeck bbejeck left a comment

Thanks for the PR, overall looks good. I have just one minor comment about the code itself.

But I have a meta-comment: the current version restores each batch returned from the poll call, while this proposed approach first collects all records upfront and holds them in memory for the restoration. I'm wondering whether this could have an impact on applications with a significant amount of state to restore.
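To illustrate the concern, here is a minimal, self-contained sketch of restoring each polled batch immediately. The names (`BatchRestoreSketch`, `restorePerBatch`, `poll`) are hypothetical, and plain Java collections stand in for the real Kafka consumer API; this is not the actual StoreChangelogReader code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BatchRestoreSketch {
    // Stand-in for restoreConsumer.poll(): returns the next batch, or an empty list.
    static List<String> poll(Queue<List<String>> batches) {
        List<String> next = batches.poll();
        return next == null ? new ArrayList<>() : next;
    }

    // Per-batch restore: each batch is applied to the store as soon as it is
    // polled, so memory use is bounded by one batch rather than the full changelog.
    static int restorePerBatch(Queue<List<String>> batches, List<String> store) {
        int restored = 0;
        while (true) {
            List<String> batch = poll(batches);
            if (batch.isEmpty()) {
                break;            // no more data; this restore pass ends
            }
            store.addAll(batch);  // apply this batch immediately
            restored += batch.size();
        }
        return restored;
    }
}
```

The collect-all-upfront variant discussed in the PR differs only in deferring the `store.addAll` step until polling finishes, which is exactly what forces all records to be held in memory during restoration.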

mergedRecords.put(partition, records.records(partition));
}
}
final ConsumerRecords<byte[], byte[]> result = new ConsumerRecords<>(mergedRecords);
Member

nit: result isn't needed; you can return new ConsumerRecords<>(mergedRecords) directly

@ConcurrencyPractitioner
Contributor Author

I never thought about the amount of state that needs to be restored. Thanks for pointing this out.
We could set a maximum number of records at which point we stop polling and then judge whether the restore is complete.
@mjsax What do you think about this? Polling until no data is returned is not advisable, particularly since the amount of data involved could lead to high latency.

@ConcurrencyPractitioner
Contributor Author

ConcurrencyPractitioner commented May 2, 2018

Hi, @bbejeck @mjsax. Would an extra config be required if we want to cap the number of polled records?
I don't think adding a configuration is necessary, particularly since it only adds complexity to a problem we originally intended to simplify.

Member

@mjsax mjsax left a comment

Thanks for updating the PR. Left some comments.

final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
int totalNumberOfRecords = 0;
final Map<TopicPartition, List<ConsumerRecord<byte[], byte[]>>> allRecords = new HashMap<>();
while (totalNumberOfRecords < DEFAULT_MAX) {
Member

I am not sure why we need DEFAULT_MAX here? Can't we just restore whatever poll(restoreConsumer, 10) returns?

Contributor Author

Bill pointed out above that we do not want to restore too many records at once, since it would cause too much latency. Also, KAFKA-6730's description says to "just consume until poll() does not return any data". However, there is a chance this condition is never fulfilled, since poll() can continue to return records indefinitely; in that case we terminate once we hit the DEFAULT_MAX parameter, to ensure that the number of records restored stays relatively small.

Member

@ConcurrencyPractitioner thanks for updating the PR. My point from before was that we should restore each batch of records returned from each poll() call, as opposed to keeping all returned records in memory and starting the restore process once there are no more records to fetch. Sorry if I did not make that point very clear.

final long pos = processNext(mergedRecords.records(partition), restorer, endOffset);
restorer.setRestoredOffset(pos);
if (restorer.hasCompleted(pos, endOffset)) {
if (pos > endOffset) {
Member

The goal of the ticket is to actually remove this check.

}
if (restorer.offsetLimit() == Long.MAX_VALUE) {
final Long updatedEndOffset = restoreConsumer.endOffsets(Collections.singletonList(partition)).get(partition);
if (!restorer.hasCompleted(pos, updatedEndOffset)) {
Member

we also want to remove this check

@ConcurrencyPractitioner
Contributor Author

Hi @mjsax. When you say to "fetch the end offsets" in the JIRA description, do you mean fetching them regularly between poll() calls, or after poll() ceases to return any records? I am currently confused on this point.

@ConcurrencyPractitioner
Contributor Author

ConcurrencyPractitioner commented May 6, 2018

When I added the first conditional, i.e. (if (pos > endOffset)), from the original code to this PR, I managed to pass four of the tests; the only one that failed was shouldThrowTaskMigratedExceptionIfChangelogTopicUpdatedDuringRestoreProcessFoundInSecondCheck. In contrast, when I added only the second condition to this PR, I got the following failing tests:

org.apache.kafka.streams.processor.internals.StoreChangelogReaderTest.shouldThrowTaskMigratedExceptionIfEndOffsetGetsExceededDuringRestoreForChangelogTopicEOSEnabled
org.apache.kafka.streams.processor.internals.StoreChangelogReaderTest.shouldThrowTaskMigratedExceptionIfEndOffsetGetsExceededDuringRestoreForChangelogTopic

We might consider the need to change the tests, particularly since TaskMigratedException might be used in a different manner.

@mjsax
Member

mjsax commented May 7, 2018

@ConcurrencyPractitioner We would call endOffsets() only once. Note, that restoring interleaves with calling poll() on the main consumer. Thus, the overall flow should be:

  • mainConsumer.poll()
  • needsRestore?
    yes -> get endOffset if unknown (we only do this once)
    restoreConsumer.poll()
    restore records
    if reached endOffsets; set "needsRestore" to false

For the tests: because we change the behavior, we also need to update (or maybe even remove) some tests. It's expected that they don't pass as they test for current behavior that we change.

Does this make sense?
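The flow mjsax outlines above can be condensed into a small sketch. All names here (`RestoreFlowSketch`, `runOnce`) are hypothetical, and plain `LongSupplier` callbacks stand in for the consumer calls; this is not the actual Streams code.

```java
import java.util.function.LongSupplier;

public class RestoreFlowSketch {
    long endOffset = -1;      // unknown until fetched, exactly once
    long restoredOffset = 0;
    boolean needsRestore = true;

    // One iteration of the main loop. In the real flow, mainConsumer.poll()
    // happens first each iteration, interleaved with a single restore step.
    void runOnce(LongSupplier fetchEndOffset, LongSupplier pollRestoreBatchSize) {
        if (needsRestore) {
            if (endOffset < 0) {
                endOffset = fetchEndOffset.getAsLong();          // endOffsets() only once
            }
            restoredOffset += pollRestoreBatchSize.getAsLong();  // restoreConsumer.poll() + restore
            if (restoredOffset >= endOffset) {
                needsRestore = false;                            // reached the end offset
            }
        }
    }
}
```

The key property is that each invocation does only one restore step before returning, so the main consumer keeps polling and the thread does not risk dropping out of the group during a long restore.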

@ConcurrencyPractitioner
Contributor Author

@mjsax I have updated PR with your help. Thanks!
It should be ready for another round of review.

Member

@mjsax mjsax left a comment

Thanks for the update. Some follow up questions/comments.

try {
final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
for (final TopicPartition partition : restoringPartitions) {
restorePartition(allRecords, partition, active.restoringTaskFor(partition));
Member

I think the JavaDoc of this method can be removed, because we don't call restorePartition() anymore and thus no TaskMigratedException can happen. Please also backtrack the callers of this method and update their JavaDocs accordingly, if necessary.

while (!needsRestoring.isEmpty()) {
final ConsumerRecords<byte[], byte[]> records = poll(restoreConsumer, 10);
if (records.count() == 0) {
break;
Member

Do we need to check if restore is completed for some partitions? I think, with EOS and commit markers, there is a corner case where the check below does not detect that restore is complete even though we fetched all data (but not the final commit marker). In this case, records.count() could be zero but the actual position() for a partition was advanced by 1 to step over the commit marker.

Contributor Author

I am not too clear on this point. Do you have something specific in mind?

Member

I think we need the same check as in https://github.com/apache/kafka/pull/4901/files#diff-46ed6d177221c8778965ecb1b6657be3R101

(might be good to extract this into a private method)

We should cover this corner case with a unit test, too. As the tests pass atm, it seems the corner case is not covered yet.

Contributor Author

To summarize what this check does: I found that if this check (if (records.count() == 0)) is removed, tests hang, since needsRestoring might still contain partitions even if poll() no longer returns any records. However, when I dug through the older version of the code, I could not find the check you are referring to. My closest guess is that you mean comparing endOffsets.get(partition) to the restored offset for that particular partition (note, it is not the updatedEndOffsets field, which is strictly used only in restore). Is this what you mean? Currently, my understanding is sketchy at best.

Member

Sorry for not expressing my thoughts clearly.

The check if (records.count() == 0) is fine. However, I think just doing a break is not good enough. Before the break, we need to check whether restore has completed for any partition that is in the restore phase, and if yes, remove those partitions from needsRestore etc. Otherwise, a partition might stay in the "restoring" phase forever, because count() == 0 always holds and we never reach the check in line 101.

Does this make sense?
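A small sketch of the fix being discussed, with hypothetical names and plain collections standing in for the consumer (this is not the actual StoreChangelogReader code): the completion check runs before the `break`, so partitions that have reached their end offset are removed from `needsRestoring` even when the poll returned zero records.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class CompletionCheckSketch {
    static Set<String> restore(Map<String, Long> restoredOffsets,
                               Map<String, Long> endOffsets,
                               Queue<Map<String, Long>> polledBatches) {
        Set<String> needsRestoring = new HashSet<>(endOffsets.keySet());
        while (!needsRestoring.isEmpty()) {
            // Stand-in for restoreConsumer.poll(10): one batch of (partition -> records).
            Map<String, Long> batch = polledBatches.poll();
            int count = (batch == null) ? 0 : batch.size();
            if (count > 0) {
                for (Map.Entry<String, Long> e : batch.entrySet()) {
                    restoredOffsets.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
            // Completion check runs even when count == 0, e.g. after an EOS
            // commit marker advanced the position without returning records.
            needsRestoring.removeIf(p ->
                restoredOffsets.getOrDefault(p, 0L) >= endOffsets.get(p));
            if (count == 0) {
                break; // only after completed partitions have been removed
            }
        }
        return needsRestoring;
    }
}
```

Without the `removeIf` before the `break`, a partition that is already fully restored when the empty poll arrives would remain in `needsRestoring` indefinitely, which matches the hanging-test behavior described above.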

final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
for (final TopicPartition partition : restoringPartitions) {
restorePartition(allRecords, partition, active.restoringTaskFor(partition));
final Map<TopicPartition, Long> endOffsets = restoreConsumer.endOffsets(restoringPartitions);
Member

I am wondering if this should be a class member that is updated once at "restore begin", rather than each time we check for the next "batch of records"?

@ConcurrencyPractitioner
Contributor Author

Hi @mjsax, this should resolve the error. I just moved the for() loop in front of the if block, so that any partitions that have completed are removed prior to exiting the outer while loop. This way, I think we avoid leaving partitions that have finished restoring in the needsRestoring field.

@ConcurrencyPractitioner
Contributor Author

Hi @mjsax Could you review? This PR is almost ready.

final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
for (final TopicPartition partition : restoringPartitions) {
restorePartition(allRecords, partition, active.restoringTaskFor(partition));
updatedEndOffsets = !needsRestoring.isEmpty() ?
Member

This would update updatedEndOffsets multiple times; should we set it only once? Note that needsRestoring will not be empty until the full restore is completed.

restorePartition(allRecords, partition, active.restoringTaskFor(partition));
updatedEndOffsets = !needsRestoring.isEmpty() ?
restoreConsumer.endOffsets(restoringPartitions) : updatedEndOffsets;
while (!needsRestoring.isEmpty()) {
Member

We don't want to do it this way, because we want restore and calling mainConsumer.poll() to interleave; otherwise, we might drop out of the consumer group, as restore is expected to take longer than max.poll.interval.ms. Hence, within this method, we should only do a single poll(restoreConsumer, 10) and return afterwards; the main loop will make sure that this method is called again to resume the restore.

@ConcurrencyPractitioner
Contributor Author

@mjsax Locally, when I ran StoreChangelogReaderTest, a single test didn't pass: shouldRestorePartitionsRegisteredPostInitialization. Upon further inspection, I discovered that end offsets were updated twice in this particular test. Thinking it through, I believe updatedEndOffsets should continue to retrieve new end offsets to account for this corner case. What do you think?

@ConcurrencyPractitioner
Contributor Author

Migrating to new PR, current branch is too old.

@mjsax
Member

mjsax commented May 16, 2018

Replaced by #5013
