KAFKA-4317: Regularly checkpoint StateStore changelog offsets (#2471)

dguy wants to merge 13 commits into apache:trunk from dguy/kafka-4317
Conversation
Refer to this link for build results (access rights to CI server needed):
Tests fail because of a checkstyle error.
doh! I ran the build before commit, too. Must have accidentally changed something.
FYI - I need to do a KIP for the new config param.
Just want to say thanks for implementing this!
I just ran the simple benchmark with checkpointing off and with the checkpoint interval set to the same as the commit interval (10 seconds). Oddly (probably just other stuff going on), the three runs with checkpointing on had better throughput than the three without.

Throughput without checkpointing:

Throughput with checkpointing:

Based on this small sample my assumption is that the overhead of checkpointing is negligible. The overhead would increase as the number of stores increases, but these are tiny files.
    import java.util.Map;

    // Interface to indicate that an object can be Check pointed

    this.checkpointedOffsets = new HashMap<>(checkpoint.read());

    // delete the checkpoint file after finish loading its stored offsets
    checkpoint.delete();
Can the checkpoint file grow indefinitely?

no - it is overwritten each time. See OffsetCheckpoint#write
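The answer above points at OffsetCheckpoint#write. A minimal sketch of how an overwrite-style checkpoint file can work (class and method names here are hypothetical, not the actual Kafka implementation): every write replaces the whole file through a temp file and an atomic rename, so the file's size is bounded by the current number of partitions, never by the number of writes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Hypothetical sketch of an overwrite-style checkpoint file, in the spirit
// of OffsetCheckpoint#write: each call rewrites the entire file.
public class CheckpointFileSketch {
    private final Path file;

    public CheckpointFileSketch(final Path file) {
        this.file = file;
    }

    public void write(final Map<String, Long> offsets) throws IOException {
        // write to a sibling temp file first so a crash mid-write
        // never corrupts the existing checkpoint
        final Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (final Map.Entry<String, Long> entry : offsets.entrySet()) {
                writer.write(entry.getKey() + " " + entry.getValue());
                writer.newLine();
            }
        }
        // the atomic move replaces any previous checkpoint in one step
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```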
Nit: can you please update the PR or JIRA or KIP name (they should match)?
    private final long checkpointInterval;
    private long lastCheckpointMs;

    public Checkpointer(final Time time,

    // write the checkpoint
    @Override
    public void checkpoint(final Map<TopicPartition, Long> ackedOffsets) {
ackedOffsets indicates that those offsets are acked already, but those offsets are going to get acked after the checkpoint was written, right?

Nah, the offsets are acked first. They need to be, otherwise we risk writing a checkpoint for data that has not been acked.

Yes. Got confused about the correct order...
        checkpointedOffsets.put(topicPartition, ackedOffsets.get(topicPartition) + 1);
    } else if (restoredOffsets.containsKey(topicPartition)) {
        checkpointedOffsets.put(topicPartition, restoredOffsets.get(topicPartition));
    }
what about the case that both ifs are false -- can this happen? If not, we should throw an exception

Yeah it can happen. The table may not have received any updates in the period, so both would be empty
            Consumer<byte[], byte[]> consumer,
            Consumer<byte[], byte[]> restoreConsumer,
            StreamsConfig config,
            StreamsMetrics metrics,
            final StateDirectory stateDirectory) {

    // 3) write checkpoints for any local state
    checkpointer.checkpoint(recordCollectorOffsets());
    // 3) commit consumed offsets if it is dirty already
I like 3) 3) better ;-)
    try {
        this.stateMgr = new ProcessorStateManager(id, partitions, restoreConsumer, isStandby, stateDirectory, topology.storeToChangelogTopic());

        this.checkpointer = new Checkpointer(time, stateMgr, checkpointInterval);
remove this (we should avoid this whenever possible)
                        final long checkpointInterval) {
        this.time = time;
        this.checkpointable = checkpointable;
        this.lastCheckpointMs = time.milliseconds();
nit: remove this and move one line down -- first initialize with parameters, then everything else
I've removed the checkpoint config. So this can be committed without the KIP as there are no public API changes.
                       Time time,
                       final RecordCollector recordCollector) {
        super(id, applicationId, partitions, topology, consumer, restoreConsumer, false, stateDirectory, cache);
        super(id, applicationId, partitions, topology, consumer, restoreConsumer, false, stateDirectory, cache, time, config.getLong(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG));
If we are always writing the checkpointing file upon committing, do we still need this parameter? Or we can just execute the logic of Checkpointer.checkpoint() without the timing conditional?

Correct. It, and the Checkpointer, aren't really needed anymore. I'll remove them.
    import java.util.Map;

    public class Checkpointer {
As discussed before, this class can be removed?
    import java.util.Map;

    // Interface to indicate that an object can be checkpointed
nit: an object -> an object has associated partition offsets that can be ...
    import java.util.Map;

    interface StateManager {
    interface StateManager extends Checkpointable {
Should we move function checkpointedOffsets() into Checkpointable as well, and maybe rename to checkpointed as well?
        streamsMetrics,
        cache),
    stateMgr),
    stateMgr
Is this intentional? Or did you just want to align line 163 with previous lines?
        final StreamsConfig config,
        final StreamsMetrics metrics,
        final StateDirectory stateDirectory,
        final Time time) {
Where is this param used?

Thought I removed that, too. Obviously I was sleep coding this morning.
    log.debug("standby-task [{}] Committing its state", id());
    stateMgr.flush(processorContext);

    stateMgr.checkpoint(Collections.<TopicPartition, Long>emptyMap());
Is this correct, to always write empty map (is it interpreted as offset 0)?

Yes, as there are no ackedOffsets for standby tasks. The stateMgr will use the restored offsets when checkpointing.
    public class GlobalStateTaskTest {

        private final MockTime time = new MockTime(0);
        private final int checkpointInterval = 10;
Do we still need this?
    setProperty(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, "3");
    setProperty(StreamsConfig.STATE_DIR_CONFIG, baseDir.getCanonicalPath());
    setProperty(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MockTimestampExtractor.class.getName());
    setProperty(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "1");
Since we are using MockTime, we do not need to enforce the interval to be very small but still can run it as fast as we want, right?
@guozhangwang thanks. I've tidied it up (for real this time!) I apologize for my terrible attempt this morning.
Set up a streams system test on https://jenkins.confluent.io/job/kafka-streams-system-test-pr/1/
Currently the checkpoint file is deleted at state store initialization and it is only ever written again during a clean shutdown. This can result in significant delays during restarts, as the entire store needs to be loaded from the changelog.

We can mitigate this by frequently checkpointing the offsets. The checkpointing happens only during the commit phase, i.e., after we have manually flushed the store and the producer, so we guarantee that the checkpointed offsets are never greater than what has been flushed. In the event of a hard failure we can recover by reading the checkpoints and consuming from the stored offsets.

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang

Closes apache#2471 from dguy/kafka-4317
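The recovery path described above boils down to one decision per changelog partition, sketched here with illustrative names (not the actual Streams API): if a checkpointed offset survives the crash, restoration resumes from it; otherwise the whole changelog is replayed from its earliest offset.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the post-failure restore decision: a surviving checkpoint lets
// restoration skip straight to the recorded offset instead of replaying
// the entire changelog.
public class RestoreStart {
    public static long startOffset(final Map<String, Long> checkpointedOffsets,
                                   final String changelogPartition,
                                   final long earliestOffset) {
        return Optional.ofNullable(checkpointedOffsets.get(changelogPartition))
                       .orElse(earliestOffset);
    }
}
```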