KAFKA-4317: Regularly checkpoint StateStore changelog offsets#2471

Closed
dguy wants to merge 13 commits into apache:trunk from dguy:kafka-4317

@dguy
Contributor

@dguy dguy commented Jan 31, 2017

Currently the checkpoint file is deleted at state store initialization and is only written again during a clean shutdown. This can result in significant delays during restarts, as the entire store needs to be reloaded from the changelog.
We can mitigate this by frequently checkpointing the offsets. The checkpointing happens only during the commit phase, i.e., after we have manually flushed the store and the producer. This guarantees that the checkpointed offsets are never greater than what has been flushed.
In the event of a hard failure we can recover by reading the checkpoints and consuming from the stored offsets.
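
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the Kafka Streams code: `CommitOrderSketch` and its methods are hypothetical names, partitions are plain strings rather than `TopicPartition`s, and in-memory maps stand in for the store, the producer and the checkpoint file. It only shows that a checkpoint is taken from offsets that have already been flushed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a task; none of these names come from Kafka Streams.
final class CommitOrderSketch {
    // Offsets sent to the changelog but not yet flushed.
    private final Map<String, Long> pendingOffsets = new HashMap<>();
    // Offsets known to be flushed (store and producer).
    private final Map<String, Long> flushedOffsets = new HashMap<>();
    // Stands in for the on-disk checkpoint file.
    private final Map<String, Long> checkpointedOffsets = new HashMap<>();

    void send(String changelogPartition, long offset) {
        pendingOffsets.put(changelogPartition, offset);
    }

    void commit() {
        // 1) and 2): flush the store and the producer; pending offsets become durable.
        flushedOffsets.putAll(pendingOffsets);
        pendingOffsets.clear();
        // 3): only now write the checkpoint, so it can never run ahead of the flush.
        checkpointedOffsets.clear();
        checkpointedOffsets.putAll(flushedOffsets);
    }

    // Returns a copy of what a restart would find in the checkpoint.
    Map<String, Long> checkpointedOffsets() {
        return new HashMap<>(checkpointedOffsets);
    }
}
```

In this sketch, a record sent after the last `commit()` is not reflected in the checkpoint, so a restart after a hard failure would replay the changelog from the checkpointed offset rather than from the beginning.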

@dguy
Contributor Author

dguy commented Jan 31, 2017

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1356/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1352/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1352/
Test FAILed (JDK 7 and Scala 2.10).

@mjsax
Member

mjsax commented Jan 31, 2017

Tests fail because of checkstyle error.

@dguy
Contributor Author

dguy commented Jan 31, 2017

Doh! I ran the build before committing, too. I must have accidentally changed something.

@dguy
Contributor Author

dguy commented Jan 31, 2017

FYI: I need to write a KIP for the new config parameter.

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1359/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1355/
Test PASSed (JDK 8 and Scala 2.12).

@gfodor
Contributor

gfodor commented Jan 31, 2017

Just want to say thanks for implementing this!

@asfbot

asfbot commented Jan 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1355/
Test PASSed (JDK 7 and Scala 2.10).

@dguy
Contributor Author

dguy commented Feb 1, 2017

KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration

@dguy
Contributor Author

dguy commented Feb 1, 2017

I just ran the simple benchmark with checkpointing off and with the checkpoint interval set to the same value as the commit interval (10 seconds). Oddly (probably just other stuff going on), the three runs with checkpointing on had better throughput than the three without.

Throughput without checkpointing:
20.78 MB/s, 23.51 MB/s, 23.54 MB/s

Throughput with checkpointing:
27.13 MB/s, 28.47 MB/s, 24.48 MB/s

Based on this small sample, my assumption is that the overhead of checkpointing is negligible. The overhead would increase as the number of stores increases, but these are tiny files.


import java.util.Map;

// Interface to indicate that an object can be Check pointed
Contributor

nit: checkpointed

this.checkpointedOffsets = new HashMap<>(checkpoint.read());

// delete the checkpoint file after finish loading its stored offsets
checkpoint.delete();
Contributor

Can the checkpoint file grow indefinitely?

Contributor Author

No, it is overwritten each time. See OffsetCheckpoint#write.
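
To illustrate why a file that is overwritten on every write cannot grow without bound, here is a minimal sketch. It is not the actual `OffsetCheckpoint` implementation: the class name, the file format (one `partition offset` pair per line) and the temp-file-then-rename approach are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checkpoint file that is replaced wholesale on every write.
final class OverwritingCheckpointSketch {
    private final Path file;

    OverwritingCheckpointSketch(Path file) {
        this.file = file;
    }

    void write(Map<String, Long> offsets) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : offsets.entrySet()) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        // Write the full snapshot to a sibling temp file, then replace the old file.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    Map<String, Long> read() throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        if (!Files.exists(file)) {
            return offsets;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ");
            offsets.put(parts[0], Long.parseLong(parts[1]));
        }
        return offsets;
    }
}
```

Because each `write` replaces the previous contents, the file size here is bounded by the number of partitions passed in, not by the number of commits.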

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1430/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1430/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1433/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1432/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1435/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1432/
Test PASSed (JDK 7 and Scala 2.10).

@mjsax
Member

mjsax commented Feb 3, 2017

Nit: can you please update the PR or JIRA or KIP name (they should match)

Member

@mjsax mjsax left a comment

First initial pass.

private final long checkpointInterval;
private long lastCheckpointMs;

public Checkpointer(final Time time,
Member

Nit: align indentation


// write the checkpoint
@Override
public void checkpoint(final Map<TopicPartition, Long> ackedOffsets) {
Member

The name ackedOffsets indicates that those offsets are already acked, but they are only going to be acked after the checkpoint is written, right?

Contributor Author

Nah, the offsets are acked first. They need to be; otherwise we risk writing a checkpoint for data that has not been acked.

Member

Yes. Got confused about the correct order...

checkpointedOffsets.put(topicPartition, ackedOffsets.get(topicPartition) + 1);
} else if (restoredOffsets.containsKey(topicPartition)) {
checkpointedOffsets.put(topicPartition, restoredOffsets.get(topicPartition));
}
Member

What about the case where both conditions are false? Can this happen? If not, we should throw an exception.

Contributor Author

Yeah, it can happen. The table may not have received any updates in the period, so both would be empty.
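
The three cases discussed in this thread can be sketched as below. This is an illustration under assumptions, not the `ProcessorStateManager` code: the class and method names are hypothetical, partitions are plain strings, and the previous checkpoint is passed in explicitly.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper mirroring the if / else-if shape of the snippet under review.
final class CheckpointMergeSketch {
    static Map<String, Long> merge(Collection<String> changelogPartitions,
                                   Map<String, Long> previous,
                                   Map<String, Long> ackedOffsets,
                                   Map<String, Long> restoredOffsets) {
        Map<String, Long> checkpointed = new HashMap<>(previous);
        for (String partition : changelogPartitions) {
            if (ackedOffsets.containsKey(partition)) {
                // Resume from the offset after the last acked record.
                checkpointed.put(partition, ackedOffsets.get(partition) + 1);
            } else if (restoredOffsets.containsKey(partition)) {
                // Nothing acked in this period: fall back to the restored offset.
                checkpointed.put(partition, restoredOffsets.get(partition));
            }
            // Neither map has an entry: no updates in this period, so keep what we had.
        }
        return checkpointed;
    }
}
```

Passing an empty `ackedOffsets` map, as a task with no acked offsets would, makes every partition fall through to the restored-offset branch in this sketch.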

Consumer<byte[], byte[]> consumer,
Consumer<byte[], byte[]> restoreConsumer,
StreamsConfig config,
StreamsMetrics metrics, final StateDirectory stateDirectory) {
Member

add final to all


// 3) write checkpoints for any local state
checkpointer.checkpoint(recordCollectorOffsets());
// 3) commit consumed offsets if it is dirty already
Member

-> 4)

Contributor Author

I like 3) 3) better ;-)

try {
this.stateMgr = new ProcessorStateManager(id, partitions, restoreConsumer, isStandby, stateDirectory, topology.storeToChangelogTopic());

this.checkpointer = new Checkpointer(time, stateMgr, checkpointInterval);
Member

remove this (we should avoid this whenever possible)

final long checkpointInterval) {
this.time = time;
this.checkpointable = checkpointable;
this.lastCheckpointMs = time.milliseconds();
Member

nit: remove this and move one line down -- first initialize with parameters, then everything else

@dguy dguy changed the title from "KAFKA-4317: Checkpoint State Stores on commit/flush" to "KAFKA-4317: Add State Store Checkpoint Interval Configuration" Feb 3, 2017
@asfbot

asfbot commented Feb 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1459/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1584/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1584/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1675/
Test FAILed (JDK 7 and Scala 2.10).

@dguy
Contributor Author

dguy commented Feb 14, 2017

I've removed the checkpoint config. So this can be committed without the KIP as there are no public API changes.

@asfbot

asfbot commented Feb 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1678/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1678/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1681/
Test PASSed (JDK 8 and Scala 2.11).

@dguy dguy changed the title from "KAFKA-4317: Add State Store Checkpoint Interval Configuration" to "KAFKA-4317: Regularly checkpoint StateStore changelog offsets" Feb 15, 2017
Member

@mjsax mjsax left a comment

LGTM

Time time,
final RecordCollector recordCollector) {
- super(id, applicationId, partitions, topology, consumer, restoreConsumer, false, stateDirectory, cache);
+ super(id, applicationId, partitions, topology, consumer, restoreConsumer, false, stateDirectory, cache, time, config.getLong(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG));
Contributor

If we are always writing the checkpoint file upon committing, do we still need this parameter? Or can we just execute the logic of Checkpointer.checkpoint() without the timing conditional?

Contributor Author

Correct. Neither it nor the Checkpointer is really needed anymore. I'll remove them.

@asfbot

asfbot commented Feb 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1709/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1712/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1709/
Test PASSed (JDK 7 and Scala 2.10).


import java.util.Map;

public class Checkpointer {
Contributor

As discussed before, this class can be removed?

Contributor Author

I thought I did. Doh!


import java.util.Map;

// Interface to indicate that an object can be checkpointed
Contributor

nit: an object -> an object has associated partition offsets that can be ...

import java.util.Map;

- interface StateManager {
+ interface StateManager extends Checkpointable {
Contributor

Should we move the function checkpointedOffsets() into Checkpointable as well, and maybe rename it to checkpointed?

streamsMetrics,
cache),
stateMgr),
stateMgr
Contributor

Is this intentional? Or did you just want to align line 163 with previous lines?

final StreamsConfig config,
final StreamsMetrics metrics,
final StateDirectory stateDirectory,
final Time time) {
Contributor

Where is this param used?

Contributor Author

I thought I had removed that, too. Obviously I was sleep-coding this morning.

log.debug("standby-task [{}] Committing its state", id());
stateMgr.flush(processorContext);

stateMgr.checkpoint(Collections.<TopicPartition, Long>emptyMap());
Contributor

Is it correct to always write an empty map (is it interpreted as offset 0)?

Contributor Author

Yes, it is correct: for a standby task there are no ackedOffsets. It uses the restoredOffsets in ProcessorStateManager.

Contributor Author

Yes, as there are no ackedOffsets for standby tasks. The stateMgr will use the restored offsets when checkpointing.

public class GlobalStateTaskTest {

private final MockTime time = new MockTime(0);
private final int checkpointInterval = 10;
Contributor

Do we still need this?

Contributor Author

nope

setProperty(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, "3");
setProperty(StreamsConfig.STATE_DIR_CONFIG, baseDir.getCanonicalPath());
setProperty(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MockTimestampExtractor.class.getName());
setProperty(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "1");
Contributor

Since we are using MockTime, we do not need to force the interval to be very small; we can still run the test as fast as we want, right?

@dguy
Contributor Author

dguy commented Feb 16, 2017

@guozhangwang thanks. I've tidied it up (for real this time!) I apologize for my terrible attempt this morning.

@guozhangwang
Contributor

Set up a streams system test on https://jenkins.confluent.io/job/kafka-streams-system-test-pr/1/

@asfbot

asfbot commented Feb 16, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1725/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 17, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1722/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 17, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1722/
Test FAILed (JDK 7 and Scala 2.10).

Contributor

@guozhangwang guozhangwang left a comment

Merged to trunk.

@asfgit asfgit closed this in 1f1e794 Feb 17, 2017
hachikuji pushed a commit to confluentinc/kafka that referenced this pull request Feb 23, 2017
Currently the checkpoint file is deleted at state store initialization and it is only ever written again during a clean shutdown. This can result in significant delays during restarts as the entire store needs to be loaded from the changelog.
We can mitigate against this by frequently checkpointing the offsets. The checkpointing happens only during the commit phase, i.e, after we have manually flushed the store and the producer. So we guarantee that the checkpointed offsets are never greater than what has been flushed.
In the event of hard failure we can recover by reading the checkpoints and consuming from the stored offsets.

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang

Closes apache#2471 from dguy/kafka-4317
@dguy dguy deleted the kafka-4317 branch March 30, 2017 11:10
asfgit pushed a commit that referenced this pull request May 12, 2017
This is a backport of #2471

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #3024 from dguy/k4881-bp