Skip to content

KAFKA-7443: OffsetOutOfRangeException in restoring state store from changelog topic when start offset of local checkpoint is smaller than that of changelog topic#5946

Merged
mjsax merged 8 commits intoapache:trunkfrom
linyli001:kafka-7443
Dec 11, 2018

Conversation

@linyli001
Copy link
Copy Markdown
Contributor

See https://issues.apache.org/jira/browse/KAFKA-7443 for details.
The fix set this state partition to "NO_CHECKPOINT" when the offset in local checkpoint file has expired and older than the current start offset of changelog topic, thus making this task to restore local state from the current beginning offset of changelog topic, avoiding falling into the infinite loop caused by this exception.

…hangelog topic when start offset of local checkpoint is smaller than that of changelog topic
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Nov 26, 2018

Thanks for the PR @linyli001. Can we add a unit (or integration) test for this case?

@mjsax mjsax added the streams label Nov 26, 2018
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Nov 26, 2018

There is a checkstyle error:

ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11@2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java:112:1: File contains tab characters (this is the first instance). [FileTabCharacter]

@vvcephei
Copy link
Copy Markdown
Contributor

The test results are gone already, for some reason...

@vvcephei
Copy link
Copy Markdown
Contributor

Retest this, please.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Dec 4, 2018

@linyli001 Can we add a test for this case?

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, but I agree with @mjsax we'll need an integration test for this.

@linyli001
Copy link
Copy Markdown
Contributor Author

@mjsax Ok, I'll add an integration test. But the scenario only triggers under certain conditions: the changelog topic deleted the expired segments and updated the new starting offset, which is greater than the offset recorded in local checkpoint file. I'm confused how to simulate this scenario in my test case, so could you give me some suggestions?

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Dec 5, 2018

@linyli001 I see your point, just off the top of my head without looking at the code, could you simulate the condition better using mocks? In that case, you'd only need a unit test.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Dec 5, 2018

I agree that it's a little tricky... A mocked unit test should be possible for sure as suggested by Bill.

For an integration test, you could create a topic with very small segment size, write data into it, and user purgeDataAPI to delete old records. Then, a checkpoint file with old offsets is written and afterwards Streams is started and we check if it recovers correctly.

Does this help? Thoughts?

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Dec 5, 2018

I think we'd need to see the unit test to decide if it's complete enough, but it feels like enough to me...

The fundamental question is whether you trust the client to throw that exception when the offset is ... out of range. If that's the specified behavior of the client, then we don't need an integration test, we just need to make sure that Streams properly clears the offset when it gets the exception.

In other words, we should be able to test it satisfactorily with a mock.

@linyli001
Copy link
Copy Markdown
Contributor Author

Ok. I will add the test case according to @mjsax. Thanks!

@linyli001
Copy link
Copy Markdown
Contributor Author

The unit test has some problems, and I'll check and push a new commit latter.

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test. Some nits. Overall LGTM.

Comment thread clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java Outdated
Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, @linyli001 .

Thanks!

@vvcephei
Copy link
Copy Markdown
Contributor

@mjsax , is this ready to merge?

@mjsax mjsax merged commit a94c8da into apache:trunk Dec 11, 2018
@linyli001 linyli001 deleted the kafka-7443 branch December 11, 2018 08:45
mjsax pushed a commit that referenced this pull request Dec 11, 2018
…hangelog topic when start offset of local checkpoint is smaller than that of changelog topic (#5946)

Reviewer: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
mjsax pushed a commit that referenced this pull request Dec 11, 2018
…hangelog topic when start offset of local checkpoint is smaller than that of changelog topic (#5946)

Reviewer: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
mjsax pushed a commit that referenced this pull request Dec 11, 2018
…hangelog topic when start offset of local checkpoint is smaller than that of changelog topic (#5946)

Reviewer: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Dec 11, 2018

Thanks for the PR @linyli001!

Merged to trunk and cherry-picked to 2.1, 2.0, and 1.1.

@linyli001
Copy link
Copy Markdown
Contributor Author

@mjsax I have some problems when sync with the latest change. I can see my fix has been merged into the trunk branch, but when I folk the apache/kafka to my GitHub, I can't found the latest change. Any steps am I missing?

@vvcephei
Copy link
Copy Markdown
Contributor

Hi @linyli001 ,

IIRC, you have to check out your trunk branch, then pull from apache/kafka's trunk, then push it to your fork's trunk.

Is that what you're already trying to do?

@linyli001
Copy link
Copy Markdown
Contributor Author

@vvcephei Sorry I don't quite understand how to do it. I have cloned the git from my fork to local, but run "git pull origin trunk" not includes the latest change. I think it just pull from my fork's trunk, but I don't know how to pull from apache/kafka's trunk locally? Can you show me the git command? Thanks!

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Dec 13, 2018

If you fork the repo, your copy on github won't be updated automatically. You need to pull the latest changes from kafka repo into your local one (and than you can push those back to you copy on github).

Assume you have an alias called kafka-github (you can create one with git add remote... command) and assume you are on local branch trunk:

git fetch kafka-github              // get latest changes
git rebase kafka-github trunk       // update local trunk to latest changes
git push origin trunk               // push the changes to you github copy

@linyli001
Copy link
Copy Markdown
Contributor Author

@mjsax Thanks for the guide! I can update my folk now.

pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…hangelog topic when start offset of local checkpoint is smaller than that of changelog topic (apache#5946)

Reviewer: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants