KAFKA-5520: KIP-171 - Extend Consumer Group Reset Offset for Stream Application - Updated #4159
jeqo wants to merge 104 commits into apache:trunk from
Conversation
mjsax left a comment:
Some initial comments. Did not yet have a look at the tests (it seems there is a lot of redundancy in the tests though -- could we refactor to share more code?)
> public Long getDateTime(String ts) throws ParseException {
Can't this be package private? (add comment // visible for testing)
nit: ts -> timestamp
Yes, but test is in another package/module. Maybe once it is moved to streams we can improve visibility of this method. I will add the comment.
> public Long getDateTime(String ts) throws ParseException {
>     if (!(ts.split("T")[1].contains("+") || ts.split("T")[1].contains("-") || ts.split("T")[1].contains("Z"))) {
This could throw an NPE. I think we should guard against this and throw a ParseException if an NPE happens.
Also, ts.split("T") is computed repeatedly and should be extracted into a variable.
Agree. I will reformat this.
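The agreed-upon guard could look like the sketch below (a hypothetical rework for illustration, not the merged code; `getDateTime` and the two date patterns come from the quoted diff). The split is done once, and a malformed timestamp surfaces as a ParseException instead of a NullPointerException or ArrayIndexOutOfBoundsException:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DateTimeParser {

    // Hypothetical reworked getDateTime: ts.split("T") is extracted into a
    // variable and malformed input is rejected explicitly, so callers always
    // see ParseException rather than NPE / ArrayIndexOutOfBoundsException.
    public static long getDateTime(final String timestamp) throws ParseException {
        if (timestamp == null) {
            throw new ParseException("Timestamp must not be null", 0);
        }
        final String[] timestampParts = timestamp.split("T");
        if (timestampParts.length < 2) {
            throw new ParseException("Error parsing timestamp: " + timestamp, 0);
        }
        final String timePart = timestampParts[1];
        // No explicit zone offset given: parse without the XXX pattern
        // (interpreted in the local time zone).
        if (!(timePart.contains("+") || timePart.contains("-") || timePart.contains("Z"))) {
            return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").parse(timestamp).getTime();
        }
        return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse(timestamp).getTime();
    }
}
```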
> Date date;
> try {
>     date = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse(ts);
> } catch (ParseException e) {
> @Test
> public void testDateTimeFormats() throws ParseException {
This should be multiple tests (one test method per scenario). Also, use better method names, e.g., shouldAcceptValidDateFormats, shouldThrowOnInvalidDateFormat, etc.
> try {
>     invokeGetDateTimeMethod(new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"));
>     fail("Call to getDateTime should fail");
> } catch (Exception e) {
Not sure where the final should be. Maybe you're referring to the next comment?
> private void invokeGetDateTimeMethod(SimpleDateFormat format) throws ParseException {
> final Map<TopicPartition, Long> topicPartitionAndOffset = new HashMap<>();
> for (final String line : resetPlanCsv.split("\n")) {
>     final String[] parts = line.split(",");
>     final String topic = parts[0];
Should we guard against NPE? (same below)
>     return date.getTime();
> }
>
> private Map<TopicPartition, Long> parseResetPlan(String resetPlanCsv) {
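A guarded variant of the CSV parsing could look like the sketch below (hypothetical, not the merged code; plain "topic-partition" string keys stand in for TopicPartition so the example stays free of Kafka classes). A short or non-numeric line becomes a ParseException instead of an ArrayIndexOutOfBoundsException or NumberFormatException:

```java
import java.text.ParseException;
import java.util.HashMap;
import java.util.Map;

public class ResetPlanParser {

    // Hypothetical guarded parseResetPlan: each CSV line must be
    // "topic,partition,offset"; anything else is rejected with ParseException.
    public static Map<String, Long> parseResetPlan(final String resetPlanCsv) throws ParseException {
        final Map<String, Long> topicPartitionAndOffset = new HashMap<>();
        int lineNumber = 0;
        for (final String line : resetPlanCsv.split("\n")) {
            final String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new ParseException("Expected 3 columns (topic,partition,offset) but got: " + line, lineNumber);
            }
            try {
                final String topic = parts[0];
                final int partition = Integer.parseInt(parts[1]);
                final long offset = Long.parseLong(parts[2]);
                topicPartitionAndOffset.put(topic + "-" + partition, offset);
            } catch (final NumberFormatException e) {
                throw new ParseException("Non-numeric partition or offset in line: " + line, lineNumber);
            }
            lineNumber++;
        }
        return topicPartitionAndOffset;
    }
}
```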
>                        Map<TopicPartition, Long> beginningOffsets,
>                        Map<TopicPartition, Long> endOffsets) {
>     Map<TopicPartition, Long> validatedTopicPartitionsOffsets = new HashMap<>();
>     for (final TopicPartition topicPartition : inputTopicPartitionsAndOffset.keySet()) {
nit: As both key and value are accessed, we might want to iterate over the entrySet() instead of the keySet().
>     return topicPartitionAndOffset;
> }
>
> private Map<TopicPartition, Long> checkOffsetRange(Map<TopicPartition, Long> inputTopicPartitionsAndOffset,
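The entrySet() suggestion could be sketched as follows (illustrative only: String keys stand in for TopicPartition, and clamping the requested offset into [beginning, end] is an assumed semantics for checkOffsetRange). Iterating the entrySet() yields key and value in one step instead of an extra get() per key:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetRangeCheck {

    // Sketch of checkOffsetRange using entrySet() iteration, as suggested.
    // Assumed semantics: requested offsets are clamped into the valid range.
    public static Map<String, Long> checkOffsetRange(final Map<String, Long> inputOffsets,
                                                     final Map<String, Long> beginningOffsets,
                                                     final Map<String, Long> endOffsets) {
        final Map<String, Long> validated = new HashMap<>();
        for (final Map.Entry<String, Long> entry : inputOffsets.entrySet()) {
            final long requested = entry.getValue();
            final long beginning = beginningOffsets.get(entry.getKey());
            final long end = endOffsets.get(entry.getKey());
            // clamp the requested offset into [beginning, end]
            validated.put(entry.getKey(), Math.min(Math.max(requested, beginning), end));
        }
        return validated;
    }
}
```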
@jeqo We just merged a fix for the failing tests. Can you please rebase? Thx.
…up rebalances in progress

Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes apache#3506 from cmccabe/KAFKA-5565
Thanks for the PR, looks good overall, just one minor comment. The
@bbejeck agree. I've fixed it. Thanks for your feedback!
@guozhangwang I've added
@jeqo The unit tests of
@guozhangwang I've removed the SSL integration test and put the integration tests in one class. How does it look now?
@jeqo Maybe my previous comment was misleading :) I was suggesting that we do not need to add more test cases into the reset integration tests like "from offset", "from date" etc, as for integration test we just check its interaction with the cluster module all worked as expected, so having just one test for "from scratch with intermediate" and "from scratch without intermediate" is good enough. For the added functionality our added unit tests should be well covering it.
Also it seems you have removed the test with SSL config as well? Is that intentional? If not maybe we can just make this integration test a parameterized test, with ssl turned on and off (i.e. when it is turned on broker cluster will have the SSL configs to not accept unauthorized access, and the streams resetter needs the appropriate SSL configs to be able to delete those topics).
cc @mjsax
I think I get it now: I will return the tests with SSL config (I thought that was what was already tested) and remove tests that are already validated by unit tests. There are some methods from KafkaConsumer that are not available in unit tests, like offsetsForTimes; I will keep those in the integration test.
Sounds good! Thanks. The reason we'd like to avoid redundant integration tests with overlapping unit tests is that they would take much more time. If we are not careful the Jenkins builds would take longer and longer to finish, and hence hinders our development pace (we have seen the Streams unit test suite taking from 1 min to about 7 min now on my laptop, and much longer on the Jenkins box).
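The SSL on/off parameterization mentioned above could be approached by building the client configs from a flag, roughly as sketched below (ResetToolConfigs and clientConfig are hypothetical names; the property keys are Kafka's standard client configs, and the truststore path is a placeholder the test harness would fill in):

```java
import java.util.HashMap;
import java.util.Map;

public class ResetToolConfigs {

    // Sketch: one config-building helper, parameterized by SSL mode,
    // instead of duplicating whole test classes for the secure variant.
    public static Map<String, String> clientConfig(final boolean sslEnabled, final String bootstrapServers) {
        final Map<String, String> config = new HashMap<>();
        config.put("bootstrap.servers", bootstrapServers);
        if (sslEnabled) {
            config.put("security.protocol", "SSL");
            // placeholder: the test harness would provide a real truststore
            config.put("ssl.truststore.location", "/path/to/truststore.jks");
        }
        return config;
    }
}
```

A parameterized test could then run the same reset scenario once per element of `{false, true}`.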
> "--input-topics", INPUT_TOPIC
> };
> }
> "--intermediate-topics", INTERMEDIATE_USER_TOPIC));
If we add the intermediate topic for all cases, I am wondering if this is correct.
I understand that the test is still passing and we just log an error, but should StreamsResetter not return a non-zero return value to indicate that something went wrong?
That's a good point. Maybe we should still add a boolean parameter to indicate if we should include "--intermediate-topics", INTERMEDIATE_USER_TOPIC in the config lists?
Yes. And we should make sure that the test fails if we do it wrong. That the test passes atm indicates an issue in the reset tool. It only returns a non-zero exit code if there was a fatal exception and we crash. But we should also return non-zero if anything else went wrong. Atm, we catch, print/log, and move on -- we should remember if any error occurred though and only return zero if no error occurred at all.
I would recommend to use different return codes for different errors and document it. Should be part of the public API IMHO.
And add a test to check the return codes :)
Should this be a separate PR out of the scope of the KIP though?
Absolutely. It's a bug and we can file a JIRA -- thus, the bug fix can also re-introduce the if-then-else for the intermediate topic here.
@jeqo do you want to add back the boolean indicator whether intermediate topic exists and should be removed before we merge this PR?
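The remember-errors-and-exit-non-zero idea discussed above could be sketched like this (ExitCodeTracker and its methods are hypothetical names, not the tool's actual API):

```java
public class ExitCodeTracker {

    // Sketch: record every error, keep processing, and report a non-zero
    // exit code at the end if anything went wrong along the way.
    public static final int EXIT_SUCCESS = 0;
    public static final int EXIT_ERROR = 1;

    private boolean errorOccurred = false;

    public void recordError(final String message) {
        System.err.println("ERROR: " + message);
        errorOccurred = true; // remember the failure but continue
    }

    public int exitCode() {
        return errorOccurred ? EXIT_ERROR : EXIT_SUCCESS;
    }
}
```

Per the suggestion above, distinct codes per failure category could additionally be documented as part of the public API.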
Thanks for the updates. Still wondering if we should share more code for the individual test methods. I do not insist on it, but it might be worth it. WDYT @guozhangwang?
I looked at the
@mjsax we could go back to private methods once we move
I just looked into
But
The reason that it does not fail is because in the resetter, we use a client to subscribe to all the topics, and then populate the intermediate topics: If the intermediate topic does not exist, then the
Synced with @mjsax offline, and he will file a JIRA to improve on this situation separately.
@jeqo You are right -- I did look at the wrong code. My bad. Makes sense that the test passed. Did create https://issues.apache.org/jira/browse/KAFKA-6318 as a follow-up. This is the issue (unrelated to the KIP):
mjsax left a comment:
Thanks for the KIP and PR! Great addition!
LGTM. Merged to trunk. @jeqo Thanks for your contribution!
> if (!dryRun) {
>     client.seekToEnd(intermediateTopicPartitions);
>
> client.seekToEnd(intermediateTopicPartitions);
Should this call be governed by !dryRun?
It's fine. seekToEnd() is lazy, and we guard position() with the dryRun check.
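A toy model of that laziness (LazyConsumer is a stand-in for KafkaConsumer, using String partitions; the real consumer likewise only records the seek intent and resolves the offset when position() is called):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DryRunSeekSketch {

    // Toy stand-in for the consumer: seekToEnd() only records intent,
    // position() is where an offset would actually be fetched remotely.
    public static class LazyConsumer {
        public final Set<String> pendingSeekToEnd = new HashSet<>();
        public final Map<String, Long> endOffsets = new HashMap<>();
        public int remoteFetches = 0;

        public void seekToEnd(final Set<String> partitions) {
            pendingSeekToEnd.addAll(partitions); // no remote call here
        }

        public long position(final String partition) {
            remoteFetches++; // offset resolved lazily, only on demand
            return endOffsets.getOrDefault(partition, 0L);
        }
    }

    // Mirrors the tool's structure: seekToEnd unconditionally,
    // position only if !dryRun.
    public static void maybeSeekToEnd(final LazyConsumer client, final Set<String> partitions, final boolean dryRun) {
        client.seekToEnd(partitions);
        if (!dryRun) {
            for (final String tp : partitions) {
                client.position(tp);
            }
        }
    }
}
```

So in dry-run mode the unguarded seekToEnd() never touches the brokers.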
>                        final Set<TopicPartition> inputTopicPartitions) {
> // visible for testing
> public void resetOffsetsFromResetPlan(Consumer<byte[], byte[]> client, Set<TopicPartition> inputTopicPartitions, Map<TopicPartition, Long> topicPartitionsAndOffset) {
>     final Map<TopicPartition, Long> endOffsets = client.endOffsets(inputTopicPartitions);
These two calls boil down to Fetcher#retrieveOffsetsByTimes(Map<TopicPartition, Long> timestampsToSearch ...). I wonder if they can be combined (to save round trips).
We want to get endOffsets() and beginningOffsets() for the same set of partitions. A single request cannot get both at once, AFAIK.
Also, the reset tool is not considered to be on the "hot code path" -- thus, we don't need to worry about performance too much or apply (unnecessary?) micro-optimizations. Just my two cents here.
Agreed that such optimization is not needed for this task.
@jeqo I just realized that we should have updated the docs with this PR. Would you mind doing a follow-up PR? It should go here: https://kafka.apache.org/10/documentation/streams/upgrade-guide and https://kafka.apache.org/10/documentation/streams/developer-guide/app-reset-tool.html
KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application
Merge changes from KIP-198
Ref: #3831