Skip to content

MINOR: unify calls to get committed offsets and metadata#7463

Merged
mjsax merged 2 commits intoapache:trunkfrom
mjsax:minor-improve-consumer-committed
Oct 15, 2019
Merged

MINOR: unify calls to get committed offsets and metadata#7463
mjsax merged 2 commits intoapache:trunkfrom
mjsax:minor-improve-consumer-committed

Conversation

@mjsax
Copy link
Copy Markdown
Member

@mjsax mjsax commented Oct 8, 2019

This is a follow up to #7304 and #6694 to unify the calls to consumer.committed() further and only call it once during startup instead of twice.

@mjsax mjsax added the streams label Oct 8, 2019
return stateMgr.changelogPartitions();
}

Map<TopicPartition, Long> committedOffsetForPartitions(final Set<TopicPartition> partitions) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved into StandbyTask as we don't share it any longer for both types of tasks.

}

private boolean closeUnclean(final T task) {
private void closeUnclean(final T task) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup: return type is never used.

void transitionToRunning(final T task) {
log.debug("Transitioning {} {} to running", taskTypeName, task.id());
running.put(task.id(), task);
task.initializeTaskTime();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we remove the method entirely and init task time in the constructor of StreamTask (note, that we don't need to init task time for StandbyTasks anyway)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in now done implicitly with resume()

}
}

private void initializeCommittedOffsets(final Map<TopicPartition, OffsetAndMetadata> offsetsAndMetadata) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mix of "old" initializeStateStores (the part to filter for source topic partitions that are use as changelogs) and "old" AbstractTaks#committedOffsetForPartitions)

stateMgr.putOffsetLimits(committedOffsetsForChangelogs);
}

private void initializeTaskTime(final Map<TopicPartition, OffsetAndMetadata> offsetsAndMetadata) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not private. Same implementation as before, however, we get offsetsAndMetadata passed in, and don't use the consumer to get them from the brokers.



@Override
public boolean initializeStateStores() {
Copy link
Copy Markdown
Member Author

@mjsax mjsax Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just simplified -- moved the offset initialization into the constructor.

final LinkedHashMap<MetricName, Metric> result = new LinkedHashMap<>();
result.putAll(adminClientMetrics);
return result;
return new LinkedHashMap<>(adminClientMetrics);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side cleanup

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a LinkedHashMap here? Particularly because the interface doesn't suggest it needs anything more than a Map implementation? Actually I'm also wondering why we need to copy at all since we get back a read only view of the map from Admin and it appears the only consumer of adminClientMetrics immediately copies the results anyway.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the LinkedHashMap -- maybe @guozhangwang knows more?

For why to copy: it's not a public contract that we get a read-only map and hence we need to guard the map from writes via a copy. Not sure if we could rely on an AdminClient implementation detail... \cc @cmccabe ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the only use case I found above this code is to immediately copy it into yet another map (which is why the LinkedHashMap also doesn't make sense), so it should not matter, unless we're expecting to do something else with it down the road. Anyway, per our discussion I'm OK with this going in without fixing it, but it is low hanging fruit to reduce unnecessary work.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 8, 2019

Call for review @guozhangwang @cadonna @cpettitt-confluent

Copy link
Copy Markdown
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjsax Thanks for the PR

Here my comments

public void initializeTaskTime() {
//no-op
}
public void initializeTopology() {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also move this only to the StreamTask? Doesn't have to be in this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because we call initializeTopology() within AssignedTasks#transitToRunning(T task) that does not know the type (it generic T extends Tasks) -- hence, we would need to do an instanceof call within that method what is undesired.

Copy link
Copy Markdown
Member

@cadonna cadonna Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I did not think about that dependency. Nevertheless I think somehow it would be possible to move initializeTopology() to StreamsTask without an instanceof but that needs a bit more thoughts and refactoring and it is out-of-scope for this PR.

Comment on lines +258 to +259
initializeCommittedOffsets(offsetsAndMetadata);
initializeTaskTime(offsetsAndMetadata);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it is not completely save to call instance methods from the constructor since the object and its member variables are not completely initialised yet. Static methods would be fine, though.

I do not think that initializeCommittedOffsets() is currently problematic since it only accesses member variables from AbstractTask which should be already initialised. However, initializeTaskTime() accesses partitionGroup which is created in the constructor. Since we cannot know what reorderings a compiler does, it would be better to specify initializeTaskTime() as static and pass in partitionGroup. To make initializeCommittedOffsets() future-proof, it would also be good to also transform it to a static method.

Copy link
Copy Markdown
Contributor

@cpettitt-confluent cpettitt-confluent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a pass on the PR, lgtm modulo @cadonna 's comments.

*/
AbstractTask(final TaskId id,
final Collection<TopicPartition> partitions,
final Set<TopicPartition> partitions,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reducing the type generality here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In StreamTask constructor we call consumer.committed(...) that takes a Set<TopicPartitions> -- hence, it's easier to pass in a Set from the beginning on. Second, thinking about it, a Collection is actually too generic anyway -- we would not want to assign a List of partitions that may contain duplicates IMHO.

try {
if (!entry.getValue().initializeStateStores()) {
final T task = entry.getValue();
task.initializeMetadata();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new -- we need to initialize the task after we created it.

}

@Override
public void initializeMetadata() {}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new method

throw new ProcessorStateException(format("%sError while deleting the checkpoint file", logPrefix), e);
}
}
initializeMetadata();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also new -- we implicitly re-initialize the task on resume() -- was done "outside" before

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to re-initialize committed offsets and task time during Resume?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because in StreamTask#suspend() we call closeTopology() that calls partitionGroup.clear() that resets all times to UNKNOWN.

static ProcessorTopology withRepartitionTopics(final List<ProcessorNode> processorNodes,
final Map<String, SourceNode> sourcesByTopic,
final Set<String> repartitionTopics) {
private static ProcessorTopology withRepartitionTopics(final List<ProcessorNode> processorNodes,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup

final ProcessorTopology topology = withSources(
asList(source1, source2, processorStreamTime, processorSystemTime),
mkMap(mkEntry(topic1, (SourceNode) source1), mkEntry(topic2, (SourceNode) source2))
mkMap(mkEntry(topic1, source1), mkEntry(topic2, source2))
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup

assertThat(expectedError, is(singletonList("ERROR")));
}

@SuppressWarnings("unchecked")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup

throw new KafkaException("KABOOM!");
}
task.punctuate(processorStreamTime, 1, PunctuationType.STREAM_TIME, timestamp -> {
throw new KafkaException("KABOOM!");
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup

}

@Test
public void shouldInitTaskTimeOnResumeWithEosDisabled() {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new test

public void shouldThrowProcessorStateExceptionOnInitializeOffsetsWhenAuthorizationException() {
final Consumer<byte[], byte[]> consumer = mockConsumerWithCommittedException(new AuthorizationException("message"));
final StreamTask task = createOptimizedStatefulTask(createConfig(false), consumer);
task.initializeMetadata();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we moved initializing the offsets from initializeStateStores into initializeMetadata()


final ProcessorTopology topology = ProcessorTopologyFactories.with(
asList(source1),
singletonList(source1),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side cleanup

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 10, 2019

Updated this PR. Call for review.

As an afterthought, I realized that we should not init the offsets and task time within the constructor of StreamTask because we should not have a blocking call (consumer.committed()) during rebalance phase (note, that tasks are created in onPartitionAssigned() callback).

Hence, I refactored the code and added a new unified init method to get the offsets. This also addressed @cadonna concerns about the initialization order within the constructor.

@guozhangwang
Copy link
Copy Markdown
Contributor

@cadonna @cpettitt-confluent could you guys take another look?

Copy link
Copy Markdown
Contributor

@cpettitt-confluent cpettitt-confluent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment above on one of the side cleanups where I think we can go a bit further. Otherwise LGTM.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 14, 2019

Java 11 / 2.12 passed. Others failed -- test result not available any longer.

Retest this please.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 15, 2019

Java 11 / 2.13: kafka.log.LogCleanerIntegrationTest.testIsThreadFailed
Java 11 / 2.12: org.apache.kafka.streams.integration.SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
Java 8 passed.

@mjsax mjsax merged commit c55277c into apache:trunk Oct 15, 2019
mjsax added a commit that referenced this pull request Oct 15, 2019
Reviewers: Chris Pettitt <cpettitt@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 15, 2019

Merged to trunk and cherry-picked to 2.4.

@mjsax mjsax deleted the minor-improve-consumer-committed branch October 15, 2019 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants