Skip to content

KAFKA-10867: Improved task idling#9840

Merged
vvcephei merged 10 commits intotrunkfrom
kafka-10867-improved-task-idling
Jan 28, 2021
Merged

KAFKA-10867: Improved task idling#9840
vvcephei merged 10 commits intotrunkfrom
kafka-10867-improved-task-idling

Conversation

@vvcephei
Copy link
Copy Markdown
Contributor

@vvcephei vvcephei commented Jan 7, 2021

Use the new ConsumerRecords.metadata() API to implement
improved task idling as described in KIP-695

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@vvcephei vvcephei requested a review from guozhangwang January 7, 2021 05:32
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from 03ee17e to d22143b Compare January 7, 2021 05:45
@mjsax mjsax added the kip Requires or implements a KIP label Jan 7, 2021
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from b73a533 to 7917638 Compare January 12, 2021 00:41
@vvcephei vvcephei force-pushed the kafka-10866-consumerrecords-metadata branch from 41900e1 to ddef3d1 Compare January 12, 2021 00:44
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from 7917638 to a366fc3 Compare January 12, 2021 00:45
@guozhangwang
Copy link
Copy Markdown
Contributor

Trying to understand the dependency here: should I only review this after #9836 is merged?

@vvcephei vvcephei force-pushed the kafka-10866-consumerrecords-metadata branch 2 times, most recently from 552131c to 2e46bce Compare January 20, 2021 15:52
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from a366fc3 to a48dd7f Compare January 20, 2021 19:12
@SuppressWarnings("deprecation")
public class StreamsConfig extends AbstractConfig {

public static final long MAX_TASK_IDLE_MS_DISABLED = -1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this down below to 147?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nit, the line below should be

private static final Logger

* (i.e., it increases or stays the same over time).
*/
public class PartitionGroup {
private static final Logger LOG = LoggerFactory.getLogger(PartitionGroup.class);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more convienent to pass in the log object from AbstractTask to the PartitionGroup constructor? It is created with the logContext including the task-type / task-id.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can pass in the log context. I wouldn't pass the actual logger, though, because it would mess up common log4j usage patterns.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

bufferedPartitions,
emptyPartitions);
}
return true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log INFO if we are indeed enforcing processing? I.e. there are some empty partitions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on a second thought.. if users configures -1 it means they probably do not care about enforced processing, while on the other side the INFO entry may flood the logs here. So NVM.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad we agree ;)

}

@Override
public void addFetchedMetadata(final TopicPartition partition, final ConsumerRecords.Metadata metadata) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason that we need to add this function at Task seems to be tasks.activeTasksForInputPartition(partition) at TaskManager. and there's a TODO to convert its return to StreamTask anyways. So let's just move this function to StreamTask only and in TaskManager force convert the task to StreamTask. And then we can remove it from StandbyTask.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be good to avoid taking on unrelated refactoring TODOs in this PR? It seems better to me to leave it to whoever decides to pick up that TODO and instead just keep this PR focused on the feature.

final int dummyThreadIdx = 1;
this.maxPollTimeMs = new InternalConsumerConfig(config.getMainConsumerConfigs("dummyGroupId", "dummyClientId", dummyThreadIdx))
.getInt(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG);
this.maxPollTimeMs = maxPollTimeMs;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale of this refactoring?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was leftover from when I was using the poll interval as a heuristic. I'll revert it.

return lastRecordedValue;
}

@Before
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactoring!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from a48dd7f to 47d7856 Compare January 21, 2021 21:37
@guozhangwang
Copy link
Copy Markdown
Contributor

LGTM! Once we have perf numbers quantifying its impact I think we can merge.

@vvcephei vvcephei force-pushed the kafka-10866-consumerrecords-metadata branch from 1cc0bb7 to ffbaeab Compare January 22, 2021 15:55
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from 47d7856 to 0634433 Compare January 22, 2021 15:56
// TRACE for messages that don't wait for fetches, since these may be logged at extremely high frequency
// DEBUG when we waited for a fetch and decided to wait some more, as configured
// DEBUG when we are ready for processing and didn't have to enforce processing
// TRACE for messages that don't wait for fetches
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on demoting these log entries! :)

@guozhangwang
Copy link
Copy Markdown
Contributor

I've just kicked another test. Please feel free to merge afterwards.

@vvcephei vvcephei force-pushed the kafka-10866-consumerrecords-metadata branch 2 times, most recently from bb23614 to fc3ec40 Compare January 27, 2021 20:08
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from fa1af20 to 12fb3c4 Compare January 27, 2021 21:39
@vvcephei
Copy link
Copy Markdown
Contributor Author

There was a merge conflict with trunk. Rebased and pushed.

Base automatically changed from kafka-10866-consumerrecords-metadata to trunk January 28, 2021 00:18
@vvcephei vvcephei force-pushed the kafka-10867-improved-task-idling branch from 12fb3c4 to bc1afd0 Compare January 28, 2021 00:19
John Roesler added 6 commits January 27, 2021 18:20
@vvcephei
Copy link
Copy Markdown
Contributor Author

Hmm, the Java 8 build appears to have hung after an hour and 58 minutes. It's been running for 3 hours and 30 minutes now. This is now the 16th build, and there have been multiple Java 8 successes to date, so I think it's environmental.

I'll go ahead with the merge.

@vvcephei vvcephei merged commit 4d28391 into trunk Jan 28, 2021
@vvcephei vvcephei deleted the kafka-10867-improved-task-idling branch January 28, 2021 03:57
vvcephei pushed a commit to vvcephei/kafka that referenced this pull request Feb 12, 2021
vvcephei pushed a commit that referenced this pull request Feb 12, 2021
This reverts commit 4d28391.

Closes #10119

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kip Requires or implements a KIP streams

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants