KAFKA-7515: Trogdor - Add Consumer Group Benchmark Specification#5810
KAFKA-7515: Trogdor - Add Consumer Group Benchmark Specification#5810cmccabe merged 7 commits intoapache:trunkfrom stanislavkozlovski:trogdor-consumer-group-bench-spec
Conversation
|
Thanks, @stanislavkozlovski It seems like the only difference between the version with a group and the version without is that we call subscribe. So we don’t need a new class, right, just a new configuration option for ConsumeBench, I think. |
| executor.submit(new ConsumeMessages(partitions)); | ||
|
|
||
| AbstractConsumeMessages consumeMessagesTask; | ||
| if (spec.consumerGroup() == null) { |
There was a problem hiding this comment.
We don't use null entries in JSON, because it gets too confusing. You should check against empty string here.
There was a problem hiding this comment.
oh yeah, I could just set it to an empty string when it's null in the spec - way better
There was a problem hiding this comment.
Using empty string for null is more confusing no?
There was a problem hiding this comment.
There's a pattern for all of the Trogdor JSON code where we don't use null anywhere. The problem with null is it gets annoying to check each collection for empty vs. null, each string for empty vs. null, etc. etc.
null is also handled kind of inconsistently in Jackson. Sometimes Jackson will serialize a field that is null as "foo": null whereas sometimes it will just omit the field. (I think that "foo": null is actually not conforming JSON, by the way...) There are probably ways to configure all this, but null doesn't really provide any value 99% of the time, so it's simpler to just treat empty as null.
There was a problem hiding this comment.
We should be consistent, I agree. I don't agree with the value part. It's like saying that we should use empty String to represent the absence of a value in Java. Something like Optional is better and maps to null in JSON. Anyway, a discussion for a different venue. :)
|
|
||
| AbstractConsumeMessages consumeMessagesTask; | ||
| if (spec.consumerGroup() == null) { | ||
| spec.consumerGroup(DEFAULT_CONSUMER_GROUP); |
There was a problem hiding this comment.
We should use a randomly generated (and hopefully unique!) consumer group here so that we don't conflict with other people running a test.
| this.commonClientConf = configOrEmptyMap(commonClientConf); | ||
| this.adminClientConf = configOrEmptyMap(adminClientConf); | ||
| this.activeTopics = activeTopics == null ? TopicsSpec.EMPTY : activeTopics.immutableCopy(); | ||
| this.consumerGroup = consumerGroup; |
There was a problem hiding this comment.
Should be consumerGroup == null ? "" : consumerGroup to match the other entries. We don't use nulls in JSON
| Properties consumerProperties; | ||
|
|
||
| ConsumeMessages(Collection<TopicPartition> topicPartitions) { | ||
| AbstractConsumeMessages(Map<String, List<TopicPartition>> topicPartitionsByTopic) { |
There was a problem hiding this comment.
Seems like we don't really need inheritance here. Can just have an "if" statement that checks if we have a group or not
| .flatMap(List::stream).collect(Collectors.toList()); | ||
| KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>( | ||
| consumerProperties, new ByteArrayDeserializer(), new ByteArrayDeserializer()); | ||
| consumer.assign(topicPartitions); |
There was a problem hiding this comment.
We shouldn't always use assign here. If the developer has not specified any partitions, we can use the partitions of the group itself.
|
Thanks, @stanislavkozlovski. So there are basically three cases here:
In case 1, we want the group ID to be randomly assigned and not to conflict with other group IDs in the cluster. Otherwise we may impact concurrently running tests. In case 2, we want the group ID to be set. The client will ask the group for a partition assignment, rather than creating one manually. This is the most common way to use groups. In case 3, we will set the group ID, but use We can assume that we're in case 1 when we have an empty group ID. We can assume that we're in case 2 when we have a non-empty group ID, but not any partition data. After all, it doesn't make sense to consume nothing! Otherwise we're in case 3. |
|
Thanks for the review @cmccabe, Case 1. Agree, it makes sense to improve the current behavior and assign a random group when one isn't specified Case 2. This also makes sense to me - we should support a consumer to "bootstrap" itself onto an existing consumer group Case 3. I see the use case of saving offsets in that way, but I'm concerned we don't have a way to create a new consumer group from this tool. How would we go on about creating a new consumer group that it subscribed to some topics? |
If you join a consumer group that doesn't already exist, then it is created.
That is a good point. Perhaps we should have some way of support a use-case 4: create a group and subscribe (not assign) some partitions. One way of doing this would be adding a new configuration like |
Exactly, but if you are creating that and haven't populated the I think supporting use-case 4 is very important and as you suggested, should be the default behavior. I'm thinking of adding a Now I think this could have some impact on existing users as they would change from using Here's how I envision the configs to work: |
|
After a bit of investigation, it seems like we cannot simply call |
This ConsumeBenchWorker now supports three cases: 1. When we want to manually assign partitions and use a random, new consumer group. (useGroupPartitionAssignment=false, consumerGroup is undefined) - KafkaConsumer#assign() 2. When we want to have dynamic partition assignment via an existing consumer group's (useGroupPartitionAssignment=false, consumerGroup is specified) - KafkaConsumer#subscribe() 3. When we want to manually assign partitions but track offsets via an existing consumer group (useGroupPartitionAssignment=true, consumerGroup is specified) - KafkaConsumer#assign() Adds one new field to the ConsumeBenchSpec - "useGRoupPartitionAssignment"
|
@cmccabe I've updated the PR to support cases 1, 3 and 4. Let me know if I'm on the right track and I'll make sure to update the tests/docs as well |
| topics, consumerGroup); | ||
| consumer.subscribe(topics); | ||
| } | ||
| else { |
There was a problem hiding this comment.
whoops, will change to be on the same line as closing bracket
| private final Map<String, String> commonClientConf; | ||
| private final TopicsSpec activeTopics; | ||
| private final List<String> activeTopics; | ||
| private Map<String, List<TopicPartition>> materializedTopics; |
| private final TopicsSpec activeTopics; | ||
| private final List<String> activeTopics; | ||
| private Map<String, List<TopicPartition>> materializedTopics; | ||
| private boolean useGroupPartitionAssignment; |
| private final List<String> activeTopics; | ||
| private Map<String, List<TopicPartition>> materializedTopics; | ||
| private boolean useGroupPartitionAssignment; | ||
| private String consumerGroup; |
| } | ||
|
|
||
| private String generateConsumerGroup() { | ||
| return "consumer-group-" + UUID.randomUUID().toString(); |
There was a problem hiding this comment.
perhaps "consume-bench-" + UUID... so that it's clear that Trogdor created it?
|
Thanks, @stanislavkozlovski , this looks good. I think we're getting close. With regard to the topics / partitions specification. The current approach in the PR, if I understand correctly, would require me to specify foo[0][0-1] if I wanted partitions foo0:0, foo0:1. That seems awkward. In general I don't think that we should conflate globs with partition numbers. What I mean is that we should allow people to specify partitions by number if there are zero, one, or two globs in the string. I think we can do this by treating : as a partition number specifier. So for example Supporting multiple globs in strings should be pretty simple too-- we just keep calling expand on the same set of strings until the set of strings doesn't change. |
|
Thanks for the review @cmccabe. I like this notation better. |
|
+1. Thanks, @stanislavkozlovski |
|
Java 11 build timed out The Java 8 build passed fine. Maybe we could merge this? |
|
Build timeout in jdk11 is unrelated. Will merge. Thanks, @stanislavkozlovski |
…_group_partitions_should_raise (#6015) This is the error message we're after: "You may not specify an explicit partition assignment when using multiple consumers in the same group." We apparently changed it midway through #5810 and forgot to update the test. Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>
…che#5810) This ConsumeBenchWorker now supports using consumer groups. The groups may be either used to store offsets, or as subscriptions.
…_group_partitions_should_raise (apache#6015) This is the error message we're after: "You may not specify an explicit partition assignment when using multiple consumers in the same group." We apparently changed it midway through apache#5810 and forgot to update the test. Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>
https://issues.apache.org/jira/browse/KAFKA-7515
Changes
consumerGroupfield toConsumeBenchSpec.activeTopicsfield format inConsumeBenchSpec.activeTopicsis a list of strings and now supports three notations for each valueThis ConsumeBenchWorker now supports three cases: