KAFKA-3932 - Consumer fails to consume in a round robin fashion#5838

Closed
chienhsingwu wants to merge 1 commit into apache:trunk from chienhsingwu:trunk

Conversation

@chienhsingwu commented Oct 24, 2018

I think the issue is this statement in the KIP: "As before, we'd keep track of which partition we left off at so that the next iteration would begin there." The next iteration should NOT start from the last partition used; it should pick the next one instead.

The simplest way to impose an order for picking the next partition is to use the order in which consumer.internals.Fetcher receives the partition messages, as determined by the completedFetches queue in that class. To avoid parsing the partition messages repeatedly, we can save the parsed fetches to a list and track there which partition to draw messages from next.
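A minimal editorial sketch of the idea above, assuming illustrative names that do not match the real consumer.internals.Fetcher internals: parsed fetches are kept in arrival order, and a cursor advances by one partition per poll so each poll begins where the previous one did not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the proposed draining order; class and method
// names are invented for illustration and are not Kafka's actual code.
class RoundRobinDrainer {
    private final List<Deque<String>> parsedFetches = new ArrayList<>();
    private int next = 0; // partition index at which the next poll starts

    void addParsedFetch(Deque<String> records) {
        parsedFetches.add(records);
    }

    // Drain up to maxPollRecords, visiting partitions round-robin starting
    // at `next`, then advance `next` so the following poll starts one later.
    List<String> poll(int maxPollRecords) {
        List<String> out = new ArrayList<>();
        int n = parsedFetches.size();
        for (int i = 0; i < n && out.size() < maxPollRecords; i++) {
            Deque<String> q = parsedFetches.get((next + i) % n);
            while (!q.isEmpty() && out.size() < maxPollRecords) {
                out.add(q.pollFirst());
            }
        }
        if (n > 0) next = (next + 1) % n;
        return out;
    }
}
```

With three partitions holding 51 records each and `maxPollRecords = 50`, this yields 50 from A, then 50 from B, then 50 from C, then the three leftovers in one poll, matching the behavior discussed below.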

@hachikuji, @lindong28, @guozhangwang, @ijuma and others, please review.

@chienhsingwu (Author)

retest this please

@stanislavkozlovski (Contributor)

Hi @chienhsingwu, thanks a lot for the PR!

Do you mind further clarifying what the current behavior is versus the one you're introducing?
As far as I understand it, if we have max.poll.records=50 and partitions A, B, and C each have 100 fetched messages, the poll calls will look like this:
#1 - 50 from A, #2 - 50 from A, #3 - 50 from B, #4 - 50 from B, #5 - 50 from C, #6 - 50 from C

And your changes would make it like this:
#1 - 50 from A, #2 - 50 from B, #3 - 50 from C, #4 - 50 from A, #5 - 50 from B, #6 - 50 from C

Is that correct?
If so, what will the behavior be if we have partitions A, B, and C each with 51 fetched messages?

@chienhsingwu (Author)

Hi @stanislavkozlovski

Yes, the order in the first case would be as you described:
#1 - 50 from A, #2 - 50 from B, #3 - 50 from C, #4 - 50 from A, #5 - 50 from B, #6 - 50 from C

When each partition has 51 messages, it will be:
#1 - 50 from A, #2 - 50 from B, #3 - 50 from C, #4 - 1 from A, 1 from B, 1 from C
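For concreteness, a small standalone simulation (an editorial sketch, not Kafka code) reproduces both sequences under the proposed rotation rule, assuming three partitions A, B, C with the same number of pre-fetched records:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Illustrative simulation of the proposed poll order; not actual Kafka code.
class PollSequenceSim {
    // Each poll drains up to maxPoll records, starting at the partition
    // indicated by `cursor`, which rotates by one partition per poll.
    static List<String> simulate(int perPartition, int maxPoll) {
        int[] remaining = {perPartition, perPartition, perPartition};
        char[] names = {'A', 'B', 'C'};
        int cursor = 0;
        List<String> polls = new ArrayList<>();
        while (remaining[0] + remaining[1] + remaining[2] > 0) {
            StringJoiner batch = new StringJoiner(", ");
            int taken = 0;
            for (int i = 0; i < 3 && taken < maxPoll; i++) {
                int p = (cursor + i) % 3;
                int take = Math.min(remaining[p], maxPoll - taken);
                if (take > 0) {
                    batch.add(take + " from " + names[p]);
                    remaining[p] -= take;
                    taken += take;
                }
            }
            cursor = (cursor + 1) % 3; // next poll starts one partition later
            polls.add(batch.toString());
        }
        return polls;
    }
}
```

`simulate(100, 50)` produces the six-poll sequence from the first example, and `simulate(51, 50)` produces the four-poll sequence above, with poll #4 returning only the three leftover records.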

@stanislavkozlovski (Contributor)

The idea sounds solid to me. I'm just wondering if this warrants a KIP and a discussion on the mailing list instead of here. Let's wait for committers to take a look at this.

@chienhsingwu (Author)

OK @stanislavkozlovski. I am new to this review process. Do you know whom I should contact to get this reviewed?

@stanislavkozlovski (Contributor)

You've tagged committers, but GitHub notifications are easy to miss. That's why I recommend you use the dev mailing list even if you do not create a KIP. You're typically bound to get responses there.

@chienhsingwu (Author)

Thanks for the tip. Looks like someone did reply to an email I sent to dev.

@lejtmany

This is a problem for us as well. We need to process messages roughly in order across partitions.

Are there any plans to merge this in?

Thanks!

@chienhsingwu (Author)

@lejtmany, unfortunately this did not get approved in the Kafka dev discussions. If you subscribe to dev@kafka.apache.org, you can search for the following email thread:

RE: [EXTERNAL] - Re: [DISCUSS] KIP-387: Fair Message Consumption Across Partitions in KafkaConsumer

@stanislavkozlovski (Contributor) commented May 21, 2019

@lejtmany feel free to restart the discussion (or start another one) on the mailing list regarding this if you think it makes sense.

@chienhsingwu I think we can probably close this PR for now
