
KAFKA-8653; Default rebalance timeout to session timeout for JoinGroup v0 #7072

Merged
ijuma merged 1 commit into apache:trunk from hachikuji:KAFKA-8653 on Jul 11, 2019

Contributor

@hachikuji hachikuji commented Jul 11, 2019

The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3, we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal to the session timeout. We lost this logic when we converted the API to use the generated schema definition (#6419) which uses the default value of -1. The impact of this is that the group rebalance timeout becomes 0, so rebalances finish immediately after we enter the PrepareRebalance state and kick out all old members. This causes consumer groups to enter an endless rebalance loop. This patch restores the old behavior.
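
For illustration only, the fallback described above might be sketched as follows. This is a minimal standalone sketch, not the broker's actual code: the class name `JoinGroupTimeouts`, the method `effectiveRebalanceTimeoutMs`, and its parameters are hypothetical names chosen for this example. It assumes only what the description states, namely that version 0 requests carry no rebalance timeout and that the generated schema fills in -1 for the missing field.

```java
// Hypothetical sketch of the v0 fallback; names are illustrative and are
// not taken from the Kafka code base.
final class JoinGroupTimeouts {

    /**
     * Picks the rebalance timeout to use for a JoinGroup request.
     *
     * @param apiVersion         JoinGroup request version sent by the client
     * @param sessionTimeoutMs   session timeout from the request
     * @param rebalanceTimeoutMs rebalance timeout from the request; for v0 this
     *                           is only the schema default (-1 per the description)
     */
    static int effectiveRebalanceTimeoutMs(short apiVersion,
                                           int sessionTimeoutMs,
                                           int rebalanceTimeoutMs) {
        // Version 0 predates the rebalance timeout field, so the parsed value
        // is meaningless; fall back to the session timeout instead.
        if (apiVersion == 0) {
            return sessionTimeoutMs;
        }
        // Version 1 and later carry an explicit rebalance timeout.
        return rebalanceTimeoutMs;
    }

    public static void main(String[] args) {
        // v0 client: the -1 default is ignored in favor of the session timeout.
        System.out.println(effectiveRebalanceTimeoutMs((short) 0, 10_000, -1));
        // v1 client: the explicit rebalance timeout is used as sent.
        System.out.println(effectiveRebalanceTimeoutMs((short) 1, 10_000, 30_000));
    }
}
```

Keying the fallback on the request version rather than on the -1 sentinel keeps the behavior tied to the protocol definition, which is one plausible way to read "restores the old behavior"; the actual patch may be structured differently.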

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)


@abbccdda abbccdda left a comment


LGTM

Member

@ijuma ijuma left a comment


Thanks for the PR, LGTM.

Member

ijuma commented Jul 11, 2019

Reopening to see if it triggers a PR build.

@ijuma ijuma closed this Jul 11, 2019
@ijuma ijuma reopened this Jul 11, 2019
Member

ijuma commented Jul 11, 2019

No test failures although a kill command stopped the job before it completed. I'll go ahead and merge to trunk and 2.3.

@ijuma ijuma merged commit ebb80f5 into apache:trunk Jul 11, 2019
ijuma pushed a commit that referenced this pull request Jul 11, 2019
…p v0 (#7072)

Reviewers: Ismael Juma <ismael@juma.me.uk>
ijuma pushed a commit to confluentinc/kafka that referenced this pull request Jul 11, 2019
…p v0 (apache#7072)

Reviewers: Ismael Juma <ismael@juma.me.uk>
ijuma added a commit to confluentinc/kafka that referenced this pull request Jul 20, 2019
* apache-github/2.3:
  MINOR: Update documentation for enabling optimizations (apache#7099)
  MINOR: Remove stale streams producer retry default docs. (apache#6844)
  KAFKA-8635; Skip client poll in Sender loop when no request is sent (apache#7085)
  KAFKA-8615: Change to track partition time breaks TimestampExtractor (apache#7054)
  KAFKA-8670; Fix exception for kafka-topics.sh --describe without --topic mentioned (apache#7094)
  KAFKA-8602: Separate PR for 2.3 branch (apache#7092)
  KAFKA-8530; Check for topic authorization errors in OffsetFetch response (apache#6928)
  KAFKA-8662; Fix producer metadata error handling and consumer manual assignment (apache#7086)
  KAFKA-8637: WriteBatch objects leak off-heap memory (apache#7050)
  KAFKA-8620: fix NPE due to race condition during shutdown while rebalancing (apache#7021)
  HOT FIX: close RocksDB objects in correct order (apache#7076)
  KAFKA-7157: Fix handling of nulls in TimestampConverter (apache#7070)
  KAFKA-6605: Fix NPE in Flatten when optional Struct is null (apache#5705)
  Fixes apache#8198 KStreams testing docs use non-existent method pipe (apache#6678)
  KAFKA-5998: fix checkpointableOffsets handling (apache#7030)
  KAFKA-8653; Default rebalance timeout to session timeout for JoinGroup v0 (apache#7072)
  KAFKA-8591; WorkerConfigTransformer NPE on connector configuration reloading (apache#6991)
  MINOR: add upgrade text (apache#7013)
  Bump version to 2.3.1-SNAPSHOT
xiowu0 pushed a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
…ession timeout for JoinGroup v0 (apache#7072)

TICKET = KAFKA-8653
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [b725b3c]
ORIGINAL_DESCRIPTION =

Reviewers: Ismael Juma <ismael@juma.me.uk>
(cherry picked from commit b725b3c)