KAFKA-9053: AssignmentInfo#encode hardcodes the LATEST_SUPPORTED_VERSION #7537

Merged
guozhangwang merged 3 commits into apache:trunk from ableegoldman:9053-VP-hotfix on Oct 17, 2019

Conversation

@ableegoldman (Member) commented Oct 16, 2019

Also put in some additional logging that makes sense to add and that proved helpful in debugging this particular issue.

Unit tests verifying the encoded supported version were added.

This should get cherry-picked back to 2.1

@ableegoldman ableegoldman changed the title KAFKA-9053: AssignmentInfo#encode hardcodes the LATEST_SUPPORTED_VERSION Oct 16, 2019
@ableegoldman (Member Author)

cc/ @guozhangwang @mjsax @bbejeck


    buf.putInt(usedVersion); // used version
-   buf.putInt(LATEST_SUPPORTED_VERSION); // supported version
+   buf.putInt(latestSupportedVersion); // supported version
Member Author:
This is actually not necessary for SubscriptionInfo at this point in time, but may prevent a similar bug in the future.
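For illustration, here is a minimal self-contained sketch of the failure mode this diff guards against. `VersionedInfo` and its fields are hypothetical stand-ins, not the real `SubscriptionInfo`/`AssignmentInfo` classes; the point is that `encode()` must write the instance's `latestSupportedVersion` field, not the static `LATEST_SUPPORTED_VERSION` constant, so that metadata decoded from an older member re-encodes with that member's supported version intact:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for SubscriptionInfo/AssignmentInfo.
public class VersionedInfo {
    public static final int LATEST_SUPPORTED_VERSION = 5;

    final int usedVersion;
    final int latestSupportedVersion;

    VersionedInfo(final int usedVersion, final int latestSupportedVersion) {
        this.usedVersion = usedVersion;
        this.latestSupportedVersion = latestSupportedVersion;
    }

    public ByteBuffer encode() {
        final ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(usedVersion);            // used version
        buf.putInt(latestSupportedVersion); // supported version: the field, NOT the constant
        buf.rewind();
        return buf;
    }

    public static VersionedInfo decode(final ByteBuffer buf) {
        return new VersionedInfo(buf.getInt(), buf.getInt());
    }

    public static void main(final String[] args) {
        // An older member that supports only up to version 4 sent this metadata.
        final VersionedInfo fromPeer = decode(new VersionedInfo(4, 4).encode());
        // Re-encoding must not silently upgrade the supported version to 5,
        // which is what the hardcoded constant did in the buggy version.
        final ByteBuffer reencoded = fromPeer.encode();
        reencoded.getInt(); // skip used version
        System.out.println(reencoded.getInt()); // prints 4
    }
}
```

With the constant hardcoded, the decode-then-re-encode round trip would report version 5 support on behalf of a member that only supports 4, defeating version probing.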

Contributor:
Sounds good. Hopefully we do not need to decode and encode a subscription info object within the same instance :P

Member Author:
Haha, I guess you never know

}

+   @Test
+   public void shouldEncodeAndDecodeVersion5() {
Member Author:
Just a missing test I noticed

@mjsax mjsax added the streams label Oct 16, 2019
@bbejeck (Member) left a comment
Thanks for the fix @ableegoldman LGTM

@bbejeck (Member) commented Oct 16, 2019

"Sent a version {} subscription and got version {} assignment back (successful version probing). "
+
"Downgrade subscription metadata to commonly supported version and trigger new rebalance.",
"Downgrade subscription metadata to commonly supported version {} and trigger new rebalance.",
Contributor:
nitpick: let's just use "downgrade" rather than "downgrading" in the log4j message?

Member Author:
Ack, will replace all with downgrade


@guozhangwang (Contributor)
LGTM, will merge after jenkins build.

@ableegoldman (Member Author)
@guozhangwang Java 11/2.13 passed; Java 8 failed with unrelated flaky tests:
kafka.api.SslAdminClientIntegrationTest.testCreateTopicsResponseMetadataAndConfig
kafka.network.DynamicConnectionQuotaTest.testDynamicConnectionQuota

Java 11/2.12 failed with an unknown/unnamed core integration test:
Execution failed for task ':core:integrationTest'

@guozhangwang guozhangwang merged commit 78f5da9 into apache:trunk Oct 17, 2019
guozhangwang pushed a commit that referenced this pull request Oct 17, 2019
…ION (#7537)

Also put in some additional logging that makes sense to add and that proved helpful in debugging this particular issue.

Unit tests verifying the encoded supported version were added.

This should get cherry-picked back to 2.1

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang (Contributor)

@ableegoldman I've merged to trunk and 2.4, but it took me quite some time to resolve conflicts for 2.3 and older versions, and I'm not very confident my resolution wouldn't introduce any regressions. Could you create a separate PR for 2.3-2.1 (one PR should be fine since there aren't too many changes)?

bbejeck pushed a commit that referenced this pull request Oct 17, 2019
Same as #7537 but targeted at 2.3 for cherry-pick
Reviewers: Bill Bejeck <bbejeck@gmail.com>
sdandu-gh added a commit to confluentinc/kafka that referenced this pull request Nov 9, 2019
* KAFKA-8649: send latest commonly supported version in assignment (apache#7425)

Reviewers: Matthias J. Sax <matthias@confluent.io>

* KAFKA-8262, KAFKA-8263: Fix flaky test `MetricsIntegrationTest` (apache#6922) (apache#7457)

- Timeout occurred due to initial slow rebalancing.
- Added code to wait until `KafkaStreams` instance is in state RUNNING to check registration of metrics and in state NOT_RUNNING to check deregistration of metrics.
- I removed all other wait conditions, because they are not needed if `KafkaStreams` instance is in the right state.

Reviewers: Guozhang Wang <wangguoz@gmail.com>

* MINOR: clarify wording around fault-tolerant state stores (apache#7510)

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

* MINOR: Port changes from trunk for test stability to 2.3 branch (apache#7424)

Reviewer: Matthias J. Sax <matthias@confluent.io>

* KAFKA-9014: Fix AssertionError when SourceTask.poll returns an empty list (apache#7491)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>

* KAFKA-8945/KAFKA-8947: Fix bugs in Connect REST extension API (apache#7392)

Fix bug in Connect REST extension API caused by invalid constructor parameter validation, and update integration test to play nicely with Jenkins

Fix instantiation of TaskState objects by Connect framework.

Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <mageshn@confluent.io>, Randall Hauch <rhauch@gmail.com>

* KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins (apache#7315)

Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>

* MINOR: log reason for fatal error in locking state dir (apache#7534)

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>

* KAFKA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (apache#7511)

The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes there are ongoing fetch requests when in fact there are none. This can happen when the heartbeat thread handles a disconnection event just after the fetcher thread submits a request, leaving the Fetcher convinced it has ongoing requests to the disconnected node. The root cause was a thread-safety issue in the Fetcher: the node was added to nodesWithPendingFetchRequests after the completion listener had already been invoked, so the pending node was never removed again.

This PR addresses that thread-safety issue by ensuring that the pending node is added to nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added.

Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>

* Fix bug in AssignmentInfo#encode and add additional logging (apache#7545)

Same as apache#7537 but targeted at 2.3 for cherry-pick
Reviewers: Bill Bejeck <bbejeck@gmail.com>

* Bump version to 2.3.1

* Update versions to 2.3.2-SNAPSHOT

* HOTFIX: fix bug in VP test where it greps for the wrong log message (apache#7643)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
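The Fetcher ordering fix described in the commit list above can be sketched roughly like this. `FetcherOrderingSketch` and `ImmediateFuture` are hypothetical stand-ins for the real consumer internals; the point is only the ordering of the two operations:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the Fetcher's pending-request bookkeeping.
public class FetcherOrderingSketch {
    final Set<Integer> nodesWithPendingFetchRequests = ConcurrentHashMap.newKeySet();

    // Stand-in for a request future whose listener may fire synchronously in
    // the worst case (e.g. the heartbeat thread already handled a disconnect).
    static final class ImmediateFuture {
        void addListener(final Runnable onComplete) {
            onComplete.run();
        }
    }

    void sendFetch(final int nodeId) {
        final ImmediateFuture future = new ImmediateFuture();
        // The fix: record the pending node first...
        nodesWithPendingFetchRequests.add(nodeId);
        // ...then attach the listener. Even if it fires immediately, the
        // removal sees the node and clears it. With the old order (add after
        // attaching), a synchronously-fired listener removed nothing and the
        // node was then added, lingering as a phantom pending request.
        future.addListener(() -> nodesWithPendingFetchRequests.remove(nodeId));
    }

    public static void main(final String[] args) {
        final FetcherOrderingSketch fetcher = new FetcherOrderingSketch();
        fetcher.sendFetch(1);
        System.out.println(fetcher.nodesWithPendingFetchRequests.isEmpty()); // prints true
    }
}
```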
ableegoldman added a commit to ableegoldman/kafka that referenced this pull request Nov 13, 2019