Skip to content

KAFKA-6054: Fix upgrade path from Kafka Streams v0.10.0#4768

Merged
mjsax merged 7 commits intoapache:1.0from
mjsax:kafka-6054-fix-upgrade-10
Apr 3, 2018
Merged

KAFKA-6054: Fix upgrade path from Kafka Streams v0.10.0#4768
mjsax merged 7 commits intoapache:1.0from
mjsax:kafka-6054-fix-upgrade-10

Conversation

@mjsax
Copy link
Copy Markdown
Member

@mjsax mjsax commented Mar 23, 2018

No description provided.

@mjsax mjsax added the streams label Mar 23, 2018
@mjsax mjsax requested review from dguy and guozhangwang March 23, 2018 23:48
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 23, 2018

This is the port of #4746 (0.10.1), #4758 (0.10.2), and #4761 (0.11.0) to branch 1.0.

Note: 1.0 was the first branch that did contain upgrade system tests already -- I just added the new once -- we can consider to consolidate them in this PR already. LMKWYT.

\cc @bbejeck @vvcephei

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch from 6ae5b0b to f0531b3 Compare March 25, 2018 19:44
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 25, 2018

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch from f0531b3 to 5acbb1a Compare March 26, 2018 00:30
Github comments
John's review
Guozhang's follow up
@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch 3 times, most recently from 040aed6 to ad66b12 Compare March 26, 2018 01:07
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 26, 2018

Added missing system test (missing combination of versions to be tested): https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1601/

Passed.

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch from ad66b12 to 1b94960 Compare March 26, 2018 18:57
@parametrize(old_version=str(LATEST_0_10_2), new_version=str(LATEST_0_11_0))
@parametrize(old_version=str(LATEST_0_10_2), new_version=str(DEV_VERSION))
@parametrize(old_version=str(LATEST_0_11_0), new_version=str(DEV_VERSION))
def test_simple_upgrade(self, old_version, new_version):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest update the existing test directly in this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? Should we remove test_upgrade_downgrade_streams or merge this with test_upgrade_downgrade_streams ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for your comment above:

Note: 1.0 was the first branch that did contain upgrade system tests already -- I just added the new once -- we can consider to consolidate them in this PR already. LMKWYT.

I'd suggest we consolidate them in this PR already, same for 1.1

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits otherwise LGTM.

Comment thread docs/upgrade.html Outdated
Thus, you need to update and recompile your code. Just swapping the Kafka Streams library jar file will not work and will break your application. </li>
<li> Upgrading from 0.10.0.x to 1.0.2 requires two rolling bounces with config <code>upgrade.from="0.10.0"</code> set for first upgrade phase
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade">KIP-268</a>).
As an alternative, and offline upgrade is also possible.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As an alternative, an offline upgrade is also possible

Comment thread docs/upgrade.html Outdated
Thus, you need to update and recompile your code. Just swapping the Kafka Streams library jar file will not work and will break your application. </li>
<li> Upgrading from 0.10.0.x to 0.10.2.2 requires two rolling bounces with config <code>upgrade.from="0.10.0"</code> set for first upgrade phase
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade">KIP-268</a>).
As an alternative, and offline upgrade is also possible.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as above

Comment thread docs/upgrade.html Outdated
Thus, you need to update and recompile your code. Just swapping the Kafka Streams library jar file will not work and will break your application. </li>
<li> Upgrading from 0.10.0.x to 0.10.1.2 requires two rolling bounces with config <code>upgrade.from="0.10.0"</code> set for first upgrade phase
(cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade">KIP-268</a>).
As an alternative, and offline upgrade is also possible.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same

Starts 3 KafkaStreams instances with <old_version>, and upgrades one-by-one to <new_version>
"""

self.topics = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

topics defined here and in next test, maybe move up to init

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general test run with 3 broker and creates topic with replication factor 3 an other config (and used init for setup) -- however, we need different single-broker setup here. If we cleanup and merge tests (what we should to right away or a follow up PR -- Guozhang suggested to do it in this PR) than I agree that it makes sense to unify the setup. WDYT?

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Mar 26, 2018

Java 7 and Java 9 passed Java 8 failed exceeded rate limit

retest this please

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch from 03aa5e1 to 263fcfd Compare March 29, 2018 23:37
@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Mar 30, 2018

Java 8 build aborted.

retest this please

@vvcephei
Copy link
Copy Markdown
Contributor

JDK 7 failure (https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/12218/) was:

00:10:15.512 org.apache.kafka.clients.NetworkClientTest > testConnectionDelayDisconnected FAILED
00:10:15.512     java.lang.AssertionError: expected:<16032.0> but was:<22917.0>
00:10:15.512         at org.junit.Assert.fail(Assert.java:88)
00:10:15.512         at org.junit.Assert.failNotEquals(Assert.java:834)
00:10:15.512         at org.junit.Assert.assertEquals(Assert.java:553)
00:10:15.512         at org.junit.Assert.assertEquals(Assert.java:683)
00:10:15.512         at org.apache.kafka.clients.NetworkClientTest.testConnectionDelayDisconnected(NetworkClientTest.java:307)

JDK8 failure (https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/12207/consoleFull) was:

00:12:26.606 org.apache.kafka.clients.producer.internals.BufferPoolTest > testBlockTimeout FAILED
00:12:26.606     java.lang.AssertionError: available memory8
00:12:26.606         at org.junit.Assert.fail(Assert.java:88)
00:12:26.606         at org.junit.Assert.assertTrue(Assert.java:41)
00:12:26.606         at org.apache.kafka.clients.producer.internals.BufferPoolTest.testBlockTimeout(BufferPoolTest.java:190)

JDK9 is ongoing.

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR LGTM. Not sure what's up with those failing tests, but they look unrelated.

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch 2 times, most recently from f7d669f to dd6c414 Compare March 30, 2018 17:05
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 30, 2018

Updated this. Triggered system tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1634/

@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch 2 times, most recently from 07bb017 to 27ad139 Compare March 30, 2018 20:30
@mjsax mjsax force-pushed the kafka-6054-fix-upgrade-10 branch from 27ad139 to 35d3e92 Compare March 30, 2018 22:19
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 31, 2018

Consolidated system tests as requested.

1634 failed partially. Fixed. Corresponding tests passed at https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1642/

Retriggered all upgrade tests again: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1643/ (passed)

Call for review.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Mar 31, 2018

Retest this please

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a clarifying question.

Thanks for the consolidation.

#@parametrize(new_version=str(LATEST_0_10_1)) we cannot run this test until Kafka 0.10.1.2 is released
#@parametrize(new_version=str(LATEST_0_10_2)) we cannot run this test until Kafka 0.10.2.2 is released
#@parametrize(new_version=str(LATEST_0_11_0)) we cannot run this test until Kafka 0.11.0.3 is released
@parametrize(new_version=str(DEV_VERSION))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a double-take here... I think I get it now, but to check my understanding...

The plan is to wait until those patch releases, and then do something like :

@matrix(new_version=simple_upgrade_versions_metadata_version_2)

You're effectively slicing 0.10.0 out from the upgrade/downgrade test matrix because:

  • upgrading from 0.10.0 requires a different algorithm (double bounce)
  • downgrading to 0.10.0 is not supported

Right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Upgrading from 0.10.0 requires two rolling bounces including the new config upgrade.from and thus we have this extra test for it -- this new config is only available in un-release code (for older non-dev branches) and thus, we cannot run this test for those versions atm.

We chould add a downgrade test though -- but it would be a new test method, too, as it also requires two rolling bounces but with different order of command compare to this test. It would be something like:

  • prepare all instances to bounce and set config upgrade.from="0.10.0"
  • do first round of rolling bounce -- in this rolling bounce you stay on current version and don't downgrade yet, but only prepare all instances for downgrading (via setting the config)
  • do a second round of rolling bounces downgrading to 0.10.0

I think it's not worth to add a downgrade test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We have no evidence that downgrade is necessary. We don't need to document it or test it unless or until we actually choose to support it.

@mjsax mjsax merged commit 6a24f5b into apache:1.0 Apr 3, 2018
@mjsax mjsax deleted the kafka-6054-fix-upgrade-10 branch June 5, 2018 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants