KAFKA-15093: Add 3.4 and 3.5 Streams upgrade system tests #13860
mimaison wants to merge 2 commits into apache:trunk
mjsax
left a comment
Thanks. It's really just a c&p (a few minor updates required).
We should trigger a system test run before we merge to see if it's working as expected.
| @Override | ||
| public void init(final ProcessorContext<Void, Void> context) { | ||
| super.init(context); | ||
| System.out.println("[3.3] initializing processor: topic=" + topic + " taskId=" + context.taskId()); |
| System.out.println("[3.3] initializing processor: topic=" + topic + " taskId=" + context.taskId()); | |
| System.out.println("[3.4] initializing processor: topic=" + topic + " taskId=" + context.taskId()); |
|
|
||
| final Properties streamsProperties = Utils.loadProps(propFileName); | ||
|
|
||
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.3)"); |
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.3)"); | |
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.4)"); |
|
|
||
| @Override | ||
| public void init(final ProcessorContext<KOut, VOut> context) { | ||
| System.out.println("[3.3] initializing processor: topic=" + topic + "taskId=" + context.taskId()); |
| System.out.println("[3.3] initializing processor: topic=" + topic + "taskId=" + context.taskId()); | |
| System.out.println("[3.4] initializing processor: topic=" + topic + "taskId=" + context.taskId()); |
| @Override | ||
| public void init(final ProcessorContext<Void, Void> context) { | ||
| super.init(context); | ||
| System.out.println("[3.3] initializing processor: topic=" + topic + " taskId=" + context.taskId()); |
| System.out.println("[3.3] initializing processor: topic=" + topic + " taskId=" + context.taskId()); | |
| System.out.println("[3.5] initializing processor: topic=" + topic + " taskId=" + context.taskId()); |
|
|
||
| final Properties streamsProperties = Utils.loadProps(propFileName); | ||
|
|
||
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.3)"); |
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.3)"); | |
| System.out.println("StreamsTest instance started (StreamsUpgradeTest v3.5)"); |
|
|
||
| @Override | ||
| public void init(final ProcessorContext<KOut, VOut> context) { | ||
| System.out.println("[3.3] initializing processor: topic=" + topic + "taskId=" + context.taskId()); |
| System.out.println("[3.3] initializing processor: topic=" + topic + "taskId=" + context.taskId()); | |
| System.out.println("[3.5] initializing processor: topic=" + topic + "taskId=" + context.taskId()); |
|
@mjsax I also noticed https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/AssignorConfiguration.java#L180 |
|
Good catch, @mimaison . It looks like we could keep adding "upgrade from" versions on each release, but unless there's a compatibility issue, we don't need to. If you look at |
|
Thanks for confirming @vvcephei -- as a matter of fact, we released KIP-904 with 3.5, so it seems to be an issue. Let me do a PR to address this and also add a test to get a guard in place. |
|
Are there compatibility issues with 3.4 or 3.5? |
|
With KIP-904 we changed the serialization format inside an internal repartition topic (cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-904%3A+Kafka+Streams+-+Guarantee+subtractor+is+called+before+adder+if+key+has+not+changed). To upgrade KS correctly, you need to do two rolling bounces (first one with Thus, it seems, while the upgrade path for KIP-904 itself works, the We did not get to finish the system test for this upgrade (it's a WIP PR: #13656), which I assume would have caught this issue. Guess I should have insisted on getting the system test in place before the release (that is totally my fault). So basically, if one uses cc @fqaiser94 @cadonna |
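For readers less familiar with the two-bounce procedure mentioned above, here is a minimal sketch of how the configuration could differ between the two rolling bounces. It assumes the setting involved is the Kafka Streams `upgrade.from` config (the "upgrade from" versions discussed earlier in this thread); the class name, helper methods, the `application.id` value, and the version string `"3.4"` are hypothetical and only for illustration. It uses plain `java.util.Properties` so it runs without Kafka on the classpath.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: illustrates the config difference
// between the first and second rolling bounce of a Streams upgrade.
public class TwoBounceUpgradeSketch {

    // Assumption: this is the config key referred to as "upgrade from" in the thread.
    public static final String UPGRADE_FROM = "upgrade.from";

    // First rolling bounce: new binaries, but tell them which version they are upgrading from.
    public static Properties firstBounce(final Properties base, final String oldVersion) {
        final Properties props = new Properties();
        props.putAll(base);
        props.setProperty(UPGRADE_FROM, oldVersion);
        return props;
    }

    // Second rolling bounce: same config with the upgrade.from setting removed.
    public static Properties secondBounce(final Properties firstBounceProps) {
        final Properties props = new Properties();
        props.putAll(firstBounceProps);
        props.remove(UPGRADE_FROM);
        return props;
    }

    public static void main(final String[] args) {
        final Properties base = new Properties();
        base.setProperty("application.id", "example-app"); // illustrative value

        final Properties first = firstBounce(base, "3.4");
        final Properties second = secondBounce(first);

        System.out.println("first bounce:  " + first);
        System.out.println("second bounce: " + second);
    }
}
```

The point of the sketch is only that the first bounce carries an explicit old-version marker and the second bounce drops it; whether a given version string is accepted depends on the Streams release being deployed.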
|
Did a PR: #14103 |
|
I just uploaded the artifact for 3.5.1 to the S3 bucket we use for system tests -- can we add 3.5.1 in this PR right away (or even wait for 3.5.2 -- we might need a hotfix for this upgrade issue...?) |
I have added system tests to use 3.5.1 in #14069 |
|
System tests look better with #14103. In addition to a few flaky tests, I got 3 failures: |
f07cf9d to 16c72d3
|
This depends on #14103 so we need to get that merged first |
|
@mimaison Merged the hotfix PR. |
16c72d3 to a770085
|
I rebased on top of trunk and reran the system tests in our CI. Many of the Streams upgrade system tests are flaky, but I've seen all of them pass at least once apart from From my limited Streams understanding, I don't see any obvious issues in the logs. @mjsax can you take a look and see if it's just really flaky or if there's a real issue? Is it passing in your CI? I've uploaded the logs from the last run to my Apache home directory: https://home.apache.org/~mimaison/streams-upgrade-failure.zip |
|
We also observed failing tests -- it's on our TODO list but might take some time until we get to it :( -- I'll keep you posted. |
|
@mjsax These missing tests are starting to pile up. We're already missing them for 3.4 and 3.5, and 3.6 will be out shortly. If we were to break compatibility in Streams, how would we notice it if we don't have these tests? Actually, do we know whether the failures are caused by the tests or by issues in Streams? Figuring this out seems relatively high priority to me. |
|
I identified one fundamental problem in the 3.3 release -- a bug was introduced that fundamentally breaks KS system tests using that version. This PR (fa03244) changes It was fixed via 0de0374 but it's only included in the 3.4.1 release -- while it was backported to |
|
Another issue just introduced in |
|
Thanks @mjsax for following up. So what's the path forward? Are we able to merge this or do we need to release new bugfix versions for 3.3 and 3.4 (and 3.5?)? |
|
I just opened a PR to fix |
I could not reproduce the issue any longer and did more digging -- I think I hit this issue previously because we did not bump the |
a770085 to ff352ce
|
I rebased this PR on trunk and reran the Streams system tests and still got a bunch of failures: Tests can be a bit flaky in our CI but many of these seem to fail consistently. @mjsax did you get a clean run or do you see these tests failing in your CI as well? |
#14539 is not merged yet... So that's expected I guess. -- We actually also just found another bug that was fixed in the meantime: #14555 Let's see if we can push #14539 over the finish line first... Will ping here again after that's done; sorry that it takes so long, but we are on it. Promise. |
|
Summary about your test results:
|
|
Updated #14539 and re-triggered a system test run. Let's hope for the best :) |
|
Merged the other PR. Btw: should we split this PR into two, one to add 3.4 and one to add 3.5? So it's easier to cherry-pick? (Adding 3.5 should only go into |
|
@mjsax Good call, I've split this PR in 2:
Closing this PR |