
KAFKA-10173: Fix suppress changelog binary schema compatibility #8905

Merged

vvcephei merged 23 commits into apache:trunk from vvcephei:kafka-10173-improve-header-comparison
Jun 27, 2020

Conversation

@vvcephei (Contributor) commented Jun 19, 2020

We inadvertently changed the binary schema of the suppress buffer changelog
in 2.4.0 without bumping the schema version number. As a result, it is impossible
to upgrade from 2.3.x to 2.4+ if you are using suppression.

  • Refactor the schema compatibility test to use serialized data from older versions
    as a more foolproof compatibility test.
  • Refactor the upgrade system test to use the smoke test application so that we
    actually exercise a significant portion of the Streams API during upgrade testing
  • Add more recent versions to the upgrade system test matrix
  • Fix the compatibility bug by bumping the schema version to 3

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Comment on lines +292 to +300
if (record.partition() != partition) {
    throw new IllegalStateException(
        String.format(
            "record partition [%d] is being restored by the wrong suppress partition [%d]",
            record.partition(),
            partition
        )
    );
}
Contributor Author

On the side, I realized we can consolidate this check and perform it first, rather than after we've already written bad data into the buffer.

     )
 );
-} else if (V_1_CHANGELOG_HEADERS.lastHeader("v").equals(record.headers().lastHeader("v"))) {
+} else if (Arrays.equals(record.headers().lastHeader("v").value(), V_1_CHANGELOG_HEADER_VALUE)) {
Contributor Author

This is the fix (although it was probably fine before). The implementation of Header.equals is not specified by any contract, so it's safer to perform a direct comparison on the header values. Just as before, I'm comparing byte arrays to avoid deserializing the value.
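In sketch form, the idiom looks like this (using the constants shown just below; a minimal sketch, not the PR's literal code):

    // compare the raw bytes of the version header directly; no reliance on
    // Header.equals, and still no deserialization of the changelog value
    final Header versionHeader = record.headers().lastHeader("v");
    final boolean isV1 = versionHeader != null
        && Arrays.equals(versionHeader.value(), V_1_CHANGELOG_HEADER_VALUE);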

private static final BytesSerializer KEY_SERIALIZER = new BytesSerializer();
private static final ByteArraySerializer VALUE_SERIALIZER = new ByteArraySerializer();
private static final byte[] V_1_CHANGELOG_HEADER_VALUE = {(byte) 1};
private static final RecordHeaders V_1_CHANGELOG_HEADERS =
Member

My IDEA says this variable is never used.

Contributor

I saw it is used on line 342.

Member

My bad. The unused variable is V_1_CHANGELOG_HEADERS rather than V_1_CHANGELOG_HEADER_VALUE.

Contributor Author

Ah, right. My mistake. Thanks for pointing it out.

        );
    }
} else {
    if (record.headers().lastHeader("v") == null) {
Member

nit: We look up the last header many times. Could we reuse the return value?

Contributor Author

Sure!
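For illustration, the consolidated lookup might look like this (a hypothetical distillation of the restoreBatch branching discussed in this thread, with illustrative names; not the PR's exact code):

    import java.util.Arrays;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    final class VersionDispatchSketch {
        private static final byte[] V_1 = {(byte) 1};
        private static final byte[] V_2 = {(byte) 2};
        private static final byte[] V_3 = {(byte) 3};

        // Look up the "v" header once, then branch on its raw bytes.
        static int schemaVersion(final ConsumerRecord<byte[], byte[]> record) {
            final Header versionHeader = record.headers().lastHeader("v");
            if (versionHeader == null) {
                return 0; // legacy record: no version header was ever written
            } else if (Arrays.equals(versionHeader.value(), V_1)) {
                return 1;
            } else if (Arrays.equals(versionHeader.value(), V_2)) {
                return 2; // may actually be v3-format data; see the duck-typing discussion below
            } else if (Arrays.equals(versionHeader.value(), V_3)) {
                return 3;
            } else {
                throw new IllegalArgumentException("unrecognized changelog version header");
            }
        }
    }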

@guozhangwang (Contributor) left a comment

The PR LGTM; please feel free to merge after addressing @chia7712's comments.

@vvcephei changed the title from "KAFKA-10173: Directly use Arrays.equals for version comparison" to "KAFKA-10173: Fix suppress changelog binary schema compatibility" on Jun 25, 2020
@vvcephei (Contributor Author)

I'm still cleaning up this PR. I'll call for reviews when it's ready.

John Roesler added 6 commits June 25, 2020 21:00
I was getting this exception, and somehow, the parallel GC parameter was the culprit:

    java.lang.OutOfMemoryError: Java heap space
        at org.apache.kafka.streams.kstream.internals.FullChangeSerde.decomposeLegacyFormattedArrayIntoChangeArrays(FullChangeSerde.java:82)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferChangelogDeserializationHelper.deserializeV2(TimeOrderedKeyValueBufferChangelogDeserializationHelper.java:90)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferChangelogDeserializationHelper.duckTypeV2(TimeOrderedKeyValueBufferChangelogDeserializationHelper.java:61)
        at org.apache.kafka.streams.state.internals.InMemoryTimeOrderedKeyValueBuffer.restoreBatch(InMemoryTimeOrderedKeyValueBuffer.java:369)
        at org.apache.kafka.streams.state.internals.InMemoryTimeOrderedKeyValueBuffer$$Lambda$284/0x00000001002cb440.restoreBatch(Unknown Source)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferTest.shouldRestoreV3FormatWithV2Header(TimeOrderedKeyValueBufferTest.java:742)
@vvcephei (Contributor Author) left a comment

Whew! Can you take another look, @guozhangwang and @chia7712?

After finding the root cause, I was able to fix several related problems. The diff size is unfortunate, but it's almost all the result of copy/pasting the smoke test into the upgrade-system-tests modules.

Comment thread: build.gradle (outdated)

 defaultMaxHeapSize = "2g"
-defaultJvmArgs = ["-Xss4m", "-XX:+UseParallelGC"]
+defaultJvmArgs = ["-Xss4m"]
Contributor Author

@ijuma, you'll probably want to know about this.

I have no idea why, but one of the new tests in this PR was failing with:

    java.lang.OutOfMemoryError: Java heap space
        at org.apache.kafka.streams.kstream.internals.FullChangeSerde.decomposeLegacyFormattedArrayIntoChangeArrays(FullChangeSerde.java:82)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferChangelogDeserializationHelper.deserializeV2(TimeOrderedKeyValueBufferChangelogDeserializationHelper.java:90)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferChangelogDeserializationHelper.duckTypeV2(TimeOrderedKeyValueBufferChangelogDeserializationHelper.java:61)
        at org.apache.kafka.streams.state.internals.InMemoryTimeOrderedKeyValueBuffer.restoreBatch(InMemoryTimeOrderedKeyValueBuffer.java:369)
        at org.apache.kafka.streams.state.internals.InMemoryTimeOrderedKeyValueBuffer$$Lambda$284/0x00000001002cb440.restoreBatch(Unknown Source)
        at org.apache.kafka.streams.state.internals.TimeOrderedKeyValueBufferTest.shouldRestoreV3FormatWithV2Header(TimeOrderedKeyValueBufferTest.java:742)

I captured a flight recording and a heap dump on exit, but everything looked fine, and the heap was only a few megs at the time of the crash. I noticed that if I just overrode all the JVM args, the test would pass, and through trial and error, I identified this one as the "cause".

I get an OOME every time with -XX:+UseParallelGC, and I've never gotten it without the flag. WDYT about dropping it?

Contributor Author

Aha! I figured it out. There actually was a bug in the test. While duck-typing, the code was trying to allocate an array of 1.8GB. It's funny that disabling this flag made the test pass on Java 11 and 14. Maybe the flag partitions the heap on those versions or something, so the test didn't actually have the full 2GB available. Anyway, I'm about to push a fix and put the flag back.

* We used to serialize a Change into a single byte[]. Now, we don't anymore, but we still keep this logic here
* so that we can produce the legacy format to test that we can still deserialize it.
*/
public static byte[] mergeChangeArraysIntoSingleLegacyFormattedArray(final Change<byte[]> serialChange) {
Contributor Author

Only used in the test now, so I moved it.
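A hedged sketch of how the retained legacy writer backs the compatibility test (assumed shape, including the assumption that decomposeLegacyFormattedArrayIntoChangeArrays returns a Change<byte[]>, which the stack trace above suggests; not the PR's literal test code):

    // Round-trip: write with the frozen legacy writer, read with the current
    // deserializer. If anyone changes the binary schema without bumping the
    // version, this fails.
    final Change<byte[]> change = new Change<>(
        "new".getBytes(StandardCharsets.UTF_8),
        "old".getBytes(StandardCharsets.UTF_8)
    );
    final byte[] legacyBytes = mergeChangeArraysIntoSingleLegacyFormattedArray(change);
    final Change<byte[]> decoded =
        FullChangeSerde.decomposeLegacyFormattedArrayIntoChangeArrays(legacyBytes);
    assertThat(decoded.newValue, is(change.newValue)); // Hamcrest's equalTo compares array contents
    assertThat(decoded.oldValue, is(change.oldValue));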

 if (oldValue == null) {
     buffer.putInt(NULL_VALUE_SENTINEL);
-} else if (priorValue == oldValue) {
+} else if (Arrays.equals(priorValue, oldValue)) {
Contributor Author

This was correct before, since we check equality and enforce identity in the constructor, but Arrays.equals is extremely cheap when the arrays are identical, so explicitly doing an identity check instead of an equality check was just a micro-optimization.
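Concretely, Arrays.equals begins with a reference-equality check, so the identical-arrays case stays essentially free:

    final byte[] oldValue = {1, 2, 3};
    final byte[] priorValue = oldValue; // the constructor enforces identity for equal values

    // old check: pure identity
    assert priorValue == oldValue;
    // new check: Arrays.equals(a, a2) returns true immediately when a == a2
    assert java.util.Arrays.equals(priorValue, oldValue);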

Comment on lines +65 to +69
private static final byte[] V_1_CHANGELOG_HEADER_VALUE = {(byte) 1};
private static final byte[] V_2_CHANGELOG_HEADER_VALUE = {(byte) 2};
private static final byte[] V_3_CHANGELOG_HEADER_VALUE = {(byte) 3};
static final RecordHeaders CHANGELOG_HEADERS =
    new RecordHeaders(new Header[] {new RecordHeader("v", V_3_CHANGELOG_HEADER_VALUE)});
Contributor Author

We don't need to store the whole RecordHeaders for the old versions, just the actual version flag.

// in this case, the changelog value is a serialized BufferValue
} else if (Arrays.equals(versionHeader.value(), V_2_CHANGELOG_HEADER_VALUE)) {
    final DeserializationResult deserializationResult = duckTypeV2(record, key);
Contributor Author

See the comment on this method for why we need to duck-type version 2. I pulled these deserializations into a helper class because all the extra branches pushed our cyclomatic complexity over the limit.

But I kept the first two branches here because they aren't pure functions. They perform a lookup in the buffer itself as part of converting the old format.

Contributor

Could you clarify which comment you are referring to? I did not see any comments on the "restoreBatch" method.

Contributor Author

Sorry, the comments in duckTypeV2.

Basically, because we released three versions that wrote data in the "v3" format but with the "v2" flag, when we see the v2 flag the data might be in either v2 or v3 format. The only way to tell is to try to deserialize it in v2 format and, if we get an exception, to try the v3 format.
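A minimal sketch of that fallback (deserializeV3 and the caught exception type are assumptions for illustration, not the PR's exact code):

    // A v2-flagged record might really be v3 format, so parse optimistically
    // and fall back on failure.
    static DeserializationResult duckTypeV2(final ConsumerRecord<byte[], byte[]> record,
                                            final Bytes key) {
        try {
            return deserializeV2(record, key); // a genuine v2 record parses cleanly
        } catch (final RuntimeException probablyV3) {
            // the affected releases wrote v3-format data under the v2 flag,
            // so a failed v2 parse most likely means the record is v3 format
            return deserializeV3(record, key);
        }
    }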

Comment thread: tests/kafkatest/services/streams.py (outdated)
Comment on lines +330 to +335
"replication.factor": self.REPLICATION_FACTOR,
"num.standby.replicas": 2,
"buffered.records.per.partition": 100,
"commit.interval.ms": 1000,
"auto.offset.reset": "earliest",
"acks": "all"}
Contributor Author

Moved from the Java code so that all the configs can be defined together.

# can be replaced with metadata_2_versions
backward_compatible_metadata_2_versions = [str(LATEST_0_10_2), str(LATEST_0_11_0), str(LATEST_1_0), str(LATEST_1_1)]
metadata_3_or_higher_versions = [str(LATEST_2_0), str(LATEST_2_1), str(LATEST_2_2), str(LATEST_2_3), str(LATEST_2_4), str(LATEST_2_5), str(DEV_VERSION)]
smoke_test_versions = [str(LATEST_2_2), str(LATEST_2_3), str(LATEST_2_4), str(LATEST_2_5)]
Contributor Author

See KAFKA-10203 for why I couldn't go past 2.2.


-@matrix(from_version=metadata_2_versions, to_version=metadata_2_versions)
+@matrix(from_version=smoke_test_versions, to_version=dev_version)
 def test_simple_upgrade_downgrade(self, from_version, to_version):
Contributor Author

We were previously not testing 2.0+ at all. After rewriting this as a smoke test, it only applies to 2.2+. I also figured it makes more sense just to test upgrades to the current branch, rather than testing cross-upgrades between every pair of versions.

Contributor

+1, I think this is a great find.

 self.zk.start()

-self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk, topics=self.topics)
+self.kafka = KafkaService(self.test_context, num_nodes=1, zk=self.zk, topics={
Contributor Author

A lot of these changes are part of adapting the test to the smoke test app.

@@ -349,56 +370,42 @@ def get_version_string(self, version):
def start_all_nodes_with(self, version):
Contributor Author

I refactored this method to start all the nodes concurrently, rather than one at a time. We still do a rolling upgrade, but there's no need to do a rolling startup.

import static org.apache.kafka.streams.tests.SmokeTestDriver.generate;
import static org.apache.kafka.streams.tests.SmokeTestDriver.generatePerpetually;

public class StreamsSmokeTest {
Contributor

I'm assuming the 22..25 client / driver code are all copy-pastes here, so I skipped reviewing them. LMK if they aren't.

Contributor Author

That's correct.


return deserializationResult;
}

private static DeserializationResult deserializeV2(final ConsumerRecord<byte[], byte[]> record,
Contributor

Some docs, either here or directly inside InMemoryTimeOrderedKeyValueBuffer.java, explaining the format differences would help a lot. You can see some examples, like object GroupMetadataManager.

Contributor Author

Sure thing!

changelogTopic,
key,
null,
null,
Contributor

I'm just thinking: maybe we should add headers to tombstones too, in case we change the semantics of tombstones in the future?

Contributor Author

I remember considering this when I added the first version header. The reason I didn't is that, since the initial version didn't have any headers, even if we change the tombstone format in the future, we'll always have to interpret a "no header, null value" record as being a "legacy format" tombstone, just like we have to interpret a "no header, non-null value" as being a "legacy format" data record.

You can think of "no header" as indicating "version 0". Since we haven't changed the format of tombstones yet, there's no value in adding a "version 1" flag. We should just wait until we do need to make such a change (if ever).

Contributor

SG

@guozhangwang (Contributor)

test this

@vvcephei (Contributor Author)

Hmm. Still saw the OOME in https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3134/

@vvcephei (Contributor Author)

Retest this please

@vvcephei (Contributor Author)

Ah, that heap space thing was legit. Fix coming...

@vvcephei (Contributor Author)

Hey @guozhangwang, you might want to take a look at that last fix. The duck-typing code was sometimes producing an OOME when it would interpret a random integer out of the buffer as a "size" and blindly allocate an array of that size.

I added a Util (with tests) that has a guard to prevent this.
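A hedged sketch of that kind of guard (a hypothetical helper, not necessarily the exact utility added in the PR):

    import java.nio.ByteBuffer;

    final class SafeReads {
        // Read a size-prefixed, possibly-null byte array, but bound the allocation
        // by the bytes actually remaining in the buffer: a garbage "size" read while
        // duck-typing a record in the wrong format then fails fast instead of
        // triggering a giant allocation and an OutOfMemoryError.
        static byte[] getNullableSizePrefixedArray(final ByteBuffer buffer) {
            final int size = buffer.getInt();
            if (size == -1) {
                return null; // null-value sentinel
            }
            if (size < 0 || size > buffer.remaining()) {
                throw new IllegalArgumentException(
                    "attempted to read an array of size " + size +
                        ", but only " + buffer.remaining() + " bytes remain"
                );
            }
            final byte[] array = new byte[size];
            buffer.get(array);
            return array;
        }
    }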

@vvcephei (Contributor Author)

I will follow up shortly to extract the system tests to a separate PR, since we're having trouble running the tests at all right now, and we wouldn't know if they are even more broken.

@vvcephei (Contributor Author)

OK, @guozhangwang, this is my "final" iteration. I pulled the system tests out, and I'll follow up with another PR later. This PR should be sufficient for the basic purpose, thanks to the new "binary" compatibility unit tests.

@guozhangwang (Contributor) left a comment

LGTM. We can merge after green build.

Let's trigger system test builds on the follow-up PR

@vvcephei (Contributor Author)

Failures were unrelated:

kafka.api.PlaintextConsumerTest > testLowMaxFetchSizeForRequestAndPartition FAILED
    org.scalatest.exceptions.TestFailedException: Timed out before consuming expected 2700 records. The number consumed was 1077.
        at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
        at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
        at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
        at org.scalatest.Assertions.fail(Assertions.scala:1091)
        at org.scalatest.Assertions.fail$(Assertions.scala:1087)
        at org.scalatest.Assertions$.fail(Assertions.scala:1389)
        at kafka.api.AbstractConsumerTest.consumeRecords(AbstractConsumerTest.scala:158)
        at kafka.api.PlaintextConsumerTest.testLowMaxFetchSizeForRequestAndPartition(PlaintextConsumerTest.scala:804)
org.apache.kafka.connect.mirror.MirrorConnectorsIntegrationTest > testReplication FAILED
    java.lang.RuntimeException: Could not find enough records. found 0, expected 100
        at org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.consume(EmbeddedKafkaCluster.java:435)
        at org.apache.kafka.connect.mirror.MirrorConnectorsIntegrationTest.testReplication(MirrorConnectorsIntegrationTest.java:217)
kafka.api.PlaintextConsumerTest.testLowMaxFetchSizeForRequestAndPartition
org.apache.kafka.streams.integration.HighAvailabilityTaskAssignorIntegrationTest.shouldScaleOutWithWarmupTasksAndPersistentStores
org.apache.kafka.connect.integration.BlockingConnectorTest.testBlockInConnectorStop

@vvcephei merged commit 8319389 into apache:trunk on Jun 27, 2020
@vvcephei deleted the kafka-10173-improve-header-comparison branch on June 27, 2020 02:41
vvcephei added a commit that referenced this pull request Jun 27, 2020
We inadvertently changed the binary schema of the suppress buffer changelog
in 2.4.0 without bumping the schema version number. As a result, it is impossible
to upgrade from 2.3.x to 2.4+ if you are using suppression.

* Refactor the schema compatibility test to use serialized data from older versions
as a more foolproof compatibility test.
* Refactor the upgrade system test to use the smoke test application so that we
actually exercise a significant portion of the Streams API during upgrade testing
* Add more recent versions to the upgrade system test matrix
* Fix the compatibility bug by bumping the schema version to 3

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
@vvcephei (Contributor Author)

Backported to 2.6, 2.5, and 2.4. I ran the streams and client tests each time, as well as systemTestLibs.

Kvicii pushed a commit to Kvicii/kafka that referenced this pull request Jun 27, 2020
* 'trunk' of github.com:apache/kafka:
  KAFKA-10180: Fix security_config caching in system tests (apache#8917)
  KAFKA-10173: Fix suppress changelog binary schema compatibility (apache#8905)
  KAFKA-10166: always write checkpoint before closing an (initialized) task (apache#8926)
  MINOR: Rename SslTransportLayer.State."NOT_INITALIZED" enum value to "NOT_INITIALIZED"
  MINOR: Update Scala to 2.13.3 (apache#8931)
  KAFKA-9076: support consumer sync across clusters in MM 2.0 (apache#7577)
  MINOR: Remove Diamond and code code Alignment (apache#8107)
  KAFKA-10198: guard against recycling dirty state (apache#8924)