KAFKA-7652: Part III; Put to underlying before Flush #6191
guozhangwang merged 23 commits into apache:trunk
Conversation
        entry.entry().context().timestamp());
} else {
    if (flushListener != null) {
        final byte[] newValueBytes = entry.newValue();
This is an optimization that I did for all three caching stores:
1. Get the new bytes from the cache, and read the old bytes from the underlying store.
2. If either the old or the new bytes are non-null, go to 3) below; otherwise skip, so that we do not need to deserialize at all.
3. Deserialize to objects, and apply the flush listener to forward downstream.
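A minimal, hypothetical sketch of this flush path, with plain maps standing in for the cache and the underlying store, and `deserialize` standing in for `serdes.valueFrom` (none of these names are the actual Kafka Streams classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the caching layer's flush path; names are illustrative
// stand-ins, not actual Kafka Streams classes.
class FlushPathSketch {
    final Map<String, byte[]> underlying = new HashMap<>();
    final List<String> forwarded = new ArrayList<>();
    int deserializations = 0;

    String deserialize(byte[] bytes) {  // stands in for serdes.valueFrom
        deserializations++;
        return new String(bytes);
    }

    void flushEntry(String key, byte[] newBytes, boolean sendOldValues) {
        // 1) new bytes come from the cache; old bytes from the underlying store
        final byte[] oldBytes = underlying.get(key);
        // 2) if both are null, the key exists in neither cache nor store, and
        //    nothing was ever forwarded: skip the write, the flush, and the deser
        if (newBytes == null && oldBytes == null) {
            return;
        }
        underlying.put(key, newBytes);
        // 3) only now pay the deserialization cost and call the flush listener
        final String newValue = newBytes != null ? deserialize(newBytes) : null;
        final String oldValue = sendOldValues && oldBytes != null ? deserialize(oldBytes) : null;
        forwarded.add(key + ": " + oldValue + " -> " + newValue);
    }
}
```

Note that a `null -> null` flush never deserializes anything and never touches the underlying store.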
final AGG newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
final AGG oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
// we need to get the old values if needed, and then put to store, and then flush
bytesStore.put(bytesKey, entry.newValue());
This is another optimization I did for the window / session stores: previously we deserialized the bytes to Windowed<K>, and then serialized it back to Windowed<Bytes> for the underlying put, which is wasteful.
Now what I do is bytes -> Windowed<Bytes> -> Windowed<K>, where the first step is always executed, while the second is executed only if the new / old bytes are non-null. Note that the second step actually only does Bytes -> K, and we just re-wrap with the same window.
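A sketch of the bytes -> Windowed<Bytes> -> Windowed<K> idea, using toy types in place of the real `Windowed` / `SessionKeySchema` classes (the binary key layout here is invented purely for illustration):

```java
import java.util.Arrays;

// Toy stand-ins for the real Windowed<K> / SessionKeySchema machinery.
class WindowedKeySketch {
    static int keyDeserializations = 0;

    static class Window {
        final long start, end;
        Window(long start, long end) { this.start = start; this.end = end; }
    }

    static class Windowed<K> {
        final K key;
        final Window window;
        Windowed(K key, Window window) { this.key = key; this.window = window; }
    }

    // First step: always executed; it only splits the raw bytes into
    // (key bytes, window) without deserializing the key itself.
    static Windowed<byte[]> fromBinary(byte[] binaryKey) {
        // toy layout: [key bytes..., start, end] with 1-byte timestamps
        final int n = binaryKey.length;
        final byte[] keyBytes = Arrays.copyOfRange(binaryKey, 0, n - 2);
        return new Windowed<>(keyBytes, new Window(binaryKey[n - 2], binaryKey[n - 1]));
    }

    // Second step: only executed when something must be forwarded; it
    // deserializes just the key bytes and re-wraps the SAME window.
    static Windowed<String> materialize(Windowed<byte[]> bytesKey) {
        keyDeserializations++;
        return new Windowed<>(new String(bytesKey.key), bytesKey.window);
    }
}
```

The second step never re-parses the window: it only converts the key bytes and reuses the window object from the first step.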
// this is an optimization: if this key did not exist in underlying store and also not in the cache,
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
The actual fix here: we need to 1) read the old bytes, 2) put the new bytes to the underlying store, and then 3) apply the flush listener.
Previously the ordering was 1) -> 3) -> 2), which has an issue: if the flushed record triggers a downstream processor that accesses the store, the store does not have the new data yet, which is incorrect. This fix was merged for the KV store some time ago, but we did not do the same for the window / session stores.
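To illustrate why the ordering matters, here is a toy reproduction (hypothetical names, not Kafka code): the returned value stands in for what a downstream processor with store access would observe when the flush listener fires.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the ordering bug: the "listener" here is simply a
// read-back from the store, as a downstream processor might do.
class FlushOrderSketch {
    final Map<String, String> underlying = new HashMap<>();

    // Old (buggy) order: the flush listener fires before the write,
    // so downstream observes the stale (missing) value.
    String flushThenPut(String key, String value) {
        final String seenByDownstream = underlying.get(key);
        underlying.put(key, value);
        return seenByDownstream;
    }

    // Fixed order: write to the underlying store first, then flush;
    // downstream sees the new value.
    String putThenFlush(String key, String value) {
        underlying.put(key, value);
        return underlying.get(key);
    }
}
```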
kafkaStreams.start();

waitUntilAtLeastNumRecordProcessed(outputTopic, 2);
waitUntilAtLeastNumRecordProcessed(outputTopic, 1);
This is the effect of the optimization: we will only send down one record now; the second null record is omitted.
final KTable<String, Integer> table1 = builder.table(topic1, consumed);

final KTable<String, Integer> table2 = table1.filter(predicate, Materialized.as("anyStoreNameFilter"));
final KTable<String, Integer> table2 = table1.filter(predicate,
This is needed since, with the optimization, we will omit flushing those unnecessary nulls.
Why not update the expected test output? Or better, have a test for both cases?
This should be covered in the store test case itself, not the KTableFilter here. I've added a separate test case for this purpose.
Well, I guess the "problem" is that it's unclear what this test is supposed to test. The method name is not very good.
It seems to be about the case that the filter does the right thing based on the input data.

testKTable -- table store is not materialized, thus everything is pass-through
testQueryableKTable -- table store is materialized (we disable caching to preserve the pass-through behavior)

similar below for:
testNotSendingOldValue (not materialized)
testQueryableNotSendingOldValue (preserve pass-through via disabling caching)
Does this sound correct?
This sounds right to me. Let me rename the test cases a bit to be more clear.
assertNull(store.get(3));
assertEquals("four", store.get(4));
assertEquals("five", store.get(5));
store.flush();
Ditto here due to the optimization.
For this test case I've tried to do that, but there's some trickiness: 1) AbstractKVStoreTest is shared across five tests, and only one of them is a caching store, which is the one affected; 2) the effect depends on the cache size.
So I've decided to add a separate test in CachingKVStoreTest just for the effectiveness of this optimization.
Again, testPutGetRange is not a good name -- maybe that is what confused me.
Also, if we do the flush() to be able to share code, maybe add a comment about the why -- basically, a test should work for both caching and non-caching (or explicitly state in the test name that it tests the one or the other behavior).
store.setFlushListener(cacheFlushListener, true);
store.put(bytesKey("1"), bytesValue("a"));
store.flush();
assertEquals("a", cacheFlushListener.forwarded.get("1").newValue);
The augmented unit tests here and below 1) demonstrate the optimization, and 2) demonstrate the bug fix as well.
@bbejeck @vvcephei @ableegoldman @mjsax for reviews.
vvcephei left a comment
LGTM @guozhangwang. Thanks!
I left one comment to maybe improve readability, and one question about an additional possible optimization (but maybe out of scope).
Thanks,
-John
// this is an optimization: if this key did not exist in underlying store and also not in the cache,
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
Can we apply de Morgan's rule and flip the conditional (with an empty body) here?
It seems easier to understand:
if (newValueBytes == null && oldValueBytes == null) {
// no need to flush or write to underlying store
} else {
... the rest of the code
}
In other words, the empty "skip" block is more self-documenting than the preceding comment explaining the algorithm.
Which actually makes me wonder: is there a more general optimization where we can skip deserialization, flush, and write any time the new and old values are identical?
I think we cannot generally do this optimization when oldBytes == newBytes, since the timestamp may still be updated for flushListener.apply.
I think an empty then-body would be bad code style.
Hey folks, I'm slightly modifying this logic to NOT always blindly read from the underlying store, but to do something like this:
final byte[] newValueBytes = entry.newValue();
final byte[] oldValueBytes = newValueBytes == null || sendOldValues ? underlying.get(entry.key()) : null;
if (newValueBytes != null || oldValueBytes != null) {
....
}
The main motivation is that for session stores the likelihood of newValueBytes == null could be high, while for the other two store types it would be low. As a result, for the other two types the skip above would almost never trigger, and if sendOldValues == false we would not need to read the old bytes for this optimization at all, since the skip is doomed to not happen.
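A runnable sketch of that condition (a hypothetical map-backed store; the method names are invented), counting how often the underlying store is actually read:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the conditional old-bytes read: only touch the underlying
// store when the old bytes can actually influence the outcome.
class ConditionalReadSketch {
    final Map<String, byte[]> underlying = new HashMap<>();
    int underlyingReads = 0;

    byte[] read(String key) {
        underlyingReads++;
        return underlying.get(key);
    }

    // Read the old bytes only when they can matter: either the new value is
    // null (so the null/null skip decision needs them), or old values must
    // be forwarded downstream.
    boolean wouldForward(String key, byte[] newValueBytes, boolean sendOldValues) {
        final byte[] oldValueBytes =
            newValueBytes == null || sendOldValues ? read(key) : null;
        return newValueBytes != null || oldValueBytes != null;
    }
}
```

With a non-null new value and sendOldValues == false, the underlying store is never read at all.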
cachingStore.flush();
cachingStore.remove(a);
cachingStore.flush();
//cachingStore.flush();
Failed with checkstyle error:

Fixed it in my latest commit; did not push since I thought you might have more comments to come.
} else {
    if (flushListener != null) {
        final byte[] newValueBytes = entry.newValue();
        final byte[] oldValueBytes = underlying.get(entry.key());
Because we only flush() if newValueBytes != null || oldValueBytes != null, I think we can actually do the get() only if newValueBytes == null || sendOldValues is true. Same for the other stores.
Thoughts?
if (newValueBytes != null || oldValueBytes != null) {
    final K key = serdes.keyFrom(entry.key().get());
    final V newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
    final V oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
Do we need to check sendOldValues here?
Good point; since I've added the check above, I can remove it here. Ditto elsewhere.
@mjsax Actually, it is not correct: when sendOldValues is false, we should never send old values downstream. So suppose newValueBytes != null, and hence we read the underlying store: we still need the check here so that oldValue ends up as null.
final Windowed<Bytes> bytesKey = SessionKeySchema.from(binaryKey);
if (flushListener != null) {
    final byte[] newValueBytes = entry.newValue();
    final byte[] oldValueBytes = bytesStore.fetchSession(bytesKey.key(), bytesKey.window().start(), bytesKey.window().end());
As above: only do the fetchSession if newValueBytes == null || sendOldValues is true.
if (newValueBytes != null || oldValueBytes != null) {
    final Windowed<K> key = SessionKeySchema.from(bytesKey, serdes.keyDeserializer(), topic);
    final AGG newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
    final AGG oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
Do we need to check sendOldValues here?
} finally {
    context.setRecordContext(current);

final byte[] newValueBytes = entry.newValue();
final byte[] oldValueBytes = underlying.fetch(key, windowStartTimestamp);
store.flush();
assertEquals("a", cacheFlushListener.forwarded.get("1").newValue);
assertNull(cacheFlushListener.forwarded.get("1").oldValue);
store.put(bytesKey("1"), bytesValue("b"));
I think we should do a third put, store.put(bytesKey("1"), bytesValue("c")), here and test that the old value is a and the new value is c, to make sure we return the correct old value -- with 2 puts, it's unclear if it's correct. (It would be incorrect if oldValue were b.)
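The suggested three-put assertion could look roughly like this, using a toy caching store model (not the real CachingKeyValueStore) in which puts land in a cache and flush() forwards one (oldValue, newValue) pair per dirty key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy caching store: puts go to the cache; flush() forwards one
// "old -> new" pair per dirty key and writes through to the store.
class ThreePutSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> underlying = new HashMap<>();
    final List<String> forwarded = new ArrayList<>();

    void put(String key, String value) {
        cache.put(key, value);  // a second put to the same key overwrites in the cache
    }

    void flush() {
        for (Map.Entry<String, String> e : cache.entrySet()) {
            final String oldValue = underlying.get(e.getKey());
            underlying.put(e.getKey(), e.getValue());
            forwarded.add(oldValue + " -> " + e.getValue());
        }
        cache.clear();
    }
}
```

The point of the third put: the intermediate value never reaches the underlying store, so the forwarded old value must be the last flushed one, not the overwritten intermediate.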
@@ -176,10 +179,14 @@ public void shouldRemove() {
    cachingStore.put(b, "2".getBytes());
    cachingStore.flush();
Why do we need this flush if we don't need the one below?
assertEquals("a", cacheListener.forwarded.get(windowedKey).newValue);
assertNull(cacheListener.forwarded.get(windowedKey).oldValue);
cacheListener.forwarded.clear();
cachingStore.put(bytesKey("1"), bytesValue("b"));

cachingStore.flush();
assertEquals("a", cacheListener.forwarded.get(windowedKey).newValue);
assertNull(cacheListener.forwarded.get(windowedKey).oldValue);
cachingStore.put(bytesKey("1"), bytesValue("b"));
bbejeck left a comment
Thanks @guozhangwang, overall looks good to me. I just have one minor question, and I'll take another pass after you update the PR as mentioned in https://github.com/apache/kafka/pull/6191/files#r255764702
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
    final Windowed<K> windowedKey = WindowKeySchema.fromStoreKey(windowedKeyBytes, serdes.keyDeserializer(), serdes.topic());
    final V newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
It seems that after setting final Windowed<K> key in the putAndMaybeForward method, the code is more or less the same for all the caching stores in question.
Just a thought, but would it be worth extracting the logic from the similar methods and placing it in a method of WrappedStateStore.AbstractStateStore?
I think AbstractStateStore is not a good place to share this logic, since it is used not only for wrapping caching layers, but for other layers as well.
I feel good about keeping these three functions separate for now, mainly because their callees (the fetches) are still different.
Updated per comments.
Tests failed. Retest this please. (If it fails again, this needs to be rebased.)
mjsax left a comment
LGTM. Feel free to merge after build is green.
Pushed another commit to fix checkstyle errors (the junit upgrade caused deprecated assertion APIs).
1. In the caching layer's flush listener call, we should always write to the underlying store before flushing (see #4331's point 4) for a detailed explanation). When fixing 4331, we only touched the KV stores, but it turns out that we should fix the window and session stores as well.

2. Also apply the optimization that was in the session store already: when the new value bytes and old value bytes are both null (this is possible e.g. if there is a put(K, V) followed by a remove(K) or put(K, null), and these two operations only hit the cache), then upon flushing the underlying store does not have this value at all, and no intermediate value has been sent downstream either. We can then skip both putting a null to the underlying store as well as calling the flush listener with `null -> null`.

Modifies corresponding unit tests.

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Cherry-picked to 2.2 as well.