KAFKA-7652: Part III; Put to underlying before Flush #6191
guozhangwang merged 23 commits into apache:trunk
Conversation
        entry.entry().context().timestamp());
} else {
    if (flushListener != null) {
        final byte[] newValueBytes = entry.newValue();
This is an optimization that I did for all three caching stores:
1. Get the new bytes from the cache, and read the old bytes from the underlying store.
2. If either the old or the new bytes are non-null, go to 3) below; otherwise skip, so that we do not need to deserialize at all.
3. Deserialize to objects, and apply the flush listener to forward downstream.
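A minimal, hypothetical sketch of this flush path, with plain maps standing in for the cache and the underlying store, and `deserialize` standing in for `serdes.valueFrom` (none of these names are the actual Kafka Streams classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the caching layer's flush path; names are illustrative
// stand-ins, not actual Kafka Streams classes.
class FlushPathSketch {
    final Map<String, byte[]> underlying = new HashMap<>();
    final List<String> forwarded = new ArrayList<>();
    int deserializations = 0;

    String deserialize(byte[] bytes) {  // stands in for serdes.valueFrom
        deserializations++;
        return new String(bytes);
    }

    void flushEntry(String key, byte[] newBytes, boolean sendOldValues) {
        // 1) new bytes come from the cache; old bytes from the underlying store
        final byte[] oldBytes = underlying.get(key);
        // 2) if both are null, the key exists in neither cache nor store, and
        //    nothing was ever forwarded: skip the write, the flush, and the deser
        if (newBytes == null && oldBytes == null) {
            return;
        }
        underlying.put(key, newBytes);
        // 3) only now pay the deserialization cost and call the flush listener
        final String newValue = newBytes != null ? deserialize(newBytes) : null;
        final String oldValue = sendOldValues && oldBytes != null ? deserialize(oldBytes) : null;
        forwarded.add(key + ": " + oldValue + " -> " + newValue);
    }
}
```

Note that a `null -> null` flush never deserializes anything and never touches the underlying store.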
final AGG newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
final AGG oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
// we need to get the old values if needed, and then put to store, and then flush
bytesStore.put(bytesKey, entry.newValue());
This is another optimization I did for the window / session stores: previously we deserialized the bytes to Windowed<K>, and then serialized it back to Windowed<Bytes> for the underlying put, which is wasteful.
Now what I do is bytes -> Windowed<Bytes> -> Windowed<K>, where the first step is always executed, while the second is executed only if the new / old bytes are non-null. Note that the second step actually only does Bytes -> K, and we just re-wrap with the same window.
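A sketch of the bytes -> Windowed<Bytes> -> Windowed<K> idea, using toy types in place of the real `Windowed` / `SessionKeySchema` classes (the binary key layout here is invented purely for illustration):

```java
import java.util.Arrays;

// Toy stand-ins for the real Windowed<K> / SessionKeySchema machinery.
class WindowedKeySketch {
    static int keyDeserializations = 0;

    static class Window {
        final long start, end;
        Window(long start, long end) { this.start = start; this.end = end; }
    }

    static class Windowed<K> {
        final K key;
        final Window window;
        Windowed(K key, Window window) { this.key = key; this.window = window; }
    }

    // First step: always executed; it only splits the raw bytes into
    // (key bytes, window) without deserializing the key itself.
    static Windowed<byte[]> fromBinary(byte[] binaryKey) {
        // toy layout: [key bytes..., start, end] with 1-byte timestamps
        final int n = binaryKey.length;
        final byte[] keyBytes = Arrays.copyOfRange(binaryKey, 0, n - 2);
        return new Windowed<>(keyBytes, new Window(binaryKey[n - 2], binaryKey[n - 1]));
    }

    // Second step: only executed when something must be forwarded; it
    // deserializes just the key bytes and re-wraps the SAME window.
    static Windowed<String> materialize(Windowed<byte[]> bytesKey) {
        keyDeserializations++;
        return new Windowed<>(new String(bytesKey.key), bytesKey.window);
    }
}
```

The second step never re-parses the window: it only converts the key bytes and reuses the window object from the first step.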
// this is an optimization: if this key did not exist in underlying store and also not in the cache,
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
The actual fix here: we need to 1) read the old bytes, 2) put the new bytes to the underlying store, and then 3) apply the flush listener.
Previously the ordering was 1) -> 3) -> 2), which has an issue: if the flushed record triggers a downstream processor that accesses the store, the store does not have the new data yet, which is incorrect. This fix was merged for the KV store some time ago, but we did not do the same for the window / session stores.
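To illustrate why the ordering matters, here is a toy reproduction (hypothetical names, not Kafka code): the returned value stands in for what a downstream processor with store access would observe when the flush listener fires.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the ordering bug: the "listener" here is simply a
// read-back from the store, as a downstream processor might do.
class FlushOrderSketch {
    final Map<String, String> underlying = new HashMap<>();

    // Old (buggy) order: the flush listener fires before the write,
    // so downstream observes the stale (missing) value.
    String flushThenPut(String key, String value) {
        final String seenByDownstream = underlying.get(key);
        underlying.put(key, value);
        return seenByDownstream;
    }

    // Fixed order: write to the underlying store first, then flush;
    // downstream sees the new value.
    String putThenFlush(String key, String value) {
        underlying.put(key, value);
        return underlying.get(key);
    }
}
```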
kafkaStreams.start();

waitUntilAtLeastNumRecordProcessed(outputTopic, 2);
waitUntilAtLeastNumRecordProcessed(outputTopic, 1);
This is the effect of the optimization: we will only send down one record now; the second null record is omitted.
final KTable<String, Integer> table1 = builder.table(topic1, consumed);

final KTable<String, Integer> table2 = table1.filter(predicate, Materialized.as("anyStoreNameFilter"));
final KTable<String, Integer> table2 = table1.filter(predicate,
This is needed since, with the optimization, we will omit flushing those unnecessary nulls.
Why not update the expected test output? Or better, have a test for both cases?
This should be covered in the store test case itself, not the KTableFilter here. I've added a separate test case for this purpose.
Well, I guess the "problem" is that it's unclear what this test is supposed to test. The method name is not very good.
It seems to be about the case that the filter does the right thing based on the input data.

testKTable -- table store is not materialized, thus everything is pass-through
testQueryableKTable -- table store is materialized (we disable caching to preserve the pass-through behavior)

similar below for:
testNotSendingOldValue (not materialized)
testQueryableNotSendingOldValue (preserve pass-through via disabling caching)
Does this sound correct?
This sounds right to me. Let me rename the test cases a bit to be more clear.
assertNull(store.get(3));
assertEquals("four", store.get(4));
assertEquals("five", store.get(5));
store.flush();
Ditto here due to the optimization.
For this test case I've tried to do that, but there's some trickiness: 1) AbstractKVStoreTest is shared across five tests, and only one of them is a caching store, which is the one affected; 2) the effect depends on the cache size.
So I've decided to add a separate test in CachingKVStoreTest just for the effectiveness of this optimization.
Again, testPutGetRange is not a good name -- maybe that is what confused me.
Also, if we do the flush() to be able to share code, maybe add a comment about the why -- basically, a test should work for both caching and non-caching (or explicitly state in the test name that it tests the one or the other behavior).
store.setFlushListener(cacheFlushListener, true);
store.put(bytesKey("1"), bytesValue("a"));
store.flush();
assertEquals("a", cacheFlushListener.forwarded.get("1").newValue);
The augmented unit tests here and below 1) demonstrate the optimization, and 2) demonstrate the bug fix as well.
@bbejeck @vvcephei @ableegoldman @mjsax for reviews.
vvcephei left a comment
LGTM @guozhangwang. Thanks!
I left one comment to maybe improve readability, and one question about an additional possible optimization (but maybe out of scope).
Thanks,
-John
// this is an optimization: if this key did not exist in underlying store and also not in the cache,
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
Can we apply de Morgan's rule and flip the conditional (with an empty body) here?
It seems easier to understand:
if (newValueBytes == null && oldValueBytes == null) {
// no need to flush or write to underlying store
} else {
... the rest of the code
}
In other words, the empty "skip" block is more self-documenting than the preceding comment explaining the algorithm.
Which actually makes me wonder: is there a more general optimization where we can skip deserialization, flush, and write any time the new and old values are identical?
I think we cannot generally do this optimization when oldBytes == newBytes, since the timestamp may still be updated for flushListener.apply.
I think an empty then-body would be bad code style.
Hey folks, I'm slightly modifying this logic to NOT always blindly read from the underlying store, but to do something like this:
final byte[] newValueBytes = entry.newValue();
final byte[] oldValueBytes = newValueBytes == null || sendOldValues ? underlying.get(entry.key()) : null;
if (newValueBytes != null || oldValueBytes != null) {
....
}
The main motivation is that for session stores the likelihood of newValueBytes == null could be high, while for the other two store types it would be low. As a result, for the other two types the skip above would almost never trigger, and if sendOldValues == false we would not need to read the old bytes for this optimization at all, since the skip is doomed to not happen.
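A runnable sketch of that condition (a hypothetical map-backed store; the method names are invented), counting how often the underlying store is actually read:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the conditional old-bytes read: only touch the underlying
// store when the old bytes can actually influence the outcome.
class ConditionalReadSketch {
    final Map<String, byte[]> underlying = new HashMap<>();
    int underlyingReads = 0;

    byte[] read(String key) {
        underlyingReads++;
        return underlying.get(key);
    }

    // Read the old bytes only when they can matter: either the new value is
    // null (so the null/null skip decision needs them), or old values must
    // be forwarded downstream.
    boolean wouldForward(String key, byte[] newValueBytes, boolean sendOldValues) {
        final byte[] oldValueBytes =
            newValueBytes == null || sendOldValues ? read(key) : null;
        return newValueBytes != null || oldValueBytes != null;
    }
}
```

With a non-null new value and sendOldValues == false, the underlying store is never read at all.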
cachingStore.flush();
cachingStore.remove(a);
cachingStore.flush();
//cachingStore.flush();
Failed with checkstyle error:

Fixed it in my latest commit; did not push since I thought you might have more comments to come.
} else {
    if (flushListener != null) {
        final byte[] newValueBytes = entry.newValue();
        final byte[] oldValueBytes = underlying.get(entry.key());
Because we only flush() if newValueBytes != null || oldValueBytes != null, I think we can actually do the get() only if newValueBytes == null || sendOldValues is true. Same for the other stores.
Thoughts?
if (newValueBytes != null || oldValueBytes != null) {
    final K key = serdes.keyFrom(entry.key().get());
    final V newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
    final V oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
Do we need to check sendOldValues here?
Good point; since I've added the check above, I can remove it here. Ditto elsewhere.
@mjsax Actually, it is not correct: when sendOldValues is false, we should never send old values downstream. So suppose newValueBytes != null, and hence we read the underlying store: we still need the check here so that oldValue ends up as null.
final Windowed<Bytes> bytesKey = SessionKeySchema.from(binaryKey);
if (flushListener != null) {
    final byte[] newValueBytes = entry.newValue();
    final byte[] oldValueBytes = bytesStore.fetchSession(bytesKey.key(), bytesKey.window().start(), bytesKey.window().end());
As above: only do the fetchSession if newValueBytes == null || sendOldValues is true.
if (newValueBytes != null || oldValueBytes != null) {
    final Windowed<K> key = SessionKeySchema.from(bytesKey, serdes.keyDeserializer(), topic);
    final AGG newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
    final AGG oldValue = sendOldValues && oldValueBytes != null ? serdes.valueFrom(oldValueBytes) : null;
Do we need to check sendOldValues here?
} finally {
    context.setRecordContext(current);

final byte[] newValueBytes = entry.newValue();
final byte[] oldValueBytes = underlying.fetch(key, windowStartTimestamp);
store.flush();
assertEquals("a", cacheFlushListener.forwarded.get("1").newValue);
assertNull(cacheFlushListener.forwarded.get("1").oldValue);
store.put(bytesKey("1"), bytesValue("b"));
I think we should do a third put, store.put(bytesKey("1"), bytesValue("c")), here and test that the old value is a and the new value is c, to make sure we return the correct old value -- with 2 puts, it's unclear if it's correct. (It would be incorrect if oldValue were b.)
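The suggested three-put assertion could look roughly like this, using a toy caching store model (not the real CachingKeyValueStore) in which puts land in a cache and flush() forwards one (oldValue, newValue) pair per dirty key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy caching store: puts go to the cache; flush() forwards one
// "old -> new" pair per dirty key and writes through to the store.
class ThreePutSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> underlying = new HashMap<>();
    final List<String> forwarded = new ArrayList<>();

    void put(String key, String value) {
        cache.put(key, value);  // a second put to the same key overwrites in the cache
    }

    void flush() {
        for (Map.Entry<String, String> e : cache.entrySet()) {
            final String oldValue = underlying.get(e.getKey());
            underlying.put(e.getKey(), e.getValue());
            forwarded.add(oldValue + " -> " + e.getValue());
        }
        cache.clear();
    }
}
```

The point of the third put: the intermediate value never reaches the underlying store, so the forwarded old value must be the last flushed one, not the overwritten intermediate.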
@@ -176,10 +179,14 @@ public void shouldRemove() {
    cachingStore.put(b, "2".getBytes());
    cachingStore.flush();
Why do we need this flush if we don't need the one below?
assertEquals("a", cacheListener.forwarded.get(windowedKey).newValue);
assertNull(cacheListener.forwarded.get(windowedKey).oldValue);
cacheListener.forwarded.clear();
cachingStore.put(bytesKey("1"), bytesValue("b"));

cachingStore.flush();
assertEquals("a", cacheListener.forwarded.get(windowedKey).newValue);
assertNull(cacheListener.forwarded.get(windowedKey).oldValue);
cachingStore.put(bytesKey("1"), bytesValue("b"));
bbejeck left a comment
Thanks @guozhangwang, overall looks good to me. I just have one minor question, and I'll take another pass after you update the PR as mentioned in https://github.com/apache/kafka/pull/6191/files#r255764702
// we can skip flushing to downstream as well as writing to underlying store
if (newValueBytes != null || oldValueBytes != null) {
    final Windowed<K> windowedKey = WindowKeySchema.fromStoreKey(windowedKeyBytes, serdes.keyDeserializer(), serdes.topic());
    final V newValue = newValueBytes != null ? serdes.valueFrom(newValueBytes) : null;
It seems that after setting final Windowed<K> key in the putAndMaybeForward method, the code is more or less the same for all the caching stores in question.
Just a thought, but would it be worth extracting the logic from the similar methods and placing it in a method of WrappedStateStore.AbstractStateStore?
I think AbstractStateStore is not a good place to share this logic, since it is used not only for wrapping caching layers, but for other layers as well.
I feel good about keeping these three functions separate for now, mainly because their callees (the fetches) are still different.
Updated per comments.
Tests failed. Retest this please. (If it fails again, this needs to be rebased.)
mjsax left a comment
LGTM. Feel free to merge after build is green.
Pushed another commit to fix checkstyle errors (the junit upgrade caused deprecated assertion APIs).
1. In the caching layer's flush listener call, we should always write to the underlying store before flushing (see #4331's point 4) for a detailed explanation). When fixing 4331, we only touched the KV stores, but it turns out that we should fix the window and session stores as well.

2. Also apply the optimization that was in the session store already: when the new value bytes and old value bytes are both null (this is possible e.g. if there is a put(K, V) followed by a remove(K) or put(K, null), and these two operations only hit the cache), then upon flushing the underlying store does not have this value at all, and no intermediate value has been sent downstream either. We can then skip both putting a null to the underlying store as well as calling the flush listener with `null -> null`.

Modifies corresponding unit tests.

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Cherry-picked to 2.2 as well.