KAFKA-20173: Ensure Metered kv-stores pass headers correctly (#21768)

mjsax merged 1 commit into apache:trunk
Conversation
```diff
     protected Sensor getSensor;
     protected Sensor deleteSensor;
-    private Sensor putAllSensor;
+    protected Sensor putAllSensor;
```
Need access in MeteredKeyValueTimestampStoreWithHeaders
```diff
     if (rawResult.isSuccess()) {
         final KeyValueIterator<Bytes, byte[]> iterator = rawResult.getResult();
-        final KeyValueIterator<K, V> resultIterator = new MeteredKeyValueTimestampedIterator(
+        final KeyValueIterator<K, V> resultIterator = new MeteredKeyValueStoreIterator(
```
Side cleanup: I think there is no reason to have MeteredKeyValueTimestampedIterator (removed below); we can just use the existing MeteredKeyValueStoreIterator. We don't have a case that bridges between different value types in MeteredKeyValueStore.
```diff
     Objects.requireNonNull(prefix, "prefix cannot be null");
     Objects.requireNonNull(prefixKeySerializer, "prefixKeySerializer cannot be null");
-    return new MeteredKeyValueIterator(wrapped().prefixScan(prefix, prefixKeySerializer), prefixScanSensor);
+    return new MeteredKeyValueStoreIterator(wrapped().prefixScan(prefix, prefixKeySerializer), prefixScanSensor);
```
Just renaming, to align with established naming patterns across the different kv-metered stores.
```java
public ValueTimestampHeaders<V> get(final K key) {
    Objects.requireNonNull(key, "key cannot be null");
    try {
        return maybeMeasureLatency(() -> deserializeValue(wrapped().get(serializeKey(key, new RecordHeaders()))), time, getSensor);
```
We need to overwrite get() to be able to pass headers into serializeKey(...) -- For get() it might not be strictly necessary, but it forces us to make a conscious decision about what to pass -- it's a side effect of disallowing calls to serializeKey w/o headers.
Wondering if we would want to pass context.headers() instead of new RecordHeaders(), similar to what we did in #21684
If yes, applies elsewhere in this class, too.
I think the same argument applies here: serdes should have access to record context.
+1 to context.headers()
I think context.headers() is better, but I still have a question:
If we do it this way, effectively this is the same as
Can we just reuse it?
SG -- seems it only affects get() -- so yes, it's the same as serializeKey(key), but we make a conscious decision about what is desired.
I'll update it for delete in this PR, too, for now, but after #21639 is merged this will change (or if Frank's PR is merged first, it will go away after rebasing).
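A minimal sketch of the suggestion above, using simplified stand-in types (the real Kafka Streams classes are not reproduced here): get() passes the processor context's headers into the header-aware serializeKey(key, headers) variant instead of constructing empty RecordHeaders.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Headers and Context are simplified stand-ins for
// org.apache.kafka.common.header.Headers and the processor context.
public class HeaderAwareGetSketch {

    static final class Headers {
        final Map<String, byte[]> entries = new HashMap<>();
    }

    static final class Context {
        final Headers headers = new Headers();
        Headers headers() { return headers; }
    }

    static final Context context = new Context();
    static final Map<String, String> wrapped = new HashMap<>();

    // Header-aware serialization: a real serde could inspect the headers.
    static String serializeKey(final String key, final Headers headers) {
        return key;
    }

    // get() passes context.headers() rather than fresh empty headers,
    // matching the direction agreed on in this thread.
    static String get(final String key) {
        return wrapped.get(serializeKey(key, context.headers()));
    }

    public static void main(final String[] args) {
        wrapped.put("k", "v");
        System.out.println(get("k")); // prints "v"
    }
}
```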
```diff
     try {
         final Headers headers = value != null ? value.headers() : new RecordHeaders();
-        maybeMeasureLatency(() -> wrapped().put(keyBytes(key, headers), serdes.rawValue(value, headers)), time, putSensor);
+        maybeMeasureLatency(() -> wrapped().put(serializeKey(key, headers), serdes.rawValue(value, headers)), time, putSensor);
```
Unifying the existing keyBytes method with the naming of MeteredKeyValueStore, which uses serializeKey.
```diff
     if (rawResult.isSuccess()) {
         final KeyValueIterator<Bytes, byte[]> iterator = rawResult.getResult();
-        final KeyValueIterator<K, V> resultIterator = new MeteredTimestampedKeyValueStoreWithHeadersIterator(
+        final KeyValueIterator<K, V> resultIterator = new MeteredTimestampedKeyValueStoreWithHeadersQueryIterator(
```
Just renaming to align naming patterns.
```diff
     Objects.requireNonNull(prefix, "prefix cannot be null");
     Objects.requireNonNull(prefixKeySerializer, "prefixKeySerializer cannot be null");
-    return new MeteredValueTimestampHeadersIterator(
+    return new MeteredTimestampedKeyValueStoreWithHeadersIterator(
```
Just renaming to align naming patterns
```java
final KeyValue<Bytes, byte[]> keyValue = iter.next();
final ValueTimestampHeaders<V> valueTimestampHeaders = valueTimestampHeadersDeserializer.apply(keyValue.value);
final Headers headers = valueTimestampHeaders != null ? valueTimestampHeaders.headers() : new RecordHeaders();
```
This can never be null. We know that iter.next() cannot return null (that would be a "non-existing row" -- null values are deletes).
Thus simplifying the code (we don't have any null checks for this case in the older code for the plain and ts-stores either).
Similar below
```java
protected Bytes keyBytes(final K key, final Headers headers) {

@Override
protected Bytes serializeKey(final K key) {
    throw new UnsupportedOperationException("MeteredTimestampedKeyValueStoreWithHeaders required to pass in Headers when serializing a key.");
```
This was actually surfacing the missing overrides, highlighting the bug of not overriding delete. So it's a good guard to have in place (a slightly annoying side effect is that it forces us to override put/putAll, too, but I think it's worth the price to pay).
Maybe we don't even need this. See my comment https://github.com/apache/kafka/pull/21768/changes#r2941312324
The goal is to force us to make a conscious decision for each serializeKey call, because every case is different. So we cannot accidentally call the non-header variant and introduce a bug without noticing it.
Yeah, I see now that in many places we propagate headers from ValueTimestampHeaders, not from the internal context (I don't know how I missed this 🤦♂️). In this case it makes sense to go with this guard to force the dev to think about what to propagate.
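The guard pattern discussed here can be sketched as follows, with simplified stand-in types (plain String keys instead of the real Bytes/serde machinery): the headers-aware subclass overrides the no-headers serializeKey(key) to throw, so any inherited method that forgot to pass headers fails loudly instead of silently dropping them.

```java
import java.util.Map;

// Sketch only: MeteredStore/MeteredStoreWithHeaders are hypothetical
// stand-ins for the metered kv-store hierarchy in the PR.
public class SerializeKeyGuardSketch {

    static class MeteredStore {
        protected String serializeKey(final String key) {
            return key;
        }
    }

    static class MeteredStoreWithHeaders extends MeteredStore {
        // The only supported variant on the headers store.
        protected String serializeKey(final String key, final Map<String, byte[]> headers) {
            return key;
        }

        // Guard: calling the no-headers variant is a bug and surfaces immediately.
        @Override
        protected String serializeKey(final String key) {
            throw new UnsupportedOperationException(
                "headers store requires Headers when serializing a key");
        }
    }

    public static void main(final String[] args) {
        final MeteredStoreWithHeaders store = new MeteredStoreWithHeaders();
        System.out.println(store.serializeKey("k", Map.of())); // prints "k"
        try {
            store.serializeKey("k"); // forgot headers -> guard fires
        } catch (final UnsupportedOperationException e) {
            System.out.println("guard fired: " + e.getMessage());
        }
    }
}
```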
```java
@Override
protected K deserializeKey(final byte[] rawKey) {
    throw new UnsupportedOperationException("MeteredTimestampedKeyValueStoreWithHeaders required to pass in Headers when deserializing a key.");
```
Similar guard -- this did not surface any issues.
```diff
-    (KeyValueIterator<K, ValueAndTimestamp<V>>) new MeteredTimestampedKeyValueStoreWithHeadersIterator(
+    (KeyValueIterator<K, ValueAndTimestamp<V>>) new MeteredTimestampedKeyValueStoreWithHeadersQueryIterator(
```
Shouldn't it be a KeyValueIterator<K, ValueTimestampHeaders<V>> resultIterator instead of KeyValueIterator<K, ValueAndTimestamp<V>> ?
Why do we return just ValueAndTimestamp<V> as a query result?
UPD: I see that this is what iterator returns: https://github.com/apache/kafka/pull/21768/changes#diff-43268e867244a98e6614710d9f2a3be2c519345e1f81204b91414ee1db77c754R464 , but still why?
It's because of backward compatibility in the DSL. There is existing code which uses IQ to access the state stores, and it expects to get back the ValueAndTimestamp type. We change all the DSL code to use metered/caching/changelogging header-stores and only keep the inner-most byte-store as the default ts-store.
Thus, if users upgrade to 4.3, nothing should change for them, and thus we cannot just start to return the ValueTimestampHeaders type -- this would break apps even if they do not enable header stores in the DSL.
Does this make sense? -- It's a known feature gap that IQ will not allow querying headers with AK 4.3, and it's something we need a follow-up KIP to add.
Makes sense. Thanks for the clarification.
```java
final List<KeyValue<Bytes, byte[]>> byteEntries = new ArrayList<>();
for (final KeyValue<K, ValueTimestampHeaders<V>> entry : from) {
    final Headers headers = entry.value != null ? entry.value.headers() : new RecordHeaders();
    byteEntries.add(KeyValue.pair(serializeKey(entry.key, headers), serializeValue(entry.value)));
```
This is different from put -> wrapped().put(serializeKey(key, headers), serdes.rawValue(value, headers)).
But here it does serializeValue(entry.value) -- shouldn't this use the same as put?
It doesn't really matter, because for the header case the passed-in headers are ignored -- but yes, there is some code inconsistency... Let me unify it.
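The unification agreed on above can be sketched like this, again with simplified stand-in types: putAll() routes every entry through the same header-aware helpers as put(), so the two serialization paths cannot drift apart. (In the real store the headers come from each entry's ValueTimestampHeaders value; here they are passed explicitly for brevity.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: serializeKey/serializeValue stand in for the serde calls,
// and Map.Entry stands in for KeyValue pairs.
public class PutPutAllUnifiedSketch {

    static String serializeKey(final String key, final Map<String, byte[]> headers) {
        return key;
    }

    static String serializeValue(final String value, final Map<String, byte[]> headers) {
        return value;
    }

    static final List<Map.Entry<String, String>> inner = new ArrayList<>();

    static void put(final String key, final String value, final Map<String, byte[]> headers) {
        inner.add(Map.entry(serializeKey(key, headers), serializeValue(value, headers)));
    }

    // putAll delegates to the exact same helpers as put, removing the
    // inconsistency the review points out.
    static void putAll(final List<Map.Entry<String, String>> entries, final Map<String, byte[]> headers) {
        for (final Map.Entry<String, String> e : entries) {
            put(e.getKey(), e.getValue(), headers);
        }
    }

    public static void main(final String[] args) {
        putAll(List.of(Map.entry("a", "1"), Map.entry("b", "2")), Map.of());
        System.out.println(inner.size()); // prints 2
    }
}
```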
```java
public ValueTimestampHeaders<V> delete(final K key) {
    Objects.requireNonNull(key, "key cannot be null");
    try {
        return maybeMeasureLatency(() -> deserializeValue(wrapped().delete(serializeKey(key, new RecordHeaders()))), time, deleteSensor);
```
I think we need to update new RecordHeaders() to context.headers() here and below, based on the comments up to this point.
Yes, but it will be moot with Frank's fix anyway: #21639
```diff
     protected K deserializeKey(final byte[] rawKey) {
-        return rawKey != null ? serdes.keyFrom(rawKey, internalContext.headers()) : null;
+        return serdes.keyFrom(rawKey, internalContext.headers());
```
The key can actually never be null -- this would be a bug that we should not mask with a null-check but surface directly.
Ensures that all Metered KV-stores (plain, ts, headers, version) pass headers into de/serializers.
7831aa0 to 40439ac
Ensures that all Metered KV-stores (plain, ts, headers, version) pass
headers into de/serializers.
Reviewers: Alieh Saeedi asaeedi@confluent.io, TengYao Chi frankvicky@apache.org, Uladzislau Blok blokv75@gmail.com, Bill Bejeck bill@confluent.io