KAFKA-3522: Add in-memory TimestampedKeyValueStore #6151
mjsax wants to merge 4 commits into apache:trunk
Conversation
Call for review @guozhangwang @bbejeck @vvcephei
Using Arrays.copyOfRange here -- in other places, I use System.arraycopy() -- other code uses ByteBuffer -- not sure which one is best. Any insights?
Arrays.copyOfRange() uses System.arraycopy() after doing a range check, so I would use the former for safety. Not sure about ByteBuffer. Can you provide an example of where it is used?
- WindowKeySchema: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/WindowKeySchema.java#L111-L113
- SessionKeySchema: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/SessionKeySchema.java#L143-L146
- SegmentedCacheFunction: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/SegmentedCacheFunction.java#L45

And some more...
Interesting! I was unaware of this method.
The range check just makes sure that the "end" index isn't before the "start" index. This is probably just to give a nice error message, since it would otherwise result in a negative "length" argument and throw a different runtime exception (java.lang.ArrayIndexOutOfBoundsException).
It seems like the real advantage is the compile-time type safety it gives, since there's nothing preventing you from using System.arraycopy to copy from an int[] into a String[]. You'd get a runtime exception (java.lang.ArrayStoreException). Using Arrays.copyOfRange, you get a compile-time check that your types are right.
Plus, it creates the destination array for you, which is also handy.
Since Java seems to lack a structural array slice (i.e., a way to get a new array that is actually a view on a backing array), you have to make a copy to slice the array, so the current method seems appropriate.
ByteBuffer actually does let you do a structural slice, but afaict, any mechanism that gets an array from a ByteBuffer either gives you the whole (unsliced) backing array or makes a copy anyway. If the deserializer accepted a ByteBuffer, we could do these deserializations without making array copies, but as it is, it seems like the Deserializer interface is painting us into a corner by requiring arrays as input.
The reason (I think) that we use ByteBuffer sometimes is that its builder methods make it easy to pack and unpack data into and out of arrays. This can be an error-prone process in general, so it's sometimes nice. But what you're doing is pretty straightforward, and as long as the serializer and deserializer are symmetrical, it's going to be fine.
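To make the comparison concrete, here is a minimal sketch (not code from the PR) slicing a byte array both ways; the class name `SliceExample` is just for illustration:

```java
import java.util.Arrays;

public class SliceExample {
    public static void main(String[] args) {
        byte[] raw = {0, 1, 2, 3, 4, 5, 6, 7};

        // System.arraycopy: caller allocates the destination and passes explicit offsets.
        byte[] viaArraycopy = new byte[4];
        System.arraycopy(raw, 2, viaArraycopy, 0, 4);

        // Arrays.copyOfRange: allocates the destination for you and checks from <= to.
        byte[] viaCopyOfRange = Arrays.copyOfRange(raw, 2, 6);

        System.out.println(Arrays.equals(viaArraycopy, viaCopyOfRange)); // prints "true"
    }
}
```

Both produce the same copy; the difference is who allocates the destination and where a bad range fails.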
What is your conclusion on what we should use?
I'd vote to go with Arrays.copyOfRange since its extra check logic does not seem to incur a huge overhead, so although this is on the critical code path it may still not be a big perf regression.
As discussed in person, we agreed on using ByteBuffer.
Here I use System.arraycopy (cf. my comment above)
Seems fine to me. Just to illustrate the advantage of ByteBuffer, it would be:

```java
ByteBuffer
    .allocate(rawTimestamp.length + rawValue.length)
    .put(rawTimestamp)
    .put(rawValue)
    .array()
```

or

```java
ByteBuffer
    .wrap(new byte[rawTimestamp.length + rawValue.length])
    .put(rawTimestamp)
    .put(rawValue)
    .array()
```

It's both more compact and leaves less chance of messing up the ranges. But it's also functionally equivalent.
I personally have no preference; just answering your question.
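Wrapped as a complete, runnable sketch, the ByteBuffer variant and a System.arraycopy variant can be checked for equivalence; the helper names `concat` and `concatArraycopy` are illustrative, not from the PR:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ConcatExample {
    // ByteBuffer variant from the comment above.
    static byte[] concat(final byte[] rawTimestamp, final byte[] rawValue) {
        return ByteBuffer
            .allocate(rawTimestamp.length + rawValue.length)
            .put(rawTimestamp)
            .put(rawValue)
            .array();
    }

    // Functionally equivalent System.arraycopy variant with explicit offsets.
    static byte[] concatArraycopy(final byte[] rawTimestamp, final byte[] rawValue) {
        final byte[] out = new byte[rawTimestamp.length + rawValue.length];
        System.arraycopy(rawTimestamp, 0, out, 0, rawTimestamp.length);
        System.arraycopy(rawValue, 0, out, rawTimestamp.length, rawValue.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] ts = {0, 0, 0, 0, 0, 0, 0, 42};  // e.g. an 8-byte timestamp
        byte[] value = {1, 2, 3};
        System.out.println(Arrays.equals(concat(ts, value), concatArraycopy(ts, value))); // prints "true"
    }
}
```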
Ah. Here it is. There is no conclusion :)
It looks like this was maybe de-privatized for use in InMemoryTimestampedKeyValueStore, but that class has its own inner iterator class.
Should we restore the private modifier?
Ah. Actually this code should not be here -- it's the same as InMemoryKeyValueStore.InMemoryKeyValueIterator -- that's why the private modifier was removed.
Will remove this nested class to avoid code duplication.
Force-pushed from a7a89c9 to 4b4f0c0.
Updated this.
```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class InMemoryTimestampedKeyValueStore<K, V> implements TimestampedKeyValueStore<K, V> {
```
Just curious, why are we implementing this as a separate class instead of wrapping the existing InMemoryKeyValueStore?
In the original KIP, we added new methods to the TimestampedKeyValueStore interface, e.g.:

```java
public synchronized ValueAndTimestamp<V> putIfAbsent(final K key,
                                                     final V value,
                                                     final long timestamp)
```

Hence, this PR is a little outdated, and I agree that we don't need InMemoryTimestampedKeyValueStore any longer -- I'll clean up and rebase the PR accordingly.
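For illustration, a minimal, self-contained sketch of such putIfAbsent semantics over the TreeMap backing shown in the diff -- the `TimestampedValue` holder class, `String` key type, and field names are hypothetical, not the actual KIP-258 types:

```java
import java.util.Map;
import java.util.TreeMap;

public class PutIfAbsentSketch {
    // Hypothetical holder pairing a value with its timestamp.
    static final class TimestampedValue {
        final byte[] value;
        final long timestamp;
        TimestampedValue(final byte[] value, final long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, TimestampedValue> map = new TreeMap<>();

    // Stores the value-and-timestamp only if the key is absent;
    // returns the previous entry, or null if the key was absent.
    public synchronized TimestampedValue putIfAbsent(final String key,
                                                     final byte[] value,
                                                     final long timestamp) {
        final TimestampedValue previous = map.get(key);
        if (previous == null) {
            map.put(key, new TimestampedValue(value, timestamp));
        }
        return previous;
    }

    public static void main(String[] args) {
        PutIfAbsentSketch store = new PutIfAbsentSketch();
        System.out.println(store.putIfAbsent("k", new byte[]{1}, 10L));           // prints "null"
        System.out.println(store.putIfAbsent("k", new byte[]{2}, 20L).timestamp); // prints "10"
    }
}
```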
Turns out, after rebasing there is nothing left on this PR. Closing it, because we don't need it any longer.
Part of KIP-258.