KAFKA-7916: Unify store wrapping code for clarity#6255
KAFKA-7916: Unify store wrapping code for clarity#6255guozhangwang merged 13 commits intoapache:trunkfrom vvcephei:wrapper-unification
Conversation
vvcephei
left a comment
There was a problem hiding this comment.
Hey @guozhangwang @ableegoldman @bbejeck @mjsax ,
Please take a look at this. It's meant to be the first in a series of rationalizations of the internal state store handling logic.
There was a problem hiding this comment.
Move reasoning about whether a store is new or old binary format inside of the store hierarchy, to preserve encapsulation. Ideally, isTimestamped would be an instance method, but we don't need to add a method to the public StateStore interface. A static method is good enough.
There was a problem hiding this comment.
These no-longer-necessary methods demonstrate the benefit of adding the generic parameter to WrappedStateStore. Previously, wrappedStore could only return a StateStore, but now it directly returns a T extends StateStore, with no need for casting anywhere.
There was a problem hiding this comment.
You'll note in all these methods that we need to say super.*** to explicitly invoke the operation on the wrapped store when it's a method that the subclass explicitly overrides.
There was a problem hiding this comment.
This is the other way that the generic parameter on WrappedStateStore manifests... Now that wrappedStore() returns a properly typed store, we no longer need to store a redundant reference to the wrapped store just to preserve the type information.
There was a problem hiding this comment.
Cache the functions so we don't have to lean on the compiler's decision to cache it and avoid creating a new function every invocation.
There was a problem hiding this comment.
Renamed the function to be a bit more explicit. It's internal, so the name shouldn't affect the public API at all.
There was a problem hiding this comment.
nit: rawValueToTimestampedValueRecordConvert() -- without RecordConverter suffix, it's unclear what it actually returns, a RecordConverter
There was a problem hiding this comment.
Are we doing Hungarian Notation now? https://en.wikipedia.org/wiki/Hungarian_notation ;)
Some might argue that it's unnecessary to encode the type in a name in a strongly (and explicitly) typed language like Java.
If you feel strongly about it, I'm fine with the change.
There was a problem hiding this comment.
For regular types I agree, however, here, we actually return a "function". I don't feel strong about it -- it's just personal taste I guess. If I read RecordConverter.rawValueToTimestampedValue() I would expect that rawValueToTimestampedValue() is a method that takes rawValue as input and returns a timestampedValue (as the method name indicates) -- but this is of course not the case (there is not even an input parameter) -- thus, it's confusing to me to read the code.
There was a problem hiding this comment.
Also added a cached identity function. IMHO, it's just a little more intentional than saying entity -> entity inline.
There was a problem hiding this comment.
nit: identityRecordConverter or noOpRecordConverter ?
There was a problem hiding this comment.
Moving this in here because it requires specific knowledge of the concrete store types. Especially checking and recursively unwrapping wrapped stores is a potentially brittle and dangerout operation that shouldn't be done scattered throughout the codebase.
Specifically, as you recurse down the wrapper references, you lose all knowledge of the wrapped type. We avoid a dangerous cast by answering a specific question (is there a timestamped byte store in my wrapped stack of stores?) and returning a boolean, rather than a reference to an unwrapped and casted store.
There was a problem hiding this comment.
There was no advantage to separating the interface and abstract class after adding the generic parameter, so I've collapsed them into one abstract class.
Previously, the interface was nice because some implementations could declare that they were wrapped without being forced to store duplicate references to the inner store.
There was a problem hiding this comment.
collapsed them into one abstract class
Class is not declared as abstract -- is this intentionally or did it slip? IMHO, it should be abstract (should the name indicate this, too? -> AbstractedWrappedStateStore?)
There was a problem hiding this comment.
I can make it abstract; I don't think there's any advantage to just wrapping a state store, except for testing.
I don't think we need to change the name of the class, though...
There was a problem hiding this comment.
This was only used for testing, and it's pretty awkward, since you don't know what type the inner store of the inner store is, except StateStore.
Rather than keeping it in the interface, I've just inlined it into the two tests that needed it.
There was a problem hiding this comment.
Not sure about this -- why not add two generics to the store, one for "wrapped" and one for "root" and keep this method that return the root type?
I would also rename inner() -> root()
There was a problem hiding this comment.
How would we know that there's any specific number of inner types?
This particular method assumes that you're only trying to get the "grandparent" store, but just because that's what a specific test needs to do. Why should a specific test's need be encoded in a method on all stores in runtime code?
There was a problem hiding this comment.
How would we know that there's any specific number of inner types?
No need to know all types for multiple nesting -- only the first and most inner.
I would recommend to change the method to return the most inner store (if, completely unnest instead of only one level.
Why should a specific test's need be encoded in a method on all stores in runtime code?
Runtime code will need this too IIRC (we had POC PRs for timestamped stores that did use unnesting in the DSL in Processor#init() calls). If it's only for tests, I agree that removing makes sense.
|
retest this please |
guozhangwang
left a comment
There was a problem hiding this comment.
also cc @ableegoldman
There was a problem hiding this comment.
This is a meta thought orthogonal to this PR, but I'd leave it here and then create a JIRA ticket later: The StateStore interface is really designed to be "implemented by users, and called by library" -- maybe one can argue that users may want to explicitly call flush as well, but I'd say it is okay to NOT allow users do so. And the extended interface, like KeyValueStore are designed to be "implemented by library or user-customizable, and called by user".
Given this thought, we should probably consider refactoring the hierarchy of state stores more as:
interface StateStoreinterface ReadOnlyKeyValueStoreinterface KeyValueStateStore extends StateStore, ReadOnlyKeyValueStore(note the name difference)interface KeyValueStore extends ReadOnlyKeyValueStore.
And 3) would be the one used in Stores factory, i.e. required for either built-in or user-customization, while 4) would be the one returned from ProcessorContext.getStateStore. The returned object would, in fact, always be extending StateStore as well but only library will care to cast and call its inner functions.
I'm not sure if there's a better way to do this in order to get rid of all such decorators, @vvcephei wdyt?
There was a problem hiding this comment.
While I completely agree (all the "decorators" are only necessary because we got the API wrong, leaking "internal" API like flush(), close() etc into PAPI).
However, I this possible to refactor without breaking the public API?
There was a problem hiding this comment.
I agree 100%, and would very much like to see something like this. It would also help us to avoid leaking other administrative methods if we add them (besides just init, flush, close).
I think cleanest the way to do it backwards compatibly is to introduce a new store hierarchy and deprecate the old one.
There was a problem hiding this comment.
I think cleanest the way to do it backwards compatibly is to introduce a new store hierarchy and deprecate the old one.
I am afraid that might be the only way to do it -- deprecation hell -- users won't be happy -- thus we should think hard if it's worth to do or not...
There was a problem hiding this comment.
Cool, I will create a JIRA ticket for now to keep track of it.
There was a problem hiding this comment.
Not introduced in this PR but: why do we still need to override these two functions? Should the ones of WrappedStore be sufficient?
There was a problem hiding this comment.
It was an oversight. I'll remove the pass-though overrides.
There was a problem hiding this comment.
Since super.init is innerState.init(context, root); could we just call that to be consistent with other calls, or are you intentionally call this in case we change the implementation of wrappedstore.init in the future?
There was a problem hiding this comment.
yes, it would be incorrect design to call wrappedStore().init directly. The job of WrappedStateStore.init is to "do the right thing". If we go and look at what it does today, and just do that thing, it'll work only until the superclass's implementation changes.
Another way to look at it is that we are "intercepting" the init call, to add some extra stuff before and after the init. This doesn't mean we get to interrupt the call hierarchy and circumvent our parent class, nor do we have any reason to do so.
Note that "the other calls" you refer to are not StateStore calls, they are WindowStore calls. And our superclass doesn't implement WindowStore, so it's not possible to call (eg) super.put(key, value, timestamp).
guozhangwang
left a comment
There was a problem hiding this comment.
Made a pass over the PR, lgtm!
I only have some questions for my own understanding; after that I can merge this as-is.
There was a problem hiding this comment.
This is the same as GlobalStateManagerImpl.java -- maybe add method AbstractStateManager#recordConverter(store) to unify code?
There was a problem hiding this comment.
nit: rawValueToTimestampedValueRecordConvert() -- without RecordConverter suffix, it's unclear what it actually returns, a RecordConverter
There was a problem hiding this comment.
nit: identityRecordConverter or noOpRecordConverter ?
There was a problem hiding this comment.
collapsed them into one abstract class
Class is not declared as abstract -- is this intentionally or did it slip? IMHO, it should be abstract (should the name indicate this, too? -> AbstractedWrappedStateStore?)
There was a problem hiding this comment.
I think, we should first check for this condition, because we should only check the most inner store -- if an wrapping store would (be mistake) implement TimestampedBytesStore, we would return true even if the most inner store does not -- this would be incorrect.
There was a problem hiding this comment.
Interesting thought.
Let's say there's a wrapper that advertises that it's a TimestampedBytesStore. This means it is telling Streams that it expects to receive serialized data in the new (with-timestamp) format.
Why does it do this? Maybe it's a translation layer for other stores? In which case, is it correct for Streams to second-guess the implementation and break its own contract by ignoring the marker interface and delivering non-timestamped binary data?
Or maybe, as you suggest, it's just a bug. Is it best for Streams to try and work around a future bug in a state store implementation by reasoning about relative roles of wrapper stores and bytes stores?
I'm not sure... It sounds like it would be more future-proof to just take the store's interface at face value.
There was a problem hiding this comment.
Why does it do this? Maybe it's a translation layer for other stores? In which case, is it correct for Streams to second-guess the implementation and break its own contract by ignoring the marker interface and delivering non-timestamped binary data?
I don't think that would work. Note, on restore, we always get the most inner store and would not call this "translation layer wrapper store" (and thus it would break as we would insert our converter and hand timestamped-bytes to the store that does not understand them). If one want to implement a translation wrapper like this, she need to "hide" it from Kafka Streams and not implement WrappingStore (ie, the translation wrapper must be the most inner store).
There was a problem hiding this comment.
Huh... How do we get the most inner store on restore?
There was a problem hiding this comment.
WrappedStore should have a method root() (or similar name) that returns the most inner store.
There was a problem hiding this comment.
As far as I can tell, this method would actually never be needed in the existing code base.
I agree with the naming of it (and the location in the code), but can we defer adding it until it's needed?
To address the earlier question, I believe that the state restoration logic just calls whatever restore listener has been registered. It turns out that the root store is the one to register the listener, but this doesn't require actually unwrapping anything (since it is done within context).
There was a problem hiding this comment.
This is ugly -- we should have access to the wrapped store directly via inheritance. (also below)
There was a problem hiding this comment.
This is ugly -- we should have access to the wrapped store directly via inheritance. (also below)
There was a problem hiding this comment.
This is ugly -- we should have access to the wrapped store directly via inheritance. (also below)
There was a problem hiding this comment.
While I completely agree (all the "decorators" are only necessary because we got the API wrong, leaking "internal" API like flush(), close() etc into PAPI).
However, I this possible to refactor without breaking the public API?
|
I've addressed all the comments and rebased to resolve a merge conflict. |
|
rebased again to resolve merge conflicts |
|
also responded to the open questions |
|
Java 8 failure was unrelated: Java 11 passed: Retest this, please. |
|
@guozhangwang I am ok with the PR. Feel free to merge. |
|
Java 8 build failed (unrelated): Not sure if there's any point continuing to repeat the tests, hoping that the flaky core tests will pass... |
|
Thanks for the reviews, @mjsax and @guozhangwang ! |
Refactor internal store wrapping for improved maintainability. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
|
Cherry-picked to 2.2. |
* ak/trunk: (45 commits) KAFKA-7487: DumpLogSegments misreports offset mismatches (apache#5756) MINOR: improve JavaDocs about auto-repartitioning in Streams DSL (apache#6269) KAFKA-7935: UNSUPPORTED_COMPRESSION_TYPE if ReplicaManager.getLogConfig returns None (apache#6274) KAFKA-7895: Fix stream-time reckoning for suppress (apache#6278) KAFKA-6569: Move OffsetIndex/TimeIndex logger to companion object (apache#4586) MINOR: add log indicating the suppression time (apache#6260) MINOR: Make info logs for KafkaConsumer a bit more verbose (apache#6279) KAFKA-7758: Reuse KGroupedStream/KGroupedTable with named repartition topics (apache#6265) KAFKA-7884; Docs for message.format.version should display valid values (apache#6209) MINOR: Save failed test output to build output directory MINOR: add test for StreamsSmokeTestDriver (apache#6231) MINOR: Fix bugs identified by compiler warnings (apache#6258) KAFKA-6474: Rewrite tests to use new public TopologyTestDriver [part 4] (apache#5433) MINOR: fix bypasses in ChangeLogging stores (apache#6266) MINOR: Make MockClient#poll() more thread-safe (apache#5942) MINOR: drop dbAccessor reference on close (apache#6254) KAFKA-7811: Avoid unnecessary lock acquire when KafkaConsumer commits offsets (apache#6119) KAFKA-7916: Unify store wrapping code for clarity (apache#6255) MINOR: Add missing Alter Operation to Topic supported operations list in AclCommand KAFKA-7921: log at error level for missing source topic (apache#6262) ...
Refactor internal store wrapping for improved maintainability. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Refactor internal store wrapping for improved maintainability.
Committer Checklist (excluded from commit message)