Kafka 3522 [WIP DO NOT MERGE] #6192

Closed
mjsax wants to merge 1 commit into apache:trunk from mjsax:kafka-3522-rocksdb-format-column-family-reduced-public-api

Conversation

@mjsax (Member) commented Jan 24, 2019

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

};
}

public static class WindowStoreFacade<A, B> implements WindowStore<A, B>, RecordConverter {
Contributor:

QQ:

  1. Does this class need to be a public static one?
  2. Can it extend from ReadOnlyWindowStoreFacade?

Contributor:

I guess in the end we will find a better place for these classes (currently they are a bit scattered; e.g., KeyValueStoreFacade is in the materializer class).

Member Author:

I guess so -- this PR is a POC and hacked together. I would never recommend merging it... There are many issues like this.

public WindowStoreFacade(final WindowStore<A, ValueAndTimestamp<B>> store) {
this.inner = store;
final StateStore rootStore = store instanceof WrappedStateStore ? ((WrappedStateStore) store).inner() : store;
innerRecordConvert = rootStore instanceof RecordConverter ? (RecordConverter) rootStore : new DefaultRecordConverter();
Contributor:

This makes me think about WrappedStateStore, in which we do not check inner but blindly assume it always implements this interface:

        public ConsumerRecord<byte[], byte[]> convert(final ConsumerRecord<byte[], byte[]> record) {
            return ((RecordConverter) innerState).convert(record);
        }

Is that safe? Could it be a non-caching, non-logging store (i.e. only one layer of wrapper on metered, where the inner is already the inner-most underlying store, which is a customized store not implementing the interface)?
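The defensive pattern from the WindowStoreFacade constructor quoted above can be contrasted with the blind cast being questioned here. A minimal sketch (with simplified stand-in types, not the actual Kafka Streams code): fall back to a default converter when the inner store does not implement RecordConverter, instead of casting unconditionally.

```java
// Simplified stand-ins for the real Kafka Streams interfaces.
interface RecordConverter {
    byte[] convert(byte[] record);
}

class DefaultRecordConverter implements RecordConverter {
    @Override
    public byte[] convert(final byte[] record) {
        return record; // pass-through for stores that need no conversion
    }
}

public class SafeConvertSketch {
    // Guarded lookup: never casts an object that is not a RecordConverter.
    static RecordConverter converterFor(final Object innerState) {
        return innerState instanceof RecordConverter
            ? (RecordConverter) innerState
            : new DefaultRecordConverter();
    }

    public static void main(final String[] args) {
        // A customized store that does not implement RecordConverter:
        final Object customStore = new Object();
        System.out.println(converterFor(customStore) instanceof DefaultRecordConverter); // true
    }
}
```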

Member Author:

That could be the case -- this part of the code is not extensively tested yet. I was aware of it already on the old "big PR" and planned to address it in the smaller PRs when we actually add upgrade tests etc.

metrics = (StreamsMetricsImpl) context.metrics();

otherWindow = (WindowStore<K, V2>) context.getStateStore(otherWindowName);
StateStore store = context.getStateStore(otherWindowName);
Contributor:

Hmm... we are peeling off two layers here. The second layer is the window-store facade; what's the first layer of WrappedStateStore?

Ditto elsewhere.

Member Author:

For some stores it's sometimes one layer, sometimes multiple. Thus, this is a guard to get the right store in any case -- I use the same code for all DSL processors for simplicity in this PR. Might not be necessary for all processors.
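The "guard for any number of layers" idea can be sketched as a small unwrapping loop. This is a minimal illustration with simplified stand-in types (WrapperStore is a hypothetical stand-in for WrappedStateStore), not the actual PR code:

```java
// Simplified stand-ins for the real store types.
interface StateStore {
}

class WrapperStore implements StateStore {
    private final StateStore inner;

    WrapperStore(final StateStore inner) {
        this.inner = inner;
    }

    StateStore inner() {
        return inner;
    }
}

public class UnwrapSketch {
    // Peel off wrapper layers (caching, logging, metering, facade, ...)
    // until the underlying store is reached, however many there are.
    static StateStore unwrap(StateStore store) {
        while (store instanceof WrapperStore) {
            store = ((WrapperStore) store).inner();
        }
        return store;
    }

    public static void main(final String[] args) {
        final StateStore root = new StateStore() { };
        // Works for one wrapper layer or several:
        System.out.println(unwrap(new WrapperStore(root)) == root); // true
        System.out.println(unwrap(new WrapperStore(new WrapperStore(root))) == root); // true
    }
}
```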

final ValueAndTimestamp<V> parentValueAndTimestamp = parentGetter.get(key);
final V parentValue = parentValueAndTimestamp == null ? null : parentValueAndTimestamp.value();
final V1 resultValue = computeValue(key, parentValue);
return resultValue == null ? null : ValueAndTimestamp.make(resultValue, parentValueAndTimestamp.timestamp()); // this is a potential NPE -- if `parentValueAndTimestamp == null` we could also directly return `null`, but this would be a semantic change
Contributor:

Why can't we return -1 if the timestamp is null like we did in other operators?

Member Author:

The comment was for the old code. Not sure if it still applies.

However, the concern was about returning ValueAndTimestamp(null, -1) instead of null -- I think the former would be incorrect.
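The NPE flagged in the code comment above can be made concrete with a small sketch (simplified stand-in types, not the PR's actual classes): if computeValue() can map a null parent value to a non-null result, the original expression dereferences a null parentValueAndTimestamp. The guarded variant below falls back to -1 ("unknown") for the timestamp in that case, while still returning null -- not ValueAndTimestamp(null, -1) -- when the result value is null.

```java
public class NpeSketch {
    static final long UNKNOWN = -1L;

    static class ValueAndTimestamp {
        final String value;
        final long timestamp;

        ValueAndTimestamp(final String value, final long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Example of a computation that turns a null parent into a non-null
    // result -- this is what makes the NPE possible in the original code.
    static String computeValue(final String key, final String parentValue) {
        return parentValue == null ? "default" : parentValue;
    }

    static ValueAndTimestamp get(final String key, final ValueAndTimestamp parent) {
        final String parentValue = parent == null ? null : parent.value;
        final String resultValue = computeValue(key, parentValue);
        if (resultValue == null) {
            return null; // keep returning null, not ValueAndTimestamp(null, -1)
        }
        // Only read the parent's timestamp when the parent exists.
        final long timestamp = parent == null ? UNKNOWN : parent.timestamp;
        return new ValueAndTimestamp(resultValue, timestamp);
    }

    public static void main(final String[] args) {
        System.out.println(get("k", null).timestamp); // -1, no NPE
    }
}
```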

R newValue = null;
final V1 value1 = valueGetter1.get(key);
final V2 value2 = valueGetter2.get(key);
long resultTimestamp = -1;
Contributor:

nit: I'd suggest using a constant instead of the hard-coded -1; we could reuse RecordQueue.UNKNOWN, e.g.

Member Author:

It's a POC :)

resultTimestamp = oldAggWithTimestamp.timestamp();
} else {
oldAgg = null;
resultTimestamp = context().timestamp();
Contributor:

In other places we use -1, whereas here we use context.timestamp(). Could you elaborate a bit on the principle applied here?

Member Author:

We use context.timestamp() because resultTimestamp = Math.max(resultTimestamp, context().timestamp()); with resultTimestamp == -1 is the exact same thing. Compare the lines below: if we init a new value, the result timestamp is the timestamp of the current input record (no need to use Math.max(-1, context().timestamp()) to compute it).
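The equivalence claimed above can be checked directly: starting from -1 ("unknown") and taking Math.max with the record timestamp yields the record timestamp itself, as long as record timestamps are non-negative. A quick sketch (the timestamp value is an arbitrary stand-in for context().timestamp()):

```java
public class TimestampInitSketch {
    static final long UNKNOWN = -1L; // stand-in for RecordQueue.UNKNOWN

    public static void main(final String[] args) {
        final long recordTimestamp = 42L; // stand-in for context().timestamp()
        // Math.max(-1, ts) == ts for any non-negative ts, so initializing
        // directly with the record timestamp is equivalent.
        System.out.println(Math.max(UNKNOWN, recordTimestamp) == recordTimestamp); // true
    }
}
```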

* @return StoreBuilder
*/
public StoreBuilder<KeyValueStore<K, V>> materialize() {
KeyValueBytesStoreSupplier supplier = (KeyValueBytesStoreSupplier) materialized.storeSupplier();
Contributor:

This looks a bit suspicious to me: users could also call Stores.persistentKeyValueWithTimestampStore and pass the returned object to Materialized, in which case V is already a ValueAndTimestamp<V1>, right?

Member Author:

Yes. In fact, we expect the supplier to return a store that can handle ValueAndTimestamp.

However, we need the return type of the supplier to be a byte store -- and thus we cannot easily distinguish between non-timestamped byte stores and timestamped byte stores. However, we cover this case at an outer level and wrap a plain key-value store with a proxy if RecordConverter is not implemented (which is only an indicator).

In the end, because we want all inner-most stores to be plain byte stores, we can't distinguish them -- however, if we exclude the upgrade of an existing store into a new store, both would work for both cases, because both only take bytes and only use the key for lookups -- basically, both stores are agnostic to whether the value bytes are in the one format or the other.
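The "RecordConverter as an indicator, wrap with a proxy otherwise" idea described above can be sketched as follows. All names and the proxy's behavior here are simplified assumptions for illustration, not the actual Kafka Streams implementation:

```java
// Simplified stand-in for a plain byte store.
interface BytesStore {
    void put(byte[] key, byte[] value);
}

// Marker interface: indicates the store already understands the
// timestamped value format (it carries no behavior in this sketch).
interface TimestampedIndicator {
}

class TimestampedProxyStore implements BytesStore, TimestampedIndicator {
    private final BytesStore inner;

    TimestampedProxyStore(final BytesStore inner) {
        this.inner = inner;
    }

    @Override
    public void put(final byte[] key, final byte[] value) {
        // A real proxy would translate between the plain and timestamped
        // value formats here; this sketch just delegates.
        inner.put(key, value);
    }
}

public class ProxyWrapSketch {
    // Outer-level handling: wrap only stores that lack the indicator.
    static BytesStore maybeWrap(final BytesStore store) {
        return store instanceof TimestampedIndicator ? store : new TimestampedProxyStore(store);
    }

    public static void main(final String[] args) {
        final BytesStore plainStore = (key, value) -> { };
        // The plain store gets wrapped; the result now carries the indicator.
        System.out.println(maybeWrap(plainStore) instanceof TimestampedIndicator); // true
    }
}
```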

@mjsax mjsax added the streams label Mar 7, 2019
@mjsax mjsax closed this Mar 7, 2019
@mjsax mjsax deleted the kafka-3522-rocksdb-format-column-family-reduced-public-api branch March 9, 2019 20:32