KAFKA-6936: Implicit Materialized for aggregates#5066
KAFKA-6936: Implicit Materialized for aggregates#5066guozhangwang merged 1 commit intoapache:trunkfrom
Conversation
|
Regarding the following ..
For the |
|
@debasishg How can you propagate the Serde if the resulting type of the |
|
@debasishg We are removing the deprecated APIs in Java that only takes a serde as parameters, so now aggregate operators only have two overloaded variant: with and without a @joan38 Currently there is no semantics difference between the two overloaded functions with and without the |
There was a problem hiding this comment.
Is it intentional to put them in a single line? Multi-lines are more human readable.
There was a problem hiding this comment.
No, this change has been introduced by my IDE and doesn't make sense. I will revert it.
c90332e to
1cd7718
Compare
guozhangwang
left a comment
There was a problem hiding this comment.
This is a meta comment: to make aggregate reduce and count API consistency, I'd still prefer either maintain two overloads for each, or one for each, instead of two for reduce and count and one for aggregate. Personally I'd prefer two for each as it will benefit in the future to optimize away physical materialization, but I'm not sure if we can disambiguate aggregate(..)(implicit Materialized) with aggregate(..), is it possible?
On the other we could consider making reduce and count also only having one overload function. E.g. in KGroupedTable:
def count(implicit materialized: Materialized[K, Long, ByteArrayKeyValueStore]): KTable[K, Long]
def reduce(adder: (V, V) => V)(subtractor: (V, V) => V)(implicit materialized: Materialized[K, V, ByteArrayKeyValueStore]): KTable[K, V]
Note for reduce I used the same pattern in the other PR trying to leverage on type inference, but again I'm not 100% sure if it would work.
@debasishg @joan38 wdyt?
|
@guozhangwang I agree in principle. I am on a vacation right now and don't have the infrastructure to code. It would be good if @joan38 could verify your concerns. |
|
@guozhangwang the first way of doing is what I tried, unfortunately it's not able to desambiguate the 2. I think the other way around unifying |
c95686a to
fa72082
Compare
|
Should we do |
df38a90 to
7f7be17
Compare
|
@guozhangwang @debasishg Any news on this? |
|
Sorry for the late reply. If we cannot desambiguate the two overloaded functions for I'd prefer |
|
Ok so the first point is done. I will revert back to Thanks |
|
Note the jenkins failures are relevant: |
|
@guozhangwang The errors are fixed now. This should be ready to go I guess? |
guozhangwang
left a comment
There was a problem hiding this comment.
One nit comment, otherwise LGTM.
There was a problem hiding this comment.
We should call count() here?
There was a problem hiding this comment.
Arf, missed this one. Thanks for picking it
|
Thanks @guozhangwang |
…ache#5066) In apache#4919 we propagate the SerDes for each of these aggregation operators. As @guozhangwang mentioned in that PR: ``` reduce: inherit the key and value serdes from the parent XXImpl class. count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes. aggregate: inherit the key serdes, do not set for value serdes internally. ``` Although it's all good for reduce and count, it is quiet unsafe to have aggregate without Materialized given. In fact I don't see why we would not give a Materialized for the aggregate since the result type will always be different (otherwise use reduce) and also the value Serde is simply not propagated. This has been discussed previously in a broader PR before but I believe for aggregate we could pass implicitly a Materialized the same way we pass a Joined, just to avoid the stupid case. Then if the user wants to specialize, he can give his own Materialized. Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
In #4919 we propagate the SerDes for each of these aggregation operators.
As @guozhangwang mentioned in that PR:
Although it's all good for
reduceandcount, it is quiet unsafe to haveaggregatewithoutMaterializedgiven. In fact I don't see why we would not give aMaterializedfor theaggregatesince the result type will always be different (otherwise usereduce) and also the value Serde is simply not propagated.This has been discussed previously in a broader PR before but I believe for
aggregatewe could pass implicitly aMaterializedthe same way we pass aJoined, just to avoid the stupid case. Then if the user wants to specialize, he can give his ownMaterialized.@guozhangwang @debasishg @seglo Let me know your thoughts.
Committer Checklist (excluded from commit message)