KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior.#6285
KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior.#6285bbejeck merged 2 commits intoapache:trunkfrom
Conversation
…ing null behavior.
|
@mjsax - Have take a stab updating the documentation. Kindly review. |
mjsax
left a comment
There was a problem hiding this comment.
Thanks for the PR @asutosh936.
Call for second review @guozhangwang @bbejeck @vvcephei @ableegoldman
| * @param reducer a {@link Reducer} that computes a new aggregate result. Cannot be {@code null}. | ||
| * @return a {@link KTable} that contains "update" records with unmodified keys, and values that represent the | ||
| * latest (rolling) aggregate for each key | ||
| * latest (rolling) aggregate for each key. Incase {@link KTable} returns null the following message will be |
There was a problem hiding this comment.
typo: In case
nit: null -> {@code null}
Similar comments below.
What do you mean by KTable returns null ? Do you mean Reducer#apply(..) returns null?
There was a problem hiding this comment.
How about this: If the reduce function returns null, it is then interpreted as deletion for the key, and future messages of the same key coming from upstream operators will be handled as newly initialized value.
There was a problem hiding this comment.
Should we mention the "problem" with out-of-order data for this case? Should we ever recommend to not return null ?
We had a discussion at some point to actually disallow returning null because a "delete" is not a valid aggregation result.
There was a problem hiding this comment.
It's a nuanced topic. It would be good to clarify if we can find a good way to say it.
The operator itself never processes data out of order. It always respects the order defined by the topic. But upstream repartitions don't provide any guarantees about the order they write to the topic. This is how data can logically become out of order, even if users take care to populate their input topics strictly ordered, which might in turn violate some expectations with nulls in the picture.
I'm concerned that if we start saying, "Kafka Streams processes data out of order", with no context, it'll create FUD. We should be up front about the limits of the system, but it doesn't benefit anyone to create the impression that we don't guarantee things that we actually do guarantee.
There was a problem hiding this comment.
In other words, I'm recommending that we specifically say something like "Producing deletes from your aggregations may cause unexpected results when processing dis-ordered data. Streams always processes data in the order it appears in the topic. If the topic is populated out of order, you may have late arriving records, which can cause records to become unexpectedly re-created after they have been deleted. Out-of-order data can be a problem for non-deleting aggregation functions as well, but it's especially surprising with aggregations that produce deletes."
:/ ... you see what I mean by saying that it's a nuanced topic.
There was a problem hiding this comment.
I agree it's nuanced. (Don't agree with all you say above -- omit details for now)
Seems we get out-of-scope for this PR... Maybe JavaDocs are not the right place to document this anyway?
There was a problem hiding this comment.
I'd agree that javadocs are not the best place to document out-of-ordering here.
| * @return a {@link KTable} that contains "update" records with unmodified keys, and values that represent the | ||
| * latest (rolling) aggregate for each key | ||
| * latest (rolling) aggregate for each key. Incase {@link KTable} returns null the following message will be | ||
| * processed as if it was the first message. |
There was a problem hiding this comment.
nit: was -> were (also below)
…ing null behavior. Implemented review comments
|
Updated comment as per @guozhangwang. |
bbejeck
left a comment
There was a problem hiding this comment.
Thanks for the contribution @asutosh936 LGTM
|
LGTM! |
|
merged #6285 to trunk |
|
@asutosh936 What is your Jira Account so we can assign the ticket to you? |
|
@mjsax - Have given comment on Jira. I can assign the ticket to myself if needed. |
* AK/trunk: (36 commits) KAFKA-7962: Avoid NPE for StickyAssignor (apache#6308) Address flakiness of CustomQuotaCallbackTest#testCustomQuotaCallback (apache#6330) KAFKA-7918: Inline generic parameters Pt. II: RocksDB Bytes Store and Memory LRU Caches (apache#6327) MINOR: fix parameter naming (apache#6316) KAFKA-7956 In ShutdownableThread, immediately complete the shutdown if the thread has not been started (apache#6218) MINOR: Refactor replica log dir fetching for improved logging (apache#6313) [TRIVIAL] Remove unused StreamsGraphNode#repartitionRequired (apache#6227) MINOR: Increase produce timeout to 120 seconds (apache#6326) KAFKA-7918: Inline generic parameters Pt. I: in-memory key-value store (apache#6293) MINOR: Fix line break issue in upgrade notes (apache#6320) KAFKA-7972: Use automatic RPC generation in SaslHandshake MINOR: Enable capture of full stack trace in StreamTask#process (apache#6310) KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest (apache#6312) KAFKA-7937: Fix Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup (apache#6311) MINOR: Update docs to say 2.2 (apache#6315) KAFKA-7672 : force write checkpoint during StreamTask #suspend (apache#6115) KAFKA-7961; Ignore assignment for un-subscribed partitions (apache#6304) KAFKA-7672: Restoring tasks need to be closed upon task suspension (apache#6113) KAFKA-7864; validate partitions are 0-based (apache#6246) KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior. (apache#6285) ...
…ing null behavior. (apache#6285) This is an update to the existing javadocs for KGroupedStream class. Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
https://stackoverflow.com/questions/52692202/what-happens-if-the-aggregator-of-a-kgroupedstream-returns-null
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)