Skip to content

KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior.#6285

Merged
bbejeck merged 2 commits intoapache:trunkfrom
asutosh936:KAFKA-7492
Feb 22, 2019
Merged

KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior.#6285
bbejeck merged 2 commits intoapache:trunkfrom
asutosh936:KAFKA-7492

Conversation

@asutosh936
Copy link
Copy Markdown
Contributor

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@asutosh936
Copy link
Copy Markdown
Contributor Author

@mjsax - Have take a stab updating the documentation. Kindly review.

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @asutosh936.

Call for second review @guozhangwang @bbejeck @vvcephei @ableegoldman

* @param reducer a {@link Reducer} that computes a new aggregate result. Cannot be {@code null}.
* @return a {@link KTable} that contains "update" records with unmodified keys, and values that represent the
* latest (rolling) aggregate for each key
* latest (rolling) aggregate for each key. Incase {@link KTable} returns null the following message will be
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: In case

nit: null -> {@code null}

Similar comments below.

What do you mean by KTable returns null ? Do you mean Reducer#apply(..) returns null?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this: If the reduce function returns null, it is then interpreted as deletion for the key, and future messages of the same key coming from upstream operators will be handled as newly initialized value.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention the "problem" with out-of-order data for this case? Should we ever recommend to not return null ?

We had a discussion at some point to actually disallow returning null because a "delete" is not a valid aggregation result.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nuanced topic. It would be good to clarify if we can find a good way to say it.

The operator itself never processes data out of order. It always respects the order defined by the topic. But upstream repartitions don't provide any guarantees about the order they write to the topic. This is how data can logically become out of order, even if users take care to populate their input topics strictly ordered, which might in turn violate some expectations with nulls in the picture.

I'm concerned that if we start saying, "Kafka Streams processes data out of order", with no context, it'll create FUD. We should be up front about the limits of the system, but it doesn't benefit anyone to create the impression that we don't guarantee things that we actually do guarantee.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I'm recommending that we specifically say something like "Producing deletes from your aggregations may cause unexpected results when processing dis-ordered data. Streams always processes data in the order it appears in the topic. If the topic is populated out of order, you may have late arriving records, which can cause records to become unexpectedly re-created after they have been deleted. Out-of-order data can be a problem for non-deleting aggregation functions as well, but it's especially surprising with aggregations that produce deletes."

:/ ... you see what I mean by saying that it's a nuanced topic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's nuanced. (Don't agree with all you say above -- omit details for now)

Seems we get out-of-scope for this PR... Maybe JavaDocs are not the right place to document this anyway?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that javadocs are not the best place to document out-of-ordering here.

* @return a {@link KTable} that contains "update" records with unmodified keys, and values that represent the
* latest (rolling) aggregate for each key
* latest (rolling) aggregate for each key. Incase {@link KTable} returns null the following message will be
* processed as if it was the first message.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: was -> were (also below)

…ing null behavior. Implemented review comments
@asutosh936
Copy link
Copy Markdown
Contributor Author

Updated comment as per @guozhangwang.

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @asutosh936 LGTM

@guozhangwang
Copy link
Copy Markdown
Contributor

LGTM!

@bbejeck bbejeck merged commit 93ba962 into apache:trunk Feb 22, 2019
@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Feb 22, 2019

merged #6285 to trunk

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 23, 2019

@asutosh936 What is your Jira Account so we can assign the ticket to you?

@asutosh936
Copy link
Copy Markdown
Contributor Author

@mjsax - Have given comment on Jira. I can assign the ticket to myself if needed.

jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* AK/trunk: (36 commits)
  KAFKA-7962: Avoid NPE for StickyAssignor (apache#6308)
  Address flakiness of CustomQuotaCallbackTest#testCustomQuotaCallback (apache#6330)
  KAFKA-7918: Inline generic parameters Pt. II: RocksDB Bytes Store and Memory LRU Caches (apache#6327)
  MINOR: fix parameter naming (apache#6316)
  KAFKA-7956 In ShutdownableThread, immediately complete the shutdown if the thread has not been started (apache#6218)
  MINOR: Refactor replica log dir fetching for improved logging (apache#6313)
  [TRIVIAL] Remove unused StreamsGraphNode#repartitionRequired (apache#6227)
  MINOR: Increase produce timeout to 120 seconds (apache#6326)
  KAFKA-7918: Inline generic parameters Pt. I: in-memory key-value store (apache#6293)
  MINOR: Fix line break issue in upgrade notes (apache#6320)
  KAFKA-7972: Use automatic RPC generation in SaslHandshake
  MINOR: Enable capture of full stack trace in StreamTask#process (apache#6310)
  KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest (apache#6312)
  KAFKA-7937: Fix Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup (apache#6311)
  MINOR: Update docs to say 2.2 (apache#6315)
  KAFKA-7672 : force write checkpoint during StreamTask #suspend (apache#6115)
  KAFKA-7961; Ignore assignment for un-subscribed partitions (apache#6304)
  KAFKA-7672: Restoring tasks need to be closed upon task suspension (apache#6113)
  KAFKA-7864; validate partitions are 0-based (apache#6246)
  KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior. (apache#6285)
  ...
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…ing null behavior. (apache#6285)

This is an update to the existing javadocs for KGroupedStream class.

Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>,  John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants