Skip to content

MINOR: add warning for long grace period for suppress#6260

Merged
mjsax merged 1 commit intoapache:trunkfrom
vvcephei:suppress-grace-warning
Feb 17, 2019
Merged

MINOR: add warning for long grace period for suppress#6260
mjsax merged 1 commit intoapache:trunkfrom
vvcephei:suppress-grace-warning

Conversation

@vvcephei
Copy link
Copy Markdown
Contributor

Unfortunately, the default grace period is 24 hours, soif people don't explicitly set
the grace period before Suppress.untilWindowCloses, the suppression would
appear to do nothing for a long time.

My advice is to always explicitly set the grace period when using final-results
suppression, but there's no clean way to enforce it.
Maybe a warning is good enough?

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@vvcephei
Copy link
Copy Markdown
Contributor Author

@mjsax @bbejeck ,

This sharp edge has been making me uncomfortable for a while. What do you think about this?

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, left a few minor comments

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suspiciously is a little strong. Maybe drop it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe display the grace period in minutes so users can immediately grok the time frame without needing to convert to minutes themselves?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line actually prints in ISP-8601 format, which breaks down hours, minutes, seconds, and sub-seconds.

https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html#toString--

I had the same thought.

Comment thread streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java Outdated
Comment thread streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java Outdated
@vvcephei
Copy link
Copy Markdown
Contributor Author

Good feedback, @bbejeck . Thanks for the review!

@vvcephei vvcephei changed the title MINOR: add warning for unreasonably long grace period for suppress MINOR: add warning for long grace period for suppress Feb 12, 2019
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be >= ?

Also, it seems to be very subjective to make the cut-off point 1h -- why not 30 minutes? Or 10 minutes? It might be better to have a way to distinguish between the case for which we run with default grace period or with user specified and log only for the default case?

Having said this, if we can distinguish both cases, we could even throw here... (but this would be a public behavior change and requires a KIP -- don't think we could consider it a bug fix, could we?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to present a warning if the grace period is long enough to cause usability problems with Suppress. 1 hour is arbitrary; 10 minutes seems too short.

As you point out, an alternative to making a judgement about what is "long enough to cause usability problems" is just to warn if the grace period isn't explicitly set. There is a default grace period, which is max(24 hours, window size). There's no technical issue using this default grace period in conjunction with Suppress. If the default grace period were more punctual, like zero, or 10 minutes, there would be no motivation to warn. Thus, the motivation reduces to "the default grace period is long enough to cause usability problems with suppress". In other words, warning on default grace is just as subjective as warning on any other arbitrary grace period that "seems too long".

Maybe a less subjective approach would be to log an info level message simply indicating what amount of time we are configuring suppression to wait. This would still address my concern, the first-15-minutes experience wherein you plug in suppress, hit "go", and nothing happens. Plus, it provides valuable operational information, namely what is the resolved suppression time for the operator: since the value is computed, it would be time-saving to just print out the actual computed value, for our users benefit when debugging the behavior of their applications.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where you are going with this.

As you point out, an alternative to making a judgement about what is "long enough to cause usability problems" is just to warn if the grace period isn't explicitly set.

So you agree to update the code accordingly or not? If not, why?

In other words, warning on default grace is just as subjective as warning on any other arbitrary grace period that "seems too long".

I don't think so, because as you mention as "first-15-minutes experience" could be very bad with default settings. Thus, if user want 24h, they can either ignore the warning or overwrite the default with a custom 24h to get rid of the warning (I would expect most users to set a much smaller value anyway, thus I don't think it would have negative impact on UX).

Maybe a less subjective approach would be to log an info level message simply indicating what amount of time we are configuring suppression to wait.

Sounds like a good idea -- I would only WARN if there is something to warn about -- also atm, the code checks for >, thus it does not warn with default value and is also warn if a user sets it to an higher value (it seems weird to warn a user about an explicit setting -- there in in generally nothing wrong with a very large grace period)

My suggestion would be, do an INFO log about the actual used value, and maybe do a WARN if default is not overwritten.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the info log is what I was proposing, and I've just pushed the change. Can you take another look?

Thanks,
-John

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this PR against 2.2 but not trunk ?

LGTM.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a side remark: I guess with KIP-307 this can be removed, too?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an internal interface, might it make sense to NamedSuppressed extends Suppressed such that both internal sub-classed inherit only NamedSuppressed (ie, this insures that we don't forget to inherit both interfaces?) -- just a thought -- it's of course less compossible than.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, KIP-307 would make this redundant as well.

You have a good point about making it extend suppressed; I didn't put too much thought into it, as it's just a temporary band-aid with a really limited surface area.

I'll make this change.

@vvcephei
Copy link
Copy Markdown
Contributor Author

@mjsax I targeted 2.2 just so that we can get the usability problem resoved ASAP.

This has been a pretty common sticking point so far, and I'm hoping we can get out in front of a huge number of people running into trouble with this.

I guess we should go ahead and cherry-pick to 2.1 as well, so it'll be in the next bugfix release...

Do you want me to retarget to trunk?

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 14, 2019

We usually merge to trunk and cherry-pick to older branches. So yet, please redirect as it should go into trunk, too (to keep a "clean" workflow). Thanks.

Copy link
Copy Markdown
Member

@mjsax mjsax Feb 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sufficient to implement NameSuppressed now only

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is redundant now, but would it be better to leave them separate?

NamedSuppressed is a bandage just until there's some other interface that guarantees a name() method, and NamedSuppressed only extends Suppressed to guarantee it won't be misused for other config classes.

Given both of these, listing them both here means that we can just slice out the NamedSuppressed when the time comes.

Not a big deal, though, just sharing my thoughts. I'm happy to make the change if you prefer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's internal. So should be fine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

@vvcephei vvcephei changed the base branch from 2.2 to trunk February 15, 2019 21:45
@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Feb 15, 2019

@vvcephei can you rebase this? Then we can merge it.

@vvcephei
Copy link
Copy Markdown
Contributor Author

Ok, it's rebased now.

Ideally, we would cherry-pick this to 2.1 and 2.2

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 16, 2019

Java8 failed (env issue). Java11 passed.

Retest this please.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 17, 2019

https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/19521/testReport/kafka.api/AdminClientIntegrationTest/testMinimumRequestTimeouts/

org.junit.runners.model.TestTimedOutException: test timed out after 120000 milliseconds
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1429)
	at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1071)

Java 11 failed with env errror.

Retest this please.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 17, 2019

Java 11 passed.

https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/19532/

org.junit.runners.model.TestTimedOutException: test timed out after 120000 milliseconds
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
	at kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1429)
	at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1071)

Because we got one pass for Java8 and Java11 (even if not at the same time), I am merging this now.

@mjsax mjsax merged commit bc4fed3 into apache:trunk Feb 17, 2019
mjsax pushed a commit that referenced this pull request Feb 17, 2019
Reviewer: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 17, 2019

Merged to trunk and cherry-picked to 2.2

@vvcephei vvcephei deleted the suppress-grace-warning branch February 18, 2019 16:00
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* ak/trunk: (45 commits)
  KAFKA-7487: DumpLogSegments misreports offset mismatches (apache#5756)
  MINOR: improve JavaDocs about auto-repartitioning in Streams DSL (apache#6269)
  KAFKA-7935: UNSUPPORTED_COMPRESSION_TYPE if ReplicaManager.getLogConfig returns None (apache#6274)
  KAFKA-7895: Fix stream-time reckoning for suppress (apache#6278)
  KAFKA-6569: Move OffsetIndex/TimeIndex logger to companion object  (apache#4586)
  MINOR: add log indicating the suppression time (apache#6260)
  MINOR: Make info logs for KafkaConsumer a bit more verbose (apache#6279)
  KAFKA-7758: Reuse KGroupedStream/KGroupedTable with named repartition topics (apache#6265)
  KAFKA-7884; Docs for message.format.version should display valid values (apache#6209)
  MINOR: Save failed test output to build output directory
  MINOR: add test for StreamsSmokeTestDriver (apache#6231)
  MINOR: Fix bugs identified by compiler warnings (apache#6258)
  KAFKA-6474: Rewrite tests to use new public TopologyTestDriver [part 4] (apache#5433)
  MINOR: fix bypasses in ChangeLogging stores (apache#6266)
  MINOR: Make MockClient#poll() more thread-safe (apache#5942)
  MINOR: drop dbAccessor reference on close (apache#6254)
  KAFKA-7811: Avoid unnecessary lock acquire when KafkaConsumer commits offsets (apache#6119)
  KAFKA-7916: Unify store wrapping code for clarity (apache#6255)
  MINOR: Add missing Alter Operation to Topic supported operations list in AclCommand
  KAFKA-7921: log at error level for missing source topic (apache#6262)
  ...
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
Reviewer: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants