MINOR: add warning for long grace period for suppress#6260
MINOR: add warning for long grace period for suppress#6260mjsax merged 1 commit intoapache:trunkfrom vvcephei:suppress-grace-warning
Conversation
bbejeck
left a comment
There was a problem hiding this comment.
Overall looks good, left a few minor comments
There was a problem hiding this comment.
nit: suspiciously is a little strong. Maybe drop it?
There was a problem hiding this comment.
maybe display the grace period in minutes so users can immediately grok the time frame without needing to convert to minutes themselves?
There was a problem hiding this comment.
This line actually prints in ISP-8601 format, which breaks down hours, minutes, seconds, and sub-seconds.
https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html#toString--
I had the same thought.
|
Good feedback, @bbejeck . Thanks for the review! |
There was a problem hiding this comment.
Should this be >= ?
Also, it seems to be very subjective to make the cut-off point 1h -- why not 30 minutes? Or 10 minutes? It might be better to have a way to distinguish between the case for which we run with default grace period or with user specified and log only for the default case?
Having said this, if we can distinguish both cases, we could even throw here... (but this would be a public behavior change and requires a KIP -- don't think we could consider it a bug fix, could we?)
There was a problem hiding this comment.
The intent is to present a warning if the grace period is long enough to cause usability problems with Suppress. 1 hour is arbitrary; 10 minutes seems too short.
As you point out, an alternative to making a judgement about what is "long enough to cause usability problems" is just to warn if the grace period isn't explicitly set. There is a default grace period, which is max(24 hours, window size). There's no technical issue using this default grace period in conjunction with Suppress. If the default grace period were more punctual, like zero, or 10 minutes, there would be no motivation to warn. Thus, the motivation reduces to "the default grace period is long enough to cause usability problems with suppress". In other words, warning on default grace is just as subjective as warning on any other arbitrary grace period that "seems too long".
Maybe a less subjective approach would be to log an info level message simply indicating what amount of time we are configuring suppression to wait. This would still address my concern, the first-15-minutes experience wherein you plug in suppress, hit "go", and nothing happens. Plus, it provides valuable operational information, namely what is the resolved suppression time for the operator: since the value is computed, it would be time-saving to just print out the actual computed value, for our users benefit when debugging the behavior of their applications.
There was a problem hiding this comment.
Not sure where you are going with this.
As you point out, an alternative to making a judgement about what is "long enough to cause usability problems" is just to warn if the grace period isn't explicitly set.
So you agree to update the code accordingly or not? If not, why?
In other words, warning on default grace is just as subjective as warning on any other arbitrary grace period that "seems too long".
I don't think so, because as you mention as "first-15-minutes experience" could be very bad with default settings. Thus, if user want 24h, they can either ignore the warning or overwrite the default with a custom 24h to get rid of the warning (I would expect most users to set a much smaller value anyway, thus I don't think it would have negative impact on UX).
Maybe a less subjective approach would be to log an
infolevel message simply indicating what amount of time we are configuring suppression to wait.
Sounds like a good idea -- I would only WARN if there is something to warn about -- also atm, the code checks for >, thus it does not warn with default value and is also warn if a user sets it to an higher value (it seems weird to warn a user about an explicit setting -- there in in generally nothing wrong with a very large grace period)
My suggestion would be, do an INFO log about the actual used value, and maybe do a WARN if default is not overwritten.
There was a problem hiding this comment.
Ok, the info log is what I was proposing, and I've just pushed the change. Can you take another look?
Thanks,
-John
mjsax
left a comment
There was a problem hiding this comment.
Why is this PR against 2.2 but not trunk ?
LGTM.
There was a problem hiding this comment.
Just a side remark: I guess with KIP-307 this can be removed, too?
There was a problem hiding this comment.
If it's an internal interface, might it make sense to NamedSuppressed extends Suppressed such that both internal sub-classed inherit only NamedSuppressed (ie, this insures that we don't forget to inherit both interfaces?) -- just a thought -- it's of course less compossible than.
There was a problem hiding this comment.
Yep, KIP-307 would make this redundant as well.
You have a good point about making it extend suppressed; I didn't put too much thought into it, as it's just a temporary band-aid with a really limited surface area.
I'll make this change.
|
@mjsax I targeted 2.2 just so that we can get the usability problem resoved ASAP. This has been a pretty common sticking point so far, and I'm hoping we can get out in front of a huge number of people running into trouble with this. I guess we should go ahead and cherry-pick to 2.1 as well, so it'll be in the next bugfix release... Do you want me to retarget to trunk? |
|
We usually merge to |
There was a problem hiding this comment.
nit: sufficient to implement NameSuppressed now only
There was a problem hiding this comment.
It is redundant now, but would it be better to leave them separate?
NamedSuppressed is a bandage just until there's some other interface that guarantees a name() method, and NamedSuppressed only extends Suppressed to guarantee it won't be misused for other config classes.
Given both of these, listing them both here means that we can just slice out the NamedSuppressed when the time comes.
Not a big deal, though, just sharing my thoughts. I'm happy to make the change if you prefer.
There was a problem hiding this comment.
It's internal. So should be fine.
|
@vvcephei can you rebase this? Then we can merge it. |
|
Ok, it's rebased now. Ideally, we would cherry-pick this to |
|
Java8 failed (env issue). Java11 passed. Retest this please. |
Java 11 failed with env errror. Retest this please. |
|
Java 11 passed. https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/19532/ Because we got one pass for Java8 and Java11 (even if not at the same time), I am merging this now. |
Reviewer: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
|
Merged to |
* ak/trunk: (45 commits) KAFKA-7487: DumpLogSegments misreports offset mismatches (apache#5756) MINOR: improve JavaDocs about auto-repartitioning in Streams DSL (apache#6269) KAFKA-7935: UNSUPPORTED_COMPRESSION_TYPE if ReplicaManager.getLogConfig returns None (apache#6274) KAFKA-7895: Fix stream-time reckoning for suppress (apache#6278) KAFKA-6569: Move OffsetIndex/TimeIndex logger to companion object (apache#4586) MINOR: add log indicating the suppression time (apache#6260) MINOR: Make info logs for KafkaConsumer a bit more verbose (apache#6279) KAFKA-7758: Reuse KGroupedStream/KGroupedTable with named repartition topics (apache#6265) KAFKA-7884; Docs for message.format.version should display valid values (apache#6209) MINOR: Save failed test output to build output directory MINOR: add test for StreamsSmokeTestDriver (apache#6231) MINOR: Fix bugs identified by compiler warnings (apache#6258) KAFKA-6474: Rewrite tests to use new public TopologyTestDriver [part 4] (apache#5433) MINOR: fix bypasses in ChangeLogging stores (apache#6266) MINOR: Make MockClient#poll() more thread-safe (apache#5942) MINOR: drop dbAccessor reference on close (apache#6254) KAFKA-7811: Avoid unnecessary lock acquire when KafkaConsumer commits offsets (apache#6119) KAFKA-7916: Unify store wrapping code for clarity (apache#6255) MINOR: Add missing Alter Operation to Topic supported operations list in AclCommand KAFKA-7921: log at error level for missing source topic (apache#6262) ...
Reviewer: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Unfortunately, the default grace period is 24 hours, soif people don't explicitly set
the grace period before
Suppress.untilWindowCloses, the suppression wouldappear to do nothing for a long time.
My advice is to always explicitly set the grace period when using final-results
suppression, but there's no clean way to enforce it.
Maybe a warning is good enough?
Committer Checklist (excluded from commit message)