KAFKA-7528: Standardize on Min/Avg/Max Kafka metrics' default value - NaN (#5908)
guozhangwang merged 5 commits into apache:trunk from stanislavkozlovski:KAFKA-7528-strandardize-min-avg-max-kafka-metrics
Conversation
guozhangwang
left a comment
Made a quick pass over the PR, LGTM.
It just occurred to me that for Streams we do not seem to have unit tests verifying the initial values of our metrics (otherwise they would have broken and needed updating here). cc @vvcephei .
bbejeck
left a comment
I took a pass over the PR and it LGTM (streams perspective).
      count += s.eventCount;
  }
- return count == 0 ? 0 : total / count;
+ return count == 0 ? total : total / count;
Hi @stanislavkozlovski ,
Thanks for this PR. If I'm reading the code of SampledStat right, I think you could achieve your objective simply by replacing this line with:
return count == 0 ? Double.NaN : total / count;
And then, you wouldn't need any of the other changes in this class. Does that seem right to you?
If so, I think it would apply to the rest of the changed stats as well.
Thanks,
-John
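To illustrate the suggestion above, here is a minimal, self-contained sketch of an Avg-like stat that reports NaN until a value is recorded. The class and field names are simplified stand-ins for Kafka's `SampledStat` machinery, not the real implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: an average that returns NaN, rather than 0,
// when no values have been recorded.
public class AvgSketch {
    private final List<Double> samples = new ArrayList<>();

    public void record(double value) {
        samples.add(value);
    }

    public double measure() {
        double total = 0.0;
        long count = 0;
        for (double s : samples) {
            total += s;
            count++;
        }
        // The single change under discussion: NaN instead of 0 for the
        // "no samples" case; the rest of the math is untouched.
        return count == 0 ? Double.NaN : total / count;
    }

    public static void main(String[] args) {
        AvgSketch avg = new AvgSketch();
        System.out.println(avg.measure()); // NaN before any values are recorded
        avg.record(2.0);
        avg.record(4.0);
        System.out.println(avg.measure()); // 3.0
    }
}
```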
Yes, that would be the same functionality. Do you think that makes the code clearer?
Yeah, personally, I think so.
As long as we can report NaN externally for un-initialized metrics, it really doesn't matter how we initialize each sample internally. IMHO, each metric should use an initial sample value that makes its internal math simple, since those samples aren't directly visible externally. (unless I've misunderstood the system)
That makes a lot of sense. Can you take another look, @vvcephei ?
  for (Sample sample : samples)
      min = Math.min(min, sample.value);
- return min;
+ return Math.abs(min - Double.MAX_VALUE) < 0.001 ? Double.NaN : min;
Spotbugs was complaining about comparing doubles (FE_FLOATING_POINT_EQUALITY)
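For context on that warning: Spotbugs flags direct `==` comparison of doubles because accumulated floating-point error can make values that are "equal" on paper compare unequal, which is why an epsilon comparison was used here. A tiny standalone demonstration:

```java
// Why FE_FLOATING_POINT_EQUALITY exists: exact double comparison is fragile.
public class DoubleCompare {
    public static void main(String[] args) {
        double a = 0.1 + 0.2;
        System.out.println(a == 0.3);                 // false (a is 0.30000000000000004)
        System.out.println(Math.abs(a - 0.3) < 1e-9); // true: epsilon comparison
    }
}
```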
Yeah, it's a tricky business... I think the suggestion I had in Max would also apply here, and you wouldn't have to compare them at all.
      count += s.eventCount;
  }
- return count == 0 ? 0 : total / count;
+ return count == 0 ? Double.NaN : total;
Suggested change:
- return count == 0 ? Double.NaN : total;
+ return count == 0 ? Double.NaN : total / count;
I think you accidentally lost the average computation in the course of the changes.
Nice catch!
      max = Math.max(max, sample.value);
- return max;
+ return Math.abs(max - Double.MIN_VALUE) < 0.001 ? Double.NaN : max;
  }
Sorry if this basically seems like code-golfing, but I'm wondering if this would be equivalent and a little more robust?
Suggested change:
- return Math.abs(max - Double.MIN_VALUE) < 0.001 ? Double.NaN : max;
+ return samples.isEmpty() ? Double.NaN : max;
Then, we could leave the initial values at negative infinity (not sure if it matters).
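A minimal sketch of that variant, with illustrative names standing in for Kafka's internals: the internal initial value stays at negative infinity (which keeps `Math.max` correct), and the external NaN case is decided by checking emptiness directly, so no floating-point comparison is needed:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the suggested Max variant, not the real class.
public class MaxSketch {
    private final List<Double> samples = new ArrayList<>();

    public void record(double value) {
        samples.add(value);
    }

    public double measure() {
        // NEGATIVE_INFINITY is a safe identity element for Math.max.
        double max = Double.NEGATIVE_INFINITY;
        for (double sample : samples)
            max = Math.max(max, sample);
        // Emptiness is checked directly instead of comparing doubles.
        return samples.isEmpty() ? Double.NaN : max;
    }
}
```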
No, it doesn't seem like code-golfing. Thanks for the review, I think we made the implementation substantially easier to read
vvcephei
left a comment
LGTM! Thanks for humoring my review, @stanislavkozlovski .
Note: there are a couple of failing tests that look related.
@vvcephei We cannot rely on the samples list being empty, since samples can exist without any recorded events. I think the best way to tell whether a metric has recorded any values is to sum up the samples' event counts and return NaN when that sum is 0.
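A sketch of that event-count approach, with hypothetical `Sample`/`eventCount` names standing in for Kafka's internals: the samples list is pre-populated, so `isEmpty()` would never fire, and the "no values" case is instead detected by summing event counts:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NaN is decided by total event count,
// not by whether the samples list is empty.
public class MaxByEventCount {
    static class Sample {
        double value = Double.NEGATIVE_INFINITY;
        long eventCount = 0;
    }

    private final List<Sample> samples = new ArrayList<>();

    public MaxByEventCount() {
        // A sample exists before anything is recorded, so the list is
        // never empty and isEmpty() cannot signal "no values recorded".
        samples.add(new Sample());
    }

    public void record(double value) {
        Sample current = samples.get(samples.size() - 1);
        current.value = Math.max(current.value, value);
        current.eventCount++;
    }

    public double measure() {
        long count = 0;
        double max = Double.NEGATIVE_INFINITY;
        for (Sample s : samples) {
            count += s.eventCount;
            max = Math.max(max, s.value);
        }
        return count == 0 ? Double.NaN : max;
    }
}
```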
Also split the single testSampledStatInitialValue test into two clearer tests.
@stanislavkozlovski Good catch! It still looks good to me.
The failed test looks unrelated.
Merged to trunk, thanks a lot @stanislavkozlovski !!
KAFKA-7528: Standardize on Min/Avg/Max Kafka metrics' default value - NaN (apache#5908)

While it makes sense for metrics like Min, Avg and Max to respectively use Double.MAX_VALUE, 0.0 and Double.MIN_VALUE as default values to ease their computation logic, exposing those values makes reading them misleading. For instance, how would you differentiate whether your -avg metric has a value of 0 because it was given samples of 0 or because no samples were fed to it? It makes sense to standardize the output of these metrics on something that clearly denotes that no values have been recorded.

Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>