KAFKA-7528: Standardize on Min/Avg/Max Kafka metrics' default value - NaN #5908

Merged
guozhangwang merged 5 commits into apache:trunk from stanislavkozlovski:KAFKA-7528-strandardize-min-avg-max-kafka-metrics
Nov 20, 2018

Conversation

Contributor

@stanislavkozlovski stanislavkozlovski commented Nov 12, 2018

While it makes sense for metrics like Min, Avg, and Max to use Double.MAX_VALUE, 0.0, and Double.MIN_VALUE respectively as default values to ease their computation logic, exposing those values makes reading them misleading. For instance, how would you tell whether your -avg metric has a value of 0 because it was given samples of 0, or because no samples were fed to it at all?

It makes sense to standardize the output of these metrics on something that clearly denotes that no values have been recorded.
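To make the ambiguity concrete, here is a minimal hypothetical sketch (not Kafka's actual SampledStat code; the class and method names are illustrative): with a 0.0 default, an empty metric and an all-zero metric read identically, while a NaN default makes "nothing recorded" unambiguous.

```java
// Hypothetical sketch, not Kafka's actual implementation.
class AvgSketch {
    private double total = 0.0;
    private long count = 0;

    void record(double value) {
        total += value;
        count++;
    }

    // 0.0 default: "no samples" and "only zero-valued samples" both read as 0.0.
    double measureWithZeroDefault() {
        return count == 0 ? 0.0 : total / count;
    }

    // NaN default: an empty metric clearly reads as "nothing recorded".
    double measureWithNaNDefault() {
        return count == 0 ? Double.NaN : total / count;
    }
}
```

With the zero default, `measureWithZeroDefault()` returns 0.0 for both an untouched metric and one that recorded only zeros; only the NaN variant lets a reader distinguish the two cases.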

Contributor

@guozhangwang guozhangwang left a comment


Made a quick pass over the PR, LGTM.

It just occurred to me that for Streams we do not seem to have unit tests verifying the initial values of our metrics (otherwise they would be broken and would need updating here). cc @vvcephei .

Member

@bbejeck bbejeck left a comment


I took a pass over the PR and it LGTM from the Streams perspective.

      count += s.eventCount;
  }
- return count == 0 ? 0 : total / count;
+ return count == 0 ? total : total / count;
Contributor


Hi @stanislavkozlovski ,

Thanks for this PR. If I'm reading the code of SampledStat right, I think you could achieve your objective simply by replacing this line with:

return count == 0 ? Double.NaN : total / count;

And then, you wouldn't need any of the other changes in this class. Does that seem right to you?

If so, I think it would apply to the rest of the changed stats as well.

Thanks,
-John

Contributor Author


Yes, that would be the same functionality. Do you think that makes the code clearer?

Contributor


Yeah, personally, I think so.

As long as we can report NaN externally for un-initialized metrics, it really doesn't matter how we initialize each sample internally. IMHO, each metric should use an initial sample value that makes its internal math simple, since those samples aren't directly visible externally. (unless I've misunderstood the system)

Contributor Author


That makes a lot of sense. Can you take another look, @vvcephei ?

  for (Sample sample : samples)
      min = Math.min(min, sample.value);
- return min;
+ return Math.abs(min - Double.MAX_VALUE) < 0.001 ? Double.NaN : min;
Contributor Author


Spotbugs was complaining about comparing doubles (FE_FLOATING_POINT_EQUALITY)

Contributor


Yeah, it's a tricky business... I think the suggestion I had in Max would also apply here, and you wouldn't have to compare them at all.
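The trade-off being discussed can be sketched as follows (a hypothetical simplification, not Kafka's actual Min stat): the sentinel-comparison style needs a fuzzy floating-point comparison to detect "no data", while checking emptiness directly avoids comparing doubles altogether.

```java
import java.util.List;

// Hypothetical sketch of the two approaches discussed; names are illustrative.
class MinSketch {
    // Sentinel-comparison style: detects "no data" by fuzzily comparing the
    // result against the initial value, which is fragile (a genuine value near
    // Double.MAX_VALUE would be misreported as NaN).
    static double minBySentinel(List<Double> values) {
        double min = Double.MAX_VALUE;
        for (double v : values) {
            min = Math.min(min, v);
        }
        return Math.abs(min - Double.MAX_VALUE) < 0.001 ? Double.NaN : min;
    }

    // Emptiness-check style: no floating-point comparison needed at all.
    static double minByEmptinessCheck(List<Double> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double min = Double.MAX_VALUE;
        for (double v : values) {
            min = Math.min(min, v);
        }
        return min;
    }
}
```

The emptiness check sidesteps the SpotBugs concern entirely, since it never has to decide whether two doubles are "equal enough".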

      count += s.eventCount;
  }
- return count == 0 ? 0 : total / count;
+ return count == 0 ? Double.NaN : total;
Contributor


Suggested change
- return count == 0 ? Double.NaN : total;
+ return count == 0 ? Double.NaN : total / count;

I think you accidentally lost the average computation in the course of the changes.

Contributor Author


Nice catch!

      max = Math.max(max, sample.value);
- return max;
  }
+ return Math.abs(max - Double.MIN_VALUE) < 0.001 ? Double.NaN : max;
Contributor

@vvcephei vvcephei Nov 15, 2018


Sorry if this basically seems like code-golfing, but I'm wondering if this would be equivalent and a little more robust?

Suggested change
- return Math.abs(max - Double.MIN_VALUE) < 0.001 ? Double.NaN : max;
+ return samples.isEmpty() ? Double.NaN : max;

Then, we could leave the initial values at negative infinity (not sure if it matters).

Contributor Author


No, it doesn't seem like code-golfing. Thanks for the review; I think we made the implementation substantially easier to read.

Contributor

@vvcephei vvcephei left a comment


LGTM! Thanks for humoring my review, @stanislavkozlovski .

Note: there are a couple of failing tests that look related.

@stanislavkozlovski
Contributor Author

@vvcephei We cannot rely on samples.isEmpty() in the combine() methods, because purgeObsoleteSamples() resets an expired sample's lastWindowMs and value back to the initial value.

I think the best way to detect metrics that still hold only initial values is to sum up the event counts and return NaN if no events exist. Every metric sample has an eventCount > 0 once its value has been updated.

Also split the single testSampledStatInitialValue test into two clearer tests.
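The eventCount-based approach described above can be sketched like this (a hypothetical simplification of a sampled stat, not the actual Kafka code): a sample may exist yet hold no events after being purged and reset, so summing eventCount is the reliable "no data" signal, not samples.isEmpty().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of a sampled average stat; names are illustrative.
class SampledAvgSketch {
    static class Sample {
        double value = 0.0;   // reset to the initial value when the sample expires
        long eventCount = 0;  // reset to 0 when the sample expires

        void record(double v) {
            value += v;
            eventCount++;
        }
    }

    final List<Sample> samples = new ArrayList<>();

    // A sample may exist yet hold no events (e.g. after being purged/reset),
    // so samples.isEmpty() is not a reliable "no data" check; eventCount is.
    double combine() {
        double total = 0.0;
        long count = 0;
        for (Sample s : samples) {
            total += s.value;
            count += s.eventCount;
        }
        return count == 0 ? Double.NaN : total / count;
    }
}
```

Here a freshly reset sample sits in the list but contributes zero events, so combine() still reports NaN until a value is actually recorded.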
@vvcephei
Contributor

@stanislavkozlovski Good catch! It still looks good to me.

@stanislavkozlovski
Contributor Author

The failed test looks unrelated - kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic[3]

java.lang.AssertionError: Contents of the map shouldn't change expected:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but was:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 -> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCo

@guozhangwang guozhangwang merged commit 068ab9c into apache:trunk Nov 20, 2018
@guozhangwang
Contributor

Merged to trunk, thanks a lot @stanislavkozlovski !!

pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
… NaN (apache#5908)

Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>