KAFKA-10199: Add task updater metrics, part 2 #13300
The logic for measuring the remaining records is a bit complex: we first aggregate the total number of records to restore across all changelog partitions when the changelogs are initialized, and then, during restoration, we keep decrementing that total by the number of restored records.
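To illustrate the aggregate-then-decrement approach described above, here is a minimal sketch. The class and method names (RemainingRecordsTracker, initRemaining, onRecordsRestored) are hypothetical, and it assumes the per-partition number of records to restore is already known when the changelogs are initialized; it is not the code in this PR.

```java
import java.util.Map;

// Hypothetical sketch of the remaining-records bookkeeping; not Kafka Streams code.
final class RemainingRecordsTracker {
    private long remaining = 0L;

    // Called once when the changelogs are initialized: sum the number of
    // records still to restore across all changelog partitions.
    void initRemaining(final Map<String, Long> recordsToRestoreByPartition) {
        long total = 0L;
        for (final long toRestore : recordsToRestoreByPartition.values()) {
            total += toRestore;
        }
        remaining = total;
    }

    // Called after each restored batch: decrement by the number of records
    // that were just restored.
    void onRecordsRestored(final long numRestored) {
        remaining -= numRestored;
    }

    long remaining() {
        return remaining;
    }
}
```

The total is computed only once, so each restored batch costs a single subtraction instead of re-reading offsets for every changelog partition.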
Would it be helpful to rename this method to initRestoreRemaining to make its purpose clear (and/or add a JavaDoc to the method in StreamTask)?
This function is not used in production code, so I'm removing it.
This is a piggy-backed metric fix: we should use cumulativeSum rather than cumulativeCount for dropped records. Today, since most callers use sensor.record(), which increments by 1, the two are effectively the same; but the metric would be wrong if we ever record a value other than 1 in the future.
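Kafka's metrics library offers stats named CumulativeSum and CumulativeCount. The sketch below uses simplified, hypothetical stand-ins (not the Kafka classes) to show why the two only agree while every recorded value is 1: a sum adds the recorded value, whereas a count adds 1 per call regardless of the value.

```java
// Hypothetical stand-ins for cumulative-sum and cumulative-count stats;
// these are NOT the Kafka metrics classes.
final class CumulativeSumStat {
    private double total = 0.0;
    void record(final double value) { total += value; }
    double measure() { return total; }
}

final class CumulativeCountStat {
    private double count = 0.0;
    // Ignores the recorded value: every call counts as one occurrence.
    void record(final double value) { count += 1.0; }
    double measure() { return count; }
}

// Hypothetical sensor recording into both stats, to compare them side by side.
final class DroppedRecordsSensorSketch {
    final CumulativeSumStat sum = new CumulativeSumStat();
    final CumulativeCountStat count = new CumulativeCountStat();

    void record(final double value) {
        sum.record(value);
        count.record(value);
    }

    // Mirrors a no-argument record() call, which records a value of 1.
    void record() { record(1.0); }
}
```

After two no-argument calls both stats read 2.0; a subsequent record(5.0) moves the sum to 7.0 but the count only to 3.0, which under-reports five dropped records as one.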
This metric was removed as part of KIP-743 and is now only used in tests (which I also cleaned up as a piggy-back).
I looked through the code and I don't think it can ever be null. Please correct me if I'm wrong.
@mjsax, let me know what you think. I may be missing some background here.
I was just wondering. It seemed unrelated to this PR, so I assumed there must be a reason for having the null check.
No very compelling reason: I just want to make sure we never start with a negative number, although I cannot think of a case in which it could be negative.
I've added a comment covering the edge case in which it could become negative.
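A defensive clamp like the one discussed above can be sketched as follows. The helper name initialRemaining and its end-offset/start-offset parameters are hypothetical; the thread does not say which edge case can produce a negative value, so the sketch only guarantees that the starting value is never below zero.

```java
// Hypothetical helper; not the code added in this PR.
final class RemainingRecordsInit {
    // Normally endOffset >= startOffset; clamp at zero so that, whatever
    // edge case occurs, the remaining-records metric never starts negative.
    static long initialRemaining(final long endOffset, final long startOffset) {
        return Math.max(0L, endOffset - startOffset);
    }
}
```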
The string is used not only as part of the metric name but also as part of the sensor name (more specifically, as its suffix). In our KIP proposal the metrics are defined as, e.g., restore-rate | total, so it is correct to define the string as restore/update.
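To make the distinction concrete, here is a hypothetical sketch of one operation string ("restore" or "update") feeding both name constructs. The "-rate"/"-total" suffixes follow the restore-rate | total pattern quoted above; the sensor-name format (a prefix, a dot, then the operation) is purely an assumption and is not Kafka Streams' actual naming scheme.

```java
// Hypothetical naming helper; the sensor-name format is an assumption.
final class UpdaterMetricNames {
    static final String RESTORE = "restore";
    static final String UPDATE = "update";

    // Metric names: the operation plus a stat-specific suffix.
    static String rateMetricName(final String operation) { return operation + "-rate"; }
    static String totalMetricName(final String operation) { return operation + "-total"; }

    // Sensor names: the same operation string used as the suffix.
    static String sensorName(final String sensorPrefix, final String operation) {
        return sensorPrefix + "." + operation;
    }
}
```

Because the string has no "-rate" or "-total" baked in, a single constant can serve as the sensor-name suffix while each metric attached to that sensor appends its own suffix.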
Updated the branch from 494c079 to 9b8531b.
Fixed some naming confusion in the XXXMetrics classes, especially in distinguishing how sensor names and metric names are constructed; also fixed a recording bug in the dropped-records sensor.
Also cleaned up a couple of unused metrics and utility functions in the metrics classes.
Added related unit tests.