stats: metric/tag name fix for Prometheus#2303
stats: metric/tag name fix for Prometheus#2303mattklein123 merged 5 commits intoenvoyproxy:masterfrom srvaroa:g
Conversation
There is a set of unit tests that can verify the sanitization of metrics names when exported to Prometheus. These rely on the default set of metrics for a fresh server which, incidentally, doesn't include metrics with some prohibited characters. #2263 reports one of these cases, '-'. An integration test falls in a similar case. A fix for #2263 is trivial, but adding a regression test is not. The root reason is that the affected method is private in AdminImpl, so the only way of hitting the right code path is to add excessive complexity as both unit/integration versions would require configuration that is known to generate metrics with '-', etc. This commit makes a preliminary refactor on admin.cc, moving all code involved in formatting metrics for Prometheus to its own class where methods can be made public (instead of polluting AdminImpl's API.) A later improvement could also refactor the JSON formatter to a separate class and introduce a strategy pattern for formatters. This requires a shuffling a bit more code so I preferred to take a single step here. Tests are supplied, matching the current behaviour, but still incorrect as per #2263. A later commit will fix #2263, and update the test accordingly. Signed-off-by: Galo Navarro <anglorvaroa@gmail.com>
The Prometheus metrics formatter only sanitizes '.', but leaves other non supported chars like '-'. A case where this may appear is on cipher suites as reported in #2263. The change simply adds a second replace for '-' (rather than a regex to avoid the perf hit, same reason for not matching on all [^a-Z0-9_] chars.) This replacement is only applied to metric and tag names, not to tag values. Signed-off-by: Galo Navarro <anglorvaroa@gmail.com>
Following the suggestion in the bug report, the cipher suite moves to a tag from the metric name. Signed-off-by: Galo Navarro <anglorvaroa@gmail.com>
|
I haven't looked at the code yet, but there's already an issue for this: #2263 As described in the issue, I think the correct fix here is for the cipher suite to be a label. |
|
@mattklein123 thanks in advance for the review. Three doubts: The second commit in the PR implies that metric names with '-' (not only in the cipher suite case) will get changed when exported to Prometheus. This should not break anything (no user would've been seeing those metrics as the name was illegal). Reviewing the contribution guidelines metric names don't seem to require updates on release notes / documentation, but I wanted to double check with you, since metric names are still user-facing. Let me know if you'd like to see any updates on docs. On the third commit in the PR:
|
|
Given that the previous names were illegal, I don't think there's any issue with changing the names. |
|
@ggreenway thanks for the pointer, it's indeed linked to that issue (should I add the # to the PR title too?) The 3rd commit moves the cipher suite to a label as suggested in the issue. However, sanitization of '-' seems to be also relevant, so I did this too in the first 2 commits (I noted in my previous comment the rationale for not splitting in 2 PRs, but happy to do that if preferred.) |
|
I think the proper link is to add "Fixes #2263" in the description. I think it's fine to leave this as 1 PR, but you can split it into two if you'd like. |
ggreenway
left a comment
There was a problem hiding this comment.
Overall looks pretty good. Just 1 nit about naming and comments.
| return rc; | ||
| } | ||
|
|
||
| std::string AdminImpl::sanitizePrometheusName(const std::string& name) { |
There was a problem hiding this comment.
I think a more descriptive name than "strict" would be better. I think the distinction here is whether a stat name or a label name is being sanitized, right? If so, can you rename to something that implies that? Also, please comment on what the prometheus rules are, or link to the prometheus docs, so somebody reading the code can understand what's going on here.
There was a problem hiding this comment.
Actually, I forgot a relevant doubt here.
If you check the code in ::formatTagsForPrometheus, it uses the sanitize function both for the tag name and the value. This seems odd to me, as we should not need to apply the sanitization to the tag value as well. In fact, the change required in this issue needs us not to apply the replace on '-'. I added the "isStrict" to distinguish between real names or values (this is what the 3rd commit contains.) In values, I respect the current replace of '.', but not apply it to '-'.
This was to minimize changes, but as I said, sanitizing values seems odd. I was wondering if rather than adding that if case, we should just not sanitize values at all. That is:
+ buf.push_back(fmt::format("{}=\"{}\"", sanitizePrometheusName(tag.name_), tag.value_));
I'm going to submit a commit addressing your current comments (better naming and comments) for now, but let me know if you agree on my suggestion above and I'll add it later.
There was a problem hiding this comment.
Actually, it seems like we should only sanitize the value, and not the name. Typically, the value will come from configuration (although in the case of cipher suites that isn't true; they come from the ssl library). Whereas the tag name should be a compile-time constant in envoy somewhere.
Or am I misunderstanding?
There was a problem hiding this comment.
Comments addressed @ ff21df3
I've added comments to the header file, cleaned up var / method names, and removed the strict hack as looking at Prometheus docs, it seems pointless. According to the data model, label values may take any unicode character, so there should be no need for sanitization.
You're right that the name may not need sanitization as it comes from configuration (in our case, the label name comes from well_known_names.h). I kept it because it's very easy that someone adds tag name unaware that some chars are not supported by Prometheus, and detecting this will be hard. Keeping the label name sanitization would solve that transparently. But let me know if you prefer to remove it.
Thanks again for the quick reviews & comments.
|
Also, you need to fix_format |
Signed-off-by: Galo Navarro <anglorvaroa@gmail.com>
Following review comments, this: - Cleans up old method / variable names in the Prometheus exporter to make intent clearer. - Adds docs, with references to Prometheus requirements on label and metric names. - Removes sanitization from label values, as any unicode char is accepted (as per Prometheus docs.) Signed-off-by: Galo Navarro <anglorvaroa@gmail.com>
ggreenway
left a comment
There was a problem hiding this comment.
This looks great. Thanks for the fixes and cleanups!
… issue envoyproxy#3019 (envoyproxy#3049)" This reverts commit c2751df. Prometheus label values are permitted to contain any Unicode char. The original bug (envoyproxy#3019) was due to metric names not being sanitized, which was resolved by envoyproxy#2303. Change envoyproxy#3049 was therefore unnecessary and leads to an unexpected change in label values. Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
… issue #3019 (#3049)" (#3830) This reverts commit c2751df. The originally reported bug (#3019) was due to metric names not being sanitized, which I think was resolved by #2303. Change #3049, which was intended to fix #3019, erroneously applied sanitization to label values instead. Prometheus label values are permitted to contain any Unicode char, so this change was unnecessary and leads to an unexpected change in e.g. cluster names. This PR reverts the change, so that - and . will be preserved in label values. Risk Level: low Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Risk Level: low Testing: yep Release Notes: inline Fixes #2303 Signed-off-by: Alyssa Wilk <alyssar@chromium.org> Signed-off-by: JP Simard <jp@jpsim.com>
Risk Level: low Testing: yep Release Notes: inline Fixes #2303 Signed-off-by: Alyssa Wilk <alyssar@chromium.org> Signed-off-by: JP Simard <jp@jpsim.com>
Description
The Prometheus metrics formatter only sanitizes '.', but leaves other
non supported chars like '-'. A case where this may appear is on cipher
suites as reported in #2263.
The change adds a second replace for '-' (rather than a regex to avoid
the perf. hit, same reason for not matching on all [^a-Z0-9_] chars.)
The replacement is only applied to metric and tag names, not tag
values.
As a result of this change, the current formatting would change from:
to
Fixing the general case for any metric/tag name that includes '-'.
Following the suggestion on the original issue, the cipher suite is also moved to a label instead. The specific example above results in:
Risk Level: Low
Testing:
Updated unit/integration rests, + new unit tests.
There is a set of unit tests that can verify the sanitization of metrics
names when exported to Prometheus. These rely on the default set of
metrics for a fresh server which, incidentally, doesn't include metrics
with some prohibited characters. #2263 reports one of these cases, '-'.
An integration test falls in a similar case.
A fix for #2263 is trivial, but adding a regression test is not. The
root reason is that the affected method is private in AdminImpl, so the
only way of hitting the right code path is to add excessive complexity
as both unit/integration versions would require configuration that is
known to generate metrics with '-', etc.
This PR makes a preliminary refactor on admin.cc, moving all code
involved in formatting metrics for Prometheus to its own class where
methods can be made public (instead of polluting AdminImpl's API.) A
later improvement could also refactor the JSON formatter to a separate
class and introduce a strategy pattern for formatters. This requires a
shuffling a bit more code so I preferred to take a single step here.
Docs Changes:
N/A
Release Notes:
N/A
Fixes issue
#2263