Support any UTF-8 string as label name#3321
Support any UTF-8 string as label name#3321yuri-tceretian wants to merge 13 commits intoprometheus:mainfrom
Conversation
that returns true to strings that contain any UTF-8 characters except all whitespaces Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
code copied from model.Alert and changed label validation Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2c09948 to
ac4726a
Compare
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
ac4726a to
5f2333f
Compare
bb7b0e7 to
4a67770
Compare
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
… quotes Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
8aba175 to
024a9f0
Compare
|
Nice work Yuri! I am a little concerned about adding inconsistent rules to label matchers. For example, with this PR UTF-8 is supported without double quotes for values, but not names. For example, the following matcher is accepted: but this is not: as it instead must be: I don't like that there are different rules for each side of the expression. I'm interested to hear what Josh and Simon think? |
| - A UTF-8 string, which may be enclosed in double quotes. Can be an empty string. | ||
|
|
||
| The 3rd token may be the empty string. Within the 3rd token, OpenMetrics escaping rules apply: `\"` for a double-quote, `\n` for a line feed, `\\` for a literal backslash. Unescaped `"` must not occur inside the 3rd token (only as the 1st or last character). However, literal line feed characters are tolerated, as are single `\` characters not followed by `\`, `n`, or `"`. They act as a literal backslash in that case. | ||
| Before or after each token, there may be any amount of whitespace. |
There was a problem hiding this comment.
This was moved to a newline, just wanted to check if that was intentional?
| } | ||
|
|
||
| func (m *Matcher) String() string { | ||
| if !model.LabelName(m.Name).IsValid() { |
There was a problem hiding this comment.
I'm not sure I understand what this does, would it be possible to explain it in a comment?
There was a problem hiding this comment.
Sure! The idea is to check whether the name is Prometheus-compatible and return it without quotes (same as it does now), and wrap it in double quotes otherwise.
| break | ||
| } | ||
| } | ||
| return !allSpaces && utf8.ValidString(lns) |
There was a problem hiding this comment.
Do we need utf8.ValidString if its also checked on Line 138 of parse.go?
There was a problem hiding this comment.
This function is used in many other places to validate incoming labels: API calls, config parsing\validation. parse.go only parses matchers.
|
|
||
| name := rawName | ||
| // if name is quoted, then it can contain any UTF-8 character. Unescape some escape sequences. | ||
| if strings.HasPrefix(rawName, `"`) { |
There was a problem hiding this comment.
I thought we want to escape open metrics for unquoted strings, not for double quoted strings? Double quoted strings have different escape sequences right?
| if strings.HasPrefix(rawName, `"`) { | |
| if !strings.HasPrefix(rawName, `"`) { |
There was a problem hiding this comment.
Unquoted names can be only Prometheus-compatible, and therefore no escaping is allowed. Double quoted in contrast, can contain any UTF-8 characters, and therefore it can be escaped the same way as value.
| rawValue = raw | ||
| expectTrailingQuote = false | ||
| ) | ||
| if strings.HasPrefix(rawValue, `"`) { |
There was a problem hiding this comment.
Should be removed before the call to unescapeMatcherString I think.
There was a problem hiding this comment.
No, because it is also applied to label value, which can be quoted and unquoted.
There was a problem hiding this comment.
What I mean here is that unescapeMatcherString is doing two operations: unescaping escaped sequences and also checking if a double quoted string has both start and end quotes. What I was proposing was separate those out into separate functions / code.
| ) | ||
| if strings.HasPrefix(rawValue, `"`) { | ||
| rawValue = rawValue[1:] | ||
| expectTrailingQuote = true |
There was a problem hiding this comment.
Like above, I think checking that a double quoted string is well terminated should be done somewhere else? I think I would argue that checking for terminating " and checking escape sequences inside a double quoted string are different operations?
Currently, label values can be quoted and unquoted and also can contain UTF-8 characters in both cases. I agree that different rules for different parts of the expression can be confusing, and I can change that if maintainers agree. However, the goal of this PR is to introduce an extension to the current syntax with as less changes as possible and without breaking current configurations. I did try to apply similar rules to both parts of the expression, and that's why I added support for escaped special characters (\n, \t, ") to label names. Ideally, I think a matcher should just be a structured object rather than a string that needs to be parsed. That would drastically simplify code. |
5e0210e to
a3eddd9
Compare
|
Closing it as it does not seem that anyone is interested in reviewing it, and it will be superseded by #3353. |
|
I have taken a look, but I have a strong preference for the approach taken in #3353 as is clearer in terms of explanation - as such we'll focus our efforts on that one. For the future though, I would have expected to see a much better documentation (in the PR) for such a critical change - as an example I have found #3353 (comment) to be very useful to understand to what degree this is a breaking change. |
|
@gotjosh this PR did not break any current behavior but extended it and therefore did not require any supplemental documentation you referred to.
Good. For the future though, the comment would be appreciated :) |
The expansion in character-set meant that previously rejected matchers would now be accepted - this can be considered a breaking change. |
Currently, Alertmanager supports only valid Prometheus labels (i.e. ones that match the following regular expression
^[a-zA-Z_][a-zA-Z0-9_]*$). This PR expands the range of valid symbols to any symbol in UTF-8 range.The only limitation to the label name is that it should not include only whitespace symbols.
It replaces all usage of Prometheus' model.LabelName method
Validto a new functionlabels.IsValidNamethat acceptsmodel.LabelName.Override types.Alert method
Validthat is derived from Prometheus' model.Alert and changes validation of Labels and Annotations. The tests are copied from the Prometheus' Alert tests and expanded with a few more test cases.Update ParseMatcher function that is used to parse string to
labels.Matcher. The regular expression was updated to match any character if it is wrapped by double quotes and only Prometheus-compatible names unquoted.Fixes #3319
Notes for reviewer: The PR can be reviewed by commit.