
Support any UTF-8 string as label name #3321

Closed
yuri-tceretian wants to merge 13 commits into prometheus:main from grafana:yuri-tceretian/utf-8-label-names

Conversation

@yuri-tceretian
Contributor

@yuri-tceretian yuri-tceretian commented Apr 10, 2023

Currently, Alertmanager supports only valid Prometheus label names (i.e. ones that match the regular expression ^[a-zA-Z_][a-zA-Z0-9_]*$). This PR expands the set of valid characters to any UTF-8 character.

The only restriction is that a label name must not consist solely of whitespace characters.

  1. Replaces all uses of the IsValid method of Prometheus' model.LabelName with a new function, labels.IsValidName, which accepts a model.LabelName.

  2. Overrides the Validate method that types.Alert inherits from Prometheus' model.Alert, changing the validation of Labels and Annotations. The tests are copied from Prometheus' Alert tests and expanded with a few more test cases.

  3. Updates the ParseMatcher function, which parses a string into a labels.Matcher. The regular expression now matches any character in a name wrapped in double quotes, and only Prometheus-compatible names when unquoted.
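
The validation rule described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the name isValidName and its string parameter are assumptions (the PR's function is labels.IsValidName and accepts a model.LabelName), and rejecting the empty string is also an assumption.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isValidName is a hypothetical stand-in for the labels.IsValidName function
// described in the PR. It accepts any valid UTF-8 string that contains at
// least one non-whitespace character.
func isValidName(name string) bool {
	if !utf8.ValidString(name) {
		return false
	}
	for _, r := range name {
		if !unicode.IsSpace(r) {
			return true
		}
	}
	// Empty (assumption) and whitespace-only names are rejected.
	return false
}

func main() {
	for _, name := range []string{"foo", "foo😊", "label with spaces", "   ", ""} {
		fmt.Printf("%q valid: %v\n", name, isValidName(name))
	}
}
```

Names such as "foo😊" or "label with spaces" pass this check, while a name made only of spaces does not.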

Fixes #3319

Notes for reviewer: The PR can be reviewed by commit.

  • Update validation in UI

that returns true to strings that contain any UTF-8 characters except all whitespaces

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
code copied from model.Alert and changed label validation

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
@yuri-tceretian yuri-tceretian force-pushed the yuri-tceretian/utf-8-label-names branch from 2c09948 to ac4726a Compare April 10, 2023 18:33
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
… quotes

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
@grobinson-grafana
Collaborator

Nice work Yuri!

I am a little concerned about adding inconsistent rules to label matchers. With this PR, UTF-8 is supported without double quotes in values, but not in names. For example, the following matcher is accepted:

{foo=bar😊}

but this is not:

{foo😊=bar}

as it instead must be:

{"foo😊"=bar}

I don't like that there are different rules for each side of the expression. I'm interested to hear what Josh and Simon think?

- A UTF-8 string, which may be enclosed in double quotes. Can be an empty string.

The 3rd token may be the empty string. Within the 3rd token, OpenMetrics escaping rules apply: `\"` for a double-quote, `\n` for a line feed, `\\` for a literal backslash. Unescaped `"` must not occur inside the 3rd token (only as the 1st or last character). However, literal line feed characters are tolerated, as are single `\` characters not followed by `\`, `n`, or `"`. They act as a literal backslash in that case.
Before or after each token, there may be any amount of whitespace.
Collaborator

This was moved to a new line. I just wanted to check whether that was intentional?

}

func (m *Matcher) String() string {
if !model.LabelName(m.Name).IsValid() {
Collaborator

I'm not sure I understand what this does, would it be possible to explain it in a comment?

Contributor Author

Sure! The idea is to check whether the name is Prometheus-compatible and return it without quotes (same as it does now), and wrap it in double quotes otherwise.
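
A rough sketch of that idea, for readers following along. The helper name formatName is hypothetical, and the escaping applied inside the quotes is an assumption based on the escape sequences listed in the matcher documentation (`\\`, `\"`, `\n`); the PR's String method may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prometheusName matches the label names that were valid before this PR.
var prometheusName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// escaper escapes backslashes, double quotes and line feeds so that the
// quoted form can be parsed back (assumed escaping rules).
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatName is a hypothetical helper: Prometheus-compatible names are
// returned as they are, anything else is wrapped in double quotes.
func formatName(name string) string {
	if prometheusName.MatchString(name) {
		return name
	}
	return `"` + escaper.Replace(name) + `"`
}

func main() {
	fmt.Println(formatName("foo") + "=bar")
	fmt.Println(formatName("foo😊") + "=bar")
}
```

With this approach, matchers that were valid before the change are printed exactly as before, and only names outside the Prometheus character set gain quotes.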

break
}
}
return !allSpaces && utf8.ValidString(lns)
Collaborator

Do we need utf8.ValidString if it's also checked on line 138 of parse.go?

Contributor Author

This function is used in many other places to validate incoming labels: API calls, config parsing/validation. parse.go only parses matchers.


name := rawName
// if name is quoted, then it can contain any UTF-8 character. Unescape some escape sequences.
if strings.HasPrefix(rawName, `"`) {
Collaborator

I thought we wanted to apply OpenMetrics escaping to unquoted strings, not to double-quoted strings? Double-quoted strings have different escape sequences, right?

Suggested change
- if strings.HasPrefix(rawName, `"`) {
+ if !strings.HasPrefix(rawName, `"`) {

Contributor Author

Unquoted names can only be Prometheus-compatible, and therefore no escaping is allowed. Double-quoted names, in contrast, can contain any UTF-8 characters, and can therefore be escaped the same way as values.
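
To make the two branches concrete, here is a minimal sketch of how a name could be parsed under these rules. The function parseName is hypothetical and is not the PR's ParseMatcher; it only handles the `\\`, `\"` and `\n` sequences from the matcher documentation, and for brevity it does not detect a closing quote that is itself escaped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// prometheusName matches the only names allowed without quotes.
var prometheusName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// unescaper resolves the escape sequences allowed inside double quotes.
// A backslash followed by any other character is kept as a literal backslash.
var unescaper = strings.NewReplacer(`\\`, `\`, `\"`, `"`, `\n`, "\n")

// parseName is a hypothetical sketch of the rule described above.
func parseName(raw string) (string, error) {
	if !strings.HasPrefix(raw, `"`) {
		// Unquoted: Prometheus-compatible names only, no escaping.
		if !prometheusName.MatchString(raw) {
			return "", fmt.Errorf("unquoted name %q is not Prometheus-compatible", raw)
		}
		return raw, nil
	}
	// Quoted: any valid UTF-8, with escape sequences resolved.
	if len(raw) < 2 || !strings.HasSuffix(raw, `"`) {
		return "", fmt.Errorf("quoted name %q is missing its closing quote", raw)
	}
	name := unescaper.Replace(raw[1 : len(raw)-1])
	if !utf8.ValidString(name) {
		return "", fmt.Errorf("name %q is not valid UTF-8", name)
	}
	return name, nil
}

func main() {
	for _, raw := range []string{`foo`, `foo😊`, `"foo😊"`, `"a\"b"`} {
		name, err := parseName(raw)
		fmt.Printf("%s -> %q, %v\n", raw, name, err)
	}
}
```

In this sketch the unquoted foo😊 is rejected while the quoted "foo😊" is accepted, which is the asymmetry discussed in this thread.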

rawValue = raw
expectTrailingQuote = false
)
if strings.HasPrefix(rawValue, `"`) {
Collaborator

Should be removed before the call to unescapeMatcherString I think.

Contributor Author

No, because it is also applied to label values, which can be either quoted or unquoted.

Collaborator

What I mean here is that unescapeMatcherString is doing two operations: unescaping escape sequences and also checking that a double-quoted string has both start and end quotes. What I was proposing was to split those into separate functions / code.

)
if strings.HasPrefix(rawValue, `"`) {
rawValue = rawValue[1:]
expectTrailingQuote = true
Collaborator

Like above, I think checking that a double quoted string is well terminated should be done somewhere else? I think I would argue that checking for terminating " and checking escape sequences inside a double quoted string are different operations?
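
One way the proposed split could look, as a sketch only. The function names stripQuotes and unescape are hypothetical and do not come from the PR; the escape sequences are the ones listed in the matcher documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// stripQuotes only checks termination: a string that starts with a double
// quote must end with an unescaped double quote. It returns the content
// between the quotes, still escaped. Unquoted input is returned unchanged.
func stripQuotes(raw string) (string, error) {
	if !strings.HasPrefix(raw, `"`) {
		return raw, nil
	}
	if len(raw) < 2 || !strings.HasSuffix(raw, `"`) {
		return "", fmt.Errorf("%s: missing trailing quote", raw)
	}
	inner := raw[1 : len(raw)-1]
	// An odd number of backslashes before the final quote means the quote
	// is escaped and therefore does not terminate the string.
	backslashes := len(inner) - len(strings.TrimRight(inner, `\`))
	if backslashes%2 == 1 {
		return "", fmt.Errorf("%s: trailing quote is escaped", raw)
	}
	return inner, nil
}

// unescape only resolves escape sequences. A backslash followed by any
// other character is kept as a literal backslash.
func unescape(s string) string {
	return strings.NewReplacer(`\\`, `\`, `\"`, `"`, `\n`, "\n").Replace(s)
}

func main() {
	inner, err := stripQuotes(`"say \"hi\""`)
	fmt.Println(unescape(inner), err)
}
```

Note that the termination check still has to count backslashes to decide whether the final quote is escaped, so the two operations are not fully independent even when they live in separate functions.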

@yuri-tceretian
Contributor Author

I am a little concerned about adding inconsistent rules to label matchers.

Currently, label values can be quoted and unquoted and also can contain UTF-8 characters in both cases.

I agree that different rules for different parts of the expression can be confusing, and I can change that if maintainers agree. However, the goal of this PR is to introduce an extension to the current syntax with as few changes as possible and without breaking current configurations.

I did try to apply similar rules to both parts of the expression, and that's why I added support for escaped special characters (\n, \t, ") to label names.

Ideally, I think a matcher should just be a structured object rather than a string that needs to be parsed. That would drastically simplify code.
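
As an illustration of that last point only, a structured matcher could look something like the sketch below. The type structuredMatcher, its JSON field names and the decodeMatcher function are all hypothetical and are not part of Alertmanager or of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// structuredMatcher is a hypothetical matcher expressed as data rather
// than as a string that needs parsing.
type structuredMatcher struct {
	Name  string `json:"name"`
	Op    string `json:"op"` // one of "=", "!=", "=~", "!~"
	Value string `json:"value"`
}

// decodeMatcher decodes a matcher from JSON and validates the operator.
// No quoting or escaping rules are needed for the name or the value.
func decodeMatcher(data []byte) (structuredMatcher, error) {
	var m structuredMatcher
	if err := json.Unmarshal(data, &m); err != nil {
		return structuredMatcher{}, err
	}
	switch m.Op {
	case "=", "!=", "=~", "!~":
		return m, nil
	default:
		return structuredMatcher{}, fmt.Errorf("unknown operator %q", m.Op)
	}
}

func main() {
	m, err := decodeMatcher([]byte(`{"name":"foo😊","op":"=","value":"bar"}`))
	fmt.Printf("%+v %v\n", m, err)
}
```

In such a format the name and the value follow the same rules by construction, because neither side needs matcher-specific quoting.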

@yuri-tceretian yuri-tceretian force-pushed the yuri-tceretian/utf-8-label-names branch from 5e0210e to a3eddd9 Compare May 12, 2023 14:05
@yuri-tceretian
Contributor Author

Closing it as it does not seem that anyone is interested in reviewing it, and it will be superseded by #3353.

@yuri-tceretian yuri-tceretian deleted the yuri-tceretian/utf-8-label-names branch May 24, 2023 16:14
@gotjosh
Member

gotjosh commented May 24, 2023

I have taken a look, but I have a strong preference for the approach taken in #3353, as it is clearer in terms of explanation. As such, we'll focus our efforts on that one.

For the future though, I would have expected to see much better documentation (in the PR) for such a critical change. As an example, I found #3353 (comment) very useful for understanding to what degree this is a breaking change.

@yuri-tceretian
Contributor Author

@gotjosh this PR did not break any current behavior but extended it and therefore did not require any supplemental documentation you referred to.
All other documentation was updated according to the change.

I have taken a look,

Good. For the future though, the comment would be appreciated :)

@gotjosh
Member

gotjosh commented May 24, 2023

@gotjosh this PR did not break any current behavior but extended it and therefore did not require any supplemental documentation you referred to.

The expansion of the character set means that previously rejected matchers would now be accepted. This can be considered a breaking change.


Development

Successfully merging this pull request may close these issues.

Support any symbol in label names

3 participants