Skip to content

Comments

Ignore UTF8 BOMs from funny windows clients.#42

Closed
clavoie-acquisio wants to merge 1 commit intomasterfrom
unknown repository
Closed

Ignore UTF8 BOMs from funny windows clients.#42
clavoie-acquisio wants to merge 1 commit intomasterfrom
unknown repository

Conversation

@clavoie-acquisio
Copy link

@clavoie-acquisio clavoie-acquisio commented May 27, 2016

This change is Reviewable

@grobie
Copy link
Member

grobie commented May 27, 2016

Cool! Can you add a test please.

@clavoie-acquisio
Copy link
Author

With pleasure. I'm just wondering how you'd like that done -- adding random
BOMs to the existing test suite feels awkward: they're effectively
invisible, or downright disallowed in the middle of files (wikipedia says
"BOM use is optional, and, if used, should appear at the start of the text
stream.")

Happy to do it... how?

On Fri, May 27, 2016 at 3:56 PM, Tobias Schmidt notifications@github.com
wrote:

Cool! Can you add a test please.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#42 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AStJnnXEVBn0PvN432YZYccFSnnfbEUfks5qF0xbgaJpZM4Iozt3
.

@grobie
Copy link
Member

grobie commented May 27, 2016

Would it be enough to create a byte array with the bom bytes at the
beginning and then following a normal textfile line?

On Fri, May 27, 2016 at 3:59 PM, clavoie-acquisio notifications@github.com
wrote:

With pleasure. I'm just wondering how you'd like that done -- adding random
BOMs to the existing test suite feels awkward: they're effectively
invisible, or downright disallowed in the middle of files (wikipedia says
"BOM use is optional, and, if used, should appear at the start of the text
stream.")

Happy to do it... how?

On Fri, May 27, 2016 at 3:56 PM, Tobias Schmidt notifications@github.com
wrote:

Cool! Can you add a test please.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#42 (comment),
or mute
the thread
<
https://github.com/notifications/unsubscribe/AStJnnXEVBn0PvN432YZYccFSnnfbEUfks5qF0xbgaJpZM4Iozt3

.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#42 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAANaGL82qVSUl_keOREor0rBilp4BYxks5qF00TgaJpZM4Iozt3
.

@juliusv
Copy link
Member

juliusv commented May 27, 2016

Given prometheus/prometheus#1679 (comment), should we allow BOMs? I don't care either way, but it seems the recommendation is not to.

@grobie
Copy link
Member

grobie commented May 28, 2016

My first reaction was to follow the robustness principle and be forgiving on the input, I do agree with that recommendation though. BOM has always been an ugly wart, I'm surprised it still exists. Or maybe not ...

@clavoie-acquisio What client is that which sends the BOM? What client library is being used which sets that? What prevents you from not removing the BOM header instead?

@beorn7
Copy link
Member

beorn7 commented May 28, 2016

Given https://tools.ietf.org/html/rfc3629 , I'm inclined to not allow BOMs, but explicitly add to the format specification that BOMs are not allowed.

However, if this is something that is a pain to get rid of on the side of the generator, we might go for being "forgiving on the input".

BTW: The test only fail on Go 1.4 which doesn't compile our dependencies anymore. We should remove it from the test.

@clavoie-acquisio
Copy link
Author

Yeah, I'm fond of allowing BOMs for two big reasons:

  • Valid prometheus text format streams cannot be confused for BOMs --
    either '#' or [a-zA-Z:_]. So there's no chance of confusion with other UTF8
    characters.
  • A lot of text tools (especially on Windows) have basically forgotten that
    ASCII exists. Quoth wikipedia again: "Microsoft
    https://en.wikipedia.org/wiki/Microsoft compilers[10]
    https://en.wikipedia.org/wiki/Byte_order_mark#cite_note-10 and
    interpreters, and many pieces of software on Microsoft Windows
    https://en.wikipedia.org/wiki/Microsoft_Windows such as Notepad
    javascript:void(0) treat the BOM as a required magic number
    javascript:void(0) rather than use heuristics. These tools add a BOM when
    saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present,
    or the file contains only ASCII bytes."

I could also make this configurable on a per-target basis, and not
'pollute' the parser's purity on sane scrape targets, though my preference
is to just assume that a "text format" is what it is in the modern world:
UTF-8.

On Sat, May 28, 2016 at 8:13 AM, Björn Rabenstein notifications@github.com
wrote:

Given https://tools.ietf.org/html/rfc3629 , I'm inclined to not allow
BOMs, but explicitly add to the format specification that BOMs are not
allowed.

However, if this is something that is a pain to get rid of on the side of
the generator, we might go for being "forgiving on the input".

BTW: The test only fail on Go 1.4 which doesn't compile our dependencies
anymore. We should remove it from the test.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#42 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AStJns-w0ZOAWXPpN0kdWCDAeK0CiM-Cks5qGDGBgaJpZM4Iozt3
.

@grobie
Copy link
Member

grobie commented May 28, 2016

Your reference to the modern world is exactly the reason why we have the feeling it should not be supported. A byte order is completely meaningless in UTF-8 encoding and one should expect that current software should not need such a wart anymore to handle UTF-8 correctly. As the recommendation is to not set it, I think we should have good arguments why it's necessary to support it anyways.

Do you save the textfiles with your editor? Can you just disable the BOM header in your editor? If some client library writes out the BOM header, would it be possible to change that?

@clavoie-acquisio
Copy link
Author

On Sat, May 28, 2016 at 5:31 PM, Tobias Schmidt notifications@github.com
wrote:

Your reference to the modern world is exactly the reason why we have the
feeling it should not be supported. A byte order is completely meaningless
in UTF-8 encoding and one should expect that current software should not
need such a wart anymore to handle UTF-8 correctly. As the recommendation
is to not set it, I think we should have good arguments why it's necessary
to support it anyways.

Because as mentioned, it doesn't hurt other clients, and most (all?) modern
Windows text processing tools require it, and won't allow disabling it (as
far as I can tell).

Do you save the textfiles with your editor? Can you just disable the BOM
header in your editor? If some client library writes out the BOM header,
would it be possible to change that?

I'm working on fixing the libraries I can (with a few PR sent on their
respective projects), but the use case that I won't be able to fix is
indeed someone writing text files and not being able to disable BOM headers.

Christian Lavoie

@beorn7
Copy link
Member

beorn7 commented May 30, 2016

Given that the text format is our most lenient fallback (e.g. if no content-type header is present), where we just try to parse anything as text format, I tend to allow BOMs, but treat them by ignoring. We should also add that to the spec, as in "BOMs are tolerated but completely ignored, i.e. the content is always considered UTF-8, no matter what the BOMs say."

And BTW: #43 will fix the test.

@grobie
Copy link
Member

grobie commented May 30, 2016

Sounds like it's not easy to get rid of BOM headers in Windows environments, so I agree that we should handle it.

Please add a test which verifies the correct handling. The format specification @beorn7 suggested needs to be added in the prometheus/doc repository.

maybeBOM, err := p.buf.Peek(3)
if len(maybeBOM) == 3 && maybeBOM[0] == 0xEF && maybeBOM[1] == 0xBB && maybeBOM[2] == 0xBF && err == nil {
// The common case dealing with funny Windows clients
log.Debug("Discarding BOM")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use any logging anywhere in this package. It might come as a surprise to the (possibly very diverse) users, and it's pulling in another dependency. I think it's fine to silently swallow the BOMs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

On Tue, May 31, 2016 at 6:32 AM, Björn Rabenstein notifications@github.com
wrote:

In expfmt/text_parse.go
#42 (comment):

@@ -125,6 +126,15 @@ func (p *TextParser) reset(in io.Reader) {
} else {
p.buf.Reset(in)
}
+

  • // Some clients like to throw in UTF-8 BOM chars
  • maybeBOM, err := p.buf.Peek(3)
  • if len(maybeBOM) == 3 && maybeBOM[0] == 0xEF && maybeBOM[1] == 0xBB && maybeBOM[2] == 0xBF && err == nil {
  •   // The common case dealing with funny Windows clients
    
  •   log.Debug("Discarding BOM")
    

We don't use any logging anywhere in this package. It might come as a
surprise to the (possibly very diverse) users, and it's pulling in another
dependency. I think it's fine to silently swallow the BOMs.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/prometheus/common/pull/42/files/1e1e0c1c905808faedb252d7fbb96f35a43726f7#r65157940,
or mute the thread
https://github.com/notifications/unsubscribe/AStJnk1lrBxaqHlLQzUtcFti7h2pqeAYks5qHA5AgaJpZM4Iozt3
.

@beorn7
Copy link
Member

beorn7 commented May 31, 2016

Since there don't seem to be further objections, let's go with what @grobie said:
@clavoie-acquisio please add a test as suggested. I can take care of documenting, if you don't beat me to it.

@beorn7
Copy link
Member

beorn7 commented Jun 7, 2016

@clavoie-acquisio Are you still on this?

@clavoie-acquisio
Copy link
Author

Yup, got distracted with deployment of prometheus. Should be able to update
the PR today or tomorrow. Touch wood.
On Jun 7, 2016 6:03 AM, "Björn Rabenstein" notifications@github.com wrote:

@clavoie-acquisio https://github.com/clavoie-acquisio Are you still on
this?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#42 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AStJngICQnWcZIYNAHaJ_0ZVitiaiOnEks5qJUHKgaJpZM4Iozt3
.

@fabxc
Copy link
Contributor

fabxc commented Jun 23, 2016

ping @clavoie-acquisio :)

@juliusv
Copy link
Member

juliusv commented Jul 23, 2016

Another ping @clavoie-acquisio

@judwhite
Copy link

@clavoie-acquisio @fabxc @juliusv I'm happy to pick this up and add some tests if there's still interest.

@grobie
Copy link
Member

grobie commented Feb 19, 2017 via email

@brian-brazil
Copy link
Contributor

Potentially relevant: https://github.com/dimchansky/utfbom

gouthamve pushed a commit to gouthamve/common that referenced this pull request Jul 22, 2018
b990f48 Merge pull request prometheus#42 from kinvolk/lorenzo/fix-git-diff
224a145 Check if SHA1 exists before calling `git diff`
1c3000d Add auto_apply config for wcloud
0ebf5c0 Fix wcloud -serivice
4fe078a Merge pull request prometheus#39 from weaveworks/fix-wrong-subtree-use
3f4934d Remove generate_latest_map
48beb60 Sync changes done directly in scope/tools
45dcdd5 Merge pull request prometheus#37 from weaveworks/fix-mflag-missing
b895344 Use mflag package from weaveworks fork until we find a better solution
e030008 Merge pull request prometheus#36 from weaveworks/wcloud-service-flags
9cbab40 Add wcloud Makefile
ef55901 Review feedback, and build wcloud in circle.
3fe92f5 Add wcloud deploy --service flag
3527b56 Merge pull request prometheus#34 from weaveworks/repo-branch
92cd0b8 [wcloud] Add support for repo_branch option
9f760ab Allow wcloud users to override username
38037f8 Merge pull request prometheus#33 from weaveworks/wcloud-templates
7acfbd7 Propagate the local username
e6876d1 Add template fields to wcloud config.
f1bb537 Merge pull request prometheus#30 from weaveworks/mike/shell-lint/dont-error-if-empty
e60f5df Merge pull request prometheus#31 from weaveworks/mike/fix-shell-lint-errors
e8e2b69 integrations: Fix a shellcheck linter error
a781575 shell-lint: Don't fail if no shell scripts found
db5efc0 Merge pull request prometheus#28 from weaveworks/mike/add-image-tag
5312c40 Import image-tag script into build tools so it can be shared
7e850f8 Fix logs path
dda9785 Update deploy api
f2f4e5b Fix the wcloud client
3925eb6 Merge pull request prometheus#27 from weaveworks/wcloud-events
77355b9 Lint
d9a1c6c Add wcloud events, update flags and error nicely when there is no config

git-subtree-dir: tools
git-subtree-split: b990f488bdc7909b62d9245bc4b4caf58f5ae7ea
gouthamve pushed a commit to gouthamve/common that referenced this pull request Jul 22, 2018
Once the call succeeds (or maxBackoff reached), errors were no longer logged.
From then on, the only log messages generated were first successes after errors.
@sgissi sgissi mentioned this pull request Aug 3, 2018
@brian-brazil
Copy link
Contributor

The current decision is that we won't be allowing this, as OpenMetrics forbids it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants