Skip to content
This repository was archived by the owner on Feb 28, 2026. It is now read-only.

Strip BOM from JSON responses#145

Merged
lmazuel merged 3 commits intoAzure:masterfrom
eduardomourar:feature/strip-bom
Feb 27, 2019
Merged

Strip BOM from JSON responses#145
lmazuel merged 3 commits intoAzure:masterfrom
eduardomourar:feature/strip-bom

Conversation

@eduardomourar
Copy link
Copy Markdown
Contributor

@lmazuel and @susam, this fixes issue #144.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 31, 2019

Codecov Report

Merging #145 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   88.01%   88.02%   +0.01%     
==========================================
  Files          25       25              
  Lines        2561     2564       +3     
==========================================
+ Hits         2254     2257       +3     
  Misses        307      307
Impacted Files Coverage Δ
msrest/universal_http/__init__.py 75.91% <100%> (ø) ⬆️
msrest/pipeline/universal.py 94.79% <100%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update adb4596...f55b930. Read the comment docs.

Comment thread msrest/pipeline/universal.py Outdated

# Remove Byte Order Mark if present in string
bom = codecs.BOM_UTF8.decode(encoding='utf-8')
if data_as_str.startswith(bom):
Copy link
Copy Markdown
Member

@lmazuel lmazuel Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thinkings here:

  • I think in the usual workflow, we don't need it, because you patched HTTPClientResponse to utf-8-sig
  • I'm hesitant on performance. bom can be a constant, startswith is unnecessary (lstrip will return the initial str if not startswith anyway)

I'm thinking maybe not adding this code at all, and just document it "if you give me str, you must have removed the BOM first". bytes and stream scenario will take care of the BOM.

Thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as requested. I believe we have 3 options:

  1. (BEST) We keep this as is (not best performance) for now and then we can create a PR for requests library. I say this because when debugging I saw that the encoding was utf-8 and apparent encoding from chardet was utf-8-sig. So we can just specify to give preference apparent encoding.
  2. We try to create a global boolean flag within msrest to remove BOM only if set to true.
  3. Just documented it. This option would not help me. I was able to create a SDK directly from swagger file from my fork, but if i use the master i need to forcefully set response.encoding = 'utf-8-sig'.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR for requests has been created and, if they merge it there, I believe all the changes here will not be required at all. :)

@lmazuel
Copy link
Copy Markdown
Member

lmazuel commented Feb 6, 2019

Hi @eduardomourar !
Thanks the contribution! I definitely acknowledge there is an issue to fix here (we'll merge your PR eventually). I added a few comments / questions for your review.
Thanks!

Comment thread msrest/pipeline/universal.py Outdated

_LOGGER = logging.getLogger(__name__)

BOM = codecs.BOM_UTF8.decode(encoding='utf-8')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_BOM please, I like to keep my stuff private :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last tiny detail and I merge it ;)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are completely right!

@lmazuel lmazuel merged commit f1fec7d into Azure:master Feb 27, 2019
@lmazuel
Copy link
Copy Markdown
Member

lmazuel commented Feb 27, 2019

Thanks @eduardomourar again for the contribution!
The next release date is unsure yet, but this will be in it for sure ;)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants