Skip to content

Force the tokenizer to load files in utf-8#20

Merged
arashm merged 1 commit intoarashm:masterfrom
damphyr:utf_handling
Jun 22, 2017
Merged

Force the tokenizer to load files in utf-8#20
arashm merged 1 commit intoarashm:masterfrom
damphyr:utf_handling

Conversation

@damphyr
Copy link
Copy Markdown
Contributor

@damphyr damphyr commented Jun 22, 2017

This comes about after trying to use PoParser on the Windows command line which has codepage 850 as its default encoding.

Most modern *ix will set LC_ALL et.al. to a reasonable UTF-* encoding thus you won't notice that reading a file with File.open(bla,'r') will just try to use the terminal's encoding and thus fail miserably and spectacularly.

Given that .po files are usually meant for internationalisation it is a safe assumption that the contents will not be limited to the old 850 codepage and that UTF-8 is a reasonable default.

This is an emergency patch to make it usable on Windows. To fix it permanently I would restructure the library to not do any file IO interactions, instead to expect the content of the .po file as a string.
You then let the user decide how to load the file. Alternately, provide a few convenience methods for specific encodings (e.g. PoParser.load_utf8(po_file) or some such)

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 22, 2017

Coverage Status

Coverage remained the same at 97.872% when pulling 85879d5 on damphyr:utf_handling into 848a085 on arashm:master.

@arashm
Copy link
Copy Markdown
Owner

arashm commented Jun 22, 2017

Thank you! 👍
I checked and this didn't fail on any of my PO files, so it should be harmless to add this. I'm agreed on your point. I will work on a better API for next version so we accept strings by default instead of a path.

@arashm arashm merged commit 01b9d3d into arashm:master Jun 22, 2017
@damphyr damphyr deleted the utf_handling branch June 22, 2017 15:52
@arashm
Copy link
Copy Markdown
Owner

arashm commented Jun 29, 2017

On version 3.2.0 I deprecated this behavior and parse now accepts content of a PO file as string. I also introduced parse_file to be backport compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants