Remove locale-dependence in Agg text rendering by rkern · Pull Request #310 · enthought/enable

rkern · 2018-07-06T22:15:39Z

To render Unicode text strings in the agg backend, we explicitly encode them as UTF-8 byte strings in the Python layer before passing the encoded char* to the C++ API. On Unix platforms, the C++ layer currently uses locale-dependent functions to decode the bytes back into codepoint integers to pass to the font engine. Under a non-UTF-8 locale, we render mojibake.

This PR vendorizes a liberally-licensed header-only library for explicitly decoding the UTF-8 bytes in the C++ layer.
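For illustration, here is a minimal sketch of what locale-independent decoding looks like. It is not the vendored library's code and not the code in this PR: `decode_one` and `decode_utf8` are hypothetical names, and the sketch assumes the input is well-formed, NUL-terminated UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of enable or the vendored library).
// Decodes one code point starting at *p and advances p past it.
// Assumes well-formed UTF-8; malformed input is not detected.
inline std::uint32_t decode_one(const unsigned char*& p)
{
    std::uint32_t c = *p++;
    if (c < 0x80) {
        return c;  // single-byte ASCII
    }
    int extra;
    if ((c & 0xE0) == 0xC0) {
        c &= 0x1F; extra = 1;  // 2-byte sequence
    } else if ((c & 0xF0) == 0xE0) {
        c &= 0x0F; extra = 2;  // 3-byte sequence
    } else {
        c &= 0x07; extra = 3;  // assumed 4-byte sequence
    }
    // Fold in the continuation bytes, never stepping past the terminator.
    while (extra-- > 0 && *p != 0) {
        c = (c << 6) | (*p++ & 0x3F);
    }
    return c;
}

// Decode a NUL-terminated UTF-8 string into code points without
// consulting the C locale (no mbstowcs or similar).
inline std::vector<std::uint32_t> decode_utf8(const char* text)
{
    assert(text != nullptr);
    std::vector<std::uint32_t> out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
    while (*p != 0) {
        out.push_back(decode_one(p));
    }
    return out;
}
```

Because the byte-to-codepoint mapping is fixed by the encoding itself, the result is the same regardless of the LC_CTYPE setting.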

codecov-io · 2018-07-06T23:03:45Z

Codecov Report

Merging #310 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #310   +/-   ##
=======================================
  Coverage   33.66%   33.66%           
=======================================
  Files         206      206           
  Lines       18299    18299           
  Branches     2411     2411           
=======================================
  Hits         6161     6161           
  Misses      11747    11747           
  Partials      391      391

Last update f470504...d3807ff.

jwiggins

I like the word mojibake. Not so much what it represents, just the way it rolls off the tongue...

Anyhow, this is an improvement overall. I'm curious if it can be done without any extra allocation (which was present in the previous version, I would argue unnecessarily).

Also, I'm a little concerned that this code throws exceptions. What do you think about catching them here?

jwiggins · 2018-07-09T14:18:49Z

-#endif
+// #if defined(_WIN32) || defined(__WIN32__) || defined(__CYGWIN__)
+// #include <windows.h>
+// #endif


Was this an experiment? It should probably be removed if things are working.

jwiggins · 2018-07-09T14:24:54Z

+        // font API.
+        std::vector<utf8::uint32_t> codepoints;
+        std::vector<utf8::uint32_t>::iterator p;
+        std::string utf8text(text);


Not sure how I feel about this extra string copy...

jwiggins · 2018-07-09T15:15:04Z

+        std::vector<utf8::uint32_t> codepoints;
+        std::vector<utf8::uint32_t>::iterator p;
+        std::string utf8text(text);
+        utf8::utf8to32(utf8text.begin(), utf8text.end(), std::back_inserter(codepoints));


I believe you can just construct a utf8::iterator and avoid having to copy into codepoints. Additionally, if you get the length of text, I think you can construct the iterator like so: utf8::iterator<char*> p(text, text, text+length);. If that works, it would be nice to avoid the allocations.
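A rough sketch of the allocation-free idea, assuming well-formed input: walk the caller's buffer directly and yield one code point at a time. `Utf8Cursor` and `count_codepoints` are hypothetical names used as a hand-rolled stand-in for an iterator such as `utf8::iterator`; this is not the vendored library's implementation and it performs no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a UTF-8 iterator: reads a caller-owned buffer
// in place and decodes on demand, so no std::string copy or std::vector
// of code points is needed.
class Utf8Cursor {
public:
    Utf8Cursor(const char* text, std::size_t length)
        : p_(reinterpret_cast<const unsigned char*>(text)),
          end_(p_ + length)
    {
        assert(text != nullptr);
    }

    bool done() const { return p_ == end_; }

    // Decode the next code point and advance. Assumes well-formed UTF-8.
    std::uint32_t next()
    {
        assert(!done());
        std::uint32_t c = *p_++;
        int extra = 0;
        if (c >= 0xF0)      { c &= 0x07; extra = 3; }  // 4-byte lead
        else if (c >= 0xE0) { c &= 0x0F; extra = 2; }  // 3-byte lead
        else if (c >= 0xC0) { c &= 0x1F; extra = 1; }  // 2-byte lead
        // Fold in continuation bytes without reading past the buffer end.
        while (extra-- > 0 && p_ != end_) {
            c = (c << 6) | (*p_++ & 0x3F);
        }
        return c;
    }

private:
    const unsigned char* p_;
    const unsigned char* end_;
};

// Example consumer: count code points without any heap allocation.
inline std::size_t count_codepoints(const char* text)
{
    Utf8Cursor cur(text, std::strlen(text));
    std::size_t n = 0;
    while (!cur.done()) {
        cur.next();
        ++n;
    }
    return n;
}
```

In the rendering loop, each value returned by `next()` would be handed to the font engine directly, which is the shape of the change suggested above.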

rkern · 2018-07-10T17:46:42Z

I'm not too concerned about the exceptions since we are guaranteed that the input is properly-terminated UTF-8 encoded text.

jwiggins · 2018-07-10T18:22:44Z

Sounds good to me. Thanks for the update!

rkern added 5 commits July 6, 2018 13:38

ENH: Vendorize header-only library for UTF-8 conversion.

66bd25b

BUG: Explicitly decode text as UTF-8 and avoid locale dependence.

e79f00a

ENH: Add test.

5ca2e43

BUG: skip the test if the ASCII locale is not installed.

4abc500

BUG: Use the LC_CTYPE category instead of the catch-all LC_ALL.

84f712a

jwiggins reviewed Jul 9, 2018

ENH: Avoid string copies.

d3807ff

jwiggins approved these changes Jul 10, 2018

rkern merged commit f546b7b into master Jul 12, 2018

rkern deleted the fix/utf-8 branch July 12, 2018 16:10

rkern mentioned this pull request Jul 12, 2018

Release 4.7.2 #311

Merged

jwiggins pushed a commit that referenced this pull request Jul 12, 2018

Merge pull request #310 from enthought/fix/utf-8

d81a559

rkern merged 6 commits into master from fix/utf-8
