Codecov Report
@@ Coverage Diff @@
## master #310 +/- ##
=======================================
Coverage 33.66% 33.66%
=======================================
Files 206 206
Lines 18299 18299
Branches 2411 2411
=======================================
Hits 6161 6161
Misses 11747 11747
Partials 391 391
jwiggins left a comment
I like the word mojibake. Not so much what it represents, just the way it rolls off the tongue...
Anyhow, this is an improvement overall. I'm curious if it can be done without any extra allocation (which was present in the previous version, I would argue unnecessarily).
Also, I'm a little concerned that this code throws exceptions. What do you think about catching them here?
#endif
// #if defined(_WIN32) || defined(__WIN32__) || defined(__CYGWIN__)
// #include <windows.h>
// #endif
Was this an experiment? It should probably be removed if things are working.
// font API.
std::vector<utf8::uint32_t> codepoints;
std::vector<utf8::uint32_t>::iterator p;
std::string utf8text(text);
Not sure how I feel about this extra string copy...
std::vector<utf8::uint32_t> codepoints;
std::vector<utf8::uint32_t>::iterator p;
std::string utf8text(text);
utf8::utf8to32(utf8text.begin(), utf8text.end(), std::back_inserter(codepoints));
I believe you can just construct a utf8::iterator and avoid having to copy into codepoints. Additionally, if you get the length of text, I think you can construct the iterator like so: utf8::iterator<char*> p(text, text, text+length);. If that works, it would be nice to avoid the allocations.
I'm not too concerned about the exceptions since we are guaranteed that the input is properly-terminated UTF-8 encoded text.
Sounds good to me. Thanks for the update!
Remove locale-dependence in Agg text rendering
To handle rendering Unicode text strings in the agg backend, we explicitly encode as UTF-8 byte strings in the Python layer before passing the encoded char* to the C++ API. On Unix platforms, the C++ layer currently uses locale-dependent functions to decode the bytes back into codepoint integers to pass to the font engine. Under a non-UTF-8 locale, we render mojibake.

This PR vendorizes a liberally-licensed header-only library for explicitly decoding the UTF-8 bytes in the C++ layer.
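
To make the failure mode concrete, here is a small self-contained sketch, not the vendored library. It contrasts treating each byte as its own codepoint, which is roughly what decoding under a single-byte locale amounts to, with explicit UTF-8 decoding. The function names `decode_single_byte` and `decode_utf8` are made up for this example, and the decoder only checks structure (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Roughly what a single-byte locale does: every byte becomes one codepoint.
std::vector<std::uint32_t> decode_single_byte(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Explicit, locale-independent UTF-8 decoding (illustrative only).
// Throws std::invalid_argument on structurally malformed input.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;   // total bytes in this sequence
        std::uint32_t cp;  // payload bits collected so far
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (bytes.size() - i < len)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the two bytes `0xC3 0xA9` (the UTF-8 encoding of U+00E9), `decode_single_byte` produces the two codepoints U+00C3 and U+00A9, which is the mojibake, while `decode_utf8` produces the single codepoint U+00E9 regardless of the process locale.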