std/uni.d
|
I'll leave the API review to UTF experts. For my money, I wonder where the generated tries come from. Should we include them in the code, or use CTFE? Also, since you have style corrections, maybe the time is near to bring this in line with Phobos, e.g. |
As with the rest of std.uni, the user is in the front seat and picks any CodepointSet they like. One way is then to build a Trie with, say, toTrie!2 (or 3/4 levels) that handles dchars, or a (new) UTF matcher for ranges of char/wchar.
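A minimal sketch of the two routes, assuming the `toTrie` and `utfMatcher` names as they appear in std.uni; `unicode.Cyrillic` and the helper names `inSetDecoded` and `matchFront` are illustrative choices only, not part of this pull request:

```d
import std.uni;

// Route 1: a 2-level trie that answers membership for an already decoded dchar.
bool inSetDecoded(dchar c)
{
    auto trie = toTrie!2(unicode.Cyrillic); // rebuilt per call to keep the sketch short
    return trie[c];
}

// Route 2: a UTF matcher that inspects UTF-8 code units directly.
// On a match it consumes the matched character from the front of `s`;
// on a mismatch `s` is left untouched.
bool matchFront(ref string s)
{
    auto m = utfMatcher!char(unicode.Cyrillic); // rebuilt per call to keep the sketch short
    return m.match(s);
}

void main()
{
    assert(inSetDecoded('Я'));
    assert(!inSetDecoded('z'));

    string s = "Яz";
    assert(matchFront(s));  // consumes the two code units of 'Я'
    assert(s == "z");
    assert(!matchFront(s)); // 'z' is not Cyrillic
    assert(s == "z");
}
```

In real code the trie or matcher would be built once and reused, since construction is the expensive part.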
Anything is possible, including generating a matcher based on run-time input or doing the whole thing via CTFE. As noted before e.g. |
|
Great. Anyhow, more line breaks in the generated tries would be nice. |
Sure, I have it on my list. |
|
This pull request conflicts with dlang/dmd#3399: |
|
Let's give the simplified interface of utf-matcher-2 another try. |
|
Still sucks ... |
|
With this std.uni hacked into LDC 0.13-alpha: |
This is step zero toward a decode-less std.regex. UTF matchers are efficient functors around a set of specific tries. They enable processing Unicode characters without decoding, at speeds on par with decoding itself. Along the way, give staticIota 'package' protection and reuse it. Fix a shameful typo in setSearcher.
Granularity is horribly high. Auto-inference for templates has the downside that it leaves no explanations or reasons for failure.
Overlong sequences, wrong continuation for UTF-8. Lone high surrogate for UTF-16.
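To make "without decoding" concrete, here is a hedged sketch that counts characters from a set in two ways: by decoding each code point and testing the set, and by letting a matcher skip over raw UTF-8 code units. The helper names `countDecoded` and `countDecodeless` and the choice of `unicode.Number` are assumptions made for illustration; this is not the benchmark discussed in this thread.

```d
import std.uni;

// Baseline: decode every code point, then test membership in the set.
size_t countDecoded(string s)
{
    auto set = unicode.Number;
    size_t n;
    foreach (dchar c; s) // foreach over dchar decodes the UTF-8 input
        if (set[c])
            ++n;
    return n;
}

// Decode-less: the matcher walks the code units itself.
// skip() reports whether the front character matched and advances past it either way.
size_t countDecodeless(string s)
{
    auto m = utfMatcher!char(unicode.Number);
    size_t n;
    while (s.length)
        if (m.skip(s))
            ++n;
    return n;
}

void main()
{
    string text = "2² = 4";
    // Both approaches should agree on valid input.
    assert(countDecoded(text) == countDecodeless(text));
}
```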
Spelling, style etc.
Drop public for documented unittests
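For readers unfamiliar with the malformed inputs named above, the following sketch builds one example of each and checks it with `std.utf.validate`. That function is a stand-in chosen for illustration; it is an assumption that the matchers reject the same inputs, and the helper name `isValid` is hypothetical.

```d
import std.utf : validate, UTFException;

// Returns false when std.utf.validate rejects the input.
bool isValid(S)(S s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Overlong sequence: U+0000 encoded in two bytes instead of one.
    char[] overlong = [cast(char) 0xC0, cast(char) 0x80];
    // Wrong continuation: a lead byte followed by a byte that is not 10xxxxxx.
    char[] badContinuation = [cast(char) 0xC3, cast(char) 0x28];
    // Lone high surrogate: UTF-16 lead unit with no trailing low surrogate.
    wchar[] loneHigh = [cast(wchar) 0xD800];

    assert(!isValid(overlong));
    assert(!isValid(badContinuation));
    assert(!isValid(loneHigh));
    assert(isValid("2² = 4"));
}
```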
|
How to run that benchmark? |
|
Source: Run with a bunch of wiki files as arguments (any text file will do, but these provide distinct sets of languages): |
|
It's really a nuisance that we're developing performance sensitive code with such a dull backend. The benchmark only runs |
|
I'm running out of ideas to speed up the single function variant, so I'm OK with the |
|
@MartinNowak regarding I have a vision that most code will do one of: Another possible use case for |
|
So are we ready to merge this? |
|
@MartinNowak I think it should be good to go, the only problem left is that it's useless on DMD. |
|
Auto-merge toggled on |
Second try. See also pull #1685