Unicode segmenter performance #1703
Thank you for submitting this, @clipperhouse. We'll review it soon.
Thanks @abhinavdangeti. I did some more testing of differences between segmenters here: https://github.com/clipperhouse/segmenter-repro
Updated the benchmark, using new sample text (~110K in size, multilingual). Note the allocations.
@abhinavdangeti I think this is in good shape for your review. Happy to discuss.
I see that the
@abhinavdangeti OK, try those workflows again? Thanks.
89f74ba to cf2065d
I rebased a bit for a cleaner merge.
cf2065d to 93cc997
(Rebased)
@abhinavdangeti Friendly ping, let me know if you'd like to pursue this.
a8db884 to c8e26eb
d4de8e6 to 2782f9f
2782f9f to f944431
f944431 to e7dd25c
e7dd25c to b6bca0a
Hiya @abhinavdangeti, the above pushes are just rebases; there have been no updates here in a while. Would you like to run checks?
Thanks @clipperhouse, re-running the checks.
Looks good, thanks. Ready for review at your convenience.
b6bca0a to 9b41d64
226cf98 to cccea71
cccea71 to 8fa8ed9
8fa8ed9 to 4bfba33
4bfba33 to 042b2d8
042b2d8 to 7014345
7014345 to 05ac589
05ac589 to 97df007
60ea1cf to 48bcf3b
Replacing blevesearch/segment. ~2x perf improvement.
48bcf3b to 5fcaf24
Hi @abhinavdangeti, I check on this every few months just to keep it fresh. If this is in your “someday” file, great. If not, feel free to close it. Most recently, I removed my refactoring of the allocations, and just focused on a like-for-like swap of the segmenter. Simpler for review.
In this PR, as an experiment, I’ve swapped out the Unicode segmenter in Bleve with this one.
Benefits
Testing & compatibility