clipperhouse/segmenter-repro

An attempt to compare results of the Bleve segmenter and uax29.

Have a look at the test file.

What I've found (June 2022)

  • Bleve splits a run of spaces into separate tokens, one per space, while uax29 returns a single token containing the whole run of spaces.
  • Bleve appears to implement Unicode 8.0.0, while uax29 implements 13.0.0. The two seem to differ on emoji skin tone modifiers, which might generalize to emoji modifiers as a whole.

Both the Bleve segmenter and uax29 pass the Unicode test suite (for their respective Unicode versions).
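
To make the first finding concrete, here is a minimal sketch of the two whitespace behaviors. It is a toy, not either library's code: `tokenize` and its `groupSpaces` flag are hypothetical names invented for this illustration, and only the ASCII space is treated as whitespace. The difference may come from the version gap, since UAX #29 added rule WB3d (no break between adjacent horizontal whitespace) in Unicode 11.0, but that link is an assumption rather than something verified here.

```go
package main

import "fmt"

// tokenize is a toy tokenizer (hypothetical, NOT the Bleve or uax29 implementation).
// It splits s into runs of non-space bytes and space tokens.
// With groupSpaces=true, a run of spaces becomes one token (the behavior observed in uax29).
// With groupSpaces=false, every space is its own token (the behavior observed in Bleve).
// Assumption: only the ASCII space counts as whitespace.
func tokenize(s string, groupSpaces bool) []string {
	var tokens []string
	start := 0
	for i := 1; i <= len(s); i++ {
		if i < len(s) {
			prevSpace, curSpace := s[i-1] == ' ', s[i] == ' '
			// No boundary inside a run of non-space bytes.
			if !prevSpace && !curSpace {
				continue
			}
			// No boundary inside a run of spaces when grouping is on.
			if prevSpace && curSpace && groupSpaces {
				continue
			}
		}
		// Either a boundary was found or the end of input was reached.
		tokens = append(tokens, s[start:i])
		start = i
	}
	return tokens
}

func main() {
	input := "hello   world"
	fmt.Printf("%q\n", tokenize(input, false)) // ["hello" " " " " " " "world"]
	fmt.Printf("%q\n", tokenize(input, true))  // ["hello" "   " "world"]
}
```

Code that counts tokens or relies on token offsets will see different results from the two segmenters for the same input, even though each passes the conformance tests for its own Unicode version.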
