Add test to specifically cover surrogate pairs, not just combining characters #335
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #335 +/- ##
==========================================
- Coverage 72.28% 72.26% -0.02%
==========================================
Files 416 416
Lines 35388 35438 +50
Branches 4894 4897 +3
==========================================
+ Hits 25581 25611 +30
- Misses 8705 8724 +19
- Partials 1102 1103 +1
I've added the custom surrogate pair handling here. Again, I'm open to suggestions for the helper class name. Maybe just |
ddaspit
left a comment
@ddaspit reviewed 1 of 1 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93)
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 27 at r2 (raw file):
public UsfmToken UsfmToken { get; private set; }
private string _text;
Do we need this? Can we just get the current string from _codePointString.String?
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 167 at r2 (raw file):
}
public class CodePointString
We should add a comment that describes the purpose of this class.
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 183 at r2 (raw file):
.Where(tup => !char.IsLowSurrogate(tup.c))
.Select((tup, i) => (tup.i, i));
_codePointIndexByStringIndex = indexPairs.ToDictionary(tup => tup.StringIndex, tup => tup.CodePointIndex);
I would like to reduce the performance hit as much as possible from this class. We should build both dictionaries in a single loop.
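For illustration only, here is a minimal sketch of what building both lookups in one pass could look like. It is not the code from this PR: the class name `CodePointIndexMap`, its members, and the choice to skip every low surrogate (mirroring the `!char.IsLowSurrogate` filter quoted above) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (name and members are assumptions, not the PR's code).
// Maps between UTF-16 string indices and code point indices.
public class CodePointIndexMap
{
    private readonly Dictionary<int, int> _stringIndexByCodePointIndex = new Dictionary<int, int>();
    private readonly Dictionary<int, int> _codePointIndexByStringIndex = new Dictionary<int, int>();

    public CodePointIndexMap(string text)
    {
        int codePointIndex = 0;
        // Single pass: each char that is not a low surrogate starts a new code point,
        // so both dictionaries can be filled in the same iteration.
        for (int stringIndex = 0; stringIndex < text.Length; stringIndex++)
        {
            if (char.IsLowSurrogate(text[stringIndex]))
                continue;
            _stringIndexByCodePointIndex[codePointIndex] = stringIndex;
            _codePointIndexByStringIndex[stringIndex] = codePointIndex;
            codePointIndex++;
        }
        CodePointCount = codePointIndex;
    }

    // Number of code points found in the string.
    public int CodePointCount { get; }

    // UTF-16 index at which the given code point starts.
    public int GetStringIndex(int codePointIndex) => _stringIndexByCodePointIndex[codePointIndex];

    // Code point index for a UTF-16 index that starts a code point.
    public int GetCodePointIndex(int stringIndex) => _codePointIndexByStringIndex[stringIndex];
}
```

With the input `"a\U0001F600b"` (four UTF-16 chars, three code points), `GetStringIndex(2)` would return 3 and `GetCodePointIndex(3)` would return 2, since the emoji occupies string indices 1 and 2.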
Enkidu93
left a comment
So do you think the class name is OK then?
Reviewable status: 5 of 6 files reviewed, 3 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 27 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Do we need this? Can we just get the current string from _codePointString.String?
Done.
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 167 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We should add a comment that describes the purpose of this class.
Done.
src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 183 at r2 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I would like to reduce the performance hit as much as possible from this class. We should build both dictionaries in a single loop.
Done.
ddaspit
left a comment
I can't really think of anything better. Maybe SurrogatePairString. At least, it would indicate that the class has something to do with surrogate pairs.
@ddaspit reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Connected to sillsdev/machine.py#228