@Enkidu93
Collaborator

@Enkidu93 Enkidu93 commented Sep 11, 2025

Connected to sillsdev/machine.py#228


This change is Reviewable

@Enkidu93 Enkidu93 requested a review from ddaspit September 11, 2025 22:04
@codecov-commenter

codecov-commenter commented Sep 16, 2025

Codecov Report

❌ Patch coverage is 71.23288% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.26%. Comparing base (8565963) to head (d519508).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/SIL.Machine/PunctuationAnalysis/TextSegment.cs | 69.23% | 19 Missing and 1 partial ⚠️ |
| ...ne/PunctuationAnalysis/QuotationMarkStringMatch.cs | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #335      +/-   ##
==========================================
- Coverage   72.28%   72.26%   -0.02%     
==========================================
  Files         416      416              
  Lines       35388    35438      +50     
  Branches     4894     4897       +3     
==========================================
+ Hits        25581    25611      +30     
- Misses       8705     8724      +19     
- Partials     1102     1103       +1     

@Enkidu93
Collaborator Author

I've added the custom surrogate pair handling here.

Again, I'm open to suggestions for the helper class name. Maybe just NonSurrogateString or something explicit like that is best.
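For context on why surrogate pair handling is needed: a C# `string` is indexed by UTF-16 code unit, so a character outside the Basic Multilingual Plane (such as U+1F600) occupies two `char`s, whereas Python's `str` (used by the connected machine.py port) is indexed by code point. The sketch below is illustrative only. `CodePointCounter` and `CountCodePoints` are hypothetical names, and the sketch assumes the same rule as the `.Where(tup => !char.IsLowSurrogate(tup.c))` filter quoted later in this review: every `char` that is not a low surrogate starts a new code point.

```csharp
using System;

public static class CodePointCounter
{
    // Hypothetical helper: counts code points by skipping low surrogates,
    // so a well-formed surrogate pair (high + low) is counted once.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        foreach (char c in s)
        {
            if (!char.IsLowSurrogate(c))
                count++;
        }
        return count;
    }
}
```

For `"a" + char.ConvertFromUtf32(0x1F600) + "b"`, `Length` is 4 (UTF-16 code units) while `CountCodePoints` returns 3, so an index computed on the Python side would presumably need translating before it can be used against the C# string.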

Contributor

@ddaspit ddaspit left a comment

@ddaspit reviewed 1 of 1 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93)


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 27 at r2 (raw file):

        public UsfmToken UsfmToken { get; private set; }

        private string _text;

Do we need this? Can we just get the current string from _codePointString.String?


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 167 at r2 (raw file):

    }

    public class CodePointString

We should add a comment that describes the purpose of this class.


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 183 at r2 (raw file):

                .Where(tup => !char.IsLowSurrogate(tup.c))
                .Select((tup, i) => (tup.i, i));
            _codePointIndexByStringIndex = indexPairs.ToDictionary(tup => tup.StringIndex, tup => tup.CodePointIndex);

I would like to reduce the performance hit as much as possible from this class. We should build both dictionaries in a single loop.
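One way to build both lookups in a single pass, as requested here, is sketched below. This is a hypothetical `CodePointIndexMap`, not the PR's actual `CodePointString`. It assumes the same low-surrogate rule as the quoted excerpt and, as an extra design choice, also maps the end-of-string position so that exclusive end indices (and therefore substrings) can be converted. The `<summary>` comment also illustrates the kind of purpose description requested for the class.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical sketch: wraps a string and converts between UTF-16 string
/// indices and Unicode code point indices, counting a surrogate pair as a
/// single code point.
/// </summary>
public class CodePointIndexMap
{
    private readonly Dictionary<int, int> _codePointIndexByStringIndex = new Dictionary<int, int>();
    private readonly Dictionary<int, int> _stringIndexByCodePointIndex = new Dictionary<int, int>();

    public string String { get; }
    public int Length { get; }

    public CodePointIndexMap(string str)
    {
        String = str;
        int codePointIndex = 0;
        // Single pass over the string fills both dictionaries.
        for (int stringIndex = 0; stringIndex < str.Length; stringIndex++)
        {
            // A low surrogate is the second half of a pair, so it does not
            // start a new code point.
            if (char.IsLowSurrogate(str[stringIndex]))
                continue;
            _codePointIndexByStringIndex[stringIndex] = codePointIndex;
            _stringIndexByCodePointIndex[codePointIndex] = stringIndex;
            codePointIndex++;
        }
        // Assumption: also map the end position so exclusive end indices convert.
        _codePointIndexByStringIndex[str.Length] = codePointIndex;
        _stringIndexByCodePointIndex[codePointIndex] = str.Length;
        Length = codePointIndex;
    }

    // Throws KeyNotFoundException for an index that points at a low surrogate.
    public int GetCodePointIndex(int stringIndex) => _codePointIndexByStringIndex[stringIndex];

    public int GetStringIndex(int codePointIndex) => _stringIndexByCodePointIndex[codePointIndex];

    // Substring addressed by code point indices rather than UTF-16 indices.
    public string Substring(int startCodePointIndex, int codePointLength)
    {
        int start = GetStringIndex(startCodePointIndex);
        int end = GetStringIndex(startCodePointIndex + codePointLength);
        return this.String.Substring(start, end - start);
    }
}
```

Filling both dictionaries in the same loop scans the string once, rather than enumerating the index pairs separately for each `ToDictionary` call.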

Collaborator Author

@Enkidu93 Enkidu93 left a comment

So do you think the class name is OK then?

Reviewable status: 5 of 6 files reviewed, 3 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 27 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Do we need this? Can we just get the current string from _codePointString.String?

Done.


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 167 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should add a comment that describes the purpose of this class.

Done.


src/SIL.Machine/PunctuationAnalysis/TextSegment.cs line 183 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I would like to reduce the performance hit as much as possible from this class. We should build both dictionaries in a single loop.

Done.

Contributor

@ddaspit ddaspit left a comment

I can't really think of anything better. Maybe SurrogatePairString. At least, it would indicate that the class has something to do with surrogate pairs.

@ddaspit reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@Enkidu93 Enkidu93 merged commit 8d5fd21 into master Sep 17, 2025
3 of 4 checks passed
@Enkidu93 Enkidu93 deleted the add_another_qd_unicode_test branch September 17, 2025 22:53

4 participants