Skip to content

Conversation

@Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Sep 24, 2025

Fixes sillsdev/machine.py#232
Also fixes #339 and fixes #340

These will all need to be ported.


This change is Reviewable

@Enkidu93 Enkidu93 requested a review from ddaspit September 24, 2025 18:16
@codecov-commenter
Copy link

codecov-commenter commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 83.84615% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.32%. Comparing base (1662968) to head (56f6d27).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/SIL.Machine/Corpora/ScriptureRefComparer.cs 0.00% 21 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #341      +/-   ##
==========================================
+ Coverage   72.29%   72.32%   +0.03%     
==========================================
  Files         417      418       +1     
  Lines       35475    35579     +104     
  Branches     4900     4914      +14     
==========================================
+ Hits        25646    25732      +86     
- Misses       8731     8751      +20     
+ Partials     1098     1096       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 76 at r2 (raw file):

        {
            // We need two maps so that update rows can be specified per segment
            // but be handled correctly whether or not the USFM has segments for that verse

As an alternative to two maps, we could use the ignore segments map and then search through the list to see if a segment matches if the update row ref has a segment. That would probably be better, but I wasn't sure.


tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 520 at r2 (raw file):

            )
        );
        // Updating using relaxed refs will not be perfect, but it's the best we can do while allowing for out of order rows

Are you ok with this, @ddaspit? I don't see a way to avoid this if the rows have relaxed refs.


src/SIL.Machine/Corpora/ScriptureRef.cs line 134 at r2 (raw file):

    }

    public class ScriptureRefComparer : IComparer<ScriptureRef>, IEqualityComparer<ScriptureRef>

I implemented IComparer as well partially just for completeness. There's no way in calling code to compare scripture refs ignoring segments using a comparer (e.g., if you want to use Order()) - this was the root isssue of Serval/Machine not handling certain refs correctly. It's superseded by the row map solution, but I still thought it'd be helpful to expose. If you'd rather I remove it, I can.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look great, @ddaspit. Thank you! I think this issue is the only one we still need to watch out for: In Serval since we're basically using the source ref in the target versification for pretranslations, we need to sort the pretranslations using compareSegments=true and then pass compareSegments=true to the handler when template=source (i.e., should be the same refs as the USFM), correct? For template=target, we'll want to ignore segments. This makes we wonder if we sort the rows ourselves in the handler to maintain consistency between how the refs are being compared. As long as our sorting method is a stable sorting method, this shouldn't mess up relaxed non-verse refs, right?

Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):

            _curTextType.Count == 0 ? ScriptureTextType.None : _curTextType.Peek();

        protected bool DuplicateVerse { get; private set; }

I wondered about this. Do you want to add something to the editor config to flag protected fields? I think we only have something set up for protected readonly fields.


src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):

namespace SIL.Machine.Corpora
{
    public class VerseRefComparer : IComparer<VerseRef>, IEqualityComparer<VerseRef>

If you want to keep the ScriptureRefComparer, should I move it to a separate file or is it appropriate to keep it with ScriptureRef?


tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):

        var rows = new List<UpdateUsfmRow>
        {
            new UpdateUsfmRow(ScrRef("MAT 2:2"), "Verse 2."),

Is this correct, @ddaspit? Is this something that could even actually happen?

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, compareSegments = true when using the source as the template and compareSegments = false when using the target as the template.

For template=target, we'll want to ignore segments. This makes we wonder if we sort the rows ourselves in the handler to maintain consistency between how the refs are being compared. As long as our sorting method is a stable sorting method, this shouldn't mess up relaxed non-verse refs, right?

I'm not sure I understand what you are asking here. Where exactly are you talking about sorting the rows?

@ddaspit reviewed 2 of 5 files at r1, 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I wondered about this. Do you want to add something to the editor config to flag protected fields? I think we only have something set up for protected readonly fields.

I don't know if we can set up an editorconfig rule to catch this. Any fields that are accessible outside of a class should be accessed through a property.


src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

If you want to keep the ScriptureRefComparer, should I move it to a separate file or is it appropriate to keep it with ScriptureRef?

I think it makes sense to move it to a separate file.


tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Is this correct, @ddaspit? Is this something that could even actually happen?

I'm not sure how realistic it is, but it is the correct behavior in this case.

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Serval, we sort the pretranslations: https://github.com/sillsdev/serval/blob/200925cd6722784fc6beb20428338267d707243a/src/Serval/src/Serval.Translation/Services/PretranslationService.cs#L148. What I'm saying is that, if sorting them is necessary, we need to make sure that the comparer we use when sorting is the same as the comparer used when advancing the rows, so maybe we should just sort the rows in the update handler itself to avoid errors caused by inconsistent sorting (i.e., the client sorts it one way, but doesn't pass the corresponding value of compareSegments). This is the issue I'm thinking about: #338.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes sense. Do you want to make the change or should I?

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it since there are a couple other things to clean up/look over too.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I will leave it to you to wrap up this PR.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)

@Enkidu93 Enkidu93 force-pushed the get_update_rows_by_ref branch from 90c5b94 to 55ae257 Compare September 29, 2025 20:54
Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reviewable status: 5 of 8 files reviewed, 2 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I don't know if we can set up an editorconfig rule to catch this. Any fields that are accessible outside of a class should be accessed through a property.

OK


src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I think it makes sense to move it to a separate file.

Sounds good. Done.


tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I'm not sure how realistic it is, but it is the correct behavior in this case.

OK

Enkidu93 and others added 3 commits September 29, 2025 16:56
… verses; add remark after ide tag

Fix marker placement test given new proper out of order verse handling

Add comment regarding row maps
- add compareSegments parameter to UpdateUsfmParserHandler
@Enkidu93 Enkidu93 force-pushed the get_update_rows_by_ref branch from 55ae257 to 8de2518 Compare September 29, 2025 20:56
@Enkidu93 Enkidu93 requested a review from ddaspit September 30, 2025 00:07
@Enkidu93
Copy link
Collaborator Author

As per @ddaspit and I's conversation, I've removed the row-sorting. From Mongo:

ObjectIds are approximately ordered by creation time, but are not perfectly ordered. Sorting a collection on an _id field containing ObjectId values is roughly equivalent to sorting by creation time.

So I think we'll just have to deal with the chance that there's some ugly behavior when template=Target in Serval. This is ready for a (hopefully) final review, Damien.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my research, it looks like sorting by _id would be sufficient. The main issue where the ObjectId sort order and the insertion order might not be exactly the same is when multiple processes insert docs. This is not the case for Serval. All pretranslations for a build are inserted by the same service (Serval API). We would need to add sorting support to SIL.DataAccess. In the meantime, sorting by ScriptureRef will work when the source is used as the template, but will not work when the target is used as the template.

@ddaspit reviewed 4 of 5 files at r4, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 420 at r6 (raw file):

            Dictionary<string, object> rowMetadata = null;
            int sourceIndex = 0;
            // search the sorted rows with updated text, starting from where we left off last.

This code does assume that the rows are sorted.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous statement is not accurate. The current implementation in Serval should work correctly for both source and target templates. We only need to sort by ScriptureRef prior to converting to relaxed refs for template=Target to work properly, which is what we are doing.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes, sounds good.

Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 420 at r6 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This code does assume that the rows are sorted.

I reverted it. I don't like that we have this assumption in the comment when it isn't necessarily true. I added a comment on the rows parameter.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@Enkidu93 Enkidu93 merged commit ba5915c into master Oct 1, 2025
4 checks passed
@Enkidu93 Enkidu93 deleted the get_update_rows_by_ref branch October 1, 2025 21:25
Enkidu93 added a commit to sillsdev/machine.py that referenced this pull request Oct 2, 2025
Enkidu93 added a commit to sillsdev/machine.py that referenced this pull request Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants