-
-
Notifications
You must be signed in to change notification settings - Fork 17
Get update rows by reference #341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #341 +/- ##
==========================================
+ Coverage 72.29% 72.32% +0.03%
==========================================
Files 417 418 +1
Lines 35475 35579 +104
Branches 4900 4914 +14
==========================================
+ Hits 25646 25732 +86
- Misses 8731 8751 +20
+ Partials 1098 1096 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 76 at r2 (raw file):
{ // We need two maps so that update rows can be specified per segment // but be handled correctly whether or not the USFM has segments for that verse
As an alternative to two maps, we could use the ignore segments map and then search through the list to see if a segment matches if the update row ref has a segment. That would probably be better, but I wasn't sure.
tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 520 at r2 (raw file):
) ); // Updating using relaxed refs will not be perfect, but it's the best we can do while allowing for out of order rows
Are you ok with this, @ddaspit? I don't see a way to avoid this if the rows have relaxed refs.
src/SIL.Machine/Corpora/ScriptureRef.cs line 134 at r2 (raw file):
} public class ScriptureRefComparer : IComparer<ScriptureRef>, IEqualityComparer<ScriptureRef>
I implemented IComparer as well partially just for completeness. There's no way in calling code to compare scripture refs ignoring segments using a comparer (e.g., if you want to use Order()) - this was the root isssue of Serval/Machine not handling certain refs correctly. It's superseded by the row map solution, but I still thought it'd be helpful to expose. If you'd rather I remove it, I can.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look great, @ddaspit. Thank you! I think this issue is the only one we still need to watch out for: In Serval since we're basically using the source ref in the target versification for pretranslations, we need to sort the pretranslations using compareSegments=true and then pass compareSegments=true to the handler when template=source (i.e., should be the same refs as the USFM), correct? For template=target, we'll want to ignore segments. This makes we wonder if we sort the rows ourselves in the handler to maintain consistency between how the refs are being compared. As long as our sorting method is a stable sorting method, this shouldn't mess up relaxed non-verse refs, right?
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):
_curTextType.Count == 0 ? ScriptureTextType.None : _curTextType.Peek(); protected bool DuplicateVerse { get; private set; }
I wondered about this. Do you want to add something to the editor config to flag protected fields? I think we only have something set up for protected readonly fields.
src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):
namespace SIL.Machine.Corpora { public class VerseRefComparer : IComparer<VerseRef>, IEqualityComparer<VerseRef>
If you want to keep the ScriptureRefComparer, should I move it to a separate file or is it appropriate to keep it with ScriptureRef?
tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):
var rows = new List<UpdateUsfmRow> { new UpdateUsfmRow(ScrRef("MAT 2:2"), "Verse 2."),
Is this correct, @ddaspit? Is this something that could even actually happen?
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, compareSegments = true when using the source as the template and compareSegments = false when using the target as the template.
For
template=target, we'll want to ignore segments. This makes we wonder if we sort the rows ourselves in the handler to maintain consistency between how the refs are being compared. As long as our sorting method is a stable sorting method, this shouldn't mess up relaxed non-verse refs, right?
I'm not sure I understand what you are asking here. Where exactly are you talking about sorting the rows?
@ddaspit reviewed 2 of 5 files at r1, 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I wondered about this. Do you want to add something to the editor config to flag protected fields? I think we only have something set up for protected readonly fields.
I don't know if we can set up an editorconfig rule to catch this. Any fields that are accessible outside of a class should be accessed through a property.
src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
If you want to keep the
ScriptureRefComparer, should I move it to a separate file or is it appropriate to keep it withScriptureRef?
I think it makes sense to move it to a separate file.
tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is this correct, @ddaspit? Is this something that could even actually happen?
I'm not sure how realistic it is, but it is the correct behavior in this case.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Serval, we sort the pretranslations: https://github.com/sillsdev/serval/blob/200925cd6722784fc6beb20428338267d707243a/src/Serval/src/Serval.Translation/Services/PretranslationService.cs#L148. What I'm saying is that, if sorting them is necessary, we need to make sure that the comparer we use when sorting is the same as the comparer used when advancing the rows, so maybe we should just sort the rows in the update handler itself to avoid errors caused by inconsistent sorting (i.e., the client sorts it one way, but doesn't pass the corresponding value of compareSegments). This is the issue I'm thinking about: #338.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That makes sense. Do you want to make the change or should I?
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do it since there are a couple other things to clean up/look over too.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I will leave it to you to wrap up this PR.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
90c5b94 to
55ae257
Compare
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Reviewable status: 5 of 8 files reviewed, 2 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs line 31 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I don't know if we can set up an editorconfig rule to catch this. Any fields that are accessible outside of a class should be accessed through a property.
OK
src/SIL.Machine/Corpora/VerseRefComparer.cs line 9 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I think it makes sense to move it to a separate file.
Sounds good. Done.
tests/SIL.Machine.Tests/Corpora/UpdateUsfmParserHandlerTests.cs line 431 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I'm not sure how realistic it is, but it is the correct behavior in this case.
OK
… verses; add remark after ide tag Fix marker placement test given new proper out of order verse handling Add comment regarding row maps
- add compareSegments parameter to UpdateUsfmParserHandler
55ae257 to
8de2518
Compare
|
As per @ddaspit and I's conversation, I've removed the row-sorting. From Mongo:
So I think we'll just have to deal with the chance that there's some ugly behavior when |
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my research, it looks like sorting by _id would be sufficient. The main issue where the ObjectId sort order and the insertion order might not be exactly the same is when multiple processes insert docs. This is not the case for Serval. All pretranslations for a build are inserted by the same service (Serval API). We would need to add sorting support to SIL.DataAccess. In the meantime, sorting by ScriptureRef will work when the source is used as the template, but will not work when the target is used as the template.
@ddaspit reviewed 4 of 5 files at r4, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 420 at r6 (raw file):
Dictionary<string, object> rowMetadata = null; int sourceIndex = 0; // search the sorted rows with updated text, starting from where we left off last.
This code does assume that the rows are sorted.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous statement is not accurate. The current implementation in Serval should work correctly for both source and target templates. We only need to sort by ScriptureRef prior to converting to relaxed refs for template=Target to work properly, which is what we are doing.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, yes, sounds good.
Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @ddaspit)
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs line 420 at r6 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This code does assume that the rows are sorted.
I reverted it. I don't like that we have this assumption in the comment when it isn't necessarily true. I added a comment on the rows parameter.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 1 of 1 files at r7, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Fixes sillsdev/machine.py#232
Also fixes #339 and fixes #340
These will all need to be ported.
This change is