Skip to content

Conversation

@mudiagaobrikisil
Copy link
Contributor

@mudiagaobrikisil mudiagaobrikisil commented Mar 30, 2025

This PR adds a disclaimer line on the generated draft. The draft is pf the format
\rem This draft of MAT was machine translated on from project <source_project_name>. It should be reviewed and edited carefully.
It adds a new property called addDisclaimer to the getUsfmAsync method, which is used in the translation engines controller class. This property, when set to true, ensures the disclaimer is shown and the other way round.
It also has two new tests to ensure the disclaimer is not added when the addDisclaimer is set to false, and that is is shown when the addDisclaimer is set to true.

The model name has not yet been added to the disclaimer. After discussing with @Enkidu93, it seemed there is no direct way to get the model name in serval, as it is in machine.py. It is captured in the settings.yaml of machine.py, e.g. parent_model_name: facebook/nllb-200-distilled-1.3B. The two suggestions were to either add a new variable to serval settings that captures the model name or to figure out a way to get it from clearml. However, I think the model name I have seen in ClearML is the one used for testing, not the actual model name. @johnml1135, @ddaspit, how would you suggest I go about the model name, and is it something I should do at this point?

Also, it has been manually tested on swagger and seems to work.


This change is Reviewable

Update QA yaml values

Update to machine 3.6.4

Add corpora validation & appropriate error messages

Add missing '!'

Add fileless corpus test; no files monolingual corpus check

More tests

Translation engine tests; pretranslation tests

Update to client version 1.9.1

Changes
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 73.68421% with 10 lines in your changes missing coverage. Please review.

Project coverage is 64.93%. Comparing base (5b2f0da) to head (f02950c).

Files with missing lines Patch % Lines
...rval.Translation/Services/PretranslationService.cs 72.97% 7 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #659      +/-   ##
==========================================
+ Coverage   64.90%   64.93%   +0.02%     
==========================================
  Files         342      342              
  Lines       18836    18871      +35     
  Branches     2455     2462       +7     
==========================================
+ Hits        12225    12253      +28     
- Misses       5728     5734       +6     
- Partials      883      884       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mudiagaobrikisil mudiagaobrikisil requested review from Enkidu93, ddaspit and johnml1135 and removed request for Enkidu93 and johnml1135 March 30, 2025 22:16
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to implement the insertion of the \rem marker in the USFM updater class, then silnlp would be able to use the same code.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/Serval/src/Serval.Translation/Services/IPretranslationService.cs line 23 at r1 (raw file):

        PretranslationUsfmMarkerBehavior embedBehavior,
        PretranslationUsfmMarkerBehavior styleMarkerBehavior,
        bool addDisclaimer = false,

We don't need this parameter.


src/Serval/src/Serval.Translation/Services/PretranslationService.cs line 184 at r1 (raw file):

            if (addDisclaimer)
            {
                idPosition = usfm.IndexOf("\\id");

We should turn this into a function. Also, I believe that silnlp inserts the \rem after the \ide marker if it exists.

Copy link
Contributor Author

@mudiagaobrikisil mudiagaobrikisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. Refactoring.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)

@johnml1135
Copy link
Collaborator

If the model name is important enough for the disclaimer, it is also important enough for the build itself. What if we put it in executionData and then grab it from there. I would say open a new issue to grab the model name and put it in executionData but for right now, don't include it for this version of the disclaimer.

@Enkidu93
Copy link
Collaborator

Enkidu93 commented Apr 1, 2025

Let me know if you need help, Mudi, with reworking this. Also, @johnml1135 and I discussed the issue of adding the 'model name' to the remark this morning. For context, the 'model name' in EITL drafts includes the name of the base model, names of source project(s) and the target project, and the book filter that was applied to the data in training (e.g., NT+GEN) - so maybe 'model name' isn't really the best moniker 😆 . John suggested that we could just try and get a basic disclaimer through and then spin off issues for getting the base model name and training data/filters specified. How does that sound, @ddaspit?

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)

Copy link
Contributor Author

@mudiagaobrikisil mudiagaobrikisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnml1135, noted. I will refactor and get this disclaimer out and then open an issue for the model name. @Enkidu93 , would a call in the morning be fine?

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 I agree. We should just start with a basic disclaimer.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)

@Enkidu93 Enkidu93 self-assigned this Apr 29, 2025
@Enkidu93
Copy link
Collaborator

Since the strategy that we've arrived at for implementing this is different than what Mudi coded, I'm just going to close this and open a new PR.

@Enkidu93 Enkidu93 closed this Apr 30, 2025
@Enkidu93 Enkidu93 deleted the draft_disclaimer branch April 30, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants