Skip to content

Conversation

@pmachapman
Copy link
Collaborator

@pmachapman pmachapman commented Jun 4, 2025

Required for sillsdev/serval#702


This change is Reviewable

@pmachapman pmachapman force-pushed the separate_progress branch from c6f5743 to cc8e549 Compare June 4, 2025 03:40
@pmachapman pmachapman requested a review from ddaspit June 4, 2025 03:41
@codecov-commenter
Copy link

codecov-commenter commented Jun 4, 2025

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.94%. Comparing base (6a9fe26) to head (047b01d).

Files with missing lines Patch % Lines
machine/jobs/nmt_engine_build_job.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #191   +/-   ##
=======================================
  Coverage   88.93%   88.94%           
=======================================
  Files         282      282           
  Lines       17056    17062    +6     
=======================================
+ Hits        15169    15175    +6     
  Misses       1887     1887           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):

    percentage: float = 0
    report_steps: bool = True
    stage: Optional[Literal["train", "inference"]] = None

Is this worth an Enum?

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)


machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Is this worth an Enum?

I've made it an enum in the Serval PR as well. My primary motivation was for consistency and to help strongly type the C# API.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

I've made it an enum in the Serval PR as well. My primary motivation was for consistency and to help strongly type the C# API.

This class is intended to be general-purpose, so I would just make this a string.


machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):

                if status.phase_stage is not None:
                    if status.phase_step is not None:
                        task.get_logger().report_single_value(

It looks like there is already code in place to pass multiple progress-related properties through the task runtime properties. It is currently used for SMT and word alignment jobs. The update_runtime_properties function makes a call to the /tasks.edit endpoint on the ClearML REST API to update the runtime property on the task. I believe that this is how the Task.set_progress method works. We should use a consistent approach for all job types. Apart from the fact that ClearML passes progress information using the runtime properties, I feel like there was a reason why we couldn't use the logger to pass progress information, but I can't remember why.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @pmachapman)


machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This class is intended to be general-purpose, so I would just make this a string.

I'm sorry - I meant defining an Enum class instead of just the optional literals. But sounds like Damien wants it to be generic 👍

@pmachapman pmachapman force-pushed the separate_progress branch 2 times, most recently from b8bcca6 to dface12 Compare June 9, 2025 01:38
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 7 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)


machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

It looks like there is already code in place to pass multiple progress-related properties through the task runtime properties. It is currently used for SMT and word alignment jobs. The update_runtime_properties function makes a call to the /tasks.edit endpoint on the ClearML REST API to update the runtime property on the task. I believe that this is how the Task.set_progress method works. We should use a consistent approach for all job types. Apart from the fact that ClearML passes progress information using the runtime properties, I feel like there was a reason why we couldn't use the logger to pass progress information, but I can't remember why.

From what I can tell update_runtime_properties only sets the overall progress, not progress for subsystems?

I don't think writing to the runtime properties is the correct place, as apart from the overall progress value, the only other values set their are related to the machine/task runtime and versions (https://github.com/search?q=repo%3Aclearml%2Fclearml%20_set_runtime_properties&type=code)

From what I can tell, logging scalars is the correct place for this information (https://clear.ml/docs/latest/docs/webapp/webapp_exp_track_visual/#scalars) and the report_single_value function I use backs onto the report_scalar function (https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/logger.py#L196).

I notice that the logger's report_scalar code fires an event rather than an async task, so perhaps it will be slight less time-accurate (or even non-guaranteed delivery?) than writing it to the runtime (which I think is a locking operation on the task's runtime - https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/backend_interface/task/task.py#L340), but for our purposes, this is OK?


machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I'm sorry - I meant defining an Enum class instead of just the optional literals. But sounds like Damien wants it to be generic 👍

Done. It is now a string.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

From what I can tell update_runtime_properties only sets the overall progress, not progress for subsystems?

I don't think writing to the runtime properties is the correct place, as apart from the overall progress value, the only other values set their are related to the machine/task runtime and versions (https://github.com/search?q=repo%3Aclearml%2Fclearml%20_set_runtime_properties&type=code)

From what I can tell, logging scalars is the correct place for this information (https://clear.ml/docs/latest/docs/webapp/webapp_exp_track_visual/#scalars) and the report_single_value function I use backs onto the report_scalar function (https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/logger.py#L196).

I notice that the logger's report_scalar code fires an event rather than an async task, so perhaps it will be slight less time-accurate (or even non-guaranteed delivery?) than writing it to the runtime (which I think is a locking operation on the task's runtime - https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/backend_interface/task/task.py#L340), but for our purposes, this is OK?

ClearML uses the runtime properties for progress. SMT and word alignment use it for extra progress fields (messages). The report_single_value function is used to report a single value, such as the overall BLEU score. It isn't really intended to be used for values that change throughout the lifetime of a job, such as progress, which is why ClearML uses the runtime properties for it. I also don't want to have multiple ways of passing progress information.

@pmachapman pmachapman force-pushed the separate_progress branch 3 times, most recently from bbd9714 to a33d110 Compare June 11, 2025 02:42
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)


machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

ClearML uses the runtime properties for progress. SMT and word alignment use it for extra progress fields (messages). The report_single_value function is used to report a single value, such as the overall BLEU score. It isn't really intended to be used for values that change throughout the lifetime of a job, such as progress, which is why ClearML uses the runtime properties for it. I also don't want to have multiple ways of passing progress information.

Done. Thank you for your patience!

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

@ddaspit ddaspit force-pushed the separate_progress branch from a33d110 to 835f894 Compare June 20, 2025 16:54
@ddaspit ddaspit force-pushed the separate_progress branch from 835f894 to 047b01d Compare June 21, 2025 02:13
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

@ddaspit ddaspit merged commit cd21706 into main Jun 23, 2025
14 checks passed
@ddaspit ddaspit deleted the separate_progress branch June 23, 2025 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants