-
-
Notifications
You must be signed in to change notification settings - Fork 3
Separate progress for fine-tuning and inferencing #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c6f5743 to
cc8e549
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #191 +/- ##
=======================================
Coverage 88.93% 88.94%
=======================================
Files 282 282
Lines 17056 17062 +6
=======================================
+ Hits 15169 15175 +6
Misses 1887 1887 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)
machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):
percentage: float = 0 report_steps: bool = True stage: Optional[Literal["train", "inference"]] = None
Is this worth an Enum?
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)
machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is this worth an
Enum?
I've made it an enum in the Serval PR as well. My primary motivation was for consistency and to help strongly type the C# API.
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
I've made it an enum in the Serval PR as well. My primary motivation was for consistency and to help strongly type the C# API.
This class is intended to be general-purpose, so I would just make this a string.
machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):
if status.phase_stage is not None: if status.phase_step is not None: task.get_logger().report_single_value(
It looks like there is already code in place to pass multiple progress-related properties through the task runtime properties. It is currently used for SMT and word alignment jobs. The update_runtime_properties function makes a call to the /tasks.edit endpoint on the ClearML REST API to update the runtime property on the task. I believe that this is how the Task.set_progress method works. We should use a consistent approach for all job types. Apart from the fact that ClearML passes progress information using the runtime properties, I feel like there was a reason why we couldn't use the logger to pass progress information, but I can't remember why.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @pmachapman)
machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This class is intended to be general-purpose, so I would just make this a string.
I'm sorry - I meant defining an Enum class instead of just the optional literals. But sounds like Damien wants it to be generic 👍
b8bcca6 to
dface12
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)
machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
It looks like there is already code in place to pass multiple progress-related properties through the task runtime properties. It is currently used for SMT and word alignment jobs. The
update_runtime_propertiesfunction makes a call to the/tasks.editendpoint on the ClearML REST API to update theruntimeproperty on the task. I believe that this is how theTask.set_progressmethod works. We should use a consistent approach for all job types. Apart from the fact that ClearML passes progress information using the runtime properties, I feel like there was a reason why we couldn't use the logger to pass progress information, but I can't remember why.
From what I can tell update_runtime_properties only sets the overall progress, not progress for subsystems?
I don't think writing to the runtime properties is the correct place, as apart from the overall progress value, the only other values set their are related to the machine/task runtime and versions (https://github.com/search?q=repo%3Aclearml%2Fclearml%20_set_runtime_properties&type=code)
From what I can tell, logging scalars is the correct place for this information (https://clear.ml/docs/latest/docs/webapp/webapp_exp_track_visual/#scalars) and the report_single_value function I use backs onto the report_scalar function (https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/logger.py#L196).
I notice that the logger's report_scalar code fires an event rather than an async task, so perhaps it will be slight less time-accurate (or even non-guaranteed delivery?) than writing it to the runtime (which I think is a locking operation on the task's runtime - https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/backend_interface/task/task.py#L340), but for our purposes, this is OK?
machine/utils/phased_progress_reporter.py line 15 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I'm sorry - I meant defining an
Enumclass instead of just the optional literals. But sounds like Damien wants it to be generic 👍
Done. It is now a string.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
From what I can tell
update_runtime_propertiesonly sets the overall progress, not progress for subsystems?I don't think writing to the runtime properties is the correct place, as apart from the overall progress value, the only other values set their are related to the machine/task runtime and versions (https://github.com/search?q=repo%3Aclearml%2Fclearml%20_set_runtime_properties&type=code)
From what I can tell, logging scalars is the correct place for this information (https://clear.ml/docs/latest/docs/webapp/webapp_exp_track_visual/#scalars) and the report_single_value function I use backs onto the report_scalar function (https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/logger.py#L196).
I notice that the logger's report_scalar code fires an event rather than an async task, so perhaps it will be slight less time-accurate (or even non-guaranteed delivery?) than writing it to the runtime (which I think is a locking operation on the task's runtime - https://github.com/clearml/clearml/blob/52e053818206c4df9a653ac0411c9bcfd504006a/clearml/backend_interface/task/task.py#L340), but for our purposes, this is OK?
ClearML uses the runtime properties for progress. SMT and word alignment use it for extra progress fields (messages). The report_single_value function is used to report a single value, such as the overall BLEU score. It isn't really intended to be used for values that change throughout the lifetime of a job, such as progress, which is why ClearML uses the runtime properties for it. I also don't want to have multiple ways of passing progress information.
bbd9714 to
a33d110
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 9 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)
machine/jobs/build_nmt_engine.py line 47 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
ClearML uses the runtime properties for progress. SMT and word alignment use it for extra progress fields (messages). The
report_single_valuefunction is used to report a single value, such as the overall BLEU score. It isn't really intended to be used for values that change throughout the lifetime of a job, such as progress, which is why ClearML uses the runtime properties for it. I also don't want to have multiple ways of passing progress information.
Done. Thank you for your patience!
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
a33d110 to
835f894
Compare
835f894 to
047b01d
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
Required for sillsdev/serval#702
This change is