```python
stop_lr = xp.asarray(self.stop_lr, dtype=step_dtype)
stable_steps = xp.asarray(self.stable_steps, dtype=step_dtype)
decay_phase_steps = xp.asarray(self.decay_phase_steps, dtype=step_dtype)
decay_num_steps = xp.asarray(self.decay_num_steps, dtype=step_dtype)
```

Check notice — Code scanning / CodeQL: Unused local variable
Pull request overview

Adds a new warmup-stable-decay (WSD) learning-rate scheduler to the backend-agnostic `BaseLR` registry, wires it into argument checking and backend exports, and expands cross-backend test coverage to validate behavior and consistency.

Changes:
- Implement `LearningRateWSD` (`type="wsd"`) with warmup support plus configurable decay modes (`inverse_linear`, `cosine`, `linear`).
- Extend CLI/config arg validation to accept and validate WSD-specific parameters.
- Add/extend tests across universal, TF, PT, PD, and consistency suites for WSD behavior.
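As a rough illustration of the schedule shape described above, here is a minimal backend-free sketch of a warmup-stable-decay curve. The function name, defaults, and exact decay formulas are assumptions for illustration, not the PR's implementation:

```python
import math


def wsd_lr(
    step: int,
    start_lr: float = 1e-3,
    stop_lr: float = 1e-5,
    num_steps: int = 10000,
    warmup_steps: int = 0,
    decay_phase_ratio: float = 0.1,
    decay_type: str = "cosine",
) -> float:
    """Warmup-stable-decay: linear warmup, flat plateau, then a short decay tail."""
    decay_steps = int(decay_phase_ratio * num_steps)
    decay_start = num_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup from near zero up to start_lr.
        return start_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: constant learning rate.
        return start_lr
    # Decay phase: progress t runs from 0 to 1 over the tail.
    t = min((step - decay_start) / decay_steps, 1.0)
    if decay_type == "cosine":
        return stop_lr + 0.5 * (start_lr - stop_lr) * (1.0 + math.cos(math.pi * t))
    if decay_type == "linear":
        return start_lr + (stop_lr - start_lr) * t
    if decay_type == "inverse_linear":
        # 1/x-shaped interpolation; equals start_lr at t=0 and stop_lr at t=1.
        return start_lr * stop_lr / (stop_lr + (start_lr - stop_lr) * t)
    raise ValueError(f"unknown decay_type: {decay_type}")
```

With these illustrative defaults, the curve sits at `start_lr` through the stable phase (e.g. step 5000) and reaches `stop_lr` at `num_steps` under all three decay modes.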
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| deepmd/dpmodel/utils/learning_rate.py | Adds the LearningRateWSD scheduler implementation and registers it under BaseLR. |
| deepmd/utils/argcheck.py | Adds WSD-specific validation and exposes WSD arguments via the lr args plugin registry. |
| deepmd/pt/utils/learning_rate.py | Re-exports LearningRateWSD for the PyTorch backend utilities. |
| deepmd/pd/utils/learning_rate.py | Re-exports LearningRateWSD for the Paddle backend utilities. |
| source/tests/universal/dpmodel/utils/test_learning_rate.py | Adds extensive unit tests for WSD (modes, warmup, array input, boundary conditions). |
| source/tests/tf/test_lr.py | Adds TF wrapper build/value tests for WSD (default + cosine). |
| source/tests/pt/test_lr.py | Adds PT-side WSD curve tests for all decay modes. |
| source/tests/pd/test_lr.py | Adds PD-side WSD curve tests for all decay modes. |
| source/tests/consistent/test_learning_rate.py | Extends consistency parameterization to include WSD and adjusts the sampling step to hit the decay phase. |
📝 Walkthrough

Introduces a new learning rate schedule class.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (2)
source/tests/tf/test_lr.py (1)
106-140: These assertions don't exercise the TF graph path.
`lr_schedule.value()` just delegates to `base_lr.value()`, so both new tests still pass if `LearningRateSchedule.build()` is wrong for `type="wsd"`. Please evaluate the tensor returned by `build()` at a mid-decay step and compare that result instead.

🧪 Suggested coverage improvement

```diff
-        global_step = tf.constant(0, dtype=tf.int64)
-        lr_schedule.build(global_step, num_steps=10000)
+        g = tf.Graph()
+        with g.as_default():
+            global_step = tf.placeholder(shape=[], dtype=tf.int64)
+            lr_tensor = lr_schedule.build(global_step, num_steps=10000)
         self.assertIsInstance(lr_schedule.base_lr, LearningRateWSD)
+        with tf.Session(graph=g) as sess:
+            tensor_value = sess.run(lr_tensor, feed_dict={global_step: 9500})
-        np.testing.assert_allclose(
-            lr_schedule.value(9500), lr_schedule.base_lr.value(9500), rtol=1e-10
-        )
+            np.testing.assert_allclose(
+                tensor_value, lr_schedule.base_lr.value(9500), rtol=1e-10
+            )
```

Apply the same pattern to the cosine WSD variant as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/tf/test_lr.py` around lines 106-140: the tests are only exercising the Python path because `lr_schedule.value(9500)` passes a plain int; update both tests to pass a TF tensor so the built TF graph is evaluated. After calling `LearningRateSchedule.build(global_step, num_steps=10000)`, call `lr_schedule.value(tf.constant(9500, dtype=tf.int64))` and compare it to `lr_schedule.base_lr.value(tf.constant(9500, dtype=tf.int64))` (and make the same change in `test_wsd_cosine_build_and_value`) to ensure the `LearningRateSchedule.build`, `base_lr`, and `LearningRateWSD` TF paths are exercised.

source/tests/universal/dpmodel/utils/test_learning_rate.py (1)
142-165: Consider splitting mixed validation assertions into separate tests.
`test_invalid_decay_phase_ratio` also checks invalid `decay_type`; splitting improves failure localization.

♻️ Suggested test split

```diff
-    def test_invalid_decay_phase_ratio(self) -> None:
-        """Test invalid WSD decay_phase_ratio values."""
+    def test_invalid_decay_phase_ratio(self) -> None:
+        """Test invalid WSD decay_phase_ratio values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=0.0,
             )
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=1.1,
             )
+
+    def test_invalid_decay_type(self) -> None:
+        """Test invalid WSD decay_type values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_type="bad_mode",
             )
```
Verify each finding against the current code and only fix it if needed. In `@source/tests/universal/dpmodel/utils/test_learning_rate.py` around lines 142 - 165, test_invalid_decay_phase_ratio currently asserts multiple unrelated invalid cases (invalid decay_phase_ratio and invalid decay_type) in one test; split it into separate tests so each assertion checks a single validation rule. Create one test (e.g., test_invalid_decay_phase_ratio_values) that calls LearningRateWSD with decay_phase_ratio=0.0 and decay_phase_ratio=1.1 and asserts ValueError, and another test (e.g., test_invalid_decay_type) that calls LearningRateWSD with decay_type="bad_mode" and asserts ValueError; keep function/class names LearningRateWSD, decay_phase_ratio, and decay_type to locate the code. Ensure each new test has a clear name and only one responsibility for better failure localization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a283f5a5-f445-4ee4-858e-4f089ee4287d
📒 Files selected for processing (9)
- deepmd/dpmodel/utils/learning_rate.py
- deepmd/pd/utils/learning_rate.py
- deepmd/pt/utils/learning_rate.py
- deepmd/utils/argcheck.py
- source/tests/consistent/test_learning_rate.py
- source/tests/pd/test_lr.py
- source/tests/pt/test_lr.py
- source/tests/tf/test_lr.py
- source/tests/universal/dpmodel/utils/test_learning_rate.py
```python
self.decay_phase_steps = int(self.decay_phase_ratio * self.num_steps)
if self.decay_phase_steps <= 0:
    raise ValueError(
        "decay_phase_ratio results in zero decay steps. "
        "Increase num_steps or decay_phase_ratio."
    )
if self.decay_phase_steps > self.decay_num_steps:
    raise ValueError(
        "decay phase steps must not exceed the post-warmup steps. "
        f"Got decay_phase_steps={self.decay_phase_steps}, "
        f"post_warmup_steps={self.decay_num_steps}."
    )
```
Don't floor the decay phase into an impossible span.
Line 514 uses `int(decay_phase_ratio * self.num_steps)`, so valid-looking configs can fail only at runtime: `num_steps=5` with the default ratio produces 0 decay steps, and `num_steps=10, warmup_steps=9, decay_phase_ratio=0.2` produces `2 > decay_num_steps=1`. Please clamp the derived span, or derive it from `self.decay_num_steps`, so short runs and warmup-heavy runs still work.
💡 Possible fix

```diff
-        self.decay_phase_steps = int(self.decay_phase_ratio * self.num_steps)
-        if self.decay_phase_steps <= 0:
-            raise ValueError(
-                "decay_phase_ratio results in zero decay steps. "
-                "Increase num_steps or decay_phase_ratio."
-            )
-        if self.decay_phase_steps > self.decay_num_steps:
-            raise ValueError(
-                "decay phase steps must not exceed the post-warmup steps. "
-                f"Got decay_phase_steps={self.decay_phase_steps}, "
-                f"post_warmup_steps={self.decay_num_steps}."
-            )
+        if self.decay_num_steps <= 0:
+            raise ValueError("WSD requires at least one post-warmup step.")
+        self.decay_phase_steps = min(
+            max(int(self.decay_phase_ratio * self.num_steps), 1),
+            self.decay_num_steps,
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/dpmodel/utils/learning_rate.py` around lines 514 - 525, The current
int(self.decay_phase_ratio * self.num_steps) can produce 0 or exceed post-warmup
steps; change the computation of decay_phase_steps to derive from and clamp
against decay_num_steps instead of raising: compute e.g. desired = max(0,
int(round(self.decay_phase_ratio * self.num_steps))) (or 0 if you prefer), then
set self.decay_phase_steps = min(desired, self.decay_num_steps) and if
self.decay_phase_steps == 0 and self.decay_num_steps > 0 set it to 1 (or
otherwise handle the zero-decay case), replacing the two ValueError checks;
update code around decay_phase_steps, decay_phase_ratio, num_steps and
decay_num_steps to use this clamped value so short runs or heavy-warmup configs
don’t raise at runtime.
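To make the flagged boundary cases concrete, here is a small standalone sketch of the clamping strategy proposed above. The helper name and signature are hypothetical, used only to illustrate the arithmetic:

```python
def clamped_decay_phase_steps(
    num_steps: int, warmup_steps: int, decay_phase_ratio: float
) -> int:
    """Derive the decay span and clamp it into [1, post-warmup steps]."""
    decay_num_steps = num_steps - warmup_steps  # post-warmup steps
    if decay_num_steps <= 0:
        raise ValueError("WSD requires at least one post-warmup step.")
    raw = int(decay_phase_ratio * num_steps)  # floors toward zero; can be 0 or too large
    return min(max(raw, 1), decay_num_steps)


# num_steps=5 with ratio 0.1 floors to 0 decay steps; clamping lifts it to 1.
assert clamped_decay_phase_steps(5, 0, 0.1) == 1
# Warmup-heavy run: raw span 2 exceeds the single post-warmup step; clamped to 1.
assert clamped_decay_phase_steps(10, 9, 0.2) == 1
```

Both configs that previously raised `ValueError` now yield a usable one-step decay tail instead of failing at construction time.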
Codecov Report: ❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #5326      +/-   ##
==========================================
- Coverage   82.29%   82.29%   -0.01%
==========================================
  Files         775      775
  Lines       77627    77696      +69
  Branches     3675     3675
==========================================
+ Hits        63887    63939      +52
- Misses      12566    12584      +18
+ Partials     1174     1173       -1
```

☔ View full report in Codecov by Sentry.
The doc will be added through PR #5276
Summary by CodeRabbit

- New Features
- Tests