fix: more robust fp8 rollout metric check #1307
Signed-off-by: Terry Kong <terryk@nvidia.com>
📝 Walkthrough
Introduces a ratio_above helper, extends mean with an ignore_top_p parameter and validation, updates evaluate_check to expose ratio_above and use builtins for eval safety, and adjusts a GRPO LLM test to use mean(..., ignore_top_p=0.05). Adds comprehensive unit tests covering the helpers and evaluate_check scenarios.
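The two helpers named in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it assumes each metric is a dict mapping a step key to a number, that ignore_top_p must lie in [0, 1), and that ratio_above counts values strictly greater than the threshold. The real helpers may differ on any of these points.

```python
def mean(values, ignore_top_p=0.0):
    """Average the per-step values, optionally dropping the largest fraction.

    Assumption: `values` is a dict of step -> number.
    """
    if not 0.0 <= ignore_top_p < 1.0:
        raise ValueError(f"ignore_top_p must be in [0, 1), got {ignore_top_p}")
    ordered = sorted(float(v) for v in values.values())
    if not ordered:
        raise ValueError("cannot take the mean of an empty metric")
    # Number of largest values to discard; always keep at least one value.
    drop = min(int(len(ordered) * ignore_top_p), len(ordered) - 1)
    kept = ordered[: len(ordered) - drop]
    return sum(kept) / len(kept)


def ratio_above(values, threshold):
    """Fraction of per-step values strictly greater than `threshold`."""
    vals = [float(v) for v in values.values()]
    if not vals:
        return 0.0
    return sum(v > threshold for v in vals) / len(vals)


# Hypothetical run: 19 well-behaved steps and a single spike.
metric = {str(step): 1.0 for step in range(1, 20)}
metric["20"] = 3.0

plain = mean(metric)                        # the spike pulls the average up
trimmed = mean(metric, ignore_top_p=0.05)   # the top 5% (one value) is dropped
spike_ratio = ratio_above(metric, 1.05)     # share of steps that spiked
```

With only 20 steps, a single spike is enough to move the untrimmed mean, while the trimmed mean stays at the typical value and ratio_above still reports how often spikes occurred.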
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Caller
participant evaluate_check
participant Context as Local Context (min,max,mean,ratio_above)
participant Builtins as Safe builtins
Caller->>evaluate_check: evaluate_check(check_expr, value_expr, data)
activate evaluate_check
note over evaluate_check,Context: Build context with helpers and data
evaluate_check->>Builtins: Use builtins for eval (no __builtins__ leakage)
evaluate_check->>evaluate_check: eval(value_expr, {__builtins__: builtins}, Context)
evaluate_check->>evaluate_check: eval(check_expr, {__builtins__: builtins}, Context)
evaluate_check-->>Caller: {passed, message, value}
deactivate evaluate_check
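The flow in the diagram could look roughly like the sketch below. It follows the diagram's call shape, evaluate_check(check_expr, value_expr, data), and its returned keys; the helper bodies, the message format, and the metric name "m" are assumptions made for illustration. Note that eval still executes arbitrary expressions, so a function like this is only suitable for trusted check strings.

```python
import builtins


def _mean(values, ignore_top_p=0.0):
    # Simplified stand-in for the mean helper (assumes dict of step -> number).
    ordered = sorted(float(v) for v in values.values())
    if not ordered:
        raise ValueError("cannot take the mean of an empty metric")
    drop = min(int(len(ordered) * ignore_top_p), len(ordered) - 1)
    kept = ordered[: len(ordered) - drop]
    return sum(kept) / len(kept)


def _ratio_above(values, threshold):
    # Simplified stand-in for the ratio_above helper.
    vals = [float(v) for v in values.values()]
    return sum(v > threshold for v in vals) / len(vals) if vals else 0.0


def evaluate_check(check_expr, value_expr, data):
    """Evaluate a metric check and report the value it was based on."""
    # Local context: the metric data plus the helper functions.
    context = {
        "data": data,
        "min": lambda values: min(float(v) for v in values.values()),
        "max": lambda values: max(float(v) for v in values.values()),
        "mean": _mean,
        "ratio_above": _ratio_above,
    }
    # Pass the builtins module explicitly as the eval globals.
    env = {"__builtins__": builtins}
    value = eval(value_expr, env, context)
    passed = bool(eval(check_expr, env, context))
    status = "PASS" if passed else "FAIL"
    return {
        "passed": passed,
        "message": f"{status}: {check_expr} (value={value})",
        "value": value,
    }


# Hypothetical data: 19 steps at 1.0 and one spike at 3.0 under metric "m".
data = {"m": {str(s): 1.0 for s in range(1, 20)}}
data["m"]["20"] = 3.0

trimmed_result = evaluate_check(
    'mean(data["m"], ignore_top_p=0.05) < 1.1',
    'mean(data["m"], ignore_top_p=0.05)',
    data,
)
plain_result = evaluate_check('mean(data["m"]) < 1.1', 'mean(data["m"])', data)
```

In this sketch the helpers in the local context shadow the built-in min and max inside check expressions, which is why the context is passed as the eval locals.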
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
@terrykong are we running this in CI? Is 100 minutes enough for 100 steps?
I cut the generations in half, so we should be able to complete in 3 hr. The threshold on ratio_above is still too strict. I am running it once more and will update that threshold.
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?
With a low step count, occasional spikes sometimes skewed the mean of the logprob metric above 1.1.