cp: fix: qwen32 nightly metric check more stable (1271) into r0.4.0 #1308
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough
Updates a test script to change the train/loss assertion from a single-step threshold at step 20 to a windowed mean over the last 16 steps, with a slightly adjusted threshold. No other files or declarations are modified.
Sequence Diagram(s)
sequenceDiagram
participant CI as CI Runner
participant Test as sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v3.sh
participant Train as Training Job
participant Metrics as Metrics Store
CI->>Test: Execute test script
Test->>Train: Launch training
Train-->>Metrics: Emit train/loss over steps
Test->>Metrics: Fetch train/loss series
Note over Test: New logic: compute mean over last 16 steps
Test->>Test: mean(train/loss, 16) < 0.31 ?
alt Pass
Test-->>CI: Exit 0
else Fail
Test-->>CI: Exit non-zero
end
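The windowed check in the diagram above can be sketched in Python as follows. This is only an illustration under stated assumptions: `mean_last_n`, `check_train_loss`, and the `losses` data are hypothetical names and values, not the repository's actual metric-checking code. Only the window of 16 steps and the 0.31 threshold come from the walkthrough.

```python
from statistics import fmean


def mean_last_n(series: dict[int, float], n: int) -> float:
    """Return the mean of the values logged at the n highest step indices.

    Hypothetical helper for illustration; `series` maps step -> metric value.
    """
    if n <= 0:
        raise ValueError("window size must be positive")
    steps = sorted(series)[-n:]
    if len(steps) < n:
        raise ValueError(f"need at least {n} steps, got {len(steps)}")
    return fmean(series[s] for s in steps)


def check_train_loss(series: dict[int, float],
                     window: int = 16,
                     threshold: float = 0.31) -> bool:
    """Pass when the mean loss over the last `window` steps is below `threshold`."""
    return mean_last_n(series, window) < threshold


# Made-up data: a flat loss of 0.30 with a single noisy spike at step 20.
losses = {step: 0.30 for step in range(1, 20)}
losses[20] = 0.35

single_step_ok = losses[20] < 0.31      # a single-step check at step 20 fails on the spike
windowed_ok = check_train_loss(losses)  # the windowed mean absorbs the spike
```

With this made-up series, a check on step 20 alone fails because of one noisy value, while the mean over steps 5 through 20 stays below the threshold. That is the kind of stability the change is aiming for.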
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (3 passed)