Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@youngeunkwon0405 please refer to this PR and add a perf recipe and script for the two async benchmarks as well.
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
…nto guyueh/perf_tests
📝 Walkthrough
This PR introduces a comprehensive GRPO performance testing infrastructure with configuration files for multiple LLM models (DeepSeek, Llama, Qwen variants) across various hardware configurations, corresponding test execution scripts, and shared test utilities.
Sequence Diagram(s)
sequenceDiagram
participant Test Script as Test Script<br/>(grpo-*.sh)
participant Env as common.env
participant Runner as uv run<br/>examples/run_grpo_math.py
participant TBoard as TensorBoard Logs
participant Converter as json_dump_tb_logs.py
participant Metrics as JSON Metrics
participant Validator as check_metrics.py
Test Script->>Env: source common.env
Env->>Env: validate config exists<br/>setup directories<br/>translate paths
Env-->>Test Script: environment ready
Test Script->>Runner: execute with config<br/>+ logging + checkpointing
Runner->>TBoard: generate logs
Runner-->>Test Script: experiment complete
Test Script->>Converter: convert logs to JSON
Converter->>Metrics: write metrics
Converter-->>Test Script: conversion done
alt max_steps reached
Test Script->>Validator: check constraints
Validator->>Metrics: read train/token_mult_prob_error
Validator-->>Test Script: validation result
else max_steps not reached
Test Script-->>Test Script: skip validation
end
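The control flow in the diagram above can be sketched in shell. This is only an illustration of the branching, not the actual test script: every function is a hypothetical stub that echoes the step it stands in for, and MAX_STEPS and last_step are assumed example values.

```shell
#!/usr/bin/env bash
# Control-flow sketch only. Each function is a hypothetical stub for a step
# named in the diagram; none of them calls the real tooling.
set -euo pipefail

MAX_STEPS=10   # assumed target step count
last_step=10   # assumed last recorded step; set to 5 to take the skip branch

run_experiment() { echo "run: uv run examples/run_grpo_math.py (stubbed)"; }
convert_logs()   { echo "convert: TensorBoard logs to JSON (stubbed)"; }
check_metrics()  { echo "validate: train/token_mult_prob_error (stubbed)"; }

run_flow() {
  run_experiment
  convert_logs
  # Validate only when the run has reached the configured number of steps.
  if (( last_step >= MAX_STEPS )); then
    check_metrics
  else
    echo "skip validation: step ${last_step} < ${MAX_STEPS}"
  fi
}

run_flow
```

Gating validation on the step count keeps a partially completed run from being judged against thresholds that only make sense for a finished one.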
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
The changes are primarily configuration and shell script additions following established patterns. Although the file count is significant (23+ files), the files are homogeneous: each YAML config and test script shares a consistent structure with minimal logic. The variations lie mainly in hyperparameter values, model names, and cluster configurations rather than in architecture.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 26
🧹 Nitpick comments (5)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)
6-12: Document or remove unused configuration variables.
The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are intentional (e.g., for documentation or future use), add a comment explaining their purpose.
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
10-14: Document or remove unused configuration variables.
The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are for documentation or future use, add a comment explaining their purpose.
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml (1)
17-17: Replace make_sequence_length_divisible_by: 1 with a meaningful divisor.
Dividing by 1 is a no-op. Consider using a value that aligns with the tensor parallelism factor (e.g., 2 to match tensor_model_parallel_size), or remove this setting if not needed.
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
8-12: Document or remove unused configuration variables.
The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are for documentation or future use, add a comment explaining their purpose.
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1)
3-3: Quote variable expansions for robustness.
Line 3 sources the environment file without quoting. While typically safe, it's better practice to quote variable expansions to handle paths with spaces or special characters.
-source $SCRIPT_DIR/common.env
+source "$SCRIPT_DIR/common.env"
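The effect of the quoting change can be seen with a small, self-contained sketch. The directory name with a space and the GREETING variable are hypothetical and exist only for this demonstration. It also assumes the temporary directory path itself contains no whitespace.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical script directory whose name contains a space.
tmp_root="$(mktemp -d)"
SCRIPT_DIR="$tmp_root/perf tests"
mkdir -p "$SCRIPT_DIR"
echo 'GREETING="sourced ok"' > "$SCRIPT_DIR/common.env"

# Report how many arguments a command would receive.
count_args() { echo "$#"; }

# Unquoted: the path is word-split at the space (deliberately wrong here).
unquoted_count="$(count_args $SCRIPT_DIR/common.env)"
# Quoted: the path stays a single argument.
quoted_count="$(count_args "$SCRIPT_DIR/common.env")"

# With quotes, source receives the full path and the file is loaded.
source "$SCRIPT_DIR/common.env"
echo "unquoted=${unquoted_count} quoted=${quoted_count} ${GREETING}"

rm -rf "$tmp_root"
```

In the unquoted form, `source` would receive only the first fragment of the path as the file name, so the script would fail or load the wrong file.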
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1 hunks)
tests/test_suites/llm/performance/common.env (1 hunks)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1 hunks)
tests/test_suites/performance.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes//
Files:
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites// and list nightly tests in tests/test_suites/nightly.txt
Files:
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh
tests/test_suites/llm/performance/common.env
tests/test_suites/performance.txt
🧠 Learnings (1)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
tests/test_suites/llm/performance/common.env
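The NUM_RUNS expression quoted in the learning above is integer ceiling division. A minimal sketch with assumed example values (the real scripts set their own STEPS_PER_RUN and MAX_STEPS):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed example values, chosen so that the division is not exact.
STEPS_PER_RUN=3
MAX_STEPS=10

# Ceiling division: the smallest number of runs whose steps cover MAX_STEPS.
# Adding (STEPS_PER_RUN - 1) before dividing rounds any remainder up.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))

echo "NUM_RUNS=${NUM_RUNS}"   # (10 + 3 - 1) / 3 = 12 / 3 = 4
```

Plain integer division would give 3 here and leave the final step uncovered, which is why the expression adds the offset first.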
🧬 Code graph analysis (8)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
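The two recurring findings other than SC2034 can be illustrated with a short sketch. SC2068 concerns unquoted argument forwarding, and SC2164 concerns an unchecked cd. The override string and the missing directory below are hypothetical, and the helper names are invented for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report how many arguments a command would receive.
count_args() { echo "$#"; }

# SC2068: unquoted $@ re-splits elements on whitespace (deliberately wrong).
forward_unquoted() { count_args $@; }
# Quoted "$@" forwards each element unchanged.
forward_quoted() { count_args "$@"; }

# Hypothetical override containing a space.
unquoted_count="$(forward_unquoted "policy.name=my model")"
quoted_count="$(forward_quoted "policy.name=my model")"
echo "unquoted=${unquoted_count} quoted=${quoted_count}"   # unquoted=2 quoted=1

# SC2164: fail explicitly instead of continuing in the wrong directory.
cd_or_fail() { cd "$1" || { echo "cannot cd to $1" >&2; return 1; }; }

if ! cd_or_fail "/definitely/not/a/real/dir" 2>/dev/null; then
  echo "cd failed, stopping early"
fi
```

In a driver script the same ideas amount to passing extra arguments as "$@" and writing cd ... || exit, as the Shellcheck messages suggest.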
🔇 Additional comments (8)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1)
1-59: LGTM.
The YAML configuration is well-structured and complete for a GRPO DeepSeek-V3 performance run.
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml (1)
1-53: LGTM.
The YAML configuration properly sets up FP8 quantization with consistent precision settings across policy and generation blocks.
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml (1)
1-41: LGTM.
The YAML configuration is properly structured for Llama-3.3-70B with appropriate Megatron parallelism settings.
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1)
1-44: LGTM.
The YAML configuration correctly sets up MoE-specific parallelism with appropriate expert parallelism for Qwen3-30B-A3B.
tests/test_suites/performance.txt (1)
1-12: LGTM.
Test registry is correctly populated with GRPO performance test script references.
tests/test_suites/llm/performance/common.env (1)
1-45: LGTM!
The common environment setup provides robust error handling with set -eou pipefail, validates the config path existence, supports dry-run testing, and implements an early exit mechanism to save compute resources when max steps are reached.
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml (1)
1-52: LGTM!
The FP8 configuration for Qwen3-32B is well-structured with appropriate parallelism settings (TP=4, PP=4), blockwise FP8 recipe, and matching vLLM generation configuration.
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1)
8-12: Verify exported or used variables.
Shellcheck reports NUM_NODES, NUM_RUNS, and NUM_MINUTES as unused. These variables are likely either:
- Exported for use by exit_if_max_steps_reached from common.env
- Used by downstream scripts or the GRPO runner
If truly unused, they should be removed; if intentionally exported, add a comment for clarity.
Can you confirm whether these variables are used by exit_if_max_steps_reached or other external code? If so, consider adding a brief comment above the CONFIG block to document their purpose.
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Guyue Huang <guyueh@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@terrykong L0 CI is passing, should we run more CI?
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?
Onboard some recipes for short perf testing in CI