glm5 fix precision key name #976
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -1809,10 +1809,10 @@ | |
| glm5-nvfp4-b200-sglang: | ||
| image: lmsysorg/sglang:nightly-dev-cu13-20260328-a27651d5 | ||
| model: nvidia/GLM-5-NVFP4 | ||
| model-prefix: glm5 | ||
| runner: b200 | ||
| - precision: nvfp4 | ||
| + precision: fp4 | ||
|
Check failure on line 1815 in .github/configs/nvidia-master.yaml
|
||
| framework: sglang | ||
|
Check failure on line 1816 in .github/configs/nvidia-master.yaml
Comment on lines 1812 to 1816
Contributor
🔴 This PR modifies
Extended reasoning...
What the bug is and how it manifests
The repo enforces a policy (defined in
The specific code path that triggers it
The diff for this PR touches exactly one file:
Why existing code doesn't prevent it
There is no automated CI check that hard-blocks the merge via a status check. The enforcement is done by the Claude PR-review GitHub Actions bot, which posts a blocking review comment. Because it is bot-enforced rather than a required status check, the PR can technically be merged by a maintainer who overrides it, but the policy is explicit and the bot will flag it immediately on review.
Impact
The omission violates the repo's documented change-tracking policy and will cause the automated reviewer to block the PR on its first pass. This adds friction and delays the merge. More broadly,
How to fix it
Add an entry to
- config-keys:
    - glm5-nvfp4-b200-sglang
  description:
    - "Fix precision key name from 'nvfp4' to 'fp4' for GLM-5 NVFP4 B200 SGLang config"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/976
Step-by-step proof
| multinode: false | ||
| seq-len-configs: | ||
|
Check warning on line 1818 in .github/configs/nvidia-master.yaml
Comment on lines 1812 to 1818
Contributor
🟡 The config key
Extended reasoning...
What the bug is and how it manifests
The PR correctly changes
The specific code path that triggers it
All other NVFP4-quantized model configs in
Additionally,
Why existing code doesn't prevent it
The precision field is authoritative for benchmark script discovery (per CONFIGS.md line 33), so full-sweep workflows continue to work. The convention mismatch is therefore not caught by any hard failure; it is a silent inconsistency.
Impact
Wildcard key filtering such as
Addressing the refutation
A refutation notes that CONFIGS.md line 25 says the naming convention is "not required" and that no existing tooling mechanically parses key names. Both points are valid: this is not a functional breakage, which is why the severity is nit. However, every other NVFP4 model entry already follows the
How to fix
Rename the key from
Step-by-step proof
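The key/precision mismatch raised in this comment could be caught by a small lint. The following is only a sketch: it assumes config keys are hyphen-delimited and carry the precision as one whole segment (as in glm5-nvfp4-b200-sglang). The function name `key_matches_precision` and the renamed key `glm5-fp4-b200-sglang` are hypothetical and are not taken from the repo's tooling.

```python
def key_matches_precision(config_key: str, precision: str) -> bool:
    """Return True if `precision` is a whole hyphen-delimited segment of the key.

    Assumption: keys look like <model-prefix>-<precision>-<runner>-<framework>.
    """
    return precision in config_key.split("-")


# Key as it stands after this PR: the key still says nvfp4, the field says fp4.
print(key_matches_precision("glm5-nvfp4-b200-sglang", "fp4"))  # False

# Hypothetical renamed key that would agree with the precision field.
print(key_matches_precision("glm5-fp4-b200-sglang", "fp4"))    # True
```

Matching on whole segments rather than substrings matters here, because "fp4" is a substring of "nvfp4" and a plain substring test would hide the mismatch.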
🔴 The precision rename from nvfp4 to fp4 breaks B200 benchmark execution because the corresponding shell script was not renamed. Both runners/launch_b200-nb.sh and runners/launch_b200-dgxc.sh dynamically construct the benchmark script path using the PRECISION env var, so they will now look for benchmarks/single_node/glm5_fp4_b200.sh, which does not exist; only glm5_nvfp4_b200.sh is present. All GLM-5 NVFP4 B200 benchmark jobs will fail at runtime with "No such file or directory". The fix is to rename glm5_nvfp4_b200.sh to glm5_fp4_b200.sh (or create the new file) in the same PR.
Extended reasoning...
What the bug is and how it manifests
The PR renames the field value from to for the benchmark configuration in . However, the corresponding benchmark shell script that is expected to exist under was not renamed or created to match the new value. At runtime, the CI/CD runners will attempt to execute a script that does not exist, resulting in a hard failure for every GLM-5 NVFP4 B200 benchmark job.
The specific code path that triggers it
Both (line 20) and (line 39) construct the benchmark script path dynamically:
The environment variable is populated directly from the field in the YAML configuration (confirmed by line 83). With derived from the model prefix (e.g., ), evaluates to , so the full resolved path becomes .
Why existing code doesn't prevent it
There is no guard or existence check before the call — the script simply tries to execute the resolved path directly. The YAML config field and the benchmark shell script filenames are tightly coupled by convention but not enforced at commit or workflow-definition time. The rename of the YAML value is therefore a breaking change unless the shell script file is also renamed.
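A pre-flight check of the kind this paragraph says is missing might look like the sketch below. It assumes the script name follows the pattern `<model-prefix>_<precision>_<runner>.sh` under benchmarks/single_node, which is inferred from the file names quoted in this review rather than confirmed from the launch scripts. The helpers `benchmark_script_path` and `missing_scripts` and the shape of the `configs` dict are hypothetical.

```python
import tempfile
from pathlib import Path


def benchmark_script_path(model_prefix: str, precision: str, runner: str) -> Path:
    # Assumed naming pattern: <model-prefix>_<precision>_<runner>.sh
    return Path("benchmarks/single_node") / f"{model_prefix}_{precision}_{runner}.sh"


def missing_scripts(configs: dict, repo_root: Path) -> list[str]:
    """List configs whose resolved benchmark script does not exist on disk."""
    missing = []
    for key, cfg in configs.items():
        rel = benchmark_script_path(cfg["model-prefix"], cfg["precision"], cfg["runner"])
        if not (repo_root / rel).is_file():
            missing.append(f"{key}: {rel.as_posix()}")
    return missing


# Reproduce the situation described in the review inside a throwaway directory:
# only the old nvfp4 script exists, while the config now says fp4.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "benchmarks/single_node").mkdir(parents=True)
    (repo / "benchmarks/single_node/glm5_nvfp4_b200.sh").touch()
    configs = {
        "glm5-nvfp4-b200-sglang": {
            "model-prefix": "glm5",
            "precision": "fp4",
            "runner": "b200",
        }
    }
    print(missing_scripts(configs, repo))
    # ['glm5-nvfp4-b200-sglang: benchmarks/single_node/glm5_fp4_b200.sh']
```

Running a check like this at config-validation time would turn the runtime "No such file or directory" failure into an early, readable error.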
What the impact would be
Every CI run that triggers the benchmark (on both and runner types) will fail immediately at the script execution step with . No benchmark data will be collected, wasting runner time and blocking any dependent reporting or dashboards.
How to fix it
Rename Error: The following required environment variables are not set:
Step-by-step proof
benchmarks/single_node/glm5_nvfp4_b200.sh shows only — is absent.