Multinode evals #1000
Merged
Oseltamivir merged 46 commits into `main` on Apr 19, 2026
Conversation
The sglang 0.5.8 Docker image ships a newer lm-eval 0.4.9.2 commit that defaults fewshot_as_multiturn=True for chat-completion models. Since the version string matches the pinned commit, pip silently skips the install. Adding --force-reinstall ensures the pinned commit is always used regardless of what's pre-installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
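The skip happens because pip compares only version strings, not commit hashes. A minimal sketch of the failure mode described above (the version strings come from the commit message; the repo URL and commit placeholder are assumptions):

```shell
# pip sees lm-eval already installed and compares version strings only,
# so installing a pinned commit that reports the same version is a no-op.
INSTALLED="0.4.9.2"   # ships preinstalled in the sglang 0.5.8 image
PINNED="0.4.9.2"      # the pinned commit reports the identical version
[ "$INSTALLED" = "$PINNED" ] && echo "pip skips: version strings match"

# The fix is to bypass the version check entirely, e.g.:
# pip install --force-reinstall \
#   "lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness@<pinned-commit>"
```

`--force-reinstall` reinstalls the package (and re-resolves the VCS reference) even when pip considers the requirement already satisfied.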
Adds dsr1-fp8-mi355x-sglang-disagg-nodpa-eval: same image/model/precision as the DPA config but with dp-attn=false and ep=1. Running evals on this will tell us if DPA is the cause of the 0% GSM8K score or if it's something else about the fp8 disagg setup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 9ea1f61 to b2aabf3
Force-pushed from 8ca2534 to 4ed7a9a
# Conflicts:
#   perf-changelog.yaml
#   runners/launch_gb200-nv.sh
/raid/tmp is per-node local storage, so the pyxis mount failed on B200 multinode decode workers that landed on nodes without the pre-staged copy. Use the shared /lustre/fsw mirror instead.
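The key constraint is that a pyxis bind-mount source must exist on every node the job can land on. A sketch under that assumption (the subpath under /lustre/fsw is hypothetical; only the two storage roots come from the commit message):

```shell
# /raid/tmp is per-node scratch: present only where weights were pre-staged.
# /lustre/fsw is a shared filesystem, visible identically on every node,
# so it is safe to use as a pyxis mount source for multinode jobs.
MOUNT_SRC="/lustre/fsw/infmax-workspace"   # hypothetical subpath
echo "container mount source: $MOUNT_SRC"
# Passed to srun/pyxis roughly as:
#   srun --container-mounts="$MOUNT_SRC:/infmax-workspace" ...
```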
The recipe default is max_attempts=360 × interval=10s = 3600s, which is not enough for DSR1-FP8 (~680GB) to load across 5 workers off the shared FS; the prior run timed out at ~50% of weights loaded. Patch the recipe in place after clone; use ${CONFIG_FILE%%:*} so the :override suffix on sglang configs doesn't break sed.
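The patch described above can be sketched as follows. The recipe filename, the key's exact spelling, and the new value (1080) are assumptions; only the `${CONFIG_FILE%%:*}` suffix-stripping trick and the sed-after-clone approach come from the commit message:

```shell
# sglang configs carry a ":override" suffix that is not part of the filename.
CONFIG_FILE="recipes/dsr1-fp8.yaml:override"
RECIPE="${CONFIG_FILE%%:*}"        # strips from the first ':' -> recipes/dsr1-fp8.yaml

# Stand-in for the cloned recipe with its default timeout.
mkdir -p "$(dirname "$RECIPE")"
echo "max_attempts: 360" > "$RECIPE"   # default: 360 x 10s = 3600s

# Patch in place; 1080 x 10s = 3 hours (value is illustrative).
sed -i 's/^max_attempts: .*/max_attempts: 1080/' "$RECIPE"
cat "$RECIPE"
```

Without the suffix strip, sed would be handed the literal path `recipes/dsr1-fp8.yaml:override`, which does not exist on disk.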
Oseltamivir added a commit that referenced this pull request on Apr 19, 2026
Missed staging this change before merging #1000.
OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request on Apr 21, 2026
Pulls 55 upstream commits published on SemiAnalysisAI/InferenceX:main since PR SemiAnalysisAI#1032 was opened. Zero conflicts; none touch tools/ or datasets/isb1/. Purpose: modernize the PR base before Cam review and absorb upstream fork-drift reductions. Notable upstream work picked up:
- MiniMax M2.5 MXFP4 MI355X + B300 configs
- GLM5.1 FP4 MI355X support
- GPT-OSS FP4 TP=8 conc=1 extension (SemiAnalysisAI#1096)
- H200 multinode evals (SemiAnalysisAI#1000)
- B300 configs for Kimi K2.5, DSR1, Qwen3.5
- Parallel random data generation (SemiAnalysisAI#1094)
- KNOWN_LIMITATION.md updates

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Apr 23, 2026
Summary
Add eval-only support for multi-node benchmarks and wire those eval results into CI collection + summary reporting.
This covers:
- `server.sh`
- `srt-slurm` fork

How evals are run

Single-node evals are selected on `8k1k` at max + median concurrency for each (model, runner, framework, precision, spec-decoding, dp-attn) group.

Multi-node evals are selected on `8k1k` by taking the entry with the highest max concurrency for each (model, runner, framework, precision, spec-decoding, prefill-dp-attn, decode-dp-attn) group, then running eval at the median concurrency from that config via `eval-conc`.

`EVAL_ONLY=true` starts the server with expanded eval context, skips throughput benchmarking, runs `lm-eval`, writes `meta_env.json` + `results*.json` + `sample*.jsonl`, uploads those artifacts, then validates scores against thresholds.
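The median-concurrency pick for multi-node evals can be sketched as a small shell step (the sweep values and variable names here are illustrative; the real selection lives in the CI workflow):

```shell
# Concurrencies swept for the chosen highest-max-concurrency config.
CONCS="1 4 16 64 256"

# Pick the median of the (already sorted) sweep.
N=$(echo $CONCS | wc -w)               # 5 entries
MID=$(( (N + 1) / 2 ))                 # middle index -> 3
EVAL_CONC=$(echo $CONCS | tr ' ' '\n' | sed -n "${MID}p")
echo "EVAL_CONC=$EVAL_CONC"            # -> EVAL_CONC=16
```

The chosen value is then handed to the eval-only launch as `EVAL_CONC`, which the srt-slurm side maps onto `EVAL_CONCURRENT_REQUESTS`.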
srt-slurm fork delta vs upstream
NVIDIA multinode eval uses `Oseltamivir/srt-slurm@sa-submission-q1-2026` instead of `ishandhanani/srt-slurm`. Compared with current `upstream/main`, that fork adds the eval path InferenceX needs:
- `lm-eval` benchmark runner
- `/infmax-workspace` mounting via `INFMAX_WORKSPACE`
- `EVAL_ONLY` support in `do_sweep.py` to skip the benchmark stage and run post-eval directly
- `wait_for_model()` health checking before eval in eval-only mode
- `MODEL_NAME=self.config.served_model_name` so eval queries the served alias, not the HF repo id
- `EVAL_CONC` from workflow to `EVAL_CONCURRENT_REQUESTS`
- `/logs/eval_results/` for launcher-side artifact pickup

Validation
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24059388771
InferenceX PR
NVIDIA/srt-slurm#12