
chore: tag dsv4 b300 benchmark scripts with inference engine#1146

Merged
cquil11 merged 1 commit into main from chore/fix-engine-suffix
Apr 24, 2026

Conversation


@cquil11 cquil11 commented Apr 24, 2026

Summary

  • B300 single-node launcher now prefers a framework-tagged script (e.g. dsv4_fp4_b300_sglang.sh) and falls back to the legacy unsuffixed name (or the existing _trt suffix) when no tagged variant exists.
  • Renamed benchmarks/single_node/dsv4_fp4_b300.sh → dsv4_fp4_b300_sglang.sh.
  • Restored benchmarks/single_node/dsv4_fp4_b300_vllm.sh from the abandoned origin/claude/add-dsv4-fp4-b300-vllm branch (script only; no config wired up yet — separate PR can add dsv4-fp4-b300-vllm to nvidia-master.yaml when we want CI to run it).

Each model historically used one inference engine, so the launcher resolved ${model}_${precision}_b300.sh regardless of FRAMEWORK. dsv4 is the first b300 model where both sglang and vllm need to coexist; this PR adds the plumbing for that without forcing a rename of the other ~17 b300 scripts.
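The resolution order described above can be sketched as a small shell helper. This is an illustrative reconstruction from the PR description, not the actual launch_b300-nv.sh code; the function name and argument order are assumptions.

```shell
# Sketch of the launcher's script-resolution fallback (illustrative only).
# Tries the framework-tagged name first, then the legacy unsuffixed name,
# then the existing _trt-suffixed name.
resolve_bench_script() {
  local model="$1" precision="$2" framework="$3"
  local base="benchmarks/single_node/${model}_${precision}_b300"
  local candidate
  for candidate in "${base}_${framework}.sh" "${base}.sh" "${base}_trt.sh"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

With this shape, `resolve_bench_script dsv4 fp4 sglang` picks the tagged script when it exists, while `resolve_bench_script dsr1 fp4 sglang` falls through to the legacy unsuffixed file, which is why the other b300 scripts need no rename.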

Test plan

  • Trigger a dsv4-fp4-b300-sglang sweep — launcher should resolve to dsv4_fp4_b300_sglang.sh.
  • Confirm an existing dsr1-fp4-b300-sglang sweep still resolves to the legacy dsr1_fp4_b300.sh via fallback.
  • Spot-check that _mtp configs still find their script (*_b300_mtp.sh via fallback).

🤖 Generated with Claude Code

Each model historically used one inference engine, so the b300 launcher
just resolved benchmarks/single_node/${model}_${precision}_b300.sh
regardless of FRAMEWORK. With dsv4 we now want both an sglang and a
vllm script to coexist, so:

- launch_b300-nv.sh prefers an engine-tagged script (e.g.
  dsv4_fp4_b300_sglang.sh) and falls back to the legacy unsuffixed
  name (or the existing _trt suffix) when the tagged variant is
  absent. Existing dsr1/glm5/qwen3.5/kimik2.5/minimaxm2.5 b300
  scripts keep their current names.
- Rename the existing sglang dsv4 b300 script to
  dsv4_fp4_b300_sglang.sh.
- Restore dsv4_fp4_b300_vllm.sh from the abandoned
  origin/claude/add-dsv4-fp4-b300-vllm branch (script only; not
  wired into a config yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cquil11 cquil11 requested a review from a team April 24, 2026 22:56
@github-actions

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


@claude (bot) left a comment


LGTM — mechanical launcher fallback plus script rename; the added vllm script is not yet wired into CI.

Extended reasoning...

Overview

The PR has three changes: (1) renames benchmarks/single_node/dsv4_fp4_b300.sh to dsv4_fp4_b300_sglang.sh, (2) adds a new dsv4_fp4_b300_vllm.sh script restored from an abandoned branch, and (3) updates runners/launch_b300-nv.sh to prefer a framework-tagged script (${model}_${precision}_b300_${FRAMEWORK}.sh) with a fallback to the legacy unsuffixed/_trt-suffixed name. The launcher change is small and localized to the non-multinode branch.

Security risks

None meaningful. The launcher change is a string-built file path with a fall-through, the new vLLM script does sed -i on a vendored vLLM source file but this is a benchmark recipe running inside the GPU container on a SLURM allocation, which is consistent with the pattern already used by the SGLang sibling. No auth, crypto, or permissions code is touched.

Level of scrutiny

Low. This is benchmark/runner plumbing, not production-critical code. The launcher fallback logic cleanly preserves backward compatibility for the ~17 existing b300 scripts that have not been retagged, and the new vLLM script is explicitly not wired into nvidia-master.yaml per the PR description, so it cannot accidentally get picked up by CI until a follow-up PR opts in.

Other factors

The bug hunter found only a single nit — a misleading comment at the top of dsv4_fp4_b300_vllm.sh that references dsv4_fp4_b200.sh (an SGLang script) as if it were a vLLM sibling. That is documentation-only and does not affect runtime behavior. The test plan in the PR description covers the three relevant fallback cases (new tagged script, legacy-name script, _mtp variant). Happy to let this merge without human review.

Comment on lines +3 to +5
# Per https://vllm.ai/blog/deepseek-v4 the DeepSeek-V4-Pro recipe lists
# 8xB200 and 8xB300 with identical flags, so this script mirrors
# dsv4_fp4_b200.sh.


🟡 The header comment claims this script "mirrors dsv4_fp4_b200.sh" since the vLLM blog lists 8xB200 and 8xB300 with identical flags, but dsv4_fp4_b200.sh is an SGLang script (uses sglang serve with SGLang-specific flags like --moe-runner-backend, --moe-a2a-backend deepep, CONC-based recipe dispatch). This new script uses vllm serve with an entirely different flag set and a vLLM-specific monkey-patch — they share no engine or flags. Consider dropping the mirrors dsv4_fp4_b200.sh reference (or pointing it at an actual vLLM companion) so future readers don't go looking for parity that doesn't exist.

Extended reasoning...

What's wrong

Lines 3-5 of benchmarks/single_node/dsv4_fp4_b300_vllm.sh read:

# Per https://vllm.ai/blog/deepseek-v4 the DeepSeek-V4-Pro recipe lists
# 8xB200 and 8xB300 with identical flags, so this script mirrors
# dsv4_fp4_b200.sh.

The "so this script mirrors dsv4_fp4_b200.sh" conclusion only follows if dsv4_fp4_b200.sh is itself a vLLM script — but it isn't. benchmarks/single_node/dsv4_fp4_b200.sh invokes PYTHONNOUSERSITE=1 sglang serve and uses SGLang-only flags (--moe-runner-backend flashinfer_mxfp4, --moe-a2a-backend deepep, --enable-dp-attention, --deepep-config, etc.) plus a CONC-based 3-recipe dispatch (low-latency / balanced / max-throughput).

The new script doesn't mirror it

dsv4_fp4_b300_vllm.sh invokes vllm serve with a totally disjoint flag set (--kv-cache-dtype fp8, --block-size 256, --enable-expert-parallel, --data-parallel-size, --compilation-config '{...}', --tokenizer-mode deepseek_v4, --reasoning-parser deepseek_v4, etc.), monkey-patches vLLM's sparse_attn_indexer.py, has no RECIPE_FLAGS array, and no CONC-based dispatch. The two scripts share nothing beyond trivial benchmark_lib.sh boilerplate.

Why the existing wording is misleading

The vLLM blog parity claim ("B200 and B300 with identical flags") is a vLLM-to-vLLM statement and would justify mirroring a hypothetical dsv4_fp4_b200_vllm.sh — but no such file exists in the repo. The PR description even notes that this script was "restored from the abandoned origin/claude/add-dsv4-fp4-b300-vllm branch," which is a plausible explanation for the stale reference: it may originally have pointed at a never-merged vLLM b200 sibling.

Step-by-step proof

  1. Open benchmarks/single_node/dsv4_fp4_b200.sh and grep for the server invocation → line ~86 reads PYTHONNOUSERSITE=1 sglang serve \. So b200 is SGLang.
  2. Open benchmarks/single_node/dsv4_fp4_b300_vllm.sh and grep for the server invocation → it reads vllm serve $MODEL .... So b300_vllm is vLLM.
  3. Compare flag lists: b200 uses --moe-runner-backend, --moe-a2a-backend deepep, --mem-fraction-static, --disable-radix-cache, etc.; b300_vllm uses --kv-cache-dtype fp8, --block-size, --enable-expert-parallel, --data-parallel-size, --compilation-config, etc. → zero overlap.
  4. ls benchmarks/single_node/dsv4_fp4_b200_vllm.sh → no such file. So no actual vLLM b200 sibling to mirror.
  5. Conclusion: the comment is factually wrong; reading b200.sh expecting flag/structure parity will mislead.

Impact / fix

This is documentation only — no runtime behavior is affected. But the comment is the first thing a reader sees, and the PR is specifically about disambiguating engines (_sglang vs _vllm suffixes). Leaving a comment that points the vLLM script at the SGLang script undercuts that disambiguation. Suggested fix: either drop the "so this script mirrors dsv4_fp4_b200.sh" clause and keep just the recipe-source link, or replace it with something accurate like "flags follow the vLLM DeepSeek-V4-Pro recipe; B200 and B300 use identical flags per the blog."

@cquil11 cquil11 merged commit 4d9fe4a into main Apr 24, 2026
8 checks passed
@cquil11 cquil11 deleted the chore/fix-engine-suffix branch April 24, 2026 23:31
functionstackx added a commit that referenced this pull request Apr 25, 2026
Mirror of #1146 for B200. Each model historically used one inference
engine, so the b200 launchers just resolved
benchmarks/single_node/${model}_${precision}_b200.sh regardless of
FRAMEWORK. With dsv4 we now want both an sglang script (already on
main as dsv4_fp4_b200.sh from #1131) and a vllm script (added by this
PR as dsv4_fp4_b200_vllm.sh) to coexist.

- launch_b200-{nb,dgxc-slurm,cw}.sh prefer an engine-tagged script
  (e.g. dsv4_fp4_b200_vllm.sh) and fall back to the legacy unsuffixed
  name (or the existing _trt suffix) when the tagged variant is
  absent. Existing dsr1/glm5/qwen3.5/kimik2.5/minimaxm2.5/gptoss/
  dsv4-sglang b200 scripts keep their current names.
- This wires up the dsv4-fp4-b200-vllm config so FRAMEWORK=vllm
  resolves to dsv4_fp4_b200_vllm.sh instead of the sglang script
  that shares the unsuffixed path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude claude Bot mentioned this pull request Apr 25, 2026
3 tasks
cquil11 added a commit that referenced this pull request Apr 25, 2026
MTP variant of dsv4-fp4-b300-sglang. EAGLE / MTP enabled per the
cookbook recipes at https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4:
  low-latency  (CONC <= 32):       EAGLE 3 steps / 4 draft tokens
  balanced     (32 < CONC <= 128): EAGLE 1 step  / 2 draft tokens

Max-throughput is intentionally omitted -- the cookbook says MTP off
at saturation because the verify step costs more than it saves.

Sets SGLANG_ENABLE_SPEC_V2=1 (required for MTP) and passes
--use-chat-template to bench_serving so EAGLE acceptance rate isn't
depressed by raw prompts.

Bench script lives at benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh
to match the framework-tagged naming convention introduced in #1146.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
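The CONC-to-EAGLE mapping in that commit can be sketched as a helper that emits the speculative-decoding flags for a given concurrency. The flag spellings below are assumptions modeled on common sglang CLI flags, not copied from the actual script.

```shell
# Illustrative CONC -> EAGLE/MTP flag mapping (flag names are assumptions).
#   CONC <= 32   -> 3 steps / 4 draft tokens  (low-latency)
#   CONC <= 128  -> 1 step  / 2 draft tokens  (balanced)
#   CONC  > 128  -> no speculative decoding   (max-throughput: MTP off
#                                              because verify costs more
#                                              than it saves at saturation)
spec_flags_for_conc() {
  local conc="$1"
  if [ "$conc" -le 32 ]; then
    echo "--speculative-num-steps 3 --speculative-num-draft-tokens 4"
  elif [ "$conc" -le 128 ]; then
    echo "--speculative-num-steps 1 --speculative-num-draft-tokens 2"
  else
    echo ""
  fi
}
```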
cquil11 added a commit that referenced this pull request Apr 25, 2026
Per the runner naming convention introduced in #1146
(BENCH_SCRIPT="${BENCH_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh"), the b300
runner now prefers benchmarks/single_node/dsv4_fp4_b300_sglang.sh over
the legacy dsv4_fp4_b300.sh. The merge from main left this branch with
both scripts: the legacy file carrying the recipe-per-CONC dispatch
this PR added, and the framework-tagged file with the low-latency-only
fallback content from main. CI was therefore picking the wrong script.

Move the recipe-per-CONC dispatch onto dsv4_fp4_b300_sglang.sh and
delete the legacy filename so the runner picks up the intended logic.
Update the yaml comment to point at the new path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 25, 2026
* feat: add DeepSeek-V4-Flash FP4 B300 SGLang benchmark

Adds dsv4-fp4-b300-sglang config, single-node benchmark script, and
perf-changelog entry for the DeepSeek-V4 recipe from the SGLang
cookbook. The cookbook ships a B200 (not B300) recipe, so this
reuses the B200 Flash Low-Latency recipe on B300 until a
B300-specific recipe lands. Speculative decoding (EAGLE) and prefix
caching are disabled per request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: switch dsv4-fp4-b300-sglang to Pro + Max-Throughput recipe

Match parallelism (TP=8/EP=8/dp-attn=true) and concurrency ranges
(4-1024 for 1k1k, 4-512 for 8k1k) to dsv4-fp4-b200-vllm. Use the
DeepSeek-V4-Pro variant with the cookbook Max-Throughput recipe
(DP=8 + DeepEP, no MTP), which aligns with the requested no-spec
parallelism. Prefix caching remains disabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: sync launch_b200-dgxc-slurm.sh cache mount from claude/add-dsv4-fp4-b200-vllm

Port the HF cache mount rework from the DSV4 B200 VLLM branch so
both PRs stay consistent: use the shared /scratch/fsw/gharunners/hf-hub-cache
path, drop the local MODEL override, and mount onto $HF_HUB_CACHE
inside the container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: restore trailing whitespace stripped from glm5.1 changelog entry

The dsv4-fp4-b300-sglang entry was appended correctly, but the earlier
edit also stripped trailing spaces on an existing line, producing a
spurious deletion. Revert so the diff is additive-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: add flock-guarded squash import to B300 runner

Mirror the lockfile logic already in launch_b200-dgxc-slurm.sh and
launch_h200-dgxc-slurm.sh: serialize concurrent enroot imports of
the same squash file via flock, skip the import when the squash is
already valid, and override ENROOT_CACHE_PATH to avoid permission
issues with the system-wide cache on worker nodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
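The flock-plus-skip pattern from that commit looks roughly like the sketch below. Paths, the function name, and the exact enroot invocation are illustrative; only the structure (exclusive lock on a sidecar lockfile, skip when a non-empty squash already exists) is taken from the commit message.

```shell
# Sketch: serialize concurrent enroot imports of the same squash file and
# skip the import when a valid (non-empty) squash already exists.
import_squash() {
  local squash="$1" image="$2"
  (
    flock -x 9                       # block until we hold the lock
    if [ ! -s "$squash" ]; then      # another job may have imported already
      enroot import -o "$squash" "$image"
    fi
  ) 9>"${squash}.lock"
}
```

Two GitHub jobs racing on the same squash file each take the lock in turn; the loser finds a valid squash and skips its import.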

* fix: drop ENROOT_CACHE_PATH override from B300 runner

The override ("avoid permission issues with system-wide cache on
worker nodes") is a dgxc-slurm-specific workaround; launch_b300-nv.sh
is on the NV slurm cluster, not dgxc-slurm. Copying it in caused
the benchmark srun's pyxis shadow hook to fail with
'mkdir: cannot create directory pyxis_$JOBID.1/data: File exists'.
Keep the flock + skip-if-valid logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: point B300 runner at shared gharunners/{squash,hf-hub-cache}

Move the squash cache from /data/squash to /data/home/sa-shared/gharunners/squash,
and the HF cache mount from /scratch/models to /data/home/sa-shared/gharunners/hf-hub-cache.
Also mount the host HF cache onto $HF_HUB_CACHE inside the container so
tools reading the default HF path pick it up (matches the B200 dgxc-slurm
runner). Drop the /scratch/models Qwen3.5 path override since that path
is no longer used.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: move enroot import out of srun to avoid pyxis namespace collision

Running two srun steps in the same allocation (flock+import, then the
benchmark --container-image srun) reproducibly fails on this cluster
with:
  error: pyxis: mkdir: cannot create directory
    '/scratch/data/user-$UID/pyxis_$JOBID.1/data': File exists
  error: pyxis:     [ERROR] /etc/enroot/hooks.d/10-shadow.sh exited with return code 1

Per NVIDIA/pyxis#138, two srun steps sharing an allocation can leave
enroot/pyxis state between steps. Collapsing to a single srun (the
benchmark) is the cleanest workaround. Move the flock-guarded
enroot import to the host side, before salloc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: wipe stale pyxis scratch dirs for this JOB_ID before benchmark srun

Even with a single srun step, pyxis fails with
  error: pyxis: mkdir: cannot create directory
      '/scratch/data/user-$UID/pyxis_$JOBID.0/data': File exists
on fresh SLURM JOB_IDs. The /scratch path is left behind by previous
jobs whose IDs SLURM later reuses (and the cluster's pyxis epilog
doesn't clean it up). Wipe pyxis_$JOBID.* from the host after salloc;
no-op if /scratch is node-local, effective if it's shared NFS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
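The stale-directory wipe amounts to a glob delete keyed on the reused job ID. The scratch-root layout below is inferred from the error message quoted above; treat the helper as a sketch, not the committed code.

```shell
# Sketch: remove leftover pyxis scratch dirs for a (possibly reused) job ID,
# e.g. /scratch/data/user-$UID/pyxis_$JOBID.0, pyxis_$JOBID.1, ...
# No-op when nothing matches; other jobs' directories are untouched.
cleanup_stale_pyxis() {
  local root="$1" jobid="$2"
  rm -rf "${root}/pyxis_${jobid}".*
}
```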

* Revert: drop all B300 runner changes, mirror #1128's approach

PR #1128 (dsv4-fp4fp8-b300-vllm) runs on the same cluster with ZERO
changes to launch_b300-nv.sh. The pyxis 10-shadow.sh failures we were
chasing aren't caused by the runner -- reset it to origin/main and
keep the sglang config/bench additions only.

Reverts (from this branch):
- 4bb1f1a point B300 runner at shared gharunners/{squash,hf-hub-cache}
- 106deea drop ENROOT_CACHE_PATH override
- 97a488e add flock-guarded squash import
- 744c5a0 move enroot import out of srun
- d003c59 wipe stale pyxis scratch before benchmark srun

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* runner: add head-node flock-guarded squash import on B300

Move enroot import out of srun to the head node and serialize parallel
GH jobs with flock on the shared squash file. Skips the import when a
valid squash already exists. The benchmark srun is now the only step
in the allocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: mount at /ix and clear baked-in CUDA_VISIBLE_DEVICES

Port the B200 branch's fix for the lmsysorg/sglang:deepseek-v4-blackwell
image on B300:
- The image installs sglang editable under /workspace/sglang; the default
  $GITHUB_WORKSPACE:/workspace/ bind-mount masks the install and breaks
  'import sglang'. For this image, mount at /ix instead.
- The image's ENV bakes CUDA_VISIBLE_DEVICES=4,5,6,7, masking half the
  GPUs Slurm allocates. Unset it in the bench script so TP=8 sees all 8.
- Write artefacts under $PWD instead of hard-coded /workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
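The image-conditional mount target can be expressed as a small case dispatch over the image tag. The tag patterns come from this and the later image-switch commits; the function name and the inclusion of the yhyang201 tag pattern are assumptions.

```shell
# Sketch: choose the container mount destination by image tag.
# Images that ship an editable sglang install under /workspace/sglang must
# NOT have $GITHUB_WORKSPACE bind-mounted over /workspace (it would mask
# the install and break 'import sglang'), so mount at /ix instead.
mount_dst_for_image() {
  case "$1" in
    *deepseek-v4-blackwell*|*deepseek-v4-bw-ultra*|*sglang-b300*)
      echo /ix ;;
    *)
      echo /workspace ;;
  esac
}
```

The bench script would then also `unset CUDA_VISIBLE_DEVICES` so the image's baked-in `4,5,6,7` mask does not hide half of Slurm's 8-GPU allocation from TP=8.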

* runner: use /data/models pre-staged path for dsv4 on B300

Pre-staged models on the B300 cluster live under /data/models
(Qwen3.5-397B-A17B-FP8, dsv4-pro, etc.). Switch HF_HUB_CACHE_MOUNT
from /scratch/models to /data/models, and export MODEL to
/data/models/dsv4-pro when MODEL_PREFIX=dsv4 so the benchmark reads
from the mounted dir directly. The bench script skips `hf download`
when MODEL looks like an absolute path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
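The "skips `hf download` when MODEL looks like an absolute path" behavior can be sketched as a leading-slash check. The helper name is hypothetical; only the dispatch rule is taken from the commit message.

```shell
# Sketch: use a pre-staged local model dir directly; otherwise fetch it
# from the Hub first. An absolute path (leading /) signals pre-staged.
maybe_download_model() {
  case "$1" in
    /*) echo "$1" ;;                          # pre-staged, read directly
    *)  hf download "$1" >/dev/null
        echo "$1" ;;
  esac
}
```

So `MODEL=/data/models/dsv4-pro` bypasses the download entirely, while a Hub repo ID still goes through `hf download`.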

* fix: switch B300 dsv4 sglang to bw-ultra-compiled image

The stock lmsysorg/sglang:deepseek-v4-blackwell image ships kernels
compiled for B200 (SM_100) and crashes on B300 with
  RuntimeError: RMSNorm failed with error code no kernel image is
  available for execution on the device
during CUDA graph capture. Switch to cquil/sglang-deepseek-v4-bw-ultra:v1,
which is recompiled with B300 SM support.

Broaden the /ix mount conditional to match both image tags: the fork
keeps the same /workspace/sglang editable install that would otherwise
be masked by $GITHUB_WORKSPACE:/workspace/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: switch B300 dsv4 sglang image to yhyang201/sglang-b300:v3

Use the B300-recompiled image from yhyang201; extend the /ix mount
conditional to match the new tag in addition to the previous
deepseek-v4-blackwell / deepseek-v4-bw-ultra patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* update b300

* feat(dsv4-fp4-b300-sglang): pick recipe by CONC; split search-space

Mirror chore/dsv4-sgl-b200 commits 103a202 + 43be495 for B300:

Bench script now selects one of three cookbook recipes by CONC instead
of a single static flag set:
  CONC <= 32   -> low-latency    (TP only, chunked-prefill 4096,
                                  disable-flashinfer-autotune)
  33..128      -> balanced       (+ DP-attention, max-running-reqs=128,
                                  cuda-graph-max-bs=64, deepep-config)
  CONC > 128   -> max-throughput (+ DP-attention, max-running-reqs=256,
                                  cuda-graph-max-bs=64, deepep-config)
No speculative decoding in any recipe; --disable-radix-cache kept for
the no-prefix-caching baseline.

Split the dsv4-fp4-b300-sglang search-space rows per recipe boundary so
result filenames (ep=, dpa=) accurately reflect which recipe ran.
ep=8 on balanced/max-throughput reflects sglang's implicit
ep_size=tp_size override when --moe-a2a-backend deepep is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
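The three-way recipe selection above reduces to threshold checks on CONC. This is a hypothetical reconstruction of the dispatch skeleton only; the per-recipe flag sets are abbreviated into the comments and the function name is an assumption.

```shell
# Sketch: pick the cookbook recipe name by concurrency.
#   <= 32       -> low-latency    (TP only, chunked-prefill 4096)
#   33..128     -> balanced       (+ DP-attention, max-running-reqs=128)
#   > 128       -> max-throughput (+ DP-attention, max-running-reqs=256)
recipe_for_conc() {
  local conc="$1"
  if [ "$conc" -le 32 ]; then
    echo low-latency
  elif [ "$conc" -le 128 ]; then
    echo balanced
  else
    echo max-throughput
  fi
}
```

Splitting the yaml search-space rows at the same 32/128 boundaries keeps the `ep=`/`dpa=` fields in result filenames consistent with whichever recipe actually ran.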

* update b300

Switch B300 dsv4 sglang image to lmsysorg/sglang:deepseek-v4-b300
and extend the /ix mount conditional to match the new tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dsv4-fp4-b300-sglang): hardcode low-latency recipe at every CONC

The DeepEP FP8 weight-postprocess path is broken for
deepseek-ai/DeepSeek-V4-Pro on B300 with
lmsysorg/sglang:deepseek-v4-b300 -- every sglang launch with
--moe-a2a-backend deepep fails during model load with
  RuntimeError: Recipe must be a list/tuple of 3 integers.
raised from sglang.srt.layers.quantization.fp8
.process_weights_after_loading_block_quant (fp8.py:957). The balanced
and max-throughput recipes both go through that path; the low-latency
recipe (TP-only, flashinfer_mxfp4 MoE) does not and loads cleanly.

Collapse the yaml search-space back to a single row spanning the full
CONC range (4..1024 for 1k1k, 4..512 for 8k1k) and hardcode the bench
script to the low-latency flags at every CONC. TODO(Cam) noted in both
files to restore the recipe-per-CONC dispatch once the DeepEP FP8 load
path is fixed upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* trigger test check

* Revert "feat(dsv4-fp4-b300-sglang): hardcode low-latency recipe at every CONC"

This reverts commit bc43672.

* trigger test check

* Move dsv4 b300 sglang bench script to framework-tagged path

Per the runner naming convention introduced in #1146
(BENCH_SCRIPT="${BENCH_BASE}_${FRAMEWORK}${SPEC_SUFFIX}.sh"), the b300
runner now prefers benchmarks/single_node/dsv4_fp4_b300_sglang.sh over
the legacy dsv4_fp4_b300.sh. The merge from main left this branch with
both scripts: the legacy file carrying the recipe-per-CONC dispatch
this PR added, and the framework-tagged file with the low-latency-only
fallback content from main. CI was therefore picking the wrong script.

Move the recipe-per-CONC dispatch onto dsv4_fp4_b300_sglang.sh and
delete the legacy filename so the runner picks up the intended logic.
Update the yaml comment to point at the new path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(perf-changelog): tighten dsv4-fp4-b300-sglang entry

Now that DeepEP FP8 loads cleanly, this PR is purely about restoring
the recipe-per-CONC split on top of the low-latency-only fallback
from #1143. Trim the changelog to that delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>