Skip to content

[AMD][Waiting for switching upstream sglang image] improve dsr1 fp4 disagg perf on mi355x#1236

Open
billishyahao wants to merge 53 commits intomainfrom
amd/mi355x-dsfp4-april14
Open

[AMD][Waiting for switching upstream sglang image] improve dsr1 fp4 disagg perf on mi355x#1236
billishyahao wants to merge 53 commits intomainfrom
amd/mi355x-dsfp4-april14

Conversation

@billishyahao
Copy link
Copy Markdown
Collaborator

@billishyahao billishyahao commented Apr 30, 2026

replacement of #983

Note that we are waiting for switching the image to sglang upstream rocm image which is pending until the following issue being resolved on sglang main:

  1. dep high tput regression [AITER-Upgrade] PR readiness sgl-project/sglang#21302 (comment)

The new patch is adding the following optimization:

"Bump SGL mori image to April 28"
"Add more low latency sweep configs"
"Enable v2 mxfp4 DSR1 0528 model"
"Enable fp4 disp feature on mori"

billishyahao and others added 30 commits March 16, 2026 08:36
…transformers v5

Transformers v5 incorrectly rebuilds pre_tokenizer/decoder components for
models like DeepSeek-R1 that use LlamaTokenizerFast with a non-Llama
tokenizer architecture. The sglang server fixes this at startup, but the
benchmark client loads the tokenizer without these fixes, causing a ~5x
token count inflation (e.g. 7000 tokens -> 35000 tokens) and false
performance regressions in TTFT and throughput benchmarks.

Apply the same tokenizer fixes (pre_tokenizer/decoder restoration and
add_bos_token recovery) that sglang server applies, so client and server
tokenize identically. No-op on transformers v4.

Made-with: Cursor
Comment thread perf-changelog.yaml
Comment on lines +1217 to +1225
- config-keys:
- dsr1-fp4-mi355x-sglang-disagg
- dsr1-fp4-mi355x-sglang-disagg-mtp
description:
- "Bump SGL mori image to March 27"
- "Add more low latency sweep configs"
- "Enable v2 mxfp4 DSR1 0528 model"
- "Enable fp4 disp feature on mori"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/983
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog entry at lines 1217-1225 has two factual inaccuracies: (1) pr-link points to #983 but the PR description states this is a 'replacement of pull/983' — i.e. #983 was superseded and this is PR #1236; the convention throughout this file is for pr-link to reference the actual merging PR. (2) The bullet 'Bump SGL mori image to March 27' is wrong for the keyed FP4 configs — those images are actually bumped to rocm/sgl-dev:sglang-0.5.10-rocm720-mi35x-mori-0428 (April 28); only the FP8 configs (not listed under config-keys here) use the 0327 (March 27) tag. Suggest updating the pr-link to #1236 and changing 'March 27' to 'April 28' (or 'mori-0428').

Extended reasoning...

Two metadata/text inaccuracies in the new perf-changelog entry (lines 1217-1225)

(1) Wrong pr-link. The PR description explicitly opens with: "replacement of #983". That phrasing identifies #983 as the prior, superseded PR — the one being replaced. This PR is #1236, and the convention throughout perf-changelog.yaml is that pr-link references the merging PR (e.g. the immediately preceding entry at line 1215 uses pr-link 973 for that PR's own entry; entries that don't yet know their PR number use the placeholder pull/XXX, used 6 times in the file). Setting pr-link: .../pull/983 here breaks that convention and points readers at an unrelated, superseded PR rather than the one that actually merged the changes. The author appears to have copied the link from the superseded PR text rather than substituting the new PR number.

(2) Wrong image-bump date. The bullet "Bump SGL mori image to March 27" is factually wrong for the configs this entry is keyed on. Walking through the diff in .github/configs/amd-master.yaml:

  • dsr1-fp4-mi355x-sglang-disagg (line ~798): image goes from sglang-0.5.9-rocm720-mi35x-mori-0227-3sglang-0.5.10-rocm720-mi35x-mori-**0428** (April 28).
  • dsr1-fp4-mi355x-sglang-disagg-mtp (line ~1006): image goes from sglang-0.5.9-rocm720-mi35x-mori-0227-3sglang-0.5.10-rocm720-mi35x-mori-**0428** (April 28).
  • dsr1-fp8-mi355x-sglang-disagg (line 488): image goes from mori-0227-2mori-**0327** (March 27).
  • dsr1-fp8-mi355x-sglang-disagg-mtp (line 644): image goes from mori-0227-2mori-**0327** (March 27).

The config-keys listed in this entry are only the two FP4 configs. So the image bump that this entry actually documents is the April 28 (0428) bump, not the March 27 (0327) bump that applies to the FP8 configs (which are not in this entry's config-keys). The text appears to have been copied/adapted from an earlier entry (PR #823 at line 1080, which legitimately documented the Feb 27 bump for FP8/FP4 together) without updating the date to match the actual change in this PR.

Impact. Both issues are documentation/metadata only — no runtime behavior is affected. However, perf-changelog.yaml is the authoritative human-readable record of what changed in each performance release, so an entry that points at the wrong PR and misstates the image version makes future bisection/regression-investigation harder than it needs to be.

Fix. Update the entry at lines 1217-1225 to:

- config-keys:
    - dsr1-fp4-mi355x-sglang-disagg
    - dsr1-fp4-mi355x-sglang-disagg-mtp
  description:
    - "Bump SGL mori image to April 28 (mori-0428)"
    - "Add more low latency sweep configs"
    - "Enable v2 mxfp4 DSR1 0528 model"
    - "Enable fp4 disp feature on mori"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1236

(The pull/XXX placeholder convention used elsewhere in this file would also be acceptable for pr-link if the merging PR number is not yet known at write time, but #1236 is known.)

Severity: nit — both issues are confined to changelog text/metadata and have no functional impact, but the entry as written is straightforwardly wrong on its face and worth fixing before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants