[AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10#1237
[AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10#1237
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Claude finished @ppalanga's task in 44s —— View job Review of PR #1237
SummaryThe image update from One blocking issue found: the |
There was a problem hiding this comment.
LGTM — straightforward Docker image bump for MI355X DeepSeek-R1 FP4 SGLang.
Extended reasoning...
Overview
This PR is a one-line Docker image tag bump in .github/configs/amd-master.yaml for the dsr1-fp4-mi355x-sglang config (lmsysorg/sglang:v0.5.9-rocm700-mi35x → v0.5.10.post1-rocm720-mi35x), plus the corresponding 6-line entry appended to perf-changelog.yaml documenting the change. It exactly matches the pattern of dozens of prior image-bump PRs in this repo (e.g. PRs #330, #369, #395, #595, #816 for the same config-key).
Security risks
None. This is a benchmark-config tag pointing at an upstream public image. No auth, crypto, permissions, or other security-sensitive code is touched.
Level of scrutiny
Very low. Both files are pure benchmark configuration — no runtime/production code paths. The only behavioral effect is which container the AMD MI355X DSR1-FP4 SGLang job pulls.
Other factors
The only bug surfaced is a copy-paste typo in the new changelog entry's pr-link (it points at InferenceMAX/InferenceMAX/pull/1237 instead of SemiAnalysisAI/InferenceX/pull/1237); it has been posted as an inline comment. This field is documentation-only and does not affect benchmark execution, so it doesn't warrant blocking the merge — easy to fix in a follow-up if desired.
| - dsr1-fp4-mi355x-sglang | ||
| description: | ||
| - "Updating MI355x Deepseek-R1 FP4 SGLang Image to upstream v0.5.10.post1" | ||
| pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237 |
There was a problem hiding this comment.
🟡 The new pr-link at perf-changelog.yaml:2039 points to https://github.com/InferenceMAX/InferenceMAX/pull/1237, but this PR is in SemiAnalysisAI/InferenceX. Every other entry in the file (231 of them, including the immediately preceding entry for PR #1157 on line 2033) uses https://github.com/SemiAnalysisAI/InferenceX/pull/<num>. Should be https://github.com/SemiAnalysisAI/InferenceX/pull/1237 — looks like a copy-paste typo.
Extended reasoning...
What the bug is
The new perf-changelog.yaml entry added by this PR (lines 2034-2039) sets:
pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237but this PR (#1237) lives in SemiAnalysisAI/InferenceX, not InferenceMAX/InferenceMAX. The repo URL is wrong.
Why this is clearly a typo
- Every single one of the other ~231
pr-linkentries inperf-changelog.yamluseshttps://github.com/SemiAnalysisAI/InferenceX/pull/<num>. This new entry is the only outlier with a different host repo. - The immediately preceding entry on line 2033 (for PR Day 0 DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks #1157) uses the correct
https://github.com/SemiAnalysisAI/InferenceX/pull/1157. - Per the PR metadata, this PR is [AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10 #1237 in the
SemiAnalysisAI/InferenceXrepo (the same repo whereperf-changelog.yamllives). The README does mention the project was 'formerly InferenceMAX', which likely explains how this slipped in — the maintainer probably had the old name in mental cache when copy-pasting.
Proof / step-by-step
- Open
perf-changelog.yamlat line 2039:pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237. - Grep all
pr-link:lines in the file → every other one uses theSemiAnalysisAI/InferenceXhost. - Look at the prior entry (line 2033, PR Day 0 DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks #1157):
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1157— the convention. - Check the PR metadata for this PR: it is PR [AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10 #1237 of
SemiAnalysisAI/InferenceX. The link as written either 404s or, if that other repo exists, points at an unrelated PR.
Impact
Low — pr-link is a documentation/audit field; it doesn't affect benchmark execution. But the link is broken or misleading, so the changelog cross-reference is unusable for anyone clicking through to find the originating PR.
Fix
Change line 2039 to:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1237Re bug_003 dedup objection
bug_001 and bug_003 are the same finding (same file, same line, same fix); the synthesis agent has correctly merged them into a single report here.
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Updating the SGLang docker image to the SGLang community docker image
Image=lmsysorg/sglang:v0.5.10.post1-rocm720-mi35x
Impacted configurations: MI355x Deepseek-R1 FP4
Link to the runs: TBD