Skip to content

[AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10#1237

Open
ppalanga wants to merge 3 commits intomainfrom
update_dsr1fp4_amd_sgl_to_5_10
Open

[AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10#1237
ppalanga wants to merge 3 commits intomainfrom
update_dsr1fp4_amd_sgl_to_5_10

Conversation

@ppalanga
Copy link
Copy Markdown
Collaborator

Updating the SGLang docker image to the SGLang community docker image
Image=lmsysorg/sglang:v0.5.10.post1-rocm720-mi35x

Impacted configurations: MI355x Deepseek-R1 FP4

Link to the runs: TBD

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@ppalanga ppalanga marked this pull request as ready for review April 30, 2026 14:08
@ppalanga ppalanga requested a review from a team April 30, 2026 14:08
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @ppalanga's task in 44s —— View job


Review of PR #1237

  • Gather context and read changed files
  • Validate master config changes (image accessibility, model prefixes)
  • Validate perf-changelog entry (position, content)
  • Post review summary

Summary

The image update from v0.5.9-rocm700-mi35x to v0.5.10.post1-rocm720-mi35x looks good — the image is on Docker Hub (publicly accessible), the model prefix (dsr1) is valid, and the perf-changelog entry is correctly appended at the end of the file.

One blocking issue found: the pr-link in perf-changelog.yaml points to InferenceMAX/InferenceMAX instead of SemiAnalysisAI/InferenceX. See inline comment with a suggested fix.

Comment thread perf-changelog.yaml Outdated
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward Docker image bump for MI355X DeepSeek-R1 FP4 SGLang.

Extended reasoning...

Overview

This PR is a one-line Docker image tag bump in .github/configs/amd-master.yaml for the dsr1-fp4-mi355x-sglang config (lmsysorg/sglang:v0.5.9-rocm700-mi35xv0.5.10.post1-rocm720-mi35x), plus the corresponding 6-line entry appended to perf-changelog.yaml documenting the change. It exactly matches the pattern of dozens of prior image-bump PRs in this repo (e.g. PRs #330, #369, #395, #595, #816 for the same config-key).

Security risks

None. This is a benchmark-config tag pointing at an upstream public image. No auth, crypto, permissions, or other security-sensitive code is touched.

Level of scrutiny

Very low. Both files are pure benchmark configuration — no runtime/production code paths. The only behavioral effect is which container the AMD MI355X DSR1-FP4 SGLang job pulls.

Other factors

The only bug surfaced is a copy-paste typo in the new changelog entry's pr-link (it points at InferenceMAX/InferenceMAX/pull/1237 instead of SemiAnalysisAI/InferenceX/pull/1237); it has been posted as an inline comment. This field is documentation-only and does not affect benchmark execution, so it doesn't warrant blocking the merge — easy to fix in a follow-up if desired.

Comment thread perf-changelog.yaml Outdated
- dsr1-fp4-mi355x-sglang
description:
- "Updating MI355x Deepseek-R1 FP4 SGLang Image to upstream v0.5.10.post1"
pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new pr-link at perf-changelog.yaml:2039 points to https://github.com/InferenceMAX/InferenceMAX/pull/1237, but this PR is in SemiAnalysisAI/InferenceX. Every other entry in the file (231 of them, including the immediately preceding entry for PR #1157 on line 2033) uses https://github.com/SemiAnalysisAI/InferenceX/pull/<num>. Should be https://github.com/SemiAnalysisAI/InferenceX/pull/1237 — looks like a copy-paste typo.

Extended reasoning...

What the bug is

The new perf-changelog.yaml entry added by this PR (lines 2034-2039) sets:

  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237

but this PR (#1237) lives in SemiAnalysisAI/InferenceX, not InferenceMAX/InferenceMAX. The repo URL is wrong.

Why this is clearly a typo

  • Every single one of the other ~231 pr-link entries in perf-changelog.yaml uses https://github.com/SemiAnalysisAI/InferenceX/pull/<num>. This new entry is the only outlier with a different host repo.
  • The immediately preceding entry on line 2033 (for PR Day 0 DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks #1157) uses the correct https://github.com/SemiAnalysisAI/InferenceX/pull/1157.
  • Per the PR metadata, this PR is [AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10 #1237 in the SemiAnalysisAI/InferenceX repo (the same repo where perf-changelog.yaml lives). The README does mention the project was 'formerly InferenceMAX', which likely explains how this slipped in — the maintainer probably had the old name in mental cache when copy-pasting.

Proof / step-by-step

  1. Open perf-changelog.yaml at line 2039: pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/1237.
  2. Grep all pr-link: lines in the file → every other one uses the SemiAnalysisAI/InferenceX host.
  3. Look at the prior entry (line 2033, PR Day 0 DeepSeek V4 Pro FP4 GB300 disaggregated SGLang benchmarks #1157): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1157 — the convention.
  4. Check the PR metadata for this PR: it is PR [AMD] Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.10 #1237 of SemiAnalysisAI/InferenceX. The link as written either 404s or, if that other repo exists, points at an unrelated PR.

Impact

Low — pr-link is a documentation/audit field; it doesn't affect benchmark execution. But the link is broken or misleading, so the changelog cross-reference is unusable for anyone clicking through to find the originating PR.

Fix

Change line 2039 to:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1237

Re bug_003 dedup objection

bug_001 and bug_003 are the same finding (same file, same line, same fix); the synthesis agent has correctly merged them into a single report here.

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant