Conversation
* Switching dsr1 fp8 to lmsys images for MI355, MI325 and MI300

Co-authored-by: Chun Fang <chun.fang@amd.com>
Reminder:
So for this PR, you will add something like the following entry to the bottom of the changelog:

    - config-keys:
        - dsr1-fp8-mi300x-sglang
        - dsr1-fp8-mi325x-sglang
        - dsr1-fp8-mi355x-sglang
      description: |
        - Use upstream SGLang images on mi300, mi325 and mi355 for dsr1fp8
      PR: https://github.com/InferenceMAX/InferenceMAX/pull/332

Then add the
/gemini review
Code Review
This pull request updates the SGLang Docker images for dsr1-fp8 benchmarks on MI300, MI325, and MI355 to a newer upstream version. It also adjusts the benchmark scripts for MI355x to include new environment variables and server arguments required by the updated image.
My review has identified a couple of issues:
- The benchmark scripts for MI300x and MI325x have not been updated with the new required settings, which will likely cause those benchmarks to fail. This is a high-priority issue.
- The changelog entry contains an incorrect pull request number.
Please see the detailed comments for suggestions on how to address these points.
        - dsr1-fp8-mi355x-sglang
      description: |
        - Use upstream SGLang images on mi300, mi325 and mi355 for dsr1fp8
      PR: https://github.com/InferenceMAX/InferenceMAX/pull/332

@rkarhila-amd is this ready to ship? Tests pass. Please see the code review comments.
Updated the pull request link for dsr1fp8 in the changelog.
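
A wrong PR link like the one flagged in the review is easy to miss by eye. As a minimal sketch, assuming the new entry is the last one in `perf-changelog.yaml` and that its link ends in `/pull/<number>`, a check along these lines could compare the link against the expected number. `last_pr_number` and `EXPECTED_PR` are hypothetical names introduced here for illustration; they are not part of the repository.

```shell
# Print the PR number from the last ".../pull/<number>" link in the given file.
# Hypothetical helper, not part of the repository.
last_pr_number() {
  sed -n 's|.*/pull/\([0-9][0-9]*\).*|\1|p' "$1" | tail -n 1
}

# EXPECTED_PR is a hypothetical variable: set it to this pull request's number.
if [ -n "${EXPECTED_PR:-}" ] && [ -f perf-changelog.yaml ]; then
  actual=$(last_pr_number perf-changelog.yaml)
  if [ "$actual" != "$EXPECTED_PR" ]; then
    printf 'changelog links PR %s, expected %s\n' "$actual" "$EXPECTED_PR"
  fi
fi
```

The check only looks at the last link in the file, so it relies on new entries being appended at the bottom, as the reminder describes.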
@rkarhila-amd @cquil11 ready to merge? Successful validation here: https://github.com/InferenceMAX/InferenceMAX/actions/runs/20609406656
    --mem-fraction-static 0.8 --disable-radix-cache \
    --num-continuous-decode-steps 4 \
    --max-prefill-tokens 196608 \
    --enable-torch-compile \

MI300x and MI325x scripts missing flags for new image
The PR updates the SGLang image from v0.5.2 to v0.5.5.post3 for all three platforms (mi300x, mi325x, mi355x), but only the mi355x benchmark scripts were updated with the new flags (--attention-backend aiter, --enable-torch-compile, RCCL_MSCCL_ENABLE=0, ROCM_QUICK_REDUCE_QUANTIZATION=INT4). The existing dsr1_fp8_mi300x_*.sh and dsr1_fp8_mi325x_*.sh scripts lack these flags despite also receiving the new image version. This inconsistency may cause mi300x and mi325x benchmarks to fail or produce suboptimal results with the new upstream image.
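
The inconsistency described above can be checked mechanically. The following is a rough sketch, assuming the mi300x and mi325x scripts live in the same `benchmarks/` directory as the mi355x ones and that each setting appears on a single line. `missing_settings` is a hypothetical helper written for this illustration. Whether the mi300x and mi325x scripts actually need all four settings is the open question raised by the review, not something this check decides.

```shell
# Print every setting from the argument list that does not appear in the file.
# Usage: missing_settings <file> <setting>...
# Hypothetical helper, not part of the repository.
missing_settings() {
  file=$1
  shift
  for setting in "$@"; do
    # -F: fixed string, -q: quiet, --: lets a pattern start with "--"
    grep -Fq -- "$setting" "$file" || printf '%s\n' "$setting"
  done
}

# Assumed location: the benchmarks/ directory used by the mi355x scripts.
for script in benchmarks/dsr1_fp8_mi300x_*.sh benchmarks/dsr1_fp8_mi325x_*.sh; do
  [ -f "$script" ] || continue   # skip glob patterns that matched nothing
  missing=$(missing_settings "$script" \
    "--attention-backend aiter" \
    "--enable-torch-compile" \
    "RCCL_MSCCL_ENABLE=0" \
    "ROCM_QUICK_REDUCE_QUANTIZATION=INT4")
  if [ -n "$missing" ]; then
    printf '%s is missing:\n%s\n' "$script" "$missing"
  fi
done
```

Run from the repository root, this prints nothing when every script already contains all four settings.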
Note
Switch to upstream SGLang images for dsr1 FP8 on MI300/325/355
- Update image tags in amd-master.yaml to lmsysorg/sglang:v0.5.5.post3-rocm700-* for dsr1-fp8-mi300x/mi325x/mi355x-sglang
- Update benchmarks/dsr1_fp8_mi355x_{docker,slurm}.sh with SGLANG_USE_AITER=1, RCCL_MSCCL_ENABLE=0, ROCM_QUICK_REDUCE_QUANTIZATION=INT4, and server flags --attention-backend aiter, --enable-torch-compile, --cuda-graph-max-bs 128
- Add an entry to perf-changelog.yaml

Written by Cursor Bugbot for commit 1324e9e.
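
To confirm which images ended up on the upstream tag, a pattern match is enough. This is a sketch under one loud assumption: that `amd-master.yaml` stores each image on its own unquoted line of the form `image: <reference>`. The real layout of the file is not shown in this thread, and `is_upstream_image` is a hypothetical helper.

```shell
# Succeed if the image reference matches the upstream tag pattern from the summary.
# Hypothetical helper, not part of the repository.
is_upstream_image() {
  case $1 in
    lmsysorg/sglang:v0.5.5.post3-rocm700-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: one unquoted "image: <reference>" line per config in amd-master.yaml.
grep -E '^[[:space:]]*image:' amd-master.yaml 2>/dev/null |
  while read -r key image; do
    is_upstream_image "$image" ||
      printf 'not on the upstream v0.5.5.post3 tag: %s\n' "$image"
  done
```

The loop reports every non-matching image in the file, so configs outside this PR will be listed as well; only the three dsr1-fp8 entries are expected to match.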