New and improved ROCm SGLang images were recently upstreamed, which is great. They now need to be integrated into InferenceMAX.
PR #247 was opened for this, but it has several problems and none of it has been verified. AMD should allocate an inference engineer to make sure the benchmarks are tuned properly and performance is optimal.