
suggestion from lmsys sglang: deepseek #138

@functionstackx


LMSYS SGLang engineers are saying that the Pareto frontier is not optimal, because dp_attention and the other parallelism strategies are hard-set and the only thing that gets swept is FP4 on 4 GPUs vs. 8 GPUs.

This can probably only be implemented after https://github.com/InferenceMAX/InferenceMAX/pull/111 is merged.

#145 makes EP and DP attention sweeps a first-class feature in the YAML.
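For illustration, here is a rough sketch of what sweeping the parallelism settings (instead of hard-setting them) and then picking the Pareto frontier could look like. The axis names (`ep_size`, `dp_attention`, etc.), their values, the `ep_size <= num_gpus` constraint, and the metric names are all assumptions made up for this example; they are not the actual schema from #145 or the InferenceMAX scripts.

```python
from itertools import product

# Hypothetical sweep axes; names and values are illustrative assumptions,
# not the YAML schema used by InferenceMAX.
SWEEP_AXES = {
    "precision": ["fp4"],
    "num_gpus": [4, 8],
    "ep_size": [1, 4, 8],
    "dp_attention": [False, True],
}


def expand_sweep(axes):
    """Return one config dict per valid combination of axis values."""
    keys = list(axes)
    configs = []
    for values in product(*(axes[k] for k in keys)):
        cfg = dict(zip(keys, values))
        # Assumed constraint: expert parallelism cannot exceed the GPU count.
        if cfg["ep_size"] <= cfg["num_gpus"]:
            configs.append(cfg)
    return configs


def pareto_frontier(points):
    """Keep the points that no other point dominates.

    Each point is a dict with 'throughput_per_gpu' and 'interactivity';
    both metrics are treated as higher-is-better.
    """
    frontier = []
    for p in points:
        dominated = any(
            q["throughput_per_gpu"] >= p["throughput_per_gpu"]
            and q["interactivity"] >= p["interactivity"]
            and (
                q["throughput_per_gpu"] > p["throughput_per_gpu"]
                or q["interactivity"] > p["interactivity"]
            )
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier
```

Each config from `expand_sweep` would be benchmarked separately, and `pareto_frontier` would then be run over the measured results. The point is that with only `num_gpus` swept, configurations that might sit on the frontier never get measured at all.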

B200 FP4: https://github.com/InferenceMAX/InferenceMAX/blob/main/benchmarks/dsr1_fp4_b200_docker.sh#L20
B200 FP8: https://github.com/InferenceMAX/InferenceMAX/blob/main/benchmarks/dsr1_fp8_b200_docker.sh
H200 FP8: https://github.com/InferenceMAX/InferenceMAX/blob/main/benchmarks/dsr1_fp8_h200_slurm.sh
