diff --git a/.github/README.md b/.github/README.md new file mode 100644 index 000000000..69fc1069f --- /dev/null +++ b/.github/README.md @@ -0,0 +1,124 @@ +# How to Test Workflows + +In order to test configurations described in `.github/configs`, the primary workflow file used is `.github/workflows/e2e-tests.yml`. As input, this workflow takes in the CLI arguments for the `utils/matrix-logic/generate_sweep_configs.py` script. The usage for this script is shown below: + +``` +usage: generate_sweep_configs.py [-h] {full-sweep,test-config,runner-model-sweep,runner-sweep,custom} ... + +Generate benchmark configurations from YAML config files + +positional arguments: + {full-sweep,test-config,runner-model-sweep,runner-sweep,custom} + Available commands + full-sweep Generate full sweep configurations with optional filtering by model, precision, framework, runner type, and sequence lengths + test-config Given a config key, run that configuration as specified. Optionally specify --test-mode to only run one parallelism-concurrency pair for the config. + runner-model-sweep Given a runner type, find all configurations matching the type, and run that configuration on all individual runner nodes for the specified runner type. This is meant to validate + that all runner nodes work on all configurations for a runner type. For instance, to validate that all configs that specify an h200 runner successfully run across all h200 runner + nodes. + runner-sweep Given a model (and optionally a precision and framework), find all configurations matching the inputs, and run those configurations across all compatible runner nodes. This is + meant to validate all runner nodes that should run a particular model can. For instance, this should be used to validate that all runners nodes that should run gptoss-120b + actually do so successfully. 
+ custom Enter custom values + +options: + -h, --help show this help message and exit +``` + +Instead of explaining each command at a high level, let's just walk through some common testing scenarios and describe how to run them. + +**Scenario 1**: I want to increase the concurrency from 128 to 256 in the 1k1k scenario for the `dsr1-fp4-b200-sglang` config (from `.github/configs/nvidia-master.yaml`) and then test it. + +Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input: +``` +test-config --key dsr1-fp4-b200-sglang --seq-len 1k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986046399 + +If we wanted to also test the 1k8k or 8k1k scenarios, we would simply append `1k8k` or `8k1k` to `--seq-len`, respectively. + +Further, if we wanted to run that config on *one specific* runner node, we could do so by appending `--runner-node` to the argument list. Note that if the specified runner node is not compatible with the specified config key (as dictated by `.github/configs/runners.yaml`), the workflow will fail with an error: + +``` +test-config --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml --key dsr1-fp4-b200-sglang --seq-len 1k1k --runner-node mi300x-amd_0 + +ValueError: Runner node 'mi300x-amd_0' is not compatible with config 'dsr1-fp4-b200-sglang' which runs on runner type 'b200'. Available runner nodes for this config are 'b200-nb_0, b200-nb_1, b200-nvd_0, b200-nvd_1, b200-nvd_2, b200-nvd_3, b200-tg_0'. +``` + +Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986053019/job/54229839736 + +**Scenario 2**: I just made a change to `benchmarks/dsr1_fp8_b200_docker.sh` and I need to verify that these changes work across all B200 runners. 
+ +Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input: +``` +runner-sweep --runner-type b200 --model-prefix dsr1 --precision fp8 --config-files .github/configs/amd-master.yaml .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986283169 + +This will run a test (just the highest available parallelism and lowest available concurrency) on each B200 runner node, for each DeepSeek config that runs on B200 with fp8 precision. In other words, this "sweeps" across runners for a particular model, verifying that all of its runners still work after the changes that have been made. + +**Scenario 3**: I just upgraded the CUDA drivers on all H200 runners and need to verify that all models that use H200 still work correctly across all H200 nodes. + +Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input: +``` +runner-model-sweep --runner-type h200 --config-files .github/configs/amd-master.yaml .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986292917 + +This will run a test (just the highest available parallelism and lowest available concurrency) for each configuration that specifies the `h200` runner type, across all H200 runner nodes defined in `.github/configs/runners.yaml`. + +For example, if you have configs `dsr1-fp8-h200-sglang`, `dsr1-fp8-h200-trt`, and `gptoss-fp4-h200-vllm` that all use `runner: h200`, and you have 8 H200 nodes (`h200-cw_0`, `h200-cw_1`, etc.), this will run all 3 configs on all 8 nodes (24 total test runs). 
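To make that fan-out concrete, here is a rough Python sketch of how `runner-model-sweep` pairs configs with nodes. This is a simplified illustration, not the actual `generate_sweep_configs.py` logic; the config entries and node names mirror the example above.

```python
# Sketch of the runner-model-sweep fan-out: every config whose `runner`
# field matches the requested runner type is paired with every runner node
# registered under that type. Entries are illustrative, not the real configs.
configs = {
    "dsr1-fp8-h200-sglang": {"runner": "h200"},
    "dsr1-fp8-h200-trt": {"runner": "h200"},
    "gptoss-fp4-h200-vllm": {"runner": "h200"},
    "dsr1-fp4-b200-sglang": {"runner": "b200"},
}
runner_nodes = {"h200": [f"h200-cw_{i}" for i in range(8)]}

def runner_model_sweep(runner_type):
    # Select matching configs, then take the cross product with all nodes
    # of that runner type.
    matching = [k for k, v in configs.items() if v["runner"] == runner_type]
    return [(cfg, node) for cfg in matching for node in runner_nodes[runner_type]]

jobs = runner_model_sweep("h200")
print(len(jobs))  # 3 configs x 8 nodes = 24 test runs
```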
+ +This is particularly useful when: +- You've made infrastructure changes to a specific runner type (driver updates, system configuration, Docker setup) +- You've added new runner nodes and want to validate they work with all existing model configurations +- You want to verify that all models remain compatible with a specific GPU type after system updates + +**Key difference from Scenario 2**: +- `runner-sweep`: Fix a **model**, sweep across runners → "Does this model work on all its runners?" +- `runner-model-sweep`: Fix a **runner type**, sweep across models → "Do all models work on this runner type?" + +## Additional Use Cases with `full-sweep` + +The `full-sweep` command supports multiple filters that can be combined for targeted testing: + +**Test all gptoss configurations on B200 with 1k1k sequence lengths:** +``` +full-sweep --model-prefix gptoss --runner-type b200 --seq-lens 1k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +**Test all fp8 precision configs across all runners for 1k8k workloads:** +``` +full-sweep --precision fp8 --seq-lens 1k8k --config-files .github/configs/nvidia-master.yaml .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml +``` + +**Test all TRT configs on H200 runners:** +``` +full-sweep --framework trt --runner-type h200 h200-trt --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +**Quick smoke test of all configs (highest TP, lowest concurrency only):** +``` +full-sweep --test-mode --config-files .github/configs/nvidia-master.yaml .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml +``` + +**Test specific model on specific hardware with specific sequence lengths:** +``` +full-sweep --model-prefix dsr1 --runner-type b200 --precision fp4 --framework sglang --seq-lens 1k1k 8k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + 
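The filters above compose by intersection: each flag further prunes the set of candidate configs, and omitted flags match everything. A simplified sketch of that selection logic (illustrative only; the real filtering lives in `generate_sweep_configs.py`, and the config entries here are abridged):

```python
# Sketch of full-sweep filter composition: every supplied filter is ANDed
# with the others. Field names mirror the master-config schema; the entries
# below are illustrative, not the full config files.
configs = {
    "dsr1-fp4-b200-sglang": {"model-prefix": "dsr1", "precision": "fp4",
                             "runner": "b200", "framework": "sglang"},
    "dsr1-fp8-b200-trt": {"model-prefix": "dsr1", "precision": "fp8",
                          "runner": "b200-trt", "framework": "trt"},
    "gptoss-fp4-b200-vllm": {"model-prefix": "gptoss", "precision": "fp4",
                             "runner": "b200", "framework": "vllm"},
}

def full_sweep(model_prefix=None, precision=None, runner_types=None, framework=None):
    selected = {}
    for key, cfg in configs.items():
        if model_prefix and cfg["model-prefix"] != model_prefix:
            continue
        if precision and cfg["precision"] != precision:
            continue
        if runner_types and cfg["runner"] not in runner_types:
            continue
        if framework and cfg["framework"] != framework:
            continue
        selected[key] = cfg
    return selected

print(sorted(full_sweep(model_prefix="gptoss", runner_types=["b200"])))
# ['gptoss-fp4-b200-vllm']
```

With no filters, every config in the loaded files is selected, which is what the smoke-test invocation above relies on.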
+ +## Custom One-off Tests + +**Scenario 4**: I want to run a quick test with a custom image, model, or configuration that isn't in the config files yet. + +Use the `custom` command to specify all parameters manually: +``` +custom --runner-label b200-nb_0 --image vllm/vllm-openai:v0.11.0 --model meta-llama/Llama-3.1-70B --framework vllm --precision fp8 --exp-name llama70b_test --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml +``` + +This runs a single 1k1k test job with your custom parameters on the specified runner node. Useful for: +- Testing new images before adding them to config files +- Quick validation of new models +- Experimenting with different frameworks or precisions diff --git a/.github/configs/CONFIGS.md b/.github/configs/CONFIGS.md new file mode 100644 index 000000000..218e17821 --- /dev/null +++ b/.github/configs/CONFIGS.md @@ -0,0 +1,52 @@ +# Configs + +The config files in this directory are meant to be a "source of truth" for which benchmark configurations can and should be run. As such, they must follow the precise format described below. + +## Master Configs (AMD, NVIDIA, etc.) + +```yaml +entry-name: + image: string + model: string + model-prefix: string + runner: string + precision: string + framework: string + seq-len-configs: + - isl: int + osl: int + search-space: + - { tp: int, conc-start: int, conc-end: int } + # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention) + - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int } + - ... + - ... +``` +Note: while not required, `entry-name` typically takes the format `<model-prefix>-<precision>-<runner>-<framework>`. + +The list below describes each field: + +- `image`: The image used to serve the benchmark, e.g., `vllm/vllm-openai:v0.10.2` +- `model`: The model to serve, e.g., `openai/gpt-oss-120b` +- `model-prefix`: The canonical InferenceMAX model prefix, e.g., `dsr1` for DeepSeek, `gptoss` for gptoss-120b, etc. 
This value is used to determine which script in `benchmarks/` should be used to launch the benchmark. +- `runner`: The runner on which to run the benchmark. This must be a valid runner (key or value) from `runners.yaml`. +- `precision`: The precision at which to run the benchmark. Again, this is used to find which script to run in `benchmarks/`. +- `framework`: The framework (serving runtime) used to serve the benchmark, e.g., `vllm`, `sglang`, `trt`. +- `seq-len-configs`: A list of sequence-length combinations to benchmark. Each entry must have the following fields: + - `isl`: An integer representing the input sequence length, e.g., `1024` + - `osl`: An integer representing the output sequence length, e.g., `8192` + - `search-space`: A list of configurations to run with the respective `isl` and `osl`. Each entry must be a dict with the following fields: + - `tp`: An integer representing the tensor parallelism level at which the configuration will be served. + - `conc-start`: An integer representing the starting level of concurrency, e.g., `4` + - `conc-end`: An integer representing the ending level of concurrency (inclusive), e.g., `128` + - Note: the concurrency doubles at each step between `conc-start` and `conc-end`, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run. + - (Optional) `ep`: An integer representing the expert parallelism level at which the configuration will be served. Defaults to 1 (no expert parallelism) when not specified. + - (Optional) `dp-attn`: A boolean indicating whether to activate data parallel attention for the configuration. Defaults to false when not specified. + +Notes: +- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run. +- Setting the fields above, particularly `ep` and `dp-attn`, only guarantees that the respective values will be passed as environment variables to the benchmark scripts! 
Actually using those environment variables is an implementation detail at the level of the benchmark Bash script. + +## Runners + +The `runners.yaml` config represents the available runners in the repository. The keys are the runner *types* (i.e., the GPUs as well as some specific combinations like `h200-trt`) whereas the value is a list of *runner nodes*. This config is used to verify the master configs. diff --git a/.github/configs/amd-master.yaml b/.github/configs/amd-master.yaml new file mode 100644 index 000000000..82251c8be --- /dev/null +++ b/.github/configs/amd-master.yaml @@ -0,0 +1,171 @@ +dsr1-fp4-mi355x-sglang: + image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915 + model: amd/DeepSeek-R1-0528-MXFP4-Preview + model-prefix: dsr1 + runner: mi355x + precision: fp4 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + +dsr1-fp8-mi300x-sglang: + image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: mi300x + precision: fp8 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + +dsr1-fp8-mi325x-sglang: + image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: mi325x + precision: fp8 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + 
- { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + +dsr1-fp8-mi355x-sglang: + image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: mi355x + precision: fp8 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + +gptoss-fp4-mi300x-vllm: + image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: mi300x + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 64, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 16 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 64, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 16 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 16 } + +gptoss-fp4-mi325x-vllm: + image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: mi325x + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 64, 
conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 64, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 8 } + - { tp: 4, conc-start: 4, conc-end: 8 } + - { tp: 8, conc-start: 4, conc-end: 16 } + +gptoss-fp4-mi355x-vllm: + image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: mi355x + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 8 } + - { tp: 8, conc-start: 4, conc-end: 16 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 8 } + - { tp: 8, conc-start: 4, conc-end: 16 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 4 } + - { tp: 8, conc-start: 4, conc-end: 8 } diff --git a/.github/configs/nvidia-master.yaml b/.github/configs/nvidia-master.yaml new file mode 100644 index 000000000..e9af1ce19 --- /dev/null +++ b/.github/configs/nvidia-master.yaml @@ -0,0 +1,316 @@ +dsr1-fp4-b200-sglang: + image: lmsysorg/sglang:v0.5.3rc1-cu129-b200 + model: nvidia/DeepSeek-R1-0528-FP4-V2 + model-prefix: dsr1 + runner: b200 + precision: fp4 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 4, conc-start: 4, conc-end: 128 } + - { tp: 8, conc-start: 4, conc-end: 128 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 4, conc-start: 4, conc-end: 128 } + - { tp: 8, conc-start: 4, conc-end: 128 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 4, conc-start: 4, conc-end: 128 } + - { tp: 8, conc-start: 4, conc-end: 16 } + +dsr1-fp4-b200-trt: + image: nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2 + model: 
nvidia/DeepSeek-R1-0528-FP4-V2 + model-prefix: dsr1 + runner: b200-trt + precision: fp4 + framework: trt + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + # If TP=4, + # If CONC > 32, then EP=4 + # If CONC >= 256, DP_ATTN=true + - { tp: 4, conc-start: 4, conc-end: 32 } + - { tp: 4, ep: 4, conc-start: 64, conc-end: 128 } + - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 256 } + # If TP=8, + # If CONC > 8, then EP=8 + # If CONC >= 256, DP_ATTN=true + - { tp: 8, conc-start: 4, conc-end: 8 } + - { tp: 8, ep: 8, conc-start: 16, conc-end: 128 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 256, conc-end: 256 } + - isl: 1024 + osl: 8192 + search-space: + # If TP=4, + # If CONC > 32, then EP=4 + # If CONC >= 256, DP_ATTN=true + - { tp: 4, conc-start: 4, conc-end: 32 } + - { tp: 4, ep: 4, conc-start: 64, conc-end: 128 } + - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 256 } + # If TP=8, + # If CONC > 16, then EP=8 + # If CONC >= 256, DP_ATTN=true + - { tp: 8, conc-start: 4, conc-end: 16 } + - { tp: 8, ep: 8, conc-start: 32, conc-end: 128 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 256, conc-end: 256 } + - isl: 8192 + osl: 1024 + search-space: + # If TP=4, + # If CONC > 32, then EP=4 and DP_ATTN=true + - { tp: 4, ep: 4, conc-start: 4, conc-end: 32 } + - { tp: 4, ep: 4, dp-attn: true, conc-start: 64, conc-end: 256 } + # If TP=8, + # If CONC > 32, then EP=8 and DP_ATTN=true + - { tp: 8, conc-start: 4, conc-end: 32 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 64, conc-end: 256 } + +dsr1-fp8-b200-sglang: + image: lmsysorg/sglang:v0.5.3rc1-cu129-b200 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: b200 + precision: fp8 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + 
+dsr1-fp8-b200-trt: + image: nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: b200-trt + precision: fp8 + framework: trt + seq-len-configs: + # For all sequence lengths, EP=TP + - isl: 1024 + osl: 1024 + search-space: + # If CONC > 32, then DP_ATTN=true + - { tp: 8, ep: 8, conc-start: 4, conc-end: 32 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 64, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + # If CONC > 64, then DP_ATTN=true + - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + # If CONC > 64, then DP_ATTN=true + - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 } + +dsr1-fp8-h200-sglang: + image: lmsysorg/sglang:v0.5.2rc2-cu126 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: h200 + precision: fp8 + framework: sglang + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 8, conc-start: 4, conc-end: 64 } + +dsr1-fp8-h200-trt: + image: nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2 + model: deepseek-ai/DeepSeek-R1-0528 + model-prefix: dsr1 + runner: h200-trt + precision: fp8 + framework: trt + # For all sequence lengths, EP=TP + seq-len-configs: + - isl: 1024 + osl: 1024 + # If CONC > 64, then DP_ATTN=true + search-space: + - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + # If CONC > 64, then DP_ATTN=true + search-space: + - { tp: 8, ep: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + # If CONC > 32, then DP_ATTN=true + search-space: + - { tp: 8, ep: 8, conc-start: 4, conc-end: 32 } + - { tp: 8, ep: 8, dp-attn: true, conc-start: 64, conc-end: 64 } + +gptoss-fp4-b200-trt: + image: nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: 
b200-nvs + precision: fp4 + framework: trt + # For all sequence lengths, if CONC >= 256, then EP=TP and DP_ATTN=true + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 64, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 8 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 64, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 64, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 8 } + +gptoss-fp4-b200-vllm: + image: vllm/vllm-openai:v0.10.2 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: b200 + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 8 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 8 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 4 } + +gptoss-fp4-h100-vllm: + image: vllm/vllm-openai:v0.10.2 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: h100 + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { 
tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 16 } + +gptoss-fp4-h200-trt: + image: nvcr.io#nvidia/tensorrt-llm/release:gpt-oss-dev + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: h200-trt + precision: fp4 + framework: trt + # For all sequence lengths, EP=TP, DP_ATTENTION=false + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, ep: 1, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 2, ep: 2, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 4, ep: 4, dp-attn: false, conc-start: 4, conc-end: 32 } + - { tp: 8, ep: 8, dp-attn: false, conc-start: 4, conc-end: 8 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, ep: 1, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 2, ep: 2, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 4, ep: 4, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 8, ep: 8, dp-attn: false, conc-start: 4, conc-end: 8 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, ep: 1, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 2, ep: 2, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 4, ep: 4, dp-attn: false, conc-start: 4, conc-end: 64 } + - { tp: 8, ep: 8, dp-attn: false, conc-start: 4, conc-end: 8 } + +gptoss-fp4-h200-vllm: + image: vllm/vllm-openai:v0.10.2 + model: openai/gpt-oss-120b + model-prefix: gptoss + runner: h200 + precision: fp4 + framework: vllm + seq-len-configs: + - isl: 1024 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 4 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 1024 + osl: 8192 + search-space: + - { tp: 1, conc-start: 4, conc-end: 4 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 
4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 64 } + - isl: 8192 + osl: 1024 + search-space: + - { tp: 1, conc-start: 4, conc-end: 64 } + - { tp: 2, conc-start: 4, conc-end: 64 } + - { tp: 4, conc-start: 4, conc-end: 64 } + - { tp: 8, conc-start: 4, conc-end: 32 } diff --git a/.github/configs/runners.yaml b/.github/configs/runners.yaml new file mode 100644 index 000000000..cdd865561 --- /dev/null +++ b/.github/configs/runners.yaml @@ -0,0 +1,61 @@ +h100: +- 'h100-cr_0' +- 'h100-cr_1' +- 'h100-cw_0' +- 'h100-cw_1' +h200: +- 'h200-cw_0' +- 'h200-cw_1' +- 'h200-nb_0' +- 'h200-nb_1' +- 'h200-nb_2' +- 'h200-nb_3' +- 'h200-nv_0' +- 'h200-nv_1' +- 'h200-nv_2' +- 'h200-nv_3' +h200-trt: +- 'h200-cw_0' +- 'h200-cw_1' +- 'h200-nb_0' +- 'h200-nb_1' +- 'h200-nb_2' +- 'h200-nb_3' +- 'h200-nv_0' +- 'h200-nv_1' +- 'h200-nv_2' +- 'h200-nv_3' +b200-trt: +- 'b200-nv_0' +- 'b200-nv_1' +b200-nvs: +- 'b200-nv_0' +- 'b200-nv_1' +b200: +- 'b200-nb_0' +- 'b200-nb_1' +- 'b200-nvd_0' +- 'b200-nvd_1' +- 'b200-nvd_2' +- 'b200-nvd_3' +mi300x: +- 'mi300x-amd_0' +- 'mi300x-amd_1' +- 'mi300x-amd_2' +- 'mi300x-amd_3' +- 'mi300x-amd_4' +- 'mi300x-cr_0' +- 'mi300x-oci_0' +mi325x: +- 'mi325x-amd_0' +- 'mi325x-tw_0' +- 'mi325x-tw_1' +- 'mi325x-tw_2' +- 'mi325x-tw_3' +mi355x: +- 'mi355x-amd_0' +- 'mi355x-amd_1' +- 'mi355x-amd_2' +- 'mi355x-amd_3' +gb200: +- gb200-nv_0 diff --git a/.github/workflows/benchmark-multinode-tmpl.yml b/.github/workflows/benchmark-multinode-tmpl.yml index 07f5b876d..4b079f578 100644 --- a/.github/workflows/benchmark-multinode-tmpl.yml +++ b/.github/workflows/benchmark-multinode-tmpl.yml @@ -31,8 +31,9 @@ on: required: true type: string random-range-ratio: - required: true + required: false type: string + default: '0.8' mtp-mode: required: true type: string @@ -85,6 +86,8 @@ jobs: fi - name: Process results + env: + RUNNER_TYPE: ${{ inputs.runner }} run: | # Process each result file for result_file in ${RESULT_FILENAME}_*.json; do @@ -93,7 +96,7 @@ jobs: 
# Extract GPU count from filename for tp_size calculation gpus=$(echo "$result_file" | sed "s/.*_gpus\([0-9]*\)\.json/\1/") if [ -n "$gpus" ]; then - python3 utils/process_result.py ${{ inputs.runner }} $gpus ${result_file%.json} $FRAMEWORK $PRECISION $MTP_MODE + TP=$gpus RESULT_FILENAME=${result_file%.json} EP_SIZE=1 DP_ATTENTION=false python3 utils/process_result.py fi fi done diff --git a/.github/workflows/benchmark-tmpl.yml b/.github/workflows/benchmark-tmpl.yml index 313087946..8d041bc73 100644 --- a/.github/workflows/benchmark-tmpl.yml +++ b/.github/workflows/benchmark-tmpl.yml @@ -11,10 +11,10 @@ on: model: required: true type: string - framework: + precision: required: true type: string - precision: + framework: required: true type: string exp-name: @@ -26,18 +26,25 @@ on: osl: required: true type: string - max-model-len: + tp: required: true type: string - random-range-ratio: + ep: + required: true + type: string + dp-attn: + required: true + type: boolean + max-model-len: required: true type: string - tp-list: + conc: required: true type: string - conc-list: + random-range-ratio: + required: false type: string - default: '[4, 8, 16, 32, 64]' + default: '0.8' env: HF_TOKEN: ${{ secrets.HF_TOKEN }} @@ -51,23 +58,16 @@ env: IMAGE: ${{ inputs.image }} FRAMEWORK: ${{ inputs.framework }} PRECISION: ${{ inputs.precision }} + TP: ${{ inputs.tp }} + EP_SIZE: ${{ inputs.ep }} + DP_ATTENTION: ${{ inputs.dp-attn }} + CONC: ${{ inputs.conc }} jobs: benchmark: runs-on: ${{ inputs.runner }} timeout-minutes: 180 - - strategy: - fail-fast: false - matrix: - tp: ${{ fromJson(inputs.tp-list) }} - conc: ${{ fromJson(inputs.conc-list) }} - name: '${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.precision }} tp${{ matrix.tp }} conc${{ matrix.conc }}' - - env: - TP: ${{ matrix.tp }} - CONC: ${{ matrix.conc }} - + name: '${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.precision }} tp=${{ inputs.tp }} ep=${{ inputs.ep }} dpa=${{ inputs.dp-attn }} conc=${{ inputs.conc 
}}' steps: - name: Resource cleanup run: | @@ -127,7 +127,7 @@ jobs: - name: Launch job script env: RUNNER_NAME: ${{ runner.name }} - RESULT_FILENAME: ${{ env.EXP_NAME }}_${{ env.PRECISION }}_${{ env.FRAMEWORK }}_tp${{ env.TP }}_conc${{ env.CONC }}_${{ runner.name }} + RESULT_FILENAME: ${{ env.EXP_NAME }}_${{ env.PRECISION }}_${{ env.FRAMEWORK }}_tp${{ env.TP }}_ep${{ env.EP_SIZE }}_dpa_${{ env.DP_ATTENTION }}_conc${{ env.CONC }}_${{ runner.name }} run: | bash ./runners/launch_${RUNNER_NAME%%_*}.sh if [ -f "$RESULT_FILENAME.json" ]; then @@ -138,11 +138,12 @@ jobs: fi - name: Process result + env: + RUNNER_TYPE: ${{ inputs.runner }} run: | - python3 utils/process_result.py ${{ inputs.runner }} $TP $RESULT_FILENAME $FRAMEWORK $PRECISION - + python3 utils/process_result.py - name: Upload result uses: actions/upload-artifact@v4 with: name: ${{ env.RESULT_FILENAME }} - path: agg_${{ env.RESULT_FILENAME }}.json + path: agg_${{ env.RESULT_FILENAME }}.json \ No newline at end of file diff --git a/.github/workflows/collect-results.yml b/.github/workflows/collect-results.yml index 14c499c0d..c1799117e 100644 --- a/.github/workflows/collect-results.yml +++ b/.github/workflows/collect-results.yml @@ -40,7 +40,6 @@ jobs: run: | pip install -q matplotlib python3 utils/plot_perf.py results/ ${{ inputs.exp-name }} - - name: Upload performance graphs uses: actions/upload-artifact@v4 with: diff --git a/.github/workflows/dsr1-tmpl.yml b/.github/workflows/dsr1-tmpl.yml deleted file mode 100644 index 3a48710f2..000000000 --- a/.github/workflows/dsr1-tmpl.yml +++ /dev/null @@ -1,265 +0,0 @@ -name: Template - DeepSeek R1 - -on: - workflow_call: - inputs: - exp-name: - required: true - type: string - isl: - required: true - type: string - osl: - required: true - type: string - max-model-len: - required: true - type: string - random-range-ratio: - required: true - type: string - - use_h200: - type: boolean - required: true - use_b200: - type: boolean - required: true - use_mi300x: - type: 
boolean - required: true - use_mi325x: - type: boolean - required: true - use_mi355x: - type: boolean - required: true - use_gb200: - type: boolean - required: false - default: false - -jobs: - bmk-h200-fp8: - if: ${{ inputs.use_h200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: h200 - image: 'lmsysorg/sglang:v0.5.2rc2-cu126' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-h200-trt-fp8: - if: ${{ inputs.use_h200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: h200-trt - image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'trt' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-b200-fp8: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: b200 - image: 'lmsysorg/sglang:v0.5.3rc1-cu129-b200' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-b200-trt-fp8: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: b200-trt - image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'trt' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ 
inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-mi300x-fp8: - if: ${{ inputs.use_mi300x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: mi300x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-mi325x-fp8: - if: ${{ inputs.use_mi325x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: mi325x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-mi355x-fp8: - if: ${{ inputs.use_mi355x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: mi355x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[8]' - - bmk-b200-fp4: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: b200 - image: 'lmsysorg/sglang:v0.5.3rc1-cu129-b200' - model: 'nvidia/DeepSeek-R1-0528-FP4' - framework: 'sglang' - precision: 'fp4' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len 
}} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[4,8]' - conc-list: '[4, 8, 16, 32, 64, 128]' # Custom concurrency values for this job - - bmk-b200-trt-fp4: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: b200-trt - image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2' - model: 'nvidia/DeepSeek-R1-0528-FP4-v2' - framework: 'trt' - precision: fp4 - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - tp-list: '[4, 8]' - conc-list: '[4, 8, 16, 32, 64, 128, 256]' # DPA4EP4 is already 30 tok/s/user and DPA8EP8 is already 35tok/s/user. 512 conc would be too much so we skipping it - - bmk-mi355x-fp4: - if: ${{ inputs.use_mi355x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: mi355x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915' - framework: 'sglang' - precision: 'fp4' - model: 'amd/DeepSeek-R1-0528-MXFP4-Preview' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - # These tensor parallelism settings are not necessary as they cannot fall on the Pareto frontier with this particular container - we remove them to save CI time. 
- tp-list: ${{ inputs.isl == 1024 && inputs.osl == 1024 && '[4, 8]' || '[8]' }} - - bmk-gb200-fp4-multinode-mtp-off: - if: ${{ inputs.use_gb200 && !(inputs.isl == '1024' && inputs.osl == '8192') }} - uses: ./.github/workflows/benchmark-multinode-tmpl.yml - secrets: inherit - with: - runner: gb200 - image: 'nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3' - model: 'deepseek-r1-fp4' - framework: 'dynamo-trtllm' - precision: 'fp4' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - mtp-mode: 'off' - - bmk-gb200-fp4-multinode-mtp-on: - if: ${{ inputs.use_gb200 && !(inputs.isl == '1024' && inputs.osl == '8192') }} - uses: ./.github/workflows/benchmark-multinode-tmpl.yml - secrets: inherit - with: - runner: gb200 - image: 'nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3' - model: 'deepseek-r1-fp4' - framework: 'dynamo-trtllm' - precision: 'fp4' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - mtp-mode: 'on' - - bmk-gb200-fp8-multinode: - if: ${{ inputs.use_gb200 && !(inputs.isl == '1024' && inputs.osl == '8192') }} - uses: ./.github/workflows/benchmark-multinode-tmpl.yml - secrets: inherit - with: - runner: gb200 - image: 'nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1' - model: 'deepseek-ai/DeepSeek-R1-0528' - framework: 'dynamo-sglang' - precision: 'fp8' - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - mtp-mode: 'off' diff --git a/.github/workflows/e2e-tests.yml b/.github/workflows/e2e-tests.yml new file mode 100644 index 000000000..1d13b3a87 --- /dev/null +++ b/.github/workflows/e2e-tests.yml @@ -0,0 +1,82 @@ +name: End-to-End Tests 
+run-name: e2e Test - ${{ github.event.inputs.generate-cli-command }} + +on: + workflow_dispatch: + inputs: + generate-cli-command: + description: "Command passed to generate matrix script" + required: true + type: string + +jobs: + get-jobs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-jobs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-jobs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py ${{ inputs.generate-cli-command }}) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + test-sweep: + needs: get-jobs + uses: ./.github/workflows/benchmark-tmpl.yml + name: ${{ inputs.generate-cli-command }} + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-jobs.outputs.search-space-config) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + calc-success-rate: + needs: test-sweep + if: ${{ always() }} + runs-on: ubuntu-latest + + env: + RESULTS_DIR: "results/" + STATS_FILENAME: "run_stats" + GITHUB_TOKEN: ${{ secrets.REPO_PAT }} + + steps: + - uses: actions/checkout@v4 + with: + token: ${{ secrets.REPO_PAT }} + fetch-depth: 0 + + - name: Download results artifacts + uses: actions/download-artifact@v4 + with: + path: ${{ env.RESULTS_DIR }} + pattern: results_* + + - name: Install python dependencies + run: pip install PyGithub + + - name: Calculate success rate + run: python3 utils/calc_success_rate.py $STATS_FILENAME + + -
uses: actions/upload-artifact@v4 + with: + name: "run-stats" + path: ${{ env.STATS_FILENAME }}.json diff --git a/.github/workflows/full-sweep-1k1k-scheduler.yml b/.github/workflows/full-sweep-1k1k-scheduler.yml index 601c760b3..6e2128218 100644 --- a/.github/workflows/full-sweep-1k1k-scheduler.yml +++ b/.github/workflows/full-sweep-1k1k-scheduler.yml @@ -1,59 +1,144 @@ -name: Full Sweep Scheduler - 1k1k - -concurrency: - group: benchmark-lock-1k1k - cancel-in-progress: true +name: "Full Sweep Scheduler - 1k1k" on: - workflow_dispatch: - schedule: - - cron: '0 23 * * *' + workflow_dispatch: + schedule: + - cron: "0 3 * * *" jobs: - mega-run: - uses: ./.github/workflows/full-sweep-tmpl.yml - secrets: inherit - with: - run_1k1k: true - run_8k1k: false - run_1k8k: false - use_h100: true - use_h200: true - use_b200: true - use_mi300x: true - use_mi325x: true - use_mi355x: true - use_gb200: true - - calc-success-rate: - needs: mega-run - if: ${{ always() }} - runs-on: ubuntu-latest - - env: - RESULTS_DIR: "results/" - STATS_FILENAME: "run_stats" - GITHUB_TOKEN: ${{ secrets.REPO_PAT }} - - steps: - - uses: actions/checkout@v3 + get-dsr1-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-dsr1-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-dsr1-configs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix dsr1) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + get-gptoss-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-gptoss-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-gptoss-configs + run: | + pip install pydantic + 
CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix gptoss) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + benchmark-dsr1: + needs: get-dsr1-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: dsr1 1k1k + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-dsr1-configs.outputs.search-space-config) }} + secrets: inherit with: - token: ${{ secrets.REPO_PAT }} - fetch-depth: 0 + exp-name: "dsr1_1k1k" + isl: 1024 + osl: 1024 + max-model-len: 2048 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Download results artifacts - uses: actions/download-artifact@v4 + benchmark-gptoss: + needs: get-gptoss-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: gptoss 1k1k + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-gptoss-configs.outputs.search-space-config) }} + secrets: inherit with: - path: ${{ env.RESULTS_DIR }} - pattern: results_* + exp-name: "gptoss_1k1k" + isl: 1024 + osl: 1024 + max-model-len: 2048 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Install python dependencies - run: pip install PyGithub + # This is a workaround until we can integrate GB200 into master configs. 
+ benchmark-gb200: + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 1k1k sweep + strategy: + fail-fast: false + matrix: + config: + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "off", + } + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "on", + } + - { + "image": "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1", + "model": "deepseek-ai/DeepSeek-R1-0528", + "model-prefix": "dsr1", + "precision": "fp8", + "framework": "dynamo-sglang", + "mtp": "off", + } + secrets: inherit + with: + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_1k1k + isl: 1024 + osl: 1024 + max-model-len: 2048 + mtp-mode: ${{ matrix.config.mtp }} - - name: Calculate success rate - run: python3 utils/calc_success_rate.py $STATS_FILENAME + collect-dsr1-results: + needs: [benchmark-dsr1, benchmark-gb200] + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_1k1k" - - uses: actions/upload-artifact@v4 + collect-gptoss-results: + needs: benchmark-gptoss + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit with: - name: "run-stats" - path: ${{ env.STATS_FILENAME }}.json + exp-name: "gptoss_1k1k" diff --git a/.github/workflows/full-sweep-1k8k-scheduler.yml b/.github/workflows/full-sweep-1k8k-scheduler.yml index 967935335..b8437969e 100644 --- a/.github/workflows/full-sweep-1k8k-scheduler.yml +++ b/.github/workflows/full-sweep-1k8k-scheduler.yml @@ -1,59 +1,144 @@ -name: Full Sweep Scheduler - 1k8k - -concurrency: - 
group: benchmark-lock-1k8k - cancel-in-progress: true +name: "Full Sweep Scheduler - 1k8k" on: - workflow_dispatch: - schedule: - - cron: '0 23 * * *' + workflow_dispatch: + schedule: + - cron: "0 3 * * *" jobs: - mega-run: - uses: ./.github/workflows/full-sweep-tmpl.yml - secrets: inherit - with: - run_1k1k: false - run_8k1k: false - run_1k8k: true - use_h100: true - use_h200: true - use_b200: true - use_mi300x: true - use_mi325x: true - use_mi355x: true - use_gb200: true - - calc-success-rate: - needs: mega-run - if: ${{ always() }} - runs-on: ubuntu-latest - - env: - RESULTS_DIR: "results/" - STATS_FILENAME: "run_stats" - GITHUB_TOKEN: ${{ secrets.REPO_PAT }} - - steps: - - uses: actions/checkout@v3 + get-dsr1-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-dsr1-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-dsr1-configs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix dsr1) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + get-gptoss-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-gptoss-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-gptoss-configs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix gptoss) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + benchmark-dsr1: + needs: get-dsr1-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: dsr1 1k8k + strategy: + 
fail-fast: false + matrix: + config: ${{ fromJson(needs.get-dsr1-configs.outputs.search-space-config) }} + secrets: inherit with: - token: ${{ secrets.REPO_PAT }} - fetch-depth: 0 + exp-name: "dsr1_1k8k" + isl: 1024 + osl: 8192 + max-model-len: 9216 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Download results artifacts - uses: actions/download-artifact@v4 + benchmark-gptoss: + needs: get-gptoss-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: gptoss 1k8k + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-gptoss-configs.outputs.search-space-config) }} + secrets: inherit with: - path: ${{ env.RESULTS_DIR }} - pattern: results_* + exp-name: "gptoss_1k8k" + isl: 1024 + osl: 8192 + max-model-len: 9216 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Install python dependencies - run: pip install PyGithub + # This is a workaround until we can integrate GB200 into master configs.
+ benchmark-gb200: + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 1k8k sweep + strategy: + fail-fast: false + matrix: + config: + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "off", + } + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "on", + } + - { + "image": "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1", + "model": "deepseek-ai/DeepSeek-R1-0528", + "model-prefix": "dsr1", + "precision": "fp8", + "framework": "dynamo-sglang", + "mtp": "off", + } + secrets: inherit + with: + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_1k8k + isl: 1024 + osl: 8192 + max-model-len: 9216 + mtp-mode: ${{ matrix.config.mtp }} - - name: Calculate success rate - run: python3 utils/calc_success_rate.py $STATS_FILENAME + collect-dsr1-results: + needs: [benchmark-dsr1, benchmark-gb200] + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_1k8k" - - uses: actions/upload-artifact@v4 + collect-gptoss-results: + needs: benchmark-gptoss + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit with: - name: "run-stats" - path: ${{ env.STATS_FILENAME }}.json + exp-name: "gptoss_1k8k" diff --git a/.github/workflows/full-sweep-8k1k-scheduler.yml b/.github/workflows/full-sweep-8k1k-scheduler.yml index 791d9e017..bc3cd07dc 100644 --- a/.github/workflows/full-sweep-8k1k-scheduler.yml +++ b/.github/workflows/full-sweep-8k1k-scheduler.yml @@ -1,59 +1,144 @@ -name: Full Sweep Scheduler - 8k1k - -concurrency: - 
group: benchmark-lock-8k1k - cancel-in-progress: true +name: "Full Sweep Scheduler - 8k1k" on: - workflow_dispatch: - schedule: - - cron: '0 23 * * *' + workflow_dispatch: + schedule: + - cron: "0 3 * * *" jobs: - mega-run: - uses: ./.github/workflows/full-sweep-tmpl.yml - secrets: inherit - with: - run_1k1k: false - run_8k1k: true - run_1k8k: false - use_h100: true - use_h200: true - use_b200: true - use_mi300x: true - use_mi325x: true - use_mi355x: true - use_gb200: true - - calc-success-rate: - needs: mega-run - if: ${{ always() }} - runs-on: ubuntu-latest - - env: - RESULTS_DIR: "results/" - STATS_FILENAME: "run_stats" - GITHUB_TOKEN: ${{ secrets.REPO_PAT }} - - steps: - - uses: actions/checkout@v3 + get-dsr1-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-dsr1-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-dsr1-configs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix dsr1) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + get-gptoss-configs: + runs-on: ubuntu-latest + outputs: + search-space-config: ${{ steps.get-gptoss-configs.outputs.search-space-config }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - id: get-gptoss-configs + run: | + pip install pydantic + CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix gptoss) + echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT + + benchmark-dsr1: + needs: get-dsr1-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: dsr1 8k1k + strategy: + 
fail-fast: false + matrix: + config: ${{ fromJson(needs.get-dsr1-configs.outputs.search-space-config) }} + secrets: inherit with: - token: ${{ secrets.REPO_PAT }} - fetch-depth: 0 + exp-name: "dsr1_8k1k" + isl: 8192 + osl: 1024 + max-model-len: 9216 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Download results artifacts - uses: actions/download-artifact@v4 + benchmark-gptoss: + needs: get-gptoss-configs + uses: ./.github/workflows/benchmark-tmpl.yml + name: gptoss 8k1k + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-gptoss-configs.outputs.search-space-config) }} + secrets: inherit with: - path: ${{ env.RESULTS_DIR }} - pattern: results_* + exp-name: "gptoss_8k1k" + isl: 8192 + osl: 1024 + max-model-len: 9216 + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - name: Install python dependencies - run: pip install PyGithub + # This is a workaround until we can integrate GB200 into master configs.
+ benchmark-gb200: + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 8k1k sweep + strategy: + fail-fast: false + matrix: + config: + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "off", + } + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "on", + } + - { + "image": "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1", + "model": "deepseek-ai/DeepSeek-R1-0528", + "model-prefix": "dsr1", + "precision": "fp8", + "framework": "dynamo-sglang", + "mtp": "off", + } + secrets: inherit + with: + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_8k1k + isl: 8192 + osl: 1024 + max-model-len: 9216 + mtp-mode: ${{ matrix.config.mtp }} - - name: Calculate success rate - run: python3 utils/calc_success_rate.py $STATS_FILENAME + collect-dsr1-results: + needs: [benchmark-dsr1, benchmark-gb200] + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_8k1k" - - uses: actions/upload-artifact@v4 + collect-gptoss-results: + needs: benchmark-gptoss + if: ${{ always() }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit with: - name: "run-stats" - path: ${{ env.STATS_FILENAME }}.json + exp-name: "gptoss_8k1k" diff --git a/.github/workflows/full-sweep-test.yml b/.github/workflows/full-sweep-test.yml index b134e407c..3657971ac 100644 --- a/.github/workflows/full-sweep-test.yml +++ b/.github/workflows/full-sweep-test.yml @@ -1,89 +1,439 @@ name: Test - Full Sweep -concurrency: - group: benchmark-lock - cancel-in-progress: false - 
on: - workflow_dispatch: - inputs: - run_1k1k: - type: boolean - required: false - run_8k1k: - type: boolean - required: false - run_1k8k: - type: boolean - required: false - - use_h100: - type: boolean - required: false - use_h200: - type: boolean - required: false - use_b200: - type: boolean - required: false - use_mi300x: - type: boolean - required: false - use_mi325x: - type: boolean - required: false - use_mi355x: - type: boolean - required: false - use_gb200: - type: boolean - required: false + workflow_dispatch: + inputs: + run_1k1k: + type: boolean + required: false + run_8k1k: + type: boolean + required: false + run_1k8k: + type: boolean + required: false + use_h100: + type: boolean + required: false + use_h200: + type: boolean + required: false + use_b200: + type: boolean + required: false + use_mi300x: + type: boolean + required: false + use_mi325x: + type: boolean + required: false + use_mi355x: + type: boolean + required: false + use_gb200: + type: boolean + required: false jobs: - mega-test-run: - uses: ./.github/workflows/full-sweep-tmpl.yml - secrets: inherit - with: - run_1k1k: ${{ inputs.run_1k1k }} - run_8k1k: ${{ inputs.run_8k1k }} - run_1k8k: ${{ inputs.run_1k8k }} - use_h100: ${{ inputs.use_h100 }} - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - use_gb200: ${{ inputs.use_gb200 }} - - calc-success-rate: - needs: mega-test-run - if: ${{ always() }} - runs-on: ubuntu-latest - - env: - RESULTS_DIR: "results/" - STATS_FILENAME: "run_stats" - GITHUB_TOKEN: ${{ secrets.REPO_PAT }} - - steps: - - uses: actions/checkout@v3 + get-configs: + runs-on: ubuntu-latest + outputs: + dsr1-1k1k: ${{ steps.generate-configs.outputs.dsr1-1k1k }} + dsr1-1k8k: ${{ steps.generate-configs.outputs.dsr1-1k8k }} + dsr1-8k1k: ${{ steps.generate-configs.outputs.dsr1-8k1k }} + gptoss-1k1k: ${{ steps.generate-configs.outputs.gptoss-1k1k 
}} + gptoss-1k8k: ${{ steps.generate-configs.outputs.gptoss-1k8k }} + gptoss-8k1k: ${{ steps.generate-configs.outputs.gptoss-8k1k }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + + # This looks complicated, but it is just calling generate_sweep_configs.py conditioned on + # discrete inputs (i.e., run_1k1k, use_h100, etc.) to split the test sweep into discrete jobs + - id: generate-configs + run: | + pip install pydantic + + set -x + # Build runner type filters based on inputs + RUNNER_TYPES="${{ inputs.use_h100 && 'h100' || '' }} ${{ inputs.use_h200 && 'h200 h200-trt' || '' }} ${{ inputs.use_b200 && 'b200 b200-trt' || '' }} ${{ inputs.use_mi300x && 'mi300x' || '' }} ${{ inputs.use_mi325x && 'mi325x' || '' }} ${{ inputs.use_mi355x && 'mi355x' || '' }}" + + # DSR1 doesn't support H100, so exclude it + DSR1_RUNNER_TYPES=$(echo $RUNNER_TYPES | sed 's/\bh100\b//g' | xargs) + + # Generate dsr1 configs (only if we have valid runner types for DSR1) + if [ "${{ inputs.run_1k1k }}" = "true" ] && [ -n "$DSR1_RUNNER_TYPES" ]; then + DSR1_1K1K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix dsr1 --runner-type $DSR1_RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo "dsr1-1k1k=$DSR1_1K1K" >> $GITHUB_OUTPUT + else + echo "dsr1-1k1k=[]" >> $GITHUB_OUTPUT + fi + + if [ "${{ inputs.run_1k8k }}" = "true" ] && [ -n "$DSR1_RUNNER_TYPES" ]; then + DSR1_1K8K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix dsr1 --runner-type $DSR1_RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo
"dsr1-1k8k=$DSR1_1K8K" >> $GITHUB_OUTPUT + else + echo "dsr1-1k8k=[]" >> $GITHUB_OUTPUT + fi + + if [ "${{ inputs.run_8k1k }}" = "true" ] && [ -n "$DSR1_RUNNER_TYPES" ]; then + DSR1_8K1K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix dsr1 --runner-type $DSR1_RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo "dsr1-8k1k=$DSR1_8K1K" >> $GITHUB_OUTPUT + else + echo "dsr1-8k1k=[]" >> $GITHUB_OUTPUT + fi + + # Generate gptoss configs (only if we have runner types selected) + if [ "${{ inputs.run_1k1k }}" = "true" ] && [ -n "$RUNNER_TYPES" ]; then + GPTOSS_1K1K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k1k --model-prefix gptoss --runner-type $RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo "gptoss-1k1k=$GPTOSS_1K1K" >> $GITHUB_OUTPUT + else + echo "gptoss-1k1k=[]" >> $GITHUB_OUTPUT + fi + + if [ "${{ inputs.run_1k8k }}" = "true" ] && [ -n "$RUNNER_TYPES" ]; then + GPTOSS_1K8K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files ${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 1k8k --model-prefix gptoss --runner-type $RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo "gptoss-1k8k=$GPTOSS_1K8K" >> $GITHUB_OUTPUT + else + echo "gptoss-1k8k=[]" >> $GITHUB_OUTPUT + fi + + if [ "${{ inputs.run_8k1k }}" = "true" ] && [ -n "$RUNNER_TYPES" ]; then + GPTOSS_8K1K=$(python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --config-files 
${GITHUB_WORKSPACE}/.github/configs/nvidia-master.yaml ${GITHUB_WORKSPACE}/.github/configs/amd-master.yaml --seq-lens 8k1k --model-prefix gptoss --runner-type $RUNNER_TYPES --runner-config ${GITHUB_WORKSPACE}/.github/configs/runners.yaml) + echo "gptoss-8k1k=$GPTOSS_8K1K" >> $GITHUB_OUTPUT + else + echo "gptoss-8k1k=[]" >> $GITHUB_OUTPUT + fi + + # DSR1 1K1K Benchmarks + benchmark-dsr1-1k1k: + needs: get-configs + if: ${{ needs.get-configs.outputs.dsr1-1k1k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.dsr1-1k1k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + collect-dsr1-1k1k-results: + needs: benchmark-dsr1-1k1k + if: ${{ always() && needs.get-configs.outputs.dsr1-1k1k != '[]' }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_1k1k" + + # GPTOSS 1K1K Benchmarks + benchmark-gptoss-1k1k: + needs: get-configs + if: ${{ needs.get-configs.outputs.gptoss-1k1k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.gptoss-1k1k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + 
precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + collect-gptoss-1k1k-results: + needs: benchmark-gptoss-1k1k + if: ${{ always() && needs.get-configs.outputs.gptoss-1k1k != '[]' }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "gptoss_1k1k" + + # DSR1 8K1K Benchmarks + benchmark-dsr1-8k1k: + needs: get-configs + if: ${{ needs.get-configs.outputs.dsr1-8k1k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.dsr1-8k1k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + collect-dsr1-8k1k-results: + needs: benchmark-dsr1-8k1k + if: ${{ always() && needs.get-configs.outputs.dsr1-8k1k != '[]' }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_8k1k" + + # GPTOSS 8K1K Benchmarks + benchmark-gptoss-8k1k: + needs: get-configs + if: ${{ needs.get-configs.outputs.gptoss-8k1k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.gptoss-8k1k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ 
matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + collect-gptoss-8k1k-results: + needs: benchmark-gptoss-8k1k + if: ${{ always() && needs.get-configs.outputs.gptoss-8k1k != '[]' }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "gptoss_8k1k" + + # DSR1 1K8K Benchmarks + benchmark-dsr1-1k8k: + needs: get-configs + if: ${{ needs.get-configs.outputs.dsr1-1k8k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.dsr1-1k8k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} + + # This is a workaround until we can integrate GB200 into master configs. 
+ benchmark-gb200-1k1k: + if: ${{ inputs.use_gb200 && inputs.run_1k1k }} + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 1k1k sweep + strategy: + fail-fast: false + matrix: + config: &dsr1_static_configs + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "off", + } + - { + "image": "nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3", + "model": "deepseek-r1-fp4", + "model-prefix": "dsr1", + "precision": "fp4", + "framework": "dynamo-trtllm", + "mtp": "on", + } + - { + "image": "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1", + "model": "deepseek-ai/DeepSeek-R1-0528", + "model-prefix": "dsr1", + "precision": "fp8", + "framework": "dynamo-sglang", + "mtp": "off", + } + secrets: inherit + with: + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_1k1k + isl: 1024 + osl: 1024 + max-model-len: 2048 + mtp-mode: ${{ matrix.config.mtp }} + + benchmark-gb200-1k8k: + if: ${{ inputs.use_gb200 && inputs.run_1k8k }} + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 1k8k sweep + strategy: + fail-fast: false + matrix: + config: *dsr1_static_configs + secrets: inherit with: - token: ${{ secrets.REPO_PAT }} - fetch-depth: 0 + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_1k8k + isl: 1024 + osl: 8192 + max-model-len: 9216 + mtp-mode: ${{ matrix.config.mtp }} - - name: Download results artifacts - uses: actions/download-artifact@v4 + benchmark-gb200-8k1k: + if: ${{ inputs.use_gb200 && inputs.run_8k1k }} + uses: 
./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 8k1k sweep + strategy: + fail-fast: false + matrix: + config: *dsr1_static_configs + secrets: inherit with: - path: ${{ env.RESULTS_DIR }} - pattern: results_* + runner: gb200 + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + exp-name: ${{ matrix.config.model-prefix }}_8k1k + isl: 8192 + osl: 1024 + max-model-len: 9216 + mtp-mode: ${{ matrix.config.mtp }} - - name: Install python dependencies - run: pip install PyGithub + collect-dsr1-1k8k-results: + needs: + [ + benchmark-dsr1-1k8k, + benchmark-gb200-1k1k, + benchmark-gb200-1k8k, + benchmark-gb200-8k1k, + ] + if: ${{ always() && needs.get-configs.outputs.dsr1-1k8k != '[]' }} + uses: ./.github/workflows/collect-results.yml + secrets: inherit + with: + exp-name: "dsr1_1k8k" - - name: Calculate success rate - run: python3 utils/calc_success_rate.py $STATS_FILENAME + # GPTOSS 1K8K Benchmarks + benchmark-gptoss-1k8k: + needs: get-configs + if: ${{ needs.get-configs.outputs.gptoss-1k8k != '[]' }} + uses: ./.github/workflows/benchmark-tmpl.yml + strategy: + fail-fast: false + matrix: + config: ${{ fromJson(needs.get-configs.outputs.gptoss-1k8k) }} + secrets: inherit + with: + exp-name: ${{ matrix.config.exp-name }} + isl: ${{ matrix.config.isl }} + osl: ${{ matrix.config.osl }} + max-model-len: ${{ matrix.config.max-model-len }} + runner: ${{ matrix.config.runner }} + image: ${{ matrix.config.image }} + model: ${{ matrix.config.model }} + framework: ${{ matrix.config.framework }} + precision: ${{ matrix.config.precision }} + tp: ${{ matrix.config.tp }} + ep: ${{ matrix.config.ep }} + dp-attn: ${{ matrix.config.dp-attn }} + conc: ${{ matrix.config.conc }} - - uses: actions/upload-artifact@v4 + collect-gptoss-1k8k-results: + needs: benchmark-gptoss-1k8k + if: ${{ always() && needs.get-configs.outputs.gptoss-1k8k != '[]' }} + uses: 
./.github/workflows/collect-results.yml + secrets: inherit with: - name: "run-stats" - path: ${{ env.STATS_FILENAME }}.json + exp-name: "gptoss_1k8k" + + calc-success-rate: + needs: + [ + collect-dsr1-1k1k-results, + collect-dsr1-1k8k-results, + collect-dsr1-8k1k-results, + collect-gptoss-1k1k-results, + collect-gptoss-1k8k-results, + collect-gptoss-8k1k-results, + ] + if: ${{ always() }} + runs-on: ubuntu-latest + + env: + RESULTS_DIR: "results/" + STATS_FILENAME: "run_stats" + GITHUB_TOKEN: ${{ secrets.REPO_PAT }} + + steps: + - uses: actions/checkout@v3 + with: + token: ${{ secrets.REPO_PAT }} + fetch-depth: 0 + + - name: Download results artifacts + uses: actions/download-artifact@v4 + with: + path: ${{ env.RESULTS_DIR }} + pattern: results_* + + - name: Install python dependencies + run: pip install PyGithub + + - name: Calculate success rate + run: python3 utils/calc_success_rate.py $STATS_FILENAME + + - uses: actions/upload-artifact@v4 + with: + name: "run-stats" + path: ${{ env.STATS_FILENAME }}.json diff --git a/.github/workflows/full-sweep-tmpl.yml b/.github/workflows/full-sweep-tmpl.yml deleted file mode 100644 index 869928cb7..000000000 --- a/.github/workflows/full-sweep-tmpl.yml +++ /dev/null @@ -1,188 +0,0 @@ -name: Template - Full Sweep - -on: - workflow_call: - inputs: - run_1k1k: - type: boolean - required: true - run_8k1k: - type: boolean - required: true - run_1k8k: - type: boolean - required: true - - use_h100: - type: boolean - required: true - use_h200: - type: boolean - required: true - use_b200: - type: boolean - required: true - use_mi300x: - type: boolean - required: true - use_mi325x: - type: boolean - required: true - use_mi355x: - type: boolean - required: true - use_gb200: - type: boolean - required: false - default: false - -jobs: - dsr1-1k1k: - if: ${{ inputs.run_1k1k }} - uses: ./.github/workflows/dsr1-tmpl.yml - secrets: inherit - with: - exp-name: 'dsr1_1k1k' - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 
- use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - use_gb200: ${{ inputs.use_gb200 }} - - collect-dsr1-1k1k-results: - needs: dsr1-1k1k - if: ${{ inputs.run_1k1k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'dsr1_1k1k' - - gptoss-1k1k: - if: ${{ inputs.run_1k1k }} - uses: ./.github/workflows/gptoss-tmpl.yml - secrets: inherit - with: - exp-name: 'gptoss_1k1k' - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - use_h100: ${{ inputs.use_h100 }} - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - - collect-gptoss-1k1k-results: - needs: gptoss-1k1k - if: ${{ inputs.run_1k1k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'gptoss_1k1k' - - dsr1-8k1k: - if: ${{ inputs.run_8k1k }} - uses: ./.github/workflows/dsr1-tmpl.yml - secrets: inherit - with: - exp-name: 'dsr1_8k1k' - isl: 8192 - osl: 1024 - max-model-len: 9216 - random-range-ratio: 0.8 - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - use_gb200: ${{ inputs.use_gb200 }} - - collect-dsr1-8k1k-results: - needs: dsr1-8k1k - if: ${{ inputs.run_8k1k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'dsr1_8k1k' - - gptoss-8k1k: - if: ${{ inputs.run_8k1k }} - uses: ./.github/workflows/gptoss-tmpl.yml - secrets: inherit - with: - exp-name: 'gptoss_8k1k' - isl: 8192 - osl: 1024 - max-model-len: 9216 - random-range-ratio: 0.8 - use_h100: ${{ inputs.use_h100 }} - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - 
use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - - collect-gptoss-8k1k-results: - needs: gptoss-8k1k - if: ${{ inputs.run_8k1k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'gptoss_8k1k' - - dsr1-1k8k: - if: ${{ inputs.run_1k8k }} - uses: ./.github/workflows/dsr1-tmpl.yml - secrets: inherit - with: - exp-name: 'dsr1_1k8k' - isl: 1024 - osl: 8192 - max-model-len: 9216 - random-range-ratio: 0.8 - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - use_gb200: ${{ inputs.use_gb200 }} - - collect-dsr1-1k8k-results: - needs: dsr1-1k8k - if: ${{ inputs.run_1k8k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'dsr1_1k8k' - - gptoss-1k8k: - if: ${{ inputs.run_1k8k }} - uses: ./.github/workflows/gptoss-tmpl.yml - secrets: inherit - with: - exp-name: 'gptoss_1k8k' - isl: 1024 - osl: 8192 - max-model-len: 9216 - random-range-ratio: 0.8 - use_h100: ${{ inputs.use_h100 }} - use_h200: ${{ inputs.use_h200 }} - use_b200: ${{ inputs.use_b200 }} - use_mi300x: ${{ inputs.use_mi300x }} - use_mi325x: ${{ inputs.use_mi325x }} - use_mi355x: ${{ inputs.use_mi355x }} - - collect-gptoss-1k8k-results: - needs: gptoss-1k8k - if: ${{ inputs.run_1k8k && always() }} - uses: ./.github/workflows/collect-results.yml - secrets: inherit - with: - exp-name: 'gptoss_1k8k' diff --git a/.github/workflows/gb200-tests.yml b/.github/workflows/gb200-tests.yml new file mode 100644 index 000000000..c700599d9 --- /dev/null +++ b/.github/workflows/gb200-tests.yml @@ -0,0 +1,91 @@ +name: GB200 Tests + +on: + workflow_dispatch: + inputs: + image: + description: "Serving Image" + required: true + type: choice + options: + - "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1" + - 
"nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3" + + model: + description: "Model" + required: true + type: choice + options: + - "deepseek-ai/DeepSeek-R1-0528" + - "deepseek-r1-fp4" + + precision: + description: "Precision" + required: true + type: choice + options: + - "fp4" + - "fp8" + + framework: + description: "Framework" + required: true + type: choice + options: + - "dynamo-trtllm" + - "dynamo-sglang" + + mtp: + description: "Mtp On/Off" + required: true + type: choice + options: + - "on" + - "off" + + isl: + description: "ISL" + required: true + type: string + + osl: + description: "OSL" + required: true + type: string + +jobs: + pre-run: + runs-on: ubuntu-latest + outputs: + max-model-len: ${{ steps.calc.outputs.max-model-len }} + steps: + - id: calc + shell: python + run: | + import os + import sys + try: + isl = int("${{ inputs.isl }}") + osl = int("${{ inputs.osl }}") + except ValueError: + print("Error: ISL and OSL must be integers") + sys.exit(1) + with open(os.environ['GITHUB_OUTPUT'], 'a') as f: + f.write(f"max-model-len={isl + osl}\n") + + benchmark-gb200: + needs: pre-run + uses: ./.github/workflows/benchmark-multinode-tmpl.yml + name: gb200 test + secrets: inherit + with: + runner: gb200 + image: ${{ inputs.image }} + model: ${{ inputs.model }} + framework: ${{ inputs.framework }} + precision: ${{ inputs.precision }} + exp-name: dsr1_1k1k + isl: ${{ inputs.isl }} + osl: ${{ inputs.osl }} + max-model-len: ${{ needs.pre-run.outputs.max-model-len }} + mtp-mode: ${{ inputs.mtp }} diff --git a/.github/workflows/gptoss-tmpl.yml b/.github/workflows/gptoss-tmpl.yml deleted file mode 100644 index 95c501411..000000000 --- a/.github/workflows/gptoss-tmpl.yml +++ /dev/null @@ -1,176 +0,0 @@ -name: Template - gpt-oss - -on: - workflow_call: - inputs: - exp-name: - required: true - type: string - isl: - required: true - type: string - osl: - required: true - type: string - max-model-len: - required: true - type: string - random-range-ratio: - 
required: true - type: string - - use_h100: - type: boolean - required: true - use_h200: - type: boolean - required: true - use_b200: - type: boolean - required: true - use_mi300x: - type: boolean - required: true - use_mi325x: - type: boolean - required: true - use_mi355x: - type: boolean - required: true - -jobs: - bmk-h100: - if: ${{ inputs.use_h100 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: h100 - image: 'vllm/vllm-openai:v0.10.2' - model: 'openai/gpt-oss-120b' - tp-list: '[2, 4, 8]' - framework: 'vllm' - precision: 'fp4' - - bmk-h200: - if: ${{ inputs.use_h200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: h200 - image: 'vllm/vllm-openai:v0.10.2' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'vllm' - precision: 'fp4' - - bmk-b200: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: b200 - image: 'vllm/vllm-openai:v0.10.2' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'vllm' - precision: 'fp4' - - bmk-b200-trt: - if: ${{ inputs.use_b200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: b200-trt - image: 
'nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'trt' - precision: 'fp4' - - bmk-h200-trt: - if: ${{ inputs.use_h200 }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: h200-trt - image: 'nvcr.io#nvidia/tensorrt-llm/release:gpt-oss-dev' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'trt' - precision: 'fp4' - - bmk-mi300x: - if: ${{ inputs.use_mi300x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: mi300x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'vllm' - precision: 'fp4' - - bmk-mi325x: - if: ${{ inputs.use_mi325x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: mi325x - image: 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 2, 4, 8]' - framework: 'vllm' - precision: 'fp4' - - bmk-mi355x: - if: ${{ inputs.use_mi355x }} - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - exp-name: ${{ inputs.exp-name }} - isl: ${{ inputs.isl }} - osl: ${{ inputs.osl }} - max-model-len: ${{ inputs.max-model-len }} - random-range-ratio: ${{ inputs.random-range-ratio }} - runner: mi355x - image: 
'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1' - model: 'openai/gpt-oss-120b' - tp-list: '[1, 4, 8]' - framework: 'vllm' - precision: 'fp4' diff --git a/.github/workflows/runner-model-sweep-test.yml b/.github/workflows/runner-model-sweep-test.yml deleted file mode 100644 index e4f2b7303..000000000 --- a/.github/workflows/runner-model-sweep-test.yml +++ /dev/null @@ -1,289 +0,0 @@ -name: 'Test - Runner Model Sweep' -run-name: '${{ github.event.inputs.runner }} Sweep' -on: - workflow_dispatch: - inputs: - runner: - description: 'Runner Type' - required: true - type: choice - options: - - 'h100' - - 'h200' - - 'h200-trt' - - 'b200' - - 'b200-trt' - - 'mi300x' - - 'mi325x' - - 'mi355x' - -env: - HF_TOKEN: ${{ secrets.HF_TOKEN }} - HF_HUB_CACHE: '/mnt/hf_hub_cache/' - -jobs: - bmk-h100: - if: ${{ inputs.runner == 'h100' }} - strategy: - fail-fast: false - matrix: - runner: - - 'h100-cr_0' - - 'h100-cr_1' - - 'h100-cw_0' - - 'h100-cw_1' - config: - - { image: 'vllm/vllm-openai:v0.10.2', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-h200: - if: ${{ inputs.runner == 'h200' }} - strategy: - fail-fast: false - matrix: - runner: - - 'h200-cw_0' - - 'h200-cw_1' - - 'h200-nb_0' - - 'h200-nb_1' - - 'h200-nb_2' - - 'h200-nb_3' - - 'h200-nv_0' - - 'h200-nv_1' - - 'h200-nv_2' - - 'h200-nv_3' - config: - - { image: 'lmsysorg/sglang:v0.5.2rc2-cu126', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'sglang', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 
'vllm/vllm-openai:v0.10.2', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-h200-trt: - if: ${{ inputs.runner == 'h200-trt' }} - strategy: - fail-fast: false - matrix: - runner: - - 'h200-cw_0' - - 'h200-cw_1' - - 'h200-nb_0' - - 'h200-nb_1' - - 'h200-nb_2' - - 'h200-nb_3' - - 'h200-nv_0' - - 'h200-nv_1' - - 'h200-nv_2' - - 'h200-nv_3' - config: - - { image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'trt', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2', model: 'openai/gpt-oss-120b', framework: 'trt', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-b200: - if: ${{ inputs.runner == 'b200' }} - strategy: - fail-fast: false - matrix: - runner: - - 'b200-nvd_0' - - 'b200-nvd_1' - - 'b200-nvd_2' - - 'b200-nvd_3' - config: - - { image: 'lmsysorg/sglang:v0.5.3rc1-cu129-b200', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'sglang', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 
'lmsysorg/sglang:v0.5.3rc1-cu129-b200', model: 'nvidia/DeepSeek-R1-0528-FP4', framework: 'sglang', precision: 'fp4', exp-name: 'dsr1_test' } - - { image: 'vllm/vllm-openai:v0.10.2', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[4]' - - bmk-b200-trt: - if: ${{ inputs.runner == 'b200-trt' }} - strategy: - fail-fast: false - matrix: - runner: - - 'b200-nv_0' - - 'b200-nv_1' - - 'b200-nb_0' - - 'b200-nb_1' - config: - - { image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'trt', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2', model: 'nvidia/DeepSeek-R1-0528-FP4', framework: 'trt', precision: 'fp4', exp-name: 'dsr1_test' } - - { image: 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2', model: 'openai/gpt-oss-120b', framework: 'trt', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-mi300x: - if: ${{ inputs.runner == 'mi300x' }} - strategy: - fail-fast: false - matrix: - runner: - - 'mi300x-amd_0' - - 'mi300x-amd_1' - - 
'mi300x-amd_2' - - 'mi300x-amd_3' - - 'mi300x-amd_4' - - 'mi300x-cr_0' - - 'mi300x-oci_0' - config: - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'sglang', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-mi325x: - if: ${{ inputs.runner == 'mi325x' }} - strategy: - fail-fast: false - matrix: - runner: - - 'mi325x-amd_0' - - 'mi325x-tw_0' - - 'mi325x-tw_1' - - 'mi325x-tw_2' - - 'mi325x-tw_3' - config: - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'sglang', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' - - bmk-mi355x: - if: ${{ inputs.runner == 'mi355x' }} - strategy: - 
fail-fast: false - matrix: - runner: - - 'mi355x-amd_0' - - 'mi355x-amd_1' - - 'mi355x-amd_2' - - 'mi355x-amd_3' - config: - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915', model: 'deepseek-ai/DeepSeek-R1-0528', framework: 'sglang', precision: 'fp8', exp-name: 'dsr1_test' } - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915', model: 'amd/DeepSeek-R1-0528-MXFP4-Preview', framework: 'sglang', precision: 'fp4', exp-name: 'dsr1_test' } - - { image: 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1', model: 'openai/gpt-oss-120b', framework: 'vllm', precision: 'fp4', exp-name: 'gptoss_test' } - - name: '${{ matrix.runner }}' - uses: ./.github/workflows/benchmark-tmpl.yml - secrets: inherit - with: - runner: ${{ matrix.runner }} - image: ${{ matrix.config.image }} - model: ${{ matrix.config.model }} - framework: ${{ matrix.config.framework }} - precision: ${{ matrix.config.precision }} - exp-name: ${{ matrix.config.exp-name }} - isl: 1024 - osl: 1024 - max-model-len: 2048 - random-range-ratio: 0.8 - tp-list: '[8]' - conc-list: '[1]' diff --git a/.github/workflows/runner-sweep-test.yml b/.github/workflows/runner-sweep-test.yml deleted file mode 100644 index 8f824c4d1..000000000 --- a/.github/workflows/runner-sweep-test.yml +++ /dev/null @@ -1,328 +0,0 @@ -name: 'Test - Runner Sweep' -run-name: '${{ github.event.inputs.runner }} Sweep - ${{ github.event.inputs.model }}' -on: - workflow_dispatch: - inputs: - runner: - description: 'Runner Type' - required: true - type: choice - options: - - 'h100' - - 'h200' - - 'b200' - - 'h200-trt' - - 'b200-trt' - - 'mi300x' - - 'mi325x' - - 'mi355x' - - 'gb200' - - image: - description: 'Docker Image' - required: true - type: choice - options: - - 'lmsysorg/sglang:v0.4.9.post1-cu126' - - 'lmsysorg/sglang:v0.5.0rc1-cu128-b200' - - 'lmsysorg/sglang:v0.5.2rc2-cu126' - - 'lmsysorg/sglang:v0.5.3rc1-cu129-b200' - - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2' - - 
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post1'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:gpt-oss-dev'
-          - 'nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1'
-          - 'vllm/vllm-openai:v0.10.2'
-
-      model:
-        description: 'Model'
-        required: true
-        type: choice
-        options:
-          - 'amd/DeepSeek-R1-0528-MXFP4-Preview'
-          - 'deepseek-ai/DeepSeek-R1-0528'
-          - 'nvidia/DeepSeek-R1-0528-FP4'
-          - 'nvidia/DeepSeek-R1-0528-FP4-v2'
-          - 'openai/gpt-oss-120b'
-
-      framework:
-        description: 'Framework'
-        required: true
-        type: choice
-        options:
-          - 'vllm'
-          - 'sglang'
-          - 'trt'
-
-      precision:
-        description: 'Precision'
-        required: true
-        type: choice
-        options:
-          - 'fp8'
-          - 'fp4'
-
-      exp-name:
-        description: 'Experiment Name'
-        required: true
-        type: choice
-        options:
-          - 'dsr1_test'
-          - 'gptoss_test'
-
-env:
-  HF_TOKEN: ${{ secrets.HF_TOKEN }}
-  HF_HUB_CACHE: '/mnt/hf_hub_cache/'
-
-jobs:
-  bmk_h100:
-    if: ${{ inputs.runner == 'h100' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'h100-cr_0'
-          - 'h100-cr_1'
-          - 'h100-cw_0'
-          - 'h100-cw_1'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_h200:
-    if: ${{ inputs.runner == 'h200' || inputs.runner == 'h200-trt' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'h200-cw_0'
-          - 'h200-cw_1'
-          - 'h200-nb_0'
-          - 'h200-nb_1'
-          - 'h200-nb_2'
-          - 'h200-nb_3'
-          - 'h200-nv_0'
-          - 'h200-nv_1'
-          - 'h200-nv_2'
-          - 'h200-nv_3'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[4]'
-      conc-list: '[64]'
-
-  bmk_b200:
-    if: ${{ inputs.runner == 'b200' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'b200-nv_0'
-          - 'b200-nv_1'
-          - 'b200-nvd_0'
-          - 'b200-nvd_1'
-          - 'b200-tg_0'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_b200-trt:
-    if: ${{ inputs.runner == 'b200-trt' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'b200-nv_0'
-          - 'b200-nv_1'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_mi300x:
-    if: ${{ inputs.runner == 'mi300x' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'mi300x-amd_0'
-          - 'mi300x-amd_1'
-          - 'mi300x-amd_2'
-          - 'mi300x-amd_3'
-          - 'mi300x-amd_4'
-          - 'mi300x-cr_0'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_mi325x:
-    if: ${{ inputs.runner == 'mi325x' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'mi325x-amd_0'
-          - 'mi325x-tw_0'
-          - 'mi325x-tw_1'
-          - 'mi325x-tw_2'
-          - 'mi325x-tw_3'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_mi355x:
-    if: ${{ inputs.runner == 'mi355x' }}
-    strategy:
-      fail-fast: false
-      matrix:
-        runner:
-          - 'mi355x-amd_0'
-          - 'mi355x-amd_1'
-          - 'mi355x-amd_2'
-          - 'mi355x-amd_3'
-    name: '${{ matrix.runner }}'
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ matrix.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[1]'
-
-  bmk_gb200:
-    if: ${{ inputs.runner == 'gb200' && inputs.framework == 'trt' }}
-    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
-    secrets: inherit
-    with:
-      runner: gb200
-      image: 'nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.5.1-rc0.pre3'
-      model: 'deepseek-r1-fp4'
-      framework: 'dynamo-trtllm'
-      precision: 'fp4'
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      mtp-mode: 'off'
-
-  bmk_gb200-sgl:
-    if: ${{ inputs.runner == 'gb200' && inputs.framework == 'sglang' }}
-    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
-    secrets: inherit
-    with:
-      runner: gb200
-      image: 'nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.5.1-rc0.pre1'
-      model: 'deepseek-ai/DeepSeek-R1-0528'
-      framework: 'dynamo-sglang'
-      precision: 'fp8'
-      exp-name: ${{ inputs.exp-name }}
-      isl: 8192
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      mtp-mode: 'off'
-
-  collect-test-results:
-    needs: [ bmk_h100, bmk_h200, bmk_b200, bmk_b200-trt, bmk_mi300x, bmk_mi325x, bmk_mi355x, bmk_gb200, bmk_gb200-sgl ]
-    if: ${{ always() && !cancelled() }}
-    uses: ./.github/workflows/collect-results.yml
-    secrets: inherit
-    with:
-      exp-name: ${{ inputs.exp-name }}
diff --git a/.github/workflows/runner-test.yml b/.github/workflows/runner-test.yml
deleted file mode 100644
index 99909349c..000000000
--- a/.github/workflows/runner-test.yml
+++ /dev/null
@@ -1,131 +0,0 @@
-name: Test - Runner
-run-name: '${{ github.event.inputs.runner }} - ${{ github.event.inputs.model }}'
-on:
-  workflow_dispatch:
-    inputs:
-      runner:
-        description: 'Runner'
-        required: true
-        type: choice
-        options:
-          - 'h100-cr_0'
-          - 'h100-cr_1'
-          - 'h100-cw_0'
-          - 'h100-cw_1'
-          - 'h200-cw_0'
-          - 'h200-cw_1'
-          - 'h200-nb_0'
-          - 'h200-nb_1'
-          - 'h200-nb_2'
-          - 'h200-nb_3'
-          - 'h200-nv_0'
-          - 'h200-nv_1'
-          - 'h200-nv_2'
-          - 'h200-nv_3'
-          - 'b200-nv_0'
-          - 'b200-nv_1'
-          - 'b200-nb_0'
-          - 'b200-nb_1'
-          - 'b200-nvd_0'
-          - 'b200-nvd_1'
-          - 'b200-nvd_2'
-          - 'b200-nvd_3'
-          - 'b200-tg_0'
-          - 'mi300x-amd_0'
-          - 'mi300x-amd_1'
-          - 'mi300x-amd_2'
-          - 'mi300x-amd_3'
-          - 'mi300x-amd_4'
-          - 'mi300x-cr_0'
-          - 'mi300x-oci_0'
-          - 'mi325x-amd_0'
-          - 'mi325x-tw_0'
-          - 'mi325x-tw_1'
-          - 'mi325x-tw_2'
-          - 'mi325x-tw_3'
-          - 'mi355x-amd_0'
-          - 'mi355x-amd_1'
-          - 'mi355x-amd_2'
-          - 'mi355x-amd_3'
-
-      image:
-        description: 'Docker Image'
-        required: true
-        type: choice
-        options:
-          - 'lmsysorg/sglang:v0.4.9.post1-cu126'
-          - 'lmsysorg/sglang:v0.5.0rc1-cu128-b200'
-          - 'lmsysorg/sglang:v0.5.2rc2-cu126'
-          - 'lmsysorg/sglang:v0.5.3rc1-cu129-b200'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post1'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.1.0rc2.post2'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1'
-          - 'nvcr.io#nvidia/tensorrt-llm/release:gpt-oss-dev'
-          - 'rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1'
-          - 'rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2-mi30x_rc1'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250915'
-          - 'rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1'
-          - 'vllm/vllm-openai:v0.10.2'
-      model:
-        description: 'Model'
-        required: true
-        type: choice
-        options:
-          - 'amd/DeepSeek-R1-0528-MXFP4-Preview'
-          - 'deepseek-ai/DeepSeek-R1-0528'
-          - 'nvidia/DeepSeek-R1-0528-FP4'
-          - 'nvidia/DeepSeek-R1-0528-FP4-v2'
-          - 'openai/gpt-oss-120b'
-
-      framework:
-        description: 'Framework'
-        required: true
-        type: choice
-        options:
-          - 'vllm'
-          - 'sglang'
-          - 'trt'
-
-      precision:
-        description: 'Precision'
-        required: true
-        type: choice
-        options:
-          - 'fp8'
-          - 'fp4'
-
-      exp-name:
-        description: 'Experiment Name'
-        required: true
-        type: choice
-        options:
-          - 'dsr1_test'
-          - 'gptoss_test'
-
-jobs:
-  runner-test:
-    uses: ./.github/workflows/benchmark-tmpl.yml
-    secrets: inherit
-    with:
-      runner: ${{ inputs.runner }}
-      image: ${{ inputs.image }}
-      model: ${{ inputs.model }}
-      framework: ${{ inputs.framework }}
-      precision: ${{ inputs.precision }}
-      exp-name: ${{ inputs.exp-name }}
-      isl: 1024
-      osl: 1024
-      max-model-len: 2048
-      random-range-ratio: 0.8
-      tp-list: '[8]'
-      conc-list: '[4]'
-
-  collect-test-results:
-    needs: runner-test
-    uses: ./.github/workflows/collect-results.yml
-    secrets: inherit
-    with:
-      exp-name: ${{ inputs.exp-name }}
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000000000..03d36472a
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+**/__pycache__/**
+**/.coverage
\ No newline at end of file
diff --git a/benchmarks/dsr1_fp4_b200_trt_slurm.sh b/benchmarks/dsr1_fp4_b200_trt_slurm.sh
index c94c36667..6f4f814a0 100644
--- a/benchmarks/dsr1_fp4_b200_trt_slurm.sh
+++ b/benchmarks/dsr1_fp4_b200_trt_slurm.sh
@@ -13,69 +13,49 @@
 # CONC
 # RESULT_FILENAME
 # PORT_OFFSET
+# DP_ATTENTION
+# EP_SIZE
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
-echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL"
+echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTION: $DP_ATTENTION"
 
 hf download $MODEL
 
 # ========= Determine DP_ATTENTION, EP_SIZE and MOE_BACKEND based on ISL, OSL, CONC =========
-EP_SIZE="1"
 MOE_BACKEND="TRTLLM"
-DP_ATTENTION=false
 
 if [[ "$TP" == "4" ]]; then
     if [[ "$ISL" == "1024" && "$OSL" == "1024" ]]; then
-        if [[ $CONC -gt 32 ]]; then
-            EP_SIZE="$TP"
-        fi
         if [[ $CONC -ge 256 ]]; then
-            DP_ATTENTION=true
             MOE_BACKEND="CUTLASS"
         fi
    elif [[ "$ISL" == "1024" && "$OSL" == "8192" ]]; then
-        if [[ $CONC -gt 32 ]]; then
-            EP_SIZE="$TP"
-        fi
        if [[ $CONC -ge 256 ]]; then
-            DP_ATTENTION=true
            MOE_BACKEND="CUTLASS"
        fi
    elif [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
        if [[ $CONC -gt 32 ]]; then
-            EP_SIZE="$TP"
-            DP_ATTENTION=true
            MOE_BACKEND="CUTLASS"
        fi
    fi
elif [[ "$TP" == "8" ]]; then
    if [[ "$ISL" == "1024" && "$OSL" == "1024" ]]; then
-        if [[ $CONC -gt 8 ]]; then
-            EP_SIZE="$TP"
-        fi
        if [[ $CONC -ge 256 ]]; then
-            DP_ATTENTION=true
            MOE_BACKEND="CUTLASS"
        fi
    elif [[ "$ISL" == "1024" && "$OSL" == "8192" ]]; then
-        if [[ $CONC -gt 16 ]]; then
-            EP_SIZE="$TP"
-        fi
        if [[ $CONC -ge 256 ]]; then
-            DP_ATTENTION=true
            MOE_BACKEND="CUTLASS"
        fi
    elif [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
        if [[ $CONC -gt 32 ]]; then
-            EP_SIZE="$TP"
-            DP_ATTENTION=true
            MOE_BACKEND="CUTLASS"
        fi
    fi
fi
 
-echo "Final configuration: EP_SIZE='$EP_SIZE', MOE_BACKEND='$MOE_BACKEND', DP_ATTENTION='$DP_ATTENTION'"
+echo "MOE_BACKEND set to '$MOE_BACKEND'"
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=$(( 8888 + $PORT_OFFSET ))
diff --git a/benchmarks/dsr1_fp8_b200_trt_slurm.sh b/benchmarks/dsr1_fp8_b200_trt_slurm.sh
index 041af8158..58d4525f1 100644
--- a/benchmarks/dsr1_fp8_b200_trt_slurm.sh
+++ b/benchmarks/dsr1_fp8_b200_trt_slurm.sh
@@ -13,33 +13,19 @@
 # CONC
 # RESULT_FILENAME
 # PORT_OFFSET
+# DP_ATTENTION
+# EP_SIZE
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
-echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL"
+echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTION: $DP_ATTENTION"
 
 hf download $MODEL
 
 # ========= Determine DP_ATTENTION, EP_SIZE and MOE_BACKEND based on ISL, OSL, CONC =========
-EP_SIZE="$TP"
 MOE_BACKEND="DEEPGEMM"
-DP_ATTENTION=false
-
-if [[ "$ISL" == "1024" && "$OSL" == "1024" ]]; then
-    if [[ $CONC -gt 32 ]]; then
-        DP_ATTENTION=true
-    fi
-elif [[ "$ISL" == "1024" && "$OSL" == "8192" ]]; then
-    if [[ $CONC -gt 64 ]]; then
-        DP_ATTENTION=true
-    fi
-elif [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
-    if [[ $CONC -gt 64 ]]; then
-        DP_ATTENTION=true
-    fi
-fi
-
-echo "Final configuration: EP_SIZE='$EP_SIZE', MOE_BACKEND='$MOE_BACKEND', DP_ATTENTION='$DP_ATTENTION'"
+echo "MOE_BACKEND set to '$MOE_BACKEND'"
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=$(( 8888 + $PORT_OFFSET ))
diff --git a/benchmarks/dsr1_fp8_h200_trt_slurm.sh b/benchmarks/dsr1_fp8_h200_trt_slurm.sh
index bbc56e2d1..7b566c0ab 100644
--- a/benchmarks/dsr1_fp8_h200_trt_slurm.sh
+++ b/benchmarks/dsr1_fp8_h200_trt_slurm.sh
@@ -13,33 +13,19 @@
 # CONC
 # RESULT_FILENAME
 # PORT_OFFSET
+# DP_ATTENTION
+# EP_SIZE
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
-echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL"
+echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTION: $DP_ATTENTION"
 
 hf download $MODEL
 
 # ========= Determine DP_ATTENTION, EP_SIZE and MOE_BACKEND based on ISL, OSL, CONC =========
-EP_SIZE="$TP"
 MOE_BACKEND="CUTLASS"
-DP_ATTENTION=false
-
-if [[ "$ISL" == "1024" && "$OSL" == "1024" ]]; then
-    if [[ $CONC -gt 64 ]]; then
-        DP_ATTENTION=true
-    fi
-elif [[ "$ISL" == "1024" && "$OSL" == "8192" ]]; then
-    if [[ $CONC -gt 64 ]]; then
-        DP_ATTENTION=true
-    fi
-elif [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
-    if [[ $CONC -gt 32 ]]; then
-        DP_ATTENTION=true
-    fi
-fi
-
-echo "Final configuration: EP_SIZE='$EP_SIZE', MOE_BACKEND='$MOE_BACKEND', DP_ATTENTION='$DP_ATTENTION'"
+echo "MOE_BACKEND set to '$MOE_BACKEND'"
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=$(( 8888 + $PORT_OFFSET ))
diff --git a/benchmarks/gptoss_fp4_b200_trt_slurm.sh b/benchmarks/gptoss_fp4_b200_trt_slurm.sh
index f85f5c13f..349930dfb 100644
--- a/benchmarks/gptoss_fp4_b200_trt_slurm.sh
+++ b/benchmarks/gptoss_fp4_b200_trt_slurm.sh
@@ -13,32 +13,24 @@
 # CONC
 # RESULT_FILENAME
 # PORT_OFFSET
+# DP_ATTENTION
+# EP_SIZE
 
 # GPTOSS TRTLLM Deployment Guide:
 # https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
-echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL"
+echo "TP: $TP, CONC: $CONC, ISL: $ISL, OSL: $OSL, EP_SIZE: $EP_SIZE, DP_ATTENTION: $DP_ATTENTION"
 
 hf download $MODEL
 
 SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)
 PORT=$(( 8888 + $PORT_OFFSET ))
 
 # ========= Determine DP_ATTENTION, EP_SIZE and MOE_BACKEND based on ISL, OSL, CONC =========
-EP_SIZE="1"
 MOE_BACKEND="TRTLLM"
-DP_ATTENTION=false
-
-# Higher concurrencies: Concurrency >= 256
-#   MoE Backend = CUTLASS
-#   Use DP attention with expert parallel MoE
-if [[ $CONC -ge 256 ]]; then
-    EP_SIZE="$TP"
-    DP_ATTENTION=true
-fi
-echo "Final configuration: EP_SIZE='$EP_SIZE', MOE_BACKEND='$MOE_BACKEND', DP_ATTENTION='$DP_ATTENTION'"
+echo "MOE_BACKEND set to '$MOE_BACKEND'"
 
 EXTRA_CONFIG_FILE="gptoss-fp4.yml"
 export TRTLLM_ENABLE_PDL=1
diff --git a/benchmarks/gptoss_fp4_h200_trt_slurm.sh b/benchmarks/gptoss_fp4_h200_trt_slurm.sh
index ee2b32df8..c148a3cb7 100644
--- a/benchmarks/gptoss_fp4_h200_trt_slurm.sh
+++ b/benchmarks/gptoss_fp4_h200_trt_slurm.sh
@@ -13,6 +13,8 @@
 # CONC
 # RESULT_FILENAME
 # PORT_OFFSET
+# DP_ATTENTION
+# EP_SIZE
 
 echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
 
@@ -30,7 +32,7 @@ cat > gptoss-config.yml << EOF
 cuda_graph_config:
   enable_padding: true
   max_batch_size: $CONC
-enable_attention_dp: false
+enable_attention_dp: $DP_ATTENTION
 kv_cache_config:
   dtype: auto
   enable_block_reuse: false
@@ -42,9 +44,8 @@ print_iter_log: true
 stream_interval: 20
 EOF
 
-#mpirun -n 1 --oversubscribe --allow-run-as-root trtllm-serve $MODEL --tp_size $TP --trust_remote_code --max_seq_len $MAX_MODEL_LEN --max_num_tokens $MAX_MODEL_LEN --num_postprocess_workers 2 --extra_llm_api_options llama-config.yml --port $PORT > $SERVER_LOG 2>&1 &
-mpirun -n 1 --oversubscribe --allow-run-as-root trtllm-serve $MODEL --max_batch_size $CONC --max_num_tokens 20000 --backend pytorch --extra_llm_api_options gptoss-config.yml --ep_size=$TP --trust_remote_code --gpus_per_node 8 --host 0.0.0.0 --port $PORT --tp_size=$TP --pp_size=1 > $SERVER_LOG 2>&1 &
+mpirun -n 1 --oversubscribe --allow-run-as-root trtllm-serve $MODEL --max_batch_size $CONC --max_num_tokens 20000 --backend pytorch --extra_llm_api_options gptoss-config.yml --ep_size=$EP_SIZE --trust_remote_code --gpus_per_node 8 --host 0.0.0.0 --port $PORT --tp_size=$TP --pp_size=1 > $SERVER_LOG 2>&1 &
 
 set +x
diff --git a/utils/matrix-logic/generate_sweep_configs.py b/utils/matrix-logic/generate_sweep_configs.py
new file mode 100644
index 000000000..bb0e22911
--- /dev/null
+++ b/utils/matrix-logic/generate_sweep_configs.py
@@ -0,0 +1,956 @@
+import json
+import yaml
+import argparse
+from pydantic import BaseModel, Field, ValidationError, ConfigDict
+from typing import List
+
+# Field name constants
+# Top-level config fields
+FIELD_IMAGE = 'image'
+FIELD_MODEL = 'model'
+FIELD_MODEL_PREFIX = 'model-prefix'
+FIELD_PRECISION = 'precision'
+FIELD_FRAMEWORK = 'framework'
+FIELD_RUNNER = 'runner'
+FIELD_SEQ_LEN_CONFIGS = 'seq-len-configs'
+
+# Seq-len-config fields
+FIELD_ISL = 'isl'
+FIELD_OSL = 'osl'
+FIELD_SEARCH_SPACE = 'search-space'
+
+# Search-space/benchmark fields
+FIELD_TP = 'tp'
+FIELD_CONC_START = 'conc-start'
+FIELD_CONC_END = 'conc-end'
+FIELD_EP = 'ep'
+FIELD_DP_ATTN = 'dp-attn'
+
+# Matrix entry fields
+FIELD_CONC = 'conc'
+FIELD_MAX_MODEL_LEN = 'max-model-len'
+FIELD_EXP_NAME = 'exp-name'
+
+seq_len_stoi = {
+    "1k1k": (1024, 1024),
+    "1k8k": (1024, 8192),
+    "8k1k": (8192, 1024)
+}
+
+# Reverse mapping for exp-name generation
+seq_len_itos = {v: k for k, v in seq_len_stoi.items()}
+
+
+def seq_len_to_str(isl: int, osl: int) -> str:
+    """Convert sequence lengths to short string representation.
+
+    Returns the short name (e.g., '1k1k') if it exists in the mapping,
+    otherwise returns 'isl_osl' format.
+    """
+    return seq_len_itos.get((isl, osl), f"{isl}_{osl}")
+
+
+class MatrixEntry(BaseModel):
+    """Pydantic model for validating matrix entry structure."""
+    model_config = ConfigDict(extra='forbid', populate_by_name=True)
+
+    image: str
+    model: str
+    precision: str
+    framework: str
+    runner: str
+    isl: int
+    osl: int
+    tp: int
+    ep: int
+    dp_attn: bool = Field(alias='dp-attn')
+    conc: int
+    max_model_len: int = Field(alias='max-model-len')
+    exp_name: str = Field(alias='exp-name')
+
+
+def validate_matrix_output(matrix_values: List[dict]) -> List[dict]:
+    """Validate that matrix_values entries match the expected structure.
+
+    Raises ValueError if any entry fails validation.
+    Returns the original list if all entries are valid.
+    """
+    for i, entry in enumerate(matrix_values):
+        try:
+            MatrixEntry(**entry)
+        except ValidationError as e:
+            raise ValueError(f"Matrix entry at index {i} failed validation:\n{e}")
+    return matrix_values
+
+
+def validate_master_configs_structure(all_config_data):
+    """Validate the structure of all master config entries.
+
+    This validates that all required fields are present, have correct types,
+    and no extra fields exist. Should be called once after loading config files.
+    """
+    for key, val in all_config_data.items():
+        # Check for required top-level fields and their types
+        required_fields = {
+            FIELD_IMAGE: str,
+            FIELD_MODEL: str,
+            FIELD_MODEL_PREFIX: str,
+            FIELD_PRECISION: str,
+            FIELD_FRAMEWORK: str,
+            FIELD_RUNNER: str,
+            FIELD_SEQ_LEN_CONFIGS: list
+        }
+
+        for field, expected_type in required_fields.items():
+            if field not in val or val[field] is None:
+                raise ValueError(
+                    f"Missing required field '{field}' for key '{key}'")
+            if not isinstance(val[field], expected_type):
+                raise ValueError(
+                    f"Field '{field}' must be {expected_type.__name__} for key '{key}', got {type(val[field]).__name__}")
+
+        seq_len_configs = val[FIELD_SEQ_LEN_CONFIGS]
+        if len(seq_len_configs) == 0:
+            raise ValueError(
+                f"'{FIELD_SEQ_LEN_CONFIGS}' must be a non-empty list for key '{key}'")
+
+        # Validate each seq-len-config
+        for i, seq_config in enumerate(seq_len_configs):
+            # Check isl
+            if FIELD_ISL not in seq_config or seq_config[FIELD_ISL] is None:
+                raise ValueError(
+                    f"Missing '{FIELD_ISL}' in seq-len-config[{i}] for key '{key}'")
+            if not isinstance(seq_config[FIELD_ISL], int):
+                raise ValueError(
+                    f"'{FIELD_ISL}' must be int in seq-len-config[{i}] for key '{key}'")
+
+            # Check osl
+            if FIELD_OSL not in seq_config or seq_config[FIELD_OSL] is None:
+                raise ValueError(
+                    f"Missing '{FIELD_OSL}' in seq-len-config[{i}] for key '{key}'")
+            if not isinstance(seq_config[FIELD_OSL], int):
+                raise ValueError(
+                    f"'{FIELD_OSL}' must be int in seq-len-config[{i}] for key '{key}'")
+
+            bmk_space = seq_config.get(FIELD_SEARCH_SPACE)
+            if not bmk_space or not isinstance(bmk_space, list) or len(bmk_space) == 0:
+                raise ValueError(
+                    f"Missing or invalid '{FIELD_SEARCH_SPACE}' in seq-len-config[{i}] for key '{key}'")
+
+            # Validate each benchmark in search-space
+            for j, bmk in enumerate(bmk_space):
+                # Define allowed fields
+                allowed_fields = {FIELD_TP, FIELD_CONC_START,
+                                  FIELD_CONC_END, FIELD_EP, FIELD_DP_ATTN}
+                required_bmk_fields = {FIELD_TP: int,
+                                       FIELD_CONC_START: int, FIELD_CONC_END: int}
+                optional_bmk_fields = {FIELD_EP: int, FIELD_DP_ATTN: bool}
+
+                # Check for extra fields
+                extra_fields = set(bmk.keys()) - allowed_fields
+                if extra_fields:
+                    raise ValueError(
+                        f"Extra fields {extra_fields} in search-space[{j}] of seq-len-config[{i}] for key '{key}'")
+
+                # Validate required fields
+                for field, expected_type in required_bmk_fields.items():
+                    if field not in bmk or bmk[field] is None:
+                        raise ValueError(
+                            f"Missing '{field}' in search-space[{j}] of seq-len-config[{i}] for key '{key}'")
+                    if not isinstance(bmk[field], expected_type):
+                        raise ValueError(
+                            f"'{field}' must be {expected_type.__name__} in search-space[{j}] of seq-len-config[{i}] for key '{key}'")
+
+                # Validate optional fields if they exist
+                for field, expected_type in optional_bmk_fields.items():
+                    if field in bmk and bmk[field] is not None:
+                        if not isinstance(bmk[field], expected_type):
+                            raise ValueError(
+                                f"'{field}' must be {expected_type.__name__} in search-space[{j}] of seq-len-config[{i}] for key '{key}'")
+
+
+def generate_full_sweep(args, all_config_data):
+    """Generate full sweep configurations with optional filtering.
+
+    Supports filtering by model prefix, precision, framework, runner type, and sequence lengths.
+    Supports test mode to only run highest TP with lowest concurrency.
+
+    All filters are optional - can generate sweeps for all configs or filter by specific criteria.
+
+    Assumes all_config_data has been validated by validate_config_structure().
+    """
+    # Validate runner types if specified
+    if args.runner_type:
+        if not args.runner_config:
+            raise ValueError(
+                "--runner-config is required when --runner-type is specified")
+
+        try:
+            with open(args.runner_config, 'r') as f:
+                runner_config = yaml.safe_load(f)
+        except FileNotFoundError:
+            raise ValueError(
+                f"Runner config file '{args.runner_config}' does not exist.")
+
+        valid_runner_types = set(runner_config.keys())
+        invalid_runners = set(args.runner_type) - valid_runner_types
+        if invalid_runners:
+            raise ValueError(
+                f"Invalid runner type(s): {invalid_runners}. "
+                f"Valid runner types are: {', '.join(sorted(valid_runner_types))}")
+
+    matrix_values = []
+
+    # Convert seq-lens to set of (isl, osl) tuples for filtering
+    seq_lens_filter = None
+    if args.seq_lens:
+        seq_lens_filter = {seq_len_stoi[sl] for sl in args.seq_lens}
+
+    for key, val in all_config_data.items():
+        # Filter by model prefix if specified
+        if args.model_prefix:
+            if not any(key.startswith(prefix) for prefix in args.model_prefix):
+                continue
+
+        # Filter by precision if specified
+        if args.precision and val[FIELD_PRECISION] not in args.precision:
+            continue
+
+        # Filter by framework if specified
+        if args.framework and val[FIELD_FRAMEWORK] not in args.framework:
+            continue
+
+        # Filter by runner type if specified
+        if args.runner_type and val[FIELD_RUNNER] not in args.runner_type:
+            continue
+
+        seq_len_configs = val[FIELD_SEQ_LEN_CONFIGS]
+        image = val[FIELD_IMAGE]
+        model = val[FIELD_MODEL]
+        precision = val[FIELD_PRECISION]
+        framework = val[FIELD_FRAMEWORK]
+        runner = val[FIELD_RUNNER]
+        model_code = val[FIELD_MODEL_PREFIX]
+
+        for seq_config in seq_len_configs:
+            isl = seq_config[FIELD_ISL]
+            osl = seq_config[FIELD_OSL]
+
+            # Filter by sequence lengths if specified
+            if seq_lens_filter and (isl, osl) not in seq_lens_filter:
+                continue
+
+            bmk_space = seq_config[FIELD_SEARCH_SPACE]
+
+            if args.test_mode:
+                # In test mode, use highest TP with lowest concurrency
+                highest_tp_bmk = max(bmk_space, key=lambda x: x[FIELD_TP])
+                tp = highest_tp_bmk[FIELD_TP]
+                conc = highest_tp_bmk[FIELD_CONC_START]
+                ep = highest_tp_bmk.get(FIELD_EP)
+                dp_attn = highest_tp_bmk.get(FIELD_DP_ATTN)
+
+                seq_len_str = seq_len_to_str(isl, osl)
+                entry = {
+                    FIELD_IMAGE: image,
+                    FIELD_MODEL: model,
+                    FIELD_PRECISION: precision,
+                    FIELD_FRAMEWORK: framework,
+                    FIELD_RUNNER: runner,
+                    FIELD_ISL: isl,
+                    FIELD_OSL: osl,
+                    FIELD_TP: tp,
+                    FIELD_EP: 1,  # Default
+                    FIELD_DP_ATTN: False,  # Default
+                    FIELD_CONC: conc,
+                    FIELD_MAX_MODEL_LEN: isl + osl + 200,
+                    FIELD_EXP_NAME: f"{model_code}_{seq_len_str}",
+                }
+
+                if ep is not None:
+                    entry[FIELD_EP] = ep
+                if dp_attn is not None:
+                    entry[FIELD_DP_ATTN] = dp_attn
+
+                matrix_values.append(entry)
+            else:
+                # Full sweep mode
+                for bmk in bmk_space:
+                    tp = bmk[FIELD_TP]
+                    conc_start = bmk[FIELD_CONC_START]
+                    conc_end = bmk[FIELD_CONC_END]
+                    ep = bmk.get(FIELD_EP)
+                    dp_attn = bmk.get(FIELD_DP_ATTN)
+
+                    conc = conc_start
+                    while conc <= conc_end:
+                        seq_len_str = seq_len_to_str(isl, osl)
+                        entry = {
+                            FIELD_IMAGE: image,
+                            FIELD_MODEL: model,
+                            FIELD_PRECISION: precision,
+                            FIELD_FRAMEWORK: framework,
+                            FIELD_RUNNER: runner,
+                            FIELD_ISL: isl,
+                            FIELD_OSL: osl,
+                            FIELD_TP: tp,
+                            FIELD_CONC: conc,
+                            FIELD_MAX_MODEL_LEN: isl + osl + 200,
+                            FIELD_EP: 1,  # Default
+                            FIELD_DP_ATTN: False,  # Default
+                            FIELD_EXP_NAME: f"{model_code}_{seq_len_str}",
+                        }
+
+                        if ep is not None:
+                            entry[FIELD_EP] = ep
+                        if dp_attn is not None:
+                            entry[FIELD_DP_ATTN] = dp_attn
+
+                        matrix_values.append(entry)
+
+                        if conc == conc_end:
+                            break
+                        conc *= args.step_size
+                        if conc > conc_end:
+                            conc = conc_end
+
+    if len(matrix_values) == 0:
+        error_msg = "No configs found matching filters:"
+        if args.model_prefix:
+            error_msg += f" model-prefix={args.model_prefix}"
+        if args.precision:
+            error_msg += f" precision={args.precision}"
+        if args.framework:
+            error_msg += f" framework={args.framework}"
+        if args.runner_type:
+            error_msg += f" runner-type={args.runner_type}"
+        if seq_lens_filter:
+            error_msg += f" seq-lens={args.seq_lens}"
+        raise ValueError(error_msg)
+
+    return matrix_values
+
+
+def generate_test_config(args, all_config_data):
+    """Generate test configurations for a specific key.
+
+    Assumes all_config_data has been validated by validate_config_structure().
+    """
+    try:
+        with open(args.runner_config, 'r') as f:
+            runner_config = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise ValueError(
+            f"Runner config file '{args.runner_config}' does not exist.")
+
+    val = all_config_data.get(args.key)
+
+    if not val:
+        raise ValueError(
+            f"Specified key '{args.key}' does not exist in config files.")
+
+    # Extract model code from config
+    model_code = val[FIELD_MODEL_PREFIX]
+
+    runner_nodes = runner_config.get(val[FIELD_RUNNER], [])
+    if args.runner_node and args.runner_node not in runner_nodes:
+        raise ValueError(
+            f"Runner node '{args.runner_node}' is not compatible with config '{args.key}' which runs on runner type '{val[FIELD_RUNNER]}'. Available runner nodes for this config are '{', '.join(runner_nodes)}'.")
+
+    seq_len_configs = val[FIELD_SEQ_LEN_CONFIGS]
+    image = val[FIELD_IMAGE]
+    model = val[FIELD_MODEL]
+    precision = val[FIELD_PRECISION]
+    framework = val[FIELD_FRAMEWORK]
+    # Use default runner or specific runner node if input by user
+    runner = val[FIELD_RUNNER] if not args.runner_node else args.runner_node
+
+    # Convert seq-lens to set of (isl, osl) tuples for filtering
+    seq_lens_filter = None
+    if args.seq_lens:
+        seq_lens_filter = {seq_len_stoi[sl] for sl in args.seq_lens}
+
+    matrix_values = []
+
+    # Process each sequence length configuration
+    for seq_config in seq_len_configs:
+        isl = seq_config[FIELD_ISL]
+        osl = seq_config[FIELD_OSL]
+
+        # Filter by sequence lengths if specified
+        if seq_lens_filter and (isl, osl) not in seq_lens_filter:
+            continue
+
+        bmk_space = seq_config[FIELD_SEARCH_SPACE]
+
+        for bmk in bmk_space:
+            tp = bmk[FIELD_TP]
+            conc_start = bmk[FIELD_CONC_START]
+            conc_end = bmk[FIELD_CONC_END]
+            ep = bmk.get(FIELD_EP)
+            dp_attn = bmk.get(FIELD_DP_ATTN)
+
+            # In test mode, only use the lowest concurrency (conc_start)
+            if args.test_mode:
+                entry = {
+                    FIELD_IMAGE: image,
+                    FIELD_MODEL: model,
+                    FIELD_PRECISION: precision,
+                    FIELD_FRAMEWORK: framework,
+                    FIELD_RUNNER: runner,
+                    FIELD_ISL: isl,
+                    FIELD_OSL: osl,
+                    FIELD_TP: tp,
+                    FIELD_EP: 1,  # Default
+                    FIELD_DP_ATTN: False,  # Default
+                    FIELD_CONC: conc_start,
+                    FIELD_MAX_MODEL_LEN: isl + osl,
+                    FIELD_EXP_NAME: f"{model_code}_test",
+                }
+
+                # Add optional fields if they exist
+                if ep is not None:
+                    entry[FIELD_EP] = ep
+                if dp_attn is not None:
+                    entry[FIELD_DP_ATTN] = dp_attn
+
+                matrix_values.append(entry)
+            else:
+                # Generate entries for each concurrency value in the range
+                conc = conc_start
+                while conc <= conc_end:
+                    seq_len_str = seq_len_to_str(isl, osl)
+                    entry = {
+                        FIELD_IMAGE: image,
+                        FIELD_MODEL: model,
+                        FIELD_PRECISION: precision,
+                        FIELD_FRAMEWORK: framework,
+                        FIELD_RUNNER: runner,
+                        FIELD_ISL: isl,
+                        FIELD_OSL: osl,
+                        FIELD_TP: tp,
+                        FIELD_EP: 1,  # Default
+                        FIELD_DP_ATTN: False,  # Default
+                        FIELD_CONC: conc,
+                        FIELD_MAX_MODEL_LEN: isl + osl,
+                        FIELD_EXP_NAME: f"{model_code}_{seq_len_str}",
+                    }
+
+                    # Add optional fields if they exist
+                    if ep is not None:
+                        entry[FIELD_EP] = ep
+                    if dp_attn is not None:
+                        entry[FIELD_DP_ATTN] = dp_attn
+
+                    matrix_values.append(entry)
+
+                    if conc == conc_end:
+                        break
+                    conc *= args.step_size
+                    if conc > conc_end:
+                        conc = conc_end
+
+    return matrix_values
+
+
+def generate_runner_model_sweep_config(args, all_config_data):
+    """Generate runner-model sweep configurations.
+
+    Assumes all_config_data has been validated by validate_config_structure().
+    """
+    try:
+        with open(args.runner_config, 'r') as f:
+            runner_config = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise ValueError(
+            f"Runner config file '{args.runner_config}' does not exist.")
+
+    runner_nodes = runner_config.get(args.runner_type)
+
+    if not runner_nodes:
+        raise ValueError(
+            f"Runner '{args.runner_type}' does not exist in runner config '{args.runner_config}'. Must choose from existing runner types: '{', '.join(runner_config.keys())}'.")
+
+    matrix_values = []
+    for key, val in all_config_data.items():
+        # Only consider configs with specified runner
+        if val[FIELD_RUNNER] != args.runner_type:
+            continue
+
+        # Get model code for exp_name
+        model_code = val[FIELD_MODEL_PREFIX]
+
+        # Find 1k1k config
+        target_config = None
+        for config in val[FIELD_SEQ_LEN_CONFIGS]:
+            if config[FIELD_ISL] == 1024 and config[FIELD_OSL] == 1024:
+                target_config = config
+                break
+
+        highest_tp_bmk = max(target_config[FIELD_SEARCH_SPACE], key=lambda x: x[FIELD_TP])
+        # Since we are just testing, pick the highest TP for this config and just test
+        # on that TP with the lowest concurrency available
+        highest_tp = highest_tp_bmk[FIELD_TP]
+        lowest_conc = highest_tp_bmk[FIELD_CONC_START]
+
+        ep = highest_tp_bmk.get(FIELD_EP)
+        dp_attn = highest_tp_bmk.get(FIELD_DP_ATTN)
+
+        for node in runner_nodes:
+            entry = {
+                FIELD_IMAGE: val[FIELD_IMAGE],
+                FIELD_MODEL: val[FIELD_MODEL],
+                FIELD_PRECISION: val[FIELD_PRECISION],
+                FIELD_FRAMEWORK: val[FIELD_FRAMEWORK],
+                # Add one entry for each node under specified runner type
+                FIELD_RUNNER: node,
+                # Again, just use 1k1k since this is just meant to smoke test all runners
+                FIELD_ISL: 1024,
+                FIELD_OSL: 1024,
+                FIELD_TP: highest_tp,
+                FIELD_EP: 1,  # Default
+                FIELD_DP_ATTN: False,  # Default
+                FIELD_CONC: lowest_conc,
+                FIELD_MAX_MODEL_LEN: 2048,
+                FIELD_EXP_NAME: f"{model_code}_test",
+            }
+
+            # Add optional fields if they exist
+            if ep is not None:
+                entry[FIELD_EP] = ep
+            if dp_attn is not None:
+                entry[FIELD_DP_ATTN] = dp_attn
+
+            matrix_values.append(entry)
+
+    return matrix_values
+
+
+def generate_custom_test(args):
+    """Generate single 1k1k job for custom inputs.
+    """
+    try:
+        with open(args.runner_config, 'r') as f:
+            runner_config = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise ValueError(
+            f"Runner config file '{args.runner_config}' does not exist.")
+
+    found_runner_label = False
+    for runner_type, runner_nodes in runner_config.items():
+        if args.runner_label == runner_type or args.runner_label in runner_nodes:
+            found_runner_label = True
+
+    if not found_runner_label:
+        raise ValueError(f"Unable to find specified runner label '{args.runner_label}'.")
+
+    if not runner_nodes:
+        raise ValueError(
+            f"Runner '{args.runner_type}' does not exist in runner config '{args.runner_config}'. Must choose from existing runner types: '{', '.join(runner_config.keys())}'.")
+
+    return [
+        {
+            FIELD_IMAGE: args.image,
+            FIELD_MODEL: args.model,
+            FIELD_PRECISION: args.precision,
+            FIELD_FRAMEWORK: args.framework,
+            FIELD_RUNNER: args.runner_label,
+            # Again, just use 1k1k since this is just meant to smoke test all runners
+            FIELD_ISL: 1024,
+            FIELD_OSL: 1024,
+            FIELD_TP: 8,
+            FIELD_EP: 1,
+            FIELD_DP_ATTN: False,
+            FIELD_CONC: 4,
+            FIELD_EXP_NAME: args.exp_name,
+            FIELD_MAX_MODEL_LEN: 2048,
+        }
+    ]
+
+
+def generate_runner_sweep_config(args, all_config_data):
+    """Generate runner sweep configurations.
+
+    Assumes all_config_data has been validated by validate_config_structure().
+    """
+    try:
+        with open(args.runner_config, 'r') as f:
+            runner_config = yaml.safe_load(f)
+    except FileNotFoundError as e:
+        raise ValueError(
+            f"Runner config file '{args.runner_config}' does not exist.")
+
+    if not runner_config.get(args.runner_type):
+        raise ValueError(
+            f"Runner '{args.runner_type}' does not exist in runner config '{args.runner_config}'. Must choose from existing runner types: '{', '.join(runner_config.keys())}'.")
+
+
+    matrix_values = []
+    for key, val in all_config_data.items():
+        # Only consider configs with specified runner
+        if not key.startswith(args.model_prefix):
+            continue
+
+        if not val[FIELD_RUNNER] == args.runner_type:
+            continue
+
+        # Optionally filter by precision and framework
+        if (args.precision and val[FIELD_PRECISION] != args.precision) or (args.framework and val[FIELD_FRAMEWORK] != args.framework):
+            continue
+
+        # Get model code for exp_name
+        model_code = val[FIELD_MODEL_PREFIX]
+
+        runner_nodes = runner_config.get(val[FIELD_RUNNER])
+        if not runner_nodes:
+            raise ValueError(
+                f"Runner '{val[FIELD_RUNNER]}' does not exist in runner config '{args.runner_config}'. Must choose from existing runner types: '{', '.join(runner_config.keys())}'.")
+
+        # Find 1k1k config
+        target_config = None
+        for config in val[FIELD_SEQ_LEN_CONFIGS]:
+            if config[FIELD_ISL] == 1024 and config[FIELD_OSL] == 1024:
+                target_config = config
+                break
+
+        highest_tp_bmk = max(target_config[FIELD_SEARCH_SPACE], key=lambda x: x[FIELD_TP])
+        # Since we are just testing, pick the highest TP for this config and just test
+        # on that TP with the lowest concurrency available
+        highest_tp = highest_tp_bmk[FIELD_TP]
+        lowest_conc = highest_tp_bmk[FIELD_CONC_START]
+
+        ep = highest_tp_bmk.get(FIELD_EP)
+        dp_attn = highest_tp_bmk.get(FIELD_DP_ATTN)
+
+        for node in runner_nodes:
+            entry = {
+                FIELD_IMAGE: val[FIELD_IMAGE],
+                FIELD_MODEL: val[FIELD_MODEL],
+                FIELD_PRECISION: val[FIELD_PRECISION],
+                FIELD_FRAMEWORK: val[FIELD_FRAMEWORK],
+                # Add one entry for each node under specified runner type
+                FIELD_RUNNER: node,
+                # Again, just use 1k1k since this is just meant to smoke test all runners
+                FIELD_ISL: 1024,
+                FIELD_OSL: 1024,
+                FIELD_TP: highest_tp,
+                FIELD_EP: 1,  # Default
+                FIELD_DP_ATTN: False,  # Default
+                FIELD_CONC: lowest_conc,
+                FIELD_EXP_NAME: f"{model_code}_test",
+                FIELD_MAX_MODEL_LEN: 2048,
+            }
+
+            # Add optional fields if they exist
+            if ep is not None:
+                entry[FIELD_EP] = ep
+            if dp_attn is not None:
+                entry[FIELD_DP_ATTN] = dp_attn
+
+            matrix_values.append(entry)
+
+    if len(matrix_values) == 0:
+        error_msg = f"No configs found matching model prefix '{args.model_prefix}'"
+        if args.precision:
+            error_msg += f", precision '{args.precision}'"
+        if args.framework:
+            error_msg += f", framework '{args.framework}'"
+        raise ValueError(error_msg + ".")
+
+    return matrix_values
+
+
+def load_config_files(config_files):
+    """Load and merge configuration files."""
+    all_config_data = {}
+    for config_file in config_files:
+        try:
+            with open(config_file, 'r') as f:
+                config_data = yaml.safe_load(f)
+                assert isinstance(
+                    config_data, dict), f"Config file '{config_file}' must contain a dictionary"
+
+                # Check for duplicate keys, this is only in place to prevent against the very unlikely
+                # case where an entry in one config accidentally/purposefully tries to override an entry in another config
+                duplicate_keys = set(all_config_data.keys()) & set(
+                    config_data.keys())
+                if duplicate_keys:
+                    raise ValueError(
+                        f"Duplicate configuration keys found in '{config_file}': {', '.join(sorted(duplicate_keys))}"
+                    )
+
+                all_config_data.update(config_data)
+        except FileNotFoundError:
+            raise ValueError(f"Input file '{config_file}' does not exist.")
+
+    return all_config_data
+
+
+def main():
+    # Create parent parser with common arguments
+    parent_parser = argparse.ArgumentParser(add_help=False)
+    parent_parser.add_argument(
+        '--config-files',
+        nargs='+',
+        required=True,
+        help='One or more configuration files (YAML format)'
+    )
+
+    # Create main parser
+    parser = argparse.ArgumentParser(
+        description='Generate benchmark configurations from YAML config files'
+    )
+
+    # Create subparsers for subcommands
+    subparsers = parser.add_subparsers(
+        dest='command',
+        required=True,
+        help='Available commands'
+    )
+
+    # Subcommand: full-sweep
+    full_sweep_parser = subparsers.add_parser(
'full-sweep', + parents=[parent_parser], + add_help=False, + help='Generate full sweep configurations with optional filtering by model, precision, framework, runner type, and sequence lengths' + ) + full_sweep_parser.add_argument( + '--model-prefix', + nargs='+', + required=False, + help='Model prefix(es) to filter configurations (optional, can specify multiple)' + ) + full_sweep_parser.add_argument( + '--precision', + nargs='+', + required=False, + help='Precision(s) to filter by (e.g., fp4, fp8) (optional, can specify multiple)' + ) + full_sweep_parser.add_argument( + '--framework', + nargs='+', + required=False, + help='Framework(s) to filter by (e.g., vllm, trt, sglang) (optional, can specify multiple)' + ) + full_sweep_parser.add_argument( + '--runner-type', + nargs='+', + required=False, + help='Runner type(s) to filter by (e.g., h200, h100) (optional, can specify multiple)' + ) + full_sweep_parser.add_argument( + '--runner-config', + required=False, + help='Configuration file holding runner information (required if --runner-type is specified)' + ) + full_sweep_parser.add_argument( + '--seq-lens', + nargs='+', + choices=list(seq_len_stoi.keys()), + required=False, + help=f"Sequence length configurations to include: {', '.join(seq_len_stoi.keys())}. If not specified, all sequence lengths are included." + ) + full_sweep_parser.add_argument( + '--step-size', + type=int, + default=2, + help='Step size for concurrency values (default: 2)' + ) + full_sweep_parser.add_argument( + '--test-mode', + action='store_true', + help='Test mode: only run highest TP with lowest concurrency for each matching config' + ) + full_sweep_parser.add_argument( + '-h', '--help', + action='help', + help='Show this help message and exit' + ) + + # Subcommand: test-config + test_config_parser = subparsers.add_parser( + 'test-config', + parents=[parent_parser], + add_help=False, + help='Given a config key, run that configuration as specified. 
Optionally specify --test-mode to only run one parallelism-concurrency pair for the config.' + ) + test_config_parser.add_argument( + '--runner-config', + required=True, + help='Configuration file holding runner information' + ) + test_config_parser.add_argument( + '--key', + required=True, + help='Configuration key to use' + ) + test_config_parser.add_argument( + '--runner-node', + required=False, + help='Specific runner node to use' + ) + test_config_parser.add_argument( + '--seq-lens', + nargs='+', + choices=list(seq_len_stoi.keys()), + required=False, + help=f"Sequence length configurations to include: {', '.join(seq_len_stoi.keys())}. If not specified, all sequence lengths are included." + ) + test_config_parser.add_argument( + '--step-size', + type=int, + default=2, + help='Step size for concurrency values (default: 2)' + ) + test_config_parser.add_argument( + '--test-mode', + action='store_true', + help='Generate only the lowest concurrency value for each TP level' + ) + test_config_parser.add_argument( + '-h', '--help', + action='help', + help='Show this help message and exit' + ) + + # Subcommand: runner-model-sweep + test_config_parser = subparsers.add_parser( + 'runner-model-sweep', + parents=[parent_parser], + add_help=False, + help='Given a runner type, find all configurations matching the type, and run that configuration on all individual runner nodes for the specified runner type. This is meant to validate that all runner nodes work on all configurations for a runner type. For instance, to validate that all configs that specify an h200 runner successfully run across all h200 runner nodes.' 
+ ) + test_config_parser.add_argument( + '--runner-type', + required=True, + help='Runner type (e.g., h200-trt, h100)' + ) + test_config_parser.add_argument( + '--runner-config', + required=True, + help='Configuration file holding runner information' + ) + test_config_parser.add_argument( + '-h', '--help', + action='help', + help='Show this help message and exit' + ) + + # Subcommand: runner-sweep + test_config_parser = subparsers.add_parser( + 'runner-sweep', + parents=[parent_parser], + add_help=False, + help='Given a model (and optionally a precision and framework), find all configurations matching the inputs, and run those configurations across all compatible runner nodes. This is meant to validate all runner nodes that should run a particular model can. For instance, this should be used to validate that all runners nodes that should run gptoss-120b actually do so successfully.' + ) + test_config_parser.add_argument( + '--runner-type', + required=True, + help='Runner type (e.g., h200-trt, h100)' + ) + test_config_parser.add_argument( + '--model-prefix', + required=True, + help='Model prefix (e.g., 70b)' + ) + test_config_parser.add_argument( + '--precision', + required=False, + help='Precision to filter by (e.g., fp4) (optional)' + ) + test_config_parser.add_argument( + '--framework', + required=False, + help='Framework to filter by (e.g., trt) (optional)' + ) + test_config_parser.add_argument( + '--runner-config', + required=True, + help='Configuration file holding runner information' + ) + test_config_parser.add_argument( + '-h', '--help', + action='help', + help='Show this help message and exit' + ) + + # Subcommand: custom + test_config_parser = subparsers.add_parser( + 'custom', + parents=[parent_parser], + add_help=False, + help='Enter custom values' + ) + test_config_parser.add_argument( + '--runner-label', + required=True, + help='Label associated with runner on which to launch the corresponding job (e.g., h200, h200-nv_1, etc.)' + ) + 
test_config_parser.add_argument(
+        '--image',
+        required=True,
+        help='Image to run the benchmark (e.g., vllm/vllm-openai:latest)'
+    )
+    test_config_parser.add_argument(
+        '--model',
+        required=True,
+        help='Model to run (e.g., openai/gpt-oss-120b)'
+    )
+    test_config_parser.add_argument(
+        '--framework',
+        required=True,
+        help='Framework to run on (e.g., vllm, trt, sglang)'
+    )
+    test_config_parser.add_argument(
+        '--precision',
+        required=True,
+        help='Precision to run (e.g., fp4, fp8)'
+    )
+    test_config_parser.add_argument(
+        '--exp-name',
+        required=True,
+        help='Experiment name (e.g., 70b_test)'
+    )
+    test_config_parser.add_argument(
+        '--runner-config',
+        required=True,
+        help='Configuration file holding runner information'
+    )
+    test_config_parser.add_argument(
+        '-h', '--help',
+        action='help',
+        help='Show this help message and exit'
+    )
+
+    args = parser.parse_args()
+
+    # Load and validate configuration files
+    all_config_data = load_config_files(args.config_files)
+    validate_master_configs_structure(all_config_data)
+
+    # Route to appropriate function based on subcommand
+    if args.command == 'full-sweep':
+        matrix_values = generate_full_sweep(args, all_config_data)
+    elif args.command == 'test-config':
+        matrix_values = generate_test_config(args, all_config_data)
+    elif args.command == 'runner-model-sweep':
+        matrix_values = generate_runner_model_sweep_config(
+            args, all_config_data)
+    elif args.command == 'runner-sweep':
+        matrix_values = generate_runner_sweep_config(
+            args, all_config_data)
+    elif args.command == 'custom':
+        matrix_values = generate_custom_test(args)
+    else:
+        parser.error(f"Unknown command: {args.command}")
+
+    # Validate output before printing
+    validate_matrix_output(matrix_values)
+
+    print(json.dumps(matrix_values))
+    return matrix_values
+
+
+if __name__ == "__main__":
+    main()
diff --git a/utils/matrix-logic/get_test_sweep_configs.py b/utils/matrix-logic/get_test_sweep_configs.py
new file mode 100644
index 
000000000..87ab0457b --- /dev/null +++ b/utils/matrix-logic/get_test_sweep_configs.py @@ -0,0 +1,151 @@ +import json +import yaml +import sys +import argparse + +seq_len_stoi = { + "1k1k": (1024, 1024), + "1k8k": (1024, 8192), + "8k1k": (8192, 1024) +} + +def main(): + parser = argparse.ArgumentParser( + description='Generate benchmark matrix from a specific configuration key' + ) + parser.add_argument( + '--config-files', + nargs='+', + required=True, + help='One or more configuration files (YAML format)' + ) + parser.add_argument( + '--key', + required=True, + help='Configuration key to use' + ) + parser.add_argument( + '--seq-lens', + nargs='+', + choices=list(seq_len_stoi.keys()), + required=False, + help=f"Sequence length configurations to include: {', '.join(seq_len_stoi.keys())}. If not specified, all sequence lengths are included." + ) + parser.add_argument( + '--step-size', + type=int, + default=2, + help='Step size for concurrency values (default: 2)' + ) + + args = parser.parse_args() + + # Convert seq-lens to set of (isl, osl) tuples for filtering + seq_lens_filter = None + if args.seq_lens: + seq_lens_filter = {seq_len_stoi[sl] for sl in args.seq_lens} + + # Load and merge all config files + all_config_data = {} + for config_file in args.config_files: + try: + with open(config_file, 'r') as f: + config_data = yaml.safe_load(f) + assert isinstance(config_data, dict), f"Config file '{config_file}' must contain a dictionary" + + # Check for duplicate keys + duplicate_keys = set(all_config_data.keys()) & set(config_data.keys()) + if duplicate_keys: + raise ValueError( + f"Duplicate configuration keys found in '{config_file}': {', '.join(sorted(duplicate_keys))}" + ) + + all_config_data.update(config_data) + except FileNotFoundError: + raise ValueError(f"Input file '{config_file}' does not exist.") + + # Check if the key exists + if args.key not in all_config_data: + available_keys = ', '.join(sorted(all_config_data.keys())) + raise ValueError( + f"Key 
'{args.key}' not found in configuration files. " + f"Available keys: {available_keys}" + ) + + val = all_config_data[args.key] + + # Validate required fields + seq_len_configs = val.get('seq-len-configs') + assert seq_len_configs, f"Missing 'seq-len-configs' for key '{args.key}'" + + image = val.get('image') + model = val.get('model') + precision = val.get('precision') + framework = val.get('framework') + runner = val.get('runner') + + assert None not in (image, model, precision, framework, runner), \ + f"Missing required fields (image, model, precision, framework, runner) for key '{args.key}'" + + matrix_values = [] + + # Process each sequence length configuration + for seq_config in seq_len_configs: + isl = seq_config.get('isl') + osl = seq_config.get('osl') + + assert None not in (isl, osl), \ + f"Missing 'isl' or 'osl' in seq-len-config for key '{args.key}'" + + # Filter by sequence lengths if specified + if seq_lens_filter and (isl, osl) not in seq_lens_filter: + continue + + bmk_space = seq_config.get('bmk-space') + assert bmk_space, f"Missing 'bmk-space' in seq-len-config for key '{args.key}'" + + for bmk in bmk_space: + tp = bmk.get('tp') + conc_start = bmk.get('conc-start') + conc_end = bmk.get('conc-end') + ep = bmk.get('ep') + dp_attn = bmk.get('dp-attn') + + assert None not in (tp, conc_start, conc_end), \ + f"Missing 'tp', 'conc-start', or 'conc-end' in bmk-space for key '{args.key}'" + + # Generate entries for each concurrency value in the range + conc = conc_start + while conc <= conc_end: + entry = { + 'image': image, + 'model': model, + 'precision': precision, + 'framework': framework, + 'runner': runner, + 'isl': isl, + 'osl': osl, + 'tp': tp, + 'conc': conc, + 'max-model-len': isl + osl, + } + + # Add optional fields if they exist + if ep is not None: + entry['ep'] = ep + if dp_attn is not None: + entry['dp-attn'] = dp_attn + + matrix_values.append(entry) + + if conc == conc_end: + break + conc *= args.step_size + if conc > conc_end: + conc = 
conc_end + + print(json.dumps(matrix_values)) + return matrix_values + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/utils/matrix-logic/pytest.ini b/utils/matrix-logic/pytest.ini new file mode 100644 index 000000000..c3cd9aac7 --- /dev/null +++ b/utils/matrix-logic/pytest.ini @@ -0,0 +1,12 @@ +[pytest] +testpaths = . +python_files = test_*.py +python_classes = Test* +python_functions = test_* +addopts = + -v + --strict-markers + --tb=short +markers = + slow: marks tests as slow (deselect with '-m "not slow"') + integration: marks tests as integration tests diff --git a/utils/matrix-logic/test_generate_sweep_configs.py b/utils/matrix-logic/test_generate_sweep_configs.py new file mode 100644 index 000000000..15c5f25a3 --- /dev/null +++ b/utils/matrix-logic/test_generate_sweep_configs.py @@ -0,0 +1,1573 @@ +import pytest +import yaml +from unittest.mock import patch +from generate_sweep_configs import ( + validate_master_configs_structure, + validate_matrix_output, + seq_len_to_str, + generate_full_sweep, + generate_test_config, + generate_runner_model_sweep_config, + generate_runner_sweep_config, + generate_custom_test, + load_config_files, + main, + MatrixEntry, +) + + +# Fixtures for test config files +@pytest.fixture +def sample_master_config(): + """Sample master config with valid entries.""" + return { + "70b-fp8-vllm": { + "image": "vllm/vllm-openai:v0.10.2", + "model": "meta-llama/Llama-3-70b", + "model-prefix": "70b", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 4, "conc-start": 1, "conc-end": 4}, + {"tp": 8, "conc-start": 2, "conc-end": 8, "ep": 2, "dp-attn": True} + ] + }, + { + "isl": 1024, + "osl": 8192, + "search-space": [ + {"tp": 8, "conc-start": 1, "conc-end": 2} + ] + } + ] + }, + "8b-fp4-trt": { + "image": "nvcr.io/nvidia/tritonserver:24.01", + "model": "meta-llama/Llama-3-8b", + "model-prefix": "8b", + 
"precision": "fp4", + "framework": "trt", + "runner": "h100", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 2, "conc-start": 4, "conc-end": 16} + ] + } + ] + }, + "gptoss-120b-fp8-vllm": { + "image": "vllm/vllm-openai:latest", + "model": "openai/gpt-oss-120b", + "model-prefix": "gptoss", + "precision": "fp8", + "framework": "vllm", + "runner": "h200-trt", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 8, "conc-start": 1, "conc-end": 4} + ] + } + ] + } + } + + +@pytest.fixture +def sample_runner_config(): + """Sample runner config.""" + return { + "h200": ["h200-nv_1", "h200-nv_2"], + "h100": ["h100-aws_1"], + "h200-trt": ["h200-trt_1", "h200-trt_2", "h200-trt_3"] + } + + +@pytest.fixture +def temp_config_files(tmp_path, sample_master_config, sample_runner_config): + """Create temporary config files.""" + master_file = tmp_path / "master.yaml" + runner_file = tmp_path / "runners.yaml" + + with open(master_file, 'w') as f: + yaml.dump(sample_master_config, f) + + with open(runner_file, 'w') as f: + yaml.dump(sample_runner_config, f) + + return str(master_file), str(runner_file) + + +@pytest.fixture +def invalid_master_config(): + """Master config with validation errors.""" + return { + "missing-field": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + # Missing precision, framework, runner, seq-len-configs + } + } + + +# Tests for seq_len_to_str +def test_seq_len_to_str_with_mapping(): + """Test seq_len_to_str with known mappings.""" + assert seq_len_to_str(1024, 1024) == "1k1k" + assert seq_len_to_str(1024, 8192) == "1k8k" + assert seq_len_to_str(8192, 1024) == "8k1k" + + +def test_seq_len_to_str_without_mapping(): + """Test seq_len_to_str fallback for unknown mappings.""" + assert seq_len_to_str(2048, 4096) == "2048_4096" + assert seq_len_to_str(512, 512) == "512_512" + + +# Tests for MatrixEntry validation +def test_matrix_entry_valid(): + """Test valid 
MatrixEntry.""" + entry = { + "image": "test:latest", + "model": "test/model", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "isl": 1024, + "osl": 1024, + "tp": 8, + "ep": 1, + "dp-attn": False, + "conc": 4, + "max-model-len": 2048, + "exp-name": "test_exp" + } + result = MatrixEntry(**entry) + assert result.image == "test:latest" + assert result.tp == 8 + + +def test_matrix_entry_missing_field(): + """Test MatrixEntry with missing required field.""" + entry = { + "image": "test:latest", + "model": "test/model", + # Missing other required fields + } + with pytest.raises(Exception): # Pydantic ValidationError + MatrixEntry(**entry) + + +def test_matrix_entry_wrong_type(): + """Test MatrixEntry with wrong type.""" + entry = { + "image": "test:latest", + "model": "test/model", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "isl": "not_an_int", # Wrong type + "osl": 1024, + "tp": 8, + "ep": 1, + "dp-attn": False, + "conc": 4, + "max-model-len": 2048, + "exp-name": "test_exp" + } + with pytest.raises(Exception): # Pydantic ValidationError + MatrixEntry(**entry) + + +def test_matrix_entry_extra_field(): + """Test MatrixEntry with extra field (should be forbidden).""" + entry = { + "image": "test:latest", + "model": "test/model", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "isl": 1024, + "osl": 1024, + "tp": 8, + "ep": 1, + "dp-attn": False, + "conc": 4, + "max-model-len": 2048, + "exp-name": "test_exp", + "extra-field": "should_fail" + } + with pytest.raises(Exception): # Pydantic ValidationError + MatrixEntry(**entry) + + +# Tests for validate_matrix_output +def test_validate_matrix_output_valid(): + """Test validate_matrix_output with valid entries.""" + entries = [ + { + "image": "test:latest", + "model": "test/model", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "isl": 1024, + "osl": 1024, + "tp": 8, + "ep": 1, + "dp-attn": False, + "conc": 4, + "max-model-len": 2048, + "exp-name": 
"test_exp" + } + ] + result = validate_matrix_output(entries) + assert result == entries + + +def test_validate_matrix_output_invalid(): + """Test validate_matrix_output with invalid entry.""" + entries = [ + { + "image": "test:latest", + "model": "test/model", + # Missing required fields + } + ] + with pytest.raises(ValueError, match="Matrix entry at index 0 failed validation"): + validate_matrix_output(entries) + + +def test_validate_matrix_output_multiple_entries(): + """Test validate_matrix_output with multiple entries.""" + entries = [ + { + "image": "test:latest", + "model": "test/model", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "isl": 1024, + "osl": 1024, + "tp": 8, + "ep": 1, + "dp-attn": False, + "conc": 4, + "max-model-len": 2048, + "exp-name": "test_exp" + }, + { + "image": "test2:latest", + "model": "test2/model", + "precision": "fp4", + "framework": "trt", + "runner": "h100", + "isl": 1024, + "osl": 1024, + "tp": 4, + "ep": 2, + "dp-attn": True, + "conc": 8, + "max-model-len": 2048, + "exp-name": "test_exp2" + } + ] + result = validate_matrix_output(entries) + assert len(result) == 2 + + +# Tests for validate_master_configs_structure +def test_validate_master_configs_structure_valid(sample_master_config): + """Test validation of valid master config.""" + validate_master_configs_structure(sample_master_config) + + +def test_validate_master_configs_structure_missing_field(): + """Test validation with missing required field.""" + config = { + "test-key": { + "image": "test:latest", + "model-prefix": "test", + # Missing other required fields + } + } + with pytest.raises(ValueError, match="Missing required field"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_type(): + """Test validation with wrong field type.""" + config = { + "test-key": { + "image": 123, # Should be string + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + 
"runner": "h200", + "seq-len-configs": [] + } + } + with pytest.raises(ValueError, match="must be str"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_empty_seq_len_configs(): + """Test validation with empty seq-len-configs.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [] + } + } + with pytest.raises(ValueError, match="must be a non-empty list"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_invalid_search_space(): + """Test validation with invalid search-space.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 8} # Missing conc-start and conc-end + ] + } + ] + } + } + with pytest.raises(ValueError, match="Missing 'conc-start'"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_missing_search_space(): + """Test validation with missing search-space.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024 + # Missing search-space + } + ] + } + } + with pytest.raises(ValueError, match="Missing or invalid 'search-space'"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_search_space_not_list(): + """Test validation with search-space not being a list.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + 
"osl": 1024, + "search-space": "not_a_list" + } + ] + } + } + with pytest.raises(ValueError, match="Missing or invalid 'search-space'"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_extra_fields_in_search_space(): + """Test validation with extra fields in search-space.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + { + "tp": 8, + "conc-start": 1, + "conc-end": 4, + "invalid-field": "value" + } + ] + } + ] + } + } + with pytest.raises(ValueError, match="Extra fields"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_missing_isl(): + """Test validation with missing isl.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="Missing 'isl'"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_isl_type(): + """Test validation with wrong isl type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": "not_int", + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="'isl' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_missing_osl(): + """Test validation with missing osl.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": 
"test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="Missing 'osl'"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_osl_type(): + """Test validation with wrong osl type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": "not_int", + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="'osl' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_tp_type(): + """Test validation with wrong tp type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [{"tp": "not_int", "conc-start": 1, "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="'tp' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_conc_start_type(): + """Test validation with wrong conc-start type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": "not_int", "conc-end": 4}] + } + ] + } + } + with pytest.raises(ValueError, match="'conc-start' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_conc_end_type(): + """Test validation with 
wrong conc-end type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": "not_int"}] + } + ] + } + } + with pytest.raises(ValueError, match="'conc-end' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_ep_type(): + """Test validation with wrong ep type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4, "ep": "not_int"}] + } + ] + } + } + with pytest.raises(ValueError, match="'ep' must be int"): + validate_master_configs_structure(config) + + +def test_validate_master_configs_structure_wrong_dp_attn_type(): + """Test validation with wrong dp-attn type.""" + config = { + "test-key": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [{"tp": 8, "conc-start": 1, "conc-end": 4, "dp-attn": "not_bool"}] + } + ] + } + } + with pytest.raises(ValueError, match="'dp-attn' must be bool"): + validate_master_configs_structure(config) + + +# Tests for load_config_files +def test_load_config_files_valid(temp_config_files): + """Test loading valid config files.""" + master_file, _ = temp_config_files + result = load_config_files([master_file]) + assert len(result) == 3 + assert "70b-fp8-vllm" in result + + +def test_load_config_files_multiple(tmp_path, sample_master_config): + """Test loading multiple config files.""" + file1 = tmp_path / "config1.yaml" + file2 = tmp_path / 
"config2.yaml" + + config1 = {"70b-fp8-vllm": sample_master_config["70b-fp8-vllm"]} + config2 = {"8b-fp4-trt": sample_master_config["8b-fp4-trt"]} + + with open(file1, 'w') as f: + yaml.dump(config1, f) + with open(file2, 'w') as f: + yaml.dump(config2, f) + + result = load_config_files([str(file1), str(file2)]) + assert len(result) == 2 + + +def test_load_config_files_not_found(): + """Test loading non-existent config file.""" + with pytest.raises(ValueError, match="does not exist"): + load_config_files(["/nonexistent/file.yaml"]) + + +def test_load_config_files_duplicate_keys(tmp_path, sample_master_config): + """Test loading files with duplicate keys.""" + file1 = tmp_path / "config1.yaml" + file2 = tmp_path / "config2.yaml" + + config1 = {"70b-fp8-vllm": sample_master_config["70b-fp8-vllm"]} + config2 = {"70b-fp8-vllm": sample_master_config["70b-fp8-vllm"]} # Duplicate + + with open(file1, 'w') as f: + yaml.dump(config1, f) + with open(file2, 'w') as f: + yaml.dump(config2, f) + + with pytest.raises(ValueError, match="Duplicate configuration keys"): + load_config_files([str(file1), str(file2)]) + + +# Tests for generate_full_sweep +def test_generate_full_sweep_basic(sample_master_config, temp_config_files): + """Test basic full sweep generation.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + seq_lens = ["1k1k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + assert len(result) > 0 + assert all(entry['exp-name'].startswith('70b_1k1k') for entry in result) + assert all(entry['isl'] == 1024 and entry['osl'] == 1024 for entry in result) + + +def test_generate_full_sweep_with_optionals(sample_master_config, temp_config_files): + """Test full sweep with optional ep and dp-attn.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + seq_lens = ["1k1k"] + step_size = 
2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + # Find entry with tp=8 which should have ep=2 and dp-attn=True + tp8_entries = [e for e in result if e['tp'] == 8] + assert len(tp8_entries) > 0 + assert all(e['ep'] == 2 for e in tp8_entries) + assert all(e['dp-attn'] == True for e in tp8_entries) + + +def test_generate_full_sweep_no_matches(sample_master_config, temp_config_files): + """Test full sweep with no matching configs.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["nonexistent"] + seq_lens = ["1k1k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + with pytest.raises(ValueError, match="No configs found matching filters"): + generate_full_sweep(Args(), sample_master_config) + + +def test_generate_full_sweep_different_seq_len(sample_master_config, temp_config_files): + """Test full sweep with different sequence length.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + seq_lens = ["1k8k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + assert len(result) > 0 + assert all(entry['isl'] == 1024 and entry['osl'] == 8192 for entry in result) + + +def test_generate_full_sweep_step_size(sample_master_config, temp_config_files): + """Test full sweep with different step size.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["8b"] + seq_lens = ["1k1k"] + step_size = 4 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + # Should have entries at conc=4, 8, 16 (step_size=4, conc-start=4, conc-end=16) + conc_values = 
sorted(set(e['conc'] for e in result)) + assert 4 in conc_values + assert 16 in conc_values + + +def test_generate_full_sweep_seq_len_not_in_config(temp_config_files): + """Test full sweep when requested seq-len is not in config.""" + _, runner_file = temp_config_files + + config = { + "test-fp8-vllm": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 8192, + "osl": 1024, # Only has 8k1k, not 1k1k + "search-space": [ + {"tp": 4, "conc-start": 1, "conc-end": 4} + ] + } + ] + } + } + + class Args: + model_prefix = ["test"] + seq_lens = ["1k1k"] # Requesting 1k1k but config only has 8k1k + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + # Should raise error since no matching seq-len + with pytest.raises(ValueError, match="No configs found matching filters"): + generate_full_sweep(Args(), config) + + +def test_generate_full_sweep_concurrency_overshoot(temp_config_files): + """Test full sweep when concurrency step overshoots end value.""" + _, runner_file = temp_config_files + + config = { + "test-fp8-vllm": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 4, "conc-start": 1, "conc-end": 5} # 1, 3*2=6 overshoots, clamps to 5 + ] + } + ] + } + } + + class Args: + model_prefix = ["test"] + seq_lens = ["1k1k"] + step_size = 3 # Will overshoot: 1, 3, 9 (clamped to 5) + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), config) + conc_values = sorted(set(e['conc'] for e in result)) + # Should have 1, 3, 5 (5 is the clamped value) + assert conc_values == [1, 3, 5] + + +# Tests for 
generate_full_sweep with filters +def test_generate_full_sweep_no_filters(sample_master_config, temp_config_files): + """Test filtered sweep with no filters.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = None + precision = None + framework = None + runner_type = None + seq_lens = None + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + assert len(result) > 0 + + +def test_generate_full_sweep_with_filters_model_prefix(sample_master_config, temp_config_files): + """Test filtered sweep with model prefix filter.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + precision = None + framework = None + runner_type = None + seq_lens = None + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + assert all("70b" in entry['exp-name'] for entry in result) + + +def test_generate_full_sweep_with_filters_multiple_filters(sample_master_config, temp_config_files): + """Test filtered sweep with multiple filters.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + precision = ["fp8"] + framework = ["vllm"] + runner_type = None + seq_lens = ["1k1k"] + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + assert len(result) > 0 + assert all(entry['precision'] == 'fp8' for entry in result) + assert all(entry['framework'] == 'vllm' for entry in result) + + +def test_generate_full_sweep_with_filters_test_mode(sample_master_config, temp_config_files): + """Test filtered sweep in test mode.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + precision = None + framework = None + runner_type = None + seq_lens = ["1k1k"] + step_size = 2 + test_mode = True + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + 
# In test mode, should only get one entry per seq-len (highest TP, lowest conc) + assert len(result) == 1 # Only one config matches 70b with 1k1k + assert result[0]['tp'] == 8 # Highest TP + assert '70b_1k1k' in result[0]['exp-name'] + + +def test_generate_full_sweep_with_filters_runner_type_validation(sample_master_config, temp_config_files): + """Test filtered sweep with invalid runner type.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = None + precision = None + framework = None + runner_type = ["invalid-runner"] + seq_lens = None + step_size = 2 + test_mode = False + runner_config = runner_file + + with pytest.raises(ValueError, match="Invalid runner type"): + generate_full_sweep(Args(), sample_master_config) + + +def test_generate_full_sweep_with_filters_runner_type_no_config(sample_master_config): + """Test filtered sweep with runner type but no config file.""" + class Args: + model_prefix = None + precision = None + framework = None + runner_type = ["h200"] + seq_lens = None + step_size = 2 + test_mode = False + runner_config = None + + with pytest.raises(ValueError, match="runner-config is required"): + generate_full_sweep(Args(), sample_master_config) + + +def test_generate_full_sweep_with_filters_multiple_runner_types(sample_master_config, temp_config_files): + """Test filtered sweep with multiple runner types.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = None + precision = None + framework = None + runner_type = ["h200", "h100"] + seq_lens = ["1k1k"] + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + runners = set(entry['runner'] for entry in result) + assert 'h200' in runners or 'h100' in runners + + +def test_generate_full_sweep_with_filters_no_matches(sample_master_config, temp_config_files): + """Test filtered sweep with no matching configs.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = 
["nonexistent"] + precision = None + framework = None + runner_type = None + seq_lens = None + step_size = 2 + test_mode = False + runner_config = runner_file + + with pytest.raises(ValueError, match="No configs found matching filters"): + generate_full_sweep(Args(), sample_master_config) + + +def test_generate_full_sweep_with_filters_concurrency_overshoot(temp_config_files): + """Test filtered sweep when concurrency step overshoots end value.""" + _, runner_file = temp_config_files + + config = { + "test-fp8-vllm": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 4, "conc-start": 2, "conc-end": 7} # 2, 8 overshoots, clamps to 7 + ] + } + ] + } + } + + class Args: + model_prefix = None + precision = None + framework = None + runner_type = None + seq_lens = None + step_size = 4 # Will overshoot: 2, 8 (clamped to 7) + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), config) + conc_values = sorted(set(e['conc'] for e in result)) + # Should have 2, 7 (7 is the clamped value) + assert 2 in conc_values + assert 7 in conc_values + + +# Tests for generate_test_config +def test_generate_test_config_basic(sample_master_config, temp_config_files): + """Test basic test config generation.""" + _, runner_file = temp_config_files + + class Args: + key = "70b-fp8-vllm" + runner_config = runner_file + runner_node = "h200-nv_1" + seq_lens = None + step_size = 2 + test_mode = False + + result = generate_test_config(Args(), sample_master_config) + assert len(result) > 0 + + +def test_generate_test_config_test_mode(sample_master_config, temp_config_files): + """Test test config in test mode.""" + _, runner_file = temp_config_files + + class Args: + key = "70b-fp8-vllm" + runner_config = runner_file + runner_node = "h200-nv_1" + seq_lens = ["1k1k"] + step_size = 2 + 
test_mode = True + + result = generate_test_config(Args(), sample_master_config) + # In test mode, should only use lowest concurrency + assert all(entry['conc'] == 1 or entry['conc'] == 2 for entry in result) + + +def test_generate_test_config_specific_runner_node(sample_master_config, temp_config_files): + """Test test config with specific runner node.""" + _, runner_file = temp_config_files + + class Args: + key = "70b-fp8-vllm" + runner_config = runner_file + runner_node = "h200-nv_1" + seq_lens = None + step_size = 2 + test_mode = False + + result = generate_test_config(Args(), sample_master_config) + assert all(entry['runner'] == 'h200-nv_1' for entry in result) + + +def test_generate_test_config_invalid_key(sample_master_config, temp_config_files): + """Test test config with invalid key.""" + _, runner_file = temp_config_files + + class Args: + key = "nonexistent-key" + runner_config = runner_file + runner_node = None + seq_lens = None + step_size = 2 + test_mode = False + + with pytest.raises(ValueError, match="does not exist in config files"): + generate_test_config(Args(), sample_master_config) + + +def test_generate_test_config_invalid_runner_node(sample_master_config, temp_config_files): + """Test test config with invalid runner node.""" + _, runner_file = temp_config_files + + class Args: + key = "70b-fp8-vllm" + runner_config = runner_file + runner_node = "invalid-node" + seq_lens = None + step_size = 2 + test_mode = False + + with pytest.raises(ValueError, match="is not compatible"): + generate_test_config(Args(), sample_master_config) + + +def test_generate_test_config_missing_runner_config(sample_master_config): + """Test test config with missing runner config file.""" + class Args: + key = "70b-fp8-vllm" + runner_config = "/nonexistent/file.yaml" + runner_node = None + seq_lens = None + step_size = 2 + test_mode = False + + with pytest.raises(ValueError, match="does not exist"): + generate_test_config(Args(), sample_master_config) + + +def 
test_generate_test_config_concurrency_overshoot(temp_config_files): + """Test test config when concurrency step overshoots end value.""" + _, runner_file = temp_config_files + + config = { + "test-fp8-vllm": { + "image": "test:latest", + "model": "test/model", + "model-prefix": "test", + "precision": "fp8", + "framework": "vllm", + "runner": "h200", + "seq-len-configs": [ + { + "isl": 1024, + "osl": 1024, + "search-space": [ + {"tp": 4, "conc-start": 1, "conc-end": 6} + ] + } + ] + } + } + + class Args: + key = "test-fp8-vllm" + runner_config = runner_file + runner_node = "h200-nv_1" + seq_lens = None + step_size = 4 # Will overshoot: 1, 4, 16 (clamped to 6) + test_mode = False + + result = generate_test_config(Args(), config) + conc_values = sorted(set(e['conc'] for e in result)) + assert 1 in conc_values + assert 4 in conc_values + assert 6 in conc_values + + +# Tests for generate_runner_model_sweep_config +def test_generate_runner_model_sweep_config(sample_master_config, temp_config_files): + """Test runner-model sweep config generation.""" + _, runner_file = temp_config_files + + class Args: + runner_type = "h200" + runner_config = runner_file + + result = generate_runner_model_sweep_config(Args(), sample_master_config) + assert len(result) > 0 + # Should have entries for each runner node under h200 + runners = set(entry['runner'] for entry in result) + assert 'h200-nv_1' in runners + assert 'h200-nv_2' in runners + + +def test_generate_runner_model_sweep_config_invalid_runner(sample_master_config, temp_config_files): + """Test runner-model sweep with invalid runner type.""" + _, runner_file = temp_config_files + + class Args: + runner_type = "invalid-runner" + runner_config = runner_file + + with pytest.raises(ValueError, match="does not exist in runner config"): + generate_runner_model_sweep_config(Args(), sample_master_config) + + +# Tests for generate_runner_sweep_config +def test_generate_runner_sweep_config(sample_master_config, temp_config_files): + 
"""Test runner sweep config generation.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = "70b" + runner_type = "h200" + precision = None + framework = None + runner_config = runner_file + + result = generate_runner_sweep_config(Args(), sample_master_config) + assert len(result) > 0 + + +def test_generate_runner_sweep_config_with_filters(sample_master_config, temp_config_files): + """Test runner sweep with precision and framework filters.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = "70b" + runner_type = "h200" + precision = "fp8" + framework = "vllm" + runner_config = runner_file + + result = generate_runner_sweep_config(Args(), sample_master_config) + assert all(entry['precision'] == 'fp8' for entry in result) + assert all(entry['framework'] == 'vllm' for entry in result) + + +def test_generate_runner_sweep_config_no_matches(sample_master_config, temp_config_files): + """Test runner sweep with no matching configs.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = "nonexistent" + runner_type = "h200" + precision = None + framework = None + runner_config = runner_file + + with pytest.raises(ValueError, match="No configs found matching"): + generate_runner_sweep_config(Args(), sample_master_config) + + +# Tests for generate_custom_test +def test_generate_custom_test(temp_config_files): + """Test custom test generation.""" + _, runner_file = temp_config_files + + class Args: + runner_label = "h200" + image = "vllm/vllm-openai:latest" + model = "test/model" + framework = "vllm" + precision = "fp8" + exp_name = "custom_test" + runner_config = runner_file + + result = generate_custom_test(Args()) + assert len(result) == 1 + assert result[0]['image'] == "vllm/vllm-openai:latest" + assert result[0]['exp-name'] == "custom_test" + + +def test_generate_custom_test_invalid_runner(temp_config_files): + """Test custom test with invalid runner label.""" + _, runner_file = temp_config_files + + class Args: 
+ runner_label = "invalid-runner" + image = "vllm/vllm-openai:latest" + model = "test/model" + framework = "vllm" + precision = "fp8" + exp_name = "custom_test" + runner_config = runner_file + + with pytest.raises(ValueError, match="Unable to find specified runner label"): + generate_custom_test(Args()) + + +# Tests for main function +def test_main_full_sweep(temp_config_files): + """Test main function with full-sweep command.""" + master_file, _ = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "full-sweep", + "--config-files", master_file, + "--seq-lens", "1k1k", + "--model-prefix", "70b", + "--step-size", "2" + ] + + with patch('sys.argv', test_args): + result = main() + assert len(result) > 0 + + +def test_main_full_sweep_with_filters(temp_config_files): + """Test main function with full-sweep command with filters.""" + master_file, runner_file = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "full-sweep", + "--config-files", master_file, + "--runner-config", runner_file, + "--model-prefix", "70b", + "--precision", "fp8", + "--test-mode" + ] + + with patch('sys.argv', test_args): + result = main() + assert len(result) > 0 + + +def test_main_test_config(temp_config_files): + """Test main function with test-config command.""" + master_file, runner_file = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "test-config", + "--config-files", master_file, + "--runner-config", runner_file, + "--key", "70b-fp8-vllm", + "--runner-node", "h200-nv_1", + "--test-mode" + ] + + with patch('sys.argv', test_args): + result = main() + assert len(result) > 0 + + +def test_main_runner_model_sweep(temp_config_files): + """Test main function with runner-model-sweep command.""" + master_file, runner_file = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "runner-model-sweep", + "--config-files", master_file, + "--runner-config", runner_file, + "--runner-type", "h200" + ] + + with patch('sys.argv', 
test_args): + result = main() + assert len(result) > 0 + + +def test_main_runner_sweep(temp_config_files): + """Test main function with runner-sweep command.""" + master_file, runner_file = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "runner-sweep", + "--config-files", master_file, + "--runner-config", runner_file, + "--runner-type", "h200", + "--model-prefix", "70b" + ] + + with patch('sys.argv', test_args): + result = main() + assert len(result) > 0 + + +def test_main_custom(temp_config_files): + """Test main function with custom command.""" + master_file, runner_file = temp_config_files + + test_args = [ + "generate_sweep_configs.py", + "custom", + "--config-files", master_file, + "--runner-config", runner_file, + "--runner-label", "h200", + "--image", "test:latest", + "--model", "test/model", + "--framework", "vllm", + "--precision", "fp8", + "--exp-name", "custom_test" + ] + + with patch('sys.argv', test_args): + result = main() + assert len(result) == 1 + + +def test_main_invalid_config_structure(tmp_path): + """Test main with invalid config structure.""" + invalid_file = tmp_path / "invalid.yaml" + with open(invalid_file, 'w') as f: + yaml.dump({"key": {"image": "test"}}, f) # Missing required fields + + test_args = [ + "generate_sweep_configs.py", + "full-sweep", + "--config-files", str(invalid_file), + "--seq-lens", "1k1k", + "--model-prefix", "test" + ] + + with patch('sys.argv', test_args): + with pytest.raises(ValueError): + main() + + +def test_main_validation_failure(temp_config_files, monkeypatch): + """Test main with validation failure on output.""" + master_file, _ = temp_config_files + + # Monkey patch validate_matrix_output to always fail + def mock_validate(entries): + raise ValueError("Validation failed") + + monkeypatch.setattr('generate_sweep_configs.validate_matrix_output', mock_validate) + + test_args = [ + "generate_sweep_configs.py", + "full-sweep", + "--config-files", master_file, + "--seq-lens", "1k1k", + 
"--model-prefix", "70b" + ] + + with patch('sys.argv', test_args): + with pytest.raises(ValueError, match="Validation failed"): + main() + + +# Edge case tests +def test_concurrency_step_reaches_exact_end(sample_master_config, temp_config_files): + """Test that concurrency stepping reaches exact end value.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["8b"] + seq_lens = ["1k1k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + # conc-start=4, conc-end=16, step=2 should give 4,8,16 + conc_values = sorted(set(e['conc'] for e in result)) + assert 16 in conc_values + + +def test_multiple_model_prefixes_filtered_sweep(sample_master_config, temp_config_files): + """Test filtered sweep with multiple model prefixes.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b", "8b"] + precision = None + framework = None + runner_type = None + seq_lens = ["1k1k"] + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + exp_names = [e['exp-name'] for e in result] + assert any('70b' in name for name in exp_names) + assert any('8b' in name for name in exp_names) + + +def test_seq_len_filter_multiple(sample_master_config, temp_config_files): + """Test filtering with multiple sequence lengths.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + precision = None + framework = None + runner_type = None + seq_lens = ["1k1k", "1k8k"] + step_size = 2 + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + seq_lens = set((e['isl'], e['osl']) for e in result) + assert (1024, 1024) in seq_lens + assert (1024, 8192) in seq_lens + + +def test_default_ep_dp_attn_values(sample_master_config, temp_config_files): + """Test that default ep and 
dp-attn values are set correctly.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["8b"] + seq_lens = ["1k1k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + # 8b config doesn't specify ep/dp-attn, so should use defaults + assert all(e['ep'] == 1 for e in result) + assert all(e['dp-attn'] == False for e in result) + + +def test_max_model_len_calculation(sample_master_config, temp_config_files): + """Test that max-model-len is calculated correctly.""" + _, runner_file = temp_config_files + + class Args: + model_prefix = ["70b"] + seq_lens = ["1k8k"] + step_size = 2 + precision = None + framework = None + runner_type = None + test_mode = False + runner_config = runner_file + + result = generate_full_sweep(Args(), sample_master_config) + # isl=1024, osl=8192, so max-model-len should be 1024+8192+200=9416 + assert all(e['max-model-len'] == 9416 for e in result) + + +if __name__ == "__main__": + pytest.main([__file__, "-v", "--cov=generate_sweep_configs", "--cov-report=term-missing"]) diff --git a/utils/process_result.py b/utils/process_result.py index aaf8ac0d2..94ca30f24 100644 --- a/utils/process_result.py +++ b/utils/process_result.py @@ -1,35 +1,36 @@ import sys import json +import os from pathlib import Path -hw = sys.argv[1] -tp_size = int(sys.argv[2]) -result_filename = sys.argv[3] -framework = sys.argv[4] -precision = sys.argv[5] +hw = os.environ.get('RUNNER_TYPE') +tp_size = int(os.environ.get('TP')) +ep_size = int(os.environ.get('EP_SIZE')) +dp_attention = os.environ.get('DP_ATTENTION') +result_filename = os.environ.get('RESULT_FILENAME') +framework = os.environ.get('FRAMEWORK') +precision = os.environ.get('PRECISION') +mtp_mode = os.environ.get('MTP_MODE') with open(f'{result_filename}.json') as f: bmk_result = json.load(f) -tput_per_gpu = float(bmk_result['total_token_throughput']) / tp_size 
-output_tput_per_gpu = float(bmk_result['output_throughput']) / tp_size -input_tput_per_gpu = tput_per_gpu - output_tput_per_gpu - data = { 'hw': hw, 'tp': tp_size, + 'ep': ep_size, + 'dp_attention': dp_attention, # true or false 'conc': int(bmk_result['max_concurrency']), 'model': bmk_result['model_id'], 'framework': framework, 'precision': precision, - 'tput_per_gpu': tput_per_gpu, - 'output_tput_per_gpu': output_tput_per_gpu, - 'input_tput_per_gpu': input_tput_per_gpu + 'tput_per_gpu': float(bmk_result['total_token_throughput']) / tp_size, + 'output_tput_per_gpu': float(bmk_result['output_throughput']) / tp_size } -if len(sys.argv) == 7: # MTP - data['mtp'] = sys.argv[6] +if mtp_mode: # MTP + data['mtp'] = mtp_mode for key, value in bmk_result.items(): if key.endswith('ms'): diff --git a/utils/summarize.py b/utils/summarize.py index 1f78caf9c..6d926255e 100644 --- a/utils/summarize.py +++ b/utils/summarize.py @@ -9,11 +9,11 @@ with open(result_path) as f: result = json.load(f) results.append(result) -results.sort(key=lambda r: (r['hw'], r.get('framework', 'vllm'), r.get('precision', 'fp8'), r['tp'], r['conc'])) +results.sort(key=lambda r: (r['hw'], r.get('framework', 'vllm'), r.get('precision', 'fp8'), r['tp'], r['ep'], r['conc'])) summary_header = f'''\ -| Hardware | Framework | Precision | TP | Conc | TTFT (ms) | TPOT (ms) | E2EL (s) | TPUT per GPU | -| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |\ +| Hardware | Framework | Precision | TP | EP | DP Attention | Conc | TTFT (ms) | TPOT (ms) | E2EL (s) | TPUT per GPU | +| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |\ ''' print(summary_header) @@ -25,6 +25,8 @@ f"| {framework.upper()} " f"| {precision.upper()} " f"| {result['tp']} " + f"| {result['ep']} " + f"| {result['dp_attention']} " f"| {result['conc']} " f"| {(result['median_ttft'] * 1000):.4f} " f"| {(result['median_tpot'] * 1000):.4f} "
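The concurrency-overshoot tests in the diff above all assume the same stepping behavior: the sweep generator multiplies concurrency by `step_size` each step and, when the next step would overshoot `conc-end`, clamps the final value to `conc-end`. A minimal sketch of that assumed behavior (not the actual `generate_sweep_configs.py` code; `sweep_concurrencies` is a hypothetical helper name):

```python
def sweep_concurrencies(start, end, step_size):
    """Geometric concurrency sweep, as assumed by the tests above.

    Multiply the concurrency by step_size each step; if the next step
    would overshoot `end`, clamp the final value to `end` so the sweep
    always covers the full configured range.
    """
    values = []
    conc = start
    while conc < end:
        values.append(conc)
        conc *= step_size
    values.append(end)  # always include the (possibly clamped) end value
    return values


# Mirrors the overshoot test cases:
# conc-start=1, conc-end=5, step_size=3 -> [1, 3, 5] (9 overshoots, clamped to 5)
# conc-start=2, conc-end=7, step_size=4 -> [2, 7]    (8 overshoots, clamped to 7)
```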