feat: performance changelog triggered runs (as opposed to nightly) #267
Merged
Commits (29, all by cquil11):

- 433f2ef add logic for event driven runs
- dd4682b testing pt 1
- 7d6e052 raise error if yaml diff in perf changelog is not valid
- ce49098 remove unused imports in process_changelog.py
- e6f6fe9 config data key fix
- b87eedd raise error if test-config subprocess fails to run
- ba0b115 backfill changelog
- 747bc2d backfill changelog pt 2
- ca24b8e backfill changelog pt 3
- 954ebd6 backfill changelog pt 4
- ee346b3 backfill changelog pt 5
- ab6f948 backfill changelog pt 6
- 27074d2 add always() condition to upload changelog metadata
- 763b394 backfill changelog pt 7 (test)
- d0b2de7 backfill changelog pt 8 (revert test)
- 41341ad backfill changelog pt 9
- f131962 backfill changelog pt 11
- dfeba21 change if condition for jobs in run sweep workflow
- fd07f40 debugging run sweep workflow
- 228e0a2 debugging run sweep workflow pt 2
- cb2cc8a debugging run sweep workflow pt 3 (revert)
- 055b324 debugging run sweep workflow pt 4
- ae65551 debugging run sweep workflow pt 5
- 667d2e1 debugging run sweep workflow pt 6
- ef3ba6b debugging run sweep workflow pt 7
- fae8278 add always() condition to upload changelog metadata (add back, this g…
- 2018ad3 add bmk prefix to results
- 5e0c779 backfill changelog official
- 8d8ffa1 for concurrency group, use more unique sha
"Run Sweep" workflow (new file, 235 lines):

```yaml
name: "Run Sweep"
run-name: Run Sweep - ${{ github.event.pull_request.title || github.ref_name }}

concurrency:
  group: sweep-${{ github.event.pull_request.number || github.sha }}
  cancel-in-progress: true

on:
  push:
    branches:
      - main
    paths:
      - "perf-changelog.yaml"
  pull_request:
    branches:
      - main
    types:
      - ready_for_review
      - synchronize
      - labeled
    paths:
      - "perf-changelog.yaml"

jobs:
  setup:
    runs-on: ubuntu-latest
    if: >-
      (github.event_name == 'pull_request' && !github.event.pull_request.draft && contains(github.event.pull_request.labels.*.name, 'sweep-enabled')) ||
      (github.event_name != 'pull_request' && !contains(github.event.head_commit.message, '[skip-sweep]'))
    outputs:
      search-space-config: ${{ steps.setup.outputs.search-space-config }}
    steps:
      - name: Checkout code
        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
        with:
          fetch-depth: 0

      - id: setup
        run: |
          pip install pydantic

          if [ "${{ github.event_name }}" == "pull_request" ]; then
            BASE_REF="origin/${{ github.base_ref }}"
            HEAD_REF="${{ github.event.pull_request.head.sha }}"
          else
            BASE_REF="${{ github.event.before }}"
            HEAD_REF="${{ github.event.after }}"
          fi

          CONFIG_JSON=$(python3 ${GITHUB_WORKSPACE}/utils/process_changelog.py \
            --changelog-file ${GITHUB_WORKSPACE}/perf-changelog.yaml \
            --base-ref "$BASE_REF" \
            --head-ref "$HEAD_REF")

          echo "search-space-config=$CONFIG_JSON" >> $GITHUB_OUTPUT

  sweep-multi-node-1k1k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['1k1k']) != 'null' }}
    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
    name: multi-node 1k1k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['1k1k'] }}
    secrets: inherit
    with: &multi-node-inputs
      isl: ${{ matrix.config.isl }}
      osl: ${{ matrix.config.osl }}
      max-model-len: ${{ matrix.config.max-model-len }}
      runner: ${{ matrix.config.runner }}
      image: ${{ matrix.config.image }}
      model: ${{ matrix.config.model }}
      model-prefix: ${{ matrix.config.model-prefix }}
      framework: ${{ matrix.config.framework }}
      precision: ${{ matrix.config.precision }}
      exp-name: ${{ matrix.config.exp-name }}
      conc-list: ${{ toJson(matrix.config.conc) }}
      spec-decoding: ${{ matrix.config.spec-decoding }}
      disagg: ${{ matrix.config.disagg }}

      prefill-num-worker: ${{ matrix.config.prefill.num-worker }}
      prefill-tp: ${{ matrix.config.prefill.tp }}
      prefill-ep: ${{ matrix.config.prefill.ep }}
      prefill-dp-attn: ${{ matrix.config.prefill.dp-attn }}
      prefill-additional-settings: ${{ toJson(matrix.config.prefill.additional-settings) }}

      decode-num-worker: ${{ matrix.config.decode.num-worker }}
      decode-tp: ${{ matrix.config.decode.tp }}
      decode-ep: ${{ matrix.config.decode.ep }}
      decode-dp-attn: ${{ matrix.config.decode.dp-attn }}
      decode-additional-settings: ${{ toJson(matrix.config.decode.additional-settings) }}

  sweep-multi-node-1k8k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['1k8k']) != 'null' }}
    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
    name: multi-node 1k8k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['1k8k'] }}
    secrets: inherit
    with: *multi-node-inputs

  sweep-multi-node-8k1k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).multi_node['8k1k']) != 'null' }}
    uses: ./.github/workflows/benchmark-multinode-tmpl.yml
    name: multi-node 8k1k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).multi_node['8k1k'] }}
    secrets: inherit
    with: *multi-node-inputs

  sweep-single-node-1k1k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['1k1k']) != 'null' }}
    uses: ./.github/workflows/benchmark-tmpl.yml
    name: single-node 1k1k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['1k1k'] }}
    secrets: inherit
    with: &single-node-inputs
      exp-name: ${{ matrix.config.exp-name }}
      isl: ${{ matrix.config.isl }}
      osl: ${{ matrix.config.osl }}
      max-model-len: ${{ matrix.config.max-model-len }}
      runner: ${{ matrix.config.runner }}
      image: ${{ matrix.config.image }}
      model: ${{ matrix.config.model }}
      model-prefix: ${{ matrix.config.model-prefix }}
      framework: ${{ matrix.config.framework }}
      precision: ${{ matrix.config.precision }}
      tp: ${{ matrix.config.tp }}
      ep: ${{ matrix.config.ep }}
      dp-attn: ${{ matrix.config.dp-attn }}
      conc: ${{ matrix.config.conc }}
      spec-decoding: ${{ matrix.config.spec-decoding }}
      disagg: ${{ matrix.config.disagg }}

  sweep-single-node-1k8k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['1k8k']) != 'null' }}
    uses: ./.github/workflows/benchmark-tmpl.yml
    name: single-node 1k8k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['1k8k'] }}
    secrets: inherit
    with: *single-node-inputs

  sweep-single-node-8k1k:
    needs: setup
    if: ${{ toJson(fromJson(needs.setup.outputs.search-space-config).single_node['8k1k']) != 'null' }}
    uses: ./.github/workflows/benchmark-tmpl.yml
    name: single-node 8k1k /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.search-space-config).single_node['8k1k'] }}
    secrets: inherit
    with: *single-node-inputs

  collect-results:
    needs:
      [
        sweep-single-node-1k1k,
        sweep-single-node-1k8k,
        sweep-single-node-8k1k,
        sweep-multi-node-1k1k,
        sweep-multi-node-1k8k,
        sweep-multi-node-8k1k,
        setup,
      ]
    if: ${{ always() && needs.setup.result != 'skipped' }}
    uses: ./.github/workflows/collect-results.yml
    secrets: inherit
    with:
      result-prefix: "bmk"

  upload-changelog-metadata:
    needs: [setup, collect-results]
    if: ${{ always() && needs.setup.result != 'skipped' }}
    runs-on: ubuntu-latest
    steps:
      - name: Extract and save changelog metadata
        env:
          CONFIG_JSON: ${{ needs.setup.outputs.search-space-config }}
        run: |
          echo "$CONFIG_JSON" | jq '.changelog_metadata' > changelog_metadata.json

      - name: Upload changelog artifact
        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
        with:
          name: changelog-metadata
          path: changelog_metadata.json

  calc-success-rate:
    needs: collect-results
    if: ${{ always() && needs.collect-results.result != 'skipped' }}
    runs-on: ubuntu-latest

    env:
      RESULTS_DIR: "results/"
      STATS_FILENAME: "run_stats"
      GITHUB_TOKEN: ${{ secrets.REPO_PAT }}

    steps:
      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
        with:
          token: ${{ secrets.REPO_PAT }}
          fetch-depth: 0

      - name: Download results artifacts
        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
        with:
          path: ${{ env.RESULTS_DIR }}
          pattern: results_*

      - name: Install python dependencies
        run: pip install PyGithub

      - name: Calculate success rate
        run: python3 utils/calc_success_rate.py $STATS_FILENAME

      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
        with:
          name: "run-stats"
          path: ${{ env.STATS_FILENAME }}.json
```
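The real `utils/process_changelog.py` is not included in this diff, so the following is only a minimal sketch of the diff-driven idea the `setup` job relies on (the actual script installs pydantic, so it presumably also validates entries): load the changelog as it existed at `BASE_REF` and at `HEAD_REF`, and treat only the entries added in between as the search space, so that appending one changelog entry triggers exactly one targeted run. The function and key names here are hypothetical.

```python
# Hypothetical sketch of the diff-driven config extraction; the real
# utils/process_changelog.py is not shown in this PR and may differ.
import json


def new_entries(base_changelog: list, head_changelog: list) -> list:
    """Entries present at HEAD_REF but not at BASE_REF, i.e. what this
    push (or PR) added to perf-changelog.yaml."""
    return [entry for entry in head_changelog if entry not in base_changelog]


def to_search_space(entries: list) -> str:
    """Serialize the added entries as a single JSON string, the shape the
    workflow writes to $GITHUB_OUTPUT as `search-space-config`."""
    return json.dumps({"changelog_entries": entries})


# Example: one entry appended since the base ref.
base = [{"config-keys": ["70b-fp8-*-vllm"], "PR": "pull/95"}]
head = base + [{"config-keys": ["dsr1*"], "PR": "pull/163"}]
added = new_entries(base, head)
```

In the workflow itself, the base/head pair comes from `github.base_ref` and the PR head SHA for pull requests, or `github.event.before`/`github.event.after` for pushes.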
perf-changelog.yaml (new file, 83 lines):
```yaml
- config-keys:
    - 70b-fp8-*-vllm
  description: |
    - Add compilation-config: '{"custom_ops": ["-rms_norm", "-quant_fp8", "-silu_and_mul"]}' as
      extra config to all benchmarks/70b_fp8_mi*.sh scripts
    - 6-7% uplift for llama for 6/8 configs
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/95
- config-keys:
    - gptoss-fp4-*-trt
  description: |
    - Upgrade GPT-OSS TRT images from 'release:1.1.0rc2.post2' to '1.2.0rc0.post1'
    - Add NCCL_GRAPH_REGISTER=0 to benchmarks/gptoss_fp4_b200_trt_slurm.sh
    - Change kv_cache_config.dtype from 'auto' to 'fp8' in benchmarks/gptoss_fp4_b200_trt_slurm.sh
    - Remove MOE_BACKEND=CUTLASS, which now just defaults to TRTLLM
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/110
- config-keys:
    - gptoss*
    - dsr1*
  description: |
    - Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on
      h100/h200/b200/mi300/mi325/mi355
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/149
- config-keys:
    - gptoss-fp4-b200-vllm
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm
  description: |
    - Upgrade vLLM from 0.10.2 to 0.11.0 for GPT-OSS NVIDIA single-node configs
    - Adds compilation-config: '{"cudagraph_mode":"PIECEWISE"}' accordingly, since vLLM 0.11.0
      now defaults to FULL_AND_PIECEWISE
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/159
- config-keys:
    - dsr1*
  description: |
    - Fixes bug where 1k8k and 8k1k full sweeps had incorrect max-model-len for DeepSeek
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/163
- config-keys:
    - dsr1-fp4-b200-sglang
    - dsr1-fp8-b200-sglang
    - dsr1-fp8-h200-sglang
  description: |
    - Consolidates H200 and B200 SGLang configurations to use the unified v0.5.5-cu129-amd64
      image tag and updates deprecated SGLang server arguments to their current equivalents.
    - --enable-flashinfer-trtllm-moe and --enable-ep-moe are no longer available in SGLang, so they had to change
    - ep: 4 for all tp: 4 entries (3 occurrences in dsr1-fp4-b200-sglang)
    - ep: 8 for all tp: 8 entries (6 occurrences across dsr1-fp4-b200-sglang and dsr1-fp8-b200-sglang)
    - dsr1_fp4_b200_docker.sh: Replaced --enable-ep-moe with --ep-size $EP_SIZE and --enable-flashinfer-trtllm-moe with
      --moe-runner-backend flashinfer_trtllm
    - dsr1_fp8_b200_docker.sh: Replaced --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm and
      added --ep-size $EP_SIZE
    - launch_b200-nvd.sh: Added -e EP_SIZE to the Docker run command to pass the environment variable to the container
    - launch_b200-tg.sh: Added -e EP_SIZE to the Docker run command to pass the environment variable to the container
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/204
- config-keys:
    - gptoss-fp4-mi355x-vllm
    - gptoss-fp4-b200-vllm
  description: |
    - Extend concurrency to 128 for gptoss mi355x/b200 vllm configurations
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/209
- config-keys:
    - gptoss-fp4-b200-trt
  description: |
    - Extend concurrency to 128 for gptoss b200 TRT configurations
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/233
- config-keys:
    - "*gb200-sglang"
  description: |
    - Introduce improvements in the GB200 SGLang DSR1 submission
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/257
- config-keys:
    - dsr1-fp8-h200-trt
  description: |
    - Update TRT image from nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 to nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc2
    - Increase concurrency for some configurations
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/266
- config-keys:
    - gptoss-fp4-b200-vllm
    - gptoss-fp4-h100-vllm
    - gptoss-fp4-h200-vllm
  description: |
    - Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2
    - Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh
  PR: https://github.com/InferenceMAX/InferenceMAX/pull/273
```
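The `config-keys` values above are shell-style glob patterns (`70b-fp8-*-vllm`, `dsr1*`, `*gb200-sglang`). The PR does not show how they are expanded, but one natural way to resolve such patterns against concrete config names is Python's `fnmatch`; the config names and the `expand` helper below are illustrative assumptions, not the repo's actual matching code.

```python
# Illustrative glob expansion for changelog config-keys; the repo's real
# matching logic lives in utils/ and may differ.
from fnmatch import fnmatch

# Hypothetical sample of known config names for the example.
KNOWN_CONFIGS = [
    "70b-fp8-mi300x-vllm",
    "dsr1-fp8-h200-trt",
    "gptoss-fp4-b200-vllm",
]


def expand(config_keys, known=KNOWN_CONFIGS):
    """Return every known config name matched by any of the glob patterns."""
    return [name for name in known if any(fnmatch(name, pat) for pat in config_keys)]
```

For instance, `expand(["dsr1*"])` would select only the DeepSeek config from the sample list, which is how a single changelog entry can fan out to several benchmark configurations.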