Adding evals after throughput benchmarks #258
Merged
232 commits
cd9cb64
initial poc
cquil11 00ac64a
remove -d flag when launching docker container
cquil11 e38b38a
syntax error
cquil11 66eae81
compatibility fixes
cquil11 fdec241
add correct endpoint prefix
cquil11 08de857
remove reference env var
cquil11 06231ee
run vllm serve in background
cquil11 21ed067
unescape sequences
cquil11 65ef1f0
stop vllm to stdout after it stops
cquil11 cb55721
stop vllm to stdout after it stops pt 2
cquil11 788b7f1
get rid of docker stop as no longer in detatched
cquil11 a87e174
clone bench serving to tmp dir
cquil11 c1d0a79
clone bench serving to tmp dir pt 2
cquil11 4823afa
add explanatory comment
cquil11 d52299f
cleaning up
cquil11 85de6e7
cleaning up
cquil11 48f7588
adding mi355x refactor
cquil11 faec31e
adding h200 initial refactor
cquil11 1ef1b23
different way to see server logs
cquil11 75523ee
cleanup
cquil11 2536652
now fail if server fails
cquil11 2d58f0d
starting on b200
cquil11 f5cf4a7
doign b200
cquil11 92af70b
reverting erroneous change
cquil11 f330d67
fixing b200
cquil11 c5fcf81
fixing b200 pt 2
cquil11 3ededf0
updating mi300
cquil11 813381b
updating mi300 pt 2
cquil11 e1b387c
updating mi300 pt 3 -- remove detached mode
cquil11 c0a5c62
cleaning up mi355x
cquil11 634768c
fixing mi300x and updating 325x
cquil11 61a5c8f
reverting max conc to 512 on gptoss fp4 b200 docker
cquil11 74363e4
mi325x debug
cquil11 220e026
add back correct launch script for new mi325x slurm cluster (#231)
cquil11 5db1af8
fixing mi300x and updating 325x
cquil11 9806d30
Merge branch 'main' into refactor-docker-runner-launch
cquil11 b4eb57e
cleanng up
cquil11 04e30f3
add wait for h200 slurm dsr1
cquil11 d36965a
max num seqs back to 512 for gptoss fpr b200 docker
cquil11 fa7cbca
fix port issue for dsr1 mi300x docker
cquil11 1031ac9
fix mi355x docker NUM_PROMPTS
cquil11 8b847f1
adding prop of failure for server logs
cquil11 832bafc
add utils function for benchmark
cquil11 ebe3b62
add utils function for benchmark
cquil11 aa9070f
function-ize the waiting for server to start
cquil11 0d2c112
dont show arg parsing set -x
cquil11 271091d
dont show arg parsing set +x oops
cquil11 898b132
dont show arg parsing set +x oops
cquil11 fd2e33e
capture server pid
cquil11 2a4faf5
Squash-merge bryan/eval into refactor-docker-runner-launch
Oseltamivir 173d7bf
evals h100-cr
Oseltamivir 4ff8a9b
evals h100-cw
Oseltamivir 83901e7
evals h200-nb
Oseltamivir 6c65a24
move eval script here
Oseltamivir 343d24e
evals mi300x-amd
Oseltamivir 2de4a18
evals mi325x-amd
Oseltamivir 21825ce
evals mi300x-tw
Oseltamivir 00bfa34
evals mi300x-oci
Oseltamivir e8aa07e
evals mi325x-tw
Oseltamivir bf4eff2
evals mi325x-tw summary
Oseltamivir 71008bb
evals mi325x-tw summary
Oseltamivir 7f3cd09
evals mi355x-amd
Oseltamivir dfff2f4
evals mi325x-tw summary
Oseltamivir 9a11152
evals mi325x-tw summary
Oseltamivir 1ead695
evals mi325x-tw summary
Oseltamivir 348d5d9
all summary
Oseltamivir 679caa6
evals b200-nvd
Oseltamivir eda5e2f
evals b200-nvd 2
Oseltamivir 42151cc
evals b200-nvd 3
Oseltamivir 512dfc0
evals h100-cr
Oseltamivir 4de631d
evals b200-nvd 1
Oseltamivir b33cb80
evals h200-trt-cw
Oseltamivir 5babdb0
evals h200-trt-cw 2
Oseltamivir 12a85b8
evals h200-trt-cw 3
Oseltamivir eb2846f
evals h100-cr 2
Oseltamivir 4166070
evals h200-trt-cw 4
Oseltamivir 5f6b772
evals h200-trt-cw 5 (EP/TP HARD)
Oseltamivir 30baa1f
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir 5a209fd
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir 89a9cbd
evals h200-cw dsr1
Oseltamivir 9254ef1
evals mi300x-cr dsr1
Oseltamivir 6705ea3
evals mi300x-cr dsr1 2
Oseltamivir c1fc6db
evals mi325x-cr dsr1
Oseltamivir 090630a
evals mi325x-cr dsr1 2
Oseltamivir d984d7a
evals mi355x-amd dsr1
Oseltamivir fb66e33
evals mi355x-amd dsr1 2
Oseltamivir d0eb0c4
evals mi355x-amd dsr1 3
Oseltamivir c1dc1a6
evals mi355x-amd dsr1 4
Oseltamivir 88d3bf5
evals b200-nvd dsr1
Oseltamivir 8a0677d
evals b200-nvd fp8 dsr1
Oseltamivir dab1a2c
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir f862af7
Lighteval 1
Oseltamivir 5ef76ef
Lighteval 1.75
Oseltamivir 3081241
Lighteval Mi325x
Oseltamivir f182319
Lighteval Mi300x CR
Oseltamivir 5ba2cf2
Lighteval Mi355x amd
Oseltamivir 5bf69ab
Lighteval b200_nvd
Oseltamivir f862689
Lighteval h200_cr0
Oseltamivir c3df519
Lighteval h200-nb_1
Oseltamivir c1edb9a
Lighteval h100-cw_1
Oseltamivir d21826b
Error reproduction
Oseltamivir abdad78
Error file removal
Oseltamivir bd36530
error reproducibility
Oseltamivir a0434b1
should NOT error reproduce
Oseltamivir f56a311
should NOT error reproduce
Oseltamivir 27bd2de
should NOT error reproduce
Oseltamivir c058b16
should NOT error reproduce
Oseltamivir 2e36914
Double check other runner
Oseltamivir d2cf0fb
Cleanup MI300x_AMD
Oseltamivir 0a8901a
Cleanup MI300x_AMD
Oseltamivir afd304f
Cleanup MI300x_AMD
Oseltamivir ef2ee40
Cleanup MI300x_AMD MUST WORK
Oseltamivir 3790696
works
Oseltamivir 92f244c
Working lighteval
Oseltamivir 3e30425
lightevel fix
Oseltamivir 0d87ea5
lighteval test h100-cw_1
Oseltamivir 00b1623
lighteval test h100-cr_1 + parsing
Oseltamivir 83a71d2
lighteval test b200_nvd
Oseltamivir df71abe
lighteval test b200_nvd
Oseltamivir 4aa8d34
lighteval test mi300x-amd_0
Oseltamivir fe2ecd5
lighteval test h100-cw_1
Oseltamivir fef016a
lighteval test mi300x-cr_0
Oseltamivir 124eb70
lighteval test mi325x-tw_1
Oseltamivir 2b0b986
lighteval test mi355x-amd_4
Oseltamivir dae7345
lighteval test b200-nvd_3
Oseltamivir 993b19f
lighteval test h100-cw_1 sudo test
Oseltamivir f5b3a7a
b200 fix check
Oseltamivir ff1eba6
b200 fix check
Oseltamivir d6a52ec
b200 fix check
Oseltamivir 4dd7e21
b200 fix check
Oseltamivir 37bd3df
b200 fix check
Oseltamivir 43c7c59
b200 fix check
Oseltamivir e5a8e3a
b200 fix check
Oseltamivir 8fb95f4
b200 fix check
Oseltamivir 237b4e8
b200 fix check
Oseltamivir 79eadc5
Prelimary lighteval for all
Oseltamivir a2d77ff
Prelimary lighteval for all 2 - fixed TP
Oseltamivir 4e139a0
Prelimary lighteval for all 3
Oseltamivir 76b8c2c
Fix lighteval 1
Oseltamivir fda8e2c
Check both
Oseltamivir 2e7c127
lm-eval check
Oseltamivir 867bfc3
lm-eval check
Oseltamivir 8cbe81f
lm-eval check
Oseltamivir 1b3b79f
lm-eva
Oseltamivir 65f0303
mi325x test
Oseltamivir ddd3862
mi325x test
Oseltamivir 30ad3ba
all change, test deepseek
Oseltamivir 688e2c5
all change, test deepseek
Oseltamivir 6b320ce
retest mi325x
Oseltamivir 9768dea
test b200
Oseltamivir 4c339b4
clean b200
Oseltamivir efe94aa
test h200
Oseltamivir 705fc10
H200 test
Oseltamivir f79f243
B200-nvd2 sleep
Oseltamivir d9a4fed
B200-nvd2 sleep
Oseltamivir 8c6b944
B200-nvd2 sleep
Oseltamivir 28a026f
mi325x test
Oseltamivir c4bd3d2
mi325x test, no text, no empty fix
Oseltamivir 14068bc
h100, tmp eval_out
Oseltamivir af2c385
h100, tmp eval_out, sweep integration
Oseltamivir 5e1d68d
touch up sweep naming, remove funny triton error
Oseltamivir 1a3262f
touch up sweep summary
Oseltamivir 733d7ca
touch up run name
Oseltamivir 68c1a2d
Missing eval env var docker
Oseltamivir 6cb94a7
Typo
Oseltamivir bc472c3
Add proper coverage
Oseltamivir 837622f
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir 2461447
Add evals
Oseltamivir 848e834
Merge branch 'main' into evals-on-refactor
cquil11 710d428
Cam's solution
Oseltamivir 3c8b9bc
b200 scancel fix
Oseltamivir 1390c52
Change to 2 fewshot, forgot eval env var in b200
Oseltamivir 544e698
Resolve issues
Oseltamivir dd96fcf
Merge branch 'main' into evals-on-refactor
cquil11 5ec3378
Resolve issues/nits
Oseltamivir ae4e481
fix summary table hardware
Oseltamivir 48a220d
fix summary table hardware
Oseltamivir 61327ca
fix summary table hardware 2
Oseltamivir 1cf2967
final touches
Oseltamivir 34e3b2a
Merge branch 'main' into evals-on-refactor
cquil11 1d889b8
Cleanup comments, ammend lighteval
Oseltamivir 779a257
pt 1 manual merge conflict fixes
cquil11 00e77d0
Merge branch 'main' into evals-on-refactor
cquil11 9d4b217
pt 2 manual merge conflict fixes
cquil11 a9fad5b
use double quotes for gha parsing
cquil11 e07eb69
getting rid of full sweep sched changes
cquil11 9275f0d
add back spec decoding and disagg env vars
cquil11 dba25aa
add an option to ONLY run evals
cquil11 5de917b
remove full-sweep-test workflow and add collect-evals job to run swee…
cquil11 37d05d3
add run-eval to e2e tests
cquil11 6a546e5
math500 prompt and h200 trt evals
Oseltamivir d299d41
remove run prefix
cquil11 569d0c3
add result-prefix to benchmark tmpl uploaded artifacts
cquil11 30a3431
Evals summary refactor
Oseltamivir 22c8a2b
Evals summary refactor 2
Oseltamivir 8d12b35
Evals summary aesthetics
Oseltamivir d7a515a
TRT package fix, trt testing
Oseltamivir 25f71bd
trt testing 2
Oseltamivir ab6bf8f
max_num_tokens
Oseltamivir 0472555
Merge branch 'main' into evals-on-refactor
cquil11 0d8d7d1
Merge branch 'main' into evals-on-refactor
cquil11 9a873c4
unbounded gen len
Oseltamivir 999b9f6
Fix tmpl args, add isl/osl to table
Oseltamivir 9a13250
add isl/osl
Oseltamivir 4b0f8de
set max tokens
Oseltamivir a52f4c6
remove nvd
Oseltamivir 568e1d3
In case of multiple evals
Oseltamivir d55c796
diagnostic
Oseltamivir cdd2332
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir 0699df8
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir fcd14e2
test dp_attn
Oseltamivir c902545
DP_ATTENTION back
Oseltamivir 715269c
REMOVE LIGHTEVAL
Oseltamivir c19bb21
Merge branch 'main' into evals-on-refactor, address claude
Oseltamivir be431c8
Merge branch 'main' into evals-on-refactor
Oseltamivir 50f09cc
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir 500029b
Merge branch 'main' into evals-on-refactor
Oseltamivir a353ea4
Add evals for atom, trt_mtp
Oseltamivir d6d4055
remove tokenizer from benchmarkserving
Oseltamivir 338d80c
remove model_name
Oseltamivir e28631c
More evals for spec decode
Oseltamivir fa49cdc
claude pr comments
Oseltamivir 7e628ff
chore(deps): bump the github-actions group with 2 updates (#488)
dependabot[bot] 518d004
fix: update ep metadata in gb200 dynamo sglang configs to match comme…
functionstackx 388020f
Experimental folder (increasing researcher/developer velocity) (#489)
functionstackx ef15b99
summary table
Oseltamivir b5b9ec0
Merge branch 'main' into evals-on-refactor
Oseltamivir a1f9b89
Merge branch 'main' into evals-on-refactor
Oseltamivir 62079d6
Remove git installation and repository cloning
Oseltamivir 5409158
evals final
Oseltamivir 9ae0f90
more retries, lower conc, for stability
Oseltamivir 43fd4e8
Merge branch 'main' into evals-on-refactor
Oseltamivir
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,46 @@ |
| | | name: Template - Collect Evals |
| | | |
| | | on: |
| | | workflow_call: |
| | | inputs: |
| | | result-prefix: |
| | | required: false |
| | | type: string |
| | | default: '' |
| | | |
| | | permissions: |
| | | contents: read |
| | | |
| | | jobs: |
| | | collect-evals: |
| | | runs-on: ubuntu-latest |
| | | steps: |
| | | - name: Checkout code |
| | | uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0 |
| | | with: |
| | | token: ${{ secrets.REPO_PAT }} |
| | | fetch-depth: 0 |
| | | |
| | | - name: Download eval artifacts |
| | | uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0 |
| | | with: |
| | | path: eval_results/ |
| | | pattern: ${{ inputs.result-prefix && format('eval_{0}_*', inputs.result-prefix) \|\| 'eval_*' }} |
| | | |
| | | - name: Summarize evals |
| | | run: \| |
| | | pip install tabulate |
| | | echo "## Eval Summary" >> $GITHUB_STEP_SUMMARY |
| | | echo "" >> $GITHUB_STEP_SUMMARY |
| | | python3 utils/collect_eval_results.py eval_results/ ${{ inputs.result-prefix \|\| 'all' }} >> $GITHUB_STEP_SUMMARY |
| | | |
| | | - name: Upload aggregated evals |
| | | uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 |
| | | with: |
| | | name: eval_results_${{ inputs.result-prefix \|\| 'all' }} |
| | | path: agg_eval_${{ inputs.result-prefix \|\| 'all' }}.json |
| | | |
| | | - name: Cleanup downloaded eval artifacts |
| | | if: ${{ always() }} |
| | | run: \| |
| | | rm -rf eval_results/ \|\| true |

Resolved review conversations: one on `name: Template - Collect Evals` (marked resolved by cquil11) and one on the `pattern:` line of the download step (marked resolved by Oseltamivir).
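The workflow relies on two expression idioms: `inputs.result-prefix && format('eval_{0}_*', inputs.result-prefix) || 'eval_*'` behaves like a ternary (an empty string is falsy in GitHub Actions expressions), and `inputs.result-prefix || 'all'` supplies a fallback suffix. The Python sketch below mirrors that logic to show which artifacts would be selected and what the outputs would be named. The function names and the artifact names are hypothetical, and `fnmatch` is only an approximation of the glob matching that `actions/download-artifact` performs.

```python
from fnmatch import fnmatch


def download_pattern(result_prefix=""):
    """Mirror of the `pattern:` expression in the download step.

    An empty prefix (the input's default) falls back to 'eval_*'.
    """
    return f"eval_{result_prefix}_*" if result_prefix else "eval_*"


def output_names(result_prefix=""):
    """Mirror of `inputs.result-prefix || 'all'` in the upload step.

    Returns (artifact name, aggregated JSON path).
    """
    suffix = result_prefix or "all"
    return f"eval_results_{suffix}", f"agg_eval_{suffix}.json"


# Hypothetical artifact names, for illustration only.
artifacts = ["eval_nightly_h100", "eval_pr_b200", "bmk_nightly_h100"]

# With a prefix, only that prefix's eval artifacts are selected.
selected = [a for a in artifacts if fnmatch(a, download_pattern("nightly"))]

# Without a prefix, every artifact starting with 'eval_' is selected.
selected_all = [a for a in artifacts if fnmatch(a, download_pattern())]

print(selected)
print(selected_all)
print(output_names())
```

One consequence worth noting: because the default prefix is the empty string, a caller that omits `result-prefix` collects every `eval_*` artifact in the run and uploads the aggregate under the `all` suffix.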