Merged
232 commits
cd9cb64
initial poc
cquil11 Nov 12, 2025
00ac64a
remove -d flag when launching docker container
cquil11 Nov 12, 2025
e38b38a
syntax error
cquil11 Nov 12, 2025
66eae81
compatibility fixes
cquil11 Nov 12, 2025
fdec241
add correct endpoint prefix
cquil11 Nov 12, 2025
08de857
remove reference env var
cquil11 Nov 12, 2025
06231ee
run vllm serve in background
cquil11 Nov 12, 2025
21ed067
unescape sequences
cquil11 Nov 12, 2025
65ef1f0
stop vllm to stdout after it stops
cquil11 Nov 12, 2025
cb55721
stop vllm to stdout after it stops pt 2
cquil11 Nov 12, 2025
788b7f1
get rid of docker stop as no longer in detatched
cquil11 Nov 12, 2025
a87e174
clone bench serving to tmp dir
cquil11 Nov 12, 2025
c1d0a79
clone bench serving to tmp dir pt 2
cquil11 Nov 12, 2025
4823afa
add explanatory comment
cquil11 Nov 12, 2025
d52299f
cleaning up
cquil11 Nov 12, 2025
85de6e7
cleaning up
cquil11 Nov 13, 2025
48f7588
adding mi355x refactor
cquil11 Nov 13, 2025
faec31e
adding h200 initial refactor
cquil11 Nov 13, 2025
1ef1b23
different way to see server logs
cquil11 Nov 13, 2025
75523ee
cleanup
cquil11 Nov 13, 2025
2536652
now fail if server fails
cquil11 Nov 13, 2025
2d58f0d
starting on b200
cquil11 Nov 13, 2025
f5cf4a7
doign b200
cquil11 Nov 13, 2025
92af70b
reverting erroneous change
cquil11 Nov 13, 2025
f330d67
fixing b200
cquil11 Nov 14, 2025
c5fcf81
fixing b200 pt 2
cquil11 Nov 14, 2025
3ededf0
updating mi300
cquil11 Nov 14, 2025
813381b
updating mi300 pt 2
cquil11 Nov 14, 2025
e1b387c
updating mi300 pt 3 -- remove detached mode
cquil11 Nov 14, 2025
c0a5c62
cleaning up mi355x
cquil11 Nov 14, 2025
634768c
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
61a5c8f
reverting max conc to 512 on gptoss fp4 b200 docker
cquil11 Nov 14, 2025
74363e4
mi325x debug
cquil11 Nov 14, 2025
220e026
add back correct launch script for new mi325x slurm cluster (#231)
cquil11 Nov 14, 2025
5db1af8
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
9806d30
Merge branch 'main' into refactor-docker-runner-launch
cquil11 Nov 14, 2025
b4eb57e
cleanng up
cquil11 Nov 14, 2025
04e30f3
add wait for h200 slurm dsr1
cquil11 Nov 14, 2025
d36965a
max num seqs back to 512 for gptoss fpr b200 docker
cquil11 Nov 14, 2025
fa7cbca
fix port issue for dsr1 mi300x docker
cquil11 Nov 14, 2025
1031ac9
fix mi355x docker NUM_PROMPTS
cquil11 Nov 14, 2025
8b847f1
adding prop of failure for server logs
cquil11 Nov 14, 2025
832bafc
add utils function for benchmark
cquil11 Nov 14, 2025
ebe3b62
add utils function for benchmark
cquil11 Nov 14, 2025
aa9070f
function-ize the waiting for server to start
cquil11 Nov 14, 2025
0d2c112
dont show arg parsing set -x
cquil11 Nov 14, 2025
271091d
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
898b132
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
fd2e33e
capture server pid
cquil11 Nov 14, 2025
2a4faf5
Squash-merge bryan/eval into refactor-docker-runner-launch
Oseltamivir Nov 14, 2025
173d7bf
evals h100-cr
Oseltamivir Nov 15, 2025
4ff8a9b
evals h100-cw
Oseltamivir Nov 15, 2025
83901e7
evals h200-nb
Oseltamivir Nov 15, 2025
6c65a24
move eval script here
Oseltamivir Nov 15, 2025
343d24e
evals mi300x-amd
Oseltamivir Nov 15, 2025
2de4a18
evals mi325x-amd
Oseltamivir Nov 15, 2025
21825ce
evals mi300x-tw
Oseltamivir Nov 15, 2025
00bfa34
evals mi300x-oci
Oseltamivir Nov 15, 2025
e8aa07e
evals mi325x-tw
Oseltamivir Nov 15, 2025
bf4eff2
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
71008bb
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
7f3cd09
evals mi355x-amd
Oseltamivir Nov 15, 2025
dfff2f4
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
9a11152
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
1ead695
evals mi325x-tw summary
Oseltamivir Nov 15, 2025
348d5d9
all summary
Oseltamivir Nov 15, 2025
679caa6
evals b200-nvd
Oseltamivir Nov 16, 2025
eda5e2f
evals b200-nvd 2
Oseltamivir Nov 16, 2025
42151cc
evals b200-nvd 3
Oseltamivir Nov 16, 2025
512dfc0
evals h100-cr
Oseltamivir Nov 16, 2025
4de631d
evals b200-nvd 1
Oseltamivir Nov 16, 2025
b33cb80
evals h200-trt-cw
Oseltamivir Nov 16, 2025
5babdb0
evals h200-trt-cw 2
Oseltamivir Nov 16, 2025
12a85b8
evals h200-trt-cw 3
Oseltamivir Nov 16, 2025
eb2846f
evals h100-cr 2
Oseltamivir Nov 16, 2025
4166070
evals h200-trt-cw 4
Oseltamivir Nov 16, 2025
5f6b772
evals h200-trt-cw 5 (EP/TP HARD)
Oseltamivir Nov 16, 2025
30baa1f
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir Nov 16, 2025
5a209fd
evals h200-trt-cw 6 (EP/TP HARD)
Oseltamivir Nov 16, 2025
89a9cbd
evals h200-cw dsr1
Oseltamivir Nov 16, 2025
9254ef1
evals mi300x-cr dsr1
Oseltamivir Nov 16, 2025
6705ea3
evals mi300x-cr dsr1 2
Oseltamivir Nov 16, 2025
c1fc6db
evals mi325x-cr dsr1
Oseltamivir Nov 16, 2025
090630a
evals mi325x-cr dsr1 2
Oseltamivir Nov 16, 2025
d984d7a
evals mi355x-amd dsr1
Oseltamivir Nov 16, 2025
fb66e33
evals mi355x-amd dsr1 2
Oseltamivir Nov 16, 2025
d0eb0c4
evals mi355x-amd dsr1 3
Oseltamivir Nov 16, 2025
c1dc1a6
evals mi355x-amd dsr1 4
Oseltamivir Nov 16, 2025
88d3bf5
evals b200-nvd dsr1
Oseltamivir Nov 16, 2025
8a0677d
evals b200-nvd fp8 dsr1
Oseltamivir Nov 16, 2025
dab1a2c
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Nov 20, 2025
f862af7
Lighteval 1
Oseltamivir Nov 21, 2025
5ef76ef
Lighteval 1.75
Oseltamivir Nov 21, 2025
3081241
Lighteval Mi325x
Oseltamivir Nov 21, 2025
f182319
Lighteval Mi300x CR
Oseltamivir Nov 21, 2025
5ba2cf2
Lighteval Mi355x amd
Oseltamivir Nov 21, 2025
5bf69ab
Lighteval b200_nvd
Oseltamivir Nov 21, 2025
f862689
Lighteval h200_cr0
Oseltamivir Nov 21, 2025
c3df519
Lighteval h200-nb_1
Oseltamivir Nov 21, 2025
c1edb9a
Lighteval h100-cw_1
Oseltamivir Nov 21, 2025
d21826b
Error reproduction
Oseltamivir Nov 22, 2025
abdad78
Error file removal
Oseltamivir Nov 22, 2025
bd36530
error reproducibility
Oseltamivir Nov 22, 2025
a0434b1
should NOT error reproduce
Oseltamivir Nov 22, 2025
f56a311
should NOT error reproduce
Oseltamivir Nov 22, 2025
27bd2de
should NOT error reproduce
Oseltamivir Nov 22, 2025
c058b16
should NOT error reproduce
Oseltamivir Nov 22, 2025
2e36914
Double check other runner
Oseltamivir Nov 23, 2025
d2cf0fb
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
0a8901a
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
afd304f
Cleanup MI300x_AMD
Oseltamivir Nov 23, 2025
ef2ee40
Cleanup MI300x_AMD MUST WORK
Oseltamivir Nov 23, 2025
3790696
works
Oseltamivir Nov 23, 2025
92f244c
Working lighteval
Oseltamivir Nov 25, 2025
3e30425
lightevel fix
Oseltamivir Nov 25, 2025
0d87ea5
lighteval test h100-cw_1
Oseltamivir Nov 25, 2025
00b1623
lighteval test h100-cr_1 + parsing
Oseltamivir Nov 25, 2025
83a71d2
lighteval test b200_nvd
Oseltamivir Nov 25, 2025
df71abe
lighteval test b200_nvd
Oseltamivir Nov 25, 2025
4aa8d34
lighteval test mi300x-amd_0
Oseltamivir Nov 25, 2025
fe2ecd5
lighteval test h100-cw_1
Oseltamivir Nov 25, 2025
fef016a
lighteval test mi300x-cr_0
Oseltamivir Nov 25, 2025
124eb70
lighteval test mi325x-tw_1
Oseltamivir Nov 25, 2025
2b0b986
lighteval test mi355x-amd_4
Oseltamivir Nov 25, 2025
dae7345
lighteval test b200-nvd_3
Oseltamivir Nov 25, 2025
993b19f
lighteval test h100-cw_1 sudo test
Oseltamivir Nov 25, 2025
f5b3a7a
b200 fix check
Oseltamivir Nov 25, 2025
ff1eba6
b200 fix check
Oseltamivir Nov 25, 2025
d6a52ec
b200 fix check
Oseltamivir Nov 25, 2025
4dd7e21
b200 fix check
Oseltamivir Nov 25, 2025
37bd3df
b200 fix check
Oseltamivir Nov 25, 2025
43c7c59
b200 fix check
Oseltamivir Nov 25, 2025
e5a8e3a
b200 fix check
Oseltamivir Nov 25, 2025
8fb95f4
b200 fix check
Oseltamivir Nov 25, 2025
237b4e8
b200 fix check
Oseltamivir Nov 25, 2025
79eadc5
Prelimary lighteval for all
Oseltamivir Nov 26, 2025
a2d77ff
Prelimary lighteval for all 2 - fixed TP
Oseltamivir Nov 26, 2025
4e139a0
Prelimary lighteval for all 3
Oseltamivir Nov 26, 2025
76b8c2c
Fix lighteval 1
Oseltamivir Nov 27, 2025
fda8e2c
Check both
Oseltamivir Nov 27, 2025
2e7c127
lm-eval check
Oseltamivir Nov 27, 2025
867bfc3
lm-eval check
Oseltamivir Nov 27, 2025
8cbe81f
lm-eval check
Oseltamivir Nov 27, 2025
1b3b79f
lm-eva
Oseltamivir Nov 27, 2025
65f0303
mi325x test
Oseltamivir Nov 27, 2025
ddd3862
mi325x test
Oseltamivir Nov 27, 2025
30ad3ba
all change, test deepseek
Oseltamivir Nov 28, 2025
688e2c5
all change, test deepseek
Oseltamivir Nov 28, 2025
6b320ce
retest mi325x
Oseltamivir Nov 28, 2025
9768dea
test b200
Oseltamivir Nov 28, 2025
4c339b4
clean b200
Oseltamivir Nov 28, 2025
efe94aa
test h200
Oseltamivir Nov 28, 2025
705fc10
H200 test
Oseltamivir Nov 28, 2025
f79f243
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
d9a4fed
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
8c6b944
B200-nvd2 sleep
Oseltamivir Nov 28, 2025
28a026f
mi325x test
Oseltamivir Nov 28, 2025
c4bd3d2
mi325x test, no text, no empty fix
Oseltamivir Nov 28, 2025
14068bc
h100, tmp eval_out
Oseltamivir Nov 29, 2025
af2c385
h100, tmp eval_out, sweep integration
Oseltamivir Nov 29, 2025
5e1d68d
touch up sweep naming, remove funny triton error
Oseltamivir Nov 29, 2025
1a3262f
touch up sweep summary
Oseltamivir Nov 29, 2025
733d7ca
touch up run name
Oseltamivir Nov 29, 2025
68c1a2d
Missing eval env var docker
Oseltamivir Nov 30, 2025
6cb94a7
Typo
Oseltamivir Nov 30, 2025
bc472c3
Add proper coverage
Oseltamivir Nov 30, 2025
837622f
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Dec 1, 2025
2461447
Add evals
Oseltamivir Dec 2, 2025
848e834
Merge branch 'main' into evals-on-refactor
cquil11 Dec 2, 2025
710d428
Cam's solution
Oseltamivir Dec 2, 2025
3c8b9bc
b200 scancel fix
Oseltamivir Dec 2, 2025
1390c52
Change to 2 fewshot, forgot eval env var in b200
Oseltamivir Dec 2, 2025
544e698
Resolve issues
Oseltamivir Dec 3, 2025
dd96fcf
Merge branch 'main' into evals-on-refactor
cquil11 Dec 3, 2025
5ec3378
Resolve issues/nits
Oseltamivir Dec 4, 2025
ae4e481
fix summary table hardware
Oseltamivir Dec 4, 2025
48a220d
fix summary table hardware
Oseltamivir Dec 4, 2025
61327ca
fix summary table hardware 2
Oseltamivir Dec 4, 2025
1cf2967
final touches
Oseltamivir Dec 5, 2025
34e3b2a
Merge branch 'main' into evals-on-refactor
cquil11 Dec 5, 2025
1d889b8
Cleanup comments, ammend lighteval
Oseltamivir Dec 6, 2025
779a257
pt 1 manual merge conflict fixes
cquil11 Dec 15, 2025
00e77d0
Merge branch 'main' into evals-on-refactor
cquil11 Dec 15, 2025
9d4b217
pt 2 manual merge conflict fixes
cquil11 Dec 15, 2025
a9fad5b
use double quotes for gha parsing
cquil11 Dec 15, 2025
e07eb69
getting rid of full sweep sched changes
cquil11 Dec 15, 2025
9275f0d
add back spec decoding and disagg env vars
cquil11 Dec 15, 2025
dba25aa
add an option to ONLY run evals
cquil11 Dec 16, 2025
5de917b
remove full-sweep-test workflow and add collect-evals job to run swee…
cquil11 Dec 16, 2025
37d05d3
add run-eval to e2e tests
cquil11 Dec 16, 2025
6a546e5
math500 prompt and h200 trt evals
Oseltamivir Dec 16, 2025
d299d41
remove run prefix
cquil11 Dec 16, 2025
569d0c3
add result-prefix to benchmark tmpl uploaded artifacts
cquil11 Dec 16, 2025
30a3431
Evals summary refactor
Oseltamivir Dec 17, 2025
22c8a2b
Evals summary refactor 2
Oseltamivir Dec 17, 2025
8d12b35
Evals summary aesthetics
Oseltamivir Dec 17, 2025
d7a515a
TRT package fix, trt testing
Oseltamivir Dec 18, 2025
25f71bd
trt testing 2
Oseltamivir Dec 18, 2025
ab6bf8f
max_num_tokens
Oseltamivir Dec 19, 2025
0472555
Merge branch 'main' into evals-on-refactor
cquil11 Jan 5, 2026
0d8d7d1
Merge branch 'main' into evals-on-refactor
cquil11 Jan 7, 2026
9a873c4
unbounded gen len
Oseltamivir Jan 8, 2026
999b9f6
Fix tmpl args, add isl/osl to table
Oseltamivir Jan 8, 2026
9a13250
add isl/osl
Oseltamivir Jan 8, 2026
4b0f8de
set max tokens
Oseltamivir Jan 12, 2026
a52f4c6
remove nvd
Oseltamivir Jan 12, 2026
568e1d3
In case of multiple evals
Oseltamivir Jan 13, 2026
d55c796
diagnostic
Oseltamivir Jan 13, 2026
cdd2332
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 13, 2026
0699df8
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 13, 2026
fcd14e2
test dp_attn
Oseltamivir Jan 13, 2026
c902545
DP_ATTENTION back
Oseltamivir Jan 14, 2026
715269c
REMOVE LIGHTEVAL
Oseltamivir Jan 15, 2026
c19bb21
Merge branch 'main' into evals-on-refactor, address claude
Oseltamivir Jan 15, 2026
be431c8
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 15, 2026
50f09cc
Merge remote-tracking branch 'origin/main' into evals-on-refactor
Oseltamivir Jan 16, 2026
500029b
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 20, 2026
a353ea4
Add evals for atom, trt_mtp
Oseltamivir Jan 20, 2026
d6d4055
remove tokenizer from benchmarkserving
Oseltamivir Jan 20, 2026
338d80c
remove model_name
Oseltamivir Jan 20, 2026
e28631c
More evals for spec decode
Oseltamivir Jan 20, 2026
fa49cdc
claude pr comments
Oseltamivir Jan 19, 2026
7e628ff
chore(deps): bump the github-actions group with 2 updates (#488)
dependabot[bot] Jan 19, 2026
518d004
fix: update ep metadata in gb200 dynamo sglang configs to match comme…
functionstackx Jan 19, 2026
388020f
Experimental folder (increasing researcher/developer velocity) (#489)
functionstackx Jan 19, 2026
ef15b99
summary table
Oseltamivir Jan 21, 2026
b5b9ec0
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
a1f9b89
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
62079d6
Remove git installation and repository cloning
Oseltamivir Jan 21, 2026
5409158
evals final
Oseltamivir Jan 21, 2026
9ae0f90
more retries, lower conc, for stability
Oseltamivir Jan 21, 2026
43fd4e8
Merge branch 'main' into evals-on-refactor
Oseltamivir Jan 21, 2026
34 changes: 33 additions & 1 deletion .github/workflows/benchmark-tmpl.yml
@@ -50,6 +50,14 @@ on:
disagg:
required: true
type: string
run-eval:
type: boolean
required: true
default: false
random-range-ratio:
required: false
type: string
default: '0.8'
ref:
description: "Git ref (branch/sha) to checkout"
required: false
@@ -74,6 +82,7 @@ env:
CONC: ${{ inputs.conc }}
SPEC_DECODING: ${{ inputs.spec-decoding }}
DISAGG: ${{ inputs.disagg }}
RUN_EVAL: ${{ inputs.run-eval }}

permissions:
contents: read
@@ -82,7 +91,7 @@ jobs:
benchmark:
runs-on: ${{ inputs.runner }}
timeout-minutes: 180
- name: '${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.framework }} ${{ inputs.precision }} tp=${{ inputs.tp }} ep=${{ inputs.ep }} dpa=${{ inputs.dp-attn }} conc=${{ inputs.conc }} spec=${{ inputs.spec-decoding }}'
+ name: "${{ inputs.exp-name }} ${{ inputs.runner }} ${{ inputs.framework }} ${{ inputs.precision }} ${{ inputs.run-eval && 'eval ' || '' }}tp=${{ inputs.tp }} ep=${{ inputs.ep }} dpa=${{ inputs.dp-attn }} conc=${{ inputs.conc }} spec=${{ inputs.spec-decoding }}"
steps:
- name: Resource cleanup
run: |
@@ -113,7 +122,11 @@ jobs:
- name: Launch job script
env:
RUNNER_NAME: ${{ runner.name }}
RUNNER_TYPE: ${{ inputs.runner }}
RESULT_FILENAME: ${{ env.EXP_NAME }}_${{ env.PRECISION }}_${{ env.FRAMEWORK }}_tp${{ env.TP }}_ep${{ env.EP_SIZE }}_dpa_${{ env.DP_ATTENTION }}_conc${{ env.CONC }}_specdecode_${{ env.SPEC_DECODING }}_${{ runner.name }}
# Suppress per-job eval markdown from being appended to the step summary.
# We'll publish a single combined eval table in the collection job instead.
GITHUB_STEP_SUMMARY: ''
run: |
bash ./runners/launch_${RUNNER_NAME%%_*}.sh
FOUND_RESULT_FILE=
@@ -137,8 +150,27 @@ jobs:
RUNNER_TYPE: ${{ inputs.runner }}
run: |
python3 utils/process_result.py

- name: Upload result
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: bmk_${{ env.RESULT_FILENAME }}
path: agg_${{ env.RESULT_FILENAME }}.json

- name: Upload eval results (if any)
if: ${{ env.RUN_EVAL == 'true' }}
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: eval_${{ env.EXP_NAME }}_${{ env.RESULT_FILENAME }}
path: |
meta_env.json
results*.json
sample*.jsonl
if-no-files-found: ignore

- name: Cleanup eval outputs (post-upload)
if: ${{ env.RUN_EVAL == 'true' }}
run: |
rm -f meta_env.json || true
# Remove any eval results JSONs that were moved into workspace
rm -f results*.json || true
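
The naming logic in this workflow can be sketched in Python to make the resulting script path and artifact names easier to check. This is a minimal illustration, not code from the PR: the function names and the sample input values are hypothetical, and only the string layouts are taken from the `RESULT_FILENAME`, `bmk_…`, `eval_…`, and `launch_${RUNNER_NAME%%_*}.sh` expressions in the diff.

```python
def launch_script(runner_name: str) -> str:
    """Mirror bash `${RUNNER_NAME%%_*}`: keep the text before the first underscore."""
    prefix = runner_name.split("_", 1)[0]
    return f"./runners/launch_{prefix}.sh"


def result_filename(exp_name, precision, framework, tp, ep, dp_attn,
                    conc, spec_decoding, runner_name) -> str:
    """Compose RESULT_FILENAME in the layout used by the `Launch job script` step."""
    return (
        f"{exp_name}_{precision}_{framework}_tp{tp}_ep{ep}"
        f"_dpa_{dp_attn}_conc{conc}_specdecode_{spec_decoding}_{runner_name}"
    )


def artifact_names(exp_name: str, result: str, run_eval: bool) -> list[str]:
    """List the artifacts one job uploads; the eval artifact only when RUN_EVAL is 'true'."""
    names = [f"bmk_{result}"]
    if run_eval:
        names.append(f"eval_{exp_name}_{result}")
    return names


# Hypothetical input values, chosen only for illustration.
result = result_filename("dsr1", "fp8", "vllm", 8, 1, "false", 64, "none", "h200-nb_1")
print(launch_script("h200-nb_1"))
print(artifact_names("dsr1", result, run_eval=True))
```

Because `RESULT_FILENAME` already begins with the experiment name, the eval artifact name repeats it (`eval_<exp>_<exp>_…`), which is what lets a collector filter eval artifacts by experiment prefix.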
46 changes: 46 additions & 0 deletions .github/workflows/collect-evals.yml
@@ -0,0 +1,46 @@
name: Template - Collect Evals

on:
workflow_call:
inputs:
result-prefix:
required: false
type: string
default: ''

permissions:
contents: read

jobs:
collect-evals:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
token: ${{ secrets.REPO_PAT }}
fetch-depth: 0

- name: Download eval artifacts
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
path: eval_results/
pattern: ${{ inputs.result-prefix && format('eval_{0}_*', inputs.result-prefix) || 'eval_*' }}

- name: Summarize evals
run: |
pip install tabulate
echo "## Eval Summary" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
python3 utils/collect_eval_results.py eval_results/ ${{ inputs.result-prefix || 'all' }} >> $GITHUB_STEP_SUMMARY

- name: Upload aggregated evals
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: eval_results_${{ inputs.result-prefix || 'all' }}
path: agg_eval_${{ inputs.result-prefix || 'all' }}.json

- name: Cleanup downloaded eval artifacts
if: ${{ always() }}
run: |
rm -rf eval_results/ || true
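
The `pattern` expression in the download step picks `eval_<prefix>_*` when `result-prefix` is set and falls back to `eval_*` otherwise. The sketch below reproduces that selection in Python, assuming simple glob semantics; `fnmatch` is used here as a stand-in for the action's own glob matching, and the helper names and artifact names are hypothetical.

```python
from fnmatch import fnmatchcase


def artifact_pattern(result_prefix: str = "") -> str:
    """Equivalent of: inputs.result-prefix && format('eval_{0}_*', prefix) || 'eval_*'."""
    return f"eval_{result_prefix}_*" if result_prefix else "eval_*"


def select_artifacts(names, result_prefix=""):
    """Return the artifact names that the download pattern would match."""
    pattern = artifact_pattern(result_prefix)
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical artifact names following the bmk_/eval_ prefixes in benchmark-tmpl.yml.
uploaded = [
    "bmk_dsr1_fp8_vllm_tp8",
    "eval_dsr1_dsr1_fp8_vllm_tp8",
    "eval_gptoss_gptoss_fp4_vllm_tp4",
]
print(select_artifacts(uploaded))          # no prefix: every eval_* artifact
print(select_artifacts(uploaded, "dsr1"))  # prefix set: only that experiment
```

With the default empty prefix, benchmark artifacts (`bmk_…`) are skipped and all eval artifacts are collected, which matches how `e2e-tests.yml` and `run-sweep.yml` call this template without passing `result-prefix`.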
4 changes: 3 additions & 1 deletion .github/workflows/collect-results.yml
@@ -34,7 +34,9 @@ jobs:
python3 utils/summarize.py results/ >> $GITHUB_STEP_SUMMARY

- name: Aggregate results
- run: python3 utils/collect_results.py results/ ${{ inputs.result-prefix || 'all' }}
+ run: |
+ pip install tabulate
+ python3 utils/collect_results.py results/ ${{ inputs.result-prefix || 'all' }}

- name: Upload aggregated results
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
11 changes: 10 additions & 1 deletion .github/workflows/e2e-tests.yml
@@ -122,16 +122,25 @@ jobs:
conc: ${{ matrix.config.conc }}
spec-decoding: ${{ matrix.config.spec-decoding }}
disagg: ${{ matrix.config.disagg }}
run-eval: ${{ matrix.config.run-eval }}
ref: ${{ inputs.ref }}

collect-results:
needs: [test-sweep-multi-node, test-sweep-single-node]
if: ${{ always() }}
uses: ./.github/workflows/collect-results.yml
secrets: inherit
with:
result-prefix: "bmk"

collect-evals:
needs: [test-sweep-multi-node, test-sweep-single-node]
if: ${{ always() }}
uses: ./.github/workflows/collect-evals.yml
secrets: inherit

calc-success-rate:
- needs: collect-results
+ needs: [collect-results, collect-evals]
if: ${{ always() }}
runs-on: ubuntu-latest

16 changes: 16 additions & 0 deletions .github/workflows/run-sweep.yml
@@ -142,6 +142,7 @@ jobs:
conc: ${{ matrix.config.conc }}
spec-decoding: ${{ matrix.config.spec-decoding }}
disagg: ${{ matrix.config.disagg }}
run-eval: ${{ matrix.config.run-eval }}

sweep-single-node-1k8k:
needs: setup
@@ -184,6 +185,21 @@ jobs:
with:
result-prefix: "bmk"

collect-evals:
needs:
[
sweep-single-node-1k1k,
sweep-single-node-1k8k,
sweep-single-node-8k1k,
sweep-multi-node-1k1k,
sweep-multi-node-1k8k,
sweep-multi-node-8k1k,
setup,
]
if: ${{ always() && needs.setup.result != 'skipped' }}
uses: ./.github/workflows/collect-evals.yml
secrets: inherit

upload-changelog-metadata:
needs: [setup, collect-results]
if: ${{ always() && needs.setup.result != 'skipped' }}