forked from SemiAnalysisAI/InferenceX
aiperf: second independent mooncake corpus for ISB-1 (opt-in) #2
Open: OCWC22 wants to merge 13 commits into isb1/kv-cache-stress-benchmark from isb1/aiperf-mooncake-exporter
- d53bd3b data(isb1): ship 179 pre-converted kv-cache-tester trace JSONs (OCWC22)
- fd73c8a feat(isb1): add drop-in sweep config for kv-cache-tester (PR #993) (OCWC22)
- 119a037 feat(isb1): add kv-cache-tester trace schema validator (OCWC22)
- 5208886 data(isb1): ship converted/manifest.json — 179-trace catalog (OCWC22)
- 962634e docs(isb1): HF publication recipe for kv-cache-tester hf_<org>--<repo… (OCWC22)
- 4f12d3a feat(isb1): add mooncake trace exporter (OCWC22)
- 7e1127b fix(isb1): align mooncake schema with aiperf MooncakeTrace model (OCWC22)
- 8a50906 isb1: ship non-preview mooncake JSONL bundles (LFS) (OCWC22)
- 0557679 isb1: add mooncake sweep config (8k / 32k / 131k) (OCWC22)
- 3f8ffd1 docs: operator recipe for MOONCAKE_INPUT (aiperf mooncake_trace) (OCWC22)
- 46d6d0f aiperf: offline smoke for MooncakeTraceDatasetLoader (OCWC22)
- 8fe2b80 docs(isb1): mark mooncake recipe and sweep yaml as speculative (OCWC22)
- b31f7c1 feat(isb1): preserve ISL in mooncake flattener + noprefix cells (OCWC22)
54 changes: 54 additions & 0 deletions
.github/configs/multiturn-agentic-trace-isb1-mooncake.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # ====================================================================== | ||
| # STATUS: SPECULATIVE / NON-BLOCKING | ||
| # ---------------------------------------------------------------------- | ||
| # This file is NOT the canonical PR framing for fork PR #2. It depends on | ||
| # an unmerged upstream patch that wires MOONCAKE_INPUT through | ||
| # .github/workflows/benchmark-multiturn-tmpl.yml. Do NOT reference from | ||
| # workflows until that path is approved upstream. See | ||
| # datasets/isb1/RECIPE_MOONCAKE.md for the speculative patch set, or | ||
| # https://github.com/OCWC22/InferenceX/pull/2 for the canonical | ||
| # opt-in framing (corpus at datasets/isb1/mooncake/ consumed via existing | ||
| # --custom-dataset-type mooncake_trace, zero harness patches required). | ||
| # ====================================================================== | ||
| # ISB1 sweep cells for Cam's aiperf / mooncake_trace replay flow. | ||
| # Schema mirrors .github/configs/multiturn-agentic-trace.yaml and | ||
| # .github/configs/multiturn-agentic-trace-isb1.yaml. | ||
| # 8k code cells expect MOONCAKE_INPUT=datasets/isb1/mooncake/core/code_8k1k/. | ||
| # 32k chat cells expect MOONCAKE_INPUT=datasets/isb1/mooncake/extension_32k/chat_32k1k*/. | ||
| # 131k code cells expect MOONCAKE_INPUT=datasets/isb1/mooncake/extension_131k/*_131k1k*/. | ||
| # Preview 500k / 1m lanes are intentionally omitted in v1. | ||
| # | ||
| # offload values: | ||
| # on — KV offload enabled (VLLM_USE_SIMPLE_KV_OFFLOAD=1) | ||
| # off — KV offload disabled (baseline) | ||
| # noprefix — offload off AND --no-enable-prefix-caching (clean-cache floor). | ||
| # Cam's h100 lane already wires the flag in | ||
| # multiturn_fp8_h100_lmcache_aiperf.sh:123-126; these cells just | ||
| # surface the third mode so the sweep generator emits it. | ||
|
|
||
| h100-fp8-qwen3-isb1-mooncake-code-8k-lmcache: | ||
| tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off", "noprefix"]} | ||
| tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off", "noprefix"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-mooncake-code-8k-lmcache: | ||
| tp2: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off", "noprefix"]} | ||
| tp4: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off", "noprefix"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-mooncake-chat-32k-lmcache: | ||
| tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off", "noprefix"]} | ||
| tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off", "noprefix"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-mooncake-code-131k-lmcache: | ||
| tp4: {users: [1, 2, 4, 8], offload: ["on", "off", "noprefix"]} | ||
| tp8: {users: [1, 2, 4, 8, 16], offload: ["on", "off", "noprefix"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-mooncake-code-8k-lmcache: | ||
| tp4: {ep: 4, users: [4, 8, 16, 32, 64, 128, 256], offload: ["on", "off"]} | ||
| tp8: {ep: 8, users: [8, 16, 32, 64, 128, 256, 512], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-mooncake-chat-32k-lmcache: | ||
| tp4: {ep: 4, users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off"]} | ||
| tp8: {ep: 8, users: [1, 2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-mooncake-code-131k-lmcache: | ||
| tp8: {ep: 8, users: [1, 2, 4, 8, 16], offload: ["on", "off"]} |
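
The sweep generator itself is not part of this diff, so how it consumes these cells is an assumption. A minimal sketch, assuming each `tpN` entry expands to the Cartesian product of its `users` and `offload` lists; `SWEEP` and `expand_cells` are hypothetical names, and the cell is copied into a dict literal so the sketch needs no YAML parser:

```python
from itertools import product

# One cell copied from the config above. In practice this mapping would come
# from parsing the YAML file; that step is omitted to stay stdlib-only.
SWEEP = {
    "h200-fp8-qwen3-isb1-mooncake-code-131k-lmcache": {
        "tp4": {"users": [1, 2, 4, 8], "offload": ["on", "off", "noprefix"]},
        "tp8": {"users": [1, 2, 4, 8, 16], "offload": ["on", "off", "noprefix"]},
    },
}


def expand_cells(sweep):
    """Yield one job dict per (cell, tp, users, offload) combination."""
    for cell, tp_groups in sweep.items():
        for tp_key, spec in tp_groups.items():
            tp = int(tp_key.removeprefix("tp"))  # "tp4" -> 4 (Python 3.9+)
            for users, offload in product(spec["users"], spec["offload"]):
                job = {"cell": cell, "tp": tp, "users": users, "offload": offload}
                if "ep" in spec:  # only the b200 dsr1 cells carry an ep value
                    job["ep"] = spec["ep"]
                yield job


jobs = list(expand_cells(SWEEP))
print(len(jobs))  # 4*3 + 5*3 = 27
```

Under that assumption, adding `noprefix` to a cell grows its job count by half (three offload modes instead of two), which is worth keeping in mind for the larger `users` lists.
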
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # ISB1 sweep cells for Cam's kv-cache-tester replay flow. | ||
| # Schema mirrors .github/configs/multiturn-agentic-trace.yaml. | ||
| # Merge these top-level keys into that file (or extend the sweep workflow | ||
| # to glob .github/configs/multiturn-agentic-trace*.yaml) to include ISB1 sweeps. | ||
| # 8k code cells map to datasets/isb1/converted/core/code_8k1k/. | ||
| # 32k chat cells map to datasets/isb1/converted/extension_32k/chat_32k1k*/. | ||
| # 131k code/chat cells map to datasets/isb1/converted/extension_131k/*_131k1k*/. | ||
| # 500k preview cells map to datasets/isb1/converted/preview/long_context_500k/. | ||
| # 1m preview cells map to datasets/isb1/converted/preview/long_context_1m/. | ||
| # Expected TRACE_DIR is either datasets/isb1/converted/ or one of those subdirs. | ||
|
|
||
| h200-fp8-qwen3-isb1-code-8k: | ||
| tp2: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]} | ||
| tp4: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-chat-32k: | ||
| tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off"]} | ||
| tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-code-131k: | ||
| tp4: {users: [1, 2, 4, 8], offload: ["on", "off"]} | ||
| tp8: {users: [1, 2, 4, 8, 16], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-code-8k: | ||
| tp4: {ep: 4, users: [4, 8, 16, 32, 64, 128, 256], offload: ["on", "off"]} | ||
| tp8: {ep: 8, users: [8, 16, 32, 64, 128, 256, 512], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-chat-32k: | ||
| tp4: {ep: 4, users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off"]} | ||
| tp8: {ep: 8, users: [1, 2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-dsr1-isb1-code-131k: | ||
| tp8: {ep: 8, users: [1, 2, 4, 8, 16], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-qwen3-isb1-chat-500k-preview: | ||
| tp4: {users: [1, 2, 4], offload: ["on", "off"]} | ||
| tp8: {users: [1, 2, 4, 8], offload: ["on", "off"]} | ||
|
|
||
| b200-fp4-qwen3-isb1-chat-1m-preview: | ||
| tp8: {users: [1, 2], offload: ["on", "off"]} | ||
|
|
||
| mi355x-fp8-qwen3-isb1-code-8k: | ||
| tp2: {users: [2, 4, 8, 16, 32, 64], offload: ["on", "off"]} | ||
| tp4: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]} | ||
|
|
||
| mi355x-fp8-qwen3-isb1-chat-32k: | ||
| tp4: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off"]} | ||
|
|
||
| h100-fp8-qwen3-isb1-code-8k-lmcache: | ||
| tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off"]} | ||
| tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off"]} | ||
|
|
||
| h200-fp8-qwen3-isb1-debug: | ||
| tp2: {users: [2], offload: ["off"]} |
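
The header comment offers two ways to include these cells: merge the top-level keys into the existing file, or have the workflow glob `.github/configs/multiturn-agentic-trace*.yaml`. A minimal sketch of the glob-and-merge option, assuming the configs are already parsed into dicts; `select_configs`, `merge_cells` and `unrelated.yaml` are hypothetical, and the other file names are the ones mentioned in this PR's comments:

```python
from fnmatch import fnmatch

GLOB = ".github/configs/multiturn-agentic-trace*.yaml"


def select_configs(paths, pattern=GLOB):
    """Return the config paths matching the glob, in sorted order."""
    return sorted(p for p in paths if fnmatch(p, pattern))


def merge_cells(parsed):
    """Merge {path: {cell: spec}} into one mapping, rejecting duplicate cells."""
    merged, origin = {}, {}
    for path in sorted(parsed):
        for cell, spec in parsed[path].items():
            if cell in merged:
                raise ValueError(
                    f"cell {cell!r} defined in both {origin[cell]} and {path}"
                )
            merged[cell] = spec
            origin[cell] = path
    return merged


paths = [
    ".github/configs/multiturn-agentic-trace.yaml",
    ".github/configs/multiturn-agentic-trace-isb1.yaml",
    ".github/configs/multiturn-agentic-trace-isb1-mooncake.yaml",
    ".github/configs/unrelated.yaml",  # hypothetical non-matching file
]
selected = select_configs(paths)
print(selected)
```

Note that this glob also matches `multiturn-agentic-trace-isb1-mooncake.yaml`, which its own header marks as speculative and not to be referenced from workflows, so a glob-based approach would probably need an explicit exclusion for that file.
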
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,3 @@ | ||
| exports/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | ||
| converted/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | ||
| mooncake/**/*.jsonl filter=lfs diff=lfs merge=lfs -text linguist-generated=true | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| # HF publication recipe for ISB1 converted traces | ||
|
|
||
| Mirror `datasets/isb1/converted/` to Hugging Face so Cam's | ||
| `TRACE_DIR=hf_<org>--<repo>` path works immediately with kv-cache-tester. | ||
| Recommended target: `semianalysisai/isb1-cc-traces`. | ||
|
|
||
| ## 1. Target namespace | ||
|
|
||
| - Dataset repo: `semianalysisai/isb1-cc-traces` | ||
| - Source directory: `datasets/isb1/converted/` | ||
| - Consumer contract: Cam's replay scripts interpret `hf_<org>--<repo>` as a | ||
| Hugging Face dataset reference before calling `trace_replay_tester.py` | ||
|
|
||
| ## 2. Prereqs | ||
|
|
||
| - `huggingface-cli >= 0.20` | ||
| - `HF_TOKEN` with write scope to the destination org | ||
| - Local validation already green: | ||
| `python3 tools/validate_kvcache_tester_trace.py datasets/isb1/converted/` | ||
|
|
||
| Authenticate first: | ||
|
|
||
| ```bash | ||
| export HF_TOKEN=hf_xxx | ||
| huggingface-cli login --token "$HF_TOKEN" | ||
| ``` | ||
|
|
||
| ## 3. Dataset card template | ||
|
|
||
| Create the HF dataset `README.md` with this content: | ||
|
|
||
| ```markdown | ||
| --- | ||
| license: apache-2.0 | ||
| task_categories: [text-generation] | ||
| language: [en] | ||
| pretty_name: ISB1 Converted kv-cache-tester Traces | ||
| tags: [kv-cache, trace-replay, inference-benchmark, semianalysis, isb1] | ||
| --- | ||
|
|
||
| # ISB1 Converted kv-cache-tester Traces | ||
|
|
||
| This dataset mirrors `datasets/isb1/converted/` from SemiAnalysisAI/InferenceX | ||
| PR #1032 so Cam's kv-cache-tester replay flow from PR #993 can consume ISB1 | ||
| traces directly through the `hf_<org>--<repo>` `TRACE_DIR` convention. | ||
|
|
||
| ## Contents | ||
|
|
||
| - 179 pre-converted trace JSON files | ||
| - 8k / 32k / 64k / 131k / 500k preview / 1m preview coverage | ||
| - Kimi K2.5 / DSR1 / GPT-OSS / Qwen3.5 coverage | ||
| - `manifest.json` metadata catalog | ||
|
|
||
| ## Provenance | ||
|
|
||
| - Source repo: `SemiAnalysisAI/InferenceX` | ||
| - Source PR: `#1032` | ||
| - Consumer workflow: `callanjfox/kv-cache-tester` PR `#993` | ||
| - License: Apache-2.0 | ||
| ``` | ||
|
|
||
| ## 4. Upload command | ||
|
|
||
| ```bash | ||
| huggingface-cli upload \ | ||
| semianalysisai/isb1-cc-traces \ | ||
| datasets/isb1/converted/ \ | ||
| . \ | ||
| --repo-type dataset \ | ||
| --revision main | ||
| ``` | ||
|
|
||
| If the repo does not exist yet, create it in the HF UI first, then rerun the | ||
| upload. | ||
|
|
||
| ## 5. Cam's Slurm integration | ||
|
|
||
| After publication, switch Cam's script from a local directory to the HF path: | ||
|
|
||
| ```bash | ||
| TRACE_DIR=hf_semianalysisai--isb1-cc-traces # replaces datasets/isb1/converted | ||
| ``` | ||
|
|
||
| That triggers the `hf_<org>--<repo>` branch in Cam's PR #993 replay script | ||
| (`benchmarks/single_node/multiturn_fp4_b200_trace_replay.sh`, lines 54-58), | ||
| which rewrites the value into `--hf-dataset <org>/<repo>` before invoking | ||
| `trace_replay_tester.py`. | ||
|
|
||
| ## 6. Versioning | ||
|
|
||
| When new traces land: | ||
|
|
||
| 1. Regenerate `datasets/isb1/converted/manifest.json` | ||
| 2. Re-run local validation on the converted directory | ||
| 3. Upload the updated directory to HF `main` | ||
| 4. Create a matching HF tag such as `v0.2.0` or `pr1032-r2` | ||
| 5. Record the InferenceX commit SHA and HF revision together | ||
|
|
||
| Consumers who need immutability should pin an HF revision instead of floating | ||
| on `main`. | ||
|
|
||
| ## 7. Verification | ||
|
|
||
| ```bash | ||
| rm -rf /tmp/verify | ||
| huggingface-cli download semianalysisai/isb1-cc-traces \ | ||
| --repo-type dataset \ | ||
| --local-dir /tmp/verify | ||
| python3 tools/validate_kvcache_tester_trace.py /tmp/verify | ||
| ``` | ||
|
|
||
| Expected result: | ||
|
|
||
| - Download succeeds with all trace JSONs present | ||
| - Validator reports all converted traces passing | ||
| - Cam's replay wrapper accepts | ||
| `TRACE_DIR=hf_semianalysisai--isb1-cc-traces` with no shell-script changes | ||
|
|
||
| ## Notes | ||
|
|
||
| - Publish converted artifacts and metadata only | ||
| - Keep the layout compatible with `trace_replay_tester.py` | ||
| - If the org name changes, update both the upload command and `TRACE_DIR` | ||
| example together |
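
Section 5 of the recipe says the replay script rewrites `TRACE_DIR=hf_<org>--<repo>` into `--hf-dataset <org>/<repo>`. The script itself is not shown in this diff, so the following is only a sketch of that convention as the recipe describes it; `resolve_trace_dir` is a hypothetical name, and the handling of non-`hf_` values as local directories is an assumption:

```python
def resolve_trace_dir(trace_dir):
    """Classify a TRACE_DIR value per the hf_<org>--<repo> convention.

    Returns ("hf-dataset", "<org>/<repo>") for Hugging Face references and
    ("local", trace_dir) for everything else (assumed to be a directory).
    """
    prefix = "hf_"
    if not trace_dir.startswith(prefix):
        return ("local", trace_dir)
    # Split on the first "--": the left part is the org, the right the repo.
    org, sep, repo = trace_dir[len(prefix):].partition("--")
    if not sep or not org or not repo:
        raise ValueError(f"malformed HF reference: {trace_dir!r}")
    return ("hf-dataset", f"{org}/{repo}")


print(resolve_trace_dir("hf_semianalysisai--isb1-cc-traces"))
# ('hf-dataset', 'semianalysisai/isb1-cc-traces')
print(resolve_trace_dir("datasets/isb1/converted"))
# ('local', 'datasets/isb1/converted')
```

Because the org name appears in both the upload command and the `TRACE_DIR` example, a check like this could also be used to confirm the two stay in sync when the org changes.
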
`converted/**/*.json` is marked as LFS-tracked, which also applies to `datasets/isb1/converted/manifest.json`. Since the manifest is small and useful to review/diff in normal Git history, consider overriding attributes for just that manifest (disable LFS) so it stays human-readable in PRs.