19 commits
- bbc91bc feat(isb1): add KV cache stress benchmark with multi-turn synthetic t… (OCWC22, Apr 15, 2026)
- cff850b fix: validate KV stress configs against export metadata (OCWC22, Apr 15, 2026)
- fec4855 feat: expand export metadata + configs for all 87 model×engine×GPU co… (OCWC22, Apr 15, 2026)
- fa132a7 fix(isb1): close PR#1032 merge-sweep — flat paths, v0.2.0 manifests, … (OCWC22, Apr 17, 2026)
- c96d6a5 docs(isb1): tighten public-facing docs — flat paths, accurate counts,… (OCWC22, Apr 17, 2026)
- 127f068 chore(isb1): trim PR #1032 to narrow data+contract scope (OCWC22, Apr 20, 2026)
- 3c10c05 chore(isb1): drop remaining non-scope infra files from PR #1032 (OCWC22, Apr 20, 2026)
- 638e62a chore(isb1): drop remaining non-scope files relative to PR base (OCWC22, Apr 20, 2026)
- 3c2d003 chore(isb1): drop GPU-cell/CI edits from PR #1032 — data+contract only (OCWC22, Apr 20, 2026)
- e0d7506 feat(isb1): add kv-cache-tester shim + fix LFS precedence (OCWC22, Apr 21, 2026)
- d0e199e refactor(isb1): drop homegrown replay harness — defer to callanjfox/k… (OCWC22, Apr 21, 2026)
- 7c82349 Merge upstream/main into isb1/kv-cache-stress-benchmark (OCWC22, Apr 21, 2026)
- d53bd3b data(isb1): ship 179 pre-converted kv-cache-tester trace JSONs (OCWC22, Apr 21, 2026)
- fd73c8a feat(isb1): add drop-in sweep config for kv-cache-tester (PR #993) (OCWC22, Apr 21, 2026)
- 119a037 feat(isb1): add kv-cache-tester trace schema validator (OCWC22, Apr 21, 2026)
- 5208886 data(isb1): ship converted/manifest.json — 179-trace catalog (OCWC22, Apr 21, 2026)
- 962634e docs(isb1): HF publication recipe for kv-cache-tester hf_<org>--<repo… (OCWC22, Apr 21, 2026)
- 40bad61 feat(isb1): HF publish package for ISB-1 kv-cache-tester corpus (OCWC22, Apr 21, 2026)
- 38fd91a feat(isb1): add noprefix sweep cells + DSR1 131k HF trace_replay cell (OCWC22, Apr 22, 2026)
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
datasets/isb1/exports/preview/long_context_1m/*.json filter=lfs diff=lfs merge=lfs -text
datasets/isb1/exports/**/*.json filter=lfs diff=lfs merge=lfs -text
73 changes: 73 additions & 0 deletions .github/configs/multiturn-agentic-trace-isb1.yaml
@@ -0,0 +1,73 @@
# ISB1 sweep cells for Cam's kv-cache-tester replay flow.
# Schema mirrors .github/configs/multiturn-agentic-trace.yaml.
# Merge these top-level keys into that file (or extend the sweep workflow
# to glob .github/configs/multiturn-agentic-trace*.yaml) to include ISB1 sweeps.
# 8k code cells map to datasets/isb1/converted/core/code_8k1k/.
# 32k chat cells map to datasets/isb1/converted/extension_32k/chat_32k1k*/.
# 131k code/chat cells map to datasets/isb1/converted/extension_131k/*_131k1k*/.
# 500k preview cells map to datasets/isb1/converted/preview/long_context_500k/.
# 1m preview cells map to datasets/isb1/converted/preview/long_context_1m/.
# Expected TRACE_DIR is either datasets/isb1/converted/ or one of those subdirs,
# or an HF alias like hf_wchen22--isb1-cc-traces (resolved by the replay
# harness's TRACE_DIR=hf_<org>--<repo> hydration path).
#
# offload values:
# on — KV offload enabled (VLLM_USE_SIMPLE_KV_OFFLOAD=1)
# off — KV offload disabled (baseline)
# noprefix — offload off AND --no-enable-prefix-caching (clean-cache floor).
# Cam's h100 lane already wires the flag in
# multiturn_fp8_h100_lmcache_aiperf.sh:123-126; these cells just
# surface the third mode so the sweep generator emits it.

h200-fp8-qwen3-isb1-code-8k:
tp2: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off", "noprefix"]}
tp4: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off", "noprefix"]}

h200-fp8-qwen3-isb1-chat-32k:
tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off", "noprefix"]}
tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off", "noprefix"]}

h200-fp8-qwen3-isb1-code-131k:
tp4: {users: [1, 2, 4, 8], offload: ["on", "off", "noprefix"]}
tp8: {users: [1, 2, 4, 8, 16], offload: ["on", "off", "noprefix"]}

b200-fp4-dsr1-isb1-code-8k:
tp4: {ep: 4, users: [4, 8, 16, 32, 64, 128, 256], offload: ["on", "off"]}
tp8: {ep: 8, users: [8, 16, 32, 64, 128, 256, 512], offload: ["on", "off"]}

b200-fp4-dsr1-isb1-chat-32k:
tp4: {ep: 4, users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off"]}
tp8: {ep: 8, users: [1, 2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]}

b200-fp4-dsr1-isb1-code-131k:
tp8: {ep: 8, users: [1, 2, 4, 8, 16], offload: ["on", "off"]}

# DSR1 131k reasoning cell — trace_replay backed by the HF publish
# (wchen22/isb1-cc-traces). Exercises Cam's Apr 20 --no-max-tokens flag
# against a reasoning corpus without requiring local dataset checkout.
# TRACE_DIR alias: hf_wchen22--isb1-cc-traces
# (subset consumed by this cell: extension_131k/code_131k1k* and
# extension_131k/chat_131k1k*)
b200-fp4-dsr1-isb1-code-131k-hf:
tp8: {ep: 8, users: [1, 2, 4, 8, 16], offload: ["on", "off"]}

b200-fp4-qwen3-isb1-chat-500k-preview:
tp4: {users: [1, 2, 4], offload: ["on", "off"]}
tp8: {users: [1, 2, 4, 8], offload: ["on", "off"]}

b200-fp4-qwen3-isb1-chat-1m-preview:
tp8: {users: [1, 2], offload: ["on", "off"]}

mi355x-fp8-qwen3-isb1-code-8k:
tp2: {users: [2, 4, 8, 16, 32, 64], offload: ["on", "off"]}
tp4: {users: [2, 4, 8, 16, 32, 64, 128], offload: ["on", "off"]}

mi355x-fp8-qwen3-isb1-chat-32k:
tp4: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off"]}

h100-fp8-qwen3-isb1-code-8k-lmcache:
tp2: {users: [1, 2, 4, 8, 16, 32], offload: ["on", "off", "noprefix"]}
tp4: {users: [1, 2, 4, 8, 16, 32, 64], offload: ["on", "off", "noprefix"]}

h200-fp8-qwen3-isb1-debug:
tp2: {users: [2], offload: ["off"]}
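Each cell in the config above expands to one benchmark run per (parallelism, user count, offload mode) combination. A minimal sketch of that expansion, assuming a hypothetical helper (the real sweep generator's emit logic is not part of this PR; extra keys such as `ep` pass through untouched):

```python
# Hypothetical expansion of one ISB1 sweep cell into individual runs.
# Cell shape mirrors multiturn-agentic-trace-isb1.yaml: each parallelism
# key (tp2, tp4, ...) carries user counts and offload modes.
from itertools import product

def expand_cell(cell_name, cell):
    """Return one (cell, parallelism, users, offload) tuple per run."""
    runs = []
    for parallelism, spec in cell.items():
        for users, offload in product(spec["users"], spec["offload"]):
            runs.append((cell_name, parallelism, users, offload))
    return runs

# The debug cell expands to exactly one run:
debug = {"tp2": {"users": [2], "offload": ["off"]}}
print(expand_cell("h200-fp8-qwen3-isb1-debug", debug))
# → [('h200-fp8-qwen3-isb1-debug', 'tp2', 2, 'off')]
```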
2 changes: 2 additions & 0 deletions datasets/isb1/.gitattributes
@@ -0,0 +1,2 @@
exports/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true
converted/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true
196 changes: 196 additions & 0 deletions datasets/isb1/HF_PUBLISH.md
@@ -0,0 +1,196 @@
# HF publication recipe for ISB1 converted traces

Mirror `datasets/isb1/converted/` to Hugging Face so Cam's
`TRACE_DIR=hf_<org>--<repo>` path works immediately with kv-cache-tester.
Preferred target: `semianalysisai/isb1-cc-traces`.
Fallback if org write access is unavailable: `ocwc22/isb1-cc-traces`.

## 1. What gets published

This publish package is the checked-in trio below:

- `datasets/isb1/converted/` — 179 validated kv-cache-tester trace JSON files
- `datasets/isb1/converted/manifest.json` — corpus metadata (`1226` total requests)
- `datasets/isb1/hf_dataset_card.md` — staged to HF as `README.md`

The consumer contract is unchanged: Cam's replay scripts interpret
`hf_<org>--<repo>` as a Hugging Face dataset source, hydrate it locally, and
then invoke the existing replay path.
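The alias convention can be sketched as follows (the helper name is illustrative; the real hydration code lives in Cam's replay harness, not in this repo):

```python
# Sketch of the TRACE_DIR contract: "hf_<org>--<repo>" names a Hugging Face
# dataset, anything else is treated as a local path. Helper name is
# illustrative only.
def resolve_trace_dir(trace_dir):
    """Return ("hf", "org/repo") for an HF alias, else ("local", path)."""
    if trace_dir.startswith("hf_") and "--" in trace_dir:
        return ("hf", trace_dir[len("hf_"):].replace("--", "/", 1))
    return ("local", trace_dir)

print(resolve_trace_dir("hf_semianalysisai--isb1-cc-traces"))
# → ('hf', 'semianalysisai/isb1-cc-traces')
```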

## 2. Pre-flight validation

Run the stdlib validator before every publish attempt:

```bash
python3 tools/validate_kvcache_tester_trace.py datasets/isb1/converted/
```

Expected result:

- `✓ 179 files valid | 0 failed`
- Exit code `0`

If validation fails, stop and fix the source corpus before publishing. Do not
push a broken dataset mirror to HF.
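Alongside the schema validator, a quick file-count cross-check can catch a partial checkout before publish. A stdlib sketch (the 179-file expectation comes from this corpus; the helper itself is illustrative, not part of the validator):

```python
# Cross-check the corpus before publishing: 179 trace JSONs plus
# manifest.json are expected under datasets/isb1/converted/.
from pathlib import Path

def count_trace_jsons(corpus_dir):
    """Count *.json trace files recursively, excluding the manifest itself."""
    root = Path(corpus_dir)
    return sum(1 for p in root.rglob("*.json") if p.name != "manifest.json")

# Example: count_trace_jsons("datasets/isb1/converted/") should return 179.
```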

## 3. Python version

`tools/publish_hf_dataset.py` imports `huggingface_hub >= 0.24`, which in turn
requires Python 3.10+. On macOS the system `/usr/bin/python3` is 3.9 and does
not ship `huggingface_hub`; do not use it.

Use Python 3.13 explicitly:

```bash
/opt/homebrew/opt/python@3.13/bin/python3.13 -m pip install --user huggingface_hub
/opt/homebrew/opt/python@3.13/bin/python3.13 tools/publish_hf_dataset.py --help
```

Or activate a virtualenv / pyenv shim that resolves to 3.10+ before running any
of the commands below. If you see `ModuleNotFoundError: huggingface_hub`, you
are on 3.9 — switch interpreters first.

## 4. Token setup

Authenticate with a token that has write access to the destination namespace:

```bash
huggingface-cli login
```

If you prefer explicit token injection:

```bash
export HF_TOKEN=hf_xxx
huggingface-cli login --token "$HF_TOKEN"
```

## 5. Dry-run the publish package locally

The uploader script stages the converted corpus plus the dataset card and
prints the exact file list it would upload without making any remote changes.

```bash
python3 tools/publish_hf_dataset.py \
--source datasets/isb1/converted/ \
--repo semianalysisai/isb1-cc-traces \
--private \
--dry-run
```

Use the fallback namespace instead if needed:

```bash
python3 tools/publish_hf_dataset.py \
--source datasets/isb1/converted/ \
--repo ocwc22/isb1-cc-traces \
--private \
--dry-run
```

## 6. Publish for real

Once the dry-run output looks correct and HF auth is configured, publish with
one of the exact commands below.

Private-first publish:

```bash
python3 tools/publish_hf_dataset.py \
--source datasets/isb1/converted/ \
--repo semianalysisai/isb1-cc-traces \
--private \
--commit-message "Publish ISB-1 kv-cache-tester traces"
```

Or make the dataset public at creation time:

```bash
python3 tools/publish_hf_dataset.py \
--source datasets/isb1/converted/ \
--repo semianalysisai/isb1-cc-traces \
--public \
--commit-message "Publish ISB-1 kv-cache-tester traces"
```

Fallback org:

```bash
python3 tools/publish_hf_dataset.py \
--source datasets/isb1/converted/ \
--repo ocwc22/isb1-cc-traces \
--public \
--commit-message "Publish ISB-1 kv-cache-tester traces"
```

The script will:

1. Stage `datasets/isb1/converted/` into a temporary upload tree
2. Copy `datasets/isb1/hf_dataset_card.md` into that tree as `README.md`
3. Create the dataset repo if it does not already exist
4. Upload the staged folder with `huggingface_hub`
5. Verify the published snapshot with `snapshot_download` into `/tmp`
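Those five steps map onto `huggingface_hub`'s public API roughly as below. This is a sketch, not the contents of `tools/publish_hf_dataset.py`; the staging helper is split out so steps 1 and 2 can run without network access:

```python
# Sketch of the five publish steps. Steps 1-2 (staging) are pure stdlib;
# steps 3-5 use huggingface_hub's public API. Illustrative only.
import shutil
import tempfile
from pathlib import Path

def stage(source_dir, card_path):
    """Steps 1-2: copy the corpus into a temp tree, add the card as README.md."""
    staging = Path(tempfile.mkdtemp(prefix="isb1-hf-"))
    shutil.copytree(source_dir, staging, dirs_exist_ok=True)
    shutil.copy(card_path, staging / "README.md")
    return staging

def publish(source_dir, card_path, repo_id, private, message):
    # Imported lazily so staging can be exercised without huggingface_hub.
    from huggingface_hub import HfApi, snapshot_download
    api = HfApi()
    staging = stage(source_dir, card_path)
    api.create_repo(repo_id, repo_type="dataset",
                    private=private, exist_ok=True)            # step 3
    api.upload_folder(repo_id=repo_id, repo_type="dataset",
                      folder_path=str(staging),
                      commit_message=message)                  # step 4
    # Step 5: verify the published snapshot by re-downloading it.
    return snapshot_download(repo_id, repo_type="dataset",
                             local_dir=tempfile.mkdtemp(prefix="isb1-verify-"))
```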

## 7. Post-publish verification

### Repository-level verification

Re-download the published dataset and re-run the validator against the hydrated
copy:

```bash
huggingface-cli download semianalysisai/isb1-cc-traces \
--repo-type dataset \
--local-dir /tmp/isb1-cc-traces-verify
python3 tools/validate_kvcache_tester_trace.py /tmp/isb1-cc-traces-verify
```

### Harness-level verification

The exact consumer path for Cam is the existing `TRACE_DIR=hf_<org>--<repo>`
contract. In the replay harness checkout, the closest end-to-end verification
command is:

```bash
TRACE_DIR=hf_semianalysisai--isb1-cc-traces \
bash experimental/multiturn/benchmarks/single_node/multiturn_fp8_h200_trace_replay.sh
```

If the SemianalysisAI org is not available, swap in the fallback namespace:

```bash
TRACE_DIR=hf_ocwc22--isb1-cc-traces \
bash experimental/multiturn/benchmarks/single_node/multiturn_fp8_h200_trace_replay.sh
```

## 8. Consumer note for Cam

This is the zero-friction handoff:

```bash
TRACE_DIR=hf_semianalysisai--isb1-cc-traces \
bash experimental/multiturn/benchmarks/single_node/multiturn_fp8_h200_trace_replay.sh
```

No code change is required in Cam's harness. The only user action is publishing
this dataset repo once with valid HF credentials.

## 9. Versioning guidance

When new traces land:

1. Regenerate `datasets/isb1/converted/manifest.json`
2. Re-run `tools/validate_kvcache_tester_trace.py`
3. Re-run the uploader dry-run
4. Publish with a commit message that records the corpus revision
5. Record the InferenceX commit SHA and the HF dataset revision together
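Step 5 can be captured as a small machine-readable pin record checked in next to benchmark results. A sketch, with field names that are illustrative rather than an established schema:

```python
# Sketch of a pin record tying an InferenceX commit to the HF dataset
# revision it was benchmarked against. Field names are illustrative.
import json

def make_pin_record(inferencex_sha, hf_revision, trace_count=179):
    return json.dumps(
        {
            "inferencex_commit": inferencex_sha,
            "hf_dataset_revision": hf_revision,
            "trace_count": trace_count,
        },
        indent=2,
    )
```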

Consumers that need immutability should pin an HF revision instead of floating
on `main`.

## Notes

- Publish converted artifacts and metadata only
- Do not modify `datasets/isb1/converted/**` during publication prep
- Keep the uploaded layout compatible with kv-cache-tester's existing
`TRACE_DIR=hf_<org>--<repo>` convention