
[isb1] add converted trace corpus + kv-cache-tester contract helpers #1032

Open
OCWC22 wants to merge 19 commits into SemiAnalysisAI:main from OCWC22:isb1/kv-cache-stress-benchmark

Conversation


@OCWC22 OCWC22 commented Apr 15, 2026

Summary

Refresh PR #1032 into a checked-in data + contract drop for Cam's kv-cache-tester replay flow.

  • 179 datasets/isb1/converted/** traces plus datasets/isb1/converted/manifest.json
  • 1226 total requests cataloged in the manifest
  • Stdlib-only tools/validate_kvcache_tester_trace.py
  • Supporting docs/config for the existing TRACE_DIR=hf_<org>--<repo> consumption path
  • No changes under experimental/**

Scope

This PR keeps the consumer harness unchanged. It adds checked-in artifacts and small compatibility helpers only:

  • datasets/isb1/exports/** replay bundles
  • datasets/isb1/converted/** kv-cache-tester-ready per-conversation JSON traces
  • manifests/docs under datasets/isb1/
  • converter / validator helper tools and tests
  • .github/configs/multiturn-agentic-trace-isb1.yaml as a drop-in config mirror

Validation

  • python3 tools/validate_kvcache_tester_trace.py datasets/isb1/converted/ reports 179 files valid | 0 failed
  • Manifest audit ✅ 179 listed files exist and totals match manifest.json (179 traces / 1226 requests)
  • Hash-id audit ✅ positive-input turns carry non-empty hash_ids with lengths consistent with block_size=64
  • Diff audit ✅ no files under experimental/**
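The manifest and hash-id audits above can be approximated with a stdlib-only pass. This is a hypothetical sketch, not tools/validate_kvcache_tester_trace.py itself; the field names (turns, input_length, hash_ids) and the ceil(input_length / block_size) relationship are assumptions based on the description above:

```python
import json
from pathlib import Path

BLOCK_SIZE = 64  # the hash-id audit checks lengths against block_size=64

def audit_corpus(converted_dir: str) -> dict:
    """Count traces/requests and flag positive-input turns whose hash_ids
    length is inconsistent with BLOCK_SIZE."""
    root = Path(converted_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    traces = sorted(p for p in root.glob("**/*.json") if p.name != "manifest.json")
    requests = 0
    inconsistent = 0
    for path in traces:
        trace = json.loads(path.read_text())
        for turn in trace.get("turns", []):
            requests += 1
            tokens = turn.get("input_length", 0)
            if tokens > 0:
                expected_blocks = -(-tokens // BLOCK_SIZE)  # ceiling division
                if len(turn.get("hash_ids", [])) != expected_blocks:
                    inconsistent += 1
    return {
        "trace_count": len(traces),
        "request_count": requests,
        "matches_manifest": len(traces) == len(manifest.get("files", [])),
        "inconsistent_hash_ids": inconsistent,
    }
```

Run against the real corpus this should report 179 traces and 1226 requests if the assumed schema matches.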

Why this matters

Cam's trace replay wrapper already accepts TRACE_DIR=hf_<org>--<repo>, so the converted corpus is the zero-friction path: publish or mirror the dataset and point the existing harness at it, with no consumer-side script edits.
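The hf_<org>--<repo> convention can be decoded mechanically. A minimal sketch of the mapping (the function name trace_dir_to_hf_repo is hypothetical, not the wrapper's actual code):

```python
from typing import Optional

def trace_dir_to_hf_repo(trace_dir: str) -> Optional[str]:
    """Map the hf_<org>--<repo> TRACE_DIR convention to a Hugging Face
    dataset id, or return None for a plain local directory."""
    if not trace_dir.startswith("hf_"):
        return None
    org, sep, repo = trace_dir[len("hf_"):].partition("--")
    return f"{org}/{repo}" if sep else None
```

For example, hf_wchen22--isb1-cc-traces resolves to the dataset id wchen22/isb1-cc-traces.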

Live HF dataset

The corpus is now published at https://huggingface.co/datasets/wchen22/isb1-cc-traces (public, 182 files / 9.27 MiB).

Consume with:

TRACE_DIR=hf_wchen22--isb1-cc-traces \
bash experimental/multiturn/benchmarks/single_node/multiturn_fp8_h200_trace_replay.sh

What landed since initial review

  • 40bad610 feat(isb1): HF publish package for ISB-1 kv-cache-tester corpus (dataset card + tools/publish_hf_dataset.py + runbook at datasets/isb1/HF_PUBLISH.md). Publish is a one-shot user action; no harness change required.
  • 38fd91a7 feat(isb1): add noprefix sweep cells + DSR1 131k HF trace_replay cell:
    • noprefix added as a third offload value on every H200 fp8 Qwen3 cell and the H100 fp8 Qwen3 lmcache cell (Cam's multiturn_fp8_h100_lmcache_aiperf.sh:123-126 already wires --no-enable-prefix-caching; the sweep just needs to emit it).
    • New b200-fp4-dsr1-isb1-code-131k-hf cell pointing at hf_wchen22--isb1-cc-traces so reasoning sweeps exercise the HF path plus Cam's Apr 20 --no-max-tokens flag.
    • HF_PUBLISH.md gains a "Python version" section: publisher needs huggingface_hub (Python >= 3.10); macOS system python3 is 3.9 and silently fails. Use /opt/homebrew/opt/python@3.13/bin/python3.13.
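The "Python version" pitfall above (huggingface_hub needs Python >= 3.10; macOS system python3 is 3.9 and fails silently) can be turned into a loud failure with a small guard. A sketch of what such a check might look like, not the actual HF_PUBLISH.md tooling:

```python
import sys

def require_python(minimum=(3, 10)) -> None:
    """Exit with a clear message instead of failing silently when the
    interpreter is too old for huggingface_hub."""
    if sys.version_info < minimum:
        raise SystemExit(
            f"Python {minimum[0]}.{minimum[1]}+ required, found "
            f"{sys.version_info.major}.{sys.version_info.minor}; "
            "on macOS try /opt/homebrew/opt/python@3.13/bin/python3.13"
        )
```

Calling require_python() at the top of a publish script makes the version mismatch impossible to miss.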

Non-goals

  • No edits to experimental/multiturn/**
  • No consumer-harness rewrite
  • No benchmark runtime claims in this PR

Contributor

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

Copilot AI left a comment


Pull request overview

Adds an ISB1 “KV cache stress / multi-turn replay” benchmarking surface (data + configs + runners + analysis utilities) to enable realistic long-context, high-prefix-overlap replay and offload-mode sweeps, while keeping it isolated from the existing experimental multiturn/kv-cache-tester lane.

Changes:

  • Add committed ISB1 export bundles (including preview 500K/1M lanes) and supporting ISB1 dataset documentation.
  • Add ISB1 KV-stress sweep workflow/config plus result summarization + gating utilities and tests.
  • Add/extend runner + single-node benchmark scripts (vLLM/SGLang + TriAttention variants) and GMI helper scripts for running/collecting sweeps.

Reviewed changes

Copilot reviewed 147 out of 150 changed files in this pull request and generated 1 comment.

Per-file summary (path, then description):
utils/verify_producer_sync.py New utility to compare producer vs consumer export trees for selected ISB1 subtrees.
utils/test_verify_producer_sync.py Tests for verify_producer_sync utility (pass + content mismatch).
utils/test_summarize_isb1.py Tests for ISB1 operator summary output formatting/sections.
utils/test_process_result.py Adds guards/tests ensuring ISB1 replay-style results don’t go through throughput processor.
utils/test_gate_isb1.py Tests for ISB1 gating logic and strict failure behavior.
utils/process_result.py Adds “fail fast” guards for ISB1 replay env/payload in throughput result processor.
runners/lib_single_node_script.sh New helper to resolve benchmark script paths (runtime-aware for ISB1 replay).
runners/launch_h200-nb.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_h200-dgxc-slurm.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_h200-cw.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-dgxc-slurm.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-cw.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-cr.sh Uses new script resolver; expands env passthrough for ISB1 replay/kv-stress.
runners/launch_b200-nb.sh Uses new script resolver; executes resolved benchmark script.
runners/launch_b200-dgxc.sh Uses new script resolver; expands env passthrough for ISB1 replay/kv-stress.
runners/launch_b200-dgxc-slurm.sh Uses new script resolver; executes resolved benchmark script; ensures cleanup.
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_h200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_h200_sglang.sh Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_b200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_b200_sglang.sh Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_h200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_h200_sglang.sh Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_b200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_b200_sglang.sh Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_dsr1_fp8_h200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_dsr1_fp8_b200_vllm.sh Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/launch/lmcache_vllm_h200.sh Adds experimental LMCache-enabled vLLM launcher (H200).
experimental/multiturn/vllm_benchmark/launch/lmcache_vllm_b200.sh Adds experimental LMCache-enabled vLLM launcher (B200).
experimental/multiturn/vllm_benchmark/launch/README.md Docs for experimental LMCache launch helpers.
experimental/multiturn/vllm_benchmark/kv-cache-tester/traces/.gitkeep Placeholder for external trace assets directory.
experimental/multiturn/vllm_benchmark/kv-cache-tester/README.md Placeholder README describing expected kv-cache-tester population.
experimental/multiturn/vllm_benchmark/aiperf_traces/generate_aiperf_traces.py Script to generate synthetic AIPerf-style sessions for replay.
experimental/multiturn/vllm_benchmark/README.md Docs describing experimental parity surface and links to ISB1 scripts.
experimental/multiturn/vllm_benchmark/.gitignore Ignores generated artifacts in experimental multiturn bench area.
experimental/multiturn/README.md Replaces older notes with scoped “experimental notes” guidance and pointers to ISB1 ground truth.
experimental/README.md Updates experimental directory warning + pointers to ISB1 ground truth docs.
datasets/isb1/scripts/plot_pareto.py Adds Pareto frontier computation + optional plotting (TTFT p99 vs throughput).
datasets/isb1/scripts/gpu_profile_collector.sh Adds nvidia-smi polling helper for GPU utilization/power logging.
datasets/isb1/scripts/gmi_test_matrix.sh Adds a curated “matrix” driver for running portable benchmarks.
datasets/isb1/scripts/gmi_kv_sweep.sh Adds concurrency × offload-mode sweep driver for portable benchmarks.
datasets/isb1/scripts/gmi_full_suite.sh Adds full-suite portable runner across models/engines/bands (with skips).
datasets/isb1/scripts/generate_qwen35_low_band_exports.py Generates Qwen3.5-specific low-band export bundles by rewriting filtered cells.
datasets/isb1/scripts/collect_sweep_results.py Aggregates sweep results from DB or JSON dir; computes cliffs/benefits.
datasets/isb1/scripts/analyze_benchmark_distributions.py Analyzes token/turn distributions for ISB1 exports or kv-cache traces.
datasets/isb1/scripts/adapt_trace_replay_result.py Adapts kv-cache trace replay outputs into ISB1 replay JSON schema.
datasets/isb1/exports/preview/long_context_500k/manifest_qwen3.5.json Adds preview 500k manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/manifest.json Adds preview 500k manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1__vllm.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1__sglang.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1__vllm.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1__sglang.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1__vllm.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1__sglang.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1__vllm.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1__sglang.json Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/README.md Documents bounded 500k-class preview lanes and claim boundary.
datasets/isb1/exports/preview/long_context_1m/manifest.json Adds preview 1m manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1__vllm.json Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1__sglang.json Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1__vllm.json Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1__sglang.json Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/README.md Documents gated 1M preview lane and manual config boundary.
datasets/isb1/exports/extension_64k/vllm/code_64k1k_qwen3.5.json Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/code_64k1k.json Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/chat_64k1k_qwen3.5.json Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/chat_64k1k.json Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/code_64k1k_qwen3.5.json Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/code_64k1k.json Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/chat_64k1k_qwen3.5.json Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/chat_64k1k.json Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/code_32k1k_qwen3.5.json Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/code_32k1k.json Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/chat_32k1k_qwen3.5.json Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/chat_32k1k.json Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/code_32k1k_qwen3.5.json Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/code_32k1k.json Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/chat_32k1k_qwen3.5.json Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/chat_32k1k.json Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/code_131k1k_qwen3.5.json Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/code_131k1k.json Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k_qwen3.5.json Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k_dsr1.json Adds/updates extension 131k DSR1 bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k.json Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/code_131k1k_qwen3.5.json Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/code_131k1k.json Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k_qwen3.5.json Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k_dsr1.json Adds/updates extension 131k DSR1 bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k.json Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/code_8k1k_qwen3.5.json Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/code_8k1k.json Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/chat_8k1k_qwen3.5.json Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/chat_8k1k.json Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/code_8k1k_qwen3.5.json Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/code_8k1k.json Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/chat_8k1k_qwen3.5.json Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/chat_8k1k.json Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/README.md Adds ISB1 consumer-package README with coverage inventory and claim boundary.
datasets/isb1/GMI_EXECUTION_PLAN.md Adds execution plan/runbook for external GMI KV-stress benchmarking.
datasets/isb1/COEXISTENCE_WITH_KV_CACHE_TESTER.md Adds coexistence plan doc for ISB1 vs kv-cache-tester surfaces.
datasets/isb1/.gitattributes Adds attributes for exports (linguist + EOL handling).
benchmarks/single_node/qwen3.5triattn_fp8_h200_vllm.sh Adds TriAttention vLLM benchmark script (H200).
benchmarks/single_node/qwen3.5triattn_fp8_h100_vllm.sh Adds TriAttention vLLM benchmark script (H100).
benchmarks/single_node/qwen3.5_fp8_h200_vllm.sh Adds/updates Qwen3.5 vLLM script (H200) with ISB1-aware prefix/offload behavior.
benchmarks/single_node/qwen3.5_fp8_h200_sglang.sh Adds Qwen3.5 SGLang script (H200) with ISB1-aware radix/offload behavior.
benchmarks/single_node/qwen3.5_fp8_h100_vllm.sh Adds Qwen3.5 vLLM script (H100).
benchmarks/single_node/qwen3.5_fp8_h100_sglang.sh Adds Qwen3.5 SGLang script (H100).
benchmarks/single_node/qwen3.5_fp8_b200_vllm.sh Adds Qwen3.5 vLLM script (B200).
benchmarks/single_node/qwen3.5_fp8_b200_sglang.sh Adds Qwen3.5 SGLang script (B200).
benchmarks/single_node/gptosstriattn_fp4_h200_vllm.sh Adds TriAttention vLLM benchmark script for GPT-OSS (H200).
benchmarks/single_node/gptosstriattn_fp4_h100_vllm.sh Adds TriAttention vLLM benchmark script for GPT-OSS (H100).
benchmarks/single_node/gptoss_fp4_h200_sglang.sh Adds GPT-OSS SGLang script (H200).
benchmarks/single_node/gptoss_fp4_h200.sh Updates GPT-OSS H200 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/gptoss_fp4_h100_sglang.sh Adds GPT-OSS SGLang script (H100).
benchmarks/single_node/gptoss_fp4_h100.sh Updates GPT-OSS H100 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/gptoss_fp4_b200_sglang.sh Adds GPT-OSS SGLang script (B200).
benchmarks/single_node/gptoss_fp4_b200.sh Updates GPT-OSS B200 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1triattn_fp8_h200_vllm.sh Adds TriAttention vLLM benchmark script for DSR1 (H200).
benchmarks/single_node/dsr1triattn_fp8_h100_vllm.sh Adds TriAttention vLLM benchmark script for DSR1 (H100).
benchmarks/single_node/dsr1_fp8_h200_vllm.sh Adds DSR1 vLLM script (H200).
benchmarks/single_node/dsr1_fp8_h200.sh Updates DSR1 H200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1_fp8_b200_vllm.sh Adds DSR1 vLLM script (B200).
benchmarks/single_node/dsr1_fp8_b200.sh Updates DSR1 B200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1_fp4_b200.sh Updates DSR1 FP4 B200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
.gitignore Adds ignores for macOS metadata + local prompt exports + .claude.
.github/workflows/run-isb1-kv-stress-sweep.yml Adds workflow_dispatch sweep driver for ISB1 KV-stress matrix runs.
.github/workflows/collect-results.yml Adds ISB1-specific summary + gating report generation and uploads.
.github/configs/isb1-qwen-1m-preview.yaml Adds a manual-only gated config for 1M Qwen preview runs.
.github/configs/isb1-kv-stress.yaml Adds dedicated KV-stress sweep config (separate from isb1-master).
.gitattributes Tracks ISB1 export JSON under Git LFS.


Comment thread utils/verify_producer_sync.py Outdated
@OCWC22 OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch 5 times, most recently from af64122 to 1b9b79c Compare April 15, 2026 08:35
@cquil11 cquil11 changed the title feat: add multi-turn KV cache stress benchmark traces [experimentak] add multi-turn KV cache stress benchmark traces Apr 15, 2026
@OCWC22 OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch from 1b9b79c to ef90b64 Compare April 15, 2026 21:52
@OCWC22 OCWC22 changed the title [experimentak] add multi-turn KV cache stress benchmark traces [experimental] add multi-turn KV cache stress benchmark traces Apr 15, 2026
…races

Add ISB-1 (Inference Stress Benchmark) — a multi-turn, long-context KV cache
stress testing dataset for InferenceX V3.

## What this adds

**35 synthetic multi-turn traces** across 7 context bands (8K → 1M+ tokens):
- 6 workload families: long_chat, coding, agent, rag, cache_stress, multimodal
- KV stress patterns: prefix reuse, offload cliff, compaction, reactivation, fanout
- Real conversation content with 60-95% prefix overlap (enables prefix cache testing)
- Context assets from 15KB to 6.6MB inlined into traces for honest token counts

**Export bundles** for vLLM + SGLang replay:
- extension_131k: DeepSeek-R1, GPT-OSS, Qwen 3.5 (H200/B200)
- preview/long_context_500k: Qwen 3.5 500K context stress test
- preview/long_context_1m: Qwen 3.5 1M context stress test

**10 KV stress sweep configs** (isb1-kv-stress-pr993.yaml):
- 3 models × 2 GPUs × 2 engines
- Sweep: 2→256 concurrent users × on/off/noprefix offload modes × 1800s

## Coexistence with kv-cache-tester

This dataset complements PR SemiAnalysisAI#993's kv-cache-tester (522 real Claude Code traces):
- kv-cache-tester: real workload distribution, natural performance profile
- ISB1: controlled KV stress patterns that force offload cliffs and cache pressure

No files in experimental/multiturn/ are modified. Separate config files, separate
data directory (datasets/isb1/), shared replay infrastructure.

## Benchmark infrastructure
- benchmark_export_replay.py: replay harness with actual_context_len telemetry
- process_result_isb1.py: result aggregation with KV metrics
- Prometheus metrics: kv_cache_usage, prefix_cache_hits, kv_offload_bytes
- Pareto frontier: throughput vs p99 TTFT at each concurrency level
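The Pareto frontier in the last bullet (throughput vs p99 TTFT) reduces to a single dominance scan: a point survives iff no other point has both higher throughput and lower TTFT. A minimal sketch of that computation, not the plot_pareto.py implementation:

```python
def pareto_frontier(points):
    """Keep (throughput, ttft_p99) points not dominated by any other point;
    higher throughput and lower TTFT is strictly better."""
    frontier = []
    # Sort by descending throughput (ties broken by lower TTFT first), so a
    # point is on the frontier iff its TTFT beats every point seen so far.
    for tput, ttft in sorted(points, key=lambda p: (-p[0], p[1])):
        if not frontier or ttft < frontier[-1][1]:
            frontier.append((tput, ttft))
    return frontier
```

With points like [(100, 50), (90, 40), (80, 60), (95, 45)], the (80, 60) point is dropped because (90, 40) dominates it on both axes.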
@OCWC22 OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch from ef90b64 to fbe9f79 Compare April 15, 2026 22:36
- Keep only configs whose (runtime, hardware, model) triples exist in
  the export files — eliminates sweep generator failures
- Fix canonical-model-id to match export metadata (e.g., gpt_oss_120b
  not gptoss)
- Fix support-status to match export tiers (reviewed_preview vs
  unsupported)
- Remove configs for engines/GPUs not yet in exports (SGLang, Dynamo,
  TRT, Atom, AMD) — these need export metadata updates before they
  can be added back
- Add workload-type field required by sweep generator schema
- Remove disagg/multinode fields not in KV stress schema

Sweep generator now passes: exit code 0, produces valid matrix rows.
@cquil11
Collaborator

cquil11 commented Apr 16, 2026

Some good stuff in here. Will collab async on this one and take some stuff from this PR into experimental/agentic-benchmark MVP.

OCWC22 added 2 commits April 16, 2026 12:37
…mbos

Export metadata now includes all valid (runtime, hardware, model) triples
from nvidia-master.yaml + amd-master.yaml:
- 8 runtimes: vllm, sglang, trt, atom, sglang-disagg, dynamo-*
- 9 GPU types: H100, H200, B200, B300, GB200, GB300, MI300X, MI325X, MI355X
- 6 models: DSR1, GPT-OSS, Qwen 3.5, GLM-5, Kimi K2.5, MiniMax M2.5

87 KV stress configs with correct canonical-model-id and support-status
matching export metadata. Sweep generator passes (exit code 0).

MI355X configs sweep to 512 concurrent users (288GB HBM advantage).
…prefix-aware replay

Final closure pass landing PR#1032 end-to-end for SLURM + InferenceX +
kv-cache-tester across every (runtime, hardware, canonical-model)
triple currently in the export metadata.

Sweep configs:
- Rename isb1-kv-stress-pr993.yaml -> isb1-kv-stress.yaml
- Rewrite isb1-master / isb1-triattn-preview / isb1-qwen-1m-preview:
  drop/demote dead stanzas, flatten paths (strip /vllm//sglang/ subdirs
  and __vllm/__sglang suffixes), repoint qwen3.5 to _qwen3.5 basename
- isb1-master shrinks 1723 -> 863 lines (50 -> 26 stanzas); 1M preview
  drops the vllm stanza (sglang-only in reality)
- All produced rows resolve to real bundle cells at declared tier

Manifests -> manifest_version 0.2.0 with single-bundle exports for
preview/long_context_500k (gptoss + qwen3.5) and preview/long_context_1m.

Consumer replay (utils/bench_serving/benchmark_export_replay.py):
hydrate v0.2.0 prefix-aware bundles — thin per-cell deltas join a
shared workload prefix via prefix_ref, LRU-cached (max 8) across cells
in the same bundle. Pre-0.2.0 bundles replay unchanged.
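The prefix_ref hydration described here (thin per-cell deltas joined onto a shared prefix, LRU-cached with max 8 across cells) can be sketched with functools.lru_cache. This is a hypothetical illustration of the mechanism, not the benchmark_export_replay.py code; the cell field names are assumptions:

```python
import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=8)  # mirrors the "LRU-cached (max 8) across cells" behavior
def load_prefix(prefix_path: str) -> tuple:
    """Load a shared workload prefix once; reused across cells in a bundle."""
    return tuple(json.loads(Path(prefix_path).read_text()))

def hydrate_cell(cell: dict, bundle_dir: str) -> list:
    """Join a thin per-cell delta onto its shared prefix via prefix_ref
    (v0.2.0); cells without prefix_ref (pre-0.2.0) pass through unchanged."""
    ref = cell.get("prefix_ref")
    if ref is None:
        return list(cell.get("requests", []))
    prefix = load_prefix(str(Path(bundle_dir) / ref))
    return list(prefix) + list(cell.get("requests", []))
```

The tuple return keeps the cached prefix immutable, so every cell concatenates against the same shared object without re-reading the file.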

Producer-sync verifier (utils/verify_producer_sync.py): extend coverage
to core + extension_32k + extension_64k; silently skip subtrees absent
on both sides, report asymmetric ones.

Docs: COEXISTENCE_WITH_KV_CACHE_TESTER + both preview READMEs updated
with flat paths, new config name, and the sglang-only preview reality.

Tests: 262/262 pass across utils/ (107 sweep-config + new
test_benchmark_export_replay.py for the prefix-aware consumer +
test_verify_producer_sync.py for broadened verifier coverage).
@OCWC22 OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch 2 times, most recently from 57cbf1b to fa132a7 Compare April 17, 2026 05:55
… clean support vocabulary

README.md:
- Remove dead links to docs removed in 5f6aba7 (COVERAGE_AUDIT,
  LONG_CONTEXT_TRUTH_MATRIX, SUPPORT_MATRIX, RUNBOOKs, INVESTIGATION)
- Replace stale 50-export-files count with post-flatten per-subtree
  inventory (23 bundles + 3 manifests = 26 total, consolidating
  framework-specific variants into flat single files)
- Add explicit five-class support-status vocabulary section
- Keep safe/unsafe claim boundary

COEXISTENCE_WITH_KV_CACHE_TESTER.md:
- Strip planning/negotiation sections (Recommended PR Structure and
  maintainer-request list) — not coexistence-technical
- Replace possessive references with PR-number references throughout
  (kv-cache-tester -> PR SemiAnalysisAI#993, ISB1 -> PR SemiAnalysisAI#1032)
- Update data-directory layout to show flat paths
- Update ISB1 workflow name to run-isb1-kv-stress-sweep.yml
- Add support-status vocabulary section

GMI_EXECUTION_PLAN.md:
- Prepend support-status framing (reviewed_preview,
  dataset_replay_verified, not live-serving certification)
- Fix stale nested paths to flat: extension_131k/vllm/ -> extension_131k/
- Fix preview bundle names: strip __vllm/__sglang suffixes
- Update final result-pipeline sentence to cite actual analyzer scripts
@OCWC22
Author

OCWC22 commented Apr 17, 2026

Merge-sweep closure summary

Head is now c96d6a56 with 5 commits. Branch is structurally ready for review or cherry-pick.

What landed in this sweep

  1. fa132a75 — path flattening (strip /vllm/, /sglang/ subdirectories and __vllm.json/__sglang.json bundle-name suffixes), v0.2.0 prefix-aware manifests, prefix-aware consumer replay (benchmark_export_replay.py hydrates delta bundles via prefix_ref + LRU cache), producer-sync verifier broadened to cover all six export roots, test suite green (156/156 across utils/).
  2. c96d6a56 — doc tightening (post-flatten paths, accurate post-flatten inventory counts, dead doc links removed, explicit five-class support-status vocabulary section added to README.md and COEXISTENCE_WITH_KV_CACHE_TESTER.md).

Copilot review

Only inline comment was on utils/verify_producer_sync.py:27 — requested expanding RELEVANT_SUBTREES beyond extension_131k + previews. Addressed in fa132a75 (now covers core/, extension_32k/, extension_64k/, extension_131k/, preview/long_context_500k/, preview/long_context_1m/). Reply posted on the thread.

Producer / consumer sync status

utils/verify_producer_sync.py run against the in-repo Inferscope producer staging tree reports expected drift: consumer is on the new flat layout, producer staging is still on the old nested vllm//sglang/ layout. The verifier classifies this correctly as missing_in_consumer / extra_in_consumer rows and will report clean once the upstream regen lands. This is Inferscope-side work, out of scope for #1032.

Scope boundary

  • No files under experimental/multiturn/vllm_benchmark/ are modified — kv-cache-tester lanes ([WIP][experimental] add agentic trace replay benchmark infrastructure #993) are untouched.
  • No live-serving certification claims anywhere; all ISB1 surfaces are bounded to dataset_replay_verified.
  • Preview lanes stay at reviewed_preview (500K) or gated (1M, consumed only via isb1-qwen-1m-preview.yaml).
  • Experimental LMCache launchers under experimental/multiturn/vllm_benchmark/launch/ stay experimental and are not promoted.

Understood on the async collab direction

Re the earlier comment about taking pieces into an experimental/agentic-benchmark MVP — happy for that to be the path. Commits here are scoped to be independently liftable:

Cherry-pick target                                 Commits
Sweep-config + schema additions                    bbc91bc5, cff850b1, fec48557
Flat export layout + v0.2.0 prefix-aware replay    fa132a75
Support-vocabulary docs                            c96d6a56

Flagging @cquil11 for visibility on the head state. Happy to rebase, split further, or re-scope against the MVP branch once you've picked which pieces fit.

OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request Apr 17, 2026
…istries + hard gate

Extends the ISB1 replay result schema with a backward-compatible set of
optional fields so every row declares which optimization technique it
exercises (baseline, kv_quantization, kv_compression, compressed_attention,
speculative_decoding) and which quality benchmark backs any lossy-technique
claim. A hard gate then prevents a row from being labeled support_status=
supported for a lossy technique unless a registered quality benchmark has
completed.

Follow-up to PR SemiAnalysisAI#1032.

All new fields default to NULL (mechanism defaults to "baseline") so
pre-existing rows, configs, and SQLite databases are unaffected until they
opt into the mechanism_eval vocabulary. The database migration is
idempotent; legacy schemas upgrade in place on first connect_db().
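An additive, idempotent SQLite migration of the kind described can be implemented by consulting PRAGMA table_info before each ALTER TABLE. A sketch under those assumptions (the table and column names are hypothetical stand-ins, not the isb1_results_db.py schema):

```python
import sqlite3

NEW_COLUMNS = {  # hypothetical subset of the additive columns
    "mechanism": "TEXT DEFAULT 'baseline'",
    "quality_eval_status": "TEXT",
}

def migrate(conn: sqlite3.Connection, table: str = "results") -> None:
    """Add any missing columns; re-running is a no-op, so legacy databases
    upgrade in place on first connect."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, decl in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
    conn.commit()
```

Because each ALTER TABLE is guarded by the column-existence check, running migrate() against an already-migrated database changes nothing, which is what makes the in-place upgrade safe.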

New files:
- utils/mechanism_eval.py
  Env-driven field catalog (14 fields), registry loaders, validation
  helpers, and the row_requires_completed_quality_eval predicate.
- datasets/isb1/registry/mechanism_variant_registry.json
  9 registered mechanism/variant pairs covering baseline, fp8_e4m3,
  turboquant_class, kvtc_class, triattention_class, mtp, eagle3, medusa,
  dflash.
- datasets/isb1/registry/quality_eval_registry.json
  4 registered quality benchmarks: ruler_v1, longbench_v2, humaneval,
  math_500.
- .github/configs/isb1-mechanism-baseline.yaml
  DSR1 (H100) and Qwen3.5 (B200) baseline cells.
- .github/configs/isb1-mechanism-fp8-kv.yaml
  Same two cells with FP8 E4M3 KV quantization, wired to ruler_v1 and
  held at reviewed_preview until the RULER run completes (the gate
  blocks promotion to supported without it).
- .github/workflows/run-isb1-mechanism-eval.yml
  Dispatch workflow routing mechanism configs through benchmark-isb1-tmpl.
- utils/test_mechanism_eval.py (13 tests).
- utils/test_process_result_isb1_mechanism.py (3 subprocess tests).

Extended files:
- utils/process_result_isb1.py — emits 14 mechanism fields + a
  mechanism_eval_validation record attached to every processed row.
- utils/gate_isb1.py — new mechanism_compression_quality gate enforcing:
  (1) any non-baseline mechanism_variant must resolve in the registry;
  (2) quality_eval_status in {pending, completed, failed, not_required};
  (3) supported + compression mechanism ⇒ quality_eval_status == completed
      with a registered quality_eval_id;
  (4) speculative_decoding ⇒ draft_model_id + speculative_acceptance_rate.
- datasets/isb1/scripts/isb1_results_db.py — 16 additive ALTER TABLE
  migrations plus matching SCHEMA_SQL, INSERT_COLUMNS, GROUPABLE_COLUMNS,
  and CLI ingest flags.
- utils/test_gate_isb1.py — 7 new mechanism-gate tests.
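The four gate rules above can be sketched as a single predicate; the mini-registries and field names are illustrative stand-ins for the JSON registry files and the real gate_isb1.py logic:

```python
# Hypothetical mini-registries standing in for the JSON registry files.
MECHANISM_VARIANTS = {"baseline", "fp8_e4m3", "mtp"}
QUALITY_EVALS = {"ruler_v1", "longbench_v2"}
COMPRESSION_MECHS = {"kv_quantization", "kv_compression", "compressed_attention"}
VALID_STATUS = {"pending", "completed", "failed", "not_required"}

def gate(row):
    """Return a list of violation strings; an empty list means the row passes."""
    errors = []
    variant = row.get("mechanism_variant", "baseline")
    if variant != "baseline" and variant not in MECHANISM_VARIANTS:       # rule 1
        errors.append(f"unregistered variant: {variant}")
    status = row.get("quality_eval_status", "not_required")
    if status not in VALID_STATUS:                                        # rule 2
        errors.append(f"bad quality_eval_status: {status}")
    if (row.get("support_status") == "supported"
            and row.get("mechanism") in COMPRESSION_MECHS):               # rule 3
        if status != "completed" or row.get("quality_eval_id") not in QUALITY_EVALS:
            errors.append("supported compression row needs a completed, registered eval")
    if row.get("mechanism") == "speculative_decoding":                    # rule 4
        if not row.get("draft_model_id") or row.get("speculative_acceptance_rate") is None:
            errors.append("speculative row needs draft_model_id + acceptance rate")
    return errors

ok = gate({"mechanism": "kv_quantization", "mechanism_variant": "fp8_e4m3",
           "support_status": "supported", "quality_eval_status": "completed",
           "quality_eval_id": "ruler_v1"})
blocked = gate({"mechanism": "kv_quantization", "mechanism_variant": "fp8_e4m3",
                "support_status": "supported", "quality_eval_status": "pending"})
```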

Full suite: 285 passed, 2 pre-existing warnings.

References — public literature the registries are grounded in:

KV cache quantization (mechanism: kv_quantization)
- fp8_e4m3: Micikevicius et al., "FP8 Formats for Deep Learning"
  (NVIDIA/Intel/Arm, 2022), arXiv:2209.05433. Defines the E4M3/E5M2
  formats used by engine-native FP8 KV paths in vLLM and SGLang.
- turboquant_class: umbrella slot for Hadamard-rotated 4-bit KV
  schemes; Hooper et al., "KVQuant", 2024, arXiv:2401.18079, is a
  representative reference. Specific implementation citations travel
  with each submitted row via mechanism_notes.

KV cache compression (mechanism: kv_compression)
- kvtc_class: umbrella slot for tensor-codebook / product-quantization
  KV compressors. The class label reflects the architecture pattern;
  each submitted row cites its specific implementation.

Compressed attention (mechanism: compressed_attention)
- triattention_class: umbrella slot for sparse-/hybrid-attention
  variants that change the attention-computation surface rather than
  the stored KV format.

Speculative decoding (mechanism: speculative_decoding)
- mtp: Multi-Token Prediction head as used at scale in DeepSeek-V3
  (DeepSeek-AI, 2024, arXiv:2412.19437).
- eagle3: EAGLE-family speculative decoding (Li et al., original
  EAGLE, 2024, arXiv:2401.15077; EAGLE-2 and EAGLE-3 are subsequent
  iterations of the same draft-model recipe).
- medusa: Cai et al., "Medusa: Simple LLM Inference Acceleration
  Framework with Multiple Decoding Heads", 2024, arXiv:2401.10774.
- dflash: umbrella slot for DeepFlash-style draft stacks.

Quality benchmarks (quality_eval_registry.json)
- ruler_v1: Hsieh et al., "RULER: What's the Real Context Size of
  Your Long-Context Language Models?" (NVIDIA, 2024), arXiv:2404.06654.
  Primary long-context retrieval signal for KV quantization and
  compression at 32K–1M.
- longbench_v2: Bai et al., "LongBench v2: Towards Deeper
  Understanding and Reasoning on Realistic Long-context Multitasks"
  (THUDM, 2024), arXiv:2412.15204. Complements RULER for
  reasoning-heavy long-context workloads.
- humaneval: Chen et al., "Evaluating Large Language Models Trained
  on Code" (OpenAI Codex paper, 2021), arXiv:2107.03374.
- math_500: 500-problem subset of the MATH dataset (Hendrycks et al.,
  "Measuring Mathematical Problem Solving With the MATH Dataset",
  2021, arXiv:2103.03874). Detects chain-of-thought degradation from
  aggressive KV quantization — the specific failure mode the hard
  gate is designed to catch.
OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request Apr 17, 2026
…2.0 manifests, prefix-aware replay

Final closure pass landing PR SemiAnalysisAI#1032 end-to-end across every (runtime,
hardware, canonical-model) triple currently in the export metadata.

Sweep configs:
- Consolidate the sweep config under its canonical name isb1-kv-stress.yaml
- Rewrite isb1-master / isb1-triattn-preview / isb1-qwen-1m-preview:
  drop/demote dead stanzas, flatten paths (strip /vllm//sglang/ subdirs
  and __vllm/__sglang suffixes), repoint qwen3.5 to _qwen3.5 basename
- isb1-master shrinks 1723 -> 863 lines (50 -> 26 stanzas); 1M preview
  drops the vllm stanza (sglang-only in reality)
- All produced rows resolve to real bundle cells at declared tier

Manifests -> manifest_version 0.2.0 with single-bundle exports for
preview/long_context_500k (gptoss + qwen3.5) and preview/long_context_1m.

Consumer replay (utils/bench_serving/benchmark_export_replay.py):
hydrate v0.2.0 prefix-aware bundles — thin per-cell deltas join a
shared workload prefix via prefix_ref, LRU-cached (max 8) across cells
in the same bundle. Pre-0.2.0 bundles replay unchanged.
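A sketch of the prefix-hydration step under assumed file layout and JSON keys (prefix_ref matches the schema name above; everything else is illustrative, not the exact benchmark_export_replay.py format):

```python
import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=8)  # at most 8 shared prefixes held across cells in a bundle
def load_prefix(prefix_path: str) -> tuple:
    return tuple(json.loads(Path(prefix_path).read_text())["requests"])

def hydrate(cell: dict, bundle_dir: Path) -> list:
    """Join a thin per-cell delta onto its shared workload prefix."""
    requests = []
    if "prefix_ref" in cell:  # v0.2.0 cells carry a pointer to the shared prefix
        requests.extend(load_prefix(str(bundle_dir / cell["prefix_ref"])))
    requests.extend(cell["requests"])  # pre-0.2.0 cells fall through unchanged
    return requests
```

Two cells in the same bundle that name the same prefix_ref then pay the prefix-load cost once.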

Producer-sync verifier (utils/verify_producer_sync.py): extend coverage
to core + extension_32k + extension_64k; silently skip subtrees absent
on both sides, report asymmetric ones.

Docs: coexistence and both preview READMEs updated with flat paths,
canonical config name, and the sglang-only preview reality.

Tests: 262/262 pass across utils/ (107 sweep-config + new
test_benchmark_export_replay.py for the prefix-aware consumer +
test_verify_producer_sync.py for broadened verifier coverage).
OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request Apr 17, 2026
…istries + hard gate

Extends the ISB1 replay result schema with a backward-compatible set of
optional fields so every row declares which optimization technique it
exercises (baseline, kv_quantization, kv_compression, compressed_attention,
speculative_decoding) and which quality benchmark backs any lossy-technique
claim. A hard gate then prevents a row from being labeled support_status=
supported for a lossy technique unless a registered quality benchmark has
completed.

Follow-up to PR SemiAnalysisAI#1032.

(Full commit body duplicates the message quoted at the top of this thread.)
Trim this branch to ISB1 data exports + processing/replay contract files only.

Removed non-scope changes from this PR branch (workflows/configs, benchmark
runners/scripts, GMI harness docs/scripts, experimental multiturn assets, and
auxiliary ISB1 tooling), preserving them on fork-only bookmark branches:
- isb1/kv-stress-tooling
- isb1/agentic-benchmark-runners
- isb1/gmi-harness

This keeps upstream cherry-pick review focused on dataset exports and contract guards.
@OCWC22

OCWC22 commented Apr 20, 2026

@cquil11 Thanks for the cherry-pick guidance (not merge-as-a-whole) — I trimmed this PR accordingly.

New scope: ISB1 data + contract only (datasets/isb1 exports + README/LFS attrs, and replay/process_result ISB1 guard/tests).

Fork-only preservation branches (OCWC22/InferenceX):

  • isb1/kv-stress-tooling — gate/summarize/producer-sync + matrix/workflow/config tooling kept intact.
  • isb1/agentic-benchmark-runners — runner launch + single_node benchmark script set for experimental/agentic-benchmark follow-up.
  • isb1/gmi-harness — datasets/isb1/scripts GMI harness + experimental/multiturn + internal harness configs/docs.

If this reduced slice looks good, could you review/cherry-pick from this branch first and we can follow with focused PRs from the bookmarks?

OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request Apr 20, 2026
…ntic-benchmark

Moving off PR SemiAnalysisAI#1032 — per data-PR narrowing, per-GPU recipe cells do not
belong in the ISB1 data-contribution diff. Parked here on the fork for
possible follow-up contribution to experimental/agentic-benchmark when
that branch exists upstream.

No upstream PR opened from this branch.
…a+contract only

Second trim pass. Reverts 12 consortium-owned files to merge-base state
and removes 1 net-new per-GPU recipe:

- benchmarks/single_node/qwen3.5_{bf16,fp8}_mi{300x,325x,355x}.sh (reverted)
- benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh (removed — preserved on
  fork branch isb1/agentic-benchmark-runners)
- runners/launch_b300-nv.sh (reverted)
- .github/configs/{amd,nvidia}-master.yaml (reverted)
- .github/workflows/{benchmark-tmpl,pr-recipe-reminder}.yml (reverted)
- perf-changelog.yaml (reverted)

Rationale: per-GPU recipe cells and cross-cutting CI config are owned by
AMD/NVIDIA contributors, not by a data-contribution PR. Matches Cam's
cherry-pick-not-merge guidance and InferenceX consortium ownership model.

Remaining PR scope: datasets/isb1/** + utils/** (replay contract +
process_result ISB1 guard + tests) + top-level .gitattributes.
@OCWC22

OCWC22 commented Apr 20, 2026

Second trim pass landed (3c2d0039). PR is now 37 files / +4,457 / −0 lines.

Reverted or removed all cross-cutting / per-GPU-cell edits:

  • benchmarks/single_node/qwen3.5_{bf16,fp8}_mi{300x,325x,355x}.sh — reverted to merge-base
  • benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh — removed (preserved on fork branch isb1/agentic-benchmark-runners)
  • runners/launch_b300-nv.sh — reverted to merge-base
  • .github/configs/{amd,nvidia}-master.yaml — reverted to merge-base
  • .github/workflows/{benchmark-tmpl,pr-recipe-reminder}.yml — reverted to merge-base
  • perf-changelog.yaml — reverted to merge-base

Remaining scope is strictly data + contract:

  • datasets/isb1/** — 20 trace JSONs + 2 preview READMEs + 3 manifests + top-level README + LFS attrs
  • utils/bench_serving/benchmark_export_replay.py + tests — replay contract
  • utils/process_result_isb1.py + process_result.py guard + tests
  • top-level .gitattributes — LFS routing

No consortium-owned files touched, no CI edits, no per-GPU recipe edits. Zero net deletions against main. Should be trivially cherry-pickable into experimental/agentic-benchmark per your earlier guidance.

OCWC22 and others added 8 commits April 20, 2026 21:36
Provide a zero-dependency bridge that converts ISB1 multiturn and trace_replay
bundles into Cam's kv-cache-tester trace JSON format (prefix-extending hash_ids
for KV cache hit computation). Covers both bundle shapes, hydrates schema 0.2.0
prefix_ref sidecars, and ships 15 contract tests (0.61s).
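The prefix-extending hash_id scheme reduces to hashing ever-longer token prefixes, one hash per full block; sha256 here is a stand-in for whatever hash function kv-cache-tester actually uses:

```python
import hashlib

BLOCK_SIZE = 64  # tokens per KV block, matching --block-size 64

def hash_ids(tokens: list) -> list:
    """One hash per full block, each covering the entire prefix up to that
    block, so two requests sharing a prompt prefix share leading hash_ids."""
    ids = []
    for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
        prefix = ",".join(map(str, tokens[:end])).encode()
        ids.append(hashlib.sha256(prefix).hexdigest()[:16])
    return ids

turn1 = hash_ids(list(range(128)))             # 2 full blocks
turn2 = hash_ids(list(range(128)) + [7] * 64)  # same prefix + 1 new block
```

Because turn2's first two ids equal turn1's, a replay harness can count those blocks as KV cache hits.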

Also fixes a Git LFS attribute precedence bug where the inner .gitattributes
silently overrode the root rule, making `git lfs pull --include` a no-op for
datasets/isb1/exports/**/*.json.

- tools/isb1_to_kvcache_tester.py (+771)
- tools/test_isb1_to_kvcache_tester.py (+412)
- datasets/isb1/.gitattributes: enable LFS filter on exports/**/*.json
- datasets/isb1/README.md: how-to-consume, smoke-test, HF publication recipe

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…v-cache-tester (PR SemiAnalysisAI#993)

PR SemiAnalysisAI#1032 keeps only data + the conversion shim; Cam's kv-cache-tester
submodule on PR SemiAnalysisAI#993 owns replay. Delete:
  - utils/bench_serving/benchmark_export_replay.py
  - utils/process_result_isb1.py
  - utils/test_benchmark_export_replay.py
  - utils/test_process_result_isb1.py
Revert to upstream/main:
  - utils/process_result.py
  - utils/test_process_result.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pulls 55 upstream commits published on SemiAnalysisAI/InferenceX:main
since PR SemiAnalysisAI#1032 was opened. Zero conflicts; none touch tools/ or
datasets/isb1/. Purpose: modernize PR base before Cam review and
absorb upstream fork-drift reductions.

Notable upstream work picked up:
- MiniMax M2.5 MXFP4 MI355X + B300 configs
- GLM5.1 FP4 MI355X support
- GPT-OSS FP4 TP=8 conc=1 extension (SemiAnalysisAI#1096)
- H200 multinode evals (SemiAnalysisAI#1000)
- B300 configs for Kimi K2.5, DSR1, Qwen3.5
- Parallel random data generation (SemiAnalysisAI#1094)
- KNOWN_LIMITATION.md updates

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fold Track A into PR 1032. Consumers now point Cam's
trace_replay_tester.py directly at datasets/isb1/converted/
with no conversion step:

  python $KV_CACHE_TESTER_DIR/trace_replay_tester.py \
      --trace-directory datasets/isb1/converted/ \
      --tokenizer Qwen/Qwen2.5-Coder-32B-Instruct \
      --block-size 64

179 traces across 23 bundles span 6 context scales
(8k/32k/64k/131k/500k/1M) and multi-model coverage
(Kimi K2.5, DSR1, GPT-OSS, Qwen3.5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ysisAI#993)

Schema-parity sibling of .github/configs/multiturn-agentic-trace.yaml
with 16 ISB1 sweep cells across H200/B200/MI355X/H100 × multi-scale
workloads (8k/32k/131k/500k-preview/1M-preview) × multi-model
(Qwen3.5, DSR1). Follows Cam's exact tp<N> / users / offload / ep
schema. Consumers either merge these top-level keys into
multiturn-agentic-trace.yaml or extend the sweep loader to glob
multiturn-agentic-trace*.yaml.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tools/validate_kvcache_tester_trace.py — stdlib-only CLI that
validates any trace JSON against Cam's kv-cache-tester schema:
required keys, block_size consistency, prefix-extending hash_ids,
per-request fields. Runs against single files or directories;
exit code 1 on any failure.

Catches schema drift before submissions reach the sweep.
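The validator's core checks reduce to roughly this; the required-key names and the exact prefix-extension rule are assumptions about the trace schema, not a copy of the tool:

```python
def validate_trace(trace: dict) -> list:
    """Return schema violations; an empty list mirrors exit code 0."""
    errors = []
    for key in ("block_size", "requests"):  # assumed required top-level keys
        if key not in trace:
            errors.append(f"missing key: {key}")
            return errors
    prev = []
    for i, req in enumerate(trace["requests"]):
        ids = req.get("hash_ids", [])
        # prefix-extending: each turn's hash_ids must start with the prior turn's
        if ids[: len(prev)] != prev:
            errors.append(f"request {i}: hash_ids do not extend the prefix")
        prev = ids
    return errors
```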

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Auto-generated index with per-trace metadata: scale band,
workload family, model family, token totals, and
approximate cache hit rate (computed via Cam's
normalize_trace walker). Enables sweep configs to
filter or select trace subsets by metadata without
loading every file.
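With that index in place, subset selection stays metadata-only; the manifest keys below are assumptions about manifest.json's exact shape:

```python
import json
from pathlib import Path

def select_traces(manifest_path: str, scale_band: str, model_family: str) -> list:
    """Pick trace files by manifest metadata without opening any trace JSON."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [t["path"] for t in manifest["traces"]
            if t["scale_band"] == scale_band and t["model_family"] == model_family]
```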

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…> path

datasets/isb1/HF_PUBLISH.md walks through publishing
datasets/isb1/converted/ to Hugging Face at
semianalysisai/isb1-cc-traces so Cam's trace_replay
scripts can load ISB1 via TRACE_DIR=hf_semianalysisai--isb1-cc-traces
with zero changes to his shell scripts (hf_<org>--<repo>
handling at benchmarks/single_node/multiturn_fp4_b200_trace_replay.sh
lines 54-58).

Includes dataset card template, upload command, versioning
recipe, and post-upload verification.
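The TRACE_DIR alias convention is a one-line string transform; this Python sketch mirrors what the shell script does at the cited lines, but is an illustration rather than the script's code:

```python
def resolve_trace_dir(trace_dir: str) -> str:
    """Map an hf_<org>--<repo> alias to the HF dataset repo id; anything else
    passes through as a local path."""
    if trace_dir.startswith("hf_"):
        org, _, repo = trace_dir[len("hf_"):].partition("--")
        return f"{org}/{repo}"
    return trace_dir
```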

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@OCWC22 OCWC22 changed the title [experimental] add multi-turn KV cache stress benchmark traces [isb1] add converted trace corpus + kv-cache-tester contract helpers Apr 21, 2026
OCWC22 added 2 commits April 21, 2026 11:33
Add the HF dataset card, dry-run-safe publisher script, and publish runbook for the converted ISB-1 trace corpus.

This packages the zero-friction consumer path via TRACE_DIR=hf_<org>--<repo> and implements PR A from the investigation report without changing Cam's harness.
PR B (kv-cache-tester lane):
- Extend offload: ["on","off"] to ["on","off","noprefix"] on every H200
  fp8 Qwen3 cell and the H100 fp8 Qwen3 lmcache cell.
- Document the three offload values in the header comment so the sweep
  generator emits noprefix cells alongside on/off.
- Cam's multiturn_fp8_h100_lmcache_aiperf.sh:123-126 already wires
  --no-enable-prefix-caching; noprefix just lets the sweep invoke it.
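An illustrative cell shape (not copied from the real config) showing the three-valued offload axis:

```yaml
# Hypothetical cell; only the offload line reflects the change described above.
h200-fp8-qwen3-isb1-code-32k:
  offload: ["on", "off", "noprefix"]  # noprefix -> --no-enable-prefix-caching
```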

PR D (DSR1 131k HF trace_replay):
- New cell b200-fp4-dsr1-isb1-code-131k-hf pointing at the freshly
  published HF dataset (wchen22/isb1-cc-traces, alias
  hf_wchen22--isb1-cc-traces). Pairs with --no-max-tokens (a01b775).
- Header comment extended to document HF alias as a valid TRACE_DIR.

HF publish gotcha:
- HF_PUBLISH.md gains a "Python version" section (new §3): the publisher
  needs huggingface_hub which requires Python >= 3.10. macOS system
  python3 (3.9) will fail with ModuleNotFoundError. Prefer
  /opt/homebrew/opt/python@3.13/bin/python3.13.
- Remaining section numbers shifted +1 accordingly.
OCWC22 added a commit to OCWC22/InferenceX that referenced this pull request Apr 22, 2026
Add pasteable manual GMI Cloud quickstarts for GB200 (Blackwell, FP4
DSR1 template) and H100 (Hopper, FP8 Qwen3 template) mirroring the
existing H200 quickstart shape. Extend the operator-only YAML stub
with matching gb200/h100 reference rows.

- runners/GMI_QUICKSTART_GB200.md (149 lines): 1-2 node GB200 path
  with Blackwell FP4 DSR1 model default, --cpu-offload-gb 60,
  users [1,2,4,8,16] x offload [on,off,noprefix] sweep. Cells:
  code-8k, chat-32k, code-131k.
- runners/GMI_QUICKSTART_H100.md (148 lines): 1-2 node H100 path
  with FP8 default (operator picks MODEL env var), --cpu-offload-gb 20,
  same sweep shape. Operator must verify model fits TP on 80GB HBM3.
- .github/configs/multiturn-agentic-trace-isb1.yaml: add
  gb200-fp4-dsr1-isb1-gmi-reference and h100-fp8-qwen3-isb1-gmi-reference
  rows. Still NOT CI-dispatched.

Model choice is env-driven (matches Cam's upstream
multiturn_fp8_{h100_lmcache,h200_trace}_aiperf.sh script contract);
pick MODEL to fit TP on the actual VRAM ceiling. 80GB H100 at TP4
with large FP8 models may not fit; switch to TP8 or smaller variant.

Refs: shipped ISB1 sweep/data commit 38fd91a (PR SemiAnalysisAI#1032)
Refs: mooncake exporter commit b31f7c1 (fork PR #2)
Refs: H200 runbook commit d62899e