aiperf: second independent mooncake corpus for ISB-1 (opt-in) #2
OCWC22 wants to merge 13 commits into isb1/kv-cache-stress-benchmark
Fold Track A into PR 1032. Consumers now point Cam's trace_replay_tester.py directly at datasets/isb1/converted/ with no conversion step:

    python $KV_CACHE_TESTER_DIR/trace_replay_tester.py --trace-directory datasets/isb1/converted/ --tokenizer Qwen/Qwen2.5-Coder-32B-Instruct --block-size 64

179 traces across 23 bundles span 6 context scales (8k/32k/64k/131k/500k/1M) and multi-model coverage (Kimi K2.5, DSR1, GPT-OSS, Qwen3.5).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ysisAI#993)

Schema-parity sibling of .github/configs/multiturn-agentic-trace.yaml with 16 ISB1 sweep cells across H200/B200/MI355X/H100 × multi-scale workloads (8k/32k/131k/500k-preview/1M-preview) × multi-model (Qwen3.5, DSR1). Follows Cam's exact tp<N> / users / offload / ep schema. Consumers either merge these top-level keys into multiturn-agentic-trace.yaml or extend the sweep loader to glob multiturn-agentic-trace*.yaml.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tools/validate_kvcache_tester_trace.py: stdlib-only CLI that validates any trace JSON against Cam's kv-cache-tester schema: required keys, block_size consistency, prefix-extending hash_ids, per-request fields. Runs against single files or directories; exit code 1 on any failure. Catches schema drift before submissions reach the sweep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
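The checks listed above could look roughly like the sketch below. It is not the code in tools/validate_kvcache_tester_trace.py: the key names `block_size`, `requests`, and `hash_ids` are assumed for illustration, and "prefix-extending" is read here as "each request's hash_ids begin with the previous request's hash_ids".

```python
from typing import Any

# Hypothetical key names; the real kv-cache-tester schema may differ.
REQUIRED_TOP_LEVEL = ("block_size", "requests")


def validate_trace(trace: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the trace passed."""
    errors = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in trace]
    if errors:
        return errors

    block_size = trace["block_size"]
    if not isinstance(block_size, int) or block_size <= 0:
        errors.append(f"block_size must be a positive int, got {block_size!r}")

    previous: list[int] = []
    for index, request in enumerate(trace["requests"]):
        hash_ids = request.get("hash_ids")
        if not isinstance(hash_ids, list):
            errors.append(f"request {index}: hash_ids missing or not a list")
            continue
        # Prefix-extending: the request must start with the previous request's ids.
        if hash_ids[: len(previous)] != previous:
            errors.append(f"request {index}: hash_ids do not extend the previous request")
        previous = hash_ids
    return errors
```

A CLI wrapper would then exit with status 1 whenever the returned list is non-empty, matching the "exit code 1 on any failure" behavior described above.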
Auto-generated index with per-trace metadata: scale band, workload family, model family, token totals, and approximate cache hit rate (computed via Cam's normalize_trace walker). Enables sweep configs to filter or select trace subsets by metadata without loading every file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…> path

datasets/isb1/HF_PUBLISH.md walks through publishing datasets/isb1/converted/ to Hugging Face at semianalysisai/isb1-cc-traces so Cam's trace_replay scripts can load ISB1 via TRACE_DIR=hf_semianalysisai--isb1-cc-traces with zero changes to his shell scripts (hf_<org>--<repo> handling at benchmarks/single_node/multiturn_fp4_b200_trace_replay.sh lines 54-58). Includes dataset card template, upload command, versioning recipe, and post-upload verification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
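For readers unfamiliar with the hf_<org>--<repo> convention, the mapping can be restated in Python as below. This is only an illustration of the naming rule, not the logic in multiturn_fp4_b200_trace_replay.sh; the function name `hf_repo_from_trace_dir` is hypothetical.

```python
from typing import Optional


def hf_repo_from_trace_dir(trace_dir: str) -> Optional[str]:
    """Map 'hf_<org>--<repo>' to '<org>/<repo>'; return None for ordinary paths."""
    prefix = "hf_"
    if not trace_dir.startswith(prefix):
        return None
    # Split on the first '--' only, so single hyphens inside the repo name survive.
    org, sep, repo = trace_dir[len(prefix):].partition("--")
    if not (org and sep and repo):
        return None
    return f"{org}/{repo}"
```

With this rule, `hf_semianalysisai--isb1-cc-traces` resolves to the dataset id `semianalysisai/isb1-cc-traces`, while a local directory such as `datasets/isb1/converted/` is left alone.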
aiperf's pydantic MooncakeTrace requires the conversation list at `messages` (not `input`) and uses `delay` in milliseconds (not `pre_gap` in seconds). The original exporter would have failed aiperf's validation with `input_mode_count == 0` on every row and silently dropped pacing.

- rename emitted field `input` -> `messages`
- rename emitted field `pre_gap` (seconds) -> `delay` (milliseconds)
- rename CLI flag --include-pre-gap -> --include-delay
- rename helper _event_pre_gap -> _event_delay_ms (returns ms as float)
- update validator REQUIRED_FIELDS and superset set
- update tests (17/17 pass; multi-turn expected value 2500.0 ms)

Verified on full 23-bundle re-export: 1226 rows, validator clean, row-2 of a multi-turn session emits delay=15000.0 ms as expected.

Field reference: aiperf/src/aiperf/dataset/loader/models.py MooncakeTrace
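The two field renames amount to a small row transformation. The sketch below shows the idea on an already-exported row; it is not the exporter's code, and `to_aiperf_row` plus the `session_id` key in the usage example are illustrative assumptions.

```python
from typing import Any


def to_aiperf_row(legacy_row: dict[str, Any]) -> dict[str, Any]:
    """Rename `input` -> `messages` and convert `pre_gap` (seconds) -> `delay` (ms)."""
    # Copy every field except the two that are being renamed.
    row = {k: v for k, v in legacy_row.items() if k not in ("input", "pre_gap")}
    if "input" in legacy_row:
        row["messages"] = legacy_row["input"]
    if "pre_gap" in legacy_row:
        # Seconds to milliseconds, emitted as a float.
        row["delay"] = float(legacy_row["pre_gap"]) * 1000.0
    return row
```

A legacy row with `pre_gap` of 2.5 seconds comes out with `delay` of 2500.0 ms, which is the value the updated multi-turn test expects.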
17 bundles / 1142 rows / 22 sessions across core/extension_32k/extension_64k/extension_131k. Preview lanes (500k, 1m) deferred per v1 plan. Manifest tracks per-bundle size, session count, scale band, and workload family.

Ran the exporter once in directory mode against datasets/isb1/exports/ and pruned preview output before staging so v1 ships only the non-preview bundles. This keeps the directory-mode manifest filtering behavior and avoids the glob-mode manifest-selection footgun noted in the plan.
Schema mirrors multiturn-agentic-trace-isb1.yaml. TP x users x offload surfaces match the kv-cache-tester sibling. Preview lanes (500k, 1m) intentionally omitted in v1.
Docs-only recipe for pulling isb1/mooncake/ JSONLs through the upstream aiperf harness. Preserves legacy sammshen/lmcache-agentic-traces fallback when MOONCAKE_INPUT is unset.
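The fallback rule can be summarized in a few lines. This is a sketch of the selection logic only, not the recipe itself: the helper name `resolve_mooncake_input` is hypothetical, and treating an empty MOONCAKE_INPUT the same as an unset one is an assumption.

```python
import os
from collections.abc import Mapping

# Legacy dataset named in the recipe; used when MOONCAKE_INPUT is not provided.
LEGACY_DATASET = "sammshen/lmcache-agentic-traces"


def resolve_mooncake_input(env: Mapping[str, str] = os.environ) -> str:
    """Return the ISB1 mooncake JSONL path if set, else the legacy dataset id."""
    # Assumption: an empty or whitespace-only value counts as unset.
    value = env.get("MOONCAKE_INPUT", "").strip()
    return value or LEGACY_DATASET
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.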
Validates exporter output against the pinned aiperf loader without a GPU or inference server. Asserts row/session counts, delay values, raw_messages propagation, and cross-bundle session prefix survival.
Pull request overview
Adds an ISB1 → aiperf mooncake_trace “sibling lane” (export/validate/test + dataset bundles + sweep config + operator docs) alongside the existing kv-cache-tester trace lane, so operators can run long-context KV stress via aiperf/mooncake without changing the core harness in this PR.
Changes:
- Introduces a stdlib-only ISB1→mooncake JSONL exporter, a JSONL validator, aiperf loader smoke script, and contract tests.
- Adds non-preview mooncake JSONL bundles + a manifest (with Git LFS wiring for large assets).
- Adds sweep config YAMLs and operator/HF publication docs; adds a stdlib validator for kv-cache-tester trace JSONs.
Reviewed changes
Copilot reviewed 209 out of 209 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/validate_mooncake_trace.py | New stdlib validator for mooncake_trace JSONL rows (file/dir/glob). |
| tools/validate_kvcache_tester_trace.py | New stdlib validator for kv-cache-tester trace JSON schema. |
| tools/test_isb1_to_mooncake_trace.py | Contract tests for the mooncake exporter CLI and row schema. |
| tools/smoke_aiperf_mooncake.py | Offline smoke script to validate aiperf MooncakeTraceDatasetLoader ingestion. |
| tools/isb1_to_mooncake_trace.py | New exporter converting ISB1 replay bundles into mooncake_trace JSONL. |
| datasets/isb1/mooncake/manifest.json | Manifest describing the shipped mooncake JSONL bundles and counts. |
| datasets/isb1/mooncake/core/chat_8k1k/isb1_core_chat_8k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/core/chat_8k1k_qwen3.5/isb1_core_chat_8k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/core/code_8k1k/isb1_core_code_8k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/core/code_8k1k_qwen3.5/isb1_core_code_8k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_32k/chat_32k1k/isb1_extension_32k_chat_32k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_32k/chat_32k1k_qwen3.5/isb1_extension_32k_chat_32k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_32k/code_32k1k/isb1_extension_32k_code_32k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_32k/code_32k1k_qwen3.5/isb1_extension_32k_code_32k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_64k/chat_64k1k/isb1_extension_64k_chat_64k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_64k/chat_64k1k_qwen3.5/isb1_extension_64k_chat_64k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_64k/code_64k1k/isb1_extension_64k_code_64k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_64k/code_64k1k_qwen3.5/isb1_extension_64k_code_64k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_131k/chat_131k1k/isb1_extension_131k_chat_131k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_131k/chat_131k1k_dsr1/isb1_extension_131k_chat_131k1k_dsr1.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_131k/chat_131k1k_qwen3.5/isb1_extension_131k_chat_131k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_131k/code_131k1k/isb1_extension_131k_vllm_code_131k1k.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/mooncake/extension_131k/code_131k1k_qwen3.5/isb1_extension_131k_vllm_code_131k1k_qwen3_5.jsonl | Git LFS pointer for mooncake JSONL bundle. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1/isb1_hb_depth_cache_xlc2_hot_cold_session_mix_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1/isb1_sess_cache_xlc2_hot_cold_session_mix_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1/isb1_hb_depth_cache_ulc2_offload_cliff_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/manifest.json | Git LFS pointer for converted-traces manifest. |
| datasets/isb1/converted/extension_64k/code_64k1k_qwen3.5/isb1_sess_optimizer_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k_qwen3.5/isb1_sess_optimizer_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k_qwen3.5/isb1_sess_optimizer_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/code_64k1k/isb1_sess_optimizer_01_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k_qwen3.5/isb1_sess_chat_lc3_multi_day_strategy_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k_qwen3.5/isb1_sess_chat_lc3_multi_day_strategy_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k_qwen3.5/isb1_sess_chat_lc3_multi_day_strategy_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_64k/chat_64k1k/isb1_sess_chat_lc3_multi_day_strategy_01_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_doc_comp_fanout_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_doc_comp_fanout_01_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_doc_comp_fanout_01_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_2c2a96a7.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_2c2a96a7_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k_qwen3.5/isb1_sess_2c2a96a7_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0013.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0014.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0015.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0016.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0017.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0018.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0019.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0020.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0021.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0022.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_doc_comp_fanout_01_0023.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/code_32k1k/isb1_sess_2c2a96a7_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k_qwen3.5/isb1_sess_chat_lc2_resume_reasoning_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k_qwen3.5/isb1_sess_chat_lc2_resume_reasoning_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k_qwen3.5/isb1_sess_chat_lc2_resume_reasoning_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_32k/chat_32k1k/isb1_sess_chat_lc2_resume_reasoning_01_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/code_131k1k_qwen3.5/isb1_hb_depth_cache_xlc1_text_shared_prefix_swarm_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/code_131k1k/isb1_sess_cache_xlc1_text_shared_prefix_swarm_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_qwen3.5/isb1_sess_xlc1_text_resume_bridge_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_qwen3.5/isb1_sess_xlc1_text_resume_bridge_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_qwen3.5/isb1_sess_xlc1_text_resume_bridge_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_dsr1/isb1_sess_xlc1_text_resume_bridge_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_dsr1/isb1_sess_xlc1_text_resume_bridge_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k_dsr1/isb1_sess_xlc1_text_resume_bridge_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/extension_131k/chat_131k1k/isb1_sess_xlc1_text_resume_bridge_01_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k_qwen3.5/isb1_sess_code_ca1_agent_benchmark_plan_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k_qwen3.5/isb1_sess_code_ca1_agent_benchmark_plan_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k_qwen3.5/isb1_sess_code_ca1_agent_benchmark_plan_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0013.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0014.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0015.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0016.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0017.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0018.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0019.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0020.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0021.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0022.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_offload_cliff_9982_0023.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_debug_repair_repo_001_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0025.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0026.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0027.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0028.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0029.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0030.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0031.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0032.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0033.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0034.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/code_8k1k/isb1_sess_code_ca1_agent_benchmark_plan_01_0035.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k_qwen3.5/isb1_sess_chat_lc3_contract_review_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k_qwen3.5/isb1_sess_chat_lc3_contract_review_01_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k_qwen3.5/isb1_sess_chat_lc3_contract_review_01_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0001.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0002.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0003.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0004.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0005.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0006.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0007.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0008.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0009.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0010.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_tool_free_memory_resume_001_0011.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0013.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0014.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0015.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0016.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0017.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0018.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0019.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0020.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0021.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0022.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/converted/core/chat_8k1k/isb1_sess_chat_lc3_contract_review_01_0023.json | Git LFS pointer for converted trace JSON. |
| datasets/isb1/RECIPE_MOONCAKE.md | Operator recipe documenting how to plumb MOONCAKE_INPUT into upstream harness/workflows. |
| datasets/isb1/README.md | Documents the datasets/isb1/converted/ sidecar and how to validate it. |
| datasets/isb1/HF_PUBLISH.md | Steps for publishing converted traces to Hugging Face for hf_<org>--<repo> consumption. |
| datasets/isb1/.gitattributes | Adds LFS tracking rules for converted/**/*.json and mooncake/**/*.jsonl. |
| .github/configs/multiturn-agentic-trace-isb1.yaml | Adds sweep cells for kv-cache-tester replay flow using ISB1 converted traces. |
| .github/configs/multiturn-agentic-trace-isb1-mooncake.yaml | Adds sweep cells for aiperf/mooncake replay flow using ISB1 mooncake bundles. |
| DEFAULT_AIPERF_SRC = os.environ.get("AIPERF_SRC") or ( | ||
| "/tmp/cam-pr993-full/experimental/multiturn/vllm_benchmark/aiperf/src" | ||
| ) |
DEFAULT_AIPERF_SRC is hard-coded to a very specific local path under /tmp/..., which is unlikely to exist for other users and can lead to confusing failures if --aiperf-src isn’t provided. Consider defaulting to empty/None (and requiring --aiperf-src or AIPERF_SRC) or using a more generic placeholder path in the help text.
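One way to act on this comment, sketched under assumptions: the script is assumed to use `argparse`, and the helper names `build_parser` and `resolve_aiperf_src` are hypothetical, not taken from the PR. The idea is to drop the machine-specific fallback and fail with a clear message when neither `--aiperf-src` nor `AIPERF_SRC` is supplied.

```python
import argparse
import os
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --aiperf-src default comes only from the environment."""
    parser = argparse.ArgumentParser(description="aiperf source resolution (sketch)")
    parser.add_argument(
        "--aiperf-src",
        # No machine-specific fallback: None when AIPERF_SRC is unset.
        default=os.environ.get("AIPERF_SRC"),
        help="Path to the aiperf src directory (or set AIPERF_SRC).",
    )
    return parser


def resolve_aiperf_src(argv: Optional[Sequence[str]] = None) -> str:
    """Return the aiperf source path, exiting with a usage error if it is missing."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.aiperf_src:
        # parser.error prints the message to stderr and exits with status 2.
        parser.error("--aiperf-src or AIPERF_SRC is required")
    return args.aiperf_src
```

With this shape, a missing path surfaces as an immediate usage error rather than a later import or file-not-found failure.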
| @@ -1 +1,3 @@ | |||
| exports/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | |||
| converted/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | |||
converted/**/*.json is marked as LFS-tracked, which also applies to datasets/isb1/converted/manifest.json. Since the manifest is small and useful to review/diff in normal Git history, consider overriding attributes for just that manifest (disable LFS) so it stays human-readable in PRs.
| converted/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | |
| converted/**/*.json filter=lfs diff=lfs merge=lfs -text linguist-generated=true | |
| converted/manifest.json -filter -diff -merge text linguist-generated=false |
| prefix = f"line {line_no} input[{message_idx}]" | ||
| if not isinstance(message, dict): | ||
| _add_issue(errors, f"{prefix} must be object", max_issues) | ||
| categories["message_not_object"] += 1 | ||
| return |
In _validate_message(), the error prefix uses input[...] but this function is validating row["messages"]. This makes diagnostics confusing (it should reference messages[...] to match the schema and other error messages).
| try: | ||
| lines = file_path.read_text(encoding="utf-8").splitlines() | ||
| except Exception as exc: |
main() reads each JSONL via file_path.read_text(...).splitlines(), which loads the whole file into memory. These trace files can be large; streaming line-by-line (iterating the file handle) would reduce peak memory and improve reliability when validating directories of bundles.
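A minimal sketch of the streaming approach this comment suggests. The function names `iter_jsonl_lines` and `count_rows` are hypothetical, and the validator's real per-row checks are replaced here by a plain `json.loads` call.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_jsonl_lines(file_path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (line_no, line) pairs one at a time, skipping blank lines."""
    with file_path.open("r", encoding="utf-8") as handle:
        # Iterating the handle reads one line at a time instead of the whole file.
        for line_no, raw in enumerate(handle, start=1):
            line = raw.rstrip("\r\n")
            if line.strip():
                yield line_no, line


def count_rows(file_path: Path) -> Tuple[int, int]:
    """Return (parsed_ok, parse_failures) for a JSONL file, streaming the input."""
    ok = bad = 0
    for _line_no, line in iter_jsonl_lines(file_path):
        try:
            json.loads(line)
            ok += 1
        except json.JSONDecodeError:
            bad += 1
    return ok, bad
```

Peak memory then scales with the longest single row rather than the file size. Line numbers stay 1-based and still count blank lines, so diagnostics such as `line {line_no}` keep their meaning.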
Reframes fork PR #2 centerpiece — the canonical ask is the opt-in corpus at datasets/isb1/mooncake/ consumed via existing --custom-dataset-type mooncake_trace flows. The recipe and the ISB-1-mooncake sweep YAML remain for operators who elect to patch their harness, but are explicitly non-blocking and depend on unmerged upstream patches. No code changes. No removal. Label-only to clarify PR framing per deep-investigation-report.md §Answer 1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- preserve non-text ISB1 token_count blocks in the mooncake flattener and cover TABLE / TOOL_OUTPUT / BLOCK fallback behavior in the focused exporter test
- regenerate the 17 manifest-listed mooncake bundles and refresh manifest provenance/stats (jsonl bytes 13995749 -> 13988708; requests 1142 -> 1142; sessions 22 -> 22); smoke not run locally because the local aiperf env is missing cyclopts
- add noprefix offload cells plus header docs to the speculative mooncake sweep YAML; preview bundles remain deferred
Add pasteable manual GMI Cloud quickstarts for GB200 (Blackwell, FP4
DSR1 template) and H100 (Hopper, FP8 Qwen3 template) mirroring the
existing H200 quickstart shape. Extend the operator-only YAML stub
with matching gb200/h100 reference rows.
- runners/GMI_QUICKSTART_GB200.md (149 lines): 1-2 node GB200 path
with Blackwell FP4 DSR1 model default, --cpu-offload-gb 60,
users [1,2,4,8,16] x offload [on,off,noprefix] sweep. Cells:
code-8k, chat-32k, code-131k.
- runners/GMI_QUICKSTART_H100.md (148 lines): 1-2 node H100 path
with FP8 default (operator picks MODEL env var), --cpu-offload-gb 20,
same sweep shape. Operator must verify model fits TP on 80GB HBM3.
- .github/configs/multiturn-agentic-trace-isb1.yaml: add
gb200-fp4-dsr1-isb1-gmi-reference and h100-fp8-qwen3-isb1-gmi-reference
rows. Still NOT CI-dispatched.
Model choice is env-driven (matches Cam's upstream
multiturn_fp8_{h100_lmcache,h200_trace}_aiperf.sh script contract);
pick MODEL to fit TP on the actual VRAM ceiling. 80GB H100 at TP4
with large FP8 models may not fit; switch to TP8 or smaller variant.
Refs: shipped ISB1 sweep/data commit 38fd91a (PR SemiAnalysisAI#1032)
Refs: mooncake exporter commit b31f7c1 (fork PR #2)
Refs: H200 runbook commit d62899e
Ships a validated mooncake JSONL corpus (1142 rows / 22 sessions / ~13.35 MiB) plus the exporter that produced it. No harness changes required. Operators who already run `--custom-dataset-type mooncake_trace` can point `MOONCAKE_INPUT` at `datasets/isb1/mooncake/` instead of (or alongside) `sammshen/lmcache-agentic-traces`.
This PR is opt-in and non-blocking, and does not request any changes to `experimental/multiturn/vllm_benchmark/scripts`.
Stacks on: upstream PR SemiAnalysisAI#1032 (fork branch `isb1/kv-cache-stress-benchmark`)
Why this is separate from SemiAnalysisAI#1032: PR SemiAnalysisAI#1032 was intentionally trimmed to a data+contract shape, while this PR adds the aiperf/mooncake pathway as a sibling lane (exporter/validator/tests + mooncake bundles + sweep + recipe + smoke) without re-opening SemiAnalysisAI#1032 scope.
Commits
- `7e1127bb`
- `8a509064`
- `05576798`
- `3f8ffd1e`
- `46d6d0fe`
- `8fe2b807`
- `b31f7c19`
What shipped since `8fe2b807`
`b31f7c19`: ISL preservation + noprefix cells
- `_flatten_blocks()` in `tools/isb1_to_mooncake_trace.py` now emits a `[<TYPE> token_count=N]` placeholder for ISB-1 content blocks that declare a `token_count` but no `text` (e.g. `table`, `tool_output`). Previously these were silently dropped, losing ISL that Cam's harness expects to see. Unchanged for `text`/`code` paths; blocks with neither `text` nor `token_count` still drop (no behavior regression).
- Focused exporter test (`tools/test_isb1_to_mooncake_trace.py`) extended with `TABLE`, `TOOL_OUTPUT`, `BLOCK` fallback, and the legacy drop case.
- Bundles regenerated; the manifest carries `schema_version: 1.1.0` and a new `flattener_version: 1.1.0` field. Byte delta: `13,995,749` → `13,988,708` (-7,041); `1142` requests and `22` sessions unchanged. The placeholder path is covered by tests but does not affect the current non-preview corpus (none of the shipped 17 bundles have declared-but-textless blocks).
- `.github/configs/multiturn-agentic-trace-isb1-mooncake.yaml` gains `noprefix` as a third `offload` value on every H100 / H200 fp8 Qwen3 lmcache cell, mirroring the kv-cache-tester lane addition on the PR SemiAnalysisAI/InferenceX#1032 branch (`38fd91a7`; PR title: "[isb1] add converted trace corpus + kv-cache-tester contract helpers"). Header comment extended to document the `noprefix` semantics. Speculative-status banner preserved.
Validator (`tools/validate_mooncake_trace.py --allow-superset`): clean. Focused pytest: clean. Local smoke (`tools/smoke_aiperf_mooncake.py`) not run because the local aiperf env is missing `cyclopts`; gated separately.
What upstream gets
Verification
- `python3 tools/isb1_to_mooncake_trace.py --input datasets/isb1/exports/ --output-dir /tmp/verify --dry-run` → 23 bundles / 1226 rows / 28 sessions / 163 warnings / 0 errors
- `python3 tools/validate_mooncake_trace.py --input datasets/isb1/mooncake/ --allow-superset` → 0
- `git check-attr -a datasets/isb1/mooncake/core/code_8k1k/isb1_core_code_8k1k.jsonl` → `filter: lfs`
- Manifest (schema `1.1.0`, flattener `1.1.0`)
Deferred (out of scope for this PR)
- Harness patches (`MOONCAKE_INPUT` hunks in `*_lmcache_aiperf.sh` and workflow YAMLs)
- Preview bundles: the `_flatten_blocks` fix is already in place so adding them later is straightforward
Review hint
Each commit is individually revertable and cherry-pickable.
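For reviewers, the `_flatten_blocks()` placeholder behavior described under "What shipped" can be sketched roughly as follows. This is not the code in `tools/isb1_to_mooncake_trace.py`. The block shape (dicts with `type`, `text`, `token_count` keys), the newline join, the positive-integer check, and the name `flatten_blocks_sketch` are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def flatten_blocks_sketch(blocks: Iterable[Dict[str, Any]]) -> str:
    """Flatten content blocks to text, keeping a placeholder for textless blocks."""
    parts: List[str] = []
    for block in blocks:
        text = block.get("text")
        if isinstance(text, str) and text:
            # text/code path: unchanged, the text is emitted as-is.
            parts.append(text)
            continue
        token_count = block.get("token_count")
        # Assumption: only positive integer counts produce a placeholder.
        if isinstance(token_count, int) and not isinstance(token_count, bool) and token_count > 0:
            # A missing type falls back to the generic BLOCK label.
            label = str(block.get("type") or "block").upper()
            parts.append(f"[{label} token_count={token_count}]")
        # Blocks with neither text nor token_count are dropped, as before.
    return "\n".join(parts)
```

The placeholder keeps a visible marker of the declared `token_count` in the flattened prompt. Whether the tokenized length of that marker approximates the declared count is not something this sketch addresses.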
🤖 Generated with Claude Code