Add opt-in KV offload sweep, probe, and operator playbook #3

Open
OCWC22 wants to merge 1 commit into isb1/kv-cache-stress-benchmark from isb1/kv-cache-offload-extension
OCWC22 commented Apr 22, 2026

Summary

Opt-in, additive KV-cache offload surface for SemiAnalysisAI#993. Granular --cpu-offload-gb sweep, live offload probe, LMCache NVMe recipe, curated pressure subset, operator playbook.

Scope

Stacks on: SemiAnalysisAI#1032. No experimental/** touches. No edits to Cam's *_lmcache_aiperf.sh.

Opt-in, additive extension of the KV-cache-offloading surface introduced in SemiAnalysisAI#993. No harness edits required — operators who run Cam's existing multiturn_fp8_h200_trace_replay.sh or multiturn_fp8_h100_lmcache_aiperf.sh get these knobs by passing a different sweep config and, optionally, a parallel probe script. Zero changes under experimental/**.
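As a rough illustration of what a granular `--cpu-offload-gb` sweep expands to, the sketch below builds one `vllm serve` command line per offload size. This is not the sweep config added by this PR: the function name `build_sweep_commands`, the model identifier, and the list of sizes are hypothetical and chosen only for illustration.

```python
import shlex


def build_sweep_commands(model, offload_gb_values, extra_args=()):
    """Return one `vllm serve` argv list per --cpu-offload-gb value.

    Hypothetical helper: the real sweep is driven by a YAML config,
    whose schema is not shown in this PR description.
    """
    commands = []
    for gb in offload_gb_values:
        if gb < 0:
            raise ValueError(f"offload size must be non-negative, got {gb}")
        argv = ["vllm", "serve", model, "--cpu-offload-gb", str(gb), *extra_args]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Placeholder model name and sizes, not values from the PR.
    for argv in build_sweep_commands("example-org/example-model", [0, 4, 8, 16, 32]):
        print(shlex.join(argv))
```

Keeping the sweep as data (a list of sizes) rather than as edits to the launch scripts matches the opt-in framing above: the existing scripts stay untouched and only the values passed to them change.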

Cherry-pickable onto upstream because the change is config/docs/tooling only.

What upstream gets (if cherry-picked)

  • Granular --cpu-offload-gb sweep config
  • kv_offload_probe.py side-car for vLLM /metrics
  • LMCache NVMe cold-tier recipe (config file; no script edits required)
  • Curated KV-pressure subset + validator extension
  • Operator playbook
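To make the side-car idea concrete, here is a minimal sketch of scraping a Prometheus-style `/metrics` endpoint and keeping only the samples under a given prefix. It is not the contents of `kv_offload_probe.py`; the function names, the default `vllm:` prefix, and the metric name in the sample text are assumptions, and real metric names vary by vLLM version.

```python
import re
import urllib.request

# One sample line in the Prometheus text format: name, optional {labels}, value.
# Simplification: label values containing "}" are not handled.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text, prefix="vllm:"):
    """Return {series: value} for samples whose metric name starts with prefix."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        if not name.startswith(prefix):
            continue
        try:
            samples[name + labels] = float(value)
        except ValueError:
            continue  # ignore values that are not numeric
    return samples


def fetch_metrics(url, timeout=5.0):
    """Fetch the raw metrics text; the URL is supplied by the operator."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Illustrative payload only; the metric name is an assumption.
SAMPLE = """\
# HELP vllm:gpu_cache_usage_perc GPU KV-cache usage.
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="m"} 0.75
process_cpu_seconds_total 12.5
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

Because the probe only reads the metrics endpoint, it can run alongside an existing benchmark script without modifying it, which is what makes it a side-car.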

Verification

  • python tools/validate_kvcache_tester_trace.py datasets/isb1/converted/ --pressure-manifest datasets/isb1/kv_pressure/manifest.json → clean
  • /opt/homebrew/opt/python@3.13/bin/python3.13 -m unittest tools.test_kv_offload_probe -v → clean
  • /opt/homebrew/opt/python@3.13/bin/python3.13 -c "import yaml; yaml.safe_load(open('.github/configs/multiturn-agentic-trace-isb1-offload-sweep.yaml'))" → OK

Stacks on 38fd91a (PR SemiAnalysisAI#1032), mirrors the opt-in fork framing from b31f7c1 (fork PR #2), and stays sibling to 992ff21 (GMI runbook).
Copilot AI review requested due to automatic review settings April 22, 2026 09:17
@github-actions

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

OCWC22 changed the base branch from main to isb1/kv-cache-stress-benchmark April 22, 2026 09:18

Copilot AI left a comment

Pull request overview

This PR extends the benchmarking/eval surface to support separate multi-node eval jobs and adds operator-facing documentation/configuration and additional benchmark recipes (plus new/updated ISB1 dataset artifacts via Git LFS).

Changes:

  • Split eval results into single-node vs multi-node (evals vs multinode_evals) and extend validation to accept eval-conc for multi-node entries.
  • Add multi-node eval-only execution path for AMD multi-node runner scripts and wire a dedicated sweep-multi-node-evals job into run-sweep.yml.
  • Add multiple new single-node benchmark scripts, new runner script(s), docs, and new/updated ISB1 dataset manifests/pointers + .gitattributes for LFS.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| utils/process_changelog.py | Splits eval results into evals and multinode_evals based on presence of prefill. |
| utils/matrix_logic/validation.py | Adds eval-conc field and multinode_evals to validated changelog matrix output. |
| utils/evals/EVALS.md | Updates documentation to describe separate single-node vs multi-node eval job behavior and env var wiring. |
| utils/bench_serving/KNOWN_LIMITATION.md | Adds a known limitation note for bench_serving client behavior at ultra-high QPS. |
| benchmarks/multi_node/amd_utils/submit.sh | Threads eval-related env vars into the AMD multi-node submission environment. |
| benchmarks/multi_node/amd_utils/job.slurm | Threads eval-related env vars into the Docker container environment. |
| benchmarks/multi_node/amd_utils/server.sh | Adds eval-only skip for throughput + runs lm-eval in multi-node flow when requested. |
| .github/workflows/run-sweep.yml | Adds sweep-multi-node-evals job and updates collect-evals dependencies/condition. |
| .github/workflows/profile.yml | Simplifies Slurm cleanup to always scancel by runner name when Slurm is present. |
| .github/workflows/benchmark-tmpl.yml | Simplifies Slurm cleanup to always scancel by runner name; adds pre-run cleanup of stale eval outputs. |
| .github/workflows/pr-recipe-reminder.yml | Adjusts reminder comment content to include additional guidance about rerunning actions / support. |
| .github/workflows/claude-pr-review.yml | Adds guidance about perf-changelog.yaml append-only chronological ordering. |
| .github/PULL_REQUEST_TEMPLATE/pull_request_template.md | Adds a checkbox to remind authors to append perf-changelog entries to the end. |
| runners/launch_mi325x-amds.sh | Adds a Slurm+Enroot-based runner launcher script for MI325X. |
| runners/launch_b200-dgxc.sh | Removes a legacy runner launcher script. |
| benchmarks/single_node/* | Adds multiple new single-node benchmark recipes (Qwen3.5, GLM5/5.1, MiniMax, Kimi, DSR1, etc.) and tweaks a couple existing ones. |
| docs/lmcache_nvme_recipe.md | Adds an operator recipe for LMCache NVMe cold-tier configuration via LMCACHE_EXTRA_CONFIG_FILE. |
| docs/kv_offload_readme.md | Adds a KV offload operator readme describing sweep/probe/recipe surfaces. |
| datasets/isb1/** | Adds/updates ISB1 dataset manifests and many Git LFS pointer files plus datasets/isb1/.gitattributes for LFS rules. |
| .gitattributes | Adds Git LFS rules for ISB1 export JSONs. |
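
The summary for utils/process_changelog.py describes splitting eval results by the presence of prefill. A minimal sketch of that kind of partition follows; it is not the code in that file. The function name, the entry shape, and the choice that entries carrying a `prefill` key are the multi-node ones are all assumptions made for illustration.

```python
def split_eval_results(entries):
    """Partition eval entries into single-node and multi-node buckets.

    Assumption: an entry that has a "prefill" key describes a multi-node
    run; every other entry is treated as single-node.
    """
    evals, multinode_evals = [], []
    for entry in entries:
        if "prefill" in entry:
            multinode_evals.append(entry)
        else:
            evals.append(entry)
    return {"evals": evals, "multinode_evals": multinode_evals}


if __name__ == "__main__":
    # Hypothetical entries; field values are placeholders.
    sample = [
        {"name": "single-a", "conc": 8},
        {"name": "multi-a", "prefill": {"nodes": 1}, "eval-conc": 16},
    ]
    print(split_eval_results(sample))
```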
