fix(vllm): backport get_kv_cache_group_metadata to vllm 0.20.1#9645
fix(vllm): backport get_kv_cache_group_metadata to vllm 0.20.1#9645siddhant-0707 wants to merge 2 commits into
Conversation
|
👋 Hi siddhant-0707! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThis PR introduces a patch that backports the ChangesKV Cache Metadata Backport
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch (1)
58-63: 💤 Low valueConsider defensive access for spec.block_size.
While
sliding_windowusesgetattrwith a default (line 62),block_sizeis accessed directly (line 61). If a spec lacks this attribute, the method will raise AttributeError.🛡️ Suggested defensive access pattern
metadata.append({ "group_idx": group_idx, "kind": kind, - "block_size": spec.block_size, + "block_size": getattr(spec, "block_size", None), "sliding_window": getattr(spec, "sliding_window", None), })However, if
block_sizeis guaranteed to exist on all spec types, the current approach is acceptable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch` around lines 58 - 63, The code appends a dict using spec.block_size directly which can raise AttributeError for specs without that attribute; change to defensive access similar to sliding_window (e.g., use getattr(spec, "block_size", None) or another sensible default) in the metadata.append call so metadata uses getattr(spec, "block_size", <default>) instead of spec.block_size; update the block that builds the dict around metadata.append({ "group_idx": group_idx, "kind": kind, "block_size": ..., "sliding_window": getattr(spec, "sliding_window", None) }) to use the safe accessor for block_size.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch`:
- Around line 58-63: The code appends a dict using spec.block_size directly
which can raise AttributeError for specs without that attribute; change to
defensive access similar to sliding_window (e.g., use getattr(spec,
"block_size", None) or another sensible default) in the metadata.append call so
metadata uses getattr(spec, "block_size", <default>) instead of spec.block_size;
update the block that builds the dict around metadata.append({ "group_idx":
group_idx, "kind": kind, "block_size": ..., "sliding_window": getattr(spec,
"sliding_window", None) }) to use the safe accessor for block_size.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 61d0d05f-1d0e-468b-96d8-98c8bff81c48
📒 Files selected for processing (1)
container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch
…ith enhanced spec kind handling
|
We can't use unreleased parts of vllm, even via a patch. We should also not be patching vllm at this stage, but contributing upstream. |
Summary
EngineCore.get_kv_cache_group_metadatadoes not exist — it was added to vllm main in PR #40984 after the 0.20.1 release.configure_kv_event_block_sizecalls this viacall_utility_async, which dispatches toEngineCoreProc. Since the method is absent in 0.20.1, vllm logs an ERROR and dynamo falls back to the minimum block size across all KV cache groups — which can be wrong for models with multiple groups (e.g. hybrid MLA+Mamba).install_vllm.sh(same directory as the existing0001-pr40932patch). The backport inlinesisinstance-based kind detection becauseget_kv_cache_spec_kind()was also introduced post-0.20.1.Fixes #9560
Test plan
--forwardskips already-applied hunks, so re-runs are safe)call_utility_async("get_kv_cache_group_metadata")succeeds and the correct KV-event block size is selected for standard and MLA modelsSummary by CodeRabbit