fix(vllm): backport get_kv_cache_group_metadata to vllm 0.20.1 #9645

Closed
siddhant-0707 wants to merge 2 commits into ai-dynamo:main from siddhant-0707:issue-9560

@siddhant-0707 siddhant-0707 commented May 15, 2026

Summary

  • In vllm 0.20.1 (our pinned version) EngineCore.get_kv_cache_group_metadata does not exist — it was added to vllm main in PR #40984 after the 0.20.1 release.
  • configure_kv_event_block_size calls this via call_utility_async, which dispatches to EngineCoreProc. Since the method is absent in 0.20.1, vllm logs an ERROR and dynamo falls back to the minimum block size across all KV cache groups — which can be wrong for models with multiple groups (e.g. hybrid MLA+Mamba).
  • Adds a vendored patch using the project's existing mechanism in install_vllm.sh (same directory as the existing 0001-pr40932 patch). The backport inlines isinstance-based kind detection because get_kv_cache_spec_kind() was also introduced post-0.20.1.
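
The fallback described above can be sketched as follows. This is a minimal illustration, not dynamo's implementation: `FakeEngineClient`, `resolve_kv_groups`, and the `kind` strings are hypothetical, and the sketch assumes that a missing utility method surfaces to the caller as an exception.

```python
import asyncio


class FakeEngineClient:
    """Hypothetical stand-in for the engine client; not a vLLM class."""

    def __init__(self, utilities):
        self._utilities = utilities  # maps utility name -> zero-arg callable

    async def call_utility_async(self, method):
        if method not in self._utilities:
            # Assumption: a missing utility reaches the caller as an exception.
            raise AttributeError(f"engine core has no utility {method!r}")
        return self._utilities[method]()


async def resolve_kv_groups(client, group_block_sizes):
    """Prefer per-group metadata; otherwise use the minimum block size."""
    try:
        groups = await client.call_utility_async("get_kv_cache_group_metadata")
        return {"source": "metadata", "groups": groups}
    except AttributeError:
        # Same fallback the summary describes: min across all groups.
        return {"source": "fallback", "block_size": min(group_block_sizes)}


# Engine without the method (the 0.20.1 situation) vs. one that has it.
old_engine = FakeEngineClient({})
new_engine = FakeEngineClient({
    "get_kv_cache_group_metadata": lambda: [
        {"group_idx": 0, "kind": "mla", "block_size": 64, "sliding_window": None},
        {"group_idx": 1, "kind": "mamba", "block_size": 16, "sliding_window": None},
    ]
})

fallback = asyncio.run(resolve_kv_groups(old_engine, [64, 16]))
resolved = asyncio.run(resolve_kv_groups(new_engine, [64, 16]))
```

With two groups of block sizes 64 and 16, the fallback path picks 16 regardless of which group the KV events actually describe, which is why the per-group metadata matters for multi-group models.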

Fixes #9560

Test plan

  • Container build applies both patches cleanly (--forward skips already-applied hunks, so re-runs are safe)
  • call_utility_async("get_kv_cache_group_metadata") succeeds and the correct KV-event block size is selected for standard and MLA models
  • No regression on existing vllm worker tests

Summary by CodeRabbit

  • New Features
    • New method enables retrieval of KV cache group metadata for improved visibility into cache configuration and monitoring.

@siddhant-0707 siddhant-0707 requested review from a team as code owners May 15, 2026 23:20

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

👋 Hi siddhant-0707! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added the labels external-contribution (Pull request is from an external contributor), fix, backend::vllm (Relates to the vllm backend), and container on May 15, 2026

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR introduces a patch that backports the get_kv_cache_group_metadata method to vLLM 0.20.1's EngineCore class. The method inspects the scheduler's KV cache configuration, derives metadata for each group including type classification and block size, and returns msgspec-serializable dictionaries.

Changes

KV Cache Metadata Backport

Layer: Backport get_kv_cache_group_metadata to EngineCore
File(s): container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch
Summary: The patch file adds a get_kv_cache_group_metadata(self) -> list[dict[str, int | str | None]] method to EngineCore. The method reads scheduler.kv_cache_config.kv_cache_groups, derives a kind classification from each spec's type, and returns metadata dicts with group_idx, kind, block_size, and optional sliding_window fields. It returns an empty list if the scheduler lacks a KV cache config.
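
A minimal sketch of a method with the behavior summarized above, for illustration only. The spec, group, config, and scheduler classes here are hypothetical stand-ins rather than vLLM's real types, the `kind` strings are assumptions, and the function takes the scheduler as an argument instead of being a method on EngineCore.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for vLLM's KV cache spec classes; only the
# attributes used below are modelled.
@dataclass
class FullAttentionSpec:
    block_size: int


@dataclass
class SlidingWindowSpec:
    block_size: int
    sliding_window: int


@dataclass
class MambaSpec:
    block_size: int


@dataclass
class KVCacheGroup:
    kv_cache_spec: object


@dataclass
class KVCacheConfig:
    kv_cache_groups: list = field(default_factory=list)


@dataclass
class Scheduler:
    kv_cache_config: Optional[KVCacheConfig] = None


def spec_kind(spec) -> str:
    """Classify a spec by its type; the kind strings are illustrative."""
    if isinstance(spec, SlidingWindowSpec):
        return "sliding_window"
    if isinstance(spec, MambaSpec):
        return "mamba"
    if isinstance(spec, FullAttentionSpec):
        return "full_attention"
    return "unknown"


def get_kv_cache_group_metadata(scheduler) -> list:
    """Return one plain dict per KV cache group, or [] without a config."""
    config = getattr(scheduler, "kv_cache_config", None)
    if config is None:
        return []
    metadata = []
    for group_idx, group in enumerate(config.kv_cache_groups):
        spec = group.kv_cache_spec
        metadata.append({
            "group_idx": group_idx,
            "kind": spec_kind(spec),
            "block_size": spec.block_size,
            "sliding_window": getattr(spec, "sliding_window", None),
        })
    return metadata


scheduler = Scheduler(KVCacheConfig([
    KVCacheGroup(FullAttentionSpec(block_size=16)),
    KVCacheGroup(SlidingWindowSpec(block_size=16, sliding_window=4096)),
]))
groups = get_kv_cache_group_metadata(scheduler)
```

Returning plain dicts of ints, strings, and None keeps the result easy to serialize. If the real spec classes inherit from one another, the isinstance checks would need to test subclasses before their parents.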

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: backporting a missing method to vllm 0.20.1. |
| Description check | ✅ Passed | The description provides clear context, explains the issue, and covers the solution with test plan checklist. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #9560 by backporting get_kv_cache_group_metadata, enabling correct KV-event block size selection for multi-group models. |
| Out of Scope Changes check | ✅ Passed | The changes are narrowly scoped: a single patch file backporting one method to address the linked issue with no extraneous modifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch (1)

58-63: 💤 Low value

Consider defensive access for spec.block_size.

While sliding_window uses getattr with a default (line 62), block_size is accessed directly (line 61). If a spec lacks this attribute, the method will raise AttributeError.

🛡️ Suggested defensive access pattern
             metadata.append({
                 "group_idx": group_idx,
                 "kind": kind,
-                "block_size": spec.block_size,
+                "block_size": getattr(spec, "block_size", None),
                 "sliding_window": getattr(spec, "sliding_window", None),
             })

However, if block_size is guaranteed to exist on all spec types, the current approach is acceptable.
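
The difference between the two access patterns can be seen in a small standalone sketch. The `incomplete_spec` object and both helper functions are hypothetical and exist only to show the behavior; whether any real spec type lacks `block_size` is not established here.

```python
from types import SimpleNamespace

# Hypothetical spec object that has sliding_window but no block_size.
incomplete_spec = SimpleNamespace(sliding_window=None)


def entry_direct(group_idx, kind, spec):
    """Direct attribute access: raises if block_size is missing."""
    return {
        "group_idx": group_idx,
        "kind": kind,
        "block_size": spec.block_size,
        "sliding_window": getattr(spec, "sliding_window", None),
    }


def entry_defensive(group_idx, kind, spec):
    """getattr with a default: yields None if block_size is missing."""
    return {
        "group_idx": group_idx,
        "kind": kind,
        "block_size": getattr(spec, "block_size", None),
        "sliding_window": getattr(spec, "sliding_window", None),
    }


try:
    entry_direct(0, "unknown", incomplete_spec)
    direct_error = None
except AttributeError as exc:
    direct_error = exc

defensive = entry_defensive(0, "unknown", incomplete_spec)
```

The defensive form avoids the exception but moves the problem downstream: any consumer that compares or computes with block sizes then has to handle None.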

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61d0d05f-1d0e-468b-96d8-98c8bff81c48

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb03b9 and 08af65a.

📒 Files selected for processing (1)
  • container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch

@grahamking
Contributor

We can't use unreleased parts of vllm, even via a patch. We should also not be patching vllm at this stage, but contributing upstream.

@grahamking grahamking closed this May 18, 2026

Labels

backend::vllm (Relates to the vllm backend), container, external-contribution (Pull request is from an external contributor), fix, size/M

Development

Successfully merging this pull request may close these issues.

[BUG]: vllm worker get_kv_cache_group_metadata has moved in vllm

3 participants