fix(vllm): backport get_kv_cache_group_metadata to vllm 0.20.1 by siddhant-0707 · Pull Request #9645 · ai-dynamo/dynamo

siddhant-0707 · 2026-05-15T23:20:01Z

Summary

In vllm 0.20.1 (our pinned version), EngineCore.get_kv_cache_group_metadata does not exist. It was added to vllm main in PR #40984, after the 0.20.1 release.
configure_kv_event_block_size calls this method via call_utility_async, which dispatches to EngineCoreProc. Because the method is absent in 0.20.1, vllm logs an ERROR and dynamo falls back to the minimum block size across all KV cache groups. That fallback can be wrong for models with multiple groups (e.g. hybrid MLA+Mamba).
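The fallback path described above can be sketched as follows. This is not dynamo's configure_kv_event_block_size. `resolve_kv_event_block_size`, the `fallback_block_sizes` input, the "prefer a `full_attention` group" selection rule, and both fake engine clients are hypothetical. The block sizes 64 and 16 are illustrative numbers, not values taken from any model. The sketch only shows the shape of the problem: when the utility call fails, the result collapses to `min()` over every group, regardless of which group should drive KV events.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical shape of call_utility_async: await call(method_name, *args).
UtilityCall = Callable[..., Awaitable[Any]]


async def resolve_kv_event_block_size(
    call_utility: UtilityCall,
    fallback_block_sizes: list[int],
    preferred_kinds: frozenset = frozenset({"full_attention"}),
) -> int:
    """Illustrative sketch only; not dynamo's actual selection logic."""
    try:
        groups = await call_utility("get_kv_cache_group_metadata")
    except Exception as exc:  # e.g. the method is missing on vllm 0.20.1
        logger.error("get_kv_cache_group_metadata unavailable: %s", exc)
        # Fallback described in the PR: minimum block size across all groups.
        return min(fallback_block_sizes)

    # Hypothetical selection rule: first group whose kind is preferred.
    for group in groups:
        if group.get("kind") in preferred_kinds and group.get("block_size"):
            return int(group["block_size"])
    return min(fallback_block_sizes)


# Hypothetical stand-ins for the engine client.
async def call_on_vllm_0_20_1(method: str, *args: Any) -> Any:
    raise AttributeError(f"'EngineCore' object has no attribute {method!r}")


async def call_with_backport(method: str, *args: Any) -> Any:
    return [
        {"group_idx": 0, "kind": "full_attention", "block_size": 64, "sliding_window": None},
        {"group_idx": 1, "kind": "mamba", "block_size": 16, "sliding_window": None},
    ]


if __name__ == "__main__":
    # Without the method: falls back to min(64, 16).
    print(asyncio.run(resolve_kv_event_block_size(call_on_vllm_0_20_1, [64, 16])))
    # With the backport: the hypothetical rule picks the attention group's size.
    print(asyncio.run(resolve_kv_event_block_size(call_with_backport, [64, 16])))
```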
Adds a vendored patch through the project's existing mechanism in install_vllm.sh, placed in the same directory as the existing 0001-pr40932 patch. The backport inlines isinstance-based kind detection because get_kv_cache_spec_kind() was also introduced after 0.20.1.
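Inlined isinstance-based kind detection has one subtle requirement: a subclass must be tested before its base class. The minimal sketch below uses stand-in dataclasses named after vllm's KV cache spec types. Their exact names and hierarchy in 0.20.1 are assumed, with MLA modeled as a subclass of full attention. The kind labels (`"mla"`, `"full_attention"`, …) are also hypothetical and are not the strings get_kv_cache_spec_kind() returns upstream.

```python
from dataclasses import dataclass


# Stand-in spec classes (assumed names/hierarchy, for illustration only).
@dataclass
class KVCacheSpec:
    block_size: int


@dataclass
class FullAttentionSpec(KVCacheSpec):
    pass


@dataclass
class MLAAttentionSpec(FullAttentionSpec):  # modeled as a subclass here
    pass


@dataclass
class SlidingWindowSpec(KVCacheSpec):
    sliding_window: int = 0


@dataclass
class MambaSpec(KVCacheSpec):
    pass


# Order matters: MLAAttentionSpec is checked before FullAttentionSpec,
# otherwise every MLA spec would be classified as full attention.
_KIND_CHECKS: tuple[tuple[type, str], ...] = (
    (MLAAttentionSpec, "mla"),
    (FullAttentionSpec, "full_attention"),
    (SlidingWindowSpec, "sliding_window"),
    (MambaSpec, "mamba"),
)


def spec_kind(spec: KVCacheSpec) -> str:
    """Inline substitute for get_kv_cache_spec_kind() (hypothetical labels)."""
    for cls, kind in _KIND_CHECKS:
        if isinstance(spec, cls):
            return kind
    return "unknown"


if __name__ == "__main__":
    print(spec_kind(MLAAttentionSpec(block_size=64)))
    print(spec_kind(SlidingWindowSpec(block_size=16, sliding_window=4096)))
```

A lookup keyed on `type(spec)` would avoid the ordering issue but would also miss subclasses that the upstream helper may treat as their parent kind. The ordered isinstance chain keeps that behaviour explicit.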

Fixes #9560

Test plan

Container build applies both patches cleanly (--forward skips already-applied hunks, so re-runs are safe)
call_utility_async("get_kv_cache_group_metadata") succeeds and the correct KV-event block size is selected for standard and MLA models
No regression on existing vllm worker tests

Summary by CodeRabbit

New Features
- New method enables retrieval of KV cache group metadata for improved visibility into cache configuration and monitoring.

copy-pr-bot · 2026-05-15T23:20:05Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-15T23:20:08Z

👋 Hi siddhant-0707! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2026-05-15T23:23:16Z

Walkthrough

This PR introduces a patch that backports the get_kv_cache_group_metadata method to vLLM 0.20.1's EngineCore class. The method inspects the scheduler's KV cache configuration, derives metadata for each group including type classification and block size, and returns msgspec-serializable dictionaries.

Changes

KV Cache Metadata Backport

Layer / File(s)	Summary
Backport get_kv_cache_group_metadata to EngineCore `container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch`	Patch file adds a `get_kv_cache_group_metadata(self) -> list[dict[str, int | str | None]]` method to EngineCore. The method reads `scheduler.kv_cache_config.kv_cache_groups`, derives a `kind` classification from each spec's type, and returns metadata dicts with `group_idx`, `kind`, `block_size`, and an optional `sliding_window` field. It returns an empty list if the scheduler has no KV cache config.
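The behaviour in the walkthrough row can be sketched as a free function taking the scheduler, rather than as a method on EngineCore. Several details are assumptions here: that each group exposes its spec as `kv_cache_spec`, that `group_idx` is the enumeration index, and that kind detection is passed in as a `kind_of` callable. The default `kind_of` (the class name) is only a placeholder for the inlined isinstance checks. The output is a list of plain dicts holding ints, strings and None, which is what keeps the result easy for msgspec to serialize.

```python
from types import SimpleNamespace
from typing import Any, Callable, Union

GroupMetadata = dict[str, Union[int, str, None]]


def get_kv_cache_group_metadata(
    scheduler: Any,
    kind_of: Callable[[Any], str] = lambda spec: type(spec).__name__,
) -> list[GroupMetadata]:
    """Sketch of the backported method body (not the patch verbatim)."""
    kv_cache_config = getattr(scheduler, "kv_cache_config", None)
    if kv_cache_config is None:
        # Described behaviour: no KV cache config -> empty list.
        return []

    metadata: list[GroupMetadata] = []
    for group_idx, group in enumerate(kv_cache_config.kv_cache_groups):
        spec = group.kv_cache_spec  # assumed attribute name
        metadata.append({
            "group_idx": group_idx,
            "kind": kind_of(spec),
            # Direct access, as in the patch under review.
            "block_size": spec.block_size,
            "sliding_window": getattr(spec, "sliding_window", None),
        })
    return metadata


if __name__ == "__main__":
    # Hypothetical two-group configuration built from SimpleNamespace stand-ins.
    scheduler = SimpleNamespace(kv_cache_config=SimpleNamespace(kv_cache_groups=[
        SimpleNamespace(kv_cache_spec=SimpleNamespace(block_size=16)),
        SimpleNamespace(kv_cache_spec=SimpleNamespace(block_size=32, sliding_window=4096)),
    ]))
    print(get_kv_cache_group_metadata(scheduler))
```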

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: backporting a missing method to vllm 0.20.1.
Description check	✅ Passed	The description provides clear context, explains the issue, and covers the solution with test plan checklist.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#9560` by backporting get_kv_cache_group_metadata, enabling correct KV-event block size selection for multi-group models.
Out of Scope Changes check	✅ Passed	The changes are narrowly scoped: a single patch file backporting one method to address the linked issue with no extraneous modifications.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

🧹 Nitpick comments (1)

container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch (1)

58-63: 💤 Low value

Consider defensive access for spec.block_size.

While sliding_window uses getattr with a default (line 62), block_size is accessed directly (line 61). If a spec lacks this attribute, the method will raise AttributeError.

🛡️ Suggested defensive access pattern

             metadata.append({
                 "group_idx": group_idx,
                 "kind": kind,
-                "block_size": spec.block_size,
+                "block_size": getattr(spec, "block_size", None),
                 "sliding_window": getattr(spec, "sliding_window", None),
             })

However, if block_size is guaranteed to exist on all spec types, the current approach is acceptable.
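The two options trade one failure mode for another. Direct access fails fast with AttributeError at the point of the missing field. A `getattr` default instead pushes a `None` downstream, where every consumer (for example a `min()` over block sizes) then has to handle it. A minimal sketch follows. The helper names are hypothetical and are not part of the patch or of vllm.

```python
from types import SimpleNamespace
from typing import Any, Optional


def block_size_strict(spec: Any) -> int:
    # Mirrors the patch: raises AttributeError if the field is missing.
    return spec.block_size


def block_size_lenient(spec: Any) -> Optional[int]:
    # Mirrors the suggestion: returns None instead of raising.
    return getattr(spec, "block_size", None)


def min_known_block_size(sizes: list) -> int:
    # Hypothetical consumer: None entries must be filtered out first,
    # because min() over a mix of int and None raises TypeError.
    known = [s for s in sizes if s is not None]
    if not known:
        raise ValueError("no KV cache group reported a block_size")
    return min(known)


if __name__ == "__main__":
    spec_without_block_size = SimpleNamespace(sliding_window=None)
    print(block_size_lenient(spec_without_block_size))
    try:
        block_size_strict(spec_without_block_size)
    except AttributeError as exc:
        print("strict access failed:", exc)
```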

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch`
around lines 58 - 63, The code appends a dict using spec.block_size directly
which can raise AttributeError for specs without that attribute; change to
defensive access similar to sliding_window (e.g., use getattr(spec,
"block_size", None) or another sensible default) in the metadata.append call so
metadata uses getattr(spec, "block_size", <default>) instead of spec.block_size;
update the block that builds the dict around metadata.append({ "group_idx":
group_idx, "kind": kind, "block_size": ..., "sliding_window": getattr(spec,
"sliding_window", None) }) to use the safe accessor for block_size.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61d0d05f-1d0e-468b-96d8-98c8bff81c48

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb03b9 and 08af65a.

📒 Files selected for processing (1)

container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch


grahamking · 2026-05-18T14:24:30Z

We can't use unreleased parts of vllm, even via a patch. We should also not be patching vllm at this stage, but contributing upstream.

feat(kv-events): backport get_kv_cache_group_metadata to EngineCore

08af65a

siddhant-0707 requested review from a team as code owners May 15, 2026 23:20

pull-request-size Bot added the size/M label May 15, 2026

github-actions Bot added the external-contribution, fix, backend::vllm, and container labels May 15, 2026

coderabbitai Bot reviewed May 15, 2026


dynamo-ops reviewed May 16, 2026


Comment thread container/deps/vllm/patches/v0.20.1/0002-pr40984-backport-get-kv-cache-group-metadata.patch Outdated

feat(kv-events): backport get_kv_cache_group_metadata to EngineCore with enhanced spec kind handling

7f2d742

dynamo-ops approved these changes May 16, 2026


grahamking closed this May 18, 2026

siddhant-0707 wants to merge 2 commits into ai-dynamo:main from siddhant-0707:issue-9560


3 participants
