Skip to content

fix: do not override the the event_buffer_max_size if user provided one#9284

Merged
richardhuo-nv merged 3 commits into
mainfrom
rihuo/fix_publisher_2
May 7, 2026
Merged

fix: do not override the the event_buffer_max_size if user provided one#9284
richardhuo-nv merged 3 commits into
mainfrom
rihuo/fix_publisher_2

Conversation

@richardhuo-nv
Copy link
Copy Markdown
Contributor

@richardhuo-nv richardhuo-nv commented May 7, 2026

Overview:

do not override the the event_buffer_max_size if user provided one

Details:

dynamo.trtllm was overriding the event_buffer_max_size to 1024 no matter what the number is in the engine config, this might affect the rate of the worker publishing the events

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Open in Devin Review

Summary by CodeRabbit

  • Bug Fixes
    • Improved event buffer configuration handling to better preserve user-provided settings while applying defaults more intelligently when events and metrics publishing is enabled.

@richardhuo-nv richardhuo-nv requested review from a team as code owners May 7, 2026 21:35
@github-actions github-actions Bot added backend::trtllm Relates to the trtllm backend fix labels May 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack

Walkthrough

The PR refines KV cache configuration handling in LLM worker initialization. When metrics publishing is enabled, the function now preserves existing event_buffer_max_size values instead of unconditionally overwriting them, converting KvCacheConfig objects to dicts and conditionally applying defaults only when the value is unset or zero, with appropriate logging.

Changes

KV Cache Event Buffering Configuration

Layer / File(s) Summary
Configuration Context
components/src/dynamo/trtllm/workers/llm_worker.py
Engine argument map setup for KV cache configuration initialization.
Event Buffering Logic
components/src/dynamo/trtllm/workers/llm_worker.py
KvCacheConfig is converted to dict via model_dump(exclude_none=True). The event_buffer_max_size is set only if unset or zero; existing user-provided values are preserved with a log message indicating reuse instead of overwriting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: preventing override of user-provided event_buffer_max_size values.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description covers the main sections of the template with a clear overview and technical details, though 'Where should the reviewer start?' is left empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Around line 324-327: The log call using logging.info with format "%d" for the
variable existing is unsafe because existing comes from kv_cache_config and may
not be an int; update the logging call in llm_worker.py (where logging.info(...)
logs "Using existing event_buffer_max_size=%d from kv_cache_config") to use a
type-safe formatter such as "%r" or "%s" (or explicitly cast existing to int
after validation) so non-integer config values do not raise formatting errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d000ac6f-3312-45b0-a33f-e122e7e8a606

📥 Commits

Reviewing files that changed from the base of the PR and between 91516b0 and d7ba3ec.

📒 Files selected for processing (1)
  • components/src/dynamo/trtllm/workers/llm_worker.py

Comment thread components/src/dynamo/trtllm/workers/llm_worker.py
Copy link
Copy Markdown
Contributor

@indrajit96 indrajit96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the buffer-related changes.
Non blocking Bigger question: is there a way to enforce these config-mismatch regressions programmatically we ahve seen quite a few of them?

Codex suggested a few ideas
_▎ 1. Snapshot-and-diff the user inputs vs final arg_map before constructing the engine. Log (or fail) when any user-supplied key was modified by Dynamo. The existing _warn_override_collisions (llm_worker.py:98) is the same pattern — apply it to the whole pipeline, not just override_engine_args.
▎ 2. Track key provenance in a parallel dict ({"event_buffer_max_size": "user_yaml", "backend": "dynamo_default"}) so we have a single audit trail of who set what.
▎ 3. Pin regression tests that feed a YAML with non-default values for the historically-fragile fields (kv_cache_config.*, return_perf_metrics, backend, enable_iter_perf_stats) and assert they survive end-to-end into the final arg_map._

Comment thread components/src/dynamo/trtllm/workers/llm_worker.py
@richardhuo-nv
Copy link
Copy Markdown
Contributor Author

richardhuo-nv commented May 7, 2026

LGTM on the buffer-related changes. Non blocking Bigger question: is there a way to enforce these config-mismatch regressions programmatically we ahve seen quite a few of them?

Codex suggested a few ideas _▎ 1. Snapshot-and-diff the user inputs vs final arg_map before constructing the engine. Log (or fail) when any user-supplied key was modified by Dynamo. The existing _warn_override_collisions (llm_worker.py:98) is the same pattern — apply it to the whole pipeline, not just override_engine_args. ▎ 2. Track key provenance in a parallel dict ({"event_buffer_max_size": "user_yaml", "backend": "dynamo_default"}) so we have a single audit trail of who set what. ▎ 3. Pin regression tests that feed a YAML with non-default values for the historically-fragile fields (kv_cache_config.*, return_perf_metrics, backend, enable_iter_perf_stats) and assert they survive end-to-end into the final arg_map._

yeah, good points. I think (1 or 2) and 3 combined would be ideal. We have audit log for debugging but also have test coverage for gate keeping.

Copy link
Copy Markdown
Contributor

@indrajit96 indrajit96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Created a "good first issue" on GH for hardening configs based on the dicsussion in this PR
#9288

@indrajit96 indrajit96 self-requested a review May 7, 2026 23:00
@nv-yna
Copy link
Copy Markdown
Contributor

nv-yna commented May 7, 2026

overall lgtm, just how to deal with "return_perf_metrics" is an open question

@richardhuo-nv richardhuo-nv merged commit c9b7845 into main May 7, 2026
67 of 68 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/fix_publisher_2 branch May 7, 2026 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::trtllm Relates to the trtllm backend fix size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants