fix: do not override the the event_buffer_max_size if user provided one by richardhuo-nv · Pull Request #9284 · ai-dynamo/dynamo

richardhuo-nv · 2026-05-07T21:35:08Z

Overview:

do not override the the event_buffer_max_size if user provided one

Details:

dynamo.trtllm was overriding the event_buffer_max_size to 1024 no matter what the number is in the engine config, this might affect the rate of the worker publishing the events

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Bug Fixes
- Improved event buffer configuration handling to better preserve user-provided settings while applying defaults more intelligently when events and metrics publishing is enabled.

coderabbitai · 2026-05-07T21:37:59Z

Walkthrough

The PR refines KV cache configuration handling in LLM worker initialization. When metrics publishing is enabled, the function now preserves existing event_buffer_max_size values instead of unconditionally overwriting them, converting KvCacheConfig objects to dicts and conditionally applying defaults only when the value is unset or zero, with appropriate logging.

Changes

KV Cache Event Buffering Configuration

Layer / File(s)	Summary
Configuration Context `components/src/dynamo/trtllm/workers/llm_worker.py`	Engine argument map setup for KV cache configuration initialization.
Event Buffering Logic `components/src/dynamo/trtllm/workers/llm_worker.py`	`KvCacheConfig` is converted to dict via `model_dump(exclude_none=True)`. The `event_buffer_max_size` is set only if unset or zero; existing user-provided values are preserved with a log message indicating reuse instead of overwriting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: preventing override of user-provided event_buffer_max_size values.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description covers the main sections of the template with a clear overview and technical details, though 'Where should the reviewer start?' is left empty.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Around line 324-327: The log call using logging.info with format "%d" for the
variable existing is unsafe because existing comes from kv_cache_config and may
not be an int; update the logging call in llm_worker.py (where logging.info(...)
logs "Using existing event_buffer_max_size=%d from kv_cache_config") to use a
type-safe formatter such as "%r" or "%s" (or explicitly cast existing to int
after validation) so non-integer config values do not raise formatting errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d000ac6f-3312-45b0-a33f-e122e7e8a606

📥 Commits

Reviewing files that changed from the base of the PR and between 91516b0 and d7ba3ec.

📒 Files selected for processing (1)

components/src/dynamo/trtllm/workers/llm_worker.py

indrajit96

LGTM on the buffer-related changes.
Non blocking Bigger question: is there a way to enforce these config-mismatch regressions programmatically we ahve seen quite a few of them?

Codex suggested a few ideas
_▎ 1. Snapshot-and-diff the user inputs vs final arg_map before constructing the engine. Log (or fail) when any user-supplied key was modified by Dynamo. The existing _warn_override_collisions (llm_worker.py:98) is the same pattern — apply it to the whole pipeline, not just override_engine_args.
▎ 2. Track key provenance in a parallel dict ({"event_buffer_max_size": "user_yaml", "backend": "dynamo_default"}) so we have a single audit trail of who set what.
▎ 3. Pin regression tests that feed a YAML with non-default values for the historically-fragile fields (kv_cache_config.*, return_perf_metrics, backend, enable_iter_perf_stats) and assert they survive end-to-end into the final arg_map._

richardhuo-nv · 2026-05-07T22:47:35Z

LGTM on the buffer-related changes. Non blocking Bigger question: is there a way to enforce these config-mismatch regressions programmatically we ahve seen quite a few of them?

Codex suggested a few ideas _▎ 1. Snapshot-and-diff the user inputs vs final arg_map before constructing the engine. Log (or fail) when any user-supplied key was modified by Dynamo. The existing _warn_override_collisions (llm_worker.py:98) is the same pattern — apply it to the whole pipeline, not just override_engine_args. ▎ 2. Track key provenance in a parallel dict ({"event_buffer_max_size": "user_yaml", "backend": "dynamo_default"}) so we have a single audit trail of who set what. ▎ 3. Pin regression tests that feed a YAML with non-default values for the historically-fragile fields (kv_cache_config.*, return_perf_metrics, backend, enable_iter_perf_stats) and assert they survive end-to-end into the final arg_map._

yeah, good points. I think (1 or 2) and 3 combined would be ideal. We have audit log for debugging but also have test coverage for gate keeping.

indrajit96

LGTM!
Created a "good first issue" on GH for hardening configs based on the dicsussion in this PR
#9288

nv-yna · 2026-05-07T23:08:47Z

overall lgtm, just how to deal with "return_perf_metrics" is an open question

…ne (#9284)

fix publisher override

d7ba3ec

richardhuo-nv requested review from a team as code owners May 7, 2026 21:35

pull-request-size Bot added the size/S label May 7, 2026

github-actions Bot added backend::trtllm Relates to the trtllm backend fix labels May 7, 2026

dynamo-ops approved these changes May 7, 2026

View reviewed changes

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

Comment thread components/src/dynamo/trtllm/workers/llm_worker.py

use fstring

396056d

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 21:59 Inactive

indrajit96 reviewed May 7, 2026

View reviewed changes

Comment thread components/src/dynamo/trtllm/workers/llm_worker.py

indrajit96 self-requested a review May 7, 2026 22:56

indrajit96 mentioned this pull request May 7, 2026

[CONTRIBUTION]: [FEATURE]: Enforce user-config preservation in TRT-LLM worker arg_map #9288

Open

indrajit96 reviewed May 7, 2026

View reviewed changes

indrajit96 self-requested a review May 7, 2026 23:00

indrajit96 approved these changes May 7, 2026

View reviewed changes

add return_perf_metrics

33fa540

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 23:18 Inactive

nv-yna approved these changes May 7, 2026

View reviewed changes

copy-pr-bot Bot had a problem deploying to GITLAB May 7, 2026 23:43 Failure

richardhuo-nv merged commit c9b7845 into main May 7, 2026
67 of 68 checks passed

richardhuo-nv deleted the rihuo/fix_publisher_2 branch May 7, 2026 23:58

GuanLuo pushed a commit that referenced this pull request May 8, 2026

fix: do not override the the event_buffer_max_size if user provided o…

9eac6f9

…ne (#9284)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: do not override the the event_buffer_max_size if user provided one#9284

fix: do not override the the event_buffer_max_size if user provided one#9284
richardhuo-nv merged 3 commits into
mainfrom
rihuo/fix_publisher_2

richardhuo-nv commented May 7, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

indrajit96 left a comment

Uh oh!

Uh oh!

richardhuo-nv commented May 7, 2026 •

edited

Loading

Uh oh!

indrajit96 left a comment

Uh oh!

nv-yna commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

richardhuo-nv commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

indrajit96 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

richardhuo-nv commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

indrajit96 left a comment

Choose a reason for hiding this comment

Uh oh!

nv-yna commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

richardhuo-nv commented May 7, 2026 •

edited

Loading

coderabbitai Bot commented May 7, 2026 •

edited

Loading

richardhuo-nv commented May 7, 2026 •

edited

Loading