Skip to content

feat: Add n support for TRT-LLM (Stacked PR on top of #8744)#8746

Merged
indrajit96 merged 24 commits into
mainfrom
ibhosale/n-options-trtllm
Apr 30, 2026
Merged

feat: Add n support for TRT-LLM (Stacked PR on top of #8744)#8746
indrajit96 merged 24 commits into
mainfrom
ibhosale/n-options-trtllm

Conversation

@indrajit96
Copy link
Copy Markdown
Contributor

@indrajit96 indrajit96 commented Apr 27, 2026

Overview:

Add TRT-LLM backend support for the OpenAI-compatible n field on top of PR #8744.
This PR is intentionally scoped to TRT-LLM only. The shared OpenAI/Rust response contract and vLLM plumbing are handled in the parent PR.

WHY:

PR #8744 adds the shared Dynamo contract for carrying multiple OpenAI choices by preserving each backend output index.

TensorRT-LLM supports n in SamplingParams as the number of sequences to generate, and trtllm-serve exposes an OpenAI-compatible /v1/chat/completions endpoint. Dynamo needed the TRT-LLM-specific plumbing to pass n through and keep streamed choices separated.

References:

Details:

  • components/src/dynamo/trtllm/llm_engine.py

    • Preserves TRT-LLM output choice indexes.
    • Tracks cumulative token offsets per choice so each Dynamo chunk emits only the new token delta for that choice.
    • Reports completion usage using all returned choices.
  • components/src/dynamo/trtllm/request_handlers/handler_base.py

    • Passes n through to TRT-LLM sampling params.
    • Preserves output index in streamed chunks.
    • Tracks token and logprob offsets per choice index for interleaved n > 1 output streams.
    • Keeps TRT-LLM’s internal best_of field aligned with n when needed, because TRT-LLM validates best_of >= n.
  • components/src/dynamo/trtllm/tests/test_trtllm_handler_base.py

    • Adds unit coverage that n is applied to sampling params.
    • Verifies the internal best_of compatibility adjustment for TRT-LLM validation.
  • docs/backends/trtllm/trtllm-reference-guide.md

    • Documents the TRT-LLM n > 1 behavior relevant to this backend path.
  • tests/serve/test_trtllm.py

    • Adds pre-merge serve coverage validating that a chat request with n=2 returns two choices.

Where should the reviewer start?

Start with:

  • components/src/dynamo/trtllm/request_handlers/handler_base.py

That file contains the main TRT-LLM streaming path and the per-choice cursor logic for n > 1.

Then review:

  • components/src/dynamo/trtllm/llm_engine.py
  • components/src/dynamo/trtllm/tests/test_trtllm_handler_base.py
  • tests/serve/test_trtllm.py
  • docs/backends/trtllm/trtllm-reference-guide.md

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • New Features

    • Added full support for requesting multiple response choices using the n parameter with TensorRT-LLM backend
    • Implemented independent token tracking per output choice for accurate streaming and token accounting
  • Bug Fixes

    • Fixed token delta computation to correctly process all output choices instead of only the first
    • Corrected completion token counting to aggregate properly across all generated choices
  • Documentation

    • Added configuration guide for enabling multiple response choices with TensorRT-LLM, including required environment variable setup

@indrajit96 indrajit96 requested review from a team as code owners April 27, 2026 04:32
@github-actions github-actions Bot added documentation Improvements or additions to documentation backend::trtllm Relates to the trtllm backend labels Apr 27, 2026
@indrajit96 indrajit96 changed the title Add n support for TRT-LLM feat: Add n support for TRT-LLM Apr 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

@indrajit96 indrajit96 changed the title feat: Add n support for TRT-LLM feat: Add n support for TRT-LLM (Stacked PR on top of https://github.com/ai-dynamo/dynamo/pull/8744 Apr 27, 2026
@indrajit96 indrajit96 changed the title feat: Add n support for TRT-LLM (Stacked PR on top of https://github.com/ai-dynamo/dynamo/pull/8744 feat: Add n support for TRT-LLM #8744 Apr 27, 2026
@indrajit96 indrajit96 changed the title feat: Add n support for TRT-LLM #8744 feat: Add n support for TRT-LLM (Stacked PR on top of https://github.com/ai-dynamo/dynamo/pull/8744) Apr 27, 2026
@indrajit96 indrajit96 changed the title feat: Add n support for TRT-LLM (Stacked PR on top of https://github.com/ai-dynamo/dynamo/pull/8744) feat: Add n support for TRT-LLM (Stacked PR on top of #8744) Apr 27, 2026
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Comment thread components/src/dynamo/trtllm/llm_engine.py
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…tllm

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

# Conflicts:
#	components/src/dynamo/trtllm/tests/test_trtllm_handler_base.py
@indrajit96 indrajit96 enabled auto-merge (squash) April 29, 2026 17:26
@indrajit96 indrajit96 merged commit e314a9f into main Apr 30, 2026
154 of 156 checks passed
@indrajit96 indrajit96 deleted the ibhosale/n-options-trtllm branch April 30, 2026 05:33
furionw pushed a commit that referenced this pull request May 2, 2026
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::trtllm Relates to the trtllm backend documentation Improvements or additions to documentation feat size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants