Skip to content

feat: Support n optional OAI field in dynamo frontend and vLLM backend#8744

Merged
indrajit96 merged 10 commits into
mainfrom
ibhosale/n-options-output
Apr 28, 2026
Merged

feat: Support n optional OAI field in dynamo frontend and vLLM backend#8744
indrajit96 merged 10 commits into
mainfrom
ibhosale/n-options-output

Conversation

@indrajit96
Copy link
Copy Markdown
Contributor

@indrajit96 indrajit96 commented Apr 27, 2026

Overview:

Add OpenAI-compatible n choice support across Dynamo chat completions so a request with n > 1 can return multiple choices instead of only choice 0.

WHY:

OAI Chat Completions supports n as the number of choices to generate for one input message. In the OpenAI API reference, this is documented here
vLLM support here
All Backends (engines natively) support it. Dynamo was missing the request and response plumbing for the same.
Will follow-up with PRs for Sglang and TRTLLM

Details:

  • Rust OpenAI request/response path

    • lib/llm/src/backend.rs and lib/llm/src/migration.rs carry n through Dynamo’s preprocessed request path.
    • lib/llm/src/protocols/openai/chat_completions/delta.rs adds index to streamed backend deltas so each generated choice can be routed back to the correct OpenAI choice.
    • The non-n path remains unchanged logically because missing backend indexes default to choice 0.
  • vLLM frontend processor + backend

    • components/src/dynamo/frontend/vllm_processor.py copies n into SamplingParams and forwards it in sampling_options.
    • For n > 1, it recreates vLLM’s parent/child request bookkeeping with ParentRequest, because Dynamo bypasses vLLM’s normal HTTP engine path and feeds backend token chunks directly into OutputProcessor.
    • It keeps one StreamingPostProcessor per choice index so text deltas, tool parsing, reasoning parsing, and finish reasons do not bleed across choices.
    • components/src/dynamo/vllm/llm_engine.py and components/src/dynamo/vllm/handlers.py preserve each vLLM output’s index.
    • They track cumulative token offsets per choice so each streamed Dynamo chunk contains only the new tokens for that specific choice.
  • Tests

    • tests/utils/payloads.py and tests/utils/payload_builder.py extend the existing chat payload builder to optionally set n.
    • tests/serve/test_vllm.py add pre-merge E2E coverage validating n=2 returns two choices.

Where should the reviewer start?

Start with the request/response contract:

  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • lib/llm/src/backend.rs
  • lib/llm/src/migration.rs

Then backend-specific behavior:

  • components/src/dynamo/frontend/vllm_processor.py

Then validation:

  • tests/serve/test_vllm.py

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Open in Devin Review

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for requesting multiple response choices (n > 1) in streaming completions and chat requests, enabling generation of multiple alternative responses in parallel.
    • Streaming responses now independently track and emit tokens and deltas for each choice, preventing state interference between parallel generations.
  • Bug Fixes

    • Fixed completion token counting to accurately aggregate tokens across all generated choices.

@indrajit96 indrajit96 requested review from a team as code owners April 27, 2026 04:15
@github-actions github-actions Bot added feat backend::vllm Relates to the vllm backend frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 27, 2026
@rmccorm4
Copy link
Copy Markdown
Contributor

All Backends support it.

Seems like this isn't true given the SGLang follow up PR: #8745?

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

Walkthrough

This change enables handling multiple parallel choices (OpenAI n > 1 parameter) throughout the vLLM processing pipeline. Previously rejected, the system now maintains separate streaming state per choice, maps the OpenAI sampling parameter to vLLM's parent/child sampling structure, tracks token progress per choice index, and routes decoding streams to the correct output.

Changes

Cohort / File(s) Summary
Frontend Streaming Handler
components/src/dynamo/frontend/vllm_processor.py, components/src/dynamo/vllm/handlers.py
Removed rejection of SamplingParams.n != 1. Added per-choice StreamingPostProcessor instances, separate tracking of generated tokens and emitted text per output index, and recreation of vLLM's parent/child sampling structure for n>1. Token/text slicing and logprob computation now operate per-output.
Generation Loop
components/src/dynamo/vllm/llm_engine.py
Modified to support multiple outputs per vLLM response. Replaced single token counter with per-output dictionary, ensured GenerateChunk includes index field, and updated completion token aggregation to sum across all outputs.
Backend Decoding
lib/llm/src/backend.rs
Introduced HashMap<u32, Decoder> to manage multiple decoding streams per choice. Routes token processing via data.index, tracks completion per choice, and stops generation only when all choices finish. Handles unexpected choice indices by setting FinishReason::Error.
Migration Guard
lib/llm/src/migration.rs
Added check to disable migration when sampling_options.n > 1, logging that per-choice generation state cannot be transferred.
OpenAI Protocol
lib/llm/src/protocols/openai/chat_completions/delta.rs
Updated streaming choice construction to use backend-provided delta.index (default 0) instead of always using index 0. Added unit test for multi-choice output.
Test Additions
tests/serve/test_vllm.py, tests/utils/payload_builder.py, tests/utils/payloads.py
Extended vLLM test scenario to include a multi-choice request. Added expected_num_choices parameter to payload builder and validation logic to ChatPayload for asserting response choice count.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the main change: adding support for the OpenAI n field in the Dynamo frontend and vLLM backend.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed PR description follows the template structure with all required sections completed: Overview clearly states the feature, Details explain implementation across Rust and vLLM, and Where should the reviewer start guides reviewers through the codebase logically.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/llm/src/backend.rs (1)

195-222: ⚠️ Potential issue | 🟡 Minor

Two small observations on per-choice finish tracking in the early-return branches.

  1. Lines 195–205 (text already populated): you correctly insert into finished_choices when data.finish_reason.is_some(), but never call state.stream.context().stop_generating() or set state.finished = true even when all choices have finished. In this branch the upstream engine owns termination, so this is probably fine, but it's inconsistent with the symmetric block at lines 274–280 — worth a one-line comment explaining why upstream is trusted here.

  2. Lines 207–222 (out-of-range choice index): emitting FinishReason::Error and continuing is reasonable (don't poison the whole stream over a bogus index), but if a misbehaving engine keeps emitting the same bad index, this will spam error chunks until upstream ends. Consider also inserting choice_idx (or a sentinel) into finished_choices and short-circuiting subsequent occurrences.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/backend.rs` around lines 195 - 222, In the early-return branch
that handles already-decoded text (the block that checks output.data.text and
!state.validate_engine_decode), add a one-line comment explaining we trust the
upstream engine to signal termination so we don't call
state.stream.context().stop_generating() or set state.finished there;
additionally, when you insert into state.finished_choices on
data.finish_reason.is_some(), also check whether all choices are finished and if
so set state.finished = true and call state.stream.context().stop_generating().
In the out-of-range choice index error path (the branch that uses
state.decoders.get_mut(&choice_idx) and emits FinishReason::Error), also insert
the offending choice_idx (or a sentinel like usize::MAX) into
state.finished_choices and return early so repeated bad-index events are
short-circuited; ensure the same all-choices-finished check (set state.finished
and call stop_generating()) runs after inserting the sentinel.
components/src/dynamo/vllm/handlers.py (1)

1958-1965: ⚠️ Potential issue | 🟡 Minor

Hardcoded "index": 0 in the empty-outputs error path leaves other choices stranded for n > 1.

When the engine returns a RequestOutput with no outputs, this yields a single chunk with "index": 0 and breaks. For n > 1, choices 1..n-1 will never receive a final chunk; the Rust backend's per-choice finished_choices set will only mark choice 0 as finished and rely on the upstream stream ending to terminate. This works today (stream ends → unfold returns None) but means downstream clients won't see explicit error finish reasons for the other choices. Consider yielding one error chunk per requested choice index.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/vllm/handlers.py` around lines 1958 - 1965, The
empty-outputs branch yields a single error chunk with "index": 0 which leaves
other choices unmarked when n>1; instead iterate over the number of requested
choices (the same n used to create the RequestOutput / completions request —
e.g., request.num_completions or request_output.num_choices) and yield one error
chunk per choice index with finish_reason "error: No outputs from vLLM engine"
and the corresponding "index" (0..n-1), then return/stop the generator; update
the code around the RequestOutput empty-outputs handling where the single yield
currently occurs so every choice gets an explicit final error chunk.
♻️ Duplicate comments (1)
components/src/dynamo/vllm/handlers.py (1)

2298-2313: ⚠️ Potential issue | 🟡 Minor

Same hardcoded index: 0 issue in the text-mode empty-outputs error path.

Mirrors the token-mode path on line 1962 — for n > 1 only one error chunk is yielded with index: 0. Recommend emitting an error chunk per requested choice index here as well so the OpenAI-shaped response contains a finish for every choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/vllm/handlers.py` around lines 2298 - 2313, The current
empty-output error path inside the async for loop over gen yields a single error
chunk with a hardcoded "index: 0"; change it to emit an error choice entry for
every requested choice index (the same behavior as the token-mode path that
checks n > 1). Build the "choices" array by iterating over the requested choice
count (n) and creating entries with "index" set to each choice number and the
same delta/finish_reason ("role": "assistant", "content": "", "finish_reason":
"error"), then yield that single error chunk using openai_request_id so each
requested choice receives a finish signal.
🧹 Nitpick comments (5)
tests/serve/test_vllm.py (1)

92-99: Optional: strengthen the n=2 assertion beyond just count.

The new payload only verifies that two choices entries are returned (via expected_num_choices=2), not that each contains valid content nor that their index values are {0, 1}. With greedy sampling (default temperature for Qwen) both choices may be identical, but they should each still carry independent message.content and a unique index.

Consider extending ChatPayload.validate (or asserting inline) to walk every choice and verify each has non-empty content and a unique index, so a regression that, say, returns two empty choices or duplicated index=0 would be caught here rather than only by the Rust unit test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/serve/test_vllm.py` around lines 92 - 99, The test only asserts
expected_num_choices=2 but doesn't verify each returned choice contains
non-empty content or unique indices; update ChatPayload.validate (or add inline
assertions in chat_payload usage) to iterate over response.choices and assert
each choice.message.content is non-empty (truthy) and that the set of
choice.index values has length equal to expected_num_choices (i.e., indices are
unique and cover {0,1} for n=2), failing the test if any choice is empty or
indices duplicate.
components/src/dynamo/vllm/llm_engine.py (1)

144-156: Minor: completion_usage is recomputed/emitted for every finishing output and reflects cumulative-but-not-final totals.

When n > 1 and multiple outputs in the same res finish (or finish across nearby iterations), this block emits completion_usage once per finishing output. Each emission uses sum(len(choice.token_ids) for choice in res.outputs) evaluated at that moment, so:

  • If output 0 finishes while output 1 is still mid-generation, the completion_tokens reported here is partial.
  • A later finish for output 1 will emit again with a different total.

The Rust frontend (delta.rs) currently only reads prompt_tokens from completion_usage and accumulates completion_tokens itself per delta, so this is not a correctness bug today. But the field is misleading to anyone reading the chunk in isolation, and it's redundant work per chunk.

Suggestion (optional): only attach completion_usage on the last finishing output of the request (e.g. when all res.outputs have a finish_reason), or just emit prompt_tokens and let the frontend own the completion total.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/vllm/llm_engine.py` around lines 144 - 156, The code
currently emits out["completion_usage"] whenever an individual output has
output.finish_reason, causing partial or changing completion_tokens because it
sums len(choice.token_ids) over res.outputs at that moment; update the logic in
the block that checks output.finish_reason so that completion_usage is only
attached when all outputs are finished (e.g., check that every choice in
res.outputs has a finish_reason) or alternatively only include prompt_tokens
(using res.prompt_token_ids) and omit completion_tokens entirely so the frontend
can accumulate per-delta totals; rename or gate the out["completion_usage"]
assignment accordingly to avoid recomputing per finishing output.
tests/utils/payloads.py (1)

169-182: LGTM — expected_num_choices validation is correct and additive.

super().validate(...) keeps the existing substring check on the first-choice content, and the new block precisely asserts the choices list length against expected_num_choices. Backwards compatible (no-op when None).

Nit (optional): response.json() is parsed both here and inside extract_content (requests.Response.json() re-parses on each call). You could either cache the parsed result on the response object via getattr(response, "_cached_json", None) or pass it down. Trivial perf, not worth blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/payloads.py` around lines 169 - 182, The validate method
currently calls response.json() again which re-parses the response; to avoid
redundant parsing, modify validate (the validate method in
tests/utils/payloads.py) to accept or retrieve a cached parsed JSON and use that
instead of calling response.json() twice—either add an optional parameter (e.g.,
parsed_result) or check for a cached attribute on response (getattr(response,
"_cached_json", None)) and fall back to response.json() only if missing, then
use that parsed_result for the choices length check and ensure any callers
(e.g., extract_content) pass or set the cached JSON accordingly.
lib/llm/src/migration.rs (1)

155-163: LGTM — clean mirror of the guided-decoding guard.

The condition correctly disables migration when n > 1 because per-choice generation state cannot be replayed by re-feeding accumulated tokens (each choice diverges independently).

Optional: consider adding a test_retry_manager_no_migration_for_n_gt_1 test analogous to test_retry_manager_no_migration_for_guided_decoding to lock in this behavior, since the n>1 path is now PR-significant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/migration.rs` around lines 155 - 163, Add a unit test that
asserts migration is disabled when sampling_options.n > 1, mirroring the
existing test for guided decoding; specifically, create a test (e.g.,
test_retry_manager_no_migration_for_n_gt_1) that constructs a
PreprocessedRequest with sampling_options.n = Some(2) and verifies the same
behavior as the guided-decoding test (retries_left becomes 0 and a warning is
emitted) to prevent regressions around the branch that checks
preprocessed_request.sampling_options.n and sets retries_left = 0 in the
migration logic.
components/src/dynamo/frontend/vllm_processor.py (1)

619-637: Inconsistent error handling: postprocessor-miss break only exits the inner loop.

Line 599 handles an unknown engine index by break-ing the outer async for dynamo_response loop, terminating streaming. The symmetric case at line 634 (postprocessor not found for output.index) only breaks the inner for output in ... loop — the outer loop continues processing further backend responses, and any subsequent outputs with a valid index will still be appended to choices and emitted, despite the request being in a half-broken state. Recommend return (after running cleanup via the finally block) to halt the request entirely on this error, matching the line 599 behavior.

♻️ Proposed change
                 for output in vllm_out.request_outputs[0].outputs:
                     post = post_processors.get(output.index)
                     if post is None:
                         yield {
                             "error": {
                                 "message": (
                                     f"Invalid postprocessor choice index {output.index} "
                                     f"for request {request_id}"
                                 ),
                                 "type": "internal_error",
                             }
                         }
-                        break
+                        return
                     choice = post.process_output(output)
                     if choice:
                         choices.append(choice)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/frontend/vllm_processor.py` around lines 619 - 637, The
error handling for a missing postprocessor is inconsistent: inside the loop over
vllm_out.request_outputs[0].outputs you call post_processors.get(output.index)
and on miss you yield an error then break the inner loop, but you should halt
the entire request like the earlier unknown engine case. Replace the inner-loop
break with a return (so the generator stops and the surrounding finally cleanup
runs) in the block where post is None (the yield error branch referencing
post_processors, output.index, request_id, and choices) to match the behavior at
the other error site.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/vllm/llm_engine.py`:
- Around line 125-142: Replace the assert checks in the initialization path with
explicit runtime checks: in the method where self.engine_client and
self._default_sampling_params are validated (in llm_engine.py, locate the code
using self.engine_client and self._default_sampling_params before using the vLLM
client), change the assert statements to explicit if checks that raise
RuntimeError("Engine not initialized") when either self.engine_client is None or
self._default_sampling_params is None so failures surface predictably.

---

Outside diff comments:
In `@components/src/dynamo/vllm/handlers.py`:
- Around line 1958-1965: The empty-outputs branch yields a single error chunk
with "index": 0 which leaves other choices unmarked when n>1; instead iterate
over the number of requested choices (the same n used to create the
RequestOutput / completions request — e.g., request.num_completions or
request_output.num_choices) and yield one error chunk per choice index with
finish_reason "error: No outputs from vLLM engine" and the corresponding "index"
(0..n-1), then return/stop the generator; update the code around the
RequestOutput empty-outputs handling where the single yield currently occurs so
every choice gets an explicit final error chunk.

In `@lib/llm/src/backend.rs`:
- Around line 195-222: In the early-return branch that handles already-decoded
text (the block that checks output.data.text and !state.validate_engine_decode),
add a one-line comment explaining we trust the upstream engine to signal
termination so we don't call state.stream.context().stop_generating() or set
state.finished there; additionally, when you insert into state.finished_choices
on data.finish_reason.is_some(), also check whether all choices are finished and
if so set state.finished = true and call
state.stream.context().stop_generating(). In the out-of-range choice index error
path (the branch that uses state.decoders.get_mut(&choice_idx) and emits
FinishReason::Error), also insert the offending choice_idx (or a sentinel like
usize::MAX) into state.finished_choices and return early so repeated bad-index
events are short-circuited; ensure the same all-choices-finished check (set
state.finished and call stop_generating()) runs after inserting the sentinel.

---

Duplicate comments:
In `@components/src/dynamo/vllm/handlers.py`:
- Around line 2298-2313: The current empty-output error path inside the async
for loop over gen yields a single error chunk with a hardcoded "index: 0";
change it to emit an error choice entry for every requested choice index (the
same behavior as the token-mode path that checks n > 1). Build the "choices"
array by iterating over the requested choice count (n) and creating entries with
"index" set to each choice number and the same delta/finish_reason ("role":
"assistant", "content": "", "finish_reason": "error"), then yield that single
error chunk using openai_request_id so each requested choice receives a finish
signal.

---

Nitpick comments:
In `@components/src/dynamo/frontend/vllm_processor.py`:
- Around line 619-637: The error handling for a missing postprocessor is
inconsistent: inside the loop over vllm_out.request_outputs[0].outputs you call
post_processors.get(output.index) and on miss you yield an error then break the
inner loop, but you should halt the entire request like the earlier unknown
engine case. Replace the inner-loop break with a return (so the generator stops
and the surrounding finally cleanup runs) in the block where post is None (the
yield error branch referencing post_processors, output.index, request_id, and
choices) to match the behavior at the other error site.

In `@components/src/dynamo/vllm/llm_engine.py`:
- Around line 144-156: The code currently emits out["completion_usage"] whenever
an individual output has output.finish_reason, causing partial or changing
completion_tokens because it sums len(choice.token_ids) over res.outputs at that
moment; update the logic in the block that checks output.finish_reason so that
completion_usage is only attached when all outputs are finished (e.g., check
that every choice in res.outputs has a finish_reason) or alternatively only
include prompt_tokens (using res.prompt_token_ids) and omit completion_tokens
entirely so the frontend can accumulate per-delta totals; rename or gate the
out["completion_usage"] assignment accordingly to avoid recomputing per
finishing output.

In `@lib/llm/src/migration.rs`:
- Around line 155-163: Add a unit test that asserts migration is disabled when
sampling_options.n > 1, mirroring the existing test for guided decoding;
specifically, create a test (e.g., test_retry_manager_no_migration_for_n_gt_1)
that constructs a PreprocessedRequest with sampling_options.n = Some(2) and
verifies the same behavior as the guided-decoding test (retries_left becomes 0
and a warning is emitted) to prevent regressions around the branch that checks
preprocessed_request.sampling_options.n and sets retries_left = 0 in the
migration logic.

In `@tests/serve/test_vllm.py`:
- Around line 92-99: The test only asserts expected_num_choices=2 but doesn't
verify each returned choice contains non-empty content or unique indices; update
ChatPayload.validate (or add inline assertions in chat_payload usage) to iterate
over response.choices and assert each choice.message.content is non-empty
(truthy) and that the set of choice.index values has length equal to
expected_num_choices (i.e., indices are unique and cover {0,1} for n=2), failing
the test if any choice is empty or indices duplicate.

In `@tests/utils/payloads.py`:
- Around line 169-182: The validate method currently calls response.json() again
which re-parses the response; to avoid redundant parsing, modify validate (the
validate method in tests/utils/payloads.py) to accept or retrieve a cached
parsed JSON and use that instead of calling response.json() twice—either add an
optional parameter (e.g., parsed_result) or check for a cached attribute on
response (getattr(response, "_cached_json", None)) and fall back to
response.json() only if missing, then use that parsed_result for the choices
length check and ensure any callers (e.g., extract_content) pass or set the
cached JSON accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a915cb4b-a6a7-43e4-8106-6f9313826b9d

📥 Commits

Reviewing files that changed from the base of the PR and between 8b55dc3 and 4aaa241.

📒 Files selected for processing (9)
  • components/src/dynamo/frontend/vllm_processor.py
  • components/src/dynamo/vllm/handlers.py
  • components/src/dynamo/vllm/llm_engine.py
  • lib/llm/src/backend.rs
  • lib/llm/src/migration.rs
  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • tests/serve/test_vllm.py
  • tests/utils/payload_builder.py
  • tests/utils/payloads.py

Comment thread components/src/dynamo/vllm/llm_engine.py
@indrajit96
Copy link
Copy Markdown
Contributor Author

All Backends support it.

Seems like this isn't true given the SGLang follow up PR: #8745?

I meant engines support it natively.
Gap is in Dynamo Frontend + DYnamo Backend Handlers

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Apr 27, 2026
Comment thread components/src/dynamo/common/backend/engine.py Outdated
Comment thread components/src/dynamo/common/backend/README.md Outdated
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
@github-actions github-actions Bot added backend::sglang Relates to the sglang backend backend::trtllm Relates to the trtllm backend labels Apr 27, 2026
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
@indrajit96 indrajit96 merged commit 65f5982 into main Apr 28, 2026
98 checks passed
@indrajit96 indrajit96 deleted the ibhosale/n-options-output branch April 28, 2026 02:05
@indrajit96 indrajit96 restored the ibhosale/n-options-output branch April 28, 2026 02:05
indrajit96 added a commit that referenced this pull request Apr 30, 2026
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
furionw pushed a commit that referenced this pull request May 2, 2026
…end (#8744)

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
furionw pushed a commit that referenced this pull request May 2, 2026
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

actions backend::sglang Relates to the sglang backend backend::trtllm Relates to the trtllm backend backend::vllm Relates to the vllm backend documentation Improvements or additions to documentation feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants