feat(frontend): gate nvext response metadata behind extra_fields by biswapanda · Pull Request #8252 · ai-dynamo/dynamo

biswapanda · 2026-04-15T22:57:03Z

Overview:

Fix regression where plain OpenAI-compatible requests leaked nvext response metadata (worker_id, timing) by default. Introduce NvExtResponseFieldSelection to gate each response field independently behind extra_fields opt-in, while preserving the query_instance_id GAIE exception.

Before commit b2f7f22 (regression state) — plain request leaks nvext.timing:
// curl /v1/chat/completions (no nvext.extra_fields)
"nvext": { "timing": { "request_received_ms": 1776373206718, "total_time_ms": 10.862539 } }
After commit 51c4117 (HEAD, fixed) — plain request omits nvext, opt-in still works:

Details:

Root Cause: The February 4, 2026 per-worker metrics change made request tracking unconditional in the chat/completions delta generators. Response shaping still emitted nvext whenever tracker-backed metadata was present, so plain requests could leak worker_id and timing data by default.

What Changed:

Added NvExtResponseFieldSelection in nvext.rs with from_nvext() that walks extra_fields once and maps "worker_id" / "timing" / "routed_experts" to independent flags.
Gated worker_id, timing, routed_experts, and token_ids in choice_from_postprocessor for both chat and text completions so each field requires explicit opt-in (or the query_instance_id exception).
Preserved the query_instance_id exception: auto-enables worker_id + token_ids. timing is not auto-enabled because the query-only fast path has no finish_reason chunk and timing is only emitted on the final chunk.
Tightened annotation matching to the exact "query_instance_id:" key prefix, consistent with PreprocessedRequest::get_annotation_value and KvPushRouter. Prevents stray annotations like "query_instance_id_extra:..." from enabling the exception.
Left record_finish() unconditional so timing/ITL accounting and Prometheus metrics do not regress.
Documented routed_experts as a third extra_fields value in docs/components/frontend/nvext.md.
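
The selection logic described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in nvext.rs: the `NvExt` struct here is a simplified stand-in with only two fields, and the assumption that `token_ids` is enabled solely through the query_instance_id exception is inferred from the description above.

```rust
/// Simplified stand-in for the request-side `nvext` object.
/// The real type has more fields; this shape is an assumption.
#[derive(Debug, Default)]
pub struct NvExt {
    pub extra_fields: Option<Vec<String>>,
    pub annotations: Option<Vec<String>>,
}

/// One independent flag per gated response field.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct NvExtResponseFieldSelection {
    pub worker_id: bool,
    pub timing: bool,
    pub routed_experts: bool,
    pub token_ids: bool,
}

impl NvExtResponseFieldSelection {
    pub fn from_nvext(nvext: Option<&NvExt>) -> Self {
        let mut sel = Self::default();
        // A plain request has no nvext at all: nothing is selected.
        let Some(nvext) = nvext else {
            return sel;
        };

        // Walk extra_fields once; unknown values are ignored.
        for field in nvext.extra_fields.iter().flatten() {
            match field.as_str() {
                "worker_id" => sel.worker_id = true,
                "timing" => sel.timing = true,
                "routed_experts" => sel.routed_experts = true,
                _ => {}
            }
        }

        // query_instance_id exception: exact "key:" prefix only.
        let query_only = nvext
            .annotations
            .iter()
            .flatten()
            .any(|a| a.starts_with("query_instance_id:"));
        if query_only {
            // timing is deliberately not enabled here.
            sel.worker_id = true;
            sel.token_ids = true;
        }
        sel
    }
}
```

With this shape, a request carrying only `extra_fields: ["timing"]` selects timing and nothing else, while a `query_instance_id:...` annotation selects worker_id and token_ids but leaves timing off.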

Tests:

Selection unit tests in nvext.rs cover defaults, each individual extra_fields value, combined fields, the query_instance_id: exception, and the stray-annotation negative case.
End-to-end gating tests through choice_from_postprocessor in both chat_completions/delta.rs and completions/delta.rs: plain request omits nvext; extra_fields: ["timing"] emits only timing; query_instance_id:... emits worker_id + token_ids (not timing); extra_fields: ["routed_experts"] emits only routed_experts.
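
The gating behavior these tests exercise can be pictured with a small sketch. All names below (`FieldSelection`, `Tracked`, `NvExtResponse`, `build_nvext`) and the field types are hypothetical and chosen for illustration; the actual logic lives in choice_from_postprocessor and is shaped differently. The sketch only assumes what is stated above: each field needs its flag, timing is emitted only on the final chunk, and nvext is omitted when nothing is selected.

```rust
/// Hypothetical per-field flags (mirrors the selection struct in spirit).
#[derive(Debug, Default, Clone, Copy)]
pub struct FieldSelection {
    pub worker_id: bool,
    pub timing: bool,
    pub routed_experts: bool,
    pub token_ids: bool,
}

/// Hypothetical metadata that is tracked for every request.
#[derive(Debug, Default)]
pub struct Tracked {
    pub worker_id: Option<u64>,
    pub total_time_ms: Option<f64>,
    pub routed_experts: Option<String>,
    pub token_ids: Option<Vec<u32>>,
}

/// Hypothetical response-side nvext object.
#[derive(Debug, Default, PartialEq)]
pub struct NvExtResponse {
    pub worker_id: Option<u64>,
    pub timing_ms: Option<f64>,
    pub routed_experts: Option<String>,
    pub token_ids: Option<Vec<u32>>,
}

/// Copy a tracked value into the response only when its flag is set.
/// Returns None when no field ends up populated, so a plain request
/// carries no nvext object at all.
pub fn build_nvext(
    sel: FieldSelection,
    tracked: &Tracked,
    is_final_chunk: bool,
) -> Option<NvExtResponse> {
    let resp = NvExtResponse {
        worker_id: if sel.worker_id { tracked.worker_id } else { None },
        // Timing additionally requires the final (finish_reason) chunk.
        timing_ms: if sel.timing && is_final_chunk {
            tracked.total_time_ms
        } else {
            None
        },
        routed_experts: if sel.routed_experts {
            tracked.routed_experts.clone()
        } else {
            None
        },
        token_ids: if sel.token_ids {
            tracked.token_ids.clone()
        } else {
            None
        },
    };
    if resp == NvExtResponse::default() {
        None
    } else {
        Some(resp)
    }
}
```

Tracking stays unconditional in this sketch (the `Tracked` values are always present); only the response shaping consults the flags, which matches the intent of leaving record_finish() untouched.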

Impact:

Plain /v1/chat/completions and /v1/completions requests no longer return nvext by default.
nvext.worker_id, nvext.timing, and nvext.routed_experts remain opt-in via nvext.extra_fields.
query_instance_id behavior is preserved for worker_id + token_ids; timing is now opt-in even in that flow.

Where should the reviewer start?

lib/llm/src/protocols/openai/nvext.rs -- NvExtResponseFieldSelection::from_nvext and has_query_instance_id_annotation.
lib/llm/src/protocols/openai/chat_completions/delta.rs -- response gating in choice_from_postprocessor and the new positive/negative delta tests.
lib/llm/src/protocols/openai/completions/delta.rs -- parallel gating and tests for the text-completions endpoint.

Related Issues:

Closes #8249 (Regression: OpenAI-compatible chat/completions responses include nvext metadata without opt-in)

Validation:

cargo fmt -p dynamo-llm
cargo test -p dynamo-llm --lib -- protocols::openai::nvext:: protocols::openai::chat_completions::delta::tests:: protocols::openai::completions::delta::tests:: (23 tests pass)

copy-pr-bot · 2026-04-15T22:57:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-15T23:02:58Z

Walkthrough

The changes introduce a granular response field selection system for NvExt responses. A new NvExtResponseFieldSelection struct replaces boolean tracking flags in delta generators, enabling conditional field inclusion based on explicit extra_fields requests or query_instance_id annotations. Documentation is updated to reflect support for the new routed_experts field.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Update: `docs/components/frontend/nvext.md` | Extended supported `nvext.extra_fields` to include `"routed_experts"` field alongside existing `"worker_id"` and `"timing"` options. |
| Core NvExt Protocol: `lib/llm/src/protocols/openai/nvext.rs` | Added public `NvExtResponseFieldSelection` struct with constructor `from_nvext()` that computes which response fields to return based on `extra_fields` or `query_instance_id` annotations. Introduced `NvExt::has_query_instance_id_annotation()` helper and corresponding unit tests. |
| Delta Generator Updates: `lib/llm/src/protocols/openai/chat_completions/delta.rs`, `lib/llm/src/protocols/openai/completions/delta.rs` | Replaced `enable_tracking: bool` with `response_fields: NvExtResponseFieldSelection` in `DeltaGeneratorOptions`. Updated `choice_from_postprocessor` to conditionally extract and inject `worker_id`, `token_ids`, `routed_experts`, and `timing_info` based on selection flags. Added unit tests verifying omission of `nvext` when no fields are requested. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(frontend): gate nvext response metadata behind extra_fields' is highly specific and directly reflects the main change: implementing field gating for nvext response metadata based on extra_fields requests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The pull request description is mostly complete with detailed overview, root cause analysis, implementation details, test coverage, and impact assessment, though the Overview and Details sections are empty in the required template format. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/components/frontend/nvext.md (1)

160-167: ⚠️ Potential issue | 🟡 Minor

Document the query_instance_id exception too.

This section now implies worker_id and timing only come back when explicitly requested via extra_fields, but the implementation also auto-enables worker_id, timing, and token_ids for the query_instance_id flow. That will mislead GAIE callers about the expected response shape.

Suggested doc tweak

-When the client requests response metadata via `extra_fields`, the response includes an `nvext` object with the requested fields:
+When the client requests response metadata via `extra_fields` — or uses the `query_instance_id` flow — the response includes an `nvext` object with the requested or automatically enabled fields:

-| `worker_id` | `extra_fields: ["worker_id"]` | Prefill/decode worker IDs and data parallel ranks that processed the request. |
-| `timing` | `extra_fields: ["timing"]` | Per-request timing information (TTFT, ITL, queue time, etc.). |
+| `worker_id` | `extra_fields: ["worker_id"]` or automatic with `query_instance_id` | Prefill/decode worker IDs and data parallel ranks that processed the request. |
+| `timing` | `extra_fields: ["timing"]` or automatic with `query_instance_id` | Per-request timing information (TTFT, ITL, queue time, etc.). |
 | `routed_experts` | `extra_fields: ["routed_experts"]` | Routed expert capture payload returned by SGLang-backed requests. |
-| `token_ids` | Automatic (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. |
+| `token_ids` | Automatic with `query_instance_id` (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. |

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/components/frontend/nvext.md` around lines 160 - 167, Update the nvext
docs to document the query_instance_id exception: clarify that when a request is
part of the query_instance_id flow the server automatically returns
nvext.worker_id, nvext.timing and nvext.token_ids even if they are not listed in
extra_fields; mention that token_ids auto-enable applies to GAIE Stage 1 (and is
used for Stage 2 query-only mode); reference the nvext response object and the
extra_fields parameter so readers know this behavior differs from normal
extra_fields-driven inclusion.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 323-329: The has_query_instance_id_annotation function currently
uses annotation.starts_with("query_instance_id") which is too permissive; change
the check in has_query_instance_id_annotation (and the annotations.iter().any
closure) to only accept an exact key match or the key followed immediately by
the expected delimiter (e.g., "query_instance_id" == annotation ||
annotation.starts_with("query_instance_id=") or other delimiter used by your
annotations format) so strings like "query_instance_identifier" do not match.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: adaf0fc3-9d30-4416-9ff4-683f6081f6a8

📥 Commits

Reviewing files that changed from the base of the PR and between b2f7f22 and 4ad7ead.

📒 Files selected for processing (4)

docs/components/frontend/nvext.md
lib/llm/src/protocols/openai/chat_completions/delta.rs
lib/llm/src/protocols/openai/completions/delta.rs
lib/llm/src/protocols/openai/nvext.rs

biswapanda · 2026-04-16T03:25:31Z

/ok to test 4ad7ead

github-actions · 2026-04-16T03:27:22Z

🌿 Fern Docs Preview: https://nvidia-preview-033e74a7-ff99-4ae2-8842-e4a0af47983e.docs.buildwithfern.com/dynamo/dev

biswapanda · 2026-04-16T20:40:17Z

/ok to test 51c4117

biswapanda · 2026-04-19T05:29:02Z

Rebasing - trying to fix the DCO issue.

biswapanda · 2026-04-19T05:30:34Z

/ok to test 926bbbc

biswapanda · 2026-04-19T20:30:42Z

/ok to test be79fc3

Signed-off-by: AmeenP <ameenp360@gmail.com>

…tive-path tests

Address review feedback on PR #8250:
- Restore the pre-regression behavior where query_instance_id automatically enables timing in the response (was lost during the refactor to NvExtResponseFieldSelection).
- Remove dead any() method that had no callers.
- Add positive-path unit tests for each gated field: worker_id, timing, routed_experts, query_instance_id, and combined fields.
- Update stale doc comment on NvExtResponse.timing to mention query_instance_id auto-enablement.
- Use assert_eq! with struct literals in tests for consistency and more informative failure output.

Address two issues found by codex review:

HIGH: has_query_instance_id_annotation() used starts_with("query_instance_id") which would match stray annotations like "query_instance_id_extra:foo" that the router does not recognize. Tightened to starts_with("query_instance_id:") to match the exact key:value format used by PreprocessedRequest::get_annotation_value and the KvPushRouter query-only detection.

MEDIUM: timing was auto-enabled for query_instance_id, but the query-only fast path returns LLMEngineOutput::default() with no finish_reason, and timing is only emitted when finish_reason.is_some(). Reverted timing to extra_fields-only opt-in since it can never be emitted on the query-only path. Updated doc comments to accurately reflect this behavior.

Added test_nvext_response_field_selection_rejects_stray_annotation to verify the tightened matching.
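
The difference between the loose and tightened prefix checks from the HIGH item can be shown with a short sketch. The free functions and the constant below are hypothetical helpers written for illustration; only the two `starts_with` patterns come from the commit message.

```rust
/// Exact "key:" prefix, as described in the commit message.
/// The constant name is an assumption made for this sketch.
const QUERY_INSTANCE_ID_PREFIX: &str = "query_instance_id:";

/// Old, too-permissive check: any annotation that merely begins
/// with the key name matches.
pub fn matches_loose(annotation: &str) -> bool {
    annotation.starts_with("query_instance_id")
}

/// Tightened check: the key must be followed immediately by ':'.
pub fn matches_strict(annotation: &str) -> bool {
    annotation.starts_with(QUERY_INSTANCE_ID_PREFIX)
}

/// True if any annotation in the list uses the exact key:value form.
pub fn has_query_instance_id_annotation(annotations: &[String]) -> bool {
    annotations.iter().any(|a| matches_strict(a))
}
```

Under the strict check, "query_instance_id_extra:foo" and "query_instance_identifier" no longer match. Note that a bare "query_instance_id" with no colon is also rejected by this form.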

biswapanda requested a review from a team as a code owner April 15, 2026 22:57

pull-request-size Bot added the size/L label Apr 15, 2026

github-actions Bot added the feat, documentation, and frontend labels Apr 15, 2026

coderabbitai Bot reviewed Apr 15, 2026


Comment thread lib/llm/src/protocols/openai/nvext.rs

biswapanda self-assigned this Apr 16, 2026

biswapanda enabled auto-merge (squash) April 16, 2026 20:32

pull-request-size Bot added size/XL and removed size/L labels Apr 16, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 20:40 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 20:59 Inactive

biswapanda force-pushed the bis/fix/nvext-response-gating branch from 51c4117 to dce56e0 Compare April 16, 2026 21:11

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 21:11 Inactive

biswapanda assigned biswapanda and unassigned biswapanda Apr 16, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 21:28 Inactive

GuanLuo reviewed Apr 16, 2026


Comment thread lib/llm/src/protocols/openai/completions/delta.rs Outdated

GuanLuo approved these changes Apr 16, 2026


copy-pr-bot Bot temporarily deployed to GITLAB April 19, 2026 05:26 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 19, 2026 05:27 Inactive

biswapanda force-pushed the bis/fix/nvext-response-gating branch from b92a3f8 to 926bbbc Compare April 19, 2026 05:29

copy-pr-bot Bot temporarily deployed to GITLAB April 19, 2026 05:29 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 19, 2026 05:32 Inactive

biswapanda force-pushed the bis/fix/nvext-response-gating branch from 926bbbc to be79fc3 Compare April 19, 2026 05:48

copy-pr-bot Bot temporarily deployed to GITLAB April 19, 2026 05:49 Inactive

biswapanda disabled auto-merge April 19, 2026 20:32

biswapanda enabled auto-merge (squash) April 19, 2026 20:33

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 06:59 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 07:00 Inactive

AmeenP and others added 5 commits April 20, 2026 02:34

fix: gate nvext response metadata behind extra_fields

3fc5488

Signed-off-by: AmeenP <ameenp360@gmail.com>

refactor: simplify nvext response field selection and add gating tests

9a04b65

refactor - address comments

17b5d23

biswapanda force-pushed the bis/fix/nvext-response-gating branch from 5463f96 to 17b5d23 Compare April 20, 2026 09:38

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 09:38 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 09:39 Inactive

biswapanda merged commit f437c8c into main Apr 20, 2026
99 of 101 checks passed

biswapanda deleted the bis/fix/nvext-response-gating branch April 20, 2026 16:47
