
feat(frontend): gate nvext response metadata behind extra_fields #8252

Merged
biswapanda merged 5 commits into main from bis/fix/nvext-response-gating on Apr 20, 2026


@biswapanda biswapanda commented Apr 15, 2026

Overview:

Fix regression where plain OpenAI-compatible requests leaked nvext response metadata (worker_id, timing) by default. Introduce NvExtResponseFieldSelection to gate each response field independently behind extra_fields opt-in, while preserving the query_instance_id GAIE exception.

  • Before commit b2f7f22 (regression state) — plain request leaks nvext.timing:
    // curl /v1/chat/completions (no nvext.extra_fields)
    "nvext": { "timing": { "request_received_ms": 1776373206718, "total_time_ms": 10.862539 } }

  • After commit 51c4117 (HEAD, fixed) — plain request omits nvext, opt-in still works:

Details:

Root Cause: The February 4, 2026 per-worker metrics change made request tracking unconditional in the chat/completions delta generators. Response shaping still emitted nvext whenever tracker-backed metadata was present, so plain requests could leak worker_id and timing data by default.

What Changed:

  • Added NvExtResponseFieldSelection in nvext.rs with from_nvext() that walks extra_fields once and maps "worker_id" / "timing" / "routed_experts" to independent flags.
  • Gated worker_id, timing, routed_experts, and token_ids in choice_from_postprocessor for both chat and text completions so each field requires explicit opt-in (or the query_instance_id exception).
  • Preserved the query_instance_id exception: auto-enables worker_id + token_ids. timing is not auto-enabled because the query-only fast path has no finish_reason chunk and timing is only emitted on the final chunk.
  • Tightened annotation matching to the exact "query_instance_id:" key prefix, consistent with PreprocessedRequest::get_annotation_value and KvPushRouter. Prevents stray annotations like "query_instance_id_extra:..." from enabling the exception.
  • Left record_finish() unconditional so timing/ITL accounting and Prometheus metrics do not regress.
  • Documented routed_experts as a third extra_fields value in docs/components/frontend/nvext.md.
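
The selection logic described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from nvext.rs: the `NvExt` struct here is a simplified stand-in, and the flag names on `ResponseFieldSelection` are guesses based on the field names in this description. Ignoring unknown `extra_fields` values is also an assumption.

```rust
/// Hypothetical, simplified stand-in for the request-side `nvext` object.
#[derive(Default)]
struct NvExt {
    extra_fields: Option<Vec<String>>,
    annotations: Option<Vec<String>>,
}

/// Which response fields a request opted into (flag names are assumptions).
#[derive(Debug, Default, PartialEq, Eq, Clone, Copy)]
struct ResponseFieldSelection {
    worker_id: bool,
    timing: bool,
    routed_experts: bool,
    token_ids: bool,
}

impl NvExt {
    /// True only for annotations of the exact form `query_instance_id:<value>`.
    fn has_query_instance_id_annotation(&self) -> bool {
        self.annotations
            .iter()
            .flatten()
            .any(|a| a.starts_with("query_instance_id:"))
    }
}

impl ResponseFieldSelection {
    fn from_nvext(nvext: Option<&NvExt>) -> Self {
        let mut sel = Self::default();
        let Some(nvext) = nvext else { return sel };

        // Single pass over extra_fields; unknown values are ignored (assumption).
        for field in nvext.extra_fields.iter().flatten() {
            match field.as_str() {
                "worker_id" => sel.worker_id = true,
                "timing" => sel.timing = true,
                "routed_experts" => sel.routed_experts = true,
                _ => {}
            }
        }

        // query_instance_id exception: enables worker_id + token_ids, not timing.
        if nvext.has_query_instance_id_annotation() {
            sel.worker_id = true;
            sel.token_ids = true;
        }
        sel
    }
}
```

With this shape, a request without `nvext` yields an all-false selection, and a `query_instance_id_extra:...` annotation does not trigger the exception.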

Tests:

  • Selection unit tests in nvext.rs cover defaults, each individual extra_fields value, combined fields, the query_instance_id: exception, and the stray-annotation negative case.
  • End-to-end gating tests through choice_from_postprocessor in both chat_completions/delta.rs and completions/delta.rs: plain request omits nvext; extra_fields: ["timing"] emits only timing; query_instance_id:... emits worker_id + token_ids (not timing); extra_fields: ["routed_experts"] emits only routed_experts.
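
The behavior these tests check can be pictured with a small response-shaping sketch. Everything here is hypothetical: `Selection`, `Available`, `shape_nvext`, and the scalar field types are illustrative stand-ins, and the real gating lives in choice_from_postprocessor. The point is only that tracker metadata may always be present, while the response carries a field only when its flag is set, with timing additionally limited to the final chunk.

```rust
/// Hypothetical opt-in flags, one per gated response field.
#[derive(Default, Clone, Copy)]
struct Selection {
    worker_id: bool,
    timing: bool,
    routed_experts: bool,
    token_ids: bool,
}

/// Metadata that may exist for a chunk regardless of opt-in (types simplified).
#[derive(Default)]
struct Available {
    worker_id: Option<u64>,
    timing_total_ms: Option<f64>,
    routed_experts: Option<String>,
    token_ids: Option<Vec<u32>>,
}

/// Simplified response-side `nvext` object.
#[derive(Debug, Default, PartialEq)]
struct NvExtResponse {
    worker_id: Option<u64>,
    timing_total_ms: Option<f64>,
    routed_experts: Option<String>,
    token_ids: Option<Vec<u32>>,
}

/// Keep each field only if it was selected; return None when nothing survives.
fn shape_nvext(sel: Selection, avail: Available, is_final_chunk: bool) -> Option<NvExtResponse> {
    let resp = NvExtResponse {
        worker_id: avail.worker_id.filter(|_| sel.worker_id),
        // Timing is only attached to the final chunk.
        timing_total_ms: avail.timing_total_ms.filter(|_| sel.timing && is_final_chunk),
        routed_experts: avail.routed_experts.filter(|_| sel.routed_experts),
        token_ids: avail.token_ids.filter(|_| sel.token_ids),
    };
    // A plain request ends up with an empty object, which is omitted entirely.
    if resp == NvExtResponse::default() {
        None
    } else {
        Some(resp)
    }
}
```

Returning `None` rather than an empty object is what keeps plain OpenAI-compatible responses free of any `nvext` key in this sketch.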

Impact:

  • Plain /v1/chat/completions and /v1/completions requests no longer return nvext by default.
  • nvext.worker_id, nvext.timing, and nvext.routed_experts remain opt-in via nvext.extra_fields.
  • query_instance_id behavior is preserved for worker_id + token_ids; timing is now opt-in even in that flow.

Where should the reviewer start?

  1. lib/llm/src/protocols/openai/nvext.rs -- NvExtResponseFieldSelection::from_nvext and has_query_instance_id_annotation.
  2. lib/llm/src/protocols/openai/chat_completions/delta.rs -- response gating in choice_from_postprocessor and the new positive/negative delta tests.
  3. lib/llm/src/protocols/openai/completions/delta.rs -- parallel gating and tests for the text-completions endpoint.

Related Issues:

Validation:

  • cargo fmt -p dynamo-llm
  • cargo test -p dynamo-llm --lib -- protocols::openai::nvext:: protocols::openai::chat_completions::delta::tests:: protocols::openai::completions::delta::tests:: (23 tests pass)

@biswapanda biswapanda requested a review from a team as a code owner April 15, 2026 22:57

copy-pr-bot Bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the feat, documentation, and frontend labels Apr 15, 2026

coderabbitai Bot commented Apr 15, 2026

Walkthrough

The changes introduce a granular response field selection system for NvExt responses. A new NvExtResponseFieldSelection struct replaces boolean tracking flags in delta generators, enabling conditional field inclusion based on explicit extra_fields requests or query_instance_id annotations. Documentation is updated to reflect support for the new routed_experts field.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Update: docs/components/frontend/nvext.md | Extended supported nvext.extra_fields to include "routed_experts" field alongside existing "worker_id" and "timing" options. |
| Core NvExt Protocol: lib/llm/src/protocols/openai/nvext.rs | Added public NvExtResponseFieldSelection struct with constructor from_nvext() that computes which response fields to return based on extra_fields or query_instance_id annotations. Introduced NvExt::has_query_instance_id_annotation() helper and corresponding unit tests. |
| Delta Generator Updates: lib/llm/src/protocols/openai/chat_completions/delta.rs, lib/llm/src/protocols/openai/completions/delta.rs | Replaced enable_tracking: bool with response_fields: NvExtResponseFieldSelection in DeltaGeneratorOptions. Updated choice_from_postprocessor to conditionally extract and inject worker_id, token_ids, routed_experts, and timing_info based on selection flags. Added unit tests verifying omission of nvext when no fields are requested. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(frontend): gate nvext response metadata behind extra_fields' is highly specific and directly reflects the main change: implementing field gating for nvext response metadata based on extra_fields requests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The pull request description is mostly complete with detailed overview, root cause analysis, implementation details, test coverage, and impact assessment, though the Overview and Details sections are empty in the required template format. |




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/components/frontend/nvext.md (1)

160-167: ⚠️ Potential issue | 🟡 Minor

Document the query_instance_id exception too.

This section now implies worker_id and timing only come back when explicitly requested via extra_fields, but the implementation also auto-enables worker_id, timing, and token_ids for the query_instance_id flow. That will mislead GAIE callers about the expected response shape.

Suggested doc tweak
-When the client requests response metadata via `extra_fields`, the response includes an `nvext` object with the requested fields:
+When the client requests response metadata via `extra_fields` — or uses the `query_instance_id` flow — the response includes an `nvext` object with the requested or automatically enabled fields:

-| `worker_id` | `extra_fields: ["worker_id"]` | Prefill/decode worker IDs and data parallel ranks that processed the request. |
-| `timing` | `extra_fields: ["timing"]` | Per-request timing information (TTFT, ITL, queue time, etc.). |
+| `worker_id` | `extra_fields: ["worker_id"]` or automatic with `query_instance_id` | Prefill/decode worker IDs and data parallel ranks that processed the request. |
+| `timing` | `extra_fields: ["timing"]` or automatic with `query_instance_id` | Per-request timing information (TTFT, ITL, queue time, etc.). |
 | `routed_experts` | `extra_fields: ["routed_experts"]` | Routed expert capture payload returned by SGLang-backed requests. |
-| `token_ids` | Automatic (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. |
+| `token_ids` | Automatic with `query_instance_id` (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/components/frontend/nvext.md` around lines 160 - 167, Update the nvext
docs to document the query_instance_id exception: clarify that when a request is
part of the query_instance_id flow the server automatically returns
nvext.worker_id, nvext.timing and nvext.token_ids even if they are not listed in
extra_fields; mention that token_ids auto-enable applies to GAIE Stage 1 (and is
used for Stage 2 query-only mode); reference the nvext response object and the
extra_fields parameter so readers know this behavior differs from normal
extra_fields-driven inclusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 323-329: The has_query_instance_id_annotation function currently
uses annotation.starts_with("query_instance_id") which is too permissive; change
the check in has_query_instance_id_annotation (and the annotations.iter().any
closure) to only accept an exact key match or the key followed immediately by
the expected delimiter (e.g., "query_instance_id" == annotation ||
annotation.starts_with("query_instance_id=") or other delimiter used by your
annotations format) so strings like "query_instance_identifier" do not match.


📥 Commits

Reviewing files that changed from the base of the PR and between b2f7f22 and 4ad7ead.

📒 Files selected for processing (4)
  • docs/components/frontend/nvext.md
  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • lib/llm/src/protocols/openai/completions/delta.rs
  • lib/llm/src/protocols/openai/nvext.rs

Comment thread lib/llm/src/protocols/openai/nvext.rs
@biswapanda biswapanda self-assigned this Apr 16, 2026
@biswapanda
Contributor Author

/ok to test 4ad7ead


github-actions Bot commented Apr 16, 2026

@biswapanda biswapanda enabled auto-merge (squash) April 16, 2026 20:32
@biswapanda
Contributor Author

/ok to test 51c4117

Comment thread lib/llm/src/protocols/openai/completions/delta.rs Outdated
@biswapanda
Contributor Author

Rebasing - trying to fix the DCO issue.

@biswapanda biswapanda force-pushed the bis/fix/nvext-response-gating branch from b92a3f8 to 926bbbc Compare April 19, 2026 05:29
@biswapanda
Contributor Author

/ok to test 926bbbc

@biswapanda
Contributor Author

/ok to test be79fc3

AmeenP and others added 5 commits April 20, 2026 02:34
Signed-off-by: AmeenP <ameenp360@gmail.com>
…tive-path tests

Address review feedback on PR #8250:

- Restore the pre-regression behavior where query_instance_id
  automatically enables timing in the response (was lost during
  the refactor to NvExtResponseFieldSelection).
- Remove dead any() method that had no callers.
- Add positive-path unit tests for each gated field: worker_id,
  timing, routed_experts, query_instance_id, and combined fields.
- Update stale doc comment on NvExtResponse.timing to mention
  query_instance_id auto-enablement.
- Use assert_eq! with struct literals in tests for consistency
  and more informative failure output.

Address two issues found by codex review:

HIGH: has_query_instance_id_annotation() used starts_with("query_instance_id")
which would match stray annotations like "query_instance_id_extra:foo" that
the router does not recognize. Tightened to starts_with("query_instance_id:")
to match the exact key:value format used by PreprocessedRequest::get_annotation_value
and the KvPushRouter query-only detection.

MEDIUM: timing was auto-enabled for query_instance_id, but the query-only
fast path returns LLMEngineOutput::default() with no finish_reason, and
timing is only emitted when finish_reason.is_some(). Reverted timing to
extra_fields-only opt-in since it can never be emitted on the query-only
path. Updated doc comments to accurately reflect this behavior.

Added test_nvext_response_field_selection_rejects_stray_annotation to
verify the tightened matching.
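
The HIGH finding above comes down to the difference between a bare prefix check and a `key:value` lookup. The sketch below contrasts the two. It is illustrative only: `get_annotation_value` here is a hypothetical free function written for this example, and the signature of the real PreprocessedRequest::get_annotation_value is not shown in this PR, so its exact shape is an assumption.

```rust
/// Hypothetical `key:value` lookup: returns the value only when an annotation
/// is exactly `<key>:<value>`.
fn get_annotation_value<'a>(annotations: &'a [String], key: &str) -> Option<&'a str> {
    annotations
        .iter()
        .find_map(|a| a.strip_prefix(key)?.strip_prefix(':'))
}

/// Old check: matches any annotation that merely begins with the key text.
fn loose_match(annotations: &[String]) -> bool {
    annotations.iter().any(|a| a.starts_with("query_instance_id"))
}

/// Tightened check: the key must be followed immediately by the `:` delimiter.
fn strict_match(annotations: &[String]) -> bool {
    annotations.iter().any(|a| a.starts_with("query_instance_id:"))
}
```

For an annotation such as `query_instance_id_extra:foo`, `loose_match` returns true while both `strict_match` and the lookup reject it, which is the mismatch the tightened prefix removes.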
@biswapanda biswapanda force-pushed the bis/fix/nvext-response-gating branch from 5463f96 to 17b5d23 Compare April 20, 2026 09:38
@biswapanda biswapanda merged commit f437c8c into main Apr 20, 2026
99 of 101 checks passed
@biswapanda biswapanda deleted the bis/fix/nvext-response-gating branch April 20, 2026 16:47

Labels

documentation, feat, frontend, size/XL


Development

Successfully merging this pull request may close these issues.

Regression: OpenAI-compatible chat/completions responses include nvext metadata without opt-in

3 participants