[codex] gate nvext response metadata behind extra_fields #8250

Draft
AmeenP wants to merge 1 commit into ai-dynamo:main from AmeenP:fix/nvext-response-gating

Contributor

@AmeenP AmeenP commented Apr 15, 2026

Summary

  • restore opt-in gating for nvext response fields in OpenAI-compatible chat/completions responses
  • keep RequestTracker always enabled so internal per-worker metrics still work
  • add a small doc clarification for routed_experts
  • add one smoke test in each delta generator to cover the default no-extra_fields path

Root Cause

The February 4, 2026 per-worker metrics change made request tracking unconditional in the chat/completions delta generators. Response shaping still emitted nvext whenever tracker-backed metadata was present, so plain OpenAI-compatible requests could leak worker_id and timing data by default.

What Changed

  • introduced a shared NvExtResponseFieldSelection helper to compute which response fields are allowed for a request
  • gated worker_id, timing, routed_experts, and token_ids independently of whether a tracker exists
  • preserved the query_instance_id exception for worker_id and token_ids
  • left record_finish() unconditional so timing/ITL accounting and Prometheus metrics do not regress
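
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `from_parts` constructor, the `TrackedMetadata` and `NvExtOut` types, and the `shape_nvext` function are hypothetical names introduced here. Only the four field names and the query_instance_id exception come from the description.

```rust
/// Which nvext response fields a request has opted into.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct NvExtResponseFieldSelection {
    pub worker_id: bool,
    pub timing: bool,
    pub token_ids: bool,
    pub routed_experts: bool,
}

impl NvExtResponseFieldSelection {
    /// Hypothetical constructor: `extra_fields` stands in for `nvext.extra_fields`,
    /// `query_instance_id` for the presence of that annotation on the request.
    pub fn from_parts(extra_fields: &[&str], query_instance_id: bool) -> Self {
        let has = |name: &str| extra_fields.iter().any(|f| *f == name);
        Self {
            worker_id: has("worker_id") || query_instance_id,
            timing: has("timing"),
            token_ids: query_instance_id,
            routed_experts: has("routed_experts"),
        }
    }
}

/// Hypothetical stand-in for metadata that the always-on tracker collects.
#[derive(Debug, Clone, PartialEq)]
pub struct TrackedMetadata {
    pub worker_id: u64,
    pub total_time_ms: f64,
}

/// Hypothetical response-side view; `None` fields are left out of the response.
#[derive(Debug, Clone, PartialEq)]
pub struct NvExtOut {
    pub worker_id: Option<u64>,
    pub total_time_ms: Option<f64>,
}

/// Emit only the fields the request selected, regardless of what the tracker holds.
pub fn shape_nvext(
    sel: NvExtResponseFieldSelection,
    tracked: &TrackedMetadata,
) -> Option<NvExtOut> {
    let out = NvExtOut {
        worker_id: sel.worker_id.then_some(tracked.worker_id),
        total_time_ms: sel.timing.then_some(tracked.total_time_ms),
    };
    // Return None when nothing was selected so the whole nvext object is dropped.
    (out.worker_id.is_some() || out.total_time_ms.is_some()).then_some(out)
}
```

The point of the split is that tracking and response shaping are decided separately: the tracker can keep feeding metrics for every request while a plain request, with an empty selection, gets no nvext object back.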

Impact

  • plain /v1/chat/completions and /v1/completions requests no longer return nvext by default
  • nvext.worker_id, nvext.timing, and nvext.routed_experts remain opt-in via nvext.extra_fields
  • query_instance_id behavior is preserved

Closes #8249.

Validation

  • cargo check -p dynamo-llm --no-default-features --lib --tests

copy-pr-bot Bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

👋 Hi AmeenP! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added the external-contribution (Pull request is from an external contributor), documentation (Improvements or additions to documentation), and frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`) labels Apr 15, 2026

Self {
    worker_id: has_extra_field("worker_id") || query_instance_id,
    timing: has_extra_field("timing"),

Bug: query_instance_id no longer auto-enables timing in the response

The original (pre-regression) behavior of enable_tracking was:

let enable_tracking = timing_in_extra_fields || has_query_instance_id;

But NvExtResponseFieldSelection::from_nvext sets:

timing: has_extra_field("timing"),  // no query_instance_id exception

This means GAIE Stage 1 (query_instance_id) requests will no longer receive timing info in the response. The original code intentionally auto-enabled timing for query_instance_id.

If this is an intentional behavior change, please call it out explicitly in the PR description so reviewers can verify the GAIE Stage 1 contract doesn't depend on auto-returned timing.

If unintentional, the fix is:

Self {
    worker_id: has_extra_field("worker_id") || query_instance_id,
    timing: has_extra_field("timing") || query_instance_id,
    token_ids: query_instance_id,
    routed_experts: has_extra_field("routed_experts"),
}

The unit test test_nvext_response_field_selection_query_instance_id_exception asserts !selection.timing, so it would need updating too.
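
If the fix is adopted, the updated test might look roughly like the sketch below. The real signature of `NvExtResponseFieldSelection::from_nvext` is not shown in this thread, so the free function `select` and its parameters are hypothetical stand-ins; only the field names, the proposed rule, and the test name come from the comment above.

```rust
/// Local stand-in for the selection struct discussed in this review.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct NvExtResponseFieldSelection {
    worker_id: bool,
    timing: bool,
    token_ids: bool,
    routed_experts: bool,
}

/// Hypothetical stand-in for `from_nvext`, with the proposed fix applied:
/// query_instance_id enables timing as well as worker_id and token_ids.
fn select(extra_fields: &[&str], query_instance_id: bool) -> NvExtResponseFieldSelection {
    let has_extra_field = |name: &str| extra_fields.iter().any(|f| *f == name);
    NvExtResponseFieldSelection {
        worker_id: has_extra_field("worker_id") || query_instance_id,
        timing: has_extra_field("timing") || query_instance_id,
        token_ids: query_instance_id,
        routed_experts: has_extra_field("routed_experts"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_nvext_response_field_selection_query_instance_id_exception() {
        let selection = select(&[], true);
        assert!(selection.worker_id);
        assert!(selection.token_ids);
        // Previously asserted `!selection.timing`; flipped under the proposed fix.
        assert!(selection.timing);
        // routed_experts stays opt-in via extra_fields only.
        assert!(!selection.routed_experts);
    }
}
```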


pub fn any(&self) -> bool {
    self.worker_id || self.timing || self.token_ids || self.routed_experts
}

Nit: any() is dead code

This method is defined but never called in the diff or in the existing codebase. Consider removing it to avoid dead-code warnings, or, if it is planned for future use, add #[allow(dead_code)] with a comment explaining the intent.

enable_logprobs: self.inner.logprobs.unwrap_or(false)
    || self.inner.top_logprobs.unwrap_or(0) > 0,
enable_tracking,
response_fields,

Suggestion: add positive-path tests for each gated field

The smoke test test_plain_request_without_extra_fields_omits_nvext is a great regression test, but there are no tests verifying the opt-in paths actually work. I'd recommend at minimum:

  • extra_fields: ["worker_id"] returns worker_id (and NOT timing)
  • extra_fields: ["timing"] returns timing (and NOT worker_id)
  • extra_fields: ["routed_experts"] returns routed_experts
  • query_instance_id annotation returns worker_id + token_ids without extra_fields
  • Multiple extra_fields combined work together

Without positive-path tests, a future refactor could accidentally break the opt-in behavior and the test suite wouldn't catch it.
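
A rough sketch of what such positive-path tests could look like is below, exercised at the selection level only. The `select` function is a hypothetical stand-in for the real constructor, and end-to-end tests against the delta generators would need the actual request and response types. The timing rule follows the diff as submitted, and the query_instance_id test deliberately does not assert on timing, since that behavior is discussed separately in this review.

```rust
/// Local stand-in for the selection struct under test.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct NvExtResponseFieldSelection {
    pub worker_id: bool,
    pub timing: bool,
    pub token_ids: bool,
    pub routed_experts: bool,
}

/// Hypothetical stand-in for the real constructor (assumed shape, not the PR's API).
pub fn select(extra_fields: &[&str], query_instance_id: bool) -> NvExtResponseFieldSelection {
    let has = |name: &str| extra_fields.iter().any(|f| *f == name);
    NvExtResponseFieldSelection {
        worker_id: has("worker_id") || query_instance_id,
        timing: has("timing"),
        token_ids: query_instance_id,
        routed_experts: has("routed_experts"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn worker_id_only() {
        let expected = NvExtResponseFieldSelection { worker_id: true, ..Default::default() };
        assert_eq!(select(&["worker_id"], false), expected);
    }

    #[test]
    fn timing_only() {
        let expected = NvExtResponseFieldSelection { timing: true, ..Default::default() };
        assert_eq!(select(&["timing"], false), expected);
    }

    #[test]
    fn routed_experts_only() {
        let expected = NvExtResponseFieldSelection { routed_experts: true, ..Default::default() };
        assert_eq!(select(&["routed_experts"], false), expected);
    }

    #[test]
    fn query_instance_id_without_extra_fields() {
        let s = select(&[], true);
        assert!(s.worker_id && s.token_ids);
        assert!(!s.routed_experts);
    }

    #[test]
    fn combined_extra_fields() {
        let expected = NvExtResponseFieldSelection {
            worker_id: true,
            timing: true,
            token_ids: false,
            routed_experts: true,
        };
        assert_eq!(select(&["worker_id", "timing", "routed_experts"], false), expected);
    }
}
```

Comparing whole struct literals with assert_eq! also catches the negative half of each case (for example, that asking for worker_id does not turn on timing) without a separate assertion per field.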

biswapanda added a commit that referenced this pull request Apr 15, 2026
…tive-path tests

Address review feedback on PR #8250:

- Restore the pre-regression behavior where query_instance_id
  automatically enables timing in the response (was lost during
  the refactor to NvExtResponseFieldSelection).
- Remove dead any() method that had no callers.
- Add positive-path unit tests for each gated field: worker_id,
  timing, routed_experts, query_instance_id, and combined fields.
- Update stale doc comment on NvExtResponse.timing to mention
  query_instance_id auto-enablement.
- Use assert_eq! with struct literals in tests for consistency
  and more informative failure output.
biswapanda added a commit that referenced this pull request Apr 16, 2026
…tive-path tests

biswapanda added a commit that referenced this pull request Apr 19, 2026
…tive-path tests

biswapanda added a commit that referenced this pull request Apr 19, 2026
…tive-path tests

biswapanda added a commit that referenced this pull request Apr 20, 2026
…tive-path tests

@github-actions
Contributor

This PR is stale because it has been open for 30 days with no activity. Remove the Stale label or add a comment, or it will be closed in 5 days.

@github-actions github-actions Bot added the Stale label May 16, 2026

Labels

  • documentation: Improvements or additions to documentation
  • external-contribution: Pull request is from an external contributor
  • frontend: `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`
  • size/L
  • Stale

Development

Successfully merging this pull request may close these issues.

Regression: OpenAI-compatible chat/completions responses include nvext metadata without opt-in

2 participants