[RFC][core][V1] generalize structured output manager and backends #1

william-baker-inflection · 2025-04-30T01:08:45Z

Proposing to move all structured-output-related logic into v1/structured_output and standardise the interfaces to structured outputs in the rest of the code. Currently, structured output logic is scattered and fragmented through the code base, with logic in Scheduler, StructuredOutputManager and gpu_model_runner.

Benefits:

- Simplify the GPU runner so that it has well-defined entry points to structured outputs
- Add flexibility for future structured outputs implementations
- Expose logits directly to the structured output backend (the code is already moving in this direction in tpu_model_runner)
  - This in turn allows the backend to make use of the logits if needed, e.g. for a verbose mode
- Remove the dependency on xgrammar within the GPU runner
- Generalise where possible for extensibility
  - The init_batch callback could be used to trigger the re-shuffling of the grammar mask asynchronously, rather than synchronously in the gpu_model_runner as is currently implemented
- No performance impact
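To make the "well-defined entry points" idea concrete, here is a minimal sketch of what such an interface could look like. It is not the code in this PR: apart from `init_batch`, which is named above, every class, method, and parameter name here (`StructuredOutputBackend`, `apply_to_logits`, `AllowListBackend`, `runner_step`) is hypothetical, and plain Python lists stand in for tensors.

```python
import math
from abc import ABC, abstractmethod


class StructuredOutputBackend(ABC):
    """Hypothetical interface; only the name init_batch comes from the PR text."""

    @abstractmethod
    def init_batch(self, request_ids: list[str]) -> None:
        """Called once per step with the batch order the runner will use."""

    @abstractmethod
    def apply_to_logits(self, logits: list[list[float]]) -> list[list[float]]:
        """Receive raw logits (one row per request) and return constrained logits."""


class AllowListBackend(StructuredOutputBackend):
    """Toy backend: each constrained request may only emit ids from an allow-list."""

    def __init__(self, allowed: dict[str, set[int]]):
        self.allowed = allowed
        self.batch_order: list[str] = []

    def init_batch(self, request_ids: list[str]) -> None:
        # A real backend could start re-shuffling its grammar mask here,
        # possibly asynchronously; this toy version only records the order.
        self.batch_order = list(request_ids)

    def apply_to_logits(self, logits: list[list[float]]) -> list[list[float]]:
        out = []
        for req_id, row in zip(self.batch_order, logits):
            allowed = self.allowed.get(req_id)
            if allowed is None:
                out.append(list(row))  # unconstrained request: pass through
            else:
                out.append(
                    [v if i in allowed else -math.inf for i, v in enumerate(row)]
                )
        return out


def runner_step(backend, request_ids, logits):
    """Stand-in for a model runner that touches the backend via two calls only."""
    backend.init_batch(request_ids)
    return backend.apply_to_logits(logits)
```

With this shape the runner needs no knowledge of xgrammar or any other grammar library, and because the backend sees the full logits it could also log or inspect them, for example in a verbose mode.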

This involves few changes to the existing backend logic for xgrammar and guidance in vllm/v1/structured_output/backend_xgrammar.py and vllm/v1/structured_output/backend_guidance.py; the largest change is moving the bitmasking logic from gpu_model_runner to vllm/v1/structured_output/bitmasking_grammar.py.
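For readers unfamiliar with the bitmasking step, the general technique can be sketched as follows. This is an illustration only, not the contents of bitmasking_grammar.py: the 32-bit word layout, the "bit set means allowed" convention, and the function names `pack_allowed_tokens` and `apply_token_bitmask` are all assumptions made for the example.

```python
import math

BITS_PER_WORD = 32  # assumption: the mask is packed into 32-bit words


def pack_allowed_tokens(allowed_token_ids, vocab_size):
    """Pack allowed token ids into a list of words (bit set = token allowed)."""
    num_words = (vocab_size + BITS_PER_WORD - 1) // BITS_PER_WORD
    words = [0] * num_words
    for token_id in allowed_token_ids:
        if not 0 <= token_id < vocab_size:
            raise ValueError(f"token id {token_id} outside vocabulary")
        words[token_id // BITS_PER_WORD] |= 1 << (token_id % BITS_PER_WORD)
    return words


def apply_token_bitmask(logits, words):
    """Return a copy of logits with every disallowed token set to -inf."""
    masked = []
    for token_id, value in enumerate(logits):
        word = words[token_id // BITS_PER_WORD]
        bit = (word >> (token_id % BITS_PER_WORD)) & 1
        masked.append(value if bit else -math.inf)
    return masked
```

Setting disallowed logits to negative infinity gives those tokens zero probability after softmax, so sampling can only pick tokens the grammar permits.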

I have tested this with xgrammar and guidance.

WARNING: I have yet to refactor tpu_model_runner to use this new logic, but I think doing so will help clean up the code duplication between gpu_model_runner and tpu_model_runner, with any TPU-specific logic going into bitmasking_grammar.py. I wanted to gather feedback before taking the time to make changes to tpu_model_runner.

Signed-off-by: william-baker-inflection <william.baker@inflection.ai>

github-actions · 2025-09-05T02:06:26Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

william-baker-inflection force-pushed the generalized-structured-decoding branch from d2897aa to b350b21 Compare April 30, 2025 21:31

william-baker-inflection force-pushed the generalized-structured-decoding branch 4 times, most recently from 9e001b2 to fe6a707 Compare May 19, 2025 09:33

william-baker-inflection force-pushed the generalized-structured-decoding branch 2 times, most recently from 5d24e8e to 3d6863c Compare May 23, 2025 19:01

generalize structured output manager and backends

714292e

Signed-off-by: william-baker-inflection <william.baker@inflection.ai>

william-baker-inflection force-pushed the generalized-structured-decoding branch from 3d6863c to 714292e Compare June 6, 2025 14:48

github-actions bot added the stale label Sep 5, 2025
