@william-baker-inflection william-baker-inflection commented Apr 30, 2025

This PR proposes moving all structured-output logic into v1/structured_output and standardising the interfaces that the rest of the code uses to reach it. Currently, structured-output logic is scattered across the code base, with pieces in Scheduler, StructuredOutputManager and gpu_model_runner.

Benefits:

  • Simplify the GPU runner so that it has well-defined entry points into structured outputs
  • add flexibility for future structured outputs implementations
  • expose logits directly to structured output backend (the code is already moving in this direction in tpu_model_runner)
    • This in turn allows the backend to make use of the logits if needed, e.g. for a verbose mode
  • remove dependency on xgrammar within GPU runner
  • generalise where possible for expandability
    • The init_batch callback could be used to trigger the re-shuffling of the grammar mask asynchronously rather than synchronously in the gpu_model_runner as is currently implemented
  • no performance impact
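
To make the proposed entry points concrete, here is a minimal sketch of what a backend-agnostic interface could look like. It is not the code in this PR: `StructuredOutputHook`, `AllowListHook`, `run_step` and the method `apply_to_logits` are hypothetical names chosen for illustration, and the logits are plain Python lists rather than tensors. Only the idea of an `init_batch` callback and of handing the logits to the backend comes from the list above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class StructuredOutputHook(ABC):
    """Hypothetical interface: the only two calls a model runner would make."""

    @abstractmethod
    def init_batch(self, request_ids: list[str]) -> None:
        """Called once per step with the batch order chosen by the runner."""

    @abstractmethod
    def apply_to_logits(self, logits: list[list[float]]) -> list[list[float]]:
        """Receive the raw logits (one row per request), return constrained ones."""


class AllowListHook(StructuredOutputHook):
    """Toy backend: a request may only emit token ids from its allow-list."""

    def __init__(self, allowed: dict[str, set[int]]) -> None:
        self._allowed = allowed
        self._batch_order: list[str] = []

    def init_batch(self, request_ids: list[str]) -> None:
        # A real backend could start re-ordering its masks here.
        self._batch_order = list(request_ids)

    def apply_to_logits(self, logits: list[list[float]]) -> list[list[float]]:
        out = []
        for req_id, row in zip(self._batch_order, logits):
            allowed = self._allowed.get(req_id)
            if allowed is None:  # request has no structured-output constraint
                out.append(list(row))
            else:
                out.append(
                    [x if i in allowed else float("-inf") for i, x in enumerate(row)]
                )
        return out


def run_step(hook: StructuredOutputHook, request_ids, logits):
    """Stand-in for the model runner: it knows nothing about grammars."""
    hook.init_batch(request_ids)
    return hook.apply_to_logits(logits)
```

In this shape the runner depends only on the abstract hook, so swapping the toy allow-list for a grammar-based backend would not touch `run_step`.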

This requires only a few changes to the existing xgrammar and guidance backend logic in vllm/v1/structured_output/backend_xgrammar.py and vllm/v1/structured_output/backend_guidance.py. The largest change is moving the bitmasking logic out of gpu_model_runner and into vllm/v1/structured_output/bitmasking_grammar.py.
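
For readers unfamiliar with the bitmasking step being moved, the sketch below shows the general technique in plain Python: a packed bitmask holds one bit per vocabulary token, and a cleared bit forces the corresponding logit to negative infinity. The 32-bit word size, the least-significant-bit-first layout and the function names `pack_allowed` and `apply_bitmask` are assumptions made for illustration, not a description of what bitmasking_grammar.py or xgrammar actually does.

```python
import math

BITS_PER_WORD = 32  # assumption: 32 tokens packed into each integer word


def pack_allowed(allowed_token_ids, vocab_size):
    """Build a packed bitmask in which bit `tok` is set for every allowed token."""
    words = [0] * math.ceil(vocab_size / BITS_PER_WORD)
    for tok in allowed_token_ids:
        words[tok // BITS_PER_WORD] |= 1 << (tok % BITS_PER_WORD)
    return words


def apply_bitmask(logits, words):
    """Return a copy of `logits` with every disallowed token set to -inf."""
    masked = []
    for tok, value in enumerate(logits):
        bit = (words[tok // BITS_PER_WORD] >> (tok % BITS_PER_WORD)) & 1
        masked.append(value if bit else -math.inf)
    return masked
```

A production implementation would apply the mask to a whole batch of logits on the accelerator rather than looping over tokens, but the per-token rule is the same.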

I have tested this with xgrammar and guidance.

WARNING: I have not yet refactored tpu_model_runner to use this new logic. I think doing so will help clean up the code duplication between gpu_model_runner and tpu_model_runner, with any TPU-specific logic moving into bitmasking_grammar.py. I wanted to gather feedback before taking the time to change tpu_model_runner.

@william-baker-inflection william-baker-inflection force-pushed the generalized-structured-decoding branch from d2897aa to b350b21 Compare April 30, 2025 21:31
@william-baker-inflection william-baker-inflection force-pushed the generalized-structured-decoding branch 4 times, most recently from 9e001b2 to fe6a707 Compare May 19, 2025 09:33
@william-baker-inflection william-baker-inflection force-pushed the generalized-structured-decoding branch 2 times, most recently from 5d24e8e to 3d6863c Compare May 23, 2025 19:01
Signed-off-by: william-baker-inflection <william.baker@inflection.ai>
@william-baker-inflection william-baker-inflection force-pushed the generalized-structured-decoding branch from 3d6863c to 714292e Compare June 6, 2025 14:48
github-actions bot commented Sep 5, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Sep 5, 2025