@realliujiaxu realliujiaxu commented Dec 6, 2025

What this PR does / why we need it?

Currently, spec decoding uses AscendRejectionSampler, which extends RejectionSampler. AscendRejectionSampler overrides the forward method of RejectionSampler solely to replace the rejection_sample function. As a result, much of the RejectionSampler code cannot be reused, for example:

Proposed Change:

Does this PR introduce any user-facing change?

How was this patch tested?

  • test logits processor for spec decoding

  • test logprobs for spec decoding

  • test logprobs for spec decoding + async scheduling

  • vLLM version: v0.12.0

  • vLLM main: vllm-project/vllm@ad32e3e

github-actions bot commented Dec 6, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

github-actions bot commented Dec 6, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@realliujiaxu realliujiaxu marked this pull request as draft December 6, 2025 06:27
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the rejection sampler implementation to better align with the upstream vLLM project. The custom AscendRejectionSampler class is removed in favor of using the standard vllm.v1.sample.rejection_sampler.RejectionSampler and monkey-patching its dependencies with Ascend-optimized implementations. This is a solid architectural improvement that will enhance maintainability. The sample_tokens method in NPUModelRunner has also been cleanly refactored into smaller, more focused helper methods. The changes appear correct and logically sound. I have not found any issues of high or critical severity.
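
The difference between the two approaches mentioned in the review can be illustrated with a minimal sketch. Every name below (`rejection_sample`, `npu_rejection_sample`, `apply_patch`, and the simplified `RejectionSampler`) is a hypothetical stand-in written for illustration, not the actual vLLM or vLLM Ascend code, and the toy acceptance rule is an assumption rather than the real sampling algorithm. The point is only that replacing a module-level function lets the base class's `forward` run unchanged, whereas overriding `forward` in a subclass would require copying its surrounding logic.

```python
# Hypothetical stand-ins only; these are NOT the real vLLM / vLLM Ascend APIs.

def rejection_sample(draft_tokens, target_tokens):
    """Toy generic kernel: keep draft tokens up to the first mismatch,
    then emit the target token at that position and stop."""
    out = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            out.append(target)
            break
        out.append(draft)
    return out


class RejectionSampler:
    """Simplified stand-in for an upstream sampler class."""

    def forward(self, draft_tokens, target_tokens):
        # Shared logic that a subclass overriding forward() would have to copy.
        draft_tokens = list(draft_tokens)
        target_tokens = list(target_tokens)
        # The name is resolved in module globals at call time,
        # so a patched replacement is picked up automatically.
        tokens = rejection_sample(draft_tokens, target_tokens)
        return {"tokens": tokens, "num_tokens": len(tokens)}


calls = []  # records which kernel ran, for demonstration only
_generic_rejection_sample = rejection_sample


def npu_rejection_sample(draft_tokens, target_tokens):
    """Stand-in for a device-specific kernel with the same signature."""
    calls.append("npu")
    return _generic_rejection_sample(draft_tokens, target_tokens)


def apply_patch():
    """Swap the module-level function. In a multi-module project this would be
    an attribute assignment on the module that defines the original function."""
    global rejection_sample
    rejection_sample = npu_rejection_sample
```

After `apply_patch()` is called, `RejectionSampler().forward(...)` routes through `npu_rejection_sample` without any subclass. The usual trade-off of this pattern is that the replacement must keep the same signature as the function it replaces, and the patch must be applied before the sampler is used.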

@realliujiaxu realliujiaxu force-pushed the refactor_rejection_sampler branch from 5f90fbc to bb549e1 Compare December 6, 2025 06:33
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@realliujiaxu realliujiaxu force-pushed the refactor_rejection_sampler branch from 651f652 to 147c6f6 Compare December 6, 2025 06:46