-
Notifications
You must be signed in to change notification settings - Fork 629
refactor rejection sampler #4758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor rejection sampler #4758
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the rejection sampler implementation to better align with the upstream vLLM project. The custom AscendRejectionSampler class is removed in favor of using the standard vllm.v1.sample.rejection_sampler.RejectionSampler and monkey-patching its dependencies with Ascend-optimized implementations. This is a solid architectural improvement that will enhance maintainability. The sample_tokens method in NPUModelRunner has also been cleanly refactored into smaller, more focused helper methods. The changes appear correct and logically sound. I have not found any issues of high or critical severity.
5f90fbc to
bb549e1
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
bb549e1 to
418f847
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
418f847 to
95ed77f
Compare
651f652 to
147c6f6
Compare
1362dab to
388964c
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
Currently, we are using
AscendRejctionSamplerthat extends fromRejctionSamplerin spec decoding.AscendRejctionSampleroverrideforwardofRejctionSampler, only aming to replacerejection_samplefunc. Thiscauses a lot of code of
RejctionSamplercannot be reused, for example:Proposed Change:
AscendRejctionSamplerand useRejctionSamplerdirectly in model runner.RejctionSampler.expand_batch_to_tokensandRejctionSampler.rejection_sample, maybe a better way is to make them as custom ops.NPUModelRunnerfollowing [V1][spec decode] return logprobs for spec decoding vllm#26060Does this PR introduce any user-facing change?
How was this patch tested?
test logits processor for spec decoding
test logprobs for spec decoding
test logprobs for spec decoding + async shcheduling
vLLM version: v0.12.0
vLLM main: vllm-project/vllm@ad32e3e