Conversation
Force-pushed from 17446a6 to 9015568.
Could you please attach the accuracy test results on GSM8K? Do we support MTP=1 only, or MTP=1/2/3? What is the acceptance ratio?

I will turn this PR into a draft and go through CI after the code review is done.

Sure, I will attach the accuracy results later. We currently support MTP=1/2/3, but the acceptance rate is low (about 20% for the first draft token and 0 for the other tokens), and I'm working on it.
```python
self.model_arch = model_arch
# if self.forced_model_arch is not None:
#     model_arch = self.forced_model_arch
#     logger.info(f"Using forced model arch: {model_arch} for vLLM plugin mode")
```
Should this commented-out code be removed?
```python
# can coexist in one process. Resolve per-forward config first to avoid
# reading a stale global singleton.
if not is_vllm():
    return None
```
There should be an assertion here to prevent non-vLLM backends from calling this method; it should only be called by the atom-vllm backend.
```python
def get_current_atom_config() -> Config:
    forward_atom_config = _get_current_atom_config_from_vllm_forward_context()
```
This may be a bit risky. If `forward_atom_config` is None and there is no assertion, it will silently fall back to the global singleton `_current_atom_config`. Can we add a log here, or make it safer?

Ideally, the lifecycle and ownership of `forward_atom_config` belong to the model itself: the main model gets its own atom config and the draft model gets its own. If the draft model cannot get its config, it falls back to `_current_atom_config`, which may not be correct.
Do you mean we should always obtain `forward_atom_config` from the vLLM forward context? Are there scenarios where we need to return the default global `_current_atom_config`?
Let's add a comment here to warn about this case: when there is no atom config in the forward context, the default global config is provided. With this warning we can mitigate coherence issues between the local value and its global twin.
```python
position_offset = getattr(self.model, "vllm_draft_position_offset", 0)
if position_offset == 0:
    return positions
return positions + position_offset
```
Can we add a comment here explaining the position offset?
Not needed anymore; removed.
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
Force-pushed from 2c1db99 to 90aa06b.
Motivation
This PR enables the MTP feature for running DeepSeekV3 and GLM5 with vLLM + atom.
Technical Details
- `atom_config`-related bugfix.
- `full_cls_name` of different MLA sparse attention backends.
- `index_buffer` for `DeepseekMTP`.

Test Plan
Coming soon.
Test Result
Accuracy test commands:
Accuracy test result with mtp=3:
Submission Checklist