Is your feature request related to a problem? Please describe.
Our FP8 implementation requires patching vLLM's fp8 module for linear layers. For MoE models, we would have to extend that patching to also cover vLLM's fused-MoE method:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L476
Describe the solution you'd like
A clear and concise description of what you want to happen.
Provide a code snippet on how new APIs/changes would be used by others.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.