Is your feature request related to a problem? Please describe.
Our FP8 implementation requires patching vLLM's fp8 module for linear layers. For MoE models, we would have to extend that patching to also cover vLLM's fused-MoE method:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L476
Describe the solution you'd like
A clear and concise description of what you want to happen.
Provide a code snippet on how new APIs/changes would be used by others.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.