Closed
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
Quantization is not available for transformer backend
Describe the solution you'd like
Add bitsandbytes 4-bit quantization, triggered by the user via the low_vram flag in the model definition.
Additionally, I propose using the f16 flag to switch compute_dtype to bfloat16, for better performance on NVIDIA cards.
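A minimal sketch of the proposed flag mapping. The helper name `quantization_kwargs` is hypothetical; the `low_vram` and `f16` flag names come from this proposal, and the returned kwargs are the ones accepted by `transformers.BitsAndBytesConfig`:

```python
def quantization_kwargs(low_vram: bool, f16: bool) -> dict:
    """Map model-definition flags to BitsAndBytesConfig kwargs (hypothetical helper).

    low_vram enables 4-bit quantization; f16 switches the 4-bit compute
    dtype from the float32 default to bfloat16.
    """
    if not low_vram:
        # No quantization requested: load the model at full precision.
        return {}
    return {
        "load_in_4bit": True,
        # bfloat16 compute is faster on recent NVIDIA cards (Ampere and later).
        "bnb_4bit_compute_dtype": "bfloat16" if f16 else "float32",
    }

print(quantization_kwargs(low_vram=True, f16=True))
```

In the backend, the resulting dict would be passed along the lines of `AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(**kwargs))`.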
Describe alternatives you've considered
Additional context
I've already implemented this while fixing #1774; this issue is opened for tracking.