Hi, I've implemented more quant types (`IQ1_S`, `IQ1_M`, `IQ2_XXS`, `IQ2_S`, `IQ3_XXS`, `IQ3_S`) in torch at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/qwen3_moe_fused/quantize_gguf/dequant.py
Also I've found that `torch.compile` cannot handle some complicated `view` operations, and it silently produces wrong numbers or NaN. I've added `view_float16` in dequant functions like `Q6_K` to work around it, and rewritten the `IQ4_XS` dequant function. Now all dequant functions work correctly with `torch.compile`. Later we should find some minimal reproducers and report them to PyTorch.
There is a unit test at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/test_gguf_dequant.py
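For context, the eager-vs-compiled comparison such a test needs, together with the kind of dtype-reinterpreting view involved, might look roughly like this (a hedged sketch: `dequant_toy` and `compiled_matches_eager` are illustrative names, not the repo's actual kernels):

```python
import torch

def dequant_toy(blocks: torch.Tensor) -> torch.Tensor:
    # Illustrative stand-in for a GGUF dequant function (not the real
    # Q6_K/IQ4_XS code): reinterpret raw uint8 bytes as little-endian
    # float16 values via a dtype-reinterpreting view, then upcast.
    assert blocks.dtype == torch.uint8 and blocks.shape[-1] % 2 == 0
    scales = blocks.view(torch.float16)
    return scales.to(torch.float32)

def compiled_matches_eager(fn, x: torch.Tensor) -> bool:
    # A minimal reproducer for a torch.compile miscompilation would
    # compare the compiled output against the eager output like this.
    y_eager = fn(x)
    y_compiled = torch.compile(fn)(x)
    return torch.allclose(y_eager, y_compiled, equal_nan=True)
```

For example, `dequant_toy(torch.tensor([0x00, 0x3C], dtype=torch.uint8))` yields `tensor([1.])` on a little-endian machine, since `0x3C00` is the float16 bit pattern for 1.0.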
I'm not sure whether we're interested in quantizing diffusion models into IQ3 or smaller quants, but this at least allows us to load some existing LLMs as text encoders, especially Unsloth UD quants. I can open a PR to this repo if needed.