Conversation
0cc4m
commented
Jul 21, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
- Low
- Medium
- High
- Add IQ4_NL support to Vulkan to resolve the issue with iq4_nl fallbacks in k-quants (llama : change fallback type IQ4_NL -> Q4_0 #8489 and Bug: QWEN2 quantization GGML_ASSERT #7805 (comment)).
- Increase the mat_mul_id matrix multiplication row_ids buffer size to allow larger MoEs (like DeepSeek-Coder-V2-Lite which had the iq4_nl issue) to work with Vulkan.
- Fix Vulkan test code that was broken after the last rework.
How much effort is needed to support IQ4_XS in addition to IQ4_NL?
Can you elaborate on what specific cases that would enable?
IQ4_XS is commonly used in the community due to its small size and better perplexity (PPL) than Q4_K_M. It's a sweet spot in the GGUF quant series.
It's quite a bit of effort, but at least it's easier than the other i-quants. I can't do it now, but should be able to at some point in the not-too-distant future.
Where there's a will, there's a way =;-)
While testing this I got test failures with fp16/fp32 mul_mat, but they also happen on master.
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
  const uint64_t nei0 = ids->ne[0];
  const uint64_t nei1 = ids->ne[1];
- GGML_ASSERT(nei0 * nei1 <= 2048);
+ GGML_ASSERT(nei0 * nei1 <= 3072);
Hi @0cc4m, can I check what exactly this assert is testing for?
ref: LostRuins#1337
It checks the maximum number of row_ids the mat_mul_id shader can handle.
Deepseek 16B MoE (6/64 experts):
nei0 = 6
nei1 = 1024
nei0 x nei1 = 6144