Conversation
0cc4m
commented
Jul 21, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
- Low
- Medium
- High
- Add IQ4_NL support to Vulkan to resolve the issue with iq4_nl fallbacks in k-quants (llama : change fallback type IQ4_NL -> Q4_0 #8489 and Bug: QWEN2 quantization GGML_ASSERT #7805 (comment)).
- Increase the mat_mul_id matrix multiplication row_ids buffer size to allow larger MoEs (like DeepSeek-Coder-V2-Lite which had the iq4_nl issue) to work with Vulkan.
- Fix Vulkan test code that was broken after the last rework.
How much effort is needed to support IQ4_XS in addition to IQ4_NL?
Can you elaborate on what specific cases that would enable?
IQ4_XS is commonly used in the community due to its small size and better perplexity (PPL) than Q4_K_M. It's a sweet spot in the GGUF quant series.
It's quite a bit of effort, but at least it's easier than the other i-quants. I can't do it now, but should be able to at some point in the not-too-distant future.
Where there's a will, there's a way =;-)
While testing this I got test failures with fp16/fp32 mul_mat, but they also happen on master.
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
  const uint64_t nei0 = ids->ne[0];
  const uint64_t nei1 = ids->ne[1];
- GGML_ASSERT(nei0 * nei1 <= 2048);
+ GGML_ASSERT(nei0 * nei1 <= 3072);
Hi @0cc4m, can I check what exactly this assert is testing for?
ref: LostRuins#1337
It checks the maximum number of row_ids the mat_mul_id shader can handle.
Deepseek 16B MoE (6/64 experts):
nei0 = 6
nei1 = 1024
nei0 x nei1 = 6144