Conversation
joeatodd
left a comment
This is a welcome addition since it simplifies the MatMul dispatch 😄
We spotted a couple of issues with the code; I've added comments. I hope they're helpful.
```cpp
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q; // check dp4a
```
These lines don't do anything.
There will be refactoring of the SYCL compute capabilities in #6408; keeping these here as a reminder.
```cpp
use_mul_mat_vec_q = use_mul_mat_vec_q; // Check dp4a
use_mul_mat_q = use_mul_mat_q; // check dp4a
#ifdef SYCL_USE_XMX
use_mul_mat_q = use_mul_mat_q && (!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
```
The logic in this block is a little confusing:
`(!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE)`
It says: use MMQ if either 1) FP16 performance is bad, or 2) the number of columns in src1 is less than or equal to MMQ_MAX_BATCH_SIZE.
Is this the intention?
This just aligns with the CUDA logic.
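For reference, here is a minimal sketch of what that condition expresses, with the diff's variables passed in as parameters so it compiles standalone. This is an illustration of the intent, not the upstream code:

```cpp
#include <cstdint>

// MMQ stays enabled only when the FP16 GEMM path is NOT clearly better:
// either FP16 performance is poor on this device, or the batch
// (src1_ncols, i.e. src1->ne[1]) is small enough that MMQ still wins.
static bool keep_mmq(bool use_mul_mat_q, bool fp16_performance_good,
                     int64_t src1_ncols, int64_t mmq_max_batch_size) {
    const bool prefer_fp16_gemm = fp16_performance_good
                               && src1_ncols > mmq_max_batch_size;
    // By De Morgan, !(a && b) == (!a || !b), which is exactly the form
    // in the diff:
    //   use_mul_mat_q && (!fp16_performance_good || src1->ne[1] <= MMQ_MAX_BATCH_SIZE)
    return use_mul_mat_q && !prefer_fp16_gemm;
}
```

So both readings are the same condition: MMQ is dropped only when FP16 performance is good *and* the batch exceeds MMQ_MAX_BATCH_SIZE.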
I am seeing a new test failure on Arc A770; is this to be expected?
```
MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1]): GGML_ASSERT: /builds/perseus-performance-libraries/llama_ci/llama.cpp/ggml-sycl.cpp:13858: false
ggml_sycl_op_dequantize_mul_mat_vec unsupported GGML_TYPE 16
```
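For context, the assert fires because the dequantize-mul-mat-vec path has no kernel for IQ2_XXS (GGML_TYPE 16 in the ggml_type enum). One hypothetical shape for a guard that routes such types away from that path instead of asserting; `ggml_sycl_supports_dmmv` is an assumed helper name, not existing upstream code:

```cpp
#include "ggml.h" // for enum ggml_type

// Hypothetical support check (assumed helper, for illustration): return
// true only for types the dequantize-mul-mat-vec kernels actually handle,
// so the dispatcher can pick a supported path for the rest.
static bool ggml_sycl_supports_dmmv(enum ggml_type type) {
    switch (type) {
        case GGML_TYPE_Q4_0: case GGML_TYPE_Q4_1:
        case GGML_TYPE_Q5_0: case GGML_TYPE_Q5_1:
        case GGML_TYPE_Q8_0:
        case GGML_TYPE_Q2_K: case GGML_TYPE_Q3_K:
        case GGML_TYPE_Q4_K: case GGML_TYPE_Q5_K:
        case GGML_TYPE_Q6_K:
        case GGML_TYPE_F16:
            return true;
        default:
            return false; // IQ2_XXS (GGML_TYPE 16) and other IQ types land here
    }
}
```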
```cpp
bool ggml_sycl_supports_mmq(enum ggml_type type) {
    // TODO: accuracy issues in MMQ
    return false;
}
```
Could you elaborate on what accuracy issues you are having with MMQ?
Master uses ggml_sycl_op_mul_mat_sycl for these 5 cases; you can try forcing MMQ to reproduce the accuracy issues (see the sketch after the log):
```
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1]): ggml_sycl_op_mul_mat_sycl
OK
MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1]): ggml_sycl_op_mul_mat_sycl
```
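One way to force MMQ for that comparison, as a temporary local experiment only (a sketch of the function quoted above with its return changed, not a proposed fix):

```cpp
#include "ggml.h" // for enum ggml_type

bool ggml_sycl_supports_mmq(enum ggml_type type) {
    // TODO: accuracy issues in MMQ
    // Temporary experiment: claim support for Q4_0 so the dispatcher picks
    // the MMQ kernel for the MUL_MAT cases above instead of
    // ggml_sycl_op_mul_mat_sycl.
    return type == GGML_TYPE_Q4_0;
}
```

Then rerun the affected cases, e.g. with `test-backend-ops -o MUL_MAT`, and compare against the ggml_sycl_op_mul_mat_sycl results.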
Force-pushed from bcadd61 to bfed283.
fixed
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
NeoZhangJianyu
left a comment
Performance increases from 34 tokens/s to 37 tokens/s on Arc 770 with llama2-7b-Q4.
It's good!
* align GEMM dispatch
The issue was exposed during #6408.
I will split #6408 into several smaller PRs for easier reviewing; the tasks will be updated according to the issues exposed.