|
Thanks for your refactoring! I'm aware this is a draft PR test-backend-ops on ARC A770 GPU llama-bench |
|
@AidanBeltonS Thank you for the reminder! I am aware that the current interaction between SYCL and common is not perfect; you can review the rough design. |
b49f1c0 to 8dfc5a7
cc8c48b to f32f17a
|
@AidanBeltonS @NeoZhangJianyu now |
Yes. |
AidanBeltonS left a comment
Thanks for the changes! This is a big but necessary update. I have some minor comments.
remove global variables and pack into context
f32f17a to d342abc
|
Still crashes with multiple GPUs: |
|
@AidanBeltonS @NeoZhangJianyu This PR fixes SYCL, which has been broken since #7640 (comment), and I believe it solves #7777 and related issues. Please give it a try. Known issue: multi-card support is still broken. |
I have tested on an A100 GPU and can confirm this fixes #7777. |
* separate DPCT helpers outside
* replace global variables with context
* remove useless extra
* update mul_mat condition
* remove duplicate buft initialization
* remove duplicate extra and global work group size
* remove useless backend check
* remove duplicated extras
* use macro for group_size and remove cuda-related





Following #7566
Remaining