Fix OpenCL kernels for the new formats #1422
Conversation
skidd-level-100 left a comment
Looks legit, please merge!
edit: downloaded and compiled with 'make LLAMA_CLBLAST=1'
it works!
That works fine for me!
My device failed to create an out-of-order queue, so I fell back to an in-order queue with this patch: Now it fails with program source errors (10 long error blocks): These two statements seem to matter: EDIT: platform and device info:
0cc4m left a comment
Thanks, you found the q5_1 problem and fixed the kernels, nice. I tested it and didn't find any issues on AMD and Nvidia.
RTX 3060, Llama 7B:
q5_0: 8.18 ms per token on CLBlast, 4.63 ms per token on CuBLAS
q5_1: 8.25 ms per token on CLBlast, 4.54 ms per token on CuBLAS
@SlyEcho @0cc4m this works for me, but I have noticed a few people mentioning that they get the error regarding variable-length arrays. #1429 (comment) I also noticed that previously the array lengths were indeed hard-coded with a constant. Perhaps this is a platform limitation?
@LostRuins I will take care of it.
@SlyEcho Another thing to add - it seems some people are reporting that the q8_0 dequant kernel is not working correctly - this seems to be the case for me too. Have you observed similar issues? It works correctly on OpenBLAS; only CLBlast returns gibberish, and only for q8_0.
* Fix OpenCL kernels for the new formats
* Fix Q5_0 alignment issues.

This should fix the CLBlast related errors with the new formats.
I also rewrote them to be almost identical to the CUDA versions, so future updates could be easier.
Should fix #1417 #1415
I also figured out the solution for Q5_0, which previously required preconversion to a different format with f32 (and malloc!). The issue was, of course, an alignment issue, which an __attribute__((packed)), as per the OpenCL 1.1 spec, solved.

Test results
Test models:
Test data:
head -n 102 wiki.test.raw > wiki.test.mini
Test command:
Test outputs:
7B Q4_0
7B Q4_1
7B Q5_0
7B Q5_1
7B Q8_0
7B F16