add loongarch lsx and lasx optimize code #6454
@junchao-loongson Thanks for this PR. Just a heads-up: I will only be able to review this after #6412 and #6414, so it may take me some time. Sorry about that. In the meantime, feel free to continue the review with other devs.

Let's resolve the conflicts from the recent

OK, I rebased the code.

Tests OK.
ggerganov
left a comment
I don't suppose GitHub Actions supports this architecture, but if it does, it would be nice to add a CI workflow.
Have you done some inference/perplexity runs to make sure the generation looks fine?
    typedef union
    {
        int32_t i;
        float f;
    } FloatInt;
    /* float type data load instructions */
    static __m128 __lsx_vreplfr2vr_s(float val)
    {
        FloatInt fi_tmpval = {.f = val};
        return (__m128)__lsx_vreplgr2vr_w(fi_tmpval.i);
    }

    static __m256 __lasx_xvreplfr2vr_s(float val)
    {
        FloatInt fi_tmpval = {.f = val};
        return (__m256)__lasx_xvreplgr2vr_w(fi_tmpval.i);
    }
Deduplicate this code by moving it into ggml-impl.h and reusing it in ggml.c and ggml-quants.c
I was thinking to just deduplicate the __lsx_vreplfr2vr_s and __lasx_xvreplfr2vr_s code. The rest of the lsx/lasx code that is used only inside ggml-quants.c should remain in ggml-quants.c
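For readers without LoongArch hardware, here is a rough scalar model of what the two helpers under review do: reinterpret the float's bits as a 32-bit integer through the `FloatInt` union, then copy that integer into every lane. The names `vec4_i32`, `broadcast_i32`, `broadcast_f32` and `lane_as_float` are hypothetical stand-ins invented for this sketch. They are not part of ggml or of the LSX intrinsics, and the four-element array only imitates a 128-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Same union as in the PR: view one 32-bit value as int or float. */
typedef union
{
    int32_t i;
    float f;
} FloatInt;

/* Hypothetical scalar stand-in for a 128-bit register of four 32-bit lanes. */
typedef struct
{
    int32_t lane[4];
} vec4_i32;

/* Scalar model of an integer broadcast such as __lsx_vreplgr2vr_w. */
static inline vec4_i32 broadcast_i32(int32_t x)
{
    vec4_i32 v = {{x, x, x, x}};
    return v;
}

/* Scalar model of the PR's float broadcast: the bits are reinterpreted,
 * not numerically converted, before being copied into every lane. */
static inline vec4_i32 broadcast_f32(float val)
{
    FloatInt fi_tmpval = {.f = val};
    return broadcast_i32(fi_tmpval.i);
}

/* Read one lane back as a float, again by reinterpreting the bits. */
static inline float lane_as_float(vec4_i32 v, int idx)
{
    assert(idx >= 0 && idx < 4);
    FloatInt fi_tmpval = {.i = v.lane[idx]};
    return fi_tmpval.f;
}
```

Because the helper is this small and has no dependency on the quantization code, keeping a single copy in a shared header is cheap, which is the point of the deduplication request.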
Btw, for long-term support it would be very useful to add CI for this arch. If there is someone who can donate a machine we can deploy

We have LoongArch machines available for remote connection. Can we use them for CI?

Great! If you could spare a machine, we can add it as a node to the ggml-ci fleet. The easiest way would be to give me SSH access so I can log in and configure it. If that is possible, send me an email and we can set it up.

I apologize for the late reply. We are checking with the colleagues responsible for this and should have it ready within the next week.
* add loongarch lsx and lasx optimize code
* Add loongarch compilation support to makefile
* revert stb_image.h
* opt bytes_from_nibbles_32 and sum_i16_pairs_float
* fix undeclared
* format code
* update
* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>




Description
Hello, we (@lixing-star @MQ-mengqing) are developers on the Loongson team.
We have added 128-bit (LSX) and 256-bit (LASX) vector-optimized code for the LoongArch architecture.
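As a minimal sketch of how code can choose between the two vector widths at compile time, the function below checks the `__loongarch_asx` and `__loongarch_sx` macros, which GCC defines when LASX and LSX are enabled. The function name `loongarch_simd_width_bits` is hypothetical and is not taken from this PR. On any other target it reports 0, meaning a scalar fallback would be used.

```c
/* Width in bits of the widest LoongArch SIMD extension enabled at
 * compile time, or 0 when neither LSX nor LASX is available. */
static inline int loongarch_simd_width_bits(void)
{
#if defined(__loongarch_asx)
    return 256; /* LASX: 256-bit vectors */
#elif defined(__loongarch_sx)
    return 128; /* LSX: 128-bit vectors */
#else
    return 0;   /* no LoongArch SIMD: use the scalar path */
#endif
}
```

LASX is checked first because a build with LASX enabled should prefer the wider path.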
test-quantize-fns
benchmark
LoongArch Documents