add loongarch lsx and lasx optimize code by junchao-loongson · Pull Request #6454 · ggml-org/llama.cpp

junchao-loongson · 2024-04-03T07:54:35Z

Description

Hello, we (@lixing-star @MQ-mengqing) are developers on the Loongson team.

We have added 128-bit (LSX) and 256-bit (LASX) vector-optimized code paths for the LoongArch architecture.
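
As a rough illustration of how such code paths are typically selected at compile time, the sketch below checks the compiler-defined macros `__loongarch_asx` and `__loongarch_sx` and falls back to scalar code otherwise. The function name `simd_width_bits` is hypothetical and not taken from this PR.

```c
/* Hypothetical helper (not from the PR): reports which LoongArch vector
 * width the compiler was allowed to target. GCC defines __loongarch_asx
 * when LASX is enabled (-mlasx) and __loongarch_sx when LSX is enabled
 * (-mlsx). LASX is checked first because enabling it also enables LSX. */
static inline int simd_width_bits(void) {
#if defined(__loongarch_asx)
    return 256;  /* LASX: 256-bit vectors */
#elif defined(__loongarch_sx)
    return 128;  /* LSX: 128-bit vectors */
#else
    return 0;    /* no LoongArch SIMD enabled: scalar fallback */
#endif
}
```

On a build without either extension the function returns 0, which is the case where the generic scalar implementation would be used.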

test-quantize-fns

./bin/test-quantize-fns
Testing f32
Testing f16
Testing q4_0
Testing q4_1
Testing q5_0
Testing q5_1
Testing q8_0
Testing q8_1
Testing q2_K
Testing q3_K
Testing q4_K
Testing q5_K
Testing q6_K
Testing q8_K
Testing iq2_xxs
Testing iq2_xs
Testing iq3_xxs
Testing iq1_s
Testing iq4_nl
Testing iq3_s
Testing iq2_s
Testing iq4_xs
Testing i8
Testing i16
Testing i32
Testing i64
Testing f64
Testing iq1_m

benchmark

3A5000

CPU: 
    Loongson-3A5000-HV
uname -a:  
    Linux 5a2k 4.19.0-19-loongson-3 #1 SMP 4.19.190.8.14 Thu Aug 24 08:54:20 UTC 2023 loongarch64 loongarch64 loongarch64 GNU/Linux

./build/bin/benchmark 
main: build = 2606 (e70d50e8)
main: built with cc (Loongnix 8.3.0-6.lnd.vec.37) 8.3.0 for loongarch64-linux-gnu
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
Creating new tensors

------ Test 1 - Matrix Mult via F32 code
n_threads=1
            m11: type = 0 (  f32) ne = 11008 x  4096 x     1, nb = (    4, 44032, 180355072) - Sum of tensor m11 is 45088768.00
             m2: type = 0 (  f32) ne = 11008 x   128 x     1, nb = (    4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
   gf->nodes[0]: type = 0 (  f32) ne =  4096 x   128 x     1, nb = (    4, 16384, 2097152) - Sum of tensor gf->nodes[0] is 11542724608.00

------ Test 2 - Matrix Mult via q4_1 code
n_threads=1
Matrix Multiplication of (11008,4096,1) x (11008,128,1) - about  11.54 gFLOPS

Iteration;NThreads; SizeX; SizeY; SizeZ; Required_FLOPS; Elapsed_u_Seconds; gigaFLOPS
=====================================================================================
        0;       1; 11008;  4096;   128;    11542724608;            760593;     15.18
        1;       1; 11008;  4096;   128;    11542724608;            758773;     15.21
        2;       1; 11008;  4096;   128;    11542724608;            758563;     15.22
        3;       1; 11008;  4096;   128;    11542724608;            759198;     15.20
        4;       1; 11008;  4096;   128;    11542724608;            758189;     15.22
        5;       1; 11008;  4096;   128;    11542724608;            759360;     15.20
        6;       1; 11008;  4096;   128;    11542724608;            760177;     15.18
        7;       1; 11008;  4096;   128;    11542724608;            757374;     15.24
        8;       1; 11008;  4096;   128;    11542724608;            757833;     15.23
        9;       1; 11008;  4096;   128;    11542724608;            757848;     15.23

Average                                                                         15.21
=====================================================================================

3A6000

CPU: 
    Loongson-3A6000
uname -a:  
    Linux arch6k 6.7.0-rc2-2 #1 SMP PREEMPT Mon, 27 Nov 2023 08:42:49 +0000 loongarch64 GNU/Linux

./bin/benchmark
main: build = 2590 (849cb13)
main: built with cc (GCC) 13.2.1 20230906 for loongarch64-unknown-linux-gnu
Starting Test
Allocating Memory of size 800194560 bytes, 763 MB
Creating new tensors

------ Test 1 - Matrix Mult via F32 code
n_threads=1
            m11: type = 0 (  f32) ne = 11008 x  4096 x     1, nb = (    4, 44032, 180355072) - Sum of tensor m11 is 45088768.00
             m2: type = 0 (  f32) ne = 11008 x   128 x     1, nb = (    4, 44032, 5636096) - Sum of tensor m2 is 2818048.00
   gf->nodes[0]: type = 0 (  f32) ne =  4096 x   128 x     1, nb = (    4, 16384, 2097152) - Sum of tensor gf->nodes[0] is 11542724608.00

------ Test 2 - Matrix Mult via q4_1 code
n_threads=1
Matrix Multiplication of (11008,4096,1) x (11008,128,1) - about  11.54 gFLOPS

Iteration;NThreads; SizeX; SizeY; SizeZ; Required_FLOPS; Elapsed_u_Seconds; gigaFLOPS
=====================================================================================
        0;       1; 11008;  4096;   128;    11542724608;            502525;     22.97
        1;       1; 11008;  4096;   128;    11542724608;            502258;     22.98
        2;       1; 11008;  4096;   128;    11542724608;            502188;     22.98
        3;       1; 11008;  4096;   128;    11542724608;            502212;     22.98
        4;       1; 11008;  4096;   128;    11542724608;            502231;     22.98
        5;       1; 11008;  4096;   128;    11542724608;            502297;     22.98
        6;       1; 11008;  4096;   128;    11542724608;            502201;     22.98
        7;       1; 11008;  4096;   128;    11542724608;            502202;     22.98
        8;       1; 11008;  4096;   128;    11542724608;            502271;     22.98
        9;       1; 11008;  4096;   128;    11542724608;            502237;     22.98

Average                                                                         22.98
=====================================================================================
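
The gigaFLOPS column in both tables appears consistent with a simple calculation from the other columns: two operations (one multiply, one add) per inner-product term, divided by the elapsed time. The sketch below reproduces that arithmetic; the function names `matmul_flops` and `gigaflops` are hypothetical and are not the benchmark's own code.

```c
#include <stdint.h>

/* Operation count for multiplying an (n x rows) matrix by an (n x cols)
 * matrix: each of the rows * cols outputs needs n multiplies and n adds. */
static inline uint64_t matmul_flops(uint64_t n, uint64_t rows, uint64_t cols) {
    return 2 * n * rows * cols;
}

/* Operations per microsecond equal MFLOPS; divide by 1000 for GFLOPS. */
static inline double gigaflops(uint64_t flops, uint64_t elapsed_us) {
    return (double)flops / (double)elapsed_us / 1000.0;
}
```

With the sizes from the log, `matmul_flops(11008, 4096, 128)` gives 11542724608, matching the Required_FLOPS column. Dividing by 760593 µs (3A5000, iteration 0) gives about 15.18, and by 502525 µs (3A6000, iteration 0) about 22.97, in line with the reported values.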

LoongArch Documents

ggerganov · 2024-04-08T11:54:25Z

@junchao-loongson Thanks for this PR. Just a heads up I will only be able to get to reviewing this after #6412 and #6414, so it can take me some time - sorry about that. In the meantime feel free to continue review with other devs

ggerganov · 2024-05-17T13:09:03Z

Let's resolve the conflicts from the recent __POWER9_VECTOR__ changes and look to merge

junchao-loongson · 2024-05-18T02:33:29Z

Okay, I rebased the code.

junchao-loongson · 2024-05-18T03:14:48Z

test ok

github-actions · 2024-05-18T03:58:23Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 531 iterations 🚀

Expand details for performance related PR only

Concurrent users: 8, duration: 10m
HTTP request : avg=8790.8ms p(95)=22532.69ms fails=, finish reason: stop=477 truncated=54
Prompt processing (pp): avg=113.24tk/s p(95)=523.32tk/s
Token generation (tg): avg=32.7tk/s p(95)=50.33tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=ee26b8ff10565458599dabdfaf41f65c2c313060

[Four time-series charts omitted. Common title: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 531 iterations"; x-axis: Unix time 1716171353 --> 1716171979. Plotted metrics: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing.]

ggerganov

I don't suppose GitHub Actions supports this architecture, but if it does, it would be nice to add a CI workflow.

Have you done some inference/perplexity runs to make sure the generation looks fine?

ggerganov · 2024-05-19T06:59:48Z

+typedef union
+{
+    int32_t i;
+    float f;
+} FloatInt;
+/* float type data load instructions */
+static __m128 __lsx_vreplfr2vr_s(float val)
+{
+    FloatInt fi_tmpval = {.f = val};
+    return (__m128)__lsx_vreplgr2vr_w(fi_tmpval.i);
+}
+
+static __m256 __lasx_xvreplfr2vr_s(float val)
+{
+    FloatInt fi_tmpval = {.f = val};
+    return (__m256)__lasx_xvreplgr2vr_w(fi_tmpval.i);
+}


Deduplicate this code by moving it into ggml-impl.h and reusing it in ggml.c and ggml-quants.c

I was thinking to just deduplicate the __lsx_vreplfr2vr_s and __lasx_xvreplfr2vr_s code. The rest of the lsx/lasx code that is used only inside ggml-quants.c should remain in ggml-quants.c
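
For readers unfamiliar with the pattern in the diff above: the helpers reinterpret a float's bits as a 32-bit integer through a union, because the value is then passed to an integer broadcast intrinsic (`__lsx_vreplgr2vr_w` / `__lasx_xvreplgr2vr_w`). The sketch below shows only the portable part of that idea, as it might look when defined once in a shared header. `float_to_bits` and `broadcast_word4` are hypothetical names, and the loop is a scalar stand-in for the intrinsic, not the PR's implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef union {
    int32_t i;
    float   f;
} FloatInt;

/* The union trick assumes both members have the same size. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Return the raw bit pattern of val, as the helpers in the diff do
 * before handing the value to an integer broadcast intrinsic. */
static inline int32_t float_to_bits(float val) {
    FloatInt fi = { .f = val };
    return fi.i;
}

/* Hypothetical scalar stand-in for a 128-bit word broadcast:
 * copy one 32-bit word into each of four lanes. */
static inline void broadcast_word4(int32_t word, int32_t lanes[4]) {
    for (int k = 0; k < 4; ++k) {
        lanes[k] = word;
    }
}
```

Keeping a single `static inline` definition in one header avoids maintaining two copies that could drift apart, which is the concern behind the review request.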

ggerganov · 2024-05-19T09:21:43Z

Btw, for long-term support it would be very useful to add CI for this arch. If there is someone who can donate a machine we can deploy ggml-ci on it and have it run tests on each commit. Without CI, the code can quickly get outdated and break

junchao-loongson · 2024-05-20T02:16:13Z

We have loongarch architecture machines available for remote connection, can we use them as ci?

ggerganov · 2024-05-20T07:21:43Z

We have loongarch architecture machines available for remote connection, can we use them as ci?

Great! If you could spare a machine, we can add it as a node to the ggml-ci fleet. The easiest way would be to give me SSH access so I can log in and configure it. If that is possible, send me an email and we can set it up.

junchao-loongson · 2024-05-24T01:38:28Z

I apologize for the late reply. We are in the process of checking in with our colleagues who are responsible for this matter and should have it ready within the next week.

* add loongarch lsx and lasx optimize code
* Add loongarch compilation support to makefile
* revert stb_image.h
* opt bytes_from_nibbles_32 and sum_i16_pairs_float
* fix undeclared
* format code
* update
* update 2

---------

Co-authored-by: Jinyang He <hejinyang@loongson.cn>

cebtenzzre reviewed Apr 4, 2024

Comment thread CMakeLists.txt Outdated

Comment thread common/stb_image.h Outdated

junchao-loongson requested a review from cebtenzzre April 8, 2024 07:45

ggerganov mentioned this pull request Apr 10, 2024

Improve cpu prompt eval speed #6414

Merged

mofosyne added the labels "performance" (Speed related topics) and "Review Complexity : High" (Generally require indepth knowledge of LLMs or GPUs) May 10, 2024

cebtenzzre removed their request for review May 10, 2024 15:44

junchao-loongson and others added 4 commits May 18, 2024 10:28

add loongarch lsx and lasx optimize code

ee42f24

Add loongarch compilation support to makefile

a719e98

revert stb_image.h

4cfd8b9

opt bytes_from_nibbles_32 and sum_i16_pairs_float

e8ed670

junchao-loongson force-pushed the master branch from 2cb9174 to e8ed670 Compare May 18, 2024 02:30

fix undeclared

fdef762

ggerganov reviewed May 18, 2024

Comment thread ggml-quants.c Outdated

Comment thread ggml.c Outdated

format code

3b6199b

github-actions Bot added the labels "build" (Compilation issues) and "ggml" (changes relating to the ggml tensor library for machine learning) May 18, 2024

junchao-loongson requested a review from ggerganov May 19, 2024 06:51

ggerganov reviewed May 19, 2024

Comment thread CMakeLists.txt Outdated

ggerganov reviewed May 19, 2024

update

8a0d9a3

update 2

ee26b8f

ggerganov approved these changes May 20, 2024

ggerganov merged commit 65c5820 into ggml-org:master May 20, 2024

HougeLangley mentioned this pull request May 21, 2024

Please support LoongArch ISA ollama/ollama#4552

Open

xen0n mentioned this pull request May 26, 2024

Content suggestion for This Week in LoongArch newsletter / 《每周一龙》新闻线索信箱 loongson-community/areweloongyet#16

Open

HougeLangley mentioned this pull request Jun 15, 2024

Add LoongArch64 ISA Support ollama/ollama#5067

Closed

wojiushixiaobai mentioned this pull request May 31, 2025

ci: add LoongArch cross-compile build #13944

Merged


5 participants
