Add Q4_3 support to cuBLAS #1086

Merged
slaren merged 1 commit into ggml-org:master from slaren:cuda-q4_3
Apr 20, 2023

slaren commented Apr 20, 2023

Also changed the Makefile to link against the CUDA dynamic libraries. Linking is much faster that way, and there is no reason to link statically for local use.

slaren merged commit 2005469 into ggml-org:master Apr 20, 2023
slaren deleted the cuda-q4_3 branch April 20, 2023 18:59

slaren commented Apr 20, 2023

7B q4_3 perplexity with cuBLAS: 6.0617

Details

main: seed = 1682015944
llama.cpp: loading model from models/7B/ggml-model-q4_3.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 6 (mostly Q4_3)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.65 seconds per pass - ETA 1.03 hours
[1]4.3508,[2]4.7736,[3]5.6662,[4]6.2864,[5]6.4227,[6]6.3703,[7]6.5471,[8]6.6450,[9]6.9846,[10]7.2508,[11]7.4526,[12]7.4782,[13]7.3964,[14]7.4641,[15]7.7127,[16]7.3279,[17]7.2089,[18]7.1596,[19]6.8043,[20]6.7911,[21]6.6988,[22]6.5289,[23]6.5003,[24]6.4051,[25]6.4152,[26]6.2542,[27]6.0810,[28]5.9814,[29]5.8920,[30]5.7339,[31]5.7033,[32]5.7208,[33]5.6628,[34]5.6958,[35]5.7176,[36]5.7589,[37]5.7625,[38]5.7697,[39]5.8023,[40]5.8532,[41]5.8655,[42]5.9053,[43]5.8678,[44]5.9249,[45]5.9253,[46]5.8997,[47]5.9190,[48]5.8934,[49]5.8921,[50]5.8504,[51]5.8450,[52]5.8340,[53]5.8786,[54]5.8589,[55]5.8354,[56]5.8628,[57]5.8831,[58]5.9029,[59]5.9213,[60]5.9619,[61]5.9535,[62]6.0125,[63]6.0403,[64]6.0535,[65]6.0964,[66]6.1040,[67]6.1232,[68]6.1375,[69]6.1625,[70]6.1932,[71]6.2158,[72]6.2465,[73]6.3047,[74]6.3088,[75]6.3243,[76]6.3367,[77]6.3494,[78]6.3360,[79]6.3622,[80]6.3551,[81]6.3681,[82]6.3726,[83]6.3218,[84]6.3046,[85]6.2919,[86]6.2698,[87]6.2102,[88]6.1858,[89]6.1659,[90]6.1492,[91]6.1729,[92]6.1674,[93]6.1685,[94]6.1660,[95]6.1934,[96]6.1921,[97]6.1884,[98]6.1818,[99]6.1686,[100]6.1689,[101]6.1924,[102]6.1878,[103]6.2070,[104]6.2139,[105]6.2127,[106]6.2300,[107]6.2305,[108]6.2437,[109]6.2370,[110]6.2308,[111]6.2521,[112]6.2721,[113]6.2745,[114]6.2706,[115]6.2764,[116]6.2675,[117]6.2732,[118]6.3004,[119]6.3223,[120]6.3569,[121]6.3717,[122]6.3952,[123]6.4325,[124]6.4500,[125]6.4400,[126]6.4794,[127]6.5153,[128]6.5454,[129]6.5295,[130]6.5373,[131]6.5317,[132]6.5242,[133]6.5118,[134]6.5214,[135]6.5170,[136]6.5053,[137]6.4969,[138]6.4791,[139]6.4690,[140]6.4653,[141]6.4379,[142]6.4332,[143]6.4042,[144]6.3833,[145]6.3751,[146]6.3640,[147]6.3670,[148]6.3670,[149]6.3619,[150]6.3581,[151]6.3602,[152]6.3507,[153]6.3348,[154]6.3266,[155]6.3332,[156]6.3287,[157]6.3453,[158]6.3493,[159]6.3545,[160]6.3568,[161]6.3683,[162]6.3404,[163]6.3282,[164]6.3047,[165]6.2742,[166]6.2474,[167]6.2103,[168]6.1802,[169]6.1658,[170]6.1544,[171]6.1274,[172]6.1095,[173]6.0937,[174]6.0644,[175]6.0429,[176]6.0309,[177]6.0117,[178]5.9891,[179]5.9722,[180]5.9629,[181]5.9418,[182]5.9239,[183]5.9101,[184]5.9087,[185]5.9011,[186]5.9014,[187]5.9079,[188]5.9040,[189]5.9216,[190]5.9226,[191]5.9437,[192]5.9593,[193]5.9758,[194]5.9870,[195]6.0082,[196]6.0240,[197]6.0442,[198]6.0589,[199]6.0623,[200]6.0667,[201]6.0619,[202]6.0804,[203]6.0877,[204]6.0862,[205]6.0971,[206]6.1038,[207]6.0998,[208]6.1083,[209]6.1123,[210]6.1173,[211]6.1272,[212]6.1341,[213]6.1443,[214]6.1468,[215]6.1489,[216]6.1636,[217]6.1807,[218]6.1940,[219]6.1937,[220]6.1898,[221]6.1845,[222]6.1826,[223]6.1738,[224]6.1672,[225]6.1636,[226]6.1836,[227]6.1922,[228]6.1972,[229]6.2034,[230]6.2003,[231]6.2163,[232]6.2049,[233]6.1881,[234]6.1737,[235]6.1548,[236]6.1483,[237]6.1385,[238]6.1405,[239]6.1261,[240]6.1158,[241]6.1175,[242]6.1211,[243]6.1193,[244]6.1085,[245]6.1052,[246]6.0943,[247]6.0828,[248]6.0762,[249]6.0737,[250]6.0782,[251]6.0713,[252]6.0677,[253]6.0581,[254]6.0524,[255]6.0405,[256]6.0225,[257]6.0106,[258]6.0025,[259]6.0002,[260]5.9921,[261]5.9880,[262]5.9823,[263]5.9770,[264]5.9585,[265]5.9581,[266]5.9564,[267]5.9498,[268]5.9584,[269]5.9570,[270]5.9575,[271]5.9655,[272]5.9693,[273]5.9692,[274]5.9715,[275]5.9798,[276]5.9858,[277]6.0012,[278]6.0115,[279]6.0208,[280]6.0234,[281]6.0338,[282]6.0396,[283]6.0546,[284]6.0629,[285]6.0712,[286]6.0842,[287]6.0843,[288]6.0899,[289]6.0817,[290]6.0667,[291]6.0517,[292]6.0369,[293]6.0240,[294]6.0261,[295]6.0251,[296]6.0296,[297]6.0280,[298]6.0312,[299]6.0286,[300]6.0177,[301]6.0176,[302]6.0096,[303]6.0007,[304]5.9919,[305]5.9884,[306]5.9764,[307]5.9785,[308]5.9813,[309]5.9656,[310]5.9602,[311]5.9537,[312]5.9559,[313]5.9501,[314]5.9487,[315]5.9329,[316]5.9278,[317]5.9121,[318]5.8926,[319]5.9047,[320]5.9169,[321]5.9211,[322]5.9170,[323]5.9105,[324]5.9076,[325]5.9178,[326]5.9179,[327]5.9201,[328]5.9238,[329]5.9298,[330]5.9332,[331]5.9455,[332]5.9429,[333]5.9499,[334]5.9447,[335]5.9389,[336]5.9426,[337]5.9404,[338]5.9398,[339]5.9350,[340]5.9308,[341]5.9389,[342]5.9418,[343]5.9461,[344]5.9465,[345]5.9471,[346]5.9448,[347]5.9487,[348]5.9521,[349]5.9545,[350]5.9511,[351]5.9518,[352]5.9523,[353]5.9463,[354]5.9476,[355]5.9528,[356]5.9561,[357]5.9526,[358]5.9620,[359]5.9645,[360]5.9613,[361]5.9611,[362]5.9679,[363]5.9788,[364]5.9850,[365]5.9900,[366]5.9913,[367]5.9996,[368]5.9970,[369]5.9978,[370]5.9996,[371]5.9943,[372]5.9991,[373]6.0038,[374]6.0023,[375]6.0024,[376]6.0089,[377]6.0040,[378]6.0068,[379]6.0129,[380]6.0055,[381]6.0024,[382]5.9973,[383]5.9965,[384]5.9961,[385]5.9949,[386]5.9946,[387]5.9943,[388]5.9910,[389]5.9860,[390]5.9792,[391]5.9716,[392]5.9678,[393]5.9660,[394]5.9689,[395]5.9676,[396]5.9600,[397]5.9668,[398]5.9712,[399]5.9791,[400]5.9790,[401]5.9803,[402]5.9812,[403]5.9831,[404]5.9893,[405]5.9802,[406]5.9771,[407]5.9765,[408]5.9782,[409]5.9896,[410]6.0009,[411]6.0122,[412]6.0279,[413]6.0388,[414]6.0465,[415]6.0520,[416]6.0600,[417]6.0719,[418]6.0754,[419]6.0824,[420]6.0911,[421]6.1027,[422]6.1063,[423]6.1133,[424]6.1237,[425]6.1329,[426]6.1392,[427]6.1437,[428]6.1517,[429]6.1569,[430]6.1651,[431]6.1789,[432]6.1825,[433]6.1816,[434]6.1774,[435]6.1783,[436]6.1807,[437]6.1904,[438]6.1979,[439]6.1947,[440]6.1936,[441]6.1887,[442]6.1875,[443]6.1887,[444]6.1895,[445]6.1874,[446]6.1898,[447]6.1928,[448]6.1966,[449]6.1941,[450]6.1948,[451]6.1908,[452]6.1781,[453]6.1700,[454]6.1645,[455]6.1652,[456]6.1699,[457]6.1716,[458]6.1697,[459]6.1704,[460]6.1789,[461]6.1763,[462]6.1750,[463]6.1785,[464]6.1774,[465]6.1748,[466]6.1672,[467]6.1679,[468]6.1676,[469]6.1697,[470]6.1701,[471]6.1654,[472]6.1699,[473]6.1645,[474]6.1658,[475]6.1597,[476]6.1613,[477]6.1543,[478]6.1535,[479]6.1596,[480]6.1640,[481]6.1656,[482]6.1613,[483]6.1571,[484]6.1590,[485]6.1572,[486]6.1515,[487]6.1514,[488]6.1491,[489]6.1444,[490]6.1421,[491]6.1394,[492]6.1339,[493]6.1310,[494]6.1291,[495]6.1287,[496]6.1251,[497]6.1196,[498]6.1181,[499]6.1137,[500]6.1044,[501]6.0980,[502]6.0981,[503]6.0974,[504]6.0887,[505]6.0904,[506]6.0914,[507]6.0861,[508]6.0822,[509]6.0816,
[510]6.0849,[511]6.0896,[512]6.0930,[513]6.0952,[514]6.1015,[515]6.0960,[516]6.0952,[517]6.0961,[518]6.0956,[519]6.0985,[520]6.1008,[521]6.1021,[522]6.1049,[523]6.1056,[524]6.1113,[525]6.1145,[526]6.1155,[527]6.1172,[528]6.1121,[529]6.1130,[530]6.1078,[531]6.1064,[532]6.1111,[533]6.1135,[534]6.1117,[535]6.1137,[536]6.1085,[537]6.1062,[538]6.1114,[539]6.1124,[540]6.1159,[541]6.1162,[542]6.1173,[543]6.1188,[544]6.1197,[545]6.1178,[546]6.1188,[547]6.1148,[548]6.1097,[549]6.1099,[550]6.1069,[551]6.1035,[552]6.1013,[553]6.0975,[554]6.0953,[555]6.0921,[556]6.0914,[557]6.0939,[558]6.0902,[559]6.0900,[560]6.0898,[561]6.0902,[562]6.0880,[563]6.0878,[564]6.0921,[565]6.0942,[566]6.0942,[567]6.0920,[568]6.0928,[569]6.0913,[570]6.0941,[571]6.0942,[572]6.0950,[573]6.0947,[574]6.0912,[575]6.0907,[576]6.0906,[577]6.0890,[578]6.0871,[579]6.0876,[580]6.0811,[581]6.0773,[582]6.0764,[583]6.0772,[584]6.0774,[585]6.0699,[586]6.0630,[587]6.0637,[588]6.0683,[589]6.0738,[590]6.0765,[591]6.0788,[592]6.0776,[593]6.0746,[594]6.0755,[595]6.0731,[596]6.0765,[597]6.0743,[598]6.0713,[599]6.0735,[600]6.0727,[601]6.0713,[602]6.0727,[603]6.0754,[604]6.0762,[605]6.0798,[606]6.0821,[607]6.0805,[608]6.0773,[609]6.0782,[610]6.0817,[611]6.0800,[612]6.0825,[613]6.0788,[614]6.0740,[615]6.0667,[616]6.0693,[617]6.0633,[618]6.0586,[619]6.0530,[620]6.0394,[621]6.0326,[622]6.0310,[623]6.0324,[624]6.0328,[625]6.0327,[626]6.0315,[627]6.0339,[628]6.0341,[629]6.0340,[630]6.0372,[631]6.0428,[632]6.0487,[633]6.0472,[634]6.0506,[635]6.0513,[636]6.0478,[637]6.0443,[638]6.0469,[639]6.0438,[640]6.0447,[641]6.0449,[642]6.0515,[643]6.0537,[644]6.0548,[645]6.0529,[646]6.0570,[647]6.0529,[648]6.0539,[649]6.0543,[650]6.0581,[651]6.0635,[652]6.0646,[653]6.0685,[654]6.0622,[655]6.0617,

llama_print_timings: load time = 9033.50 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3328253.06 ms / 335360 tokens ( 9.92 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3366921.57 ms
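
The figures in this log can be cross-checked with a few lines of Python. This is only a sketch for reading the output above, not part of llama.cpp: the helper names `parse_running_perplexity` and `summarize` are hypothetical, and it assumes each `[i]value` entry is the running perplexity after chunk `i`, so the last entry is the reported result.

```python
import re

# Figures copied from the log above.
N_CHUNKS = 655
BATCH_SIZE = 512
PROMPT_EVAL_MS = 3328253.06
TOTAL_MS = 3366921.57


def parse_running_perplexity(text):
    """Return [(chunk_index, perplexity), ...] from '[i]value,' entries.

    Hypothetical helper: assumes the '[i]value' format printed in the log.
    """
    pairs = re.findall(r"\[(\d+)\](\d+\.\d+)", text)
    return [(int(i), float(v)) for i, v in pairs]


def summarize(n_chunks, batch_size, prompt_eval_ms, total_ms):
    """Derive token count and throughput figures from the timing lines."""
    tokens = n_chunks * batch_size
    return {
        "tokens": tokens,
        "ms_per_token": prompt_eval_ms / tokens,
        "s_per_chunk": prompt_eval_ms / n_chunks / 1000.0,
        "total_minutes": total_ms / 60000.0,
    }


if __name__ == "__main__":
    # Tail of the perplexity line from the log above.
    tail = "[653]6.0685,[654]6.0622,[655]6.0617,"
    print(parse_running_perplexity(tail)[-1])
    for key, value in summarize(
        N_CHUNKS, BATCH_SIZE, PROMPT_EVAL_MS, TOTAL_MS
    ).items():
        print(key, round(value, 2))
```

With these inputs, 655 chunks of 512 tokens give the 335360 tokens shown in the prompt eval line, and the measured time works out to roughly 5.08 seconds per chunk, somewhat faster than the 5.65 seconds per pass estimated at the start of the run.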

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* Adding fused_norm - same idea as fused_rms_norm

* Avoid computing the attention reduce op for cohere2

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>