ggml-cpu: optimize ggml_vec_dot_bf16 for s390x #19399

Merged
taronaeo merged 2 commits into ggml-org:master from taronaeo:feat/s390x-bf16
Feb 15, 2026
@taronaeo (Member) commented Feb 6, 2026

Similar to #18837, this pull request adds an s390x SIMD implementation of the BF16 dot product (ggml_vec_dot_bf16). Prompt Processing (pp512) throughput rises from 0.29 t/s to 2.28 t/s, roughly a 7.9× speedup, reported here as a 154.86% performance difference. Token Generation (tg128) is unchanged at 0.03 t/s.
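For readers unfamiliar with the operation being vectorized, here is a minimal scalar sketch of a BF16 dot product. It is not the code from this PR and does not use s390x vector instructions. The names `bf16_bits`, `bf16_to_f32`, `f32_to_bf16_trunc` and `vec_dot_bf16_ref` are hypothetical and are not the ggml API. The sketch relies only on the fact that a BF16 value is the upper 16 bits of an IEEE-754 binary32 value; accumulating in `double` is a choice made for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for illustration: raw BF16 bit pattern. */
typedef uint16_t bf16_bits;

/* Widen BF16 to FP32: BF16 is the top 16 bits of an IEEE-754 binary32,
 * so shifting left by 16 reproduces the float exactly. */
static float bf16_to_f32(bf16_bits h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);   /* bit copy, avoids aliasing issues */
    return f;
}

/* Truncating FP32 -> BF16 conversion (drops the low 16 bits, no rounding).
 * Only meant for building small test inputs. */
static bf16_bits f32_to_bf16_trunc(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (bf16_bits)(u >> 16);
}

/* Scalar reference: widen each element pair, multiply, accumulate.
 * A SIMD version would process several elements per iteration and
 * reduce the partial sums at the end. */
static float vec_dot_bf16_ref(int n, const bf16_bits *x, const bf16_bits *y) {
    assert(n >= 0);
    double sum = 0.0;           /* wider accumulator (assumption of this sketch) */
    for (int i = 0; i < n; ++i) {
        sum += (double)bf16_to_f32(x[i]) * (double)bf16_to_f32(y[i]);
    }
    return (float)sum;
}
```

A scalar reference like this is mainly useful as a baseline to compare a vectorized kernel against, which is the kind of comparison test-quantize-fns performs for the real implementation.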

Before SIMD Benchmark

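| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| granite 3B BF16 | 4.72 GiB | 2.53 B | CPU | 1 | 1 | pp512 | 0.29 ± 0.00 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | CPU | 1 | 1 | tg128 | 0.03 ± 0.00 |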

build: 423cf0b (8029)

After SIMD Benchmark

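| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| granite 3B BF16 | 4.72 GiB | 2.53 B | CPU | 1 | 1 | pp512 | 2.28 ± 0.20 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | CPU | 1 | 1 | tg128 | 0.03 ± 0.00 |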

build: b4bc24e (8004)
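The pp512 figures can be compared in more than one way. The sketch below computes three common measures from the two benchmark values (0.29 t/s and 2.28 t/s). The function names are hypothetical. That the 154.86% quoted in the description matches the symmetric percentage difference is an inference from the arithmetic, not something the author states.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only. */

/* Speedup as a plain ratio: after / before. */
static double speedup_ratio(double before, double after) {
    assert(before > 0.0);
    return after / before;
}

/* Relative increase over the baseline, in percent. */
static double relative_increase_pct(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Symmetric percentage difference: |a - b| divided by the mean of a and b. */
static double symmetric_diff_pct(double before, double after) {
    double diff = after > before ? after - before : before - after;
    double mean = (after + before) / 2.0;
    assert(mean > 0.0);
    return diff / mean * 100.0;
}
```

With before = 0.29 and after = 2.28, the ratio is about 7.86, the relative increase about 686%, and the symmetric difference about 154.86%.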

Verification

This PR was tested against the IBM Granite 3.3 2B Instruct BF16 model.

test-quantize-fns

```
$ build/bin/test-quantize-fns
Testing f32
Testing f16
Testing q4_0
Testing q4_1
Testing q5_0
Testing q5_1
Testing q8_0
Testing q8_1
Testing q2_K
Testing q3_K
Testing q4_K
Testing q5_K
Testing q6_K
Testing q8_K
Testing iq2_xxs
Testing iq2_xs
Testing iq3_xxs
Testing iq1_s
Testing iq4_nl
Testing iq3_s
Testing iq2_s
Testing iq4_xs
Testing i8
Testing i16
Testing i32
Testing i64
Testing f64
Testing iq1_m
Testing bf16
Testing tq1_0
Testing tq2_0
Testing mxfp4
```

@taronaeo taronaeo requested a review from ggerganov as a code owner February 6, 2026 17:59
@github-actions github-actions Bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 6, 2026
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
@taronaeo (Member, Author) commented

@CISC I'm not sure who to ping to review this... Any chance you could review it? :)

@CISC (Member) left a comment

I can't verify this, but it looks good in general and CI passes.

@taronaeo taronaeo merged commit 184c694 into ggml-org:master Feb 15, 2026
77 of 78 checks passed