ggml-cpu: optimize ggml_vec_dot_bf16 for s390x by taronaeo · Pull Request #19399 · ggml-org/llama.cpp

taronaeo · 2026-02-06T17:59:13Z

Similar to #18837, this pull request uses the s390x SIMD instructions to accelerate ggml_vec_dot_bf16. In the benchmarks below, Prompt Processing (pp512) throughput rises from 0.29 t/s to 2.28 t/s, roughly a 7.9x speedup. Token Generation (tg128) is unchanged at 0.03 t/s.
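For readers unfamiliar with the routine being optimized, here is a minimal scalar sketch of a BF16 dot product. It is not the code from this PR. It relies only on the fact that a BF16 value is the upper 16 bits of an IEEE-754 binary32 float, so widening is a 16-bit left shift. The names `bf16_sketch_t`, `bf16_sketch_to_f32`, and `vec_dot_bf16_sketch` are hypothetical and do not come from ggml, and the double accumulator is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for illustration: a BF16 value held as its raw 16 bits. */
typedef struct { uint16_t bits; } bf16_sketch_t;

/* BF16 is the top half of a binary32 float, so widening is a left shift. */
static float bf16_sketch_to_f32(bf16_sketch_t h) {
    uint32_t u = (uint32_t) h.bits << 16;
    float f;
    memcpy(&f, &u, sizeof f); /* bit-cast without violating aliasing rules */
    return f;
}

/* Scalar reference: widen each pair to f32, multiply, accumulate.
 * The accumulator type (double) is an assumption of this sketch. */
static float vec_dot_bf16_sketch(size_t n, const bf16_sketch_t *x, const bf16_sketch_t *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double) bf16_sketch_to_f32(x[i]) * (double) bf16_sketch_to_f32(y[i]);
    }
    return (float) sum;
}
```

A SIMD version of such a loop would typically widen and multiply several elements per instruction and reduce the partial sums at the end. The details of how this PR does that on s390x are in the diff, not in this sketch.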

Before SIMD Benchmark

model	size	params	backend	threads	mmap	test	t/s
granite 3B BF16	4.72 GiB	2.53 B	CPU	1	1	pp512	0.29 ± 0.00
granite 3B BF16	4.72 GiB	2.53 B	CPU	1	1	tg128	0.03 ± 0.00

build: 423cf0b (8029)

After SIMD Benchmark

model	size	params	backend	threads	mmap	test	t/s
granite 3B BF16	4.72 GiB	2.53 B	CPU	1	1	pp512	2.28 ± 0.20
granite 3B BF16	4.72 GiB	2.53 B	CPU	1	1	tg128	0.03 ± 0.00

build: b4bc24e (8004)
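As a sanity check on the two pp512 rows above, the speedup can be derived directly from the reported throughputs. The helper names `speedup_ratio` and `percent_gain` below are hypothetical, and the sketch uses only the mean values and ignores the ± spread reported by the benchmark.

```c
#include <stdio.h>

/* Ratio of new throughput to old throughput (e.g. 2.0 means twice as fast). */
static double speedup_ratio(double before_tps, double after_tps) {
    return after_tps / before_tps;
}

/* Relative gain over the baseline, expressed in percent. */
static double percent_gain(double before_tps, double after_tps) {
    return (after_tps - before_tps) / before_tps * 100.0;
}

/* Prints both figures for the pp512 means reported in the tables. */
static void print_pp512_summary(void) {
    const double before_tps = 0.29; /* pp512, before SIMD */
    const double after_tps  = 2.28; /* pp512, after SIMD  */
    printf("speedup: %.2fx, gain: %.1f%%\n",
           speedup_ratio(before_tps, after_tps),
           percent_gain(before_tps, after_tps));
}
```

With the reported means this gives a ratio of about 7.86 and a gain of about 686%. Because the baseline is only known to two significant figures and the after-SIMD run has a ± 0.20 spread, the result should be read as approximate.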

Verification

This PR was tested against the IBM Granite 3.3 2B Instruct BF16 model.

test-quantize-fns

$ build/bin/test-quantize-fns

Testing f32
Testing f16
Testing q4_0
Testing q4_1
Testing q5_0
Testing q5_1
Testing q8_0
Testing q8_1
Testing q2_K
Testing q3_K
Testing q4_K
Testing q5_K
Testing q6_K
Testing q8_K
Testing iq2_xxs
Testing iq2_xs
Testing iq3_xxs
Testing iq1_s
Testing iq4_nl
Testing iq3_s
Testing iq2_s
Testing iq4_xs
Testing i8
Testing i16
Testing i32
Testing i64
Testing f64
Testing iq1_m
Testing bf16
Testing tq1_0
Testing tq2_0
Testing mxfp4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

taronaeo · 2026-02-15T03:56:53Z

@CISC I'm not sure who to ping to review this... Any chance you could review it? :)

CISC

I can't verify this, but it looks good in general and CI passes.

taronaeo requested a review from ggerganov as a code owner February 6, 2026 17:59

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 6, 2026

taronaeo force-pushed the feat/s390x-bf16 branch from 00aef23 to c72407f February 11, 2026 13:47

taronaeo added 2 commits February 11, 2026 21:47

ggml-cpu: impl bf16 vec for s390x

ec2b036

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-cpu: fix incorrect macro

b4bc24e

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

taronaeo force-pushed the feat/s390x-bf16 branch from c72407f to b4bc24e February 11, 2026 13:47

CISC approved these changes Feb 15, 2026


taronaeo merged commit 184c694 into ggml-org:master Feb 15, 2026
77 of 78 checks passed

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026

ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (ggml-org#19399)

c3bda54

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (ggml-org#19399)

57aa5ff

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (ggml-org#19399)

5ca832f

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (ggml-org#19399)

2668c67

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (ggml-org#19399)

fde0027

taronaeo merged 2 commits into ggml-org:master from taronaeo:feat/s390x-bf16
