ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration, and fix incorrect zTensor free #15839
Conversation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
…al if guard" This reverts commit 6e780a4.
This reverts commit 0da4b6a.
Are you sure that you are looking at a weight? It might be part of the attention computation.
Sorry I missed this. Yep, I can confirm that I am looking at a weight tensor, unless my debugging code is wrong.

Debug patch:

```diff
diff --git a/ggml/src/ggml-zdnn/ggml-zdnn.cpp b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
index 7947aab87..bd04beb2d 100644
--- a/ggml/src/ggml-zdnn/ggml-zdnn.cpp
+++ b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
@@ -130,7 +130,11 @@ static void ggml_zdnn_mul_mat_op(ggml_backend_zdnn_context * ctx, const ggml_ten
     // TODO: Weights are somehow not going through `ggml_backend_zdnn_buffer_set_tensor` during model loading.
     //       So we need to load the weights here. Remove this when the issue is fixed.
     //       Problem might be residing in `ggml_backend_zdnn_device_supports_buft`.
-    if (weights_extra->ztensor.is_transformed == false) ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+    if (weights_extra->ztensor.is_transformed == false) {
+        GGML_LOG_INFO("%s: tensor->name = %s | tensor->buffer->usage = %d\n", __func__, weights->name, weights->buffer->usage);
+        ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+        std::raise(SIGINT);
+    }
     // GGML_LOG_INFO("%s: tensor '%s' tensor dimensions: [%ld, %ld, %ld, %ld] pre_tfm_desc dimensions: [%ld, %ld, %ld, %ld]\n",
     //               __func__, weights_extra->name,
```

And as logged, the buffer usage is
I did some digging as well and found out that setting
That's expected; of course you cannot enable user-mapped buffers if you need to modify the tensor data.
Got it. Will create a separate PR by tomorrow to fix it. Do let me know if I need to make any changes to this PR.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
…d incorrect zTensor free (ggml-org#15839)
fixes #15414
Not sure if my `.supports_buft` is implemented inaccurately, but the weight tensors are not going through the `.set_tensor` function, and thus we have to re-initialise the weight zTensors on-the-fly during matmul. Not ideal though.

Activates the following data types:
- FP16
- BF16
Fixes:
- `LLAMA_SET_ROWS=1` causing the inference to be incorrect (see: Eval bug: zDNN backend not inferencing correctly after LLAMA_SET_ROWS enablement #15414)
- Incorrect zTensor free when `llama-bench` was used with more than 1 model
- `init_tensor` for performance improvements

Performance
Note
Tests were conducted on an IBM z17 Mainframe with 40 IFLs (cores) and 128 GB Memory on a shared R&D LPAR.
test-backend-ops