Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Correct uploading of contiguous 3D tensor data to GPU.
Current Behavior
ggml_cl_h2d_tensor_2d treats its offset argument as a byte offset in its call to clEnqueueWriteBuffer, but ggml_cl_transform_tensor passes an element count as offset to ggml_cl_h2d_tensor_2d. The two values agree only if the element size is exactly 1 byte, so for GGML_TYPE_F16 and GGML_TYPE_F32 data every slice after the first is written to the wrong location.
Also, I don't understand why ggml_cl_mul_f32 passes a non-zero offset to ggml_cl_h2d_tensor_2d.
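To illustrate the mismatch without needing an OpenCL device, here is a minimal host-only sketch in C. It is not the ggml code: write_buffer_bytes is a hypothetical stand-in for clEnqueueWriteBuffer (whose offset parameter is in bytes), and upload_slices is a hypothetical stand-in for the per-slice upload loop, assuming it behaves as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for clEnqueueWriteBuffer: like the real call,
 * `offset` and `size` are expressed in bytes, not elements. */
static void write_buffer_bytes(unsigned char *dev, size_t dev_size,
                               size_t offset, const void *src, size_t size) {
    assert(offset + size <= dev_size);
    memcpy(dev + offset, src, size);
}

/* Hypothetical stand-in for the upload loop: copies `n_slices` contiguous
 * slices of `slice_elems` floats each into the "device" buffer `dev`.
 *
 * scale_offset == 0: the running element count is passed unchanged as the
 *                    offset (the behavior described in this issue).
 * scale_offset != 0: the element count is converted to bytes first. */
static void upload_slices(float *dev, const float *host,
                          size_t n_slices, size_t slice_elems,
                          int scale_offset) {
    const size_t dev_size = n_slices * slice_elems * sizeof(float);
    size_t elems_done = 0;

    for (size_t i = 0; i < n_slices; i++) {
        size_t offset = scale_offset ? elems_done * sizeof(float)
                                     : elems_done;
        write_buffer_bytes((unsigned char *)dev, dev_size, offset,
                           host + elems_done, slice_elems * sizeof(float));
        elems_done += slice_elems;
    }
}
```

With two slices of four floats each and a zero-initialized destination, the unscaled variant writes the second slice at byte 4 instead of byte 16, so it overwrites most of the first slice and leaves the tail of the buffer untouched. The scaled variant reproduces the host data exactly.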
Environment and Context
AMD GPU
Linux
Steps to Reproduce
- Pass a 3D tensor with contiguous GGML_TYPE_F16 or GGML_TYPE_F32 data to ggml_cl_transform_tensor.
- Read the data back from GPU memory, or perform ggml_cl_mul_mat on that tensor.
- Observe incorrect data or an incorrect result.
Ping
@0cc4m
@JohannesGaessler
@SlyEcho