
vulkan: add get/set tensor 2d functions #22514

Merged
0cc4m merged 3 commits into master from 0cc4m/vulkan-tensor-get-set-2d on Apr 30, 2026

Conversation

Contributor

@0cc4m 0cc4m commented Apr 29, 2026

Overview

Implement, in the Vulkan backend, the 2d tensor copy functions that were added to the backend interface for TP (tensor parallelism) support. This shouldn't make a performance difference, but it was not much work, since the 2d copy paths basically already existed.

I also noticed that the interface comments for the functions were universally wrong, so I corrected them, too. Sorry about the pings this causes.

Requirements

@0cc4m 0cc4m requested review from a team, JohannesGaessler and ggerganov as code owners April 29, 2026 11:07
@0cc4m 0cc4m removed request for a team and ggerganov April 29, 2026 11:07
Comment thread ggml/src/ggml-metal/ggml-metal.cpp Outdated
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Member

@taronaeo taronaeo left a comment


Ack for IBM zDNN :)

} else {
slices.resize(height);
for (size_t i = 0; i < height; i++) {
slices[i].srcOffset = i * width;
Contributor


I want to make sure I understand: here the src data is tightly packed, and the spitch is applied when we do the memcpy?

Contributor Author


Yes, the deferred_memcpy packs the source data into a contiguous shape in the staging buffer, then the vk::BufferCopy slices copy from the packed shape in the staging buffer into the final dpitch shape. If dpitch == width this is just one copy, otherwise it copies each slice. Does that help?
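The two-stage flow described above can be sketched in plain C++. This is a hypothetical stand-in, not the backend code: memcpy models both the deferred_memcpy packing and the vk::BufferCopy slices, and the function names, spitch, and dpitch parameters are illustrative.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stage 1 (models deferred_memcpy): pack a strided source into a
// contiguous staging buffer, one tightly packed row per slice.
void pack_to_staging(const uint8_t *src, size_t spitch, size_t width,
                     size_t height, std::vector<uint8_t> &staging) {
    staging.resize(width * height);
    for (size_t i = 0; i < height; i++) {
        std::memcpy(staging.data() + i * width, src + i * spitch, width);
    }
}

// Stage 2 (models the vk::BufferCopy slices): spread the packed rows
// out to the destination pitch.
void copy_slices(const std::vector<uint8_t> &staging, uint8_t *dst,
                 size_t dpitch, size_t width, size_t height) {
    if (dpitch == width) {
        // destination is contiguous too: a single copy suffices
        std::memcpy(dst, staging.data(), width * height);
    } else {
        // one slice per row: srcOffset = i * width, dstOffset = i * dpitch
        for (size_t i = 0; i < height; i++) {
            std::memcpy(dst + i * dpitch, staging.data() + i * width, width);
        }
    }
}
```

With dpitch == width the second stage collapses to one copy, matching the single-copy fast path mentioned above.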

ggml_vk_buffer_write(buf, vk_tensor_offset(tensor) + tensor->view_offs + offset, data, size);
}

static void ggml_backend_vk_buffer_set_tensor_2d(ggml_backend_buffer_t buffer, ggml_tensor * tensor, const void * data, size_t offset,
Contributor


Are tensors passed into these commands always contiguous?

Contributor Author


I don't think so, but it is up to the caller to make sure the data ends up where the tensor expects it. At least that's how I understood it. Is that correct @JohannesGaessler?

Contributor


The ggml backend API for getting or setting tensor data in one dimension does not consider the tensor data layout: the user simply specifies an offset and a size in bytes, and the data is copied into that memory region. The 2D functions I added follow the same pattern: they simply do a 2D copy using the tensor's memory, and it is the responsibility of the caller to correctly account for the tensor's memory layout. So no, these functions cannot expect to always receive contiguous tensors.

Contributor


OK, I'm reading this as: the tensors may not be contiguous, but that's ok and we're supposed to ignore the layout for this op. Then the change looks fine to me.

static void ggml_backend_vk_set_tensor_async(ggml_backend_t backend, ggml_tensor * tensor, const void * data, size_t offset, size_t size) {
VK_LOG_DEBUG("ggml_backend_vk_set_tensor_async(" << size << ")");
static void ggml_backend_vk_set_tensor_2d_async(ggml_backend_t backend, ggml_tensor * tensor, const void * data, size_t offset,
size_t size, size_t n_copies, size_t stride_tensor, size_t stride_data) {
Contributor


Should 'size' here be renamed to 'width' to be consistent with the others?

Contributor Author


I just used the same nomenclature as the CUDA backend. Internally we use width and height, while the interface uses size and n_copies.
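As a hedged illustration of that naming split (the struct and helper below are invented for this sketch; only the parameter names come from the interface under discussion):

```cpp
#include <cstddef>

// Internal naming, as used in the 2D copy paths.
struct copy2d {
    size_t width;   // interface: size      (bytes per row)
    size_t height;  // interface: n_copies  (number of rows)
    size_t spitch;  // interface: stride_data   (source row stride)
    size_t dpitch;  // interface: stride_tensor (destination row stride)
};

// Map the CUDA-style interface names onto the internal ones.
copy2d make_copy2d(size_t size, size_t n_copies,
                   size_t stride_tensor, size_t stride_data) {
    return { size, n_copies, stride_data, stride_tensor };
}
```

The mapping is purely a renaming: no values change, only the vocabulary differs between the interface and the internals.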

@github-actions github-actions Bot added labels: Nvidia GPU, Vulkan on Apr 29, 2026
@github-actions github-actions Bot added labels: ggml, SYCL, Apple Metal, Ascend NPU, OpenCL, IBM zDNN, Hexagon, AMD ZenDNN, WebGPU on Apr 29, 2026
@0cc4m 0cc4m merged commit 660b1b4 into master Apr 30, 2026
67 of 68 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-tensor-get-set-2d branch April 30, 2026 15:37
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
