vulkan: add get/set tensor 2d functions #22514
Conversation
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
```cpp
} else {
    slices.resize(height);
    for (size_t i = 0; i < height; i++) {
        slices[i].srcOffset = i * width;
```
I want to make sure I understand: here the src data is tightly packed, and the memcpy is the step that applies the spitch?
Yes. The deferred_memcpy packs the source data into a contiguous shape in the staging buffer, then the vk::BufferCopy slices copy from that packed shape in the staging buffer into the final dpitch shape. If dpitch == width this is a single copy; otherwise each slice is copied separately. Does that help?
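For readers following along, here is a minimal, self-contained sketch of the slicing described above. The function name and the `dst_base` parameter are illustrative, not the actual code in the PR; it only shows how the copy regions relate the packed staging layout to the pitched destination:

```cpp
#include <cstddef>
#include <vector>
#include <vulkan/vulkan.hpp>

// Illustrative sketch: build copy regions from a packed staging buffer
// (rows of `width` bytes, back to back, starting at offset 0) into a
// destination whose rows start every `dpitch` bytes at `dst_base`.
static std::vector<vk::BufferCopy> make_2d_copy_slices(size_t dst_base, size_t width,
                                                       size_t height, size_t dpitch) {
    std::vector<vk::BufferCopy> slices;
    if (dpitch == width) {
        // Destination rows are tightly packed too: one contiguous copy suffices.
        slices.push_back({ 0, dst_base, width * height });
    } else {
        // One slice per row: packed source offsets, pitched destination offsets.
        slices.resize(height);
        for (size_t i = 0; i < height; i++) {
            slices[i].srcOffset = i * width;
            slices[i].dstOffset = dst_base + i * dpitch;
            slices[i].size      = width;
        }
    }
    return slices;
}
```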
```cpp
    ggml_vk_buffer_write(buf, vk_tensor_offset(tensor) + tensor->view_offs + offset, data, size);
}

static void ggml_backend_vk_buffer_set_tensor_2d(ggml_backend_buffer_t buffer, ggml_tensor * tensor, const void * data, size_t offset,
```
Are tensors passed into these commands always contiguous?
I don't think so, but it is up to the caller to make sure the data ends up where the tensor expects it. At least that's how I understood it. Is that correct @JohannesGaessler?
The ggml backend API for getting or setting tensor data in one dimension does not consider the tensor data layout: the user simply specifies an offset and a size in bytes, and the data is copied into that memory region. The 2D functions I added follow the same pattern: they simply do a 2D copy using the tensor's memory, and it is the responsibility of the caller to correctly account for the tensor's memory layout. So no, these functions cannot expect to always receive contiguous tensors.
OK, I'm reading this as: the tensors may not be contiguous, but that's ok and we're supposed to ignore the layout for this op. Then the change looks fine to me.
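To make that contract concrete, here is a hedged, CPU-only sketch of the semantics described above. The parameter names follow the signature visible in the diff, but `set_tensor_2d_reference` itself is illustrative reference code, not the Vulkan implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference semantics only: n_copies rows of `size` bytes each, with
// independent strides on the tensor and data sides. The backend treats the
// tensor as raw bytes; choosing an offset and strides that match the tensor's
// actual layout (contiguous or not) is the caller's job.
static void set_tensor_2d_reference(void * tensor_base, const void * data, size_t offset,
                                    size_t size, size_t n_copies,
                                    size_t stride_tensor, size_t stride_data) {
    for (size_t i = 0; i < n_copies; i++) {
        std::memcpy((uint8_t *) tensor_base + offset + i * stride_tensor,
                    (const uint8_t *) data   + i * stride_data,
                    size);
    }
}
```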
```cpp
static void ggml_backend_vk_set_tensor_async(ggml_backend_t backend, ggml_tensor * tensor, const void * data, size_t offset, size_t size) {
    VK_LOG_DEBUG("ggml_backend_vk_set_tensor_async(" << size << ")");

static void ggml_backend_vk_set_tensor_2d_async(ggml_backend_t backend, ggml_tensor * tensor, const void * data, size_t offset,
                                                size_t size, size_t n_copies, size_t stride_tensor, size_t stride_data) {
```
Should `size` here be renamed to `width` to be consistent with the others?
I just used the same nomenclature as the CUDA backend. Internally we use width and height, while the interface uses size and n_copies.
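For reference, a tiny sketch of that mapping, assuming the CUDA-style interface parameters; the function name is made up for illustration:

```cpp
#include <cstddef>

// Hypothetical function, illustration only: the public interface speaks in
// (size, n_copies), the internal 2D copy path in (width, height).
static void sketch_set_tensor_2d(size_t size, size_t n_copies,
                                 size_t stride_tensor, size_t stride_data) {
    const size_t width  = size;      // interface "size"     -> bytes per row
    const size_t height = n_copies;  // interface "n_copies" -> number of rows
    // stride_tensor / stride_data act as the destination / source pitches
    // (dpitch / spitch) for the internal 2D copy.
    (void) width; (void) height; (void) stride_tensor; (void) stride_data;
}
```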
Overview
Implement in the Vulkan backend the 2D tensor copy functions that were added for tensor parallel (TP) support. This shouldn't make a performance difference, but it was not much work, since the 2D functions basically already existed.
I also noticed that the interface comments for these functions were universally wrong, so I corrected them too. Sorry about the pings that caused.
Requirements