
Add Pad Reflect 1D CUDA support#14659

Merged
CISC merged 2 commits into ggml-org:master from YavorGIvanov:feature/pad-reflect-cuda-support
Aug 22, 2025

Conversation

@YavorGIvanov
Contributor

No description provided.

@github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Jul 13, 2025
@am17an am17an requested a review from JohannesGaessler July 13, 2025 14:49
Contributor

@JohannesGaessler JohannesGaessler left a comment


Please tell me whether you want to address the comment regarding the loop in this PR.

Comment thread on ggml/src/ggml-cuda/pad_reflect_1d.cu (Outdated)
const char * src0_ptr = (const char *)src0 + i3*nb03 + i2*nb02 + i1*nb01;
char * dst_ptr = (char *)dst + i3*nb3 + i2*nb2 + i1*nb1;

for (int64_t i0 = threadIdx.x; i0 < ne0; i0 += blockDim.x) {
Contributor


This will produce correct results, but generally speaking you will get much better performance if each thread works on a single value instead of looping over ne0. However, it would also be fine to merge it as-is and change this later if it ever becomes relevant for end-to-end performance.

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
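For context on what the kernel being reviewed computes, here is a host-side C++ reference sketch of 1D reflect padding. This is not code from the PR: the `reflect_index` helper and the ReflectionPad1d-style convention (edges mirrored without repeating the boundary sample) are assumptions inferred from the op name. The per-output-element loop below is the mapping the reviewer suggests assigning to one CUDA thread each, rather than having each thread stride over ne0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fold an out-of-range source index j back into [0, n) by mirroring at
// both ends without repeating the edge sample (assumed convention).
static int64_t reflect_index(int64_t j, int64_t n) {
    while (j < 0 || j >= n) {
        if (j < 0)  j = -j;
        if (j >= n) j = 2 * (n - 1) - j;
    }
    return j;
}

// Hypothetical CPU reference: pad `src` with p0 values on the left and
// p1 on the right. One loop iteration per output element -- the unit of
// work the review suggests giving to a single CUDA thread.
std::vector<float> pad_reflect_1d(const std::vector<float> & src, int64_t p0, int64_t p1) {
    const int64_t n = (int64_t) src.size();
    std::vector<float> dst(n + p0 + p1);
    for (int64_t i0 = 0; i0 < (int64_t) dst.size(); ++i0) {
        dst[i0] = src[reflect_index(i0 - p0, n)];
    }
    return dst;
}
```

For example, padding [1, 2, 3, 4] with p0 = 2, p1 = 1 under this convention yields [3, 2, 1, 2, 3, 4, 3].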
@CISC
Member

CISC commented Jul 31, 2025

@YavorGIvanov Shall we merge as-is or are you looking into the review comment by @JohannesGaessler?

@CISC
Member

CISC commented Aug 22, 2025

@YavorGIvanov ping

@YavorGIvanov
Contributor Author

Let's merge as-is. @CISC

@CISC CISC merged commit b1ab918 into ggml-org:master Aug 22, 2025
47 checks passed
@YavorGIvanov YavorGIvanov deleted the feature/pad-reflect-cuda-support branch August 22, 2025 11:47
qnixsynapse pushed a commit to janhq/llama.cpp that referenced this pull request Aug 25, 2025
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

3 participants