CUDA: add set #14980
|
Part of #14909 |
|
Hi, @JohannesGaessler Could you please review the changes when you have a chance? Thank you in advance! |
JohannesGaessler
left a comment
One of my current goals is to consolidate and deduplicate the code around copying data in the CUDA backend. As such, I think that rather than adding new kernels here it would be better to re-use the existing code. If the operation is not in place, you can use cudaMemcpyAsync to initialize dst with the contents of src0. Afterwards you can use ggml_cpy_flt_cuda in cpy.cu to do the copy. That kernel does not have an argument for the offset, but one is not needed because you can simply apply the offset to the destination pointer in host code.
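The approach described above can be illustrated with a minimal host-side sketch. It is not the code from this PR or from cpy.cu: set_f32 and copy_rows_f32 are hypothetical names, std::memcpy stands in for a device-to-device cudaMemcpyAsync, and the row loop stands in for the existing strided copy kernel. It also assumes a 2D float tensor and a byte offset and row stride (nb1) for the destination view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an existing strided float copy kernel.
// Copies a contiguous ne0 x ne1 block from src into dst, where
// consecutive dst rows are nb1 bytes apart.
static void copy_rows_f32(const float * src, char * dst,
                          size_t ne0, size_t ne1, size_t nb1) {
    for (size_t i1 = 0; i1 < ne1; ++i1) {
        std::memcpy(dst + i1*nb1, src + i1*ne0, ne0*sizeof(float));
    }
}

// Sketch of a 2D "set": dst becomes a copy of src0, then a view of dst
// (described by offset and nb1, both in bytes) is overwritten with src1.
static void set_f32(const std::vector<float> & src0,
                    const std::vector<float> & src1,
                    std::vector<float> & dst,
                    size_t ne0, size_t ne1, size_t nb1,
                    size_t offset, bool inplace) {
    assert(ne1 > 0 && src1.size() == ne0*ne1);
    // The view must fit inside dst.
    assert(offset + (ne1 - 1)*nb1 + ne0*sizeof(float) <= dst.size()*sizeof(float));

    if (!inplace) {
        // Not in place: first initialize dst with the contents of src0.
        // On the GPU this would be an asynchronous device-to-device copy.
        assert(dst.size() == src0.size());
        std::memcpy(dst.data(), src0.data(), src0.size()*sizeof(float));
    }

    // The copy routine has no offset argument, so the offset is applied
    // to the destination pointer here, on the host side.
    char * dst_view = reinterpret_cast<char *>(dst.data()) + offset;
    copy_rows_f32(src1.data(), dst_view, ne0, ne1, nb1);
}
```

For example, with a 4x4 src0, a 2x2 src1, nb1 equal to four floats, and an offset of five floats, the block lands at rows 1 to 2 and columns 1 to 2 of dst, and no dedicated kernel with an offset parameter is required.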
|
I have already used
If I’m wrong, please correct me. Thanks for your help! |
|
Hi @am17an, thanks again for your previous review. Since @JohannesGaessler hasn’t had a chance to respond for a few weeks, would it be possible to ask another maintainer or contributor to review this as well? I’d really appreciate any additional feedback to help move this forward. |
|
Sorry, I forgot about this PR. The code in
Set both types to float.
Use the same shape twice. |