Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers #45

Merged
LostRuins merged 2 commits into LostRuins:concedo from 0cc4m:clblast-1 on Apr 13, 2023
Conversation

0cc4m commented Apr 11, 2023

Didn't manage to set up caching yet. I improved some steps, but the difference wasn't big.

@LostRuins
Owner

LGTM
keep on keepin on

@0cc4m 0cc4m marked this pull request as ready for review April 12, 2023 21:13
@0cc4m
Author

0cc4m commented Apr 12, 2023

Keeping the buffers open didn't improve performance at all, so I scrapped that. Buffering matrices in advance is difficult because they are stored in a quantized format: they would either have to be stored unquantized on the GPU, which needs a lot of VRAM, or be dequantized on the GPU, which would need a new kernel and quite a bit of work.

So this PR just does some code improvements and platform + device printing, but performance stays the same.
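
To put rough numbers on the VRAM trade-off mentioned above, here is a minimal sketch. It is not taken from the PR. The block layout is an assumption chosen for illustration (32 weights per block, one 32-bit float scale, 32 four-bit values packed into 16 bytes, decoded as `(nibble - 8) * scale`). The function names `quantized_bytes`, `fp32_bytes` and `dequantize_block` are hypothetical.

```python
# Assumed block layout, for illustration only: 32 weights per block,
# stored as one 32-bit float scale plus 32 four-bit values in 16 bytes.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 4 + BLOCK_WEIGHTS // 2  # 4-byte scale + 16 packed bytes = 20


def quantized_bytes(rows: int, cols: int) -> int:
    """Size of a rows x cols matrix kept in the assumed 4-bit block format."""
    n = rows * cols
    if n % BLOCK_WEIGHTS != 0:
        raise ValueError("element count must be a multiple of the block size")
    return (n // BLOCK_WEIGHTS) * BLOCK_BYTES


def fp32_bytes(rows: int, cols: int) -> int:
    """Size of the same matrix stored unquantized as 32-bit floats."""
    return rows * cols * 4


def dequantize_block(scale: float, packed: bytes) -> list[float]:
    """Expand one block of 16 packed bytes into 32 floats.

    Assumed encoding: each byte holds two 4-bit values (low nibble first),
    and each value decodes to (nibble - 8) * scale.
    """
    out = []
    for byte in packed:
        out.append(((byte & 0x0F) - 8) * scale)
        out.append(((byte >> 4) - 8) * scale)
    return out


if __name__ == "__main__":
    rows, cols = 4096, 4096  # hypothetical weight-matrix shape
    q = quantized_bytes(rows, cols)
    f = fp32_bytes(rows, cols)
    print(f"quantized: {q / 2**20:.0f} MiB, fp32: {f / 2**20:.0f} MiB, ratio: {f / q:.1f}x")
```

Under these assumptions, storing a matrix unquantized on the GPU costs 6.4 times the memory of the quantized form. The per-block loop in `dequantize_block` is the kind of work a GPU-side dequantization kernel would have to do instead.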

@LostRuins LostRuins merged commit 2ff91b5 into LostRuins:concedo Apr 13, 2023
@LostRuins
Owner

LGTM, merged

@0cc4m 0cc4m deleted the clblast-1 branch April 15, 2023 17:55
Foxy6670 pushed a commit to Foxy6670/koboldcpp that referenced this pull request Apr 17, 2023
2 participants