
Fix hordeconfig max context setting, and add Makefile flags for cuda F16/KQuants per iter.#252

Merged
LostRuins merged 2 commits into LostRuins:concedo from ycros:ycros on Jun 21, 2023

Conversation


@ycros ycros commented Jun 21, 2023

Makefile flags are copied right from llama.cpp's Makefile.
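For context, a minimal sketch of how such flags are typically passed at build time. This assumes the flag names used by llama.cpp's Makefile of this era (LLAMA_CUBLAS, LLAMA_CUDA_F16, LLAMA_CUDA_KQUANTS_ITER); which values help depends on your GPU:

```shell
# Hypothetical build invocation: enable cuBLAS, force F16 CUDA math,
# and use 1 iteration per K-quant kernel (can help older GPUs such as Pascal).
make LLAMA_CUBLAS=1 LLAMA_CUDA_F16=1 LLAMA_CUDA_KQUANTS_ITER=1
```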

@ycros ycros changed the title Fix hordeconfig max context setting, and add Makefile for cuda F16/KQuants per iter. Fix hordeconfig max context setting, and add Makefile flags for cuda F16/KQuants per iter. Jun 21, 2023
Owner

@LostRuins LostRuins left a comment


lgtm

@LostRuins LostRuins merged commit b1f00fa into LostRuins:concedo Jun 21, 2023
@dragonfyre13

Sometimes the speed of this makes my head spin. Saw this issue last night and figured I'd start working on it when I got a free minute today; it turns out ycros had already done the work by the time I got there. Dropped a comment in the upstream issue (ggml-org#1862 (comment)) as well. It might be worth documenting the compile flags on the koboldcpp page, given that LLAMA_CUDA_KQUANTS_ITER=1 has a pretty drastic impact for those of us running the Pascal architecture (without it, almost exactly 2x slower inference for 2k and 6k quant types).


3 participants