Q4 cleanup #1061
I think that everything inside this
I just realized that CI wouldn't fail in any case because |
You are right. Parts of this clever code may be useful for other quantization types, but that's what the git history is for. We might want to add |
dfyz left a comment
I don't know if I can/should approve this, but this PR looks pretty uncontroversial to me.
* Q4 cleanup
* Remove unused AVX512 Q4_0 code
* This works and TG is decent, but PP is low
* Better
* Apply f_logit_scale before mul mat with output tensor
* This is better for PP: 600 t/s -> 700 t/s
* To not lose this again
* WIP
* Equal split
* WIP

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Cleanup following #951 and #1046:
* Remove the unused AVX512 Q4_0 code in ggml_vec_dot_q4_0
* Use ggml_is_quantized for the work buffer calculation
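
As a rough illustration of the second item, the sketch below shows why a single "is this type quantized?" predicate is convenient when sizing a work buffer: the sizing code no longer needs one comparison per quantization type. Everything here is hypothetical (the `demo_` enum, both functions, and the sizing rule are assumptions made for illustration); it is not the code from ggml or from this PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tensor type enum; NOT ggml's real definition. */
enum demo_type { DEMO_F32, DEMO_F16, DEMO_Q4_0, DEMO_Q4_1 };

/* One predicate replaces a chain of per-type comparisons at every call site. */
static bool demo_is_quantized(enum demo_type t) {
    return t == DEMO_Q4_0 || t == DEMO_Q4_1;
}

/*
 * Assumed sizing rule (illustrative only): when the first operand is
 * quantized, reserve scratch space to re-encode n_elements values in
 * blocks of block_size elements, each taking bytes_per_block bytes.
 * Non-quantized operands need no scratch space in this sketch.
 */
static size_t demo_work_size(enum demo_type src0, size_t n_elements,
                             size_t block_size, size_t bytes_per_block) {
    if (!demo_is_quantized(src0)) {
        return 0;
    }
    return n_elements / block_size * bytes_per_block;
}
```

With this shape, supporting another quantized type only means extending the predicate; the work buffer calculation itself stays unchanged.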