Conversation
For AVX2/AVX/scalar, we might want to keep … I'm actually surprised that they're worth using on ARM NEON, as the alternative is simply subtracting 8 from the Q4 quants.
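A scalar sketch of that "subtract 8" alternative (illustrative only, not the actual ggml kernels; the layout assumes the float-scale `block_q4_0` of this era, with two interleaved nibbles per byte):

```c
#include <stdint.h>

#define QK4_0 32

typedef struct {
    float   d;              // per-block scale
    uint8_t qs[QK4_0 / 2];  // 4-bit quants, two per byte
} block_q4_0;

// Dequantize one block by centering each unsigned nibble with "- 8",
// turning the quants into signed values in [-8, 7] before scaling.
static void dequantize_block_q4_0(const block_q4_0 *x, float *y) {
    for (int j = 0; j < QK4_0 / 2; ++j) {
        const int v0 = (x->qs[j] & 0x0F) - 8; // low nibble
        const int v1 = (x->qs[j] >>   4) - 8; // high nibble
        y[2*j + 0] = v0 * x->d;
        y[2*j + 1] = v1 * x->d;
    }
}
```

Centering the Q4 quants this way keeps them symmetric around zero, which is what lets the 8-bit side use a scale-only format (Q8_0) instead of one that also carries an offset (Q8_1).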
@sw there is no noticeable difference between the two. Still, changed to use …
I guess it's not finished? You're using …
Wow - this is difficult 😄 I keep messing up something |
Looks good now; I think it's very slightly slower for Q4_0 and Q4_2 because we're now missing the SIMD optimizations for …
Ok, will merge now and we can finish the AVX stuff from …
… NEON) (ggml-org#1179)
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
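For context, a minimal scalar sketch of the Q8_0 format these commits describe (names and layout are illustrative, not the exact ggml code). The "255 values out of 256" fix corresponds to scaling by `amax / 127`, so the quants land in the symmetric range [-127, 127] and -128 is never produced:

```c
#include <math.h>
#include <stdint.h>

#define QK8_0 32

typedef struct {
    float  d;           // per-block scale
    int8_t qs[QK8_0];   // 8-bit quants
} block_q8_0;

static void quantize_block_q8_0(const float *x, block_q8_0 *y) {
    float amax = 0.0f; // absolute maximum over the block
    for (int j = 0; j < QK8_0; ++j) {
        const float ax = fabsf(x[j]);
        if (ax > amax) amax = ax;
    }

    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f/d : 0.0f;

    y->d = d;
    for (int j = 0; j < QK8_0; ++j) {
        // |x[j]| <= amax, so the rounded value stays in [-127, 127]
        y->qs[j] = (int8_t) roundf(x[j] * id);
    }
}
```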
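And a scalar sketch of how a `q4_0_q8_0` dot product can combine the two block types (again hypothetical; the kernels in the PR are SIMD, and the "vec_dot_type" extension is what tells ggml to quantize the activations to Q8_0 first):

```c
// Assumes the block_q4_0 / block_q8_0 sketches above and QK4_0 == QK8_0.
static float vec_dot_q4_0_q8_0(int n, const block_q4_0 *x, const block_q8_0 *y) {
    const int nb = n / QK8_0;
    float sumf = 0.0f;

    for (int i = 0; i < nb; ++i) {
        int sumi = 0; // integer accumulator for one block
        for (int j = 0; j < QK8_0 / 2; ++j) {
            const int v0 = (x[i].qs[j] & 0x0F) - 8; // centered low nibble
            const int v1 = (x[i].qs[j] >>   4) - 8; // centered high nibble
            sumi += v0 * y[i].qs[2*j + 0] + v1 * y[i].qs[2*j + 1];
        }
        sumf += x[i].d * y[i].d * (float) sumi; // scale once per block
    }
    return sumf;
}
```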
8-bit integer quantization support
Perplexity: 5.9563