Hi authors, thanks for sharing the excellent MASQuant work!
I have a quick question about the W4A4 setting in Table 7. I noticed that Tables 1 and 2 report no W4A4 accuracy metrics (MMMU/OCR/VQA/WER), while Table 7 reports only inference speed and memory usage under W4A4.
Could you clarify whether the W4A4 configuration in Table 7 is used only to evaluate inference efficiency, rather than being a fully quantized model whose accuracy was also measured? If W4A4 quantization is fully implemented, could you please share the corresponding accuracy results?
Thanks a lot!