# Feature Description

Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.

# Motivation

It sounds like it's a fast/useful quantisation method:

- https://towardsdatascience.com/exllamav2-the-fastest-library-to-run-llms-32aeda294d26
- https://github.com/mlabonne/llm-course/blob/main/Quantize_models_with_ExLlamaV2.ipynb
- https://towardsdatascience.com/4-bit-quantization-with-gptq-36b0f4f02c34
- https://huggingface.co/blog/gptq-integration
- https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacpp/
  - > A detailed comparison between GPTQ, AWQ, EXL2, q4_K_M, q4_K_S, and load_in_4bit: perplexity, VRAM, speed, model size, and loading time.

# Possible Implementation

N/A