Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
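As a rough illustration of the end-loss-guidance idea (not GuidedQuant's actual algorithm; every name below is hypothetical), a quantizer can weight each parameter's rounding error by the squared gradient of the end loss, so loss-critical weights are rounded more carefully:

```python
import torch

def loss_guided_error(w: torch.Tensor, w_q: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Quantization error weighted by squared end-loss gradients.

    A diagonal, Fisher-style proxy: parameters with large dL/dw contribute
    more to the objective. Hypothetical illustration, not the GuidedQuant
    objective itself.
    """
    return ((grad ** 2) * (w - w_q) ** 2).sum()

# Toy usage: compare bit-widths under the guided objective.
torch.manual_seed(0)
w = torch.randn(256)
grad = torch.randn(256)  # stand-in for dL/dw from a calibration batch
for bits in (2, 3, 4):
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    w_q = (w / scale).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    print(bits, loss_guided_error(w, w_q, grad).item())
```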
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
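The title points at the classic bias-correction trick: weight quantization shifts a layer's mean output, and that shift can be folded back into the layer bias. A minimal sketch of that generic trick (not necessarily the paper's exact method), assuming a linear layer and a batch of calibration inputs:

```python
import torch

def bias_compensation(W: torch.Tensor, W_q: torch.Tensor,
                      b: torch.Tensor, x_calib: torch.Tensor) -> torch.Tensor:
    """Fold the mean output error of weight quantization into the bias.

    For y = x @ W.T + b, quantizing W shifts the output mean by
    mean(x) @ (W_q - W).T; subtracting that shift from b removes it.
    """
    x_mean = x_calib.mean(dim=0)          # E[x] over calibration data
    return b - x_mean @ (W_q - W).T

# Toy check: the compensated layer matches the original output mean.
torch.manual_seed(0)
W = torch.randn(8, 16); b = torch.randn(8); x = torch.randn(1024, 16)
scale = W.abs().max() / 7                 # 4-bit symmetric grid
W_q = (W / scale).round().clamp(-8, 7) * scale
b_c = bias_compensation(W, W_q, b, x)
print((x @ W.T + b).mean(0) - (x @ W_q.T + b_c).mean(0))  # ~0
```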
Open quantization tooling for TurboQuant-style low-bit LLM releases, stock GGUF deployment, and Apple Silicon runtime experiments.
Enable expert-level, multi-step diagnostic reasoning in Claude Code with an easy-to-use skill for clear and explainable AI diagnosis.
Deeper research into TurboQuant algorithms
Shift-based post-training quantization analysis for LLMs (ShiftQuant paper)
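Shift-based quantization usually means restricting weights to signed powers of two so that multiplies become bit shifts. A minimal power-of-two rounding sketch (ShiftQuant's concrete scheme may differ):

```python
import torch

def quantize_pow2(w: torch.Tensor, min_exp: int = -8, max_exp: int = 0) -> torch.Tensor:
    """Round magnitudes to the nearest power of two, keeping the sign.

    Power-of-two levels let a matmul replace multiplies with bit shifts,
    which is the usual motivation for shift-based quantization.
    """
    exp = torch.log2(w.abs().clamp(min=2.0 ** min_exp)).round().clamp(min_exp, max_exp)
    return w.sign() * torch.pow(2.0, exp)  # sign(0) == 0, so zeros stay zero

torch.manual_seed(0)
w = torch.randn(6)
print(w)
print(quantize_pow2(w))
```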
A tool for making GGUF files quickly
LLM quantization project built around `llama.cpp` + `Ollama` + `GGUF`
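The usual `llama.cpp` route to a quantized GGUF file is to convert a Hugging Face checkpoint and then quantize it. A sketch assuming a local `llama.cpp` checkout with the quantize tool built (script and binary names follow recent `llama.cpp` releases and may vary by version; all paths here are hypothetical):

```python
import subprocess

# Hypothetical paths; adjust to your llama.cpp checkout and model directory.
HF_MODEL_DIR = "models/my-llm"
F16_GGUF = "models/my-llm-f16.gguf"
Q4_GGUF = "models/my-llm-q4_k_m.gguf"

# 1. Convert a Hugging Face checkpoint to a full-precision GGUF file
#    (run from the llama.cpp repo root).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR, "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantize the GGUF file; Q4_K_M is a common 4-bit preset.
subprocess.run(["./llama-quantize", F16_GGUF, Q4_GGUF, "Q4_K_M"], check=True)
```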
PentaNet extends BitNet's ternary quantization to pentanary {-2,-1,0,+1,+2}, improving perplexity by 6.4% at 124M params while preserving zero-multiplier arithmetic.
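A minimal sketch of pentanary rounding under an assumed mean-absolute scale (PentaNet's actual scale and rounding rule may differ). Multiplying by these levels still needs no multiplier: ±1 is a copy, ±2 is a one-bit shift, and 0 is skipped:

```python
import torch

def quantize_pentanary(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Round weights to the pentanary grid {-2,-1,0,+1,+2} with a per-tensor scale.

    The scale choice (mean absolute weight) is a hypothetical stand-in,
    not PentaNet's published recipe.
    """
    scale = w.abs().mean().clamp(min=1e-8)
    q = (w / scale).round().clamp(-2, 2)
    return q, scale

torch.manual_seed(0)
w = torch.randn(4, 4)
q, s = quantize_pentanary(w)
print(q)                          # integer levels in {-2,-1,0,+1,+2}
print((q * s - w).abs().mean())   # mean quantization error
```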
Develop an on-device AI system that processes and analyzes complaints using lightweight, fine-tuned LLMs optimized for industrial field use.
Implementation of advanced Natural Language Processing architectures and optimization techniques, built from scratch. The projects focus on understanding the internal mechanics of Transformers, LLM efficiency through quantization, and scaling via Mixture-of-Experts (MoE).