Keywords: PTQ: post-training quantization | QAT: quantization-aware training | Extreme: binary or ternary quantization | Non-uniform: non-uniform quantization | MP: mixed-precision quantization
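As a quick refresher on what these keywords refer to, here is a minimal sketch (our own illustration, not taken from any listed paper) of symmetric uniform quantization, the baseline that PTQ and QAT methods build on:

```python
def quantize_uniform(w, n_bits=8):
    """Symmetric uniform quantization: map floats onto a signed integer grid.

    PTQ methods calibrate the scale from weights/activations after training;
    QAT methods simulate this rounding inside the training loop instead.
    Non-uniform methods replace the evenly spaced grid below with learned or
    logarithmic levels; Extreme methods shrink n_bits down to 1 or 2.
    """
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 127 for 8-bit
    scale = max(abs(x) for x in w) / qmax         # per-tensor scale (PTQ-style)
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in w]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.1, -0.5, 0.25, 1.27]
q, s = quantize_uniform(w)
w_hat = dequantize(q, s)                          # reconstruction with rounding error
```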
- "A survey of low-bit large language models: Basics, systems, and algorithms", Neural Networks, 2025. [paper]
- "Binary Neural Networks: A Survey", Pattern Recognition, 2020. [paper] [Extreme]
- "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [paper]
- "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [paper]
- "A White Paper on Neural Network Quantization", arXiv, 2021. [paper]
- "SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training", NeurIPS, 2025. [paper][code][PTQ] [Non-uniform]
- "CBQ: Cross-Block Quantization for Large Language Models", ICLR, 2025. [paper][PTQ]
- "SpinQuant: LLM Quantization with Learned Rotations", ICLR, 2025. [paper] [code][PTQ]
- "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid", ICLR, 2025. [paper] [code]
- "SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning", ICLR, 2025. [paper][code][QAT]
- "OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting", ICLR, 2025. [paper][code][PTQ]
- "Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training", ICLR, 2025. [paper][QAT]
- "Matryoshka Quantization", ICML, 2025. [paper][QAT] [Non-uniform]
- "AEQA-NAT: Adaptive End-to-end Quantization Alignment Training Framework for Non-autoregressive Machine Translation", ICML, 2025. [paper][QAT]
- "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance", ICML, 2025. [paper][code][PTQ]
- "Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models", ICML, 2025. [paper][PTQ]
- "Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation", ICML, 2025. [paper][code][PTQ]
- "SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression", ICML, 2025. [paper][code][PTQ]
- "SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization", ICML, 2025. [paper][PTQ]
- "BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference", ICML, 2025. [paper][code][PTQ]
- "GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models", ICML, 2025. [paper][code][PTQ] [Non-uniform]
- "SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models", ICML, 2025. [paper][code][PTQ] [Non-uniform]
- "SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization", ICML, 2025. [paper][code][PTQ] [Non-uniform]
- "KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference", ICML, 2025. [paper][code][PTQ]
- "RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models", ICML, 2025. [paper][code][QAT]
- "CommVQ: Commutative Vector Quantization for KV Cache Compression", ICML, 2025. [paper][code][QAT]
- "PARQ: Piecewise-Affine Regularized Quantization", ICML, 2025. [paper][QAT]
- "ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals", ICML, 2025. [paper][code][PTQ] [Non-uniform]
- "NestQuant: nested lattice quantization for matrix products and LLMs", ICML, 2025. [paper][PTQ]
- "MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance", ICML, 2025. [paper]
- "FlatQuant: Flatness Matters for LLM Quantization", ICML, 2025. [paper][code][PTQ]
- "Optimizing Large Language Model Training Using FP4 Quantization", ICML, 2025. [paper][code][QAT]
- "MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design", ICML, 2025. [paper][code][PTQ]
- "BoA: Attention-aware Post-training Quantization without Backpropagation", ICML, 2025. [paper][PTQ]
- "Q-VLM: Post-training Quantization for Large Vision-Language Models", NeurIPS, 2024. [paper][code][PTQ]
- "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", NeurIPS, 2024. [paper][code][PTQ]
- "QBB: Quantization with Binary Bases for LLMs", NeurIPS, 2024. [paper][Extreme]
- "DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs", NeurIPS, 2024. [paper][code][PTQ]
- "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification", NeurIPS, 2024. [paper][code][PTQ]
- "KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization", NeurIPS, 2024. [paper][Extreme]
- "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression", NeurIPS, 2024. [paper][Extreme] [QAT]
- "Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models", NeurIPS, 2024. [paper][PTQ]
- "QTIP: Quantization with Trellises and Incoherence Processing", NeurIPS, 2024. [paper][code][PTQ]
- "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision", NeurIPS, 2024. [paper][code][QAT]
- "Evaluating Quantized Large Language Models", ICML, 2024. [paper]
- "SqueezeLLM: Dense-and-Sparse Quantization", ICML, 2024. [paper] [PTQ] [Non-uniform]
- "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache", ICML, 2024. [paper]
- "LQER: Low-Rank Quantization Error Reconstruction for LLMs", ICML, 2024. [paper]
- "Extreme Compression of Large Language Models via Additive Quantization", ICML, 2024. [paper]
- "BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization", ICML, 2024. [paper]
- "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", ICML, 2024. [paper]
- "Compressing Large Language Models by Joint Sparsification and Quantization", ICML, 2024. [paper]
- "FrameQuant: Flexible Low-Bit Quantization for Transformers", ICML, 2024. [paper] [PTQ]
- "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", ICLR, 2024. [paper]
- "LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models", ICLR, 2024. [paper]
- "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", ICLR, 2024. [paper] [PTQ]
- "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", ICLR, 2024. [paper]
- "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", ICLR, 2024. [paper] [PTQ]
- "PB-LLM: Partially Binarized Large Language Models", ICLR, 2024. [paper] [Extreme]
- "AffineQuant: Affine Transformation Quantization for Large Language Models", ICLR, 2024. [paper]
- "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", ICLR, 2024. [paper]
- "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", ICLR, 2024. [paper]
- "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [paper]
- "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [paper]
- "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [paper]
- "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [paper] [PTQ]
- "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [paper]
- "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [paper]
- "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. [paper]
- "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [paper]
- "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [paper]
- "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [paper]
- "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [paper]
- "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [paper]
- "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [paper]
- "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [paper]
- "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2023. [paper]
- "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2023. [paper]
- "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [paper] [PTQ]
- "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [paper] [PTQ]
- "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [paper]
- "SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM", arXiv, 2023. [paper] [PTQ]
- "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [paper]
- "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [paper]
- "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [paper]
- "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [paper] [code]
- "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [paper] [code] [PTQ]
- "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [paper]
- "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [paper]
- "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [paper]
- "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023. [paper]
- "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [paper]
- "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [paper]
- "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [paper]
- "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [paper]
- "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [paper]
- "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [paper]
- "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [paper]
- "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [paper]
- "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [paper]
- "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [paper]
- "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [paper] [PTQ]
- "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [paper] [PTQ]
- "NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [paper] [Non-uniform]
- "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [paper]
- "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [paper]
- "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [paper]
- "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [paper]
- "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [paper] [code]
- "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [paper] [PTQ]
- "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [paper]
- "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [paper] [PTQ]
- "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023. [paper]
- "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [paper] [PTQ]
- "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [paper] [code] [PTQ]
- "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [paper]
- "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [paper]
- "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [paper]
- "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [paper] [code] [PTQ]
- "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [paper] [code] [PTQ]
- "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [paper] [code] [Extreme]
- "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [paper] [code] [Extreme]
- "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [paper] [code] [PTQ]
- "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [paper] [code]
- "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [paper] [PTQ]
- "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [paper] [code] [PTQ]
- "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [paper]
- "I-BERT: Integer-only BERT Quantization", ICML, 2021. [paper] [code]
- "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [paper] [code] [Extreme]
- "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [paper]
- "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [paper] [code]
- "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [paper]
- "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [paper] [code] [Extreme]
- "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [paper]
- "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [paper]
- "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [paper]
- "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [paper]
- "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [paper]
- "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [paper]
- "MBQ: Modality-Balanced Quantization for Large Vision-Language Models", CVPR, 2025. [paper][code][PTQ]
- "MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization", CVPR, 2025. [paper][code][QAT]
- "FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation", CVPR, 2025. [paper][code][PTQ]
- "APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers", CVPR, 2025. [paper][code][PTQ]
- "LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers", ICML, 2025. [paper][QAT]
- "CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs", ECCV, 2024. [paper][PTQ]
- "AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer", ECCV, 2024. [paper][PTQ]
- "PQ-SAM: Post-training Quantization for Segment Anything Model", ECCV, 2024. [paper][PTQ]
- "ERQ: Error Reduction for Post-Training Quantization of Vision Transformers", ICML, 2024. [paper][PTQ]
- "Outlier-aware Slicing for Post-Training Quantization in Vision Transformer", ICML, 2024. [paper][PTQ]
- "PTQ4SAM: Post-Training Quantization for Segment Anything", CVPR, 2024. [paper] [PTQ]
- "Instance-Aware Group Quantization for Vision Transformers", CVPR, 2024. [paper] [PTQ]
- "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [paper] [Extreme]
- "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [paper]
- "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2023. [paper][PTQ] [MP]
- "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2023. [paper][PTQ] [MP]
- "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [paper] [code][Extreme]
- "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [paper] [code][PTQ]
- "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [paper]
- "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [paper][Extreme]
- "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [paper]
- "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [paper]
- "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [paper] [code]
- "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [paper]
- "Variation-aware Vision Transformer Quantization", arXiv, 2023. [paper]
- "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [paper][PTQ]
- "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [paper]
- "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [paper]
- "Output Sensitivity-Aware DETR Quantization", 2023. [paper]
- "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [paper][PTQ]
- "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [paper] [code]
- "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. [paper] [code][PTQ]
- "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [paper] [code] [PTQ]
- "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [paper] [code][PTQ]
- "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [paper]
- "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [paper][PTQ]
- "SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models", ICLR, 2025. [paper]
- "ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation", ICLR, 2025. [paper]
- "DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models", ICLR, 2025. [paper]
- "Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning", CVPR, 2025. [paper][PTQ]
- "Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers", CVPR, 2025. [paper][code][PTQ]
- "PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution", CVPR, 2025. [paper][code][PTQ]
- "Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers", ICML, 2025. [paper][QAT]
- "Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models", ICML, 2025. [paper][PTQ]
- "PTQ4DiT: Post-training Quantization for Diffusion Transformers", NeurIPS, 2024. [paper][PTQ]
- "BiDM: Pushing the Limit of Quantization for Diffusion Models", NeurIPS, 2024. [paper]
- "BitsFusion: 1.99 bits Weight Quantization of Diffusion Model", NeurIPS, 2024. [paper]
- "Timestep-Aware Correction for Quantized Diffusion Models", ECCV, 2024. [paper]
- "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", ECCV, 2024. [paper][PTQ]
- "Memory-Efficient Fine-Tuning for Quantized Diffusion Model", ECCV, 2024. [paper]
- "MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization", ECCV, 2024. [paper]
- "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", CVPR, 2024. [paper][PTQ]
- "Towards Accurate Post-training Quantization for Diffusion Models", CVPR, 2024. [paper] [PTQ]
- "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", ICLR, 2024. [paper]
- "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [paper]
- "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2023. [paper]
- "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [paper][PTQ]
- "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [paper]
- "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [paper]
- "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [paper][PTQ]
- "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [paper]
- "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [paper]
- "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [paper] [code] [PTQ]
- "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [paper][PTQ]
- "Post-training Quantization on Diffusion Models", CVPR, 2023. [paper] [code][PTQ]
- "MetaAug: Meta-Data Augmentation for Post-Training Quantization", ECCV, 2024. [paper]
- "Sharpness-Aware Data Generation for Zero-shot Quantization", ICML, 2024. [paper]
- "A2Q+: Improving Accumulator-Aware Weight Quantization", ICML, 2024. [paper]
- "HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks", IJCAI, 2024. [paper][PTQ]
- "Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning", CVPR, 2024. [paper][MP]
- "Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices", CVPR, 2024. [paper][MP]
- "Enhancing Post-training Quantization Calibration through Contrastive Learning", CVPR, 2024. [paper] [PTQ]
- "Data-Free Quantization via Pseudo-label Filtering", CVPR, 2024. [paper]
- "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [paper]
- "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [paper][MP]
- "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [paper]
- "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [paper][PTQ]
- "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2023. [paper]
- "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [paper]
- "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [paper][Extreme]
- "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [paper]
- "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [paper]
- "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [paper] [code]
- "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [paper]
- "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [paper] [code]
- "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [paper]
- "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [paper][MP]
- "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [paper][PTQ]
- "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023. [paper] [code]
- "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [paper][PTQ]
- "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [paper]
- "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [paper][MP]
- "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [paper]
- "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [paper]
- "Resilient Binary Neural Network", AAAI, 2023. [paper][Extreme]
- "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [paper][Extreme]
- "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [paper]
- "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [paper][MP]
- "Adaptive Data-Free Quantization", CVPR, 2023. [paper]
- "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [paper][PTQ]
- "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [paper] [code][PTQ]
- "GENIE: Show Me the Data for Quantization", CVPR, 2023. [paper] [code][PTQ]
- "Bayesian asymmetric quantized neural networks", PR, 2023. [paper]
- "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [paper][Extreme]
- "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [paper][MP]
- "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [paper] [code]
- "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [paper] [code]
- "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [paper] [code]
- "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. [paper] [code][Non-uniform]
- "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [paper] [code][Non-uniform]
- "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [paper][PTQ] [Non-uniform]
- "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [paper][Non-uniform] [MP]
- "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [paper] [code]
- "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [paper]
- "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [paper][PTQ]
- "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [paper]
- "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [paper][MP]
- "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [paper]
- "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [paper] [code]
- "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [paper] [code][PTQ]
- "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [paper]
- "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [paper][PTQ] [Non-uniform]
- "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [paper]
- "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. [paper] [code]
- "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [paper]
- "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [paper] [code][MP]
- "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [paper]
- "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [paper] [code][PTQ]
- "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [paper]
- "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [paper]
- "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [paper] [code]
- "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [paper] [code]
- "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [paper] [code][PTQ]
- "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [paper] [code][PTQ]
- "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [paper][MP]
- "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [paper] [code][PTQ]
- "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [paper] [code]
- "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [paper] [code]
- "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [paper] [code][MP]
- "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [paper][MP]
- "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021. [paper] [code]
- "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [paper] [code]
- "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [paper] [code][PTQ]
- "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [paper][PTQ]
- "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks", CVPR, 2021. [paper] [code]
- "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [paper]
- "Zero-shot Adversarial Quantization", CVPR, 2021. [paper] [code]
- "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [paper] [code]
- "High-Capacity Expert Binary Networks", ICLR, 2021. [paper] [code][Extreme]
- "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [paper] [code] [Extreme]
- "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [paper] [code][PTQ]
- "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [paper]
- "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [paper]
- "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [paper] [code][MP]
- "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [paper]
- "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [paper]
- "Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [paper]
- "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [paper][MP]
- "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021. [paper]
- "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [paper] [code]
- "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [paper]
- "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [paper][MP]
- "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [paper][PTQ] [MP]
- "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [paper] [code][PTQ]
- "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR, 2020. [paper]
- "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [paper][MP]
- "Learned step size quantization", ICLR, 2020. [paper]
- "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [paper][MP]
- "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [paper][PTQ]
- "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [paper] [code][MP]
- "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [paper]
- "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [paper]
- "Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector", CVPR, 2024. [paper][PTQ]
- "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [paper][PTQ]
- "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [paper]
- "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [paper] [code][Extreme]
- "Fully Quantized Network for Object Detection", CVPR, 2019. [paper]
- "Towards Robust Full Low-bit Quantization of Super Resolution Networks", ECCV, 2024. [paper][PTQ]
- "Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks", ECCV, 2024. [paper]
- "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [paper]
- "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [paper] [code][PTQ]
- "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [paper][Extreme]
- "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution", ECCV, 2022. [paper] [code]
- "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [paper] [code]
- "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [paper] [code]
- "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [paper] [code]
- "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [paper] [code]
- "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [paper][Extreme]
- "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", ICLR, 2024. [paper][PTQ]
- "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [paper][Extreme]
- "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [paper][code][Extreme]
- "QP-SNN: Quantized and Pruned Spiking Neural Networks", ICLR, 2025. [paper]
- "Q-SNNs: Quantized Spiking Neural Networks", arXiv, 2025. [paper]