
Quantization-Papers-with-Codes

A repository of neural-network quantization papers and code. Entries are organized by the type of quantization model, with a summary of each category's characteristics and directions for future research.

Overview

Keywords: PTQ: post-training quantization | QAT: quantization-aware training | Extreme: binary or ternary quantization | Non-uniform: non-uniform quantization | MP: mixed-precision quantization
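
To make the keywords concrete, below is a minimal, illustrative PyTorch sketch of the two dominant workflows: PTQ, which maps a pretrained tensor onto a uniform integer grid using a scale and zero-point computed from its observed range, and QAT, which simulates quantization in the forward pass while passing gradients through via the straight-through estimator. All names here are hypothetical and not taken from any listed paper.

```python
import torch

def quantize_ptq(w: torch.Tensor, num_bits: int = 8):
    """PTQ-style asymmetric uniform quantization: scale and zero-point
    are derived from the tensor's min/max range, with no retraining."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(w.min() / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point  # dequantize as (q - zero_point) * scale

class FakeQuant(torch.autograd.Function):
    """QAT-style fake quantization: quantize-dequantize in the forward
    pass, pass the gradient through unchanged (straight-through estimator)."""
    @staticmethod
    def forward(ctx, w, num_bits=8):
        q, scale, zp = quantize_ptq(w, num_bits)
        return (q - zp) * scale  # simulated quantized weights, still float

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # STE: identity gradient w.r.t. w
```

In this vocabulary, Extreme corresponds to pushing num_bits down to binary or ternary codebooks, Non-uniform replaces the uniform grid above with learned or logarithmic levels, and MP assigns a different num_bits per layer or channel.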


Survey

  • "A survey of low-bit large language models: Basics, systems, and algorithms", Neural Networks, 2025. [paper]
  • "Binary Neural Networks: A Survey", Pattern Recognition, 2020. [paper] [Extreme]
  • "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [paper]
  • "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [paper]
  • "A White Paper on Neural Network Quantization", arXiv, 2021. [paper]

Transformer-based Models

Language Transformers

  • "SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training", NeurIPS, 2025. [paper][code][PTQ] [Non-uniform]



  • "CBQ: Cross-Block Quantization for Large Language Models", ICLR, 2025. [paper][PTQ]
  • "SpinQuant: LLM Quantization with Learned Rotations", ICLR, 2025. [paper] [code][PTQ]
  • "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid", ICLR, 2025. [paper] [code]
  • "SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning", ICLR, 2025. [paper][code][QAT]
  • "OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting", ICLR, 2025. [paper][code][PTQ]
  • "Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training", ICLR, 2025. [paper][QAT]



  • "Matryoshka Quantization", ICML, 2025. [paper][QAT] [Non-uniform]
  • "AEQA-NAT : Adaptive End-to-end Quantization Alignment Training Framework for Non-autoregressive Machine Translation", ICML, 2025. [paper][QAT]
  • "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance", ICML, 2025. [paper][code][PTQ]
  • "Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models", ICML, 2025. [paper][PTQ]
  • "Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation", ICML, 2025. [paper][code][PTQ]
  • "SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression", ICML, 2025. [paper][code][PTQ]
  • "SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization", ICML, 2025. [paper][PTQ]
  • "BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference", ICML, 2025. [paper][code][PTQ]
  • "GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models", ICML, 2025. [paper][code][PTQ] [Non-uniform]
  • "SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models", ICML, 2025. [paper][code][PTQ] [Non-uniform]
  • "SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization", ICML, 2025. [paper][code][PTQ] [Non-uniform]
  • "KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference", ICML, 2025. [paper][code][PTQ]
  • "RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models", ICML, 2025. [paper][code][QAT]
  • "CommVQ: Commutative Vector Quantization for KV Cache Compression", ICML, 2025. [paper][code][QAT]
  • "PARQ: Piecewise-Affine Regularized Quantization", ICML, 2025. [paper][QAT]
  • "ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals", ICML, 2025. [paper][code][PTQ] [Non-uniform]
  • "NestQuant: nested lattice quantization for matrix products and LLMs", ICML, 2025. [paper][PTQ]
  • "MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance", ICML, 2025. [paper]
  • "FlatQuant: Flatness Matters for LLM Quantization", ICML, 2025. [paper][code][PTQ]
  • "Optimizing Large Language Model Training Using FP4 Quantization", ICML, 2025. [paper][code][QAT]
  • "MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design", ICML, 2025. [paper][code][PTQ]
  • "BoA: Attention-aware Post-training Quantization without Backpropagation", ICML, 2025. [paper][PTQ]



  • "Q-VLM: Post-training Quantization for Large Vision-Language Models", NeurIPS, 2024. [paper][code][PTQ]
  • "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", NeurIPS, 2024. [paper][code][PTQ]
  • "QBB: Quantization with Binary Bases for LLMs", NeurIPS, 2024. [paper][Extreme]
  • "DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs", NeurIPS, 2024. [paper][code][PTQ]
  • "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification", NeurIPS, 2024. [paper][code][PTQ]
  • "KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization", NeurIPS, 2024. [paper][Extreme]
  • "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression", NeurIPS, 2024. [paper][Extreme] [QAT]
  • "Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models", NeurIPS, 2024. [paper][PTQ]
  • "QTIP: Quantization with Trellises and Incoherence Processing", NeurIPS, 2024. [paper][code][PTQ]
  • "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision", NeurIPS, 2024. [paper][code][QAT]



  • "Evaluating Quantized Large Language Models", ICML, 2024. [paper]
  • "SqueezeLLM: Dense-and-Sparse Quantization", ICML, 2024. [paper] [PTQ] [Non-uniform]
  • "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache", ICML, 2024. [paper]
  • "LQER: Low-Rank Quantization Error Reconstruction for LLMs", ICML, 2024. [paper]
  • "Extreme Compression of Large Language Models via Additive Quantization", ICML, 2024. [paper]
  • "BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization", ICML, 2024. [paper]
  • "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", ICML, 2024. [paper]
  • "Compressing Large Language Models by Joint Sparsification and Quantization", ICML, 2024. [paper]
  • "FrameQuant: Flexible Low-Bit Quantization for Transformers", ICML, 2024. [paper] [PTQ]



  • "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", ICLR, 2024. [paper]"
  • "LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models", ICLR, 2024. [paper]
  • "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", ICLR, 2024. [paper] [PTQ]
  • "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", ICLR, 2024. [paper]
  • "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", ICLR, 2024. [paper] [PTQ]
  • "PB-LLM: Partially Binarized Large Language Models", ICLR, 2024. [paper] [Extreme]
  • "AffineQuant: Affine Transformation Quantization for Large Language Models", ICLR, 2024. [paper]
  • "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", ICLR, 2024. [paper]
  • "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", ICLR, 2024. [paper]



  • "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [paper]
  • "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [paper]
  • "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [paper]
  • "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [paper] [PTQ]
  • "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [paper]



  • "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [paper]
  • "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. [paper]
  • "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [paper]
  • "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [paper]
  • "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [paper]
  • "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [paper]
  • "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [paper]
  • "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [paper]
  • "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [paper]
  • "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2023. [paper]
  • "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2023. [paper]
  • "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [paper] [PTQ]
  • "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [paper] [PTQ]
  • "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [paper]
  • "SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM", arXiv, 2023. [paper] [PTQ]
  • "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [paper]
  • "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [paper]
  • "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [paper]
  • "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [paper] [code]
  • "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [paper] [code] [PTQ]
  • "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [paper]
  • "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [paper]
  • "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [paper]
  • "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023. [paper]
  • "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [paper]
  • "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [paper]
  • "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [paper]
  • "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [paper]
  • "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [paper]
  • "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [paper]
  • "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [paper]
  • "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [paper]
  • "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [paper]
  • "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [paper]
  • "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [paper] [PTQ]
  • "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [paper] [PTQ]
  • "NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [paper] [Non-uniform]
  • "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [paper]
  • "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [paper]
  • "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [paper]
  • "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [paper]
  • "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [paper] [code]
  • "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [paper] [PTQ]
  • "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [paper]
  • "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [paper] [PTQ]
  • "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023. [paper]
  • "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [paper] [PTQ]
  • "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [paper] [code] [PTQ]



  • "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [paper]
  • "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [paper]
  • "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [paper]
  • "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [paper] [code] [PTQ]
  • "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [papar] [code] [PTQ]
  • "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [paper] [code] [Extreme]
  • "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [paper] [code] [Extreme]
  • "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [paper] [code] [PTQ]
  • "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [paper] [code]
  • "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [paper] [PTQ]
  • "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [paper] [code] [PTQ]
  • "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [paper]
  • "I-BERT: Integer-only BERT Quantization", ICML, 2021. [paper] [code]
  • "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [paper] [code] [Extreme]
  • "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [paper]
  • "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [paper] [code]
  • "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [paper]
  • "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [paper] [code] [Extreme]
  • "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [paper]
  • "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [paper]
  • "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [paper]
  • "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [paper]
  • "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [paper]
  • "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [paper]

[Back to Overview]

Vision Transformers

  • "MBQ: Modality-Balanced Quantization for Large Vision-Language Models", CVPR, 2025. [paper][code][PTQ]
  • "MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization", CVPR, 2025. [paper][code][QAT]
  • "FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation", CVPR, 2025. [paper][code][PTQ]
  • "APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers", CVPR, 2025. [paper][code][PTQ]



  • "LRA-QViT: Integrating Low-Rank Approximation and Quantization for Robust and Efficient Vision Transformers", ICML 2025. [paper][QAT]



  • "CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs", ECCV, 2024. [paper][PTQ]
  • "AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer", ECCV, 2024. [paper][PTQ]
  • "PQ-SAM: Post-training Quantization for Segment Anything Model", ECCV, 2024. [paper][PTQ]



  • "ERQ: Error Reduction for Post-Training Quantization of Vision Transformers", ICML, 2024. [paper][PTQ]
  • "Outlier-aware Slicing for Post-Training Quantization in Vision Transformer", ICML, 2024. [paper][PTQ]



  • "PTQ4SAM: Post-Training Quantization for Segment Anything", CVPR, 2024. [paper] [PTQ]
  • "Instance-Aware Group Quantization for Vision Transformers", CVPR, 2024. [paper] [PTQ]



  • "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [paper] [Extreme]
  • "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [paper]



  • "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2023. [paper][PTQ] [MP]
  • "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2023. [paper][PTQ] [MP]



  • "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [paper] [code][Extreme]
  • "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [paper] [code][PTQ]
  • "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [paper]
  • "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [paper][Extreme]
  • "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [paper]



  • "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [paper]



  • "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [paper] [code]
  • "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [paper]
  • "Variation-aware Vision Transformer Quantization", arXiv, 2023. [paper]



  • "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [paper][PTQ]
  • "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [paper]
  • "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [paper]
  • "Output Sensitivity-Aware DETR Quantization", 2023. [paper]
  • "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [paper][PTQ]
  • "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [paper] [code]
  • "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. [paper] [code][PTQ]
  • "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [paper] [code] [PTQ]
  • "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [paper] [code][PTQ]
  • "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [paper]
  • "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [paper][PTQ]

[Back to Overview]

Diffusion-based Models

Visual Generation

  • "SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models", ICLR, 2025. [paper]
  • "ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation", ICLR, 2025. [paper]
  • "DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models", ICLR, 2025. [paper]



  • "Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning", CVPR 2025. [paper][PTQ]
  • "Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers", CVPR 2025. [paper][code][PTQ]
  • "PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution", CVPR 2025. [paper][code][PTQ]



  • "Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers", ICML, 2025. [paper]['QAT']
  • "Outlier-Aware Post-Training Quantization for Discrete Graph Diffusion Models", ICML, 2025. [paper][PTQ]



  • "PTQ4DiT: Post-training Quantization for Diffusion Transformers", NeurIPS, 2024. [paper][PTQ]
  • "BiDM: Pushing the Limit of Quantization for Diffusion Models", NeurIPS, 2024. [paper]
  • "BitsFusion: 1.99 bits Weight Quantization of Diffusion Model", NeurIPS, 2024. [paper]



  • "Timestep-Aware Correction for Quantized Diffusion Models", ECCV, 2024. [paper]
  • "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", ECCV, 2024. [paper][PTQ]
  • "Memory-Efficient Fine-Tuning for Quantized Diffusion Model", ECCV, 2024. [paper]
  • "MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization", ECCV, 2024. [paper]



  • "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", CVPR, 2024. [paper][PTQ]
  • "Towards Accurate Post-training Quantization for Diffusion Models", CVPR, 2024. [paper] [PTQ]



  • "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", ICLR, 2024. [paper]
  • "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [paper]
  • "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2023. [paper]
  • "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [paper][PTQ]
  • "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [paper]
  • "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [paper]



  • "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [paper][PTQ]
  • "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [paper]
  • "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [paper]
  • "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [paper] [code] [PTQ]
  • "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [paper][PTQ]
  • "Post-training Quantization on Diffusion Models", CVPR, 2023. [paper] [code][PTQ]

[Back to Overview]

Convolutional Neural Networks

Image Classification

  • "MetaAug: Meta-Data Augmentation for Post-Training Quantization", ECCV, 2024. [paper]
  • "Sharpness-Aware Data Generation for Zero-shot Quantization", ICML, 2024. [paper]
  • "A2Q+: Improving Accumulator-Aware Weight Quantization", ICML, 2024. [paper]
  • "HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks", IJCAI, 2024. [paper][PTQ]
  • "Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning", CVPR, 2024. [paper][MP]
  • "Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices", CVPR, 2024. [paper][MP]
  • "Enhancing Post-training Quantization Calibration through Contrastive Learning", CVPR, 2024. [paper] [PTQ]
  • "Data-Free Quantization via Pseudo-label Filtering", CVPR, 2024. [paper]
  • "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [paper]
  • "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [paper][MP]
  • "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [paper]
  • "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [paper][PTQ]
  • "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2023. [paper]
  • "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [paper]
  • "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [paper][Extreme]
  • "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [paper]
  • "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [paper]
  • "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [paper] [code]
  • "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [paper]
  • "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [paper] [code]
  • "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [paper]
  • "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [paper][MP]
  • "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [paper][PTQ]
  • "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023. [paper] [code]
  • "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [paper][PTQ]
  • "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [paper]
  • "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [paper][MP]
  • "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [paper]
  • "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [paper]
  • "Resilient Binary Neural Network", AAAI, 2023. [paper][Extreme]
  • "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [paper][Extreme]
  • "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [paper]
  • "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [paper][MP]
  • "Adaptive Data-Free Quantization", CVPR, 2023. [paper]
  • "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [paper][PTQ]
  • "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [paper] [code][PTQ]
  • "GENIE: Show Me the Data for Quantization", CVPR, 2023. [paper] [code][PTQ]
  • "Bayesian asymmetric quantized neural networks", PR, 2023. [paper]
  • "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [paper][Extreme]
  • "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [paper][MP]
  • "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [paper] [code]
  • "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [paper] [code]
  • "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [paper] [code]
  • "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. [paper] [code][Non-uniform]
  • "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [paper] [code][Non-uniform]
  • "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [paper][PTQ] [Non-uniform]
  • "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [paper][Non-uniform] [MP]
  • "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [paper] [code]
  • "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [paper]
  • "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [paper][PTQ]
  • "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [paper]
  • "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [paper][MP]
  • "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [paper]
  • "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [paper] [code]
  • "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [paper] [code][PTQ]
  • "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [paper]
  • "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [paper][PTQ] [Non-uniform]
  • "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [paper]
  • "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. [paper] [code]
  • "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [paper]
  • "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [paper] [Code] [code][MP]
  • "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [paper]
  • "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [paper] [code][PTQ]
  • "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [paper]
  • "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [paper]
  • "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [paper] [code]
  • "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [paper] [code]
  • "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [paper] [code][PTQ]
  • "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [paper] [code][PTQ]
  • "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [paper][MP]
  • "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [paper] [code][PTQ]
  • "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [paper] [code]
  • "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [paper] [code]
  • "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [paper] [code][MP]
  • "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [paper][MP]
  • "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021. [paper] [code]
  • "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [paper] [code]
  • "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [paper] [code][PTQ]
  • "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [paper][PTQ]
  • "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks.", CVPR, 2021. [paper] [code]
  • "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [paper]
  • "Zero-shot Adversarial Quantization", CVPR, 2021. [paper] [code]
  • "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [paper] [code]
  • "High-Capacity Expert Binary Networks", ICLR, 2021. [paper] [code][Extreme]
  • "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [paper] [code] [Extreme]
  • "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [paper] [code][PTQ]
  • "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [paper]
  • "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [paper]
  • "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [paper] [code][MP]
  • "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [paper]
  • "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [paper]
  • "Stochastic Precision Ensemble: Self‐Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [paper]
  • "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [paper][MP]
  • "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021. [paper]
  • "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [paper] [code]
  • "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [paper]
  • "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [paper][MP]
  • "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [paper][PTQ] [MP]
  • "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [paper] [code][PTQ]
  • "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR, 2020. [paper]
  • "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [paper][MP]
  • "Learned step size quantization", ICLR, 2020. [paper]
  • "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [paper][MP]
  • "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [paper][PTQ]
  • "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [paper] [code][MP]
  • "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [paper]
  • "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [paper]

[Back to Overview]

Other Tasks

Object Detection

  • "Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector", CVPR, 2024. [paper][PTQ]
  • "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [paper][PTQ]
  • "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [paper]
  • "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [paper] [code][Extreme]
  • "Fully Quantized Network for Object Detection", CVPR, 2019. [paper]

[Back to Overview]

Super Resolution

  • "Towards Robust Full Low-bit Quantization of Super Resolution Networks", ECCV, 2024. [paper][PTQ]
  • "Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks", ECCV, 2024. [paper]
  • "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [paper]
  • "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [paper] [code][PTQ]
  • "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [paper][Extreme]
  • "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution ", ECCV, 2022. [paper] [code]
  • "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [paper] [code]
  • "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [paper] [code]
  • "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [paper] [code]
  • "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [paper] [code]
  • "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [paper][Extreme]

[Back to Overview]

Point Cloud

  • "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", ICLR, 2024. [paper][PTQ]
  • "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [paper][Extreme]
  • "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [paper][code][Extreme]

[Back to Overview]

Spiking Neural Networks

  • "QP-SNN: Quantized and Pruned Spiking Neural Networks", ICLR, 2025. [paper] "Q-SNNs: Quantized Spiking Neural Networks", arXiv, 2025. [paper]

[Back to Overview]

