A curated list for Efficient Large Language Models
[ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.
[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.
The official implementation of the ICML 2023 paper OFQ-ViT
Chat with LLaMA 2 that also provides responses with reference documents retrieved from a vector database. The model runs locally using GPTQ 4-bit quantization.
A tutorial of model quantization using TensorFlow
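Tutorials like this typically start from post-training quantization. The core idea, independent of framework, is the affine (scale/zero-point) mapping between floats and 8-bit integers; a minimal NumPy sketch of that mapping (the function names are illustrative, not from the tutorial):

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float tensor to uint8."""
    x_min, x_max = float(x.min()), float(x.max())
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must contain 0
    scale = (x_max - x_min) / 255.0 or 1.0           # guard against zero range
    zero_point = int(round(-x_min / scale))          # the uint8 code for 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uint8(w)
w_hat = dequantize(q, s, z)
# Round-trip error is bounded by half a quantization step (scale / 2)
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

TensorFlow's converter applies the same scheme per tensor (or per channel) when `tf.lite.Optimize.DEFAULT` is enabled; the sketch above just makes the arithmetic visible.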
PyTorch implementation of "BiDense: Binarization for Dense Prediction," a binary neural network for dense prediction tasks.
This project distills a ViT model into a compact CNN, reducing its size to 1.24MB with minimal accuracy loss. ONNXRuntime with CUDA boosts inference speed, while FastAPI and Docker simplify deployment.
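Distillation pipelines like this one usually minimize a temperature-scaled KL divergence between teacher and student logits. A minimal NumPy sketch of that loss (the temperature `T` and logit values are illustrative assumptions, not taken from the project):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across temperatures,
    following the standard Hinton et al. distillation formulation.
    """
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return (T ** 2) * kl.mean()

teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[3.5, 1.2, 0.4]])
loss = distillation_loss(student, teacher)
assert loss >= 0.0  # KL divergence is non-negative
```

In practice this term is combined with a cross-entropy loss on the ground-truth labels; a ViT-to-CNN setup like the one described trains the compact CNN against the frozen ViT's logits.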
🔬 Curiosity-Driven Quantized Mixture of Experts
Technical documentation and hands-on projects for model compression.
Fine-tuning Llama models with QLoRA using Unsloth for supervised instruction tasks
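QLoRA keeps the base weights frozen (stored in 4-bit in the real method) and trains only low-rank adapter matrices. The forward pass of one adapted layer can be sketched in NumPy (dimensions, rank, and scaling below are illustrative assumptions; Unsloth handles this inside fused kernels):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16        # hidden size, LoRA rank, scaling factor (illustrative)

W = rng.normal(size=(d, d))   # frozen base weight (4-bit NF4 in real QLoRA)
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))          # B initialized to zero, so the adapter starts as a no-op

def lora_forward(x):
    # Base projection plus low-rank update W + (alpha / r) * B @ A
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
assert np.allclose(lora_forward(x), x @ W.T)  # at init, output equals the base model
```

Only `A` and `B` receive gradients during fine-tuning, which is why the memory footprint stays small enough for supervised instruction tuning on a single GPU.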
Replication package for the paper "Aggregating empirical evidence from data strategies studies: a case on model quantization" published in the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).
This project explores generating high-quality images from depth maps and conditioning signals such as Canny edges, leveraging Stable Diffusion and ControlNet models. It focuses on tuning aspect ratios and inference step counts to balance generation speed against quality.
Unofficial implementation of NCNet using Flax and JAX.