Learned-Rounding

A repository of Python & PyTorch scripts which (currently) convert .safetensors models into scaled FP8 variants, using gradient descent for optimal rounding.

TPEC-Quant (Top-Principal Error Correction Quantization)

A novel method (designed by Clybius) which uses SVD to calculate the error on the top principal component and descends toward a more accurate representation, leading to far better results in FP8 precision. Results are close to FP16/BF16 precision when scaled with a single scalar value.
Obtainable via convert_fp8_scaled_learned_svd_fast.py in this repository.
Natively supported in ComfyUI and other UI utilities based on ComfyUI, thanks to their scaled FP8 implementation!
Takes ~10 minutes to quantize a large AI image diffusion model (Chroma). Performance may vary considerably depending on hardware, and can likely be improved with multiprocessing (not yet complete), a faster drive for reading/writing (partially broken), and a faster GPU/processing unit.
Supports Chroma, FLUX, T5XXL (includes removal of decoder and extra tensors from a full model), and maybe more!

Usage: python convert_fp8_scaled_learned_svd_fast.py --input /path/to/model.safetensors

Arguments:

--input "/path/to/model.safetensors": Input safetensors file path.
--output "/path/to/output_model.safetensors": Output safetensors file path. If not provided, it will be created based on the input location and name.
--keep_distillation: Exclude distillation layers from quantization. (Likely not helpful because ComfyUI may use Round-to-Nearest in place of this without further modification.) (Default False)
--t5xxl: Exclude certain layers for T5XXL model compatibility. (Default False)
--calib_samples INT: Number of random samples for calibration. Currently only used for bias correction. (Default 3072)
--num_iter INT: Number of optimization iterations per tensor. Increasing this gives higher accuracy but longer offline quantization times. (Default 500)
--top_k INT: Top K principal components to descend upon. We usually target the top component for ease of descent and accuracy, but you can experiment with more components if 1 isn't working well. (Default 1)
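
The flags and defaults listed above can be mirrored in a small argparse sketch, which may help when wrapping the script in your own tooling. This is an illustration only: build_parser is a hypothetical name, treating --input as required is an assumption, and the real script's parser may differ in help text and validation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Convert a .safetensors model to scaled FP8 (sketch)."
    )
    # Assumption: the input path is mandatory.
    parser.add_argument("--input", required=True,
                        help="Input safetensors file path.")
    # None means the output path is derived from the input location and name.
    parser.add_argument("--output", default=None,
                        help="Output safetensors file path.")
    parser.add_argument("--keep_distillation", action="store_true",
                        help="Exclude distillation layers from quantization.")
    parser.add_argument("--t5xxl", action="store_true",
                        help="Exclude certain layers for T5XXL compatibility.")
    parser.add_argument("--calib_samples", type=int, default=3072,
                        help="Random calibration samples (bias correction).")
    parser.add_argument("--num_iter", type=int, default=500,
                        help="Optimization iterations per tensor.")
    parser.add_argument("--top_k", type=int, default=1,
                        help="Top K principal components to descend upon.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--input", "/path/to/model.safetensors", "--num_iter", "1000"]
)
print(vars(args))
```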

