quantize: add option to automatically choose optimal quant types to reach a file/bpw target size at lowest error #15550
EAddario wants to merge 260 commits into ggml-org:master
Conversation
I can't believe we've been working on the same thing for nearly a year. I saw your quant KLD results today and was surprised by how well optimised they are compared to others, so decided to check your work. For my part, it's a tool suite that I've created independently of the llama.cpp code. If you need to brainstorm, please let me know; maybe there are some aspects I've already resolved, and vice-versa. You can check the tool suite in action at https://gguf.thireus.com/quant_assign.html, I'm sure you'll notice a lot of similarities. Cheers.
Hi, I don't know the proper protocol for "intruding" on another's PR … 😉 I've been working on inference speed-aware quantization. This is especially useful for NPUs, drafting models or devices that have abundant memory but lack compute. It's based on this PR's knapsack solver infrastructure. Since this PR is already hard to review, I plan to submit a separate PR for it once it is a little more polished. As I am somewhat familiar with the source code by now, I could give a little feedback, if helpful. PS: Nice algorithmic work.
Hi @ivy-42 and thank you! No protocol expected / needed. Please feel free to use the code in any way you deem fit. I'll definitely check your fork but in the meantime, any questions/suggestions are always welcome |
```cpp
};

// Quality metrics
struct quant_error {
```
It's not clear to me what the fields of this struct are scaled by. I think it approximates a weighted sum over all elements of the tensor (approximate because not all are sampled), right? Maybe rename the fields or add a comment? E.g. `weighted_error`, `wse`, and `wce`, plus a comment clarifying that those are scaled by the tensor element count.
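For illustration only, the suggestion might render like this (the field set and comments are my reading of the review, not the PR's actual code):

```cpp
// Hypothetical: each field is a weighted sum accumulated over the sampled
// tensor elements, i.e. scaled by the sampled element count rather than
// being a normalized mean.
struct quant_error {
    double weighted_error; // imatrix-weighted error, summed over sampled elements
    double wse;            // weighted squared error
    double wce;            // weighted cross-entropy error (my guess at the abbreviation)
};
```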
```cpp
constexpr double INFINITE = std::numeric_limits<double>::infinity();
constexpr uint64_t STATE_MAGIC = 0x4250572d5631; // "BPW-V1"
constexpr uint64_t HASH_MAGIC = 0xeabada55cafed00d;
constexpr float penalty = 2.0f;
```
`penalty` can mean a lot of things in the context of an optimization problem. Maybe `boost_factor` is clearer?
In my fork, I added a CLI flag. If you think this fits the scope of this PR, feel free to include my commit.
This PR adds `target_bpw_type()`, a function to determine an optimal per-tensor quantization mix to achieve a user-specified total file size (e.g., `--target-size 1.5g`) or a global bits-per-weight (bpw) target (e.g., `--target-bpw 4.5678`).

The function solves a constrained optimization problem to minimize quantization error, subject to a global size budget. It estimates per-tensor error for each layer and dynamically allocates the bit budget where it matters most, as sketched below.
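A minimal sketch of the budgeted selection this implies, assuming per-tensor candidate lists of (size, error) pairs (all names here are illustrative, not the PR's actual code): start every tensor at its smallest candidate type, then repeatedly spend the remaining byte budget on whichever single-tensor upgrade buys the largest error reduction per extra byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct candidate {
    int64_t bytes; // on-disk size of the tensor at this quant type
    double  error; // estimated quantization error at this quant type
};

// Per tensor: candidates sorted by ascending size (and usually descending error).
using tensor_opts = std::vector<candidate>;

// Returns, for each tensor, the index of the chosen candidate.
static std::vector<size_t> allocate_budget(const std::vector<tensor_opts> & tensors, int64_t budget) {
    std::vector<size_t> choice(tensors.size(), 0);
    int64_t used = 0;
    for (const auto & t : tensors) {
        used += t[0].bytes; // baseline: smallest type everywhere
    }
    for (;;) {
        double best_gain = 0.0;
        size_t best_i    = 0;
        for (size_t i = 0; i < tensors.size(); ++i) {
            const size_t c = choice[i];
            if (c + 1 >= tensors[i].size()) {
                continue; // already at the largest candidate
            }
            const int64_t extra = tensors[i][c + 1].bytes - tensors[i][c].bytes;
            if (extra <= 0 || used + extra > budget) {
                continue; // upgrade does not fit in the remaining budget
            }
            // error reduction per additional byte spent
            const double gain = (tensors[i][c].error - tensors[i][c + 1].error) / (double) extra;
            if (gain > best_gain) {
                best_gain = gain;
                best_i    = i;
            }
        }
        if (best_gain <= 0.0) {
            break; // budget exhausted or no upgrade reduces error
        }
        used += tensors[best_i][choice[best_i] + 1].bytes - tensors[best_i][choice[best_i]].bytes;
        choice[best_i]++;
    }
    return choice;
}
```

A greedy pass like this is only an approximation of the knapsack solve the PR actually performs, but it shows where the bit budget goes: to the tensors whose error drops fastest per byte.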
High-level flow:

- If `--state-file` is set, target computations are saved to a file. If the quantization is interrupted (e.g., Ctrl+C), the next run resumes the error calculation from where it left off.
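To support that resume behaviour, the state file presumably needs a version marker, a fingerprint of the inputs, and the per-tensor results computed so far. A purely illustrative writer, reusing the `STATE_MAGIC` constant that appears in the diff (everything else is my assumption):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

constexpr uint64_t STATE_MAGIC = 0x4250572d5631; // "BPW-V1", from the PR's constants

// Hypothetical per-tensor record: the error estimates computed so far.
struct tensor_state {
    std::string         name;
    std::vector<double> errors; // one estimate per candidate quant type
};

// Writes magic + input hash + completed records. On resume, a reader would
// verify the magic and hash, then skip tensors that already have a record.
static bool save_state(const char * path, uint64_t input_hash, const std::vector<tensor_state> & done) {
    FILE * f = std::fopen(path, "wb");
    if (!f) {
        return false;
    }
    std::fwrite(&STATE_MAGIC, sizeof(STATE_MAGIC), 1, f);
    std::fwrite(&input_hash, sizeof(input_hash), 1, f);
    const uint64_t n_tensors = done.size();
    std::fwrite(&n_tensors, sizeof(n_tensors), 1, f);
    for (const auto & t : done) {
        const uint64_t name_len = t.name.size();
        std::fwrite(&name_len, sizeof(name_len), 1, f);
        std::fwrite(t.name.data(), 1, name_len, f);
        const uint64_t n_errors = t.errors.size();
        std::fwrite(&n_errors, sizeof(n_errors), 1, f);
        std::fwrite(t.errors.data(), sizeof(double), n_errors, f);
    }
    std::fclose(f);
    return true;
}
```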
**Advantages**

**Target arbitrary size models**

`--target-size 23.85g` will generate a 23.85 GiB file to utilize the hardware fully.

**Data-driven mixed precision can often improve quality at fixed size**
**Allows better like-for-like comparisons between models and families**
Standard quantization uses hardcoded rules like "use `Q4_K_M`, except bump some tensors up/down, except fall back if incompatible, except keep some tensors unquantized...", and consequently two different models quantized at the same `Q4_K_M` level can end up with very different bpw (e.g. 4.75 and 4.30).
Model performance generally scales with size; larger models typically outperform smaller ones, and a model quantized with more bits will usually perform better (lower perplexity, better evaluation scores) than a smaller version, even when the same underlying quantization method is used. As a result, performance comparisons between models are not a controlled experiment: the models being compared have different effective compression ratios.
`--target-bpw` helps to normalize experiments by forcing models to be quantized to a roughly equal overall byte budget. This standardization allows performance variations between models to be more accurately linked to underlying factors such as architectural or training differences, the effect of quantization error at the same compression level, or the decisions made by the optimizer regarding allocation.
**Disadvantages**

**Quantization process is significantly slower than standard**
This approach can take 5x-10x longer as it quantizes a sample of most tensors into 15 different formats, dequantizes them back, computes error diffs, and selects the best size/error option that fits the target file size or global bpw budget.
However, the `--state-file` option will save the above-mentioned computations to disk so that future quantizations can be generated at normal speed. It also allows the computation to be interrupted and resumed at a later time.
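For reference, the quantize/dequantize/measure cycle described above could look roughly like this with public ggml calls (heavily simplified: the PR samples rows rather than whole tensors, evaluates many candidate types per tensor, and weights errors more carefully; treat this as a sketch under those assumptions, not the PR's code):

```cpp
#include "ggml.h"

#include <cstdint>
#include <vector>

// Quantize `data` to `type`, dequantize it back, and return a weighted
// squared-error estimate. `imatrix` holds one importance weight per column
// and may be null, in which case all elements are weighted equally.
static double estimate_error(ggml_type type, const float * data, int64_t nrows, int64_t n_per_row, const float * imatrix) {
    const int64_t n = nrows * n_per_row;

    std::vector<uint8_t> quantized(ggml_row_size(type, n_per_row) * nrows);
    std::vector<float>   dequantized(n);

    ggml_quantize_chunk(type, data, quantized.data(), 0, nrows, n_per_row, imatrix);
    ggml_get_type_traits(type)->to_float(quantized.data(), dequantized.data(), n);

    double err = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        const double diff   = data[i] - dequantized[i];
        const double weight = imatrix ? imatrix[i % n_per_row] : 1.0;
        err += weight * diff * diff;
    }
    return err;
}
```

Running something like this for every candidate type over every tensor is what dominates the extra runtime, and is exactly the work the state file caches.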
**The optimization target is only a proxy for the model's performance quality**

**An imatrix with activations data is required for best results**

Without an imatrix, `--target-bpw` and `--target-file` will refuse to run.
**Design considerations**

The `target_bpw_type()` function is implemented as a container for several lambdas providing all the logic for serialization, multithreading, math/stats, and optimization. Although there are clear downsides to this approach (i.e. cognitive load, testability, and maintainability), a self-contained "God function" seemed a better choice to prevent polluting `llama-quantize`'s global scope: the structs and helper lambdas are highly specific to this exact algorithm and have no reuse value elsewhere in the library.
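In outline, the shape is something like this (illustrative only; the real signature and helpers differ):

```cpp
// Everything the algorithm needs lives inside the function body, so no
// algorithm-specific types or helpers escape into the translation unit.
static void target_bpw_type(/* model, imatrix, size or bpw target, ... */) {
    auto load_state     = [&]() { /* deserialize cached per-tensor error estimates */ };
    auto save_state     = [&]() { /* persist progress for --state-file resume */ };
    auto estimate_error = [&]() { /* quantize/dequantize a sample, measure error */ };
    auto solve_budget   = [&]() { /* pick per-tensor quant types under the size budget */ };
    // ... orchestration of the above ...
}
```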
**Test results**

Based on 132 tests with models from 11 different families, the `target_bpw_type()` optimization routine generated better quality models in 96 cases (~70%) and matched standard quantization in 10 (~8%). However, even though the method often produced better quality, it lost in surprising cases: naive quants performed better in the remaining 25 tests (~20%), sometimes by a significant margin (e.g. `ERNIE-4.5-21B-A3B-PT-IQ1_M`, `granite-4.0-h-tiny-IQ2_M`, `granite-4.0-h-tiny-IQ1_M`).

Of the 96 cases where it performed better, about 1/3 achieved higher scores when using the `--ignore-tensor-importance` option, forcing the algorithm to treat each tensor equally instead of prioritising some (e.g. `attn_output`, `ffn_down`, etc.).

**Target BPW test results**
Using `Cor(ln(PPL(Q)), ln(PPL(base)))` as the discriminant metric.
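As I read it, this is the Pearson correlation between the log-perplexities of the quantized and base models across matched evaluation chunks; a sketch (the function and variable names are mine):

```cpp
#include <cmath>
#include <vector>

// Pearson correlation of ln(PPL) between a quantized model and its base,
// computed over matched evaluation chunks.
static double log_ppl_correlation(const std::vector<double> & ppl_q, const std::vector<double> & ppl_base) {
    const size_t n = ppl_q.size();
    double mean_q = 0.0;
    double mean_b = 0.0;
    for (size_t i = 0; i < n; ++i) {
        mean_q += std::log(ppl_q[i]);
        mean_b += std::log(ppl_base[i]);
    }
    mean_q /= n;
    mean_b /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double x = std::log(ppl_q[i])    - mean_q;
        const double y = std::log(ppl_base[i]) - mean_b;
        sxy += x * y;
        sxx += x * x;
        syy += y * y;
    }
    return sxy / std::sqrt(sxx * syy);
}
```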
**AI usage disclosure**

AI was used to validate the mathematical approach and calculations, and to optimize and debug the code.
Special thanks to @AesSedai, @compilade and @ddh0 for their contributions during the development of this PR.