
quantize: add option to automatically choose optimal quant types to reach a file/bpw target size at lowest error#15550

Open
EAddario wants to merge 260 commits into ggml-org:master from EAddario:quantize

Conversation

Contributor

@EAddario EAddario commented Aug 24, 2025

This PR adds target_bpw_type(), a function to determine an optimal per-tensor quantization mix to achieve a user-specified total file size (e.g., --target-size 1.5g) or a global bits-per-weight (bpw) target (e.g., --target-bpw 4.5678).

The function solves a constrained optimization problem to minimize quantization error, subject to a global size budget. It estimates per-tensor error for each layer, and dynamically allocates the bit budget where it matters most.

High level flow:

  1. Error estimation via Monte Carlo sampling: For every applicable tensor, the function estimates the quantization error by quantizing and dequantizing a subset of rows, weighting the error with an importance matrix (imatrix).
  2. Pareto frontier analysis: For each tensor, it identifies the Pareto frontier, discarding quant types that increase size without sufficiently decreasing error.
  3. Lagrangian optimization: It uses Lagrangian relaxation to find an optimal distribution of bits across the entire model. Higher bit-rates are dynamically allocated to the tensors where they provide the largest reduction in error.
  4. Resumable state: When --state-file is set, target computations are saved to a file. If the quantization is interrupted (e.g., Ctrl+C), it can resume error calculation from where it left off in the next run.
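Steps 2 and 3 can be sketched in miniature. The snippet below is a hypothetical Python sketch (the PR's actual implementation is C++ inside target_bpw_type()): each tensor has a list of candidate (size, error) pairs, `pareto_frontier` drops dominated candidates, and `allocate` bisects on the Lagrange multiplier λ — a "price per byte" — until the total size meets the budget. All names are illustrative assumptions.

```python
def pareto_frontier(candidates):
    """Keep only candidates where no other option is both smaller and
    lower-error. `candidates` is a list of (size_bytes, error) tuples."""
    frontier = []
    best_error = float("inf")
    for size, error in sorted(candidates):
        if error < best_error:          # strictly better than all smaller options
            frontier.append((size, error))
            best_error = error
    return frontier

def allocate(tensors, budget, iters=60):
    """tensors: per-tensor candidate lists; returns one pick per tensor.
    Bisects on lambda, the penalty per byte, assuming the budget is
    feasible (i.e. the smallest candidates fit)."""
    def pick(lam):
        choice = [min(c, key=lambda se: se[1] + lam * se[0]) for c in tensors]
        return choice, sum(s for s, _ in choice)

    lo, hi = 0.0, 1.0
    choice, total = pick(hi)
    while total > budget:               # raise the price until we fit
        hi *= 2.0
        choice, total = pick(hi)
    for _ in range(iters):              # lower the price while staying in budget
        mid = (lo + hi) / 2.0
        c, t = pick(mid)
        if t <= budget:
            hi, choice = mid, c
        else:
            lo = mid
    return choice
```

At the optimum, tensors whose error drops steeply with extra bytes get the larger quant types, which is exactly the "allocate bits where they matter most" behaviour described above.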

Advantages

  1. Target arbitrary size models

    • The algorithm will produce a model (nearly) exactly of the requested file/bpw size, which is very useful for maximizing VRAM usage. In a system with 24GB VRAM and a 70B model, standard quants might produce a 16.8GB file (too small, quality left on the table) or a 24.1GB file (won't fit). --target-size 23.85g will generate a 23.85 GiB file that fully utilizes the hardware.
  2. Data-driven mixed precision often can improve quality at fixed size

    • Instead of using hardcoded heuristics that may be sub‑optimal for a given architecture or size, the mix is determined by the actual error sensitivity of the specific model's weights. This often yields a better quality/size trade-off, especially in aggressive quantization scenarios (1.5 to 3.5 bpw) or for unusual architectures.
  3. Allows better like-for-like comparisons between models and families

    • Standard quantization uses hardcoded rules like: "use Q4_K_M, except bump some tensors up/down, except fall back if incompatible, except keep some tensors unquantized..." and consequently, two different models quantized at the same Q4_K_M level can end up with very different bpw (e.g. 4.75 and 4.30).

    • Model performance generally scales with size; larger models typically outperform smaller ones. A model quantized with more bits will usually perform better (lower perplexity, better evaluation scores) than a smaller version, even when the same underlying quantization method is used. Comparing such models is therefore not a controlled experiment, because they have different effective compression ratios.

    • --target-bpw helps to normalize experiments by forcing models to be quantized to a roughly equal overall byte budget. This standardization allows performance variations between models to be more accurately linked to underlying factors such as architectural or training differences, the effect of quantization error at the same compression level, or the decisions made by the optimizer regarding allocation.

Disadvantages

  1. Quantization process is significantly slower than standard

    • This approach can take 5x-10x longer as it quantizes a sample of most tensors into 15 different formats, dequantizes them back, computes error diffs, and selects the best size/error option that fits the target file size or global bpw budget.

    • However, the --state-file option will save the above-mentioned computations to disk so that future quantizations can be generated at normal speed. It also allows the computation to be interrupted and resumed at a later time.

  2. The optimization target is only a proxy for the model's performance quality

    • The process minimizes a per-tensor estimated error computed from sampled rows, not actual perplexity or divergence of output distributions. Since errors interact nonlinearly across layers, there are no guarantees it will select the best possible mix subject to the file/bpw size constraint.
  3. An imatrix with activations data is required for best results
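The per-tensor error proxy described above — quantization error estimated on sampled rows and weighted by the imatrix — can be sketched as follows. This is an illustrative stand-in under assumed names, not the PR's actual estimator:

```python
import numpy as np

def sampled_weighted_error(weights, imatrix, roundtrip, n_sample=16, seed=0):
    """Estimate quantization error on a random subset of rows, weighting
    each column's squared difference by the importance matrix.
    `roundtrip` is any quantize-then-dequantize function."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(weights.shape[0],
                      size=min(n_sample, weights.shape[0]),
                      replace=False)
    sample = weights[rows]
    diff = sample - roundtrip(sample)            # dequantized round trip
    return float(np.sum(imatrix * diff * diff))  # importance-weighted SSE
```

Sampling rows rather than the full tensor is what keeps the 15-format sweep tractable; the imatrix weighting is why activation data improves the estimate.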

Design considerations

The target_bpw_type() function is implemented as a container for several lambdas providing all the logic for serialization, multithreading, math/stats, and optimization.

Although this approach has clear downsides (increased cognitive load, harder testing and maintenance), a self-contained "God function" seemed the better choice to avoid polluting llama-quantize's global scope: the structs and helper lambdas are highly specific to this exact algorithm and have no reuse value elsewhere in the library.
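One detail worth sketching separately is the resumable --state-file behaviour mentioned earlier: per-tensor results are flushed to disk as they are computed, so an interrupted run skips finished tensors on restart. A minimal sketch, assuming a JSON state format (the PR's on-disk format may differ):

```python
import json
import os

def compute_with_state(tensor_names, estimate, state_file):
    """Resumable per-tensor computation: results are written to disk after
    each tensor, so an interrupted run (e.g. Ctrl+C) restarts where it
    left off. `estimate` is the expensive per-tensor error computation."""
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    for name in tensor_names:
        if name in state:
            continue                      # already computed in a previous run
        state[name] = estimate(name)
        with open(state_file, "w") as f:  # flush after every tensor
            json.dump(state, f)
    return state
```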

Test results

Based on 132 tests with models from 11 different families, the target_bpw_type() optimization routine generated better-quality models in 96 cases (~70%) and matched standard quantization in 10 (~8%). However, even though the method often produced better quality, it lost in surprising cases: naive quants performed better in the remaining 25 tests (20%), sometimes by a significant margin (e.g. ERNIE-4.5-21B-A3B-PT-IQ1_M, granite-4.0-h-tiny-IQ2_M, granite-4.0-h-tiny-IQ1_M).

Of the 96 cases where it performed better, about 1/3 achieved higher scores when using the --ignore-tensor-importance option, forcing the algorithm to treat each tensor equally instead of prioritising some (i.e. attn_output, ffn_down, etc.).

Target BPW test results

Using Cor(ln(PPL(Q)), ln(PPL(base))) as the discriminant metric
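This discriminant metric — the correlation between log-perplexities of the quantized and base models — can be computed as below. A hypothetical sketch assuming per-chunk PPL arrays are available (the tables report this value as a percentage):

```python
import numpy as np

def log_ppl_correlation(ppl_q, ppl_base):
    """Pearson correlation between ln(PPL) of the quantized model and
    ln(PPL) of the base model, over matching evaluation chunks."""
    return float(np.corrcoef(np.log(ppl_q), np.log(ppl_base))[0, 1])
```

Taking logs first means the metric tracks relative (multiplicative) perplexity changes, so a uniform degradation across chunks still correlates perfectly with the base model.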

| Model                                        |    BPW |         PPL |   𝜌PPL |      KLD | Same Top P | Best          | --target-bpw | --no-importance |
| -------------------------------------------- | -----: | ----------: | -----: | -------: | ---------: | :------------ | :----------: | :-------------: |
| ARWKV-R1-1B5-Q6_K_M-naive                    | 6.6851 |   35.263025 | 99.95% | 0.003130 |     97.02% |               |              |                 |
| ARWKV-R1-1B5-Q6_K_M-test                     | 6.6851 |   35.244879 | 99.95% | 0.003225 |     96.93% | Same          |    99.95%    |     99.94%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-Q6_K_M-naive                     | 6.6283 |   20.353217 | 99.95% | 0.002558 |     97.67% |               |              |                 |
| ARWKV-R1-7B-Q6_K_M-test                      | 6.6275 |   20.355471 | 99.95% | 0.002662 |     97.65% | Same          |    99.95%    |     99.95%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-1B5-Q5_K_M-naive                    | 5.8687 |   35.189166 | 99.92% | 0.006015 |     95.79% |               |              |                 |
| ARWKV-R1-1B5-Q5_K_M-test                     | 5.8675 |   35.265399 | 99.92% | 0.005931 |     95.76% | Same          |    99.92%    |     99.88%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-Q5_K_M-naive                     | 5.7624 |   20.383109 | 99.92% | 0.004544 |     96.68% |               |              |                 |
| ARWKV-R1-7B-Q5_K_M-test                      | 5.7622 |   20.378203 | 99.92% | 0.005341 |     96.27% | Same          |    99.92%    |     99.88%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-1B5-Q4_K_M-naive                    | 5.1004 |   35.737707 | 99.78% | 0.017944 |     92.35% |               |              |                 |
| ARWKV-R1-1B5-Q4_K_M-test                     | 5.1004 |   35.450752 | 99.80% | 0.016954 |     92.38% | Target BPW    |    99.80%    |     99.73%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-Q4_K_M-naive                     | 4.9474 |   20.510452 | 99.83% | 0.012588 |     94.20% |               |              |                 |
| ARWKV-R1-7B-Q4_K_M-test                      | 4.9472 |   20.463235 | 99.83% | 0.012339 |     94.36% | Same          |    99.83%    |     99.80%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-1B5-Q3_K_M-naive                    | 4.1899 |   36.901085 | 98.98% | 0.087509 |     83.43% |               |              |                 |
| ARWKV-R1-1B5-Q3_K_M-test                     | 4.1896 |   37.147295 | 99.45% | 0.049345 |     87.24% | Target BPW    |    99.45%    |     99.13%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-Q3_K_M-naive                     | 3.9747 |   21.308946 | 99.29% | 0.054533 |     88.23% |               |              |                 |
| ARWKV-R1-7B-Q3_K_M-test                      | 3.9744 |   21.040018 | 99.43% | 0.045004 |     89.19% | Target BPW    |    99.43%    |     98.79%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-1B5-IQ2_M-naive                     | 3.2024 |   57.511363 | 93.55% | 0.579476 |     62.23% |               |              |                 |
| ARWKV-R1-1B5-IQ2_M-test                      | 3.2024 |   45.057798 | 97.43% | 0.235909 |     73.38% | Target BPW    |    97.43%    |     95.43%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-IQ2_M-naive                      | 2.9175 |   27.362006 | 95.23% | 0.378432 |     71.15% |               |              |                 |
| ARWKV-R1-7B-IQ2_M-test                       | 2.9174 |   24.951673 | 96.66% | 0.253425 |     76.46% | Target BPW    |    96.66%    |     95.33%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-1B5-IQ1_M-naive                     | 2.4958 |  191.855358 | 79.31% | 1.833383 |     38.33% |               |              |                 |
| ARWKV-R1-1B5-IQ1_M-test                      | 2.4956 |   73.897487 | 90.95% | 0.789102 |     56.30% | Target BPW    |    90.95%    |     88.79%      |
|                                              |        |             |        |          |            |               |              |                 |
| ARWKV-R1-7B-IQ1_M-naive                      | 2.1619 |   51.790959 | 85.96% | 1.123773 |     50.89% |               |              |                 |
| ARWKV-R1-7B-IQ1_M-test                       | 2.1615 |   35.385003 | 90.83% | 0.720311 |     61.79% | Target BPW    |    90.83%    |     89.40%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q6_K_M-naive               | 6.5652 |   14.657145 | 99.87% | 0.008099 |     94.49% |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q6_K_M-test                | 6.5646 |   14.672409 | 99.85% | 0.009163 |     94.25% | Naive         |    99.85%    |     99.85%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q6_K_M-naive            | 6.5678 |    6.252806 | 99.88% | 0.003923 |     97.31% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q6_K_M-test             | 6.5678 |    6.282425 | 99.87% | 0.004336 |     97.07% | No Importance |    99.87%    |     99.91%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q5_K_M-naive               | 5.9050 |   14.760451 | 99.70% | 0.018535 |     92.32% |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q5_K_M-test                | 5.9047 |   14.750293 | 99.74% | 0.016542 |     92.36% | Target BPW    |    99.74%    |     99.67%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q5_K_M-naive            | 5.6851 |    6.282396 | 99.80% | 0.007295 |     96.26% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q5_K_M-test             | 5.6851 |    6.329379 | 99.71% | 0.011546 |     94.99% | No Importance |    99.71%    |     99.80%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q4_K_M-naive               | 5.2837 |   15.360215 | 99.03% | 0.062306 |     86.24% |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q4_K_M-test                | 5.2837 |   15.127183 | 99.36% | 0.041702 |     88.63% | Target BPW    |    99.36%    |     99.03%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q4_K_M-naive            | 4.8543 |    6.356795 | 99.49% | 0.020904 |     93.73% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q4_K_M-test             | 4.8543 |    6.329602 | 99.50% | 0.021666 |     93.20% | No Importance |    99.50%    |     99.55%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q3_K_M-naive               | 4.6599 |   17.624006 | 96.28% | 0.233042 |     75.01% |               |              |                 |
| ERNIE-4.5-0.3B-PT-Q3_K_M-test                | 4.6598 |   15.572376 | 98.80% | 0.081764 |     83.33% | Target BPW    |    98.80%    |     97.90%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q3_K_M-naive            | 3.8363 |    6.653076 | 98.43% | 0.065814 |     88.93% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-Q3_K_M-test             | 3.8363 |    6.960149 | 96.89% | 0.140407 |     83.10% | Naive         |    96.89%    |     98.12%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-IQ2_M-naive                | 3.5601 |  126.510216 | 67.36% | 2.245763 |     31.50% |               |              |                 |
| ERNIE-4.5-0.3B-PT-IQ2_M-test                 | 3.5600 |   21.776330 | 92.99% | 0.468520 |     64.78% | Target BPW    |    92.99%    |     91.33%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-IQ2_M-naive             | 2.6266 |    9.325763 | 90.05% | 0.452442 |     71.35% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-IQ2_M-test              | 2.6266 |   11.443878 | 86.25% | 0.673509 |     64.15% | Naive         |    86.25%    |     88.21%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-0.3B-PT-IQ1_M-naive                | 2.9380 | 1993.785673 | 43.82% | 5.024711 |      9.48% |               |              |                 |
| ERNIE-4.5-0.3B-PT-IQ1_M-test                 | 2.9380 |   52.789501 | 80.71% | 1.401172 |     43.87% | Target BPW    |    80.71%    |     74.19%      |
|                                              |        |             |        |          |            |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-IQ1_M-naive             | 1.8207 |   14.684351 | 80.98% | 0.944582 |     59.54% |               |              |                 |
| ERNIE-4.5-21B-A3B-PT-IQ1_M-test              | 1.8207 |   34.634577 | 68.09% | 1.888292 |     43.24% | Naive         |    68.09%    |     73.29%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q6_K_M-naive         | 6.5724 |   16.855635 | 99.40% | 0.012620 |     94.87% |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q6_K_M-test          | 6.5723 |   17.139379 | 99.39% | 0.014104 |     94.89% | Naive         |    99.39%    |     99.18%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-Q6_K_M-naive           | 6.5665 |    7.951421 | 99.53% | 0.010064 |     95.13% |               |              |                 |
| Falcon-H1-7B-Instruct-Q6_K_M-test            | 6.5664 |    7.982449 | 99.57% | 0.006085 |     96.81% | Target BPW    |    99.57%    |     99.53%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q5_K_M-naive         | 5.6838 |   16.573882 | 99.12% | 0.030210 |     92.56% |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q5_K_M-test          | 5.6838 |   16.918074 | 98.93% | 0.042785 |     91.43% | No Importance |    98.93%    |     98.94%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-Q5_K_M-naive           | 5.6789 |    7.966140 | 99.42% | 0.015622 |     94.28% |               |              |                 |
| Falcon-H1-7B-Instruct-Q5_K_M-test            | 5.6789 |    8.013705 | 99.43% | 0.014055 |     94.86% | Target BPW    |    99.43%    |     99.35%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q4_K_M-naive         | 4.8474 |   17.630107 | 97.98% | 0.106563 |     87.08% |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q4_K_M-test          | 4.8473 |   17.151377 | 98.20% | 0.092033 |     87.74% | Target BPW    |    98.20%    |     98.18%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-Q4_K_M-naive           | 4.8435 |    8.012346 | 98.96% | 0.038206 |     91.75% |               |              |                 |
| Falcon-H1-7B-Instruct-Q4_K_M-test            | 4.8435 |    7.992109 | 99.03% | 0.035016 |     92.04% | Target BPW    |    99.03%    |     98.91%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q3_K_M-naive         | 3.9229 |   16.622489 | 94.71% | 0.323379 |     78.25% |               |              |                 |
| Falcon-H1-1.5B-Instruct-Q3_K_M-test          | 3.9229 |   17.527258 | 95.41% | 0.290505 |     77.66% | Target BPW    |    95.41%    |     95.01%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-Q3_K_M-naive           | 3.8833 |    8.593841 | 97.34% | 0.125806 |     86.02% |               |              |                 |
| Falcon-H1-7B-Instruct-Q3_K_M-test            | 3.8833 |    8.559043 | 97.01% | 0.152381 |     83.06% | Naive         |    97.01%    |     95.89%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-IQ2_M-naive          | 2.9630 |   40.116694 | 78.11% | 1.716486 |     53.43% |               |              |                 |
| Falcon-H1-1.5B-Instruct-IQ2_M-test           | 2.9630 |   29.781395 | 85.51% | 1.108352 |     63.46% | Target BPW    |    85.51%    |     85.47%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-IQ2_M-naive            | 2.8225 |   10.098122 | 88.34% | 0.596415 |     70.78% |               |              |                 |
| Falcon-H1-7B-Instruct-IQ2_M-test             | 2.8225 |   10.309871 | 91.69% | 0.450354 |     74.88% | Target BPW    |    91.71%    |     88.74%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-1.5B-Instruct-IQ1_M-naive          | 2.2094 |  121.364852 | 64.81% | 3.142805 |     38.85% |               |              |                 |
| Falcon-H1-1.5B-Instruct-IQ1_M-test           | 2.2094 |  102.480281 | 69.13% | 2.833264 |     40.20% | Target BPW    |    69.13%    |     64.46%      |
|                                              |        |             |        |          |            |               |              |                 |
| Falcon-H1-7B-Instruct-IQ1_M-naive            | 2.0412 |   18.060904 | 76.92% | 1.341149 |     59.08% |               |              |                 |
| Falcon-H1-7B-Instruct-IQ1_M-test             | 2.0412 |   18.906200 | 78.57% | 1.341664 |     57.51% | No Importance |    78.57%    |     78.46%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-Q6_K_M-naive                   | 6.5649 |   15.601843 | 98.67% | 0.010012 |     96.44% |               |              |                 |
| gemma-3-4b-it-Q6_K_M-test                    | 6.5649 |   15.391448 | 98.66% | 0.011424 |     96.19% | Naive         |    98.66%    |     98.65%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-Q6_K_M-naive                  | 6.5642 |    9.113018 | 99.57% | 0.005106 |     97.07% |               |              |                 |
| gemma-3-12b-it-Q6_K_M-test                   | 6.5637 |    9.098008 | 99.50% | 0.007870 |     96.42% | Naive         |    99.50%    |     99.48%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-Q5_K_M-naive                   | 5.8205 |   15.621460 | 98.40% | 0.023596 |     94.65% |               |              |                 |
| gemma-3-4b-it-Q5_K_M-test                    | 5.8204 |   15.583861 | 98.41% | 0.024760 |     94.40% | Target BPW    |    98.41%    |     97.91%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-Q5_K_M-naive                  | 5.7375 |    9.149511 | 99.36% | 0.013113 |     95.54% |               |              |                 |
| gemma-3-12b-it-Q5_K_M-test                   | 5.7370 |    9.164833 | 99.34% | 0.015653 |     94.93% | Naive         |    99.34%    |     98.49%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-Q4_K_M-naive                   | 5.1200 |   15.272407 | 97.35% | 0.076753 |     90.47% |               |              |                 |
| gemma-3-4b-it-Q4_K_M-test                    | 5.1195 |   15.625719 | 97.42% | 0.070577 |     90.84% | Target BPW    |    97.42%    |     97.23%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-Q4_K_M-naive                  | 4.9595 |    9.177975 | 98.66% | 0.045939 |     91.89% |               |              |                 |
| gemma-3-12b-it-Q4_K_M-test                   | 4.9589 |    9.282984 | 98.57% | 0.048337 |     91.53% | Naive         |    98.57%    |     97.82%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-Q3_K_M-naive                   | 4.3129 |   16.001779 | 94.27% | 0.230011 |     83.58% |               |              |                 |
| gemma-3-4b-it-Q3_K_M-test                    | 4.3129 |   15.899249 | 96.22% | 0.141760 |     86.81% | Target BPW    |    96.22%    |     95.94%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-Q3_K_M-naive                  | 4.0811 |    9.689398 | 96.59% | 0.141529 |     86.16% |               |              |                 |
| gemma-3-12b-it-Q3_K_M-test                   | 4.0810 |    9.610142 | 96.51% | 0.149120 |     84.57% | Naive         |    96.51%    |     96.27%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-IQ2_M-naive                    | 3.1574 |   17.583538 | 82.90% | 0.892136 |     65.37% |               |              |                 |
| gemma-3-4b-it-IQ2_M-test                     | 3.1573 |   16.608971 | 89.10% | 0.517544 |     74.60% | Target BPW    |    89.10%    |     85.64%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-IQ2_M-naive                   | 2.9263 |   10.513148 | 86.85% | 0.604512 |     70.09% |               |              |                 |
| gemma-3-12b-it-IQ2_M-test                    | 2.9262 |   10.614872 | 90.35% | 0.430504 |     74.96% | Target BPW    |    90.35%    |     87.04%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-4b-it-IQ1_M-naive                    | 2.4597 |   35.890562 | 69.51% | 1.966169 |     47.67% |               |              |                 |
| gemma-3-4b-it-IQ1_M-test                     | 2.4597 |   19.682657 | 79.94% | 1.078927 |     60.69% | Target BPW    |    79.94%    |     77.44%      |
|                                              |        |             |        |          |            |               |              |                 |
| gemma-3-12b-it-IQ1_M-naive                   | 2.1473 |   20.829944 | 72.56% | 1.495531 |     52.87% |               |              |                 |
| gemma-3-12b-it-IQ1_M-test                    | 2.1472 |   15.581627 | 78.73% | 1.116024 |     56.99% | Target BPW    |    78.73%    |     76.52%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-Q6_K_M-naive              | 6.5800 |    8.399185 | 99.83% | 0.007239 |     95.75% |               |              |                 |
| granite-4.0-h-tiny-Q6_K_M-test               | 6.5799 |    8.412915 | 99.73% | 0.011246 |     94.83% | Naive         |    99.73%    |     99.80%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-Q6_K_M-naive               | 6.5641 |   10.320644 | 99.77% | 0.008214 |     95.83% |               |              |                 |
| granite-4.0-micro-Q6_K_M-test                | 6.5637 |   10.418077 | 99.77% | 0.008245 |     95.80% | No Importance |    99.77%    |     99.78%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-Q5_K_M-naive              | 5.7010 |    8.414691 | 99.70% | 0.013480 |     94.32% |               |              |                 |
| granite-4.0-h-tiny-Q5_K_M-test               | 5.7010 |    8.533514 | 99.27% | 0.034641 |     90.73% | Naive         |    99.27%    |     99.50%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-Q5_K_M-naive               | 5.7210 |   10.483792 | 99.55% | 0.019151 |     93.74% |               |              |                 |
| granite-4.0-micro-Q5_K_M-test                | 5.7206 |   10.436584 | 99.49% | 0.022465 |     92.98% | Naive         |    99.49%    |     99.34%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-Q4_K_M-naive              | 4.8737 |    8.522251 | 99.25% | 0.035130 |     91.29% |               |              |                 |
| granite-4.0-h-tiny-Q4_K_M-test               | 4.8737 |    8.837110 | 98.11% | 0.092883 |     84.96% | Naive         |    98.11%    |     99.23%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-Q4_K_M-naive               | 4.9275 |   10.511546 | 98.62% | 0.064691 |     88.82% |               |              |                 |
| granite-4.0-micro-Q4_K_M-test                | 4.9274 |   10.625824 | 98.84% | 0.056839 |     89.13% | Target BPW    |    98.84%    |     98.69%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-Q3_K_M-naive              | 3.8616 |    8.914942 | 97.37% | 0.123502 |     83.99% |               |              |                 |
| granite-4.0-h-tiny-Q3_K_M-test               | 3.8616 |   10.884609 | 93.10% | 0.362505 |     74.29% | Naive         |    93.10%    |     95.47%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-Q3_K_M-naive               | 4.0484 |   11.202790 | 95.80% | 0.211357 |     80.67% |               |              |                 |
| granite-4.0-micro-Q3_K_M-test                | 4.0484 |   11.165460 | 96.39% | 0.188088 |     80.70% | Target BPW    |    96.39%    |     96.08%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-IQ2_M-naive               | 2.6695 |   15.734925 | 85.55% | 0.821723 |     57.89% |               |              |                 |
| granite-4.0-h-tiny-IQ2_M-test                | 2.6695 |   39.990624 | 73.52% | 1.839443 |     43.53% | Naive         |    73.52%    |     81.22%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-IQ2_M-naive                | 2.9103 |   40.430709 | 73.35% | 1.759012 |     47.31% |               |              |                 |
| granite-4.0-micro-IQ2_M-test                 | 2.9103 |   16.745333 | 86.44% | 0.763426 |     63.47% | Target BPW    |    86.44%    |     84.23%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-h-tiny-IQ1_M-naive               | 1.8752 |   39.256179 | 70.36% | 1.853246 |     44.35% |               |              |                 |
| granite-4.0-h-tiny-IQ1_M-test                | 1.8752 |  720.525238 | 47.64% | 4.859275 |     14.60% | Naive         |    47.64%    |     58.41%      |
|                                              |        |             |        |          |            |               |              |                 |
| granite-4.0-micro-IQ1_M-naive                | 2.1284 |  125.146696 | 60.35% | 3.002957 |     34.76% |               |              |                 |
| granite-4.0-micro-IQ1_M-test                 | 2.9103 |   84.336421 | 63.71% | 2.602684 |     36.85% | Target BPW    |    63.71%    |     68.64%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q6_K_M-naive           | 6.5655 |   18.164981 | 99.80% | 0.007364 |     95.31% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q6_K_M-test            | 6.5647 |   18.155518 | 99.81% | 0.007040 |     95.41% | Target BPW    |    99.81%    |     99.76%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q6_K_M-naive | 6.5642 |   14.592521 | 99.66% | 0.008116 |     95.97% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q6_K_M-test  | 6.5641 |   14.630426 | 99.66% | 0.007304 |     96.23% | Same          |    99.66%    |     99.64%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q5_K_M-naive           | 5.7541 |   18.258318 | 99.66% | 0.017431 |     92.87% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q5_K_M-test            | 5.7540 |   18.269360 | 99.65% | 0.017580 |     92.86% | No Importance |    99.65%    |     99.68%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q5_K_M-naive | 5.7152 |   14.621338 | 99.52% | 0.015283 |     94.59% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q5_K_M-test  | 5.7149 |   14.610505 | 99.50% | 0.015519 |     94.60% | No Importance |    99.50%    |     99.53%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q4_K_M-naive           | 4.9904 |   18.859517 | 99.03% | 0.058574 |     87.37% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q4_K_M-test            | 4.9903 |   18.712691 | 99.31% | 0.039891 |     89.55% | Target BPW    |    99.31%    |     99.16%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q4_K_M-naive | 4.9162 |   15.009760 | 98.95% | 0.045500 |     90.86% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q4_K_M-test  | 4.9162 |   14.633656 | 99.16% | 0.033903 |     92.00% | Target BPW    |    99.16%    |     99.13%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q3_K_M-naive           | 4.1221 |   21.888925 | 96.44% | 0.230853 |     76.63% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-Q3_K_M-test            | 4.1219 |   20.308097 | 97.64% | 0.146930 |     81.02% | Target BPW    |    97.64%    |     97.11%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q3_K_M-naive | 3.9606 |   15.558934 | 96.90% | 0.154157 |     83.35% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-Q3_K_M-test  | 3.9606 |   15.194790 | 97.75% | 0.107148 |     86.21% | Target BPW    |    97.75%    |     96.96%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-IQ2_M-naive            | 3.1089 |   90.621951 | 76.71% | 1.718901 |     42.10% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-IQ2_M-test             | 3.1088 |   36.247619 | 88.29% | 0.788680 |     59.18% | Target BPW    |    88.29%    |     84.06%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-IQ2_M-naive  | 2.8595 |   26.902770 | 83.16% | 1.039928 |     59.91% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-IQ2_M-test   | 2.8595 |   21.852258 | 88.71% | 0.636635 |     67.94% | Target BPW    |    88.71%    |     87.54%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-IQ1_M-naive            | 2.3694 | 2755.174125 | 46.93% | 5.250251 |     12.92% |               |              |                 |
| Huihui-MoE-1.2B-A0.6B-IQ1_M-test             | 2.3694 |  226.909163 | 67.09% | 2.700929 |     31.01% | Target BPW    |    67.09%    |     46.33%      |
|                                              |        |             |        |          |            |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-IQ1_M-naive  | 2.0921 |   96.325903 | 66.88% | 2.431305 |     40.58% |               |              |                 |
| Huihui-MoE-5B-A1.7B-abliterated-IQ1_M-test   | 2.0920 |   55.148303 | 73.43% | 1.839316 |     47.37% | Target BPW    |    73.43%    |     63.95%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-Q6_K_M-naive                    | 6.5633 |    6.155437 | 99.94% | 0.003014 |     97.38% |               |              |                 |
| Llama-3.1-8B-Q6_K_M-test                     | 6.5632 |    6.151212 | 99.95% | 0.002538 |     97.67% | Target BPW    |    99.95%    |     99.94%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-Q6_K_M-naive                    | 6.5639 |    9.685527 | 99.91% | 0.004948 |     96.17% |               |              |                 |
| Llama-3.2-1B-Q6_K_M-test                     | 6.5638 |    9.684942 | 99.93% | 0.003642 |     96.62% | No Importance |    99.93%    |     99.94%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-Q5_K_M-naive                    | 5.7036 |    6.181832 | 99.85% | 0.007059 |     96.15% |               |              |                 |
| Llama-3.1-8B-Q5_K_M-test                     | 5.7035 |    6.176639 | 99.86% | 0.006445 |     96.27% | Target BPW    |    99.86%    |     99.82%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-Q5_K_M-naive                    | 5.8499 |    9.753430 | 99.80% | 0.011244 |     94.43% |               |              |                 |
| Llama-3.2-1B-Q5_K_M-test                     | 5.8491 |    9.726408 | 99.85% | 0.008460 |     94.91% | Target BPW    |    99.85%    |     99.73%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-Q4_K_M-naive                    | 4.8944 |    6.286192 | 99.47% | 0.023817 |     93.15% |               |              |                 |
| Llama-3.1-8B-Q4_K_M-test                     | 4.8943 |    6.247224 | 99.61% | 0.018801 |     93.52% | Target BPW    |    99.61%    |     99.51%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-Q4_K_M-naive                    | 5.1779 |   10.023605 | 99.34% | 0.037436 |     90.23% |               |              |                 |
| Llama-3.2-1B-Q4_K_M-test                     | 5.1773 |    9.849751 | 99.65% | 0.020680 |     92.31% | Target BPW    |    99.65%    |     99.54%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-Q3_K_M-naive                    | 3.9960 |    6.603232 | 98.19% | 0.075276 |     87.98% |               |              |                 |
| Llama-3.1-8B-Q3_K_M-test                     | 3.9960 |    6.562788 | 98.37% | 0.067765 |     88.55% | Target BPW    |    98.37%    |     97.30%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-Q3_K_M-naive                    | 4.4215 |   10.966295 | 97.72% | 0.125486 |     82.70% |               |              |                 |
| Llama-3.2-1B-Q3_K_M-test                     | 4.4213 |   10.123164 | 99.16% | 0.048729 |     88.24% | Target BPW    |    99.16%    |     99.11%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-IQ2_M-naive                     | 2.9294 |   11.936119 | 85.93% | 0.657555 |     65.94% |               |              |                 |
| Llama-3.1-8B-IQ2_M-test                      | 2.9293 |    8.667808 | 91.72% | 0.343883 |     74.80% | Target BPW    |    91.72%    |     89.32%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-IQ2_M-naive                     | 3.2860 |   44.742869 | 75.79% | 1.505601 |     46.44% |               |              |                 |
| Llama-3.2-1B-IQ2_M-test                      | 3.2859 |   14.471273 | 92.78% | 0.406797 |     69.43% | No Importance |    92.78%    |     92.80%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.1-8B-IQ1_M-naive                     | 2.1460 |   29.102629 | 70.49% | 1.540128 |     49.43% |               |              |                 |
| Llama-3.1-8B-IQ1_M-test                      | 2.1460 |   21.881896 | 75.09% | 1.253807 |     53.33% | Target BPW    |    75.09%    |     69.87%      |
|                                              |        |             |        |          |            |               |              |                 |
| Llama-3.2-1B-IQ1_M-naive                     | 2.6268 |  363.807707 | 51.07% | 3.591351 |     21.43% |               |              |                 |
| Llama-3.2-1B-IQ1_M-test                      | 2.6268 |   36.739223 | 78.22% | 1.317966 |     48.31% | No Importance |    78.22%    |     74.97%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-Q6_K_M-naive                   | 6.6837 |   10.826776 | 99.90% | 0.005337 |     95.23% |               |              |                 |
| mamba-1.4b-hf-Q6_K_M-test                    | 6.6837 |   10.825823 | 99.90% | 0.005339 |     95.22% | Same          |    99.90%    |     99.90%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-Q6_K_M-naive                   | 6.6700 |    9.472057 | 99.89% | 0.005525 |     95.34% |               |              |                 |
| mamba-2.8b-hf-Q6_K_M-test                    | 6.6697 |    9.473708 | 99.89% | 0.005594 |     95.36% | Same          |    99.89%    |     99.89%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-Q5_K_M-naive                   | 5.6782 |    9.492145 | 99.85% | 0.007647 |     94.72% |               |              |                 |
| mamba-1.4b-hf-Q5_K_M-test                    | 5.6781 |   10.962664 | 99.69% | 0.020115 |     90.93% | Naive         |    99.69%    |     99.69%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-Q5_K_M-naive                   | 5.6326 |    9.472057 | 99.89% | 0.005525 |     95.34% |               |              |                 |
| mamba-2.8b-hf-Q5_K_M-test                    | 5.6326 |    9.682213 | 99.55% | 0.028049 |     89.50% | Naive         |    99.55%    |     99.55%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-Q4_K_M-naive                   | 4.7657 |   10.942318 | 99.70% | 0.017359 |     92.31% |               |              |                 |
| mamba-1.4b-hf-Q4_K_M-test                    | 4.7657 |   11.115953 | 99.49% | 0.031787 |     89.35% | Naive         |    99.49%    |     99.48%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-Q4_K_M-naive                   | 4.6914 |    9.574036 | 99.71% | 0.015643 |     93.09% |               |              |                 |
| mamba-2.8b-hf-Q4_K_M-test                    | 4.6914 |    9.789406 | 99.37% | 0.038112 |     88.37% | Naive         |    99.37%    |     99.37%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-Q3_K_M-naive                   | 3.7876 |   11.600416 | 98.74% | 0.074951 |     85.31% |               |              |                 |
| mamba-1.4b-hf-Q3_K_M-test                    | 3.7876 |   12.465589 | 97.84% | 0.135911 |     79.85% | Naive         |    97.84%    |     97.84%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-Q3_K_M-naive                   | 3.6824 |    9.974359 | 98.87% | 0.063989 |     86.72% |               |              |                 |
| mamba-2.8b-hf-Q3_K_M-test                    | 3.6823 |   15.370013 | 93.00% | 0.473393 |     64.99% | Naive         |    93.00%    |     93.00%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-IQ2_M-naive                    | 2.9176 |   26.825640 | 87.12% | 0.903547 |     60.26% |               |              |                 |
| mamba-1.4b-hf-IQ2_M-test                     | 2.9175 |   22.366520 | 88.58% | 0.761930 |     53.78% | Target BPW    |    88.58%    |     88.58%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-IQ2_M-naive                    | 2.8177 |   24.361332 | 84.52% | 0.971537 |     60.65% |               |              |                 |
| mamba-2.8b-hf-IQ2_M-test                     | 2.8177 |   24.518860 | 86.15% | 0.954394 |     51.53% | Target BPW    |    86.15%    |     86.15%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-1.4b-hf-IQ1_M-naive                    | 2.1837 |   37.377075 | 81.52% | 1.247609 |     54.68% |               |              |                 |
| mamba-1.4b-hf-IQ1_M-test                     | 2.1837 |  139.471320 | 68.68% | 2.561901 |     25.74% | Target BPW    |    68.68%    |     68.68%      |
|                                              |        |             |        |          |            |               |              |                 |
| mamba-2.8b-hf-IQ1_M-naive                    | 2.0606 |   29.246815 | 83.15% | 1.161747 |     57.76% |               |              |                 |
| mamba-2.8b-hf-IQ1_M-test                     | 2.0606 | 1500.981697 | 55.03% | 5.072153 |     52.72% | Target BPW    |    55.03%    |     55.03%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q6_K_M-naive      | 8.2158 |    7.807138 | 99.86% | 0.006957 |     95.78% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q6_K_M-test       | 8.2157 |    7.774620 | 99.92% | 0.003909 |     96.66% | No Importance |    99.92%    |     99.97%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q6_K_M-naive     | 6.5673 |    6.514285 | 99.71% | 0.012372 |     94.92% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q6_K_M-test      | 6.5665 |    6.519770 | 99.75% | 0.010997 |     95.15% | No Importance |    99.75%    |     99.95%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q5_K_M-naive      | 6.3562 |    7.807138 | 99.86% | 0.006957 |     95.78% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q5_K_M-test       | 6.3561 |    7.774620 | 99.92% | 0.003909 |     96.66% | No Importance |    99.92%    |     99.93%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q5_K_M-naive     | 5.6906 |    6.514285 | 99.71% | 0.012372 |     94.92% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q5_K_M-test      | 5.6903 |    6.519770 | 99.75% | 0.010997 |     95.15% | No Importance |    99.75%    |     99.87%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M-naive      | 5.8664 |    7.807138 | 99.86% | 0.006957 |     95.78% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M-test       | 5.8663 |    7.774620 | 99.92% | 0.003909 |     96.66% | Target BPW    |    99.92%    |     99.91%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q4_K_M-naive     | 4.8654 |    6.514285 | 99.71% | 0.012372 |     94.92% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q4_K_M-test      | 4.8650 |    6.519770 | 99.75% | 0.010997 |     95.15% | Target BPW    |    99.75%    |     99.75%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q3_K_M-naive      | 4.8350 |    7.901055 | 99.55% | 0.022542 |     92.67% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-Q3_K_M-test       | 4.8350 |    7.846752 | 99.74% | 0.013281 |     94.17% | No Importance |    99.74%    |     99.75%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q3_K_M-naive     | 3.9094 |    6.690889 | 98.86% | 0.051482 |     89.61% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-Q3_K_M-test      | 3.9094 |    6.698637 | 98.89% | 0.049145 |     90.15% | Target BPW    |    98.89%    |     98.74%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-IQ2_M-naive       | 4.4901 |    8.332496 | 98.49% | 0.078295 |     87.27% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-IQ2_M-test        | 4.4901 |    7.892440 | 99.61% | 0.020684 |     92.53% | Target BPW    |    99.61%    |     99.56%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-IQ2_M-naive      | 2.8415 |    8.394154 | 93.06% | 0.340536 |     75.78% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-IQ2_M-test       | 2.8411 |    8.326604 | 93.34% | 0.299314 |     77.24% | Target BPW    |    93.34%    |     93.31%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-IQ1_M-naive       | 4.3627 |    8.332496 | 98.49% | 0.078295 |     87.27% |               |              |                 |
| NVIDIA-Nemotron-Nano-9B-v2-IQ1_M-test        | 4.3627 |    7.892440 | 99.61% | 0.020684 |     92.53% | Target BPW    |    99.61%    |     99.17%      |
|                                              |        |             |        |          |            |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-IQ1_M-naive      | 2.0663 |    8.394154 | 93.06% | 0.340536 |     75.78% |               |              |                 |
| NVIDIA-Nemotron-Nano-12B-v2-IQ1_M-test       | 2.0662 |    8.326604 | 93.34% | 0.299314 |     77.24% | Target BPW    |    93.34%    |     85.11%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-Q6_K_M-naive            | 6.5638 |   79.657778 | 95.51% | 0.286043 |     79.93% |               |              |                 |
| Phi-4-mini-reasoning-Q6_K_M-test             | 6.5620 |   92.616509 | 96.37% | 0.212204 |     82.49% | No Importance |    96.37%    |     96.77%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-Q6_K_M-naive                 | 6.5632 |    6.972020 | 99.98% | 0.001062 |     98.41% |               |              |                 |
| Phi-4-reasoning-Q6_K_M-test                  | 6.5605 |    6.980271 | 99.96% | 0.001616 |     98.11% | No Importance |    99.96%    |     99.98%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-Q5_K_M-naive            | 5.9225 |   82.656950 | 94.19% | 0.427141 |     76.12% |               |              |                 |
| Phi-4-mini-reasoning-Q5_K_M-test             | 5.9187 |   96.715709 | 94.47% | 0.421151 |     75.72% | No Importance |    94.47%    |     94.93%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-Q5_K_M-naive                 | 5.7850 |    6.984280 | 99.94% | 0.002555 |     97.62% |               |              |                 |
| Phi-4-reasoning-Q5_K_M-test                  | 5.7842 |    6.986200 | 99.94% | 0.002799 |     97.47% | Same          |    99.94%    |     99.93%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-Q4_K_M-naive            | 5.1796 |   79.913113 | 90.13% | 0.858518 |     67.55% |               |              |                 |
| Phi-4-mini-reasoning-Q4_K_M-test             | 5.1789 |  102.653408 | 90.63% | 0.864110 |     67.55% | No Importance |    90.63%    |     90.70%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-Q4_K_M-naive                 | 4.9385 |    7.033351 | 99.80% | 0.009278 |     95.53% |               |              |                 |
| Phi-4-reasoning-Q4_K_M-test                  | 4.9374 |    7.018981 | 99.80% | 0.009399 |     95.49% | Target BPW    |    99.80%    |     99.79%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-Q3_K_M-naive            | 4.3982 |   94.813679 | 83.24% | 1.712537 |     56.76% |               |              |                 |
| Phi-4-mini-reasoning-Q3_K_M-test             | 4.3980 |   99.959381 | 85.41% | 1.516922 |     57.16% | No Importance |    85.41%    |     85.75%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-Q3_K_M-naive                 | 4.0162 |    7.177370 | 99.34% | 0.030253 |     92.09% |               |              |                 |
| Phi-4-reasoning-Q3_K_M-test                  | 4.0158 |    7.224695 | 99.20% | 0.036670 |     91.41% | Naive         |    99.20%    |     99.30%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-IQ2_M-naive             | 3.1265 |  140.211058 | 53.68% | 3.785088 |     28.97% |               |              |                 |
| Phi-4-mini-reasoning-IQ2_M-test              | 3.1263 |         nan |    nan |      nan |     28.27% | No Importance |     nan      |     62.04%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-IQ2_M-naive                  | 2.7866 |   10.118470 | 91.61% | 0.381891 |     73.54% |               |              |                 |
| Phi-4-reasoning-IQ2_M-test                   | 2.7862 |    9.684945 | 92.68% | 0.328806 |     75.56% | No Importance |    92.68%    |     94.06%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-mini-reasoning-IQ1_M-naive             | 2.4000 |         nan |    nan |      nan |      2.89% |               |              |                 |
| Phi-4-mini-reasoning-IQ1_M-test              | 2.3992 |         nan |    nan |      nan |      0.78% | N/A           |     nan      |     25.73%      |
|                                              |        |             |        |          |            |               |              |                 |
| Phi-4-reasoning-IQ1_M-naive                  | 1.9627 |   21.964494 | 76.60% | 1.167902 |     56.81% |               |              |                 |
| Phi-4-reasoning-IQ1_M-test                   | 1.9627 |   24.371781 | 75.48% | 1.282693 |     54.26% | No Importance |    75.48%    |     78.57%      |
|                                              |        |             |        |          |            |               |              |                 |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-Q6_K_M-naive                        | 6.5635 |    9.403178 | 99.64% | 0.003126 |     97.59% |               |              |                 |
| Qwen3-8B-Q6_K_M-test                         | 6.5630 |    9.417844 | 99.63% | 0.002852 |     97.80% | Target BPW    |    99.63%    |     99.63%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-Q6_K_M-naive                       | 6.5632 |    8.356044 | 99.83% | 0.002415 |     97.87% |               |              |                 |
| Qwen3-14B-Q6_K_M-test                        | 6.5631 |    8.353905 | 99.85% | 0.001700 |     98.18% | Target BPW    |    99.85%    |     99.85%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-Q5_K_M-naive                        | 5.7090 |    9.432894 | 99.55% | 0.007302 |     96.45% |               |              |                 |
| Qwen3-8B-Q5_K_M-test                         | 5.7085 |    9.456015 | 99.53% | 0.007925 |     96.47% | No Importance |    99.53%    |     99.54%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-Q5_K_M-naive                       | 5.6925 |    8.349641 | 99.78% | 0.004632 |     97.10% |               |              |                 |
| Qwen3-14B-Q5_K_M-test                        | 5.6925 |    8.374862 | 99.78% | 0.004706 |     96.97% | Same          |    99.78%    |     99.78%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-Q4_K_M-naive                        | 4.9049 |    9.484175 | 99.20% | 0.022981 |     93.80% |               |              |                 |
| Qwen3-8B-Q4_K_M-test                         | 4.9048 |    9.499435 | 99.22% | 0.021199 |     94.16% | No Importance |    99.22%    |     99.23%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-Q4_K_M-naive                       | 4.8730 |    8.438122 | 99.48% | 0.016470 |     94.74% |               |              |                 |
| Qwen3-14B-Q4_K_M-test                        | 4.8730 |    8.391917 | 99.56% | 0.014632 |     95.00% | No Importance |    99.56%    |     99.55%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-Q3_K_M-naive                        | 4.0223 |    9.806293 | 97.74% | 0.085251 |     88.31% |               |              |                 |
| Qwen3-8B-Q3_K_M-test                         | 4.0222 |    9.727678 | 98.33% | 0.062507 |     89.96% | Target BPW    |    98.33%    |     98.01%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-Q3_K_M-naive                       | 3.9627 |    8.608721 | 98.48% | 0.058355 |     90.33% |               |              |                 |
| Qwen3-14B-Q3_K_M-test                        | 3.9626 |    8.641670 | 98.70% | 0.049203 |     91.17% | Target BPW    |    98.70%    |     98.50%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-IQ2_M-naive                         | 2.9750 |   12.149941 | 88.46% | 0.546543 |     71.90% |               |              |                 |
| Qwen3-8B-IQ2_M-test                          | 2.9749 |   10.807367 | 93.07% | 0.292973 |     78.84% | Target BPW    |    93.07%    |     91.57%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-IQ2_M-naive                        | 2.8802 |   10.032410 | 91.33% | 0.385816 |     75.81% |               |              |                 |
| Qwen3-14B-IQ2_M-test                         | 2.8801 |    9.611706 | 94.37% | 0.239130 |     80.69% | Target BPW    |    94.37%    |     93.11%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-8B-IQ1_M-naive                         | 2.1978 |   24.298130 | 75.73% | 1.336847 |     56.02% |               |              |                 |
| Qwen3-8B-IQ1_M-test                          | 2.1977 |   15.898865 | 83.31% | 0.802131 |     65.28% | Target BPW    |    83.31%    |     76.51%      |
|                                              |        |             |        |          |            |               |              |                 |
| Qwen3-14B-IQ1_M-naive                        | 2.0821 |   15.201915 | 81.19% | 0.934159 |     62.44% |               |              |                 |
| Qwen3-14B-IQ1_M-test                         | 2.0820 |   12.808369 | 85.11% | 0.692015 |     67.15% | Target BPW    |    85.11%    |     82.91%      |

Tests:         132

Naive:          25 (19%)
Same:           10 ( 8%)
Target BPW:     69 (52%)
No Importance:  27 (20%)
N/A:             1 (<1%)

AI usage disclosure

AI was used to validate the mathematical approach and calculations, and to optimize and debug the code.

Special thanks to @AesSedai, @compilade and @ddh0 for their contributions during the development of this PR.

@Thireus
Contributor

Thireus commented Apr 14, 2026

> Lately, I've been thinking if we can find a way to implement these into smaller sub-tools, rather than adding more things into llama-quantize

> That's how this whole thing started 😆. The PR's idea came from a convo with @jukofyork and @compilade, after he released the new imatrix format. The initial thought was to implement it as a standalone tool, but along the way I realised it was going to duplicate a lot of the llama-quantize code, so I went down the route of adding it in place, encapsulating as much as possible into a single function to avoid messing up the rest.

> Having said that, a standalone tool may be the way to go. I'm open to suggestions

I can't believe we've been working on the same thing for nearly a year. I saw your quant KLD results today and was surprised at how well optimised they are compared to others, so I decided to check your work.

For my part, it's a tool suite that I've created independently of the llama.cpp code.

If you need to brainstorm please let me know, maybe there are some aspects I've already resolved and vice-versa. You can check the tool suite in action on https://gguf.thireus.com/quant_assign.html, I'm sure you'll notice a lot of similarities.

Cheers.

@ivy-42

ivy-42 commented May 2, 2026

Hi, I don't know the proper protocol for "intruding" on another's PR … 😉

I've been working on inference speed-aware quantization. This is especially useful for NPUs, drafting models or devices that have abundant memory but lack compute. It's based on this PR's knapsack solver infrastructure. Since this PR is already hard to review, I plan to submit a separate PR for it once it is a little more polished.
In case someone is interested: Fork

As I am somewhat familiar with the source code by now, I could give a little feedback, if helpful.

PS: Nice algorithmic work.

@EAddario
Contributor Author

EAddario commented May 3, 2026

Hi @ivy-42, and thank you! No protocol expected/needed. Please feel free to use the code in any way you deem fit. I'll definitely check your fork, but in the meantime any questions/suggestions are always welcome

Comment thread src/llama-quant.cpp
};

// Quality metrics
struct quant_error {

It's not clear to me what the fields of this struct are scaled by. I think it approximates a weighted sum over all elements of the tensor (approximate because not all are sampled), right? Maybe rename the fields or add a comment? E.g. weighted_error, wse and wce, plus a comment clarifying that those are scaled by the tensor element count

Comment thread src/llama-quant.cpp
constexpr double INFINITE = std::numeric_limits<double>::infinity();
constexpr uint64_t STATE_MAGIC = 0x4250572d5631; // "BPW-V1"
constexpr uint64_t HASH_MAGIC = 0xeabada55cafed00d;
constexpr float penalty = 2.0f;

penalty can mean many things in the context of an optimization problem. Maybe boost_factor would be clearer?

@ivy-42

ivy-42 commented May 3, 2026

In my fork, I added a CLI flag --maximize-budget-use (default: false) to allow users to opt into greedy tensor upgrades, rather than having them enabled by default. This is especially useful when there’s no precise target size, e.g., when quantizing models for redistribution. Personally, I prefer sticking with a Pareto-optimal quant mix and adjusting ctx-size, offloading, or KV-cache quantization.

If you think this fits the scope of this PR, feel free to include my commit.

