
convert: rework ftype heuristics #18214

Merged
taronaeo merged 4 commits into ggml-org:master from taronaeo:feat/default_precision
Dec 22, 2025

Conversation

@taronaeo
Member

@taronaeo taronaeo commented Dec 20, 2025

fixes: #18182

This PR updates the heuristic detection logic for the default ftype. When --outtype is not specified, the heuristics attempt to determine the highest-fidelity 16-bit ftype based on the first tensor.

If the first tensor does not meet the following criteria, the heuristic moves on to the next tensor, and so on, until it finds one that does:

  1. It has at least 2 dimensions
  2. Its dtype is not F32

If no tensor matches the criteria above, the conversion defaults to the f16 ftype.
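The scan described above can be sketched roughly like this (the `Tensor` stand-in and `guess_ftype` name are illustrative only, not the actual convert_hf_to_gguf.py code, which inspects real torch tensors):

```python
from collections import namedtuple

# Hypothetical stand-in for a model tensor; dtype is a plain string here.
Tensor = namedtuple("Tensor", ["name", "ndim", "dtype"])

def guess_ftype(tensors):
    """Pick the highest-fidelity 16-bit ftype from the first tensor that
    (a) has at least 2 dimensions and (b) is not stored as F32."""
    for t in tensors:
        if t.ndim < 2 or t.dtype == "float32":
            continue  # skip biases, norms, and F32-upcast tensors
        if t.dtype == "bfloat16":
            return "bf16"
        if t.dtype == "float16":
            return "f16"
    return "f16"  # nothing matched: fall back to f16

tensors = [
    Tensor("norm.weight", 1, "float32"),    # 1-D and F32: skipped
    Tensor("embed.weight", 2, "bfloat16"),  # first match decides the ftype
]
print(guess_ftype(tensors))  # bf16
```

Skipping 1-D and F32 tensors matters because norm weights and biases are often stored upcast to F32 even in bf16-trained models, so they say nothing about the training precision.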

Tested against the following models:

  1. Granite-4.0-1B, defaulted to bf16 ftype (correct)
  2. GPT-NeoX-20B, defaulted to f16 ftype (correct)

Note on alternative methods, such as relying on the config.json dtype: some finetunes and quantisations lie about the actual dtype, so it can't be trusted, and some models, like GPT-OSS, do not contain a dtype key in their config.json at all. Heuristics were therefore the easier route.
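As a tiny illustration of the missing-key failure mode (the config snippet below is made up for illustration, not GPT-OSS's real config.json):

```python
import json

# A config.json with no dtype key, as described above for GPT-OSS.
config = json.loads('{"model_type": "gpt_oss", "hidden_size": 2880}')

# There is simply nothing to base the ftype on here, so the converter
# would have to fall back to a guess anyway.
dtype = config.get("torch_dtype")
print(dtype)  # None
```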

AI Declaration: AI was used when creating this PR to identify existing logic relating to heuristics and to scaffold the code.

@taronaeo taronaeo requested a review from CISC as a code owner December 20, 2025 04:36
@github-actions github-actions Bot added the python python script changes label Dec 20, 2025
@taronaeo taronaeo requested a review from pwilkin December 20, 2025 04:37
@taronaeo
Member Author

I was wondering if we should warn the user about using --outtype f16 when the model was trained in bfloat16, since this issue came up with Granite 4.0. It would help prevent broken output and faulty models being distributed online.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: fix type-check

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: bring back heuristics comment

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
@taronaeo taronaeo force-pushed the feat/default_precision branch from bcc2001 to eae4555 Compare December 20, 2025 04:56
@CISC
Member

CISC commented Dec 20, 2025

Is it really necessary to check all the tensors? Wasn't the issue just that it defaulted to f16 instead of auto (and that auto only really checks for f16)?

The reason I'm asking is that this duplicates tensor-loading logic that really needs to be refactored, see #18043 (review)

Apart from MXFP4/FP8 models, I can't remember seeing any safetensors with mixed datatypes...

@pwilkin
Member

pwilkin commented Dec 20, 2025

@CISC once I finish my refactoring of convert_hf_to_gguf.py it won't really make that much of a difference :) I don't think it's that complicated, it's just like 10 lines of code.

Member

@pwilkin pwilkin left a comment


The heuristics should work a bit differently - you should only take into account (a) at least 2D tensors (b) non-F32 tensors.

@taronaeo
Member Author

taronaeo commented Dec 21, 2025

Is it really necessary to check all the tensors?

I was a bit unsure whether the first tensor gave enough information about the dtype to determine the correct type. But looking at how many tensors there are for models bigger than 1B, I guess reverting back to the first tensor still makes sense.

Edit: Ignore what I said. Will revert to using first tensor :)

@taronaeo
Member Author

The heuristics should work a bit differently - you should only take into account (a) at least 2D tensors (b) non-F32 tensors.

In retrospect, looping through all the tensors might not have been a good idea, especially for large models. I'll revert to using the first tensor and check that tensor.dim >= 2 and tensor.dtype != torch.float32.

If the conditions aren't met, we'll jump to the next tensor and check again. Let me know your thoughts :)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
@taronaeo
Member Author

Updated this PR and retained the original --outtype auto logic, whereby it only chooses the highest-fidelity 16-bit ftype. The heuristics now check the first tensor for an f16 or bf16 dtype and, if it doesn't match, continue until they find a match.

If there are no matches, it defaults to the f16 ftype. I've also updated the PR description, PTAL again.

Member

@CISC CISC left a comment


Might want to enumerate and have some threshold before giving up, but optional.

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@taronaeo
Member Author

Might want to enumerate and have some threshold before giving up, but optional.

Hmm, ideally it should detect the correct 16-bit ftype within the first few tensors. But I don't know what a good threshold would be such that it neither gives up prematurely nor takes too long to fall back to the f16 ftype.

I'll leave it as-is until we get a report about it taking way too long, then I can see what kind of model causes it :)
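For reference, CISC's enumerate-with-threshold suggestion could look something like this sketch (the cutoff value, function name, and tuple representation are assumptions, not part of the PR):

```python
MAX_TENSORS_TO_CHECK = 8  # assumed cutoff, not taken from the PR

def guess_ftype_bounded(tensors, limit=MAX_TENSORS_TO_CHECK):
    """Like the unbounded scan, but stop after `limit` tensors.

    Each tensor is modelled as an (ndim, dtype) tuple for brevity.
    """
    for i, (ndim, dtype) in enumerate(tensors):
        if i >= limit:
            break  # threshold reached: give up and fall back
        if ndim < 2 or dtype == "float32":
            continue  # ineligible tensor, keep scanning
        return "bf16" if dtype == "bfloat16" else "f16"
    return "f16"

# 100 skippable tensors up front: the bf16 tensor at the end is never
# reached, so the bounded scan falls back to f16.
print(guess_ftype_bounded([(1, "float32")] * 100 + [(2, "bfloat16")]))  # f16
```

The trade-off is exactly the one discussed above: a small limit bounds the cost on pathological models but risks falling back to f16 when the first eligible tensor sits past the cutoff.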

@taronaeo taronaeo requested a review from pwilkin December 22, 2025 11:58
@taronaeo
Member Author

@pwilkin Re-requesting your review before we can merge :)

Member

@pwilkin pwilkin left a comment


Aight, using the first F16 tensor as the heuristic should be fine; I can't think of an example where that would be bad (it can be F32, but that'll get skipped over, so it should cover all cases).

@taronaeo taronaeo merged commit a283104 into ggml-org:master Dec 22, 2025
6 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* convert: rework ftype heuristics

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: fix type-check

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: bring back heuristics comment

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* convert: revert to using first tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* convert: rework heuristics logic

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* convert: rm redundant float32 check

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels

python python script changes


Development

Successfully merging this pull request may close these issues.

Feature Request: convert_hf_to_gguf.py to default to the original precision

3 participants