convert: rework ftype heuristics #18214
Conversation
I was wondering if we should warn the user about using
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: fix type-check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

convert: bring back heuristics comment
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Force-pushed bcc2001 to eae4555
Is it really necessary to check all the tensors? Wasn't the issue just the default ftype? The reason I'm saying this is that it duplicates tensor-loading logic that really needs to be refactored, see #18043 (review). Apart from MXFP4/FP8 models, I can't remember seeing any safetensors with mixed datatypes...
@CISC once I finish my refactoring of |
pwilkin
left a comment
The heuristics should work a bit differently: you should only take into account (a) tensors that are at least 2D and (b) non-F32 tensors.
Edit: ignore what I said.

Will revert to using the first tensor :)
In retrospect, I think looping through all the tensors might not have been a good idea, especially if the model is large. In that case I'll revert to using the first tensor and check whether it meets the conditions. If the conditions aren't met, we'll just jump to the next tensor and check again. Let me know your thoughts about this :)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Updated this PR and retained the original logic; if there are no matches, it defaults to the f16 ftype. Updated the PR description as well, PTAL again.
CISC
left a comment
Might want to enumerate and have some threshold before giving up, but optional.
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Hmm, ideally it should detect the correct 16-bit ftype within the first few tensors. But I don't know what a good threshold would be such that it neither fails prematurely nor takes too long to fall back to the f16 ftype. I'll leave it as-is until we get a report about it taking way too long; then I can see what kind of model causes it :)
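A bounded scan along these lines might look like the sketch below. This is illustrative Python, not the actual convert script code; the `(ndim, dtype)` pair representation, the dtype strings, and the `max_checks=8` threshold are all assumptions made for the example.

```python
def guess_ftype_bounded(tensors, max_checks=8):
    """Pick a 16-bit ftype from the first matching tensor, but stop
    scanning after `max_checks` candidate tensors have been inspected.

    `tensors` is an iterable of (ndim, dtype) pairs; this is a sketch,
    and max_checks=8 is an arbitrary placeholder threshold.
    """
    checked = 0
    for ndim, dtype in tensors:
        # 1D tensors (biases, norms) and F32 tensors carry no signal
        # about the model's native 16-bit precision, so skip them.
        if ndim < 2 or dtype == "F32":
            continue
        if dtype == "BF16":
            return "bf16"
        if dtype == "F16":
            return "f16"
        checked += 1
        if checked >= max_checks:
            # Likely exotic dtypes throughout (MXFP4/FP8): give up early
            # instead of scanning a huge model end to end.
            break
    return "f16"  # fallback when nothing matched
```

With a cap like this, a model full of FP8 tensors falls back to f16 after a handful of checks instead of after every tensor.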
@pwilkin Re-requesting your review before we can merge :) |
pwilkin
left a comment
Aight, first F16 tensor as the heuristic should be fine; I can't think of an example where that would be bad (it can be F32, but that'll get skipped over, so it should cover all cases).
* convert: rework ftype heuristics
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

  convert: fix type-check
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

  convert: bring back heuristics comment
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* convert: revert to using first tensor
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* convert: rework heuristics logic
  Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* convert: rm redundant float32 check
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
fixes: #18182
This PR updates the heuristic detection logic for the default ftype. When `--outtype` is not specified, the heuristics attempt to figure out the highest-fidelity 16-bit ftype based on the first tensor. If the first tensor does not match the criteria, it continues to the second and then the nth tensor until it finds a tensor that matches.

If no tensor matches the criteria, it defaults to the f16 ftype.

Tested against the following models:
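The flow described above can be sketched roughly as follows. This is illustrative Python, not the actual `convert_hf_to_gguf.py` code; the `(ndim, dtype)` pair representation, the dtype strings, and the exact skip conditions are assumptions based on this thread.

```python
def guess_default_ftype(tensors):
    """Walk tensors in order and return the first informative 16-bit
    dtype as the default ftype, falling back to f16.

    `tensors` is an iterable of (ndim, dtype) pairs; sketch only.
    """
    for ndim, dtype in tensors:
        # Skip tensors that say nothing about the model's native
        # precision: 1D tensors (biases, norms) and F32 tensors.
        if ndim < 2 or dtype == "F32":
            continue
        if dtype == "BF16":
            return "bf16"
        if dtype == "F16":
            return "f16"
    return "f16"  # no tensor matched anywhere: default to f16

# First two tensors are skipped (1D, then F32), third decides.
print(guess_default_ftype([(1, "F32"), (2, "F32"), (2, "BF16")]))  # bf16
```

In the common case the very first weight matrix decides, so the loop rarely runs more than a few iterations.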
Note about alternative methods, such as relying on the `config.json` `dtype`: some finetunes or quantisations lie about the correct dtype, so it can't be trusted. And some models like GPT-OSS do not actually contain a `dtype` key within their `config.json`, so the easier route was to do heuristics.

AI Declaration: AI was used when creating this PR to identify existing logic relating to heuristics and to scaffold the code.