@city96
I don't really know where to put this since it's not quite a full solution, but it does seem to work without errors.
The conversion process is difficult for a layman to follow, more so because following the instructions as written leads to errors.
With a bit of LLM magic and tinkering, I was able to get it to run on Colab, grabbing the model file from a public Hugging Face repo. I don't know if it works as intended and haven't tested the resulting GGUF, but it certainly seemed to be doing what it was supposed to. It runs on CPU only, with the GPU (T4) never utilised; I'm not sure GPU conversion is even possible, but either way the process runs in Colab for ~30 minutes, so SEP. (A minimal download sketch is below.)
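For reference, here is a minimal sketch of the download step using the `huggingface_hub` client; the repo and file names are hypothetical placeholders, not the actual model:

```python
# Minimal sketch of fetching a checkpoint in Colab, assuming huggingface_hub
# is installed (pip install huggingface_hub). The repo_id and filename below
# are hypothetical placeholders -- substitute the model you actually want.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="some-user/some-model",     # hypothetical public repo
    filename="model-fp8.safetensors",   # hypothetical checkpoint file
)
print(f"Downloaded to {path}")
```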
Downloading the resulting file currently takes about as long; it might be better to push it to HF or copy it to Drive, but having the space available is a nontrivial requirement.
Edit: There is one serious problem: free Colab only has 12 GB of RAM, and convert.py eats it all up. Once usage hits that ceiling (at ~53% progress on one FP8 model I have), the process simply halts and memory is cleared. Perhaps convert should handle low-RAM environments.
Edit 2: There's a very simple fix for that. Will PR it now.
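I can't speak for what the PR will actually do, but one common way to bound peak RAM in a converter like this is to read tensors lazily from the checkpoint instead of materialising the whole state dict at once. A rough sketch, assuming the source is a `.safetensors` file (`process` here is a hypothetical stand-in for the per-tensor conversion step):

```python
# Rough sketch of low-RAM conversion, assuming a .safetensors checkpoint.
# Rather than loading the entire state dict, read one tensor at a time so
# peak RAM stays close to the size of the largest single tensor.
from safetensors import safe_open

def iter_tensors(path: str):
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            # get_tensor() loads only this one tensor into memory
            yield key, f.get_tensor(key)

for name, tensor in iter_tensors("model-fp8.safetensors"):  # placeholder path
    process(name, tensor)  # hypothetical per-tensor conversion/write step
```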
https://colab.research.google.com/drive/1JFRdqy1EdsZNU6STYnf3ft2a4DUHvKXm?usp=sharing