This repository was archived by the owner on Jun 24, 2024. It is now read-only.
This should be relatively straightforward - read in the original `ggml` model, run the quantization functions over the data, and write the result out to disk.
The exciting possibility is parallelisation 👀 - all you should have to do is scan through the file to determine the tensor boundaries, then build an iterator over them and feed it to `rayon`. That would be a huge improvement over the C++ version, and it would be practically free!
Split this off from #21 as it's a separate issue.
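
A rough sketch of the scan-then-parallelise idea, to make the shape concrete. Everything here is an assumption for illustration: the record layout (a little-endian `u32` element count followed by that many `f32` values) is a toy stand-in and not the real `ggml` file format, `quantize_tensor` is a placeholder 8-bit quantizer rather than any of the actual quantization functions, and `tensor_boundaries` and `quantize_all` are hypothetical names. To keep the example dependency-free it uses `std::thread::scope` from the standard library instead of `rayon`.

```rust
use std::ops::Range;
use std::thread;

/// Scan the buffer once and return the byte range of each tensor's payload.
/// Assumed toy layout (NOT the real ggml format): each record is a
/// little-endian u32 element count followed by that many little-endian f32s.
fn tensor_boundaries(data: &[u8]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut pos = 0;
    while pos + 4 <= data.len() {
        let n = u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]])
            as usize;
        let start = pos + 4;
        let end = start + n * 4;
        if end > data.len() {
            break; // truncated record: stop scanning
        }
        ranges.push(start..end);
        pos = end;
    }
    ranges
}

/// Placeholder quantizer (an assumption, not a ggml scheme): symmetric 8-bit
/// quantization with a single scale per tensor.
fn quantize_tensor(payload: &[u8]) -> (f32, Vec<i8>) {
    let values: Vec<f32> = payload
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized: Vec<i8> = values.iter().map(|v| (v / scale).round() as i8).collect();
    (scale, quantized)
}

/// Quantize every tensor in parallel. The tensors are independent once the
/// boundaries are known, so each one can be handed to its own worker.
/// This sketch spawns one scoped thread per tensor for simplicity.
fn quantize_all(data: &[u8]) -> Vec<(f32, Vec<i8>)> {
    let ranges = tensor_boundaries(data);
    thread::scope(|s| {
        // Spawn all workers first, then join, so they actually overlap.
        let handles: Vec<_> = ranges
            .into_iter()
            .map(|r| s.spawn(move || quantize_tensor(&data[r])))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The results come back in file order, so writing them out to disk afterwards can stay sequential. With `rayon` the body of `quantize_all` would presumably shrink to something like `ranges.par_iter().map(|r| quantize_tensor(&data[r.clone()])).collect()` after `use rayon::prelude::*;`, which also replaces the one-thread-per-tensor approach with a work-stealing pool.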