Ported quantize.cpp #84
|
Awesome work! I've left some feedback; it's not the most Rust-y code, but that's fine as it's a port and we can fix that up later on 🙂 Really appreciate you doing this, it's great to get one step closer to being completely standalone 🚀 |
|
Your comments aren't showing up in the PR. And I fully agree; I'm going to go ahead and fix all the clippy issues, plus see if I can improve some of the logic. Do you have any recommendations on data reading and writing? I use the same buffer multiple times since it's more efficient when working on embedded Rust systems, so I went ahead and did the same here. |
|
Weird, I can definitely see the comments here and in the diff. Not sure what's happening there. For data read/write, not sure - reusing the same buffer seems reasonable if they're semantically similar, but I'd just use a new buffer if they're not. Do you have any examples of something you'd want advice on? |
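
For reference, the buffer reuse being discussed could look roughly like the sketch below. It is not code from this PR: `read_into` and `read_record` are hypothetical helper names, and the little-endian `u32` length prefix is an assumed record layout chosen only to make the example concrete. The point is that one `Vec<u8>` is cleared and resized per read, so its allocation is kept across calls.

```rust
use std::io::{self, Read};

/// Hypothetical helper: fills `buf` with exactly `len` bytes from `reader`.
/// `clear` + `resize` keeps the existing allocation when it is large enough.
fn read_into<R: Read>(reader: &mut R, buf: &mut Vec<u8>, len: usize) -> io::Result<()> {
    buf.clear();
    buf.resize(len, 0);
    reader.read_exact(buf)
}

/// Hypothetical helper: reads one record made of a little-endian u32 length
/// prefix followed by that many payload bytes (an assumed layout, for illustration).
/// The payload ends up in `scratch`; the payload length is returned.
fn read_record<R: Read>(reader: &mut R, scratch: &mut Vec<u8>) -> io::Result<usize> {
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    read_into(reader, scratch, len)?;
    Ok(len)
}
```

If two reads hold semantically different data (say, a tensor name and tensor contents), giving each its own buffer costs little and keeps the code easier to follow.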
|
I've updated the PR, but the output seems to be incorrect: Probably an assumption somewhere that I broke. Need to look into it further - any ideas? I'd also like to support loading unversioned models and GGJT, so this is going to be a bit of a headache in general :( |
|
Looking at your commits, I found a couple of places where it could've broken. I'll check them out and see if that fixes it. |
|
@philpax Regarding supporting other types of models, if you can provide the relevant issues I could research how to make those work. |
|
Ok, updated to the latest
assert_eq!(src.len(), n as usize);
assert_eq!(dst.len(), n as usize);
assert!(hist.len() >= 16); |
What about |
The issue is the size - it should be equivalent to the size of the original array in bytes. I guess we could shove 4*size onto the user, it's not that big of a deal |
|
Merged in main again, comments/questions from above still apply |
|
Cool then, let's see if #125 happens soon and in the meantime we can fix 2/3. |
|
#125 is going to be merged soon if all goes well, but its ggml-format loader doesn't work in its current stage. Given that, I think we're OK to merge this once that's in. Let's try to get support for the other formats as soon as possible, but I won't let that block merging. |
|
I implemented write support for the loader (now |
I went ahead and ported the main `quantize.cpp` file. My changes involve porting the file while keeping the internal C++ function calls intact. I have plans to port those function calls to remove ggml dependencies in a future PR though.

During the porting process, I faced some challenges as I was not familiar with how to use `Context`. As a result, I added the `half` library to handle the f16->f32 conversion. I could remove the dependency if needed, but I'll need some help with working with `Context`. Something to note on this is that if there are plans to move away from ggml, then the `half` library will be necessary.

Additionally, I included some print statements inside the function to mimic the original behavior of quantize.cpp. I can remove those if needed.
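
To make the f16->f32 step concrete, here is a dependency-free sketch of what such a conversion does at the bit level. It is not the code in this PR, and `f16_bits_to_f32` and `f16_le_bytes_to_f32` are hypothetical names; the little-endian byte order of the stored values is also an assumption. In practice the `half` crate covers this with `f16::from_bits(bits).to_f32()`.

```rust
/// Converts the raw bits of an IEEE 754 binary16 value to f32.
/// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;

    let out = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: shift the fraction left until its top set bit becomes
        // the implicit leading one, adjusting the exponent to match.
        (0, _) => {
            let shift = frac.leading_zeros() - 21;
            let frac_norm = (frac << shift) & 0x3ff;
            let exp32 = 127 - 15 + 1 - shift;
            (sign << 31) | (exp32 << 23) | (frac_norm << 13)
        }
        // Infinity (frac == 0) or NaN (frac != 0).
        (0x1f, _) => (sign << 31) | (0xff << 23) | (frac << 13),
        // Normal number: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 127 - 15) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

/// Decodes a byte slice of little-endian f16 values into `dst`,
/// reusing the allocation of `dst`. A trailing odd byte is ignored.
fn f16_le_bytes_to_f32(src: &[u8], dst: &mut Vec<f32>) {
    dst.clear();
    dst.extend(
        src.chunks_exact(2)
            .map(|b| f16_bits_to_f32(u16::from_le_bytes([b[0], b[1]]))),
    );
}
```

Every f16 value is exactly representable as an f32, so this direction is lossless; only the reverse conversion needs rounding.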
Currently, there is no way to access the function since I did not implement a CLI function for it.
I am open to feedback and suggestions on how to improve this pull request.
Resolves #40