Split up modules #135
…-> InferenceSession::from_snapshot
- let (_, vocabulary) = args.model_load.load();
- let toks = match vocabulary.tokenize(&prompt, false) {
+ let model = args.model_load.load();
I agree with the change; I had considered doing this a few times before. Since both the model and the vocabulary are meant to be immutable after creation, bundling them into the same struct should hardly cause any issues.
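A minimal sketch of the bundling being discussed, with hypothetical type and field names (the actual structs in the PR may differ): the loader returns a single value that owns both the model state and its vocabulary, so callers no longer thread two values around.

```rust
/// Hypothetical vocabulary type; stands in for the real one.
struct Vocabulary {
    tokens: Vec<String>,
}

/// The model owns its vocabulary, since both are immutable after creation.
struct Model {
    vocabulary: Vocabulary,
    // ... weights, hyperparameters, etc.
}

impl Model {
    /// The loader now returns one value instead of a (model, vocabulary) pair.
    fn load() -> Model {
        Model {
            vocabulary: Vocabulary {
                tokens: vec!["<s>".to_string()],
            },
        }
    }

    /// Tokenization is reached through the model.
    fn vocabulary(&self) -> &Vocabulary {
        &self.vocabulary
    }
}

fn main() {
    let model = Model::load();
    assert_eq!(model.vocabulary().tokens.len(), 1);
}
```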
+ // The size of a scratch buffer used for inference. This is used for temporary
+ // storage of intermediate results during inference.
+ //
+ // The specific value was copied from `llama.cpp`.
+ const SCRATCH_SIZE: usize = 512 * 1024 * 1024;
I think llama.cpp figured out a proper way to compute this value. We should have a look at this. Not in this PR of course 👍
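One possible direction, sketched with an entirely illustrative formula (this is not how `llama.cpp` actually computes it): derive the scratch size from the model's shape parameters rather than hard-coding 512 MiB.

```rust
/// Hypothetical scratch-size computation: reserve room for a small number of
/// f32 intermediate tensors of shape [n_ctx, n_embd]. The multiplier and the
/// formula are placeholders, not the real llama.cpp heuristic.
const fn scratch_size(n_ctx: usize, n_embd: usize) -> usize {
    const TENSOR_COUNT: usize = 4;
    TENSOR_COUNT * n_ctx * n_embd * std::mem::size_of::<f32>()
}

fn main() {
    // e.g. a 2048-token context with a 4096-wide embedding
    assert_eq!(scratch_size(2048, 4096), 4 * 2048 * 4096 * 4);
}
```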
Yeah, I'm a little concerned about that one myself, but I don't think it should be too bad - it'll mostly just be discarding the changes to …

This is a little overdue, I think. It'll cause conflicts for the other open PRs, but it also makes the codebase much easier to maintain.
I've made a few controversial changes in the last three commits; the rest are pretty straightforward.