Split up modules #135
…-> InferenceSession::from_snapshot
- let (_, vocabulary) = args.model_load.load();
- let toks = match vocabulary.tokenize(&prompt, false) {
+ let model = args.model_load.load();
I agree with the change; I had considered doing this a few times before. Since both the model and the vocabulary are meant to be immutable after creation, bundling them into the same struct should hardly cause any issues.
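A minimal sketch of the bundling being discussed, with hypothetical type and field names (the actual structs in the PR may differ): the loader returns a single value that owns both the model state and its vocabulary, so callers no longer thread two values around.

```rust
/// Hypothetical vocabulary type; stands in for the real one.
struct Vocabulary {
    tokens: Vec<String>,
}

/// The model owns its vocabulary, since both are immutable after creation.
struct Model {
    vocabulary: Vocabulary,
    // ... weights, hyperparameters, etc.
}

impl Model {
    /// The loader now returns one value instead of a (model, vocabulary) pair.
    fn load() -> Model {
        Model {
            vocabulary: Vocabulary {
                tokens: vec!["<s>".to_string()],
            },
        }
    }

    /// Tokenization is reached through the model.
    fn vocabulary(&self) -> &Vocabulary {
        &self.vocabulary
    }
}

fn main() {
    let model = Model::load();
    assert_eq!(model.vocabulary().tokens.len(), 1);
}
```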
+ // The size of a scratch buffer used for inference. This is used for temporary
+ // storage of intermediate results during inference.
+ //
+ // The specific value was copied from `llama.cpp`.
+ const SCRATCH_SIZE: usize = 512 * 1024 * 1024;
I think llama.cpp figured out a proper way to compute this value. We should have a look at this. Not in this PR of course 👍
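One possible direction, sketched with an entirely illustrative formula (this is not how `llama.cpp` actually computes it): derive the scratch size from the model's shape parameters rather than hard-coding 512 MiB.

```rust
/// Hypothetical scratch-size computation: reserve room for a small number of
/// f32 intermediate tensors of shape [n_ctx, n_embd]. The multiplier and the
/// formula are placeholders, not the real llama.cpp heuristic.
const fn scratch_size(n_ctx: usize, n_embd: usize) -> usize {
    const TENSOR_COUNT: usize = 4;
    TENSOR_COUNT * n_ctx * n_embd * std::mem::size_of::<f32>()
}

fn main() {
    // e.g. a 2048-token context with a 4096-wide embedding
    assert_eq!(scratch_size(2048, 4096), 4 * 2048 * 4096 * 4);
}
```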
Yeah, I'm a little concerned about that one myself, but I don't think it should be too bad - it'll mostly just be discarding the changes to …

This is a little overdue, I think. It'll cause conflicts for the other open PRs, but it also makes the codebase much easier to maintain.
I've made a few controversial changes in the last three commits; the rest are pretty straightforward.