Refactor most code in main.cpp into a separate module (preparing to implement TCP mode) by tarruda · Pull Request #267 · ggml-org/llama.cpp

tarruda · 2023-03-18T15:34:35Z

The goal of this refactor is to allow reusing the model execution code with streams other than stdin/stdout for interaction.

In my case, I'd like to implement a simple TCP server (enabled by a command-line option) that runs llama_main for each new connection, with each connection handled in a child process created via fork(). This would bring a few benefits:

- Loading model weights can be very slow, so in TCP mode we can load them before listening. Each new connection is handled in a forked process, which inherits the parent's memory and so doesn't have to reload the model.
- We can quickly start a new context by opening a new TCP socket. New connections will also be able to specify some new parameters, such as seed and prompt.
- It becomes easier to wrap this in a REST/HTTP server.
- It can be more convenient on a LAN where a powerful computer acts as the model server.
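
To make the fork-per-connection idea concrete, here is a minimal POSIX sketch. It is not the code from this PR or its follow-up: `Model`, `make_listener`, `listener_port`, and `serve_one` are hypothetical names, and the child only writes a greeting where a real server would run the model against the socket.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical stand-in for the model weights, loaded once in the parent.
struct Model { std::string name; };

// Create a TCP socket listening on 127.0.0.1:port (port 0 lets the OS pick).
// Returns the listening descriptor, or -1 on failure.
int make_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Return the port the listener is bound to, or 0 on failure.
unsigned short listener_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) return 0;
    return ntohs(addr.sin_port);
}

// Accept one connection and handle it in a forked child.
// Returns the child's pid in the parent, or -1 on failure.
pid_t serve_one(int listen_fd, const Model & model) {
    int conn = accept(listen_fd, nullptr, nullptr);
    if (conn < 0) return -1;
    pid_t pid = fork();
    if (pid == 0) {
        // Child: the model is already in memory, inherited from the parent,
        // so nothing is reloaded here.
        close(listen_fd);
        std::string reply = "model " + model.name + " ready\n";
        ssize_t n = write(conn, reply.data(), reply.size());
        close(conn);
        _exit(n == static_cast<ssize_t>(reply.size()) ? 0 : 1);
    }
    // Parent (or fork failure): drop the connection descriptor and keep
    // only the listening socket.
    close(conn);
    return pid;
}
```

A server built this way would load the model, call `make_listener` once, and then call `serve_one` in a loop, reaping finished children with `waitpid`. Binding to the loopback address is a choice made for this sketch; serving a LAN would need a different bind address.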

If this PR is accepted, I will follow up with a PR that implements the TCP server command-line option.

This PR is simpler to review than it appears. Just look at the commits individually (most of the additions/deletions happen in the first commit, where main.cpp is simply renamed as llama.cpp).

Signed-off-by: Thiago Padilha <thiago@padilha.cc>


Green-Sky · 2023-03-18T15:48:40Z

How does this PR tie into the current active refactor here #77 ?

tarruda · 2023-03-18T15:54:01Z

> How does this PR tie into the current active refactor here #77 ?

I was not aware of that PR; I should have searched for it first. The only reason I created this PR is because I had a clear vision of how to implement a TCP server mode in llama.cpp. Honestly not sure what to do, should I close this PR?

Green-Sky · 2023-03-18T16:14:06Z

> Honestly not sure what to do, should I close this PR?

Not my call, but you could review the other PR with your insight :)

tarruda · 2023-03-18T16:25:51Z

> Not my call, but you could review the other PR with your insight :)

I had a quick look and it seems that the goal in #77 is to make llama.cpp embeddable as a library, which requires modifying/refactoring more than what I do here.

This PR has no such goals and makes almost no changes to existing code. It can be summarized as:

- Most code in main.cpp moved to llama.cpp. No existing functions were split; the only new function is llama_main, which contains most of the old main function's code.
- llama_main now accepts the following as arguments:
  - parsed parameters
  - preloaded model
  - input/output/error streams, which are now used instead of hardcoded stdin/stdout/stderr
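
The stream-passing idea in the summary above can be sketched as follows. This is an illustration, not the signature used in the PR: `Params`, `Model`, and `run_session` are hypothetical names, and the "model" here just echoes each input line where real code would evaluate the network.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the parsed parameters and the preloaded model.
struct Params { std::string prompt; int seed = 0; };
struct Model  { std::string name; };

// Hypothetical session function: all input, output, and diagnostics go
// through the streams passed in, never through stdin/stdout/stderr directly.
int run_session(const Params & params, const Model & model,
                std::istream & in, std::ostream & out, std::ostream & err) {
    if (model.name.empty()) {
        err << "error: model not loaded\n";
        return 1;
    }
    out << params.prompt;
    std::string line;
    while (std::getline(in, line)) {
        // Placeholder for model evaluation: echo the line back, tagged.
        out << "[" << model.name << "] " << line << "\n";
    }
    return 0;
}
```

With this shape, a console front end could pass `std::cin`, `std::cout`, and `std::cerr`, while a test or a socket-backed front end could pass any other stream objects without touching the session code.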

ggerganov · 2023-03-18T17:10:35Z

@tarruda
Adding a TCP server would be awesome!
Please keep doing this - for now, do it on a branch in this repo as you find best. Just invited you as a collaborator.
I will review #77 very soon and merge it first. After that, we will update your changes to fit the C-style API.

tarruda · 2023-03-19T18:44:50Z

Closing in favor of #278

Move main.cpp to llama.cpp

51d003e

Signed-off-by: Thiago Padilha <thiago@padilha.cc>

tarruda force-pushed the refactor-llama-main branch from 9132124 to 934840d on March 18, 2023 15:40

tarruda added 4 commits March 18, 2023 12:40

Move struct definitions in llama.cpp to llama.h

82e70db

Signed-off-by: Thiago Padilha <thiago@padilha.cc>

Add main.cpp back, and invoke llama_main from it

e364847

Signed-off-by: Thiago Padilha <thiago@padilha.cc>

Move model loading back to main.cpp

1088d2d

Signed-off-by: Thiago Padilha <thiago@padilha.cc>

Remove direct access to std streams from llama_main

edc17cf

The goal is to allow running llama_main while connected to other streams, such as TCP sockets. Signed-off-by: Thiago Padilha <thiago@padilha.cc>

tarruda force-pushed the refactor-llama-main branch from 934840d to edc17cf on March 18, 2023 15:41

tarruda mentioned this pull request Mar 19, 2023

Proof of concept TCP server mode #278

Closed

tarruda closed this Mar 19, 2023

tarruda deleted the refactor-llama-main branch March 20, 2023 09:57

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

tarruda wants to merge 5 commits into ggml-org:master from tarruda:refactor-llama-main