server : Smart selection of available slot using Longest Common Prefix #7728
ggerganov merged 6 commits into ggml-org:master
I'll test again and then mark the PR as ready for review.
By the way, should this be on by default? Or is it better to leave it off as it is now?
The LCS algorithm is overkill for this purpose. All you need to look for is the longest common prefix, which is much simpler to compute.
As far as I know, the server can reuse not only the prompt prefix, but also the suffix ( |
Although the … So for now, the slot selection logic should follow the prompt caching logic and look only at the prefix.
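The prefix-based selection discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `slot_info` struct, the `select_slot` and `common_prefix_len` names, and the definition of similarity as the fraction of a slot's cached prompt that matches the request are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical token type and slot record, for illustration only.
using token = int32_t;

struct slot_info {
    int                id;
    bool               available;
    int64_t            t_last_used;   // timestamp of the last use
    std::vector<token> cache_tokens;  // tokens of the prompt cached in this slot
};

// Length of the longest common prefix of two token sequences.
static size_t common_prefix_len(const std::vector<token> & a, const std::vector<token> & b) {
    const size_t n = std::min(a.size(), b.size());
    size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Pick the available slot with the longest common prefix, provided it meets
// the similarity threshold; otherwise fall back to the least recently used
// available slot. Returns the slot id, or -1 if no slot is available.
static int select_slot(const std::vector<slot_info> & slots,
                       const std::vector<token> & prompt,
                       float min_similarity) {
    int    best_id  = -1;
    size_t best_lcp = 0;

    for (const auto & s : slots) {
        if (!s.available || s.cache_tokens.empty()) {
            continue;
        }
        const size_t lcp = common_prefix_len(s.cache_tokens, prompt);
        // Assumed similarity measure: matched share of the slot's cached prompt.
        const float sim = float(lcp) / float(s.cache_tokens.size());
        if (lcp > best_lcp && sim >= min_similarity) {
            best_lcp = lcp;
            best_id  = s.id;
        }
    }
    if (best_id != -1) {
        return best_id;
    }

    // LRU fallback: the available slot with the oldest timestamp.
    int64_t t_oldest = std::numeric_limits<int64_t>::max();
    for (const auto & s : slots) {
        if (s.available && s.t_last_used < t_oldest) {
            t_oldest = s.t_last_used;
            best_id  = s.id;
        }
    }
    return best_id;
}
```

Computing the prefix is a single linear scan per slot, which is why it is cheaper than a longest-common-substring search, and it matches what prefix-based prompt caching can actually reuse.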
42fb804 to a8842fd
Would this add new parameters to the server command line in order to enable this feature?




In the current implementation, an available slot is selected using LRU (Least Recently Used). This PR adds slot selection by the LCP (Longest Common Prefix) algorithm, which replaces the originally proposed LCS (Longest Common Substring), to select a slot whose prompt has at least n% similarity to the requested prompt. This reduces prompt processing in multi-user scenarios.
Additionally, this PR: