
server: continue to update other slots on embedding concurrent request#5699

Merged
phymbert merged 3 commits into master from hotfix/server-issue-5655-concurrent-embedding-final on Feb 24, 2024
Conversation

@phymbert (Collaborator) commented Feb 24, 2024

Context

If multiple slots are computing embeddings concurrently, only the first one is updated.

Changes

- Continue updating the remaining slots in update_slots in the main loop when processing an embedding task.
- Moved the test scenario to the parallel feature.

Closes #5655

server: tests: add multi users embeddings as fixed
@phymbert phymbert requested review from ggerganov and ngxson February 24, 2024 12:05
@phymbert phymbert added the bug (Something isn't working) and server/webui labels Feb 24, 2024
@ggerganov (Member) left a comment


Let's go 🚀

@phymbert (Collaborator, Author)

I will take advantage of this PR to add an OAI-compatible concurrent embeddings scenario.

@ngxson (Contributor) left a comment


LGTM. Thanks!

@phymbert phymbert merged commit 9e359a4 into master Feb 24, 2024
@phymbert phymbert deleted the hotfix/server-issue-5655-concurrent-embedding-final branch February 24, 2024 18:16
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024 (ggml-org#5699)

* server: ggml-org#5655 - continue to update other slots on embedding concurrent request.

* server: tests: add multi users embeddings as fixed

* server: tests: adding OAI compatible embedding concurrent endpoint

* server: tests: adding OAI compatible embedding with multiple inputs
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026 (ggml-org#5699)

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026 (ggml-org#5699)

Labels

bug (Something isn't working), server/webui

Development

Successfully merging this pull request may close these issues: Segmentation fault (#5655)

3 participants