server : return error on too large embedding input #7389
Merged
JohannesGaessler
approved these changes
May 19, 2024
Contributor
JohannesGaessler
left a comment
Definitely an improvement over master.
Contributor
Seunghhon
pushed a commit
to Seunghhon/llama.cpp
that referenced
this pull request
Apr 26, 2026
phuongncn
pushed a commit
to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4
that referenced
this pull request
Apr 28, 2026




fix #7277
The input for embeddings must fit inside the physical batch size. If it does not, we now return an error:
$ ▶ curl -X POST "http://localhost:8080/embedding" --data '{"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam rhoncus mauris eget magna semper, ut varius arcu eleifend. Vestibulum quis justo eget ex pretium sollicitudin. Nam euismod orci vulputate erat sagittis, sed pulvinar ante varius. Proin in dui non eros sodales tempus. Proin et mi scelerisque tellus eleifend auctor. Sed sagittis erat sapien, in porttitor augue bibendum nec. Nam ut mi accumsan lorem volutpat tempus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In ac nulla tempor, pharetra felis id, venenatis tortor. Donec felis turpis, egestas non ligula at, eleifend fringilla est. Fusce elit mi, fermentum a sapien eleifend, rutrum scelerisque eros. Sed et vestibulum orci. Quisque ut magna vel nibh accumsan dictum eget eu urna. Duis rhoncus, lacus in imperdiet tincidunt, turpis turpis vestibulum ante, at mollis nisi massa et purus. Phasellus sed ante eros. Aenean consequat nisi non massa eleifend finibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ultrices libero id metus consequat semper. Nam venenatis, est quis interdum commodo, nisl ex placerat diam, sed fringilla ex nisi sed sem. Pellentesque luctus orci id tellus dictum tristique. Integer molestie varius risus quis maximus. In id feugiat nulla, at scelerisque massa. Nulla neque diam, consequat ac orci laoreet, venenatis pharetra enim.Aenean rhoncus dapibus augue ac volutpat. Nullam laoreet, lorem quis fermentum scelerisque,"}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1613  100   120  100  1493  32795   398k --:--:-- --:--:-- --:--:--  525k
{
  "error": {
    "code": 500,
    "message": "input is too large to process. increase the physical batch size",
    "type": "server_error"
  }
}
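
For clients, the new error can be detected from the response body. Below is a minimal Python sketch of how a caller might separate this case from other failures. The names `EmbeddingInputTooLarge` and `parse_embedding_response` are hypothetical and not part of llama.cpp. The error JSON is the one shown above. Matching on the substring "too large" is an assumption about the message staying stable, and the shape of a successful response is not specified here, so it is returned unchanged.

```python
import json


class EmbeddingInputTooLarge(Exception):
    """Hypothetical exception: the server rejected an oversized embedding input."""


def parse_embedding_response(body: str) -> dict:
    """Return the decoded response, or raise if it carries an "error" object."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if error is None:
        # No error object: hand the payload back untouched.
        return payload
    message = error.get("message", "")
    # Assumption: the message text identifies the oversized-input case.
    if "too large" in message:
        raise EmbeddingInputTooLarge(message)
    raise RuntimeError(f"server error {error.get('code')}: {message}")


# Error body copied from the curl output above.
sample = (
    '{"error": {"code": 500, '
    '"message": "input is too large to process. increase the physical batch size", '
    '"type": "server_error"}}'
)

try:
    parse_embedding_response(sample)
except EmbeddingInputTooLarge as exc:
    print("rejected:", exc)
```

A caller that catches `EmbeddingInputTooLarge` could then shorten or split the input, or ask the operator to restart the server with a larger physical batch size.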