
server: allow router to report child instances sleep status #20849

Merged
ngxson merged 4 commits into ggml-org:master from ngxson:xsn/server_router_sleep_status
Mar 22, 2026


@ngxson ngxson commented Mar 21, 2026

Allow the router server to report each child instance's sleeping status via the /models endpoint. Example:

{
  "data": [{
    "id": "LiquidAI/LFM2-350M-GGUF",
    "aliases": [],
    "tags": [],
    "object": "model",
    "owned_by": "llamacpp",
    "created": 1774133483,
    "status": {
      "value": "sleeping",
      "args": [....],
      "preset": "[LiquidAI/LFM2-350M-GGUF]\nsleep-idle-seconds = 10\nhf-repo = LiquidAI/LFM2-350M-GGUF\n\n"
    }
  },
...

@ngxson ngxson requested a review from a team as a code owner March 22, 2026 00:25
Comment thread tools/server/server-models.h Outdated
Comment on lines +58 to +59
server_model_status status = SERVER_MODEL_STATUS_UNLOADED;
bool is_sleeping = false; // whether the model is in sleeping state (only valid when status == LOADED)
Member


Can we have a new status SERVER_MODEL_STATUS_SLEEPING instead of the extra bool?

Contributor Author


Yeah, it's a bit complicated but still possible. My initial plan was to treat "sleeping" as a sub-state of "loaded", but on second thought, it may be more beneficial in the long term to have just one set of states.

I implemented it in 02c8b88. It also requires changing the semantics of "loaded" in some places to "ready", where "ready" means either "loaded" or "sleeping".
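
For readers following the thread, the single-enum design discussed above might look roughly like the sketch below. It is not the code from the PR: only SERVER_MODEL_STATUS_UNLOADED, the LOADED state, and the proposed SERVER_MODEL_STATUS_SLEEPING appear in this conversation, and only the "sleeping" string appears in the example response. The LOADING enumerator, the other status strings, and both helper function names are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical sketch of a single status enum, replacing the extra
// `is_sleeping` bool. SERVER_MODEL_STATUS_LOADING is an assumed state.
enum server_model_status {
    SERVER_MODEL_STATUS_UNLOADED,
    SERVER_MODEL_STATUS_LOADING,   // assumption, not named in this thread
    SERVER_MODEL_STATUS_LOADED,
    SERVER_MODEL_STATUS_SLEEPING,
};

// "ready" as described in the thread: the child instance is either
// loaded or sleeping. The helper name is hypothetical.
static bool server_model_status_is_ready(server_model_status s) {
    return s == SERVER_MODEL_STATUS_LOADED || s == SERVER_MODEL_STATUS_SLEEPING;
}

// Maps a status to the string reported under "status.value".
// Only "sleeping" is confirmed by the example response; the rest are assumed.
static std::string server_model_status_to_string(server_model_status s) {
    switch (s) {
        case SERVER_MODEL_STATUS_UNLOADED: return "unloaded";
        case SERVER_MODEL_STATUS_LOADING:  return "loading";
        case SERVER_MODEL_STATUS_LOADED:   return "loaded";
        case SERVER_MODEL_STATUS_SLEEPING: return "sleeping";
    }
    return "unknown";
}
```

With one set of states, call sites that previously checked for LOADED and only care that the child process is up would check the "ready" predicate instead, while the /models response can report "sleeping" directly.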

@ngxson ngxson requested a review from ggerganov March 22, 2026 15:05
@ngxson ngxson merged commit 49bfdde into ggml-org:master Mar 22, 2026
46 of 48 checks passed
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…#20849)

* server: allow router to report child instances sleep status

* refactor

* move sleeping to state

* nits
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
…#20849)

* server: allow router to report child instances sleep status

* refactor

* move sleeping to state

* nits

2 participants