
server: use httplib dynamic threads #20817

Merged
ngxson merged 2 commits into ggml-org:master from ngxson:xsn/server_dynamic_threads
Mar 23, 2026
Conversation

@ngxson
Contributor

@ngxson ngxson commented Mar 20, 2026

Alternative to #20799

Fix #20684

The server can now create up to a maximum of 1024 threads on demand. Dynamic threads are terminated once they finish their task.
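The "fixed plus on-demand" behavior described above can be sketched in plain C++. This is an illustrative toy in the spirit of `httplib::ThreadPool(n_fixed, n_max)`, not httplib's actual implementation; the `DynamicPool` class and all its member names are invented here:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Toy "fixed + dynamic" thread pool: n_fixed workers stay alive for the
// pool's lifetime; when all threads are busy, a one-shot "dynamic" thread
// is spawned (up to n_max total) that terminates after finishing its task.
// NOT httplib's implementation -- names and structure are hypothetical.
class DynamicPool {
public:
    DynamicPool(int n_fixed, int n_max) : n_max_(n_max), n_alive_(n_fixed) {
        for (int i = 0; i < n_fixed; i++) {
            fixed_.emplace_back([this] { worker_loop(); });
        }
    }

    ~DynamicPool() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto & t : fixed_)   t.join(); // fixed workers drain the queue first
        for (auto & t : dynamic_) t.join(); // dynamic threads already ran their one job
    }

    void enqueue(std::function<void()> job) {
        std::unique_lock<std::mutex> lk(mtx_);
        if (n_idle_ == 0 && n_alive_ < n_max_) {
            // every thread is busy: spawn a dynamic thread that runs exactly
            // this one job and then exits (joined in the destructor so
            // cleanup stays deterministic)
            n_alive_++;
            dynamic_.emplace_back([this, j = std::move(job)]() mutable {
                j();
                std::lock_guard<std::mutex> lk(mtx_);
                n_alive_--;
            });
            return;
        }
        jobs_.push(std::move(job));
        lk.unlock();
        cv_.notify_one();
    }

private:
    void worker_loop() {
        std::unique_lock<std::mutex> lk(mtx_);
        for (;;) {
            n_idle_++;
            cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
            n_idle_--;
            if (jobs_.empty()) {
                return; // stop_ was set and nothing is left to do
            }
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            lk.unlock();
            job();
            lk.lock();
        }
    }

    int n_max_;
    int n_alive_;
    int n_idle_ = 0;
    bool stop_  = false;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> fixed_, dynamic_;
};
```

Here dynamic threads run a single job and then exit, matching the "destroyed after processing each request" idea; they are joined rather than detached only to make teardown deterministic in this sketch.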

@ngxson ngxson requested a review from a team as a code owner March 20, 2026 19:22
@ngxson ngxson requested a review from rgerganov March 20, 2026 19:22
Member

@rgerganov rgerganov left a comment


This seems to work well, and while it's not the ideal solution, I think it is good enough for all practical purposes. Thanks!

Comment on tools/server/server-http.cpp (outdated)
// spawn n_threads_http fixed threads (always alive), while allowing up to 1024 threads in total
// when all fixed threads are busy, the server creates new "dynamic" threads that are destroyed after processing each request
// ref: https://github.com/yhirose/cpp-httplib/pull/2368
return new httplib::ThreadPool(n_threads_http, 1024);
Member

@rgerganov rgerganov Mar 23, 2026


just in case, let's use std::max(1024, n_threads_http) to support more than 1024:

return new httplib::ThreadPool(n_threads_http, std::max(1024, n_threads_http));

Contributor Author


Good catch. I changed the logic to n_threads_http + 1024 instead: if n_threads_http is high (for example because n_parallel = 2000), we still always have 1024 threads of headroom for overhead connections.
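The difference between the two formulas from this exchange is easy to check numerically. A quick sketch, with function names invented purely for illustration:

```cpp
#include <algorithm>

// Two candidate formulas from the review discussion (names invented here).
// With std::max, a high n_threads_http (e.g. driven by n_parallel = 2000)
// leaves zero headroom above the fixed threads; adding 1024 always reserves
// 1024 extra threads for overhead connections.
int max_threads_clamped(int n_threads_http) {
    return std::max(1024, n_threads_http);
}

int max_threads_headroom(int n_threads_http) {
    return n_threads_http + 1024;
}
```

For a typical small n_threads_http both formulas allow at least 1024 threads; they only diverge once n_threads_http approaches or exceeds 1024.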

@ngxson ngxson requested a review from rgerganov March 23, 2026 09:39
@ngxson ngxson merged commit 31a5cf4 into ggml-org:master Mar 23, 2026
48 checks passed
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* server: use httplib dynamic threads

* change to n_threads_http + 1024
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* server: use httplib dynamic threads

* change to n_threads_http + 1024


Development

Successfully merging this pull request may close these issues.

Misc. bug: llama-server should have special handling for /health
