server: split HTTP into its own interface by ngxson · Pull Request #17216 · ggml-org/llama.cpp

ngxson · 2025-11-12T17:50:01Z

How it works:

sequenceDiagram
    participant User
    participant server_http_context
    participant server_http_res
    
    User->>server_http_context: request
    server_http_context->>server_http_req: create request
    server_http_req->>handler:
    handler->>server_http_res: create response
    
    loop for each result
        server_http_res->>server_http_context: response chunk
        server_http_context->>User: response chunk
        server_http_context->>server_http_res: next()
    end

    server_http_res->>server_http_context: terminate
    server_http_context->>User: close connection

Each endpoint handler returns a server_res_generator, a class derived from server_http_res.
The server_res_generator operates in one of two modes: stream or non-stream.
- In non-stream mode, we simply return the data to the user.
- In stream mode, we call server_res_generator::next() until it returns false. Each call to next() yields a new chunk of data.
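
The two modes above can be sketched as follows. This is a minimal illustration and not the code from this PR: `http_res_sketch`, `make_stream_response`, `drive_response`, and `collect` are hypothetical names, and the sketch assumes that `next()` writes one chunk into its argument and returns false once no chunks remain. The real `server_http_res` may use different members and conventions.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the response object described above.
struct http_res_sketch {
    int status = 200;
    std::string data;                          // full body in non-stream mode
    std::function<bool(std::string &)> next;   // set only in stream mode

    bool is_stream() const { return static_cast<bool>(next); }
};

// Hypothetical handler result: a streaming response that yields `parts` one by one.
// Assumption: next() fills `out` and returns true, or returns false when exhausted.
http_res_sketch make_stream_response(std::vector<std::string> parts) {
    http_res_sketch res;
    std::size_t i = 0;
    res.next = [parts = std::move(parts), i](std::string & out) mutable {
        if (i >= parts.size()) {
            return false;
        }
        out = parts[i++];
        return true;
    };
    return res;
}

// Plays the role of the HTTP layer: forwards the body to `send`.
void drive_response(http_res_sketch & res,
                    const std::function<void(const std::string &)> & send) {
    if (!res.is_stream()) {
        send(res.data);           // non-stream: return the data in one piece
        return;
    }
    std::string chunk;
    while (res.next(chunk)) {     // stream: pull chunks until next() returns false
        if (!chunk.empty()) {     // skip empty chunks instead of writing them
            send(chunk);
        }
    }
}

// Convenience helper that records everything the HTTP layer would have sent.
std::vector<std::string> collect(http_res_sketch & res) {
    std::vector<std::string> sent;
    drive_response(res, [&sent](const std::string & s) { sent.push_back(s); });
    return sent;
}
```

In this sketch the handler never touches the transport. It only returns an object, and the HTTP layer decides how to deliver it, which is the separation the diagram above describes.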

TODO:

- fix error handling
- add exception handler at server_routes level

Testing:

- passed automated tests.sh
- test normal usage with web UI (with multimodal input)
- test usage with web UI, with concurrent requests and random interruptions

ngxson · 2025-11-13T10:27:35Z

No rush on reviewing this, but I would appreciate it if you could do some testing on your side @ggerganov

In the next PR, I'll try to break server.cpp into smaller pieces. The rough plan is:

- server-context.cpp
- server-queue.cpp
- server-task.cpp (containing both task + response + queue)
- server-common.cpp (everything else)

While working on this, I'm also thinking about re-using the server code in llama-cli (I made a demo here). The main benefit would be bringing the same web UI experience to the CLI, including multimodal support, conversation control (delete/regenerate message), tool calls, etc. The old CLI could be moved to llama-completion, with chat support removed from it. What do you think about this idea?

* server: split HTTP into its own interface
* move server-http and httplib to its own file
* add the remaining endpoints
* fix exception/error handling
* renaming
* missing header
* fix missing windows header
* fix error responses from http layer
* fix slot save/restore handler
* fix case where only one stream chunk is returned
* add NOMINMAX
* do not call sink.write on empty data
* use safe_json_to_str for SSE
* clean up
* add some comments
* improve usage of next()
* bring back the "server is listening on" message
* more generic handler
* add req.headers
* move the chat template print to init()
* add req.path
* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

github-actions bot added examples server labels Nov 12, 2025

ngxson mentioned this pull request Nov 12, 2025

server: (refactor) implement generator-based API for task results #17174

Merged

server: split HTTP into its own interface

45b2fe1

ngxson force-pushed the xsn/split_http_server_context branch from 0594df9 to 45b2fe1 Compare November 12, 2025 17:53

ngxson added 5 commits November 12, 2025 21:00

move server-http and httplib to its own file

fe98058

add the remaining endpoints

473b0e5

fix exception/error handling

a2e6a00

renaming

66c6fe2

missing header

92a150f

ngxson mentioned this pull request Nov 12, 2025

PoC llama-cli using server code ngxson/llama.cpp#35

Draft

ngxson added 8 commits November 12, 2025 23:22

fix missing windows header

d990534

fix error responses from http layer

f428fe5

fix slot save/restore handler

25cc7eb

fix case where only one stream chunk is returned

3be8a3a

add NOMINMAX

9917e04

do not call sink.write on empty data

fc35e91

use safe_json_to_str for SSE

8c7fbec

clean up

da458d6

ngxson marked this pull request as ready for review November 13, 2025 10:21

ngxson requested a review from ggerganov as a code owner November 13, 2025 10:21

add some comments

cd10470

This was referenced Nov 13, 2025

cmake : move OpenSSL linking to vendor/cpp-httplib #17177

Merged

server: fixing naming conflict res_error #17243

Merged

DajanaV mentioned this pull request Nov 13, 2025

UPSTREAM PR #17243: server: fixing naming conflict res_error auroralabs-loci/llama.cpp#195

Open

ngxson added 2 commits November 14, 2025 15:03

Merge branch 'master' into xsn/split_http_server_context

8dbe547

improve usage of next()

1bc41f6

DajanaV mentioned this pull request Nov 14, 2025

UPSTREAM PR #17216: server: split HTTP into its own interface auroralabs-loci/llama.cpp#208

Open

5 tasks

bring back the "server is listening on" message

55ccf46

ngxson and others added 5 commits November 15, 2025 21:17

more generic handler

4d37cee

add req.headers

68d5c6f

move the chat template print to init()

2c9fe91

add req.path

016f8b4

cont : minor

2ba1443

ggerganov approved these changes Nov 17, 2025

ngxson merged commit 0de8878 into ggml-org:master Nov 17, 2025
65 of 69 checks passed

ddh0 mentioned this pull request Nov 18, 2025

Compile bug: error: ‘svr’ was not declared in this scope after #17216 #17341

Closed

This was referenced Nov 27, 2025

New llama-run #17554

Open

llama-cli: add support for reasoning #16603

Open

Feature Request: Better chat UX for llama-cli #11202

Closed

ngxson merged 23 commits into ggml-org:master from ngxson:xsn/split_http_server_context
