server : add Anthropic Messages API support#17570

Merged
ngxson merged 9 commits into ggml-org:master from noname22:feature/anthropic-api-support
Nov 28, 2025
Conversation

@noname22
Contributor

claude-code

Summary

This PR adds Anthropic Messages API compatibility to llama-server. The implementation converts Anthropic's request format to the OpenAI-compatible internal format, reusing the existing inference pipeline.

Motivation

  • Enables llama.cpp to serve as a local/self-hosted alternative to Anthropic's Claude API
  • Allows Claude Code and other Anthropic-compatible clients to work with llama-server

Features Implemented

Endpoints:

  • POST /v1/messages - Chat completions with streaming support
  • POST /v1/messages/count_tokens - Token counting for prompts
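
To make the endpoint shape concrete, here is a minimal sketch of a request body for POST /v1/messages following the Anthropic Messages API shape. The model name, system prompt, and host/port in the comment are placeholder assumptions, not values from this PR; llama-server serves whatever model it was launched with.

```python
import json

def build_messages_request(prompt: str) -> dict:
    # Minimal Anthropic-style request body. "max_tokens" is required by the
    # Anthropic API shape; the model name here is a placeholder.
    return {
        "model": "local-model",
        "max_tokens": 256,
        "system": "You are a helpful assistant.",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]},
        ],
    }

body = build_messages_request("Hello!")
print(json.dumps(body, indent=2))
# To send it to a local server (hypothetical host/port):
#   curl http://localhost:8080/v1/messages \
#        -H "content-type: application/json" -d "$BODY"
```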

Functionality:

  • Streaming with proper Anthropic SSE event types (message_start, content_block_delta, etc.)
  • Tool use (function calling) with tool_use/tool_result content blocks
  • Vision support with image content blocks (base64 and URL)
  • System prompts and multi-turn conversations
  • Extended thinking parameter support
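
The streaming protocol emits a fixed sequence of SSE event types. The sketch below (not the server implementation, just an illustration of the documented ordering for a single text content block) shows how deltas are framed by start/stop events:

```python
# Illustrative sketch of the Anthropic streaming SSE event order for one
# text content block; each delta corresponds to a streamed token chunk.
def anthropic_event_sequence(deltas):
    yield "message_start"
    yield "content_block_start"
    for _ in deltas:
        yield "content_block_delta"
    yield "content_block_stop"
    yield "message_delta"   # carries stop_reason and usage
    yield "message_stop"

events = list(anthropic_event_sequence(["Hel", "lo"]))
print(events)
```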

Testing

  • Tests in test_anthropic_api.py
  • Tests cover: basic messages, streaming, tools, vision, token counting, parameters, error handling, content block indices

@noname22
Contributor Author

New PR to allow maintainers to edit.
Old PR here: #17425

@github-actions bot added the examples, python (python script changes), server labels on Nov 28, 2025
@noname22
Contributor Author

The RISCV test is failing with:

The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

I'm guessing it's not related to the PR? Any way to retry?

@ngxson
Contributor

ngxson commented Nov 28, 2025

This PR can be merged when the server CI passes. The other CI jobs are not important.

@ngxson ngxson merged commit ddf9f94 into ggml-org:master Nov 28, 2025
65 of 69 checks passed
@ericcurtin
Collaborator

ericcurtin commented Nov 28, 2025

I stumbled across this as it hit conflicts with my PR. I'm curious: what models does this work with? With sufficient hardware, is this capable of beating Claude's cloud models?

@noname22
Contributor Author

Technically it works with pretty much any model, but to get anywhere near Claude Sonnet you'd probably need a large, agentic model like MiniMax M2, Kimi K2, Qwen3 Coder 480B-A35B, etc.

That being said, I've had decent results for simple tasks with Qwen3 Coder 30B-A3B and gpt-oss-20b on a single 4090.

In my very subjective experience, the same models tend to perform a lot better with the Claude Code CLI app than with alternatives such as Open Code or gemini-cli and its clones, like Qwen3-Coder (the cli app).

@ericcurtin
Collaborator

ericcurtin commented Nov 28, 2025

Interesting... If you want to take a quick peek, I fixed the conflicts here:

#17554

although they weren't major conflicts, it was just moving code from one place to another.

@noname22
Contributor Author

> Interesting... If you want to take a quick peek, I fixed the conflicts here:
>
> #17554
>
> although they weren't major conflicts, it was just moving code from one place to another.

What files were the conflicts in, server.cpp?

```cpp
json server_task_result_cmpl_partial::to_json_anthropic() {
    json events = json::array();
    bool first = (n_decoded == 1);
    static bool text_block_started = false;
```
Contributor

@ngxson ngxson Dec 22, 2025

@noname22 is there any reason why this is static?

Using static here will cause a data race when two requests run in parallel. Please create a PR to remove this static. nvm, I'll remove it

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* server : add Anthropic Messages API support

* remove -@pytest.mark.slow from tool calling/jinja tests

* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py

* server : removed redundant n field logic in anthropic_params_from_json

* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()

* server : refactor Anthropic API to use OAI conversion

* make sure basic test always go first

* clean up

* clean up api key check, add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
