server : add Anthropic Messages API support #17570
Conversation
New PR to allow maintainers to edit.
The RISCV test is getting
I'm guessing it's not related to the PR? Any way to retry?
This PR can be merged when the server CI passes. The other CI jobs are not important.
I stumbled across this as it hit conflicts with my PR. I'm curious: what models does this work with? With sufficient hardware, is this capable of beating Claude's cloud models?
Technically it works with pretty much any model, but to get anywhere near Claude Sonnet you'd probably need a large, agentic model like MiniMax M2, Kimi K2, or Qwen3 Coder 480B-A35B. That said, I've had decent results for simple tasks with Qwen3 Coder 30B-A3B and gpt-oss-20b on a single 4090. In my very subjective experience, the same models tend to perform a lot better with the Claude Code CLI app than with alternatives such as Open Code or gemini-cli and its clones, like Qwen3-Coder (the CLI app).
Interesting... If you want to take a quick peek, I fixed the conflicts here: they weren't major, just moving code from one place to another.
What files were the conflicts in, server.cpp?
```cpp
json server_task_result_cmpl_partial::to_json_anthropic() {
    json events = json::array();
    bool first = (n_decoded == 1);
    static bool text_block_started = false;
```
@noname22 is there any reason why this is static?
Using static here will cause a data race when two requests run in parallel. Please create a PR to remove this static.
nvm, I'll remove it
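The issue with the static flag can be sketched outside C++. The following Python analogue (hypothetical names, not the server's actual code) shows why a flag that outlives a single request breaks streaming: every request after the first one silently loses its `content_block_start` event, while per-request state behaves correctly.

```python
# Analogue of the C++ `static bool text_block_started`: module-level state
# that is shared by every request instead of living in the result object.
text_block_started = False

def stream_shared(tokens):
    """Buggy version: the 'started' flag outlives the request."""
    global text_block_started
    events = []
    for tok in tokens:
        if not text_block_started:
            text_block_started = True
            events.append({"type": "content_block_start"})
        events.append({"type": "content_block_delta", "text": tok})
    return events

def stream_per_request(tokens):
    """Fixed version: the flag is local to each request/result."""
    started = False
    events = []
    for tok in tokens:
        if not started:
            started = True
            events.append({"type": "content_block_start"})
        events.append({"type": "content_block_delta", "text": tok})
    return events

a = stream_shared(["Hello"])       # first request: start event emitted
b = stream_shared(["World"])       # second request: start event is lost
c = stream_per_request(["Hello"])  # each request gets its own start event
d = stream_per_request(["World"])
```

With true parallelism the shared flag is also a data race in the C++ sense; the sketch only shows the simpler symptom that survives even sequential requests.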
* server : add Anthropic Messages API support
* remove @pytest.mark.slow from tool calling/jinja tests
* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py
* server : removed redundant n field logic in anthropic_params_from_json
* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure basic test always go first
* clean up
* clean up api key check, add test

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Summary
This PR adds Anthropic Messages API compatibility to llama-server. The implementation converts Anthropic's request format to the OpenAI-compatible internal format, reusing the existing inference pipeline.
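As a rough illustration of the conversion the summary describes, the sketch below maps the main fields of an Anthropic Messages request onto an OpenAI chat-completions request. Field names follow the two public API specs; the function itself is hypothetical and covers only the common fields, not the PR's actual C++ implementation.

```python
def anthropic_to_oai(req: dict) -> dict:
    """Map an Anthropic Messages request onto an OpenAI-style
    chat.completions request (illustrative subset only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    for msg in req.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    out = {"messages": messages}
    if "model" in req:
        out["model"] = req["model"]
    # max_tokens is required by Anthropic; OpenAI's equivalent is optional.
    if "max_tokens" in req:
        out["max_tokens"] = req["max_tokens"]
    # Anthropic's stop_sequences maps to OpenAI's stop.
    for key in ("temperature", "top_p", "stream", "stop_sequences"):
        if key in req:
            out["stop" if key == "stop_sequences" else key] = req[key]
    return out

example = anthropic_to_oai({
    "model": "x",
    "system": "Be terse.",
    "max_tokens": 64,
    "messages": [{"role": "user",
                  "content": [{"type": "text", "text": "Hi"}]}],
})
```

Image blocks, tool-use blocks, and the streaming event translation are the harder part of the real conversion and are omitted here.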
Motivation
Features Implemented
Endpoints:
Functionality:
Testing