Description
LocalAI version:
ed5df1e
Describe the bug
Some mismatches between LocalAI's and OpenAI's streaming messages:
1. when streaming, OpenAI does not send the `role` key with every data message; it sends the `role` only in an initial delta message that carries no `content`, and all subsequent data messages are lean, content-only deltas. LocalAI currently sends the `role` key with every delta message. Aside from not being compatible, this is also inefficient over the wire.
2. when streaming, OpenAI terminates the stream not only with a `..., choices: [... finish_reason: stop]` message, but also with a separate message containing only `[DONE]`. LocalAI streams currently lack this.

These break integration with tools that trigger explicitly on these expected aspects of the OpenAI spec, such as org-ai. For 2., see https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream, and for 1., see https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb (I couldn't find this in the spec itself, only the examples). A minimal client sketch below illustrates what such tools rely on.

There are some other differences (lack of `id` and `created` keys, and LocalAI superfluously sending consecutive data events as named events) that haven't led to any practical issues in my testing.
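For reference, here is a minimal sketch of the kind of client-side loop that spec-following tools implement. It is not taken from org-ai or any specific client, and the endpoint, port, model name, and request body are placeholder assumptions. The point is only that such a loop takes the assistant role from the first delta and stops on the literal `data: [DONE]` sentinel; the current LocalAI stream never sends that sentinel, so the loop only ends when the connection closes.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chunk models only the fields this sketch cares about in a
// "chat.completion.chunk" data message.
type chunk struct {
	Choices []struct {
		Delta struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

func main() {
	// Placeholder request: endpoint, port, model and prompt are assumptions
	// for the sketch, not anything mandated by the issue.
	body := strings.NewReader(`{"model":"ggml-gpt4all-j","stream":true,` +
		`"messages":[{"role":"user","content":"count to three"}]}`)
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	role := ""
	var text strings.Builder
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines and any "event:" fields
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break // the termination sentinel many clients wait for
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil || len(c.Choices) == 0 {
			continue
		}
		if role == "" && c.Choices[0].Delta.Role != "" {
			role = c.Choices[0].Delta.Role // expected only in the first delta
		}
		text.WriteString(c.Choices[0].Delta.Content)
	}
	fmt.Printf("%s: %s\n", role, text.String())
}
```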
To Reproduce
An example response from LocalAI's streaming API:
```
HTTP/1.1 200 OK
Date: Sun, 21 May 2023 11:01:33 GMT
Content-Type: text/event-stream
Vary: Origin
Access-Control-Allow-Origin: *
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked

event: data
data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":"1"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
event: data
data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 2"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
event: data
data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 3"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
event: data
data: {"model":"ggml-gpt4all-j","choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
```
Expected behavior
An example response from OpenAI's v1 streaming API:
```
HTTP/1.1 200 OK
Date: Sun, 21 May 2023 10:44:49 GMT
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: keep-alive
access-control-allow-origin: *
Cache-Control: no-cache, must-revalidate
openai-model: gpt-4-0314
openai-organization: user-blablabla
openai-processing-ms: 1210
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 200
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 199
x-ratelimit-remaining-tokens: 39963
x-ratelimit-reset-requests: 300ms
x-ratelimit-reset-tokens: 55ms
x-request-id: blablabla
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: blablabla
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"3"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
data: [DONE]
```
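To make the expected framing explicit, below is a hedged sketch of a handler that emits the same shape. It is written against plain net/http rather than LocalAI's actual server code, and every identifier in it (port, model name, the hard-coded tokens, the id) is illustrative only. The relevant points from above are: the `role` appears only in the first delta, later deltas carry only `content`, every chunk has `id` and `created`, the final JSON chunk has an empty delta with `finish_reason: "stop"`, and the stream ends with a bare `data: [DONE]` line without a named `event:` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func streamHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Illustrative values; OpenAI generates a unique id per completion.
	id := "chatcmpl-example"
	created := time.Now().Unix()

	// send writes one SSE data message in the chunk format shown above.
	send := func(delta map[string]string, finish interface{}) {
		chunk := map[string]interface{}{
			"id":      id,
			"object":  "chat.completion.chunk",
			"created": created,
			"model":   "ggml-gpt4all-j",
			"choices": []map[string]interface{}{
				{"delta": delta, "index": 0, "finish_reason": finish},
			},
		}
		b, _ := json.Marshal(chunk)
		fmt.Fprintf(w, "data: %s\n\n", b)
		flusher.Flush()
	}

	send(map[string]string{"role": "assistant"}, nil) // role only, no content
	for _, tok := range []string{"1", " 2", " 3"} {   // hard-coded tokens for the sketch
		send(map[string]string{"content": tok}, nil) // content only, no role
	}
	send(map[string]string{}, "stop") // empty delta with finish_reason: stop
	fmt.Fprint(w, "data: [DONE]\n\n") // explicit termination sentinel
	flusher.Flush()
}

func main() {
	http.HandleFunc("/v1/chat/completions", streamHandler)
	http.ListenAndServe(":8081", nil)
}
```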