Streaming messages not compliant with openAI spec #340

@Lederstrumpf

LocalAI version:
ed5df1e

Describe the bug
LocalAI's streaming messages differ from OpenAI's in two ways:

  1. When streaming, OpenAI does not send the role key with every data message. It sends the role once, in an initial delta message that has no content; the content then follows in lean delta messages that omit the role. LocalAI streams currently send the role key with every delta message. Aside from being incompatible, this is also inefficient over the wire.
  2. When streaming, OpenAI terminates the stream not only with a ..., choices: [... finish_reason: stop] message, but also with a separate final message containing only "[DONE]". LocalAI streams currently lack this final message.

These differences break integration with tools that rely explicitly on these aspects of the OpenAI spec, such as org-ai. For 2., see https://platform.openai.com/docs/api-reference/chat/create#chat/create-stream, and for 1., see https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb (I couldn't find this in the spec itself, only in the examples).
There are some other differences (LocalAI omits the id and created keys, and superfluously sends consecutive data events as named events) that haven't led to any practical issues in my testing.
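
To illustrate why the missing "[DONE]" message matters, here is a minimal sketch of how a client might consume such a stream. It is not org-ai's actual implementation; read_chat_stream and the shortened sample payloads are hypothetical and only keep the keys relevant to this issue.

```python
import json


def read_chat_stream(lines):
    """Consume SSE lines and return (role, text, saw_done).

    Hypothetical helper for illustration only.
    """
    role = None
    parts = []
    saw_done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skips blank lines and "event: data" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True  # explicit end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if role is None and "role" in delta:
            role = delta["role"]
        parts.append(delta.get("content", ""))
    return role, "".join(parts), saw_done


# Shortened versions of the two streams shown in this issue.
openai_style = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}',
    "data: [DONE]",
]
localai_style = [
    "event: data",
    'data: {"choices":[{"delta":{"role":"assistant","content":"1"}}]}',
    "event: data",
    'data: {"choices":[{"finish_reason":"stop"}]}',
]

print(read_chat_stream(openai_style))
print(read_chat_stream(localai_style))
```

Both streams yield the same role and text here, but only the OpenAI-style stream sets saw_done. A client that waits for "[DONE]" before finalizing its output would therefore never consider the LocalAI stream finished.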

To Reproduce
An example response from LocalAI's streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 11:01:33 GMT
Content-Type: text/event-stream
Vary: Origin
Access-Control-Allow-Origin: *
Cache-Control: no-cache
Connection: keep-alive
Transfer-Encoding: chunked

event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":"1"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 2"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"object":"chat.completion.chunk","model":"ggml-gpt4all-j","choices":[{"delta":{"role":"assistant","content":" 3"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}


event: data

data: {"model":"ggml-gpt4all-j","choices":[{"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Expected behavior
An example response from OpenAI's v1 streaming API:

HTTP/1.1 200 OK
Date: Sun, 21 May 2023 10:44:49 GMT
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: keep-alive
access-control-allow-origin: *
Cache-Control: no-cache, must-revalidate
openai-model: gpt-4-0314
openai-organization: user-blablabla
openai-processing-ms: 1210
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 200
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 199
x-ratelimit-remaining-tokens: 39963
x-ratelimit-reset-requests: 300ms
x-ratelimit-reset-tokens: 55ms
x-request-id: blablabla
CF-Cache-Status: DYNAMIC
Server: cloudflare
CF-RAY: blablabla
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"3"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
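
As a rough sketch of the expected shape, the following Python reshapes LocalAI-style chunks into the message sequence shown above: the role once in a content-free delta, then content-only deltas, then the finish_reason message, then "[DONE]". LocalAI itself is not written in Python, and normalize_stream and _sse are hypothetical names; this only illustrates the target output and deliberately leaves out the id and created keys.

```python
import json


def _sse(chunk, delta, finish_reason):
    """Format one OpenAI-style data message (hypothetical helper)."""
    body = {
        "object": "chat.completion.chunk",
        "model": chunk.get("model"),
        "choices": [{"delta": delta, "index": 0, "finish_reason": finish_reason}],
    }
    return "data: " + json.dumps(body, separators=(",", ":"))


def normalize_stream(chunks):
    """Yield OpenAI-style SSE data lines from LocalAI-style chunk dicts."""
    role_sent = False
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        role = delta.get("role")
        if role is not None and not role_sent:
            # Send the role exactly once, in a delta without content.
            yield _sse(chunk, {"role": role}, None)
            role_sent = True
        if "content" in delta:
            # Later deltas carry content only.
            yield _sse(chunk, {"content": delta["content"]}, None)
        if choice.get("finish_reason"):
            yield _sse(chunk, {}, choice["finish_reason"])
    # Terminate with the separate sentinel message.
    yield "data: [DONE]"


localai_chunks = [
    {"model": "ggml-gpt4all-j", "choices": [{"delta": {"role": "assistant", "content": "1"}}]},
    {"model": "ggml-gpt4all-j", "choices": [{"delta": {"role": "assistant", "content": " 2"}}]},
    {"model": "ggml-gpt4all-j", "choices": [{"finish_reason": "stop"}]},
]

for line in normalize_stream(localai_chunks):
    print(line)
    print()  # a blank line separates SSE messages
```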

Labels: bug
