
Streaming on LLM calls #2178

@themaherkhalil

Description

Is there an existing issue for this?

  • I confirm that I have not found an existing or similar issue.

Description

Implement the new streaming endpoint to stream LLM output to the user as results come in.

1) Content streaming (text)

The user invokes the runPixelAsync method with the following Python code:

from ai_server import ModelEngine
model = ModelEngine(engine_id="b0d18f4b-ff2c-4563-8f9d-57efbff53d60")

# Text Generation
command = 'write me a paragraph about soccer'
output = model.ask(command=command, param_dict={'max_completion_tokens': 20000, 'temperature': 0.3})
output

runPixelAsync returns a JSON response containing the jobId:

{
    "jobId": "019a6f97-9c11-73a9-a540-61325b1c2f5c"
}

The new streaming endpoint, POST /Monolith/api/engine/pixelJobStreaming, takes a form-encoded body (e.g. --data-urlencode 'jobId=019a6f96-1192-7cdc-8f61-7e2362f6ed5e') and returns any new messages in the format:

{
    "message": [
        {
            "stream_type": "content",
            "data": {
                "content": ""
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": "Soccer"
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": ","
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": " known"
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": " as football"
            }
        },
        ......
        {
            "stream_type": "content",
            "data": {
                "content": " cher"
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": "ished clubs"
            }
        },
        {
            "stream_type": "content",
            "data": {
                "content": "."
            }
        },
        {
            "stream_type": "content",
            "data": {
                "finish_reason": "stop"
            }
        }
    ],
    "status": "ProgressComplete"
}

The FE will continue polling this endpoint until it receives a message whose data contains the key "finish_reason". Its value indicates why generation ended, e.g. "stop" for a naturally completed generation, or "length" if the token limit has been reached and the response is truncated.
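The polling loop described above can be sketched as follows. This is a minimal illustration, not the actual FE implementation: the `drain_stream` helper, the `fetch_next` callable, and the simulated responses are all assumptions for demonstration; only the chunk shape (`stream_type`, `data`, `content`, `finish_reason`) comes from the payload format in this issue.

```python
def drain_stream(fetch_next):
    """Repeatedly call fetch_next() until a chunk carries 'finish_reason'.

    fetch_next is assumed to return one parsed response body from
    POST /Monolith/api/engine/pixelJobStreaming, i.e. a dict whose
    'message' key holds a list of {'stream_type', 'data'} chunks.
    Returns the accumulated text and the finish reason.
    """
    parts = []
    while True:
        response = fetch_next()
        for chunk in response.get("message", []):
            data = chunk.get("data", {})
            if "finish_reason" in data:
                return "".join(parts), data["finish_reason"]
            if chunk.get("stream_type") == "content":
                parts.append(data.get("content", ""))

# Simulated server responses, condensed from the sample payload above.
# A real client would issue an HTTP POST with the jobId instead.
_responses = iter([
    {"message": [
        {"stream_type": "content", "data": {"content": "Soccer"}},
        {"stream_type": "content", "data": {"content": ", known"}},
    ]},
    {"message": [
        {"stream_type": "content", "data": {"content": " as football."}},
        {"stream_type": "content", "data": {"finish_reason": "stop"}},
    ], "status": "ProgressComplete"},
])

text, reason = drain_stream(lambda: next(_responses))
print(text)    # Soccer, known as football.
print(reason)  # stop
```

Separating the accumulation logic from the transport makes the stop condition (the presence of "finish_reason" in data) easy to exercise without a live server.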

2) Tool calling streaming
