Is there an existing issue for this?
Description
Implement the new streaming endpoint so that LLM output is delivered to the user as results come in.
1) Content streaming (text)
The user invokes runPixelAsync with the following Python code:
from ai_server import ModelEngine
model = ModelEngine(engine_id="b0d18f4b-ff2c-4563-8f9d-57efbff53d60")
# Text Generation
command = 'write me a paragraph about soccer'
output = model.ask(command=command, param_dict={'max_completion_tokens': 20000, 'temperature': 0.3})
output
runPixelAsync returns a JSON payload containing the jobId:
{
"jobId": "019a6f97-9c11-73a9-a540-61325b1c2f5c"
}
The new streaming endpoint - POST /Monolith/api/engine/pixelJobStreaming with body --data-urlencode 'jobId=019a6f96-1192-7cdc-8f61-7e2362f6ed5e' - returns any new messages in the format:
{
"message": [
{
"stream_type": "content",
"data": {
"content": ""
}
},
{
"stream_type": "content",
"data": {
"content": "Soccer"
}
},
{
"stream_type": "content",
"data": {
"content": ","
}
},
{
"stream_type": "content",
"data": {
"content": " known"
}
},
{
"stream_type": "content",
"data": {
"content": " as football"
}
},
......
{
"stream_type": "content",
"data": {
"content": " cher"
}
},
{
"stream_type": "content",
"data": {
"content": "ished clubs"
}
},
{
"stream_type": "content",
"data": {
"content": "."
}
},
{
"stream_type": "content",
"data": {
"finish_reason": "stop"
}
}
],
"status": "ProgressComplete"
}
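As a minimal sketch of how a consumer might handle one such response body (the function name and structure below are illustrative, not part of the API), the content chunks can be concatenated while watching for the terminal message:

```python
def process_messages(messages):
    """Concatenate streamed content chunks from one response body.

    Returns (text_so_far, finish_reason); finish_reason stays None
    until a message whose "data" contains the "finish_reason" key
    arrives.
    """
    parts = []
    finish_reason = None
    for msg in messages:
        data = msg.get("data", {})
        if "finish_reason" in data:
            finish_reason = data["finish_reason"]
        elif msg.get("stream_type") == "content":
            parts.append(data.get("content", ""))
    return "".join(parts), finish_reason
```

For the sample response above, this would yield the assembled paragraph text together with finish_reason "stop".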
The FE will continue calling this endpoint until it receives a message whose data contains the key "finish_reason"; the value is the reason generation ended (e.g. "stop" for a naturally finished generation, or "length" if the token limit was reached and the response is truncated).
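The polling loop described above can be sketched as follows. The `fetch` callable (which would POST the jobId to /Monolith/api/engine/pixelJobStreaming and return the parsed JSON body) is injected here so the loop itself is testable without a server; the function name and parameters are assumptions for illustration:

```python
import time

def poll_job(fetch, job_id, interval_s=0.5, max_polls=1000):
    """Poll the streaming endpoint until a finish_reason arrives.

    `fetch(job_id)` must return the parsed JSON response body, i.e.
    a dict with a "message" list as shown in the example above.
    """
    chunks = []
    for _ in range(max_polls):
        body = fetch(job_id)
        for msg in body.get("message", []):
            data = msg.get("data", {})
            if "finish_reason" in data:
                # Terminal message: stop polling and report why.
                return "".join(chunks), data["finish_reason"]
            if msg.get("stream_type") == "content":
                chunks.append(data.get("content", ""))
        time.sleep(interval_s)
    raise TimeoutError("no finish_reason received for job " + job_id)
```

A real FE would render each chunk as it arrives rather than only returning the final string; the accumulation here just keeps the sketch self-contained.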
2) Tool calling streaming