Implement basic chat/completions openai endpoint #461
LostRuins merged 8 commits into LostRuins:concedo_experimental from teddybear082:experimental_openai_chat_completions_api
Conversation
-Basic support for openai chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create
-Tested with example code from openai for chat/completions and chat/completions with stream=True parameter found here: https://cookbook.openai.com/examples/how_to_stream_completions.
-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses openai's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py
-Tested default koboldcpp api behavior with streaming and non-streaming generate endpoints and running the GUI; everything seems to be fine.
-Still TODO / evaluate before merging:
(1) implement the rest of the openai chat/completion parameters to the extent possible, mapping them to koboldcpp parameters
(2) determine if there is a way to use kobold's prompt formats for certain models when translating the openai messages format into a prompt string (not sure if this is possible or where these formats are in the code; a rough sketch of the basic translation appears below)
(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (not sure if this is possible)
Note I am a python noob, so if there is a more elegant way of doing this, at minimum hopefully I have done some of the grunt work for you to implement on your own.
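For context, here is a minimal sketch of the translation step described in item (2): flattening an OpenAI-style `messages` array into a single Alpaca-style prompt string. The function name and the exact `### Instruction:` / `### Response:` markers are illustrative assumptions, not the code merged in this PR.

```python
def messages_to_prompt(messages):
    """Flatten an OpenAI chat 'messages' list into one Alpaca-style prompt."""
    parts = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role == "system":
            # System messages become plain preamble text.
            parts.append(content)
        elif role == "assistant":
            parts.append("### Response:\n" + content)
        else:
            parts.append("### Instruction:\n" + content)
    # End with an open response marker so the model continues as the assistant.
    parts.append("### Response:\n")
    return "\n".join(parts)

print(messages_to_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello."},
]))
```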
-Mistakenly left code relating to streaming argument from main branch in experimental.
-support stop parameter mapped to koboldai stop_sequence parameter -make default max_length / max_tokens parameter consistent with default 80 token length in generate function -add support for providing name of local model in openai responses
This reverts commit 443a6f7.
-support stop parameter mapped to koboldai stop_sequence parameter -make default max_length / max_tokens parameter consistent with default 80 token length in generate function -add support for providing name of local model in openai responses
Does Mantella require using the chat completions api? I never liked that API since it forces everything into a specific instruct format. Will it work with the standard completions (https://platform.openai.com/docs/api-reference/completions/create#completions/create-stream) api instead?
Yeah, Mantella requires using the chat completions api. This seems to be becoming the standard overall, because the completions endpoint is marked as "legacy" and chat completions is also the way of steering the newer models like the 3.5-turbo variants and GPT-4 which are supported by Mantella. Herika (the other Skyrim AI mod, for followers) also uses the chat completions API. Is there currently any way to "find" in the koboldcpp code what template is assigned to a model? I was going to look at that today but had not gotten a chance yet. I think that's the last thing I'd like to get into my PR if possible, instead of using the ###Instruction / ###Response hack to translate the openai messages format into a prompt. The hack actually seems to work fine for me so far with my test scripts [which I had accidentally submitted in one of the changes above and then reverted, if you want to use them :) ] as well as with Mantella on llama2-gguf and synthia-gguf 7b models, but I imagine it would break some models.
EDIT: I did look at the code, and I have no idea how you make kobold work so well with so many models; the scenarios are really fun too. I don't see anywhere in koboldcpp.py where prompt templates for different models like llama2 vs. llama vs. pygmalion are loaded. Anyway, amazing work, this is a monster project for sure!
Yeah the code is quite cryptic haha. You can see the instruct prompt templates here.
Thanks! And thanks btw for working through this with me and spending time reviewing; I can't imagine how much time you spend on this project for everyone's benefit, given all the upstream changes alone. Is there somewhere in the koboldcpp code where this information from the lite.koboldai.net code is passed through and saved? Or does the front end basically reformat the user's simple free-text prompt and then send the whole formatted prompt to koboldcpp.py, so that koboldcpp.py is not typically involved in the formatting at all?
Basically, at around line 417 or around line 1751 of koboldcpp.py, I am looking to pull the applicable template based on the model, store "openai_system_prefix", "openai_user_prefix", and "openai_assistant_prefix" based on the model, and apply them accordingly when parsing the messages in openai format when the user is using the chat/completions API. Failing that, if koboldcpp.py does not have access to the templates in a variable or anything, I might be able to do a simple approximation based on, say, the three most popular model types by parsing local_model_name, and if no match is found, default to alpaca (if that is acceptable to you), based on the code you shared about templates from lite.koboldai.net. A rough sketch of what I have in mind is below.
EDIT: OHHH neat, I see now: what I defaulted to (I think?) is basically the ###INSTRUCTION: / ###RESPONSE: default format for alpaca. Great. I'll take a look and make sure it completely conforms to that standard if I wind up having to just go with that and not make model-specific ones, like making it not all caps; that explains why it "just worked", I guess, with my hack.
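A rough sketch of the per-model lookup described above. The model-name substrings, the prefix strings, and names like `openai_user_prefix` are illustrative assumptions for this discussion, not code from koboldcpp.py:

```python
# Hypothetical per-model instruct templates, keyed by a substring of the
# local model's filename; falls back to Alpaca when nothing matches.
TEMPLATES = {
    "llama2": {"system": "", "user": "[INST] ", "assistant": " [/INST] "},
    "alpaca": {"system": "", "user": "\n### Instruction:\n", "assistant": "\n### Response:\n"},
}

def pick_template(local_model_name):
    name = local_model_name.lower()
    for key, template in TEMPLATES.items():
        if key in name:
            return template
    # Default to Alpaca, as lite.koboldai.net does.
    return TEMPLATES["alpaca"]

template = pick_template("synthia-7b.q4_K_M.gguf")
openai_system_prefix = template["system"]
openai_user_prefix = template["user"]
openai_assistant_prefix = template["assistant"]
```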
to conform with alpaca standard used as default in lite.koboldai.net
Hi @teddybear082, I've cleaned up the code a little; can you test that everything is working for you, including both streaming and non-streaming versions?
Regarding the instruct tag formats, I think we can just stick to Alpaca format. Since this information is not stored in the model file, nor does the official OAI endpoint support setting it, we would need a custom endpoint to set it, something I feel is currently unnecessary.
Yes, thank you, I will take a look! Much appreciated!
These changes worked in all my tests (chat completion, streaming chat completion, chat completion with the stop parameter as a string, streaming chat completion with stop parameters as a list, and in Mantella). THANK YOU!!!! I sent the new code over to the Mantella and Herika discord servers to let a few other people test today; assuming no one reports problems, and if you're comfortable with it, I think this can be merged whenever you see fit.
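Since OpenAI's `stop` parameter may be omitted, a single string, or a list of strings, while koboldcpp's generate endpoint takes a list under `stop_sequence`, the mapping exercised by the tests above presumably normalizes it along these lines (a hedged sketch, not the exact merged code):

```python
def to_stop_sequence(stop):
    # OpenAI accepts "stop" as None, one string, or a list of strings;
    # koboldcpp wants a list for its "stop_sequence" parameter.
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)

assert to_stop_sequence("###") == ["###"]
assert to_stop_sequence(["\n", "User:"]) == ["\n", "User:"]
```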
Thanks for the initiative! Currently, I see a problem with the implementation - I tried it with Mistral7b (fine-tuned on openorca). See this line: https://github.com/LostRuins/koboldcpp/pull/461/files#diff-885e6237f0dc0cc77c7b4a47ef801248f4d2e6a7743b37b85a451c3ac446cbd2R414
We are setting the system message templates in a hardcoded manner, but the model I tried expects a different format. Can we extend the API to accept an optional object?

```
{
  "temperature": 0.5,
  "messages": [
    {
      "role": "system",
      "content": "you are a dungeons and dragons dungeon master"
    },
    ...
  ],
+ "adapter": {
+   "templates": {
+     "system": { "start": "", "end": "" },
+     "user": { "start": "", "end": "" }
+   }
+ }
}
```

Introducing an optional `adapter` object would leave existing clients unaffected while letting others override the hardcoded templates.
Implemented in #466
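If such an `adapter` field were accepted, applying it could look like the following sketch. The field names mirror the proposal above; the fallback markers and the ChatML-style example values are assumptions for illustration:

```python
def wrap(role, content, adapter=None):
    # Hardcoded Alpaca-style defaults, overridable per role via the
    # optional "adapter" payload proposed above.
    defaults = {
        "system": {"start": "", "end": "\n"},
        "user": {"start": "\n### Instruction:\n", "end": ""},
        "assistant": {"start": "\n### Response:\n", "end": ""},
    }
    templates = (adapter or {}).get("templates", {})
    t = {**defaults.get(role, {"start": "", "end": ""}), **templates.get(role, {})}
    return t["start"] + content + t["end"]

# With an adapter supplying ChatML-style markers (purely illustrative):
adapter = {"templates": {"system": {"start": "<|im_start|>system\n", "end": "<|im_end|>\n"}}}
print(wrap("system", "you are a dungeons and dragons dungeon master", adapter))
```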
-Basic support for openai chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create
-Tested with example code from openai for chat/completions and chat/completions with stream=True parameter found here: https://cookbook.openai.com/examples/how_to_stream_completions.
-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses openai's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py
-Tested default koboldcpp api behavior with streaming and non-streaming generate endpoints and running the GUI; everything seems to be fine.
-Still TODO / evaluate before merging:
(1) determine if there is a way to use kobold's prompt formats for certain models when translating openai messages format into a prompt string. (Not sure if possible or where these are in the code)
(2) remove print statements throughout new code used for debug / evaluation purposes
Note I am a python noob, so if there is a more elegant way of doing this, at minimum hopefully I have done some of the grunt work for you to implement on your own.