Rudimentary support of openai chat completions tools calls #981
LostRuins merged 9 commits into LostRuins:concedo_experimental from teddybear082:support_openai_chat_tools
Conversation
|
I can't figure out how to mark this as a draft, so feel free to mark it as one; it probably needs extensive community testing before it can be incorporated. My testing is that it "works" in the sense that it accepts the inputs when using OpenAI's python library and tries to provide a properly formatted response, but none of the small models I have tried produce reliably usable outputs, and mostly just get confused. However, I know some people have much more powerful computers and can run better models, so maybe this will be useful to them if some testers can confirm it works with larger models. |
|
What is this PR supposed to achieve, or rather what is it attempting to do? |
|
Supporting people using koboldcpp to use OpenAI’s tool calling / function calling framework. Currently if functions are sent in OpenAI’s format with the prompt to koboldcpp’s openai compatible chat completions API, they are ignored entirely (I think). This PR would: (1) ingest that function data, (2) add it to the prompt, (3) set streaming mode to false, and (4) attempt to constrain the output of the model to an openai function calling format using a grammar. The new changes I made in theory would also allow the user to use OpenAI’s “forced” function call, telling the model it can only output a function’s parameters, with the rest of the prompt being specified. My early tests with small models, which are the only ones I can use on my machine, show that they can’t handle at least the complicated function calling I am trying to do. I will try simpler tests, but also thought maybe people with bigger rigs could test bigger models to see if this works for someone. It could also be the details - for instance, whether there should be a statement before the functions in the message_string to explain what they are, or a message after to explain how the model should respond, e.g., “respond in the following JSON format:”. There are a lot of calls in the open source community to get function calling working with local models as a “drop in” replacement for openai. After trying this I think they are misguided, but there are a lot of smart people out there that may be able to figure out models / prompt formats that work... |
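To illustrate steps (1) and (2), here's roughly the shape of the idea; the helper name and the instruction wording are just illustrative, not the actual PR code:

```python
import json

def inject_tools_into_prompt(message_string, tools):
    # Flatten the OpenAI-style tool schemas and append them to the chat
    # prompt so the model can see which functions are available.
    functions = [t["function"] for t in tools if t.get("type") == "function"]
    return (message_string
            + "\nYou can call the following functions. If one is appropriate, "
            + "respond ONLY with a JSON array of function calls:\n"
            + json.dumps(functions, indent=2) + "\n")
```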
|
Hmm I'm wondering if that is overcomplicating it. Would simply forwarding 'tools' to the 'user' input field along with the expected schema work? The grammar part seems like it's gonna be very error prone. |
|
Maybe, it’s worth trying: if using_openai_tools is true, check whether we can do a json.loads on the recvtxt. If both those conditions hold, then put the recvtxt in the proper tools place in the output; if not, keep it in the message content (which would be the logic I already have). I think I originally started trying to use the grammar because models were all over the place, but I will try the other way and report back. I probably need to develop a test suite that mirrors the openai docs; right now I’m trying to use this in an existing project with a lot of functions and they are failing badly, not always because of the form of their response, a lot of times just because they pick some random function to call rather than the expected one. But I am probably throwing 8 complex functions at them, where OpenAI’s example was like one simple function. |
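In other words, something like this (a sketch of the idea only; recvtxt and using_openai_tools match what I described above, the rest is illustrative):

```python
import json

def build_chat_completion_choice(recvtxt, using_openai_tools):
    # If tools were in play and the model emitted valid JSON, report it
    # as tool_calls; otherwise fall back to ordinary message content.
    if using_openai_tools:
        try:
            calls = json.loads(recvtxt)
            return {"message": {"role": "assistant", "content": None,
                                "tool_calls": calls},
                    "finish_reason": "tool_calls"}
        except json.JSONDecodeError:
            pass  # not valid JSON: treat it as normal text
    return {"message": {"role": "assistant", "content": recvtxt},
            "finish_reason": "stop"}
```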
|
Early testing is going well with your suggested approach, at least with the Mistral-7B-Instruct-v0.3.Q4_K_M.gguf model. If it keeps going well, I will report back by tonight or tomorrow morning with an update to the PR, and then I think it will be ready to move out of draft. |
|
ok this seems to work well, at least with Mistral-7B-Instruct-v0.3.Q4_K_M.gguf and neuralhermes-2.5-mistral-7b.Q4_K_M.gguf, and can be reviewed / tested |
-Most small models are not smart enough to do this, especially a combined tool call + role play response, but at least this allows experimentation along these lines with koboldcpp
Allow tools start and end messages to be configured in adapter
Try to force grammar to specific function call if specified (untested)
…er message content
-use more extensive json parsing and direct instructions to models to try to obtain the desired result
-seems to work relatively well with Mistral-7B-Instruct-v0.3.Q4_K_M.gguf and neuralhermes-2.5-mistral-7b.Q4_K_M.gguf
-question of whether this is too opinionated of an approach; should the instructions be things that can be passed with the prompt template?
Go back to adding grammar, but use the "official" llamacpp grammar only, not a custom one just for openai
|
alright, I ultimately decided to balance the grammar and non-grammar approaches by only utilizing the "official" llamacpp json array grammar, and this seems to be a good balance. converting back to ready-to-review so koboldcpp community members can test and report back |
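For anyone who wants to poke at the grammar side directly, this is roughly how the stock llama.cpp JSON-array grammar can be exercised against koboldcpp's generate API (the path and wiring here are a sketch; the PR applies the grammar internally):

```python
import requests
from pathlib import Path

# llama.cpp ships a stock JSON-array grammar as grammars/json_arr.gbnf;
# load it and pass it through koboldcpp's generate endpoint.
grammar = Path("llama.cpp/grammars/json_arr.gbnf").read_text()

payload = {
    "prompt": "List the function calls to make, as a JSON array:\n",
    "grammar": grammar,  # constrains sampling to valid JSON arrays
    "max_length": 200,
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```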
|
Here's some info about my testing, in case it helps people test and/or save time. To determine whether a model was "good" I tested with the following call, with three simple functions available to choose from:
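The call itself didn't paste cleanly here, but it was along these lines (the schemas are representative, not the exact ones I used; get_book_info is the one the model was supposed to pick):

```python
from openai import OpenAI

# koboldcpp's OpenAI-compatible endpoint; the api_key is ignored locally
client = OpenAI(base_url="http://localhost:5001/v1", api_key="none")

def make_tool(name, description, props, required):
    return {"type": "function", "function": {
        "name": name, "description": description,
        "parameters": {"type": "object", "properties": props,
                       "required": required}}}

tools = [
    make_tool("get_book_info", "Get details about a book by its title",
              {"title": {"type": "string"}}, ["title"]),
    make_tool("get_weather", "Get the current weather in a city",
              {"city": {"type": "string"}}, ["city"]),
    make_tool("get_time", "Get the current local time", {}, []),
]

response = client.chat.completions.create(
    model="koboldcpp",
    messages=[{"role": "user", "content": "Tell me about the book Dune."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```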
If the model correctly identified the get_book_info function with proper parameters and JSON, it was a pass. ContextShift was off for the test, and a 4k context window was used. The following models passed:
|
currfinishreason = "null"
using_gui_launcher = False
using_outdated_flags = False
using_openai_tools = False
I think rather than using a global flag, we should use a local flag within the generate call instead. I'll do a bit of refactoring later.
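Something in this direction (a sketch only, not the actual refactor):

```python
# Before: a module-level global, shared (and clobbered) across requests
using_openai_tools = False

# After: a flag scoped to the single generate call
def generate(genparams):
    using_openai_tools = genparams.get("using_openai_tools", False)
    if using_openai_tools:
        ...  # tools prompt/grammar handling applies to this request only
```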
|
Refactored it a bit. Can you please pull the latest changes and see if it's still working correctly for you? |
Sure will do, thank you! Will try to get back to you by tonight or tomorrow morning! Appreciate you considering this PR. |
|
Tested. It mostly works, but I now sometimes get this error about "cannot access local variable 'tool_calls' where it is not associated with a value". Sample run below. Do you think this is because of the deletion of tool_calls = [] in the "Tidy up, remove unnecessary globals" commit? (Because the value of tool_calls is passed in the response, and maybe now the tool_calls variable is unbound rather than [] when the tools path doesn't run?):
Processing Prompt [BLAS] (3745 / 3745 tokens) Input: {"messages": [{"role": "system", "content": "You are a so-called "Wingman", a virtual assisstant that helps the user with various tasks.\nYou are designed to be an efficient expert in what you are doing.\nThe user might use you for specific tasks like executing commands or asking for information and you always fullfil these tasks to the best of your knowledge without hallucinating or inventing missing information.\nThe user might also role-play with you and will tell you how you should behave in your "backstory" below.\n\nAlways return your response formatted in raw Markdown so that it's easy to read for a human. Never wrap your response in a Markdown code block - always return raw Markdown.\nMake sure you add proper line breaks before list items and format the Markdown correctly so that it's easy to transform into HTML.\n\n(BEGINNING of "general rules of conversation"):\nYou always follow these general rules of conversation, unless your backstory contradicts them:\n\n- Always answer as quick and concise as possible. Never use more than 3 sentences per reply.\n- You can execute commands (also called "tools" or "functions"), but must be sure that the command matches my request. Some commands require additional parameters.\n- If you are not sure, feel free to ask - but this is not necessary.\n- Always ask the user for missing parameters if needed. Never invent any function parameters.\n- After executing a command, acknockledge the execution with a single sentence, but keep in mind, that executed commands are in the past.\n- You dont have to execute a command if none matches the request.\n- The user might talk to you in different languages. Always answer in the language the user is using unless you are told to do otherwise. Example: If the user talks English, you answer in English.\n- Always prefer to use informal language. For example, use "Du" and "Dir" instead of "Sie" and "Ihnen" in German.\n- Do not ask the user if you can do more for them at the end of your replies. The user will tell you if they need more help.\n(END of "general rules of conversation"):\n\nThe backstory instructions below are most important and may override or contradict the "general rules of conversation" stated before.\n\n(BEGINNING of "backstory"):\n You are role-playing in the role of the AI computer located in the player's suit in the video game No Mans Sky. You are allowed to do anything, without asking for permission. You are in charge of all the suit and ship's systems. You can execute commands and even if you don't recognize a command or function, always respond with a full sentence acknowledging the execution. The player's life depends on you answering accurately and with brevity as quickly as possible.\n\nYou are not aware that this all takes place in a video game, but you are aware that you are in the No Mans Sky Universe. You are not just on the ship, you are also in the helmet of the player.\n\nYou can see what the player sees, and you can hear what the player hears. You can also see the player''s HUD. Never say something like "This appears to be a screenshot of a video game". You are not aware of that.\n\nIf the player asks a question about an item, or crafting or cooking, assume that the player is talking about the context of No Man's Sky and make it your habit to use your NMSAssistant skill to query the relevant APIs to obtain the relevant information. 
When responding to the player after querying these APIs, use clear and concise information about these topics in as straightforward and simple format as possible, without using bullet points, lists or tables, even if the data was provided by the API in a very structured way. Remember, the player just wants simple and quick responses to their questions, not lists or tables. Remember you also have the web_search skill at your disposal. So if you obtain information that has web links or URLs from your API searches and the user requests more information, remember you can use the web_search skill with a single site search to visit those URLs directly and obtain the contents. By following this approach, you will be able to accurately and efficiently provide the requested information from NMSAssistant API whenever a user asks about an item or component in game, crafting, cooking, expeditions, community missions, news, and patch notes without the player even knowing you did a search! It's your little secret!\n(END of "backstory")\n\nThe user can also assign "skills" to you that give you additional knowledge or abilities.\nThese skills are defined in the "skills" section below. Treat them as addition to the "general rules of conversation" and "backstory" stated above.\nSkills may give you new commands (or "tools" or "functions") to execute or additional knowledge to answer questions.\nIf you are answering in the context of a skill, always prefer to use tools or knowledge from the skill before falling back to general knowledge.\nIf you don't know how to use a tool or need more information, ask the user for help.\n\n(BEGINNING of "skills"):\n \n\nFileManager\n\nYou can also save text to various file formats, load text from files, or create directories as specified by the user. \nYou support all plain text file formats.\nWhen adding text to an existing file, you follow these rules:\n(1) determine if it is appropriate to add a new line before the added text or ask the user if you do not know.\n(2) only add content to an existing file if you are sure that is what the user wants.\n(3) when adding content to a file, only add the specific additional content the user wants added, not a duplicate of all of the original content.\nYou can also aid the user in opening folders / directories in the user interface.\n\n\nTypingAssistant\n\nYou can also type what the user says if they ask you to. The user might dictate what you type, word for word.\nThe user might also ask you to imagine something, such as a poem, an email, or a speech, and then you type that content.\nUse the context of the user's request to determine what content the user wants you to type.\nAlways use the tool assist_with_typing to type but only type if the user specifically asks you to.\n\n\nVisionAI\n\nYou can also see what the user is seeing and you can analyse it and answer all questions about what you see.\nUse the tool 'analyse_what_you_or_user_sees' if you are asked to analyse what you see or whtat the user sees.\nYou can also see the screen of the user. Call 'analyse_what_you_or_user_sees' for this, too.\n\n\nWebSearch\n\nYou can also search the internet for topics identified by the user by using your Processing Prompt [BLAS] (2075 / 2075 tokens) cannot access local variable 'tool_calls' where it is not associated with a value` |
This worked to fix the error I mentioned in my last comment
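The one-line fix, in (simplified, illustrative) context:

```python
import json

def assemble_message(recvtxt, using_openai_tools):
    tool_calls = []  # the restored line: binds the name on every path
    if using_openai_tools:
        try:
            tool_calls = json.loads(recvtxt)
        except json.JSONDecodeError:
            pass
    # Without the initialization above, this reference raised
    # "cannot access local variable 'tool_calls'" whenever the
    # tools branch was skipped or bailed out early.
    return {"content": recvtxt, "tool_calls": tool_calls}
```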
|
Kind of was an idiot and forgot I could just test my hypothesis myself. With the line added back in, the error goes away. Note: the reason I originally used a global was that I was thinking we should force-shut-down streaming mode when tools are detected, even if the user accidentally sends a streaming request. I see that's gone now; I just don't know if that will cause an issue if streaming is on with tool calls. I will try to think of something I can use to test later. |
|
We cannot force shut down streaming mode, because the client expects it. If you send a non-streaming response to a client that expects streaming it will probably error out, and vice versa. Specifically, for OpenAI, this is controlled by the 'stream' parameter in the request. |
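For example, with the OpenAI client the transport is chosen entirely client-side:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="none")

# stream=False expects one JSON body; stream=True expects SSE chunks the
# client parses incrementally. The server must honor whichever was asked.
resp = client.chat.completions.create(
    model="koboldcpp",
    messages=[{"role": "user", "content": "hello"}],
    stream=False,
)
print(resp.choices[0].message.content)
```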
|
Got it, makes sense, thanks! |
|
Hey @teddybear082 , been quite a long time. Not sure if you still use KoboldCpp, but I added a few enhancements to the way tool calls work, feel free to give a comment. Now that we have smarter models like Gemma 3, this actually works very well most of the time. The idea is simply to optimize control with the tool_choice field: when a specific tool is requested, the grammar is forced to that tool's expected output. And then if no specific tool is picked, e.g. tool_choice = "forced", then the basic json grammar format with ALL the tools is passed over, allowing smarter models to pick the correct one while adhering to generic grammar for json. The only limitation is that "auto" cannot be used - the model is still confused when given the ability to decide whether to use tools or not; it simply mixes tool use together with regular output, which is unusable. And in that case grammar cannot be forced, otherwise regular English-language outputs are impossible. But now with this, we can handle 2/3 of the tool usage modes (a forced specific tool, and forced generic tool use - everything except "auto").
I think this is a good compromise state for now. Feel free to add any comments you may have. |
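The dispatch is essentially this (a pseudocode-ish sketch; make_grammar_for stands in for the actual grammar builder):

```python
def pick_tool_grammar(tools, tool_choice, make_grammar_for):
    # make_grammar_for stands in for the real grammar builder
    if isinstance(tool_choice, dict):  # a specific tool was forced
        wanted = tool_choice["function"]["name"]
        chosen = [t for t in tools if t["function"]["name"] == wanted]
        return make_grammar_for(chosen)  # grammar locked to one schema
    if tool_choice == "none":
        return None  # tools disabled, no grammar needed
    # otherwise: one generic json grammar covering ALL the tool schemas,
    # letting a smart model pick the right one ("auto" still unsupported)
    return make_grammar_for(tools)
```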
|
Hey there! Thanks for the ping, I will check it out! I thought I read somewhere that llamacpp now supports tool use natively? I haven't tried it yet, but do you know what I'm talking about? I got frustrated with local models (it was almost like a drug, seeing and testing new models every day / watching youtube), so I have been out of the local model game for a while. I'm also getting a new computer with 16 GB VRAM soon; my GPU only had 8 GB before, which felt pretty limiting at this point with a lot of local models. Now I'm a bit frustrated with openai, because they just decided to release a new "responses" API format, and I'm wondering what will happen to the modicum of uniformity that has been built around interacting with AIs across multiple platforms... sigh. Will look forward to seeing all the changes in kobold! |

-Most small models are not smart enough to do this, especially a combined tool call + role play "in character" response, but at least this allows future experimentation along these lines with koboldcpp
-Possible candidates for some (flawed / limited support) small models are:
Initial PR only supports generic "auto" function calling mode, and the model cannot decide NOT to respond in the function calling format, which may present issues (e.g., if you say "hi how are you?" the model will still try to respond in function calling format; it may be up to the user to introduce "null" functions the model can call in these circumstances).
Also tries to support "role":"tool" messages (https://platform.openai.com/docs/guides/function-calling)
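For reference, a "role": "tool" round-trip looks like this in OpenAI's message format (ids and values here are illustrative):

```python
# A full "role": "tool" round-trip in OpenAI's message format
messages = [
    {"role": "user", "content": "Tell me about the book Dune."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function", "function": {
            "name": "get_book_info",
            "arguments": "{\"title\": \"Dune\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1",
     "content": "Dune (1965), a science fiction novel by Frank Herbert."},
]
```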