server/webui: cleanup dual representation approach, simplify to openai-compat #21090
allozaur merged 7 commits into ggml-org:master
Conversation
|
This will be a major refactor with many edge cases! |
|
@ServeurpersoCom I thought so too, but it went surprisingly smoothly, all tests pass and nothing seems broken at least with cursory testing, so maybe it's not as bad as it looks? |
|
Technically it only touches the storage/rendering side, I'll pull the branch tomorrow and give it a proper test run, because honestly this is how it was originally supposed to be done! This is really the final piece of the MCP client integration, which started as a simple OAI-compat proxy outside the WebUI. That said, there are a lot of subtle edge cases we've been polishing over the past weeks (streaming interruptions mid-tool-call, partial marker handling, continue generation with open reasoning blocks, tool result truncation, etc.), so I want to make sure nothing regresses before giving the thumbs up! |
|
I'm currently testing it: it's encouraging, it just works. This layer of abstraction was indeed the final blemish on our implementation. And it was much less disruptive than I thought because we naturally converged towards the same structure. Some obsolete Settings options to remove:
I'm keeping your commit on my everyday server; after a few days of use, I'll be able to spot any edge cases I might have missed. |
|
I don't think |
Ah yes, it needs to be simplified/reworked (broken on the commit for now). We need it for markdown. |
ServeurpersoCom left a comment
Honestly, we can merge quickly and fix what we find, we'll add commits on top and we'll make progress! It works.
Not exactly 😅 Let's first address any regressions that this PR introduces and then let's merge. |
We're going to work together to eradicate them 😅 |
|
Can you tell me what the regressions found are? |
I still have to finish reviewing and testing this PR tomorrow; I'll let you know from my side. |
|
@pwilkin I've fixed the agentic turns UI regression and added a few more tests. This is an intermediate step towards what we discussed internally: a separate structure for chat message data that is 100% OpenAI-compatible, while on the UI side we show the assistant + tool messages as a joint agentic chain of messages. |
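To illustrate the split described above (OpenAI-compatible storage, joint agentic chain in the UI), here is a minimal sketch. It is not the code from this PR: the names `ChatMessage`, `DisplayItem` and `groupForDisplay` are hypothetical, and the message type is a reduced subset of the OpenAI chat format.

```typescript
// Hypothetical types: a reduced subset of the OpenAI chat message shape.
type ToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
};

type ChatMessage =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// A display unit: either a single system/user message, or a run of
// consecutive assistant/tool messages rendered as one agentic chain.
type DisplayItem =
  | { kind: 'message'; message: ChatMessage }
  | { kind: 'chain'; messages: ChatMessage[] };

// Derive the UI grouping from the flat history without modifying it.
function groupForDisplay(history: ChatMessage[]): DisplayItem[] {
  const items: DisplayItem[] = [];
  let chain: ChatMessage[] = [];

  // Emit the chain collected so far, if any.
  const flush = (): void => {
    if (chain.length > 0) {
      items.push({ kind: 'chain', messages: chain });
      chain = [];
    }
  };

  for (const message of history) {
    if (message.role === 'assistant' || message.role === 'tool') {
      chain.push(message);
    } else {
      flush();
      items.push({ kind: 'message', message });
    }
  }
  flush();
  return items;
}
```

In this sketch the stored history stays a flat list that can be sent to the server unchanged, and the grouping is recomputed for rendering only.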
|
Looking good! |
|
All that remains is to correct the agent turn display options (visually useful only if there are multiple CoT + tool call pairs), and fix the raw Markdown display switch. I've rebased the pull request on my server to test it further :) |
these should already be fixed by now |
|
Good work on this PR. The conversation history now looks like a proper alternating sequence of assistant/tool roles. One slight issue: the reasoning content is still not sent back on assistant messages, so I opened #21249 to address it. |
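One way to check the "proper alternating sequence" property on a stored history is sketched below. This is an illustration under assumptions, not code from the PR: `HistoryMessage` and `findUnmatchedToolResults` are hypothetical names, and the rule applied (every `tool` message must answer a `tool_calls` entry of an earlier assistant message in the same turn) is a simplified reading of the OpenAI chat format.

```typescript
// Hypothetical reduced message shape, loosely following the OpenAI chat format.
type HistoryToolCall = { id: string; function: { name: string; arguments: string } };

type HistoryMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: HistoryToolCall[];
  tool_call_id?: string;
};

// Return the indices of tool messages that do not answer a pending tool call.
function findUnmatchedToolResults(history: HistoryMessage[]): number[] {
  const pending = new Set<string>(); // tool call ids still waiting for a result
  const unmatched: number[] = [];

  history.forEach((message, index) => {
    if (message.role === 'assistant') {
      for (const call of message.tool_calls ?? []) {
        pending.add(call.id);
      }
    } else if (message.role === 'tool') {
      const id = message.tool_call_id;
      if (id !== undefined && pending.has(id)) {
        pending.delete(id);
      } else {
        unmatched.push(index);
      }
    } else {
      // Assumption of this sketch: a system or user message ends the agentic turn.
      pending.clear();
    }
  });

  return unmatched;
}
```

An empty result means every tool result in the history could be paired with a preceding assistant tool call.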
…i-compat (ggml-org#21090) * server/webui: cleanup dual representation approach, simplify to openai-compat * feat: Fix regression for Agentic Loop UI * chore: update webui build output * refactor: Post-review code improvements * chore: update webui build output * refactor: Cleanup * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

Overview
Removes the dual representation from the WebUI and bases everything on the OpenAI-compatible message history.
Additional information
Since the current dual representation approach seems to run into a lot of problems (see e.g. #21087) and will probably run into more when we add the MCP proxy server, I reckoned I'd try to clean it up by removing the dual representation and deriving everything from the plain chat history. It seems to be working out pretty well in my tests (tested both normal reasoning and reasoning with tool calling).
Requirements
@allozaur @ServeurpersoCom could you take a look and see if it makes sense to you?