server/webui: cleanup dual representation approach, simplify to openai-compat #21090

Merged
allozaur merged 7 commits into ggml-org:master from pwilkin:cleanup-webui
Mar 31, 2026

Conversation

@pwilkin
Member

@pwilkin pwilkin commented Mar 28, 2026

Overview

Removes the dual representation from the WebUI and bases everything on the OpenAI-compatible message history.

Additional information

Since the current dual-representation approach seems to run into a lot of problems (see e.g. #21087) and will probably run into more when we add the MCP proxy server, I reckoned I'd try to clean it up by removing the double representation and deriving everything from the plain chat history. It seems to be working out pretty well in my tests (I tested both normal reasoning and reasoning with tool calling).
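
To make the idea concrete, here is a minimal sketch of what "everything derives from the chat history" could look like. The type names, the `countToolCalls` helper, and the sample data are hypothetical and are not taken from the WebUI source; `reasoning_content` is assumed to be the field that carries the model's reasoning on assistant messages.

```typescript
// Hypothetical types for illustration; not the WebUI's actual definitions.
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type ChatMessage =
  | { role: "system" | "user"; content: string }
  | {
      role: "assistant";
      content: string | null;
      reasoning_content?: string; // assumed field name for reasoning text
      tool_calls?: ToolCall[];
    }
  | { role: "tool"; tool_call_id: string; content: string };

// One flat, OpenAI-style history is the only stored representation.
const history: ChatMessage[] = [
  { role: "user", content: "What is 2 + 2?" },
  {
    role: "assistant",
    content: null,
    reasoning_content: "I should use the calculator.",
    tool_calls: [
      {
        id: "call_1",
        type: "function",
        function: { name: "calculator", arguments: '{"expr":"2+2"}' },
      },
    ],
  },
  { role: "tool", tool_call_id: "call_1", content: "4" },
  { role: "assistant", content: "2 + 2 = 4." },
];

// Display data is computed from the history on demand instead of being
// stored next to it as a second representation.
function countToolCalls(messages: ChatMessage[]): number {
  return messages.reduce(
    (n, m) => n + (m.role === "assistant" ? m.tool_calls?.length ?? 0 : 0),
    0,
  );
}
```

With a single stored shape, anything the UI needs (tool-call counts, reasoning blocks, and so on) is a pure function of `history`, so there is no second copy to keep in sync.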

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, no way I rewrite all of that by hand 😛

@allozaur @ServeurpersoCom could you take a look and see if it makes sense to you?

@pwilkin pwilkin requested a review from a team as a code owner March 28, 2026 00:06
@ServeurpersoCom
Contributor

This will be a major refactor with many edge cases!

@pwilkin
Member Author

pwilkin commented Mar 28, 2026

@ServeurpersoCom I thought so too, but it went surprisingly smoothly, all tests pass and nothing seems broken at least with cursory testing, so maybe it's not as bad as it looks?

@ServeurpersoCom
Contributor

Technically it only touches the storage/rendering side, I'll pull the branch tomorrow and give it a proper test run, because honestly this is how it was originally supposed to be done! This is really the final piece of the MCP client integration, which started as a simple OAI-compat proxy outside the WebUI.

That said, there are a lot of subtle edge cases we've been polishing over the past weeks (streaming interruptions mid-tool-call, partial marker handling, continue generation with open reasoning blocks, tool result truncation, etc.), so I want to make sure nothing regresses before giving the thumbs up!

@ServeurpersoCom
Contributor

I'm currently testing it: it's encouraging, it just works. This layer of abstraction was indeed the final blemish on our implementation. And it was much less disruptive than I thought because we naturally converged towards the same structure.

Some obsolete Settings options to remove:

  • showRawOutputSwitch
  • alwaysShowAgenticTurns
  • maxToolPreviewLines

I'm keeping your commit on my everyday server; after a few days of use, I'll be able to spot any edge cases I might have missed.

@allozaur
Contributor

I don't think showRawOutputSwitch is obsolete; we still might want to use it to debug Markdown rendering.

@ServeurpersoCom
Contributor

> I don't think showRawOutputSwitch is obsolete; we still might want to use it to debug Markdown rendering.

Ah yes, it needs to be simplified/reworked (it's broken on this commit for now). We need it for Markdown.

Contributor

@ServeurpersoCom ServeurpersoCom left a comment

Honestly, we can merge quickly and fix what we find, we'll add commits on top and we'll make progress! It works.

@allozaur
Contributor

> Honestly, we can merge quickly and fix what we find, we'll add commits on top and we'll make progress! It works.

Not exactly 😅 Let's first address any regressions that this PR introduces and then let's merge.

@ServeurpersoCom
Contributor

> Not exactly 😅 Let's first address any regressions that this PR introduces and then let's merge.

We're going to work together to eradicate them 😅

@pwilkin
Member Author

pwilkin commented Mar 28, 2026

Can you tell me what the regressions found are?

@allozaur
Contributor

> Can you tell me what the regressions found are?

I have yet to finish reviewing and testing this PR; I'll continue tomorrow and let you know from my side.

@allozaur
Contributor

@pwilkin I've fixed the agentic turns UI regression and added a few more tests. This is an intermediate step toward what we discussed internally: the chat message data gets its own structure that is 100% OpenAI-compatible, while on the UI side we show the assistant + tool messages as a joint agentic chain of messages.
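
As a rough illustration of that split, the sketch below keeps the stored messages flat and builds the joint "agentic chain" only at render time. `StoredMessage`, `RenderBlock`, and `groupForDisplay` are hypothetical names chosen for this example, not identifiers from the WebUI.

```typescript
// Hypothetical shapes for illustration only.
type StoredMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
};

type RenderBlock =
  | { kind: "single"; message: StoredMessage }
  | { kind: "agentic"; messages: StoredMessage[] };

// Collapses each consecutive run of assistant/tool messages into one block,
// leaving the stored history untouched.
function groupForDisplay(history: StoredMessage[]): RenderBlock[] {
  const blocks: RenderBlock[] = [];
  let chain: StoredMessage[] = [];

  // Emits the run collected so far, if any, as a single agentic block.
  const flush = (): void => {
    if (chain.length > 0) {
      blocks.push({ kind: "agentic", messages: chain });
      chain = [];
    }
  };

  for (const message of history) {
    if (message.role === "assistant" || message.role === "tool") {
      chain.push(message);
    } else {
      flush();
      blocks.push({ kind: "single", message });
    }
  }
  flush();
  return blocks;
}
```

Because the grouping is a view over the data, the history sent to the server stays a plain OpenAI-style list while the UI is free to present it more compactly.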

@ServeurpersoCom
Contributor

Cool, it's definitely more compact this way while retaining the advantages of clean, structured data, good thinking Alek!

[Image: Untitled]

@pwilkin
Member Author

pwilkin commented Mar 30, 2026

Looking good!

@ServeurpersoCom
Contributor

All that remains is to fix the agentic turn display options (visually useful only when there are multiple CoT + tool-call pairs) and the raw Markdown display switch. I've rebased the pull request on my server to test it further :)

Comment thread tools/server/webui/src/lib/stores/conversations.svelte.ts
Comment thread tools/server/webui/src/lib/types/chat.d.ts Outdated
Comment thread tools/server/webui/src/lib/utils/legacy-migration.ts
@allozaur allozaur requested a review from ggerganov March 30, 2026 19:54
@allozaur
Contributor

> All that remains is to fix the agentic turn display options (visually useful only when there are multiple CoT + tool-call pairs) and the raw Markdown display switch. I've rebased the pull request on my server to test it further :)

These should already be fixed by now.

@allozaur allozaur merged commit 4453e77 into ggml-org:master Mar 31, 2026
6 checks passed
@aldehir
Contributor

aldehir commented Apr 1, 2026

Good work on this PR. The conversation history looks like it's a proper alternating sequence of assistant/tool roles.

One slight issue: the reasoning content is still not sent back on assistant messages, so I opened #21249 to address it.
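
For readers who want to check these two properties on their own history, here is a small sketch. All names are hypothetical, `reasoning_content` is assumed to be the field holding the reasoning text, and the `keepReasoning` flag only illustrates the behaviour described above; it is not a description of what #21249 implements.

```typescript
// Hypothetical shapes for illustration only.
type HistoryToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

type HistoryMessage =
  | { role: "user"; content: string }
  | {
      role: "assistant";
      content: string | null;
      reasoning_content?: string; // assumed field name
      tool_calls?: HistoryToolCall[];
    }
  | { role: "tool"; tool_call_id: string; content: string };

// True when every tool message answers a tool call issued by the most
// recent assistant message.
function toolResultsAreAnswered(history: HistoryMessage[]): boolean {
  let pending = new Set<string>();
  for (const m of history) {
    if (m.role === "assistant") {
      pending = new Set((m.tool_calls ?? []).map((c) => c.id));
    } else if (m.role === "tool") {
      if (!pending.delete(m.tool_call_id)) return false;
    } else {
      pending = new Set();
    }
  }
  return true;
}

// Builds the outgoing message list; reasoning is stripped from assistant
// messages unless keepReasoning is set.
function toRequestMessages(
  history: HistoryMessage[],
  keepReasoning: boolean,
): HistoryMessage[] {
  return history.map((m) => {
    if (m.role !== "assistant" || keepReasoning) return m;
    const { reasoning_content: _dropped, ...rest } = m;
    return rest;
  });
}
```

The validator does not flag tool calls that never receive a result; it only rejects tool messages that have no matching call.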

slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
…i-compat (ggml-org#21090)

* server/webui: cleanup dual representation approach, simplify to openai-compat

* feat: Fix regression for Agentic Loop UI

* chore: update webui build output

* refactor: Post-review code improvements

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026