webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts #18655
allozaur merged 374 commits into ggml-org:master from
Conversation
Thank you for the architectural unification! The SearchableDropdownMenu refactor is superb, and we're making good progress! The only remaining items/features for an MVP (and testing on my side):
tools/server/webui/docs/architecture/high-level-architecture-simplified.md
I'm interested in this, so I gave the PR a look and wanted to ask: are local MCP servers planned to be supported? Right now it looks like a URL is required, with no local "command"/"args"/"env" type option (for Node, NPX/UVX, Docker, etc.). It might be possible to work around this with an MCP proxy server, but built-in support for local servers, as in many MCP clients, would be welcome. For example, Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations: lots of examples here. I know it's still WIP, but I just wanted to ask. Or maybe I've missed it?
This PR is browser-side (Svelte) -> TCP (streamable-http / SSE / WebSocket), so there is no stdio support in the browser. EDIT: And once we have the MCP client in the browser, nothing prevents a small example script in Python or Node.js from relaying MCP to stdio :)
Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with a pure WebUI implementation. We will add further enhancements in the near future ;)
Thanks folks, that makes sense. I'll make a small proxy script then; I just wanted to make sure I wasn't overlooking a component.
I think I can whip something up, but I wouldn't say no to a reference. If the MCP button appears in the WebUI as is, this sort of question is bound to come up, so a small example proxy script in the docs might not be a bad idea. I appreciate the work being done on this and the other WebUI PRs.
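As a rough illustration of what such a relay could look like (a sketch only, not anything this PR ships: the endpoint URL, the `--relay` flag, and the one-JSON-body-per-request framing are all assumptions; a real bridge would also handle SSE streams and session headers), a stdio-to-HTTP proxy mostly just forwards newline-delimited JSON-RPC messages in both directions:

```typescript
// Hypothetical stdio <-> HTTP MCP relay sketch (Node 18+, built-in fetch).
// Assumption: the server accepts one JSON-RPC 2.0 message per POST and
// replies with a single JSON body (SSE streaming is not handled here).
import * as readline from "node:readline";

// Validate one newline-delimited JSON-RPC 2.0 message read from stdin.
function parseJsonRpc(line: string): Record<string, unknown> | null {
  try {
    const msg = JSON.parse(line);
    if (msg && typeof msg === "object" && msg.jsonrpc === "2.0") {
      return msg as Record<string, unknown>;
    }
  } catch {
    // not valid JSON: fall through and reject
  }
  return null;
}

// Forward one message to the HTTP endpoint and return the reply body.
async function relay(url: string, msg: Record<string, unknown>): Promise<string> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(msg),
  });
  return res.text();
}

// Only start the relay loop when invoked as `... --relay <url>`, so that
// loading this file for testing has no side effects.
const relayIdx = process.argv.indexOf("--relay");
if (relayIdx !== -1 && process.argv[relayIdx + 1]) {
  const endpoint = process.argv[relayIdx + 1];
  const rl = readline.createInterface({ input: process.stdin });
  rl.on("line", async (line) => {
    const msg = parseJsonRpc(line);
    if (msg) process.stdout.write((await relay(endpoint, msg)) + "\n");
  });
}
```

A stdio MCP client would then be pointed at this script as its "command", while the script forwards traffic to the webui-reachable HTTP endpoint.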
1c7048d to b11b32e
Since this works, all that remains is to refactor the CoT with the new pre-rendering format (client-specific tags <<< ... >>>) to have complete control over the context and to send the "interleaving" back to the server during an agentic loop. This will answer several questions from llama.cpp users and developers about powerful models like Minimax. And for Qwen, it will finally provide visibility into the CoT during the agentic loop!
Testing interleaved reasoning blocks and tool calls. I need to remove "Filter reasoning after first turn", which is useless now. InterleavedThinkingBlockAndImage.mp4
Obviously, with the refactoring of the CoT display, for now all the reasoning is sent back to the model with our proprietary UI tags included! A simple strip that reuses the regular expressions before sending to the API will restore the previous behavior. Later, we'll need an option to choose whether to send this reasoning back to the backend, preserving the actual location of each reasoning block! For the MCP tool responses, the model has access to everything during the agentic loop, but once a new loop is started, the previous loops are "compressed" to the last N lines set in the option, just like the display. This invalidates the token cache, so the backend has to rebuild the last modified agentic loop. This optimizes the context and doesn't degrade agentic performance, because the LLM is supposed to have already performed its synthesis.
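A minimal sketch of that strip step, assuming the client-specific markers are the literal `<<< ... >>>` tags mentioned above (the exact tag syntax and the function names are illustrative, not the PR's actual code):

```typescript
// Illustrative strip of client-side display tags before sending history
// back to the API. Assumes reasoning blocks are wrapped in literal
// "<<<" ... ">>>" markers, as described above; real tag syntax may differ.
function stripDisplayTags(text: string): string {
  // Non-greedy match so each interleaved block is removed separately.
  return text.replace(/<<<[\s\S]*?>>>/g, "").trim();
}

// Variant for the future option: keep the reasoning content in place at
// its actual location, dropping only the marker tags themselves.
function stripMarkersOnly(text: string): string {
  return text.replace(/<<<|>>>/g, "");
}
```

`stripDisplayTags` restores the old behavior (no reasoning resent); `stripMarkersOnly` corresponds to the proposed option of resending reasoning while preserving block positions.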
A little fun with image generation through MCP (a server separate from the LLM server, dedicated to image generation; nothing prevents having both in one if there is enough VRAM for both inference instances). MCP-Image-Gen2.mp4
The CI fails because we simplified the code path so that the modality type is detected post-upload (better) instead of pre-upload (file picker, too limiting and incompatible). The Storybook tests still check the old UX, where the Images/Audio buttons were disabled based on model modalities, but that logic was removed from ChatForm in the refactor. Now the file picker accepts everything and validation happens client-side after upload, with a text fallback, so the tests are looking for DOM elements that no longer exist. The modality props became orphaned: ChatFormActions still receives them, but ChatForm no longer computes or passes them. We need to drop these obsolete UI tests and keep the actual modality validation logic in unit tests, where it belongs.
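The post-upload validation described here could look roughly like this (a sketch with hypothetical names and shapes, not the actual ChatForm code): accept any file, then degrade unsupported media to its text fallback after upload instead of disabling buttons up front.

```typescript
// Hypothetical post-upload modality check with text fallback.
// `Modalities` mirrors the idea that the server reports what the loaded
// model supports; all names here are illustrative, not the real API.
interface Modalities {
  vision: boolean;
  audio: boolean;
}

type Attachment =
  | { kind: "image"; dataUrl: string; textFallback?: string }
  | { kind: "audio"; dataUrl: string; textFallback?: string }
  | { kind: "text"; content: string };

// Validate after upload: unsupported media degrades to its text fallback
// (e.g. extracted text or a description) rather than being rejected by
// the file picker before the upload even happens.
function validateAttachment(att: Attachment, mod: Modalities): Attachment | null {
  if (att.kind === "text") return att;
  const supported = att.kind === "image" ? mod.vision : mod.audio;
  if (supported) return att;
  if (att.textFallback !== undefined) {
    return { kind: "text", content: att.textFallback };
  }
  return null; // nothing usable for this model
}
```

Because the decision happens after upload, it is plain logic with no DOM involved, which is why it belongs in unit tests rather than Storybook UI tests.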
I've certainly wondered about this scenario. There are cases where I can see chaining vision and non-vision models being useful, such as having a vision model do OCR and then a non-vision model follow up. I'd love an omni-model that's on par, but in my tests, even the recent large Qwen vision models still lag in coding and logic compared to their non-vision counterparts. That said, the "transformation" is incomplete and prompt-based, isn't it? Like describing or OCR'ing the image, rather than actually converting the image into non-vision text tokens? I assume the button was grayed out because the non-vision model wouldn't be able to 'see' the image or make sense of the tokens. Personally, I'd totally support a PR to change it regardless.
As a precaution, I simply fixed the problem. But since the question was raised, I ran a simple test: I sent an image and asked the model NOT to respond; it accepted. After several exchanges, I requested a very detailed description of the image, and it managed to provide it without error!
Actually, it's quite simple. The image remains attached in the client-side prompt and, of course, is sent again on the next request. A non-VL model would throw an error, so the feature is simply a safety measure. Alternatively, you could filter out the image and continue in text mode using a user-requested description instead, though this is less precise than having a VL model process the actual image. In fact, it would reduce code complexity and give users more freedom. You'd just need a small notification that the image and its full description are no longer part of the context.
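That filtering step could be sketched like this (hypothetical message-part shape loosely following the OpenAI-style `image_url` parts mentioned later in the thread; names are illustrative):

```typescript
// Sketch: before resending history to a non-VL model, replace each
// image part with its description so the conversation can continue in
// text mode. The `description` field is a hypothetical place where the
// user-requested description would have been stored.
type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string }; description?: string };

function toTextOnly(parts: Part[]): Part[] {
  return parts.map((p) =>
    p.type === "image_url"
      ? {
          type: "text" as const,
          // The bracketed note doubles as the user-facing notification
          // that the image itself is no longer in the context.
          text: `[image removed: ${p.description ?? "no description available"}]`,
        }
      : p
  );
}
```

Applied only when the target model lacks vision support, this keeps the request valid while making the degradation explicit in the transcript.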
Okay, I've updated the entire Qwen3.5 series with the latest GGUF and mmproj from Unsloth, and it works well. But indeed, I'm asking the model again; this isn't inside the agentic loop, although it should work there: the LLM needs to be given a loop so that it expects a specific thing from the generator. Then I can also do it in the sandbox, which can return images like Claude's "View" command. I'll test it right away.
Agentic loop with an exit condition based on an image description: simply enter "an unspecified object" in the image generator prompt, then describe the image. If it turns out to be something that could be a container, keep prompting the generator with "an unspecified object" until you get something that isn't a container. Loop-On-Image.mp4 Work...
More seriously, we need to try and see why it's not working for you. |
conversation_0d2d6fae-79f3-4a3b-a171-4c7b722a127f_agentic_loop_with_ex.json (conversation with embedded images). My last commit is 213c4a0. I'll update from master and retry:
No problem on the latest master.
Be aware that models can sometimes mimic tool calls and responses; for MCP, remember to force the LLM to populate a description field in the JSON: it helps a lot! We have a fix included in this PR (stripping reserved display tags); you need to be on a very recent master.
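One way to force that field is in the tool's input schema itself, by listing it as required (an illustrative tool definition following the usual MCP `inputSchema` JSON Schema shape; the tool name and wording are made up, not from this PR):

```typescript
// Illustrative MCP tool definition whose input schema forces the model
// to fill a "description" field alongside the actual prompt. Structured
// output constraints then reject calls that omit it.
const generateImageTool = {
  name: "generate_image",
  description: "Generate an image from a text prompt.",
  inputSchema: {
    type: "object",
    properties: {
      prompt: { type: "string" },
      description: {
        type: "string",
        description: "One sentence stating why this tool is being called.",
      },
    },
    required: ["prompt", "description"],
  },
};
```

Making the model articulate its intent per call also makes mimicked (hallucinated) tool calls easier to spot in the transcript.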
On the next conversation iteration, the tool call is no longer present; it's just "my prompt to generate x", "assistant reply explaining it couldn't see the image in the tool call, only that text string, exactly as shown in the screenshot", "image_url message part holding a data URL to a PNG image", "my comment asking if the assistant can see the image now". In its reply, Qwen is now somewhat confused: it confirms that it can see an image, but thinks it was added from some different source and is probably not related to the tool call. So perhaps my own custom MCP server is not returning images correctly -- though from my reading of the MCP SDK docs, that is how you're supposed to do it.
Are you in router mode with multiple models, or a single model?
Single-model mode. I debugged it to this point: [AgenticStore] Skipping image attachment (model "undefined" does not support vision). This was in the console all along. Clearly this is wrong; the model is perfectly capable of handling images.
Cool, we finally found the bug! I'm going to make a patch and PR it!
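The log above suggests a guard of roughly this kind (a sketch with hypothetical names, not the actual AgenticStore code or the eventual patch): when model metadata hasn't been resolved in single-model mode, the model is `undefined`, and a naive check treats "unknown" as "no vision".

```typescript
// Sketch of the guard class behind "model 'undefined' does not support
// vision". All names are hypothetical.
interface ModelInfo {
  id: string;
  supportsVision: boolean;
}

// Buggy shape: `!model?.supportsVision` also skips the attachment when
// the metadata simply hasn't loaded yet (model === undefined).
// Safer shape: only skip when we positively know vision is absent.
function shouldAttachImage(model: ModelInfo | undefined): boolean {
  if (model === undefined) return true; // metadata unknown: don't block
  return model.supportsVision;
}
```

Whether "unknown" should default to attaching (and let the backend reject) or to blocking is a judgment call; the log shows the current code takes the blocking branch even when nothing is actually known about the model.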
In the meantime, you can switch to router mode and install just one model to test it. Then you'll see; you'll want more.
I am aware of the multi-model mode. The reason I don't use it is an annoying usability snafu in the webui: I always have to choose the loaded model, because the select doesn't default to one of the models already loaded.
Oh, there's another, more serious reason. The model routing proxy works very poorly with agentic coding when all you have is something like a Strix Halo and the prompt is around 100k tokens long and for some reason must be reprocessed. This takes a while (at least 15 minutes, I think), and every piece of software I've ever tried times out at 5 minutes, so they retry. In multi-model mode, the client's connection to the proxy drops, but the proxy doesn't drop its connection to the backend server actually doing the inference; soon the client sends a new request, and at that point you have a bunch of inferences all running, all starting from the first token again. In single-model mode, when the client disconnects, the server stops and saves its progress on the prompt, so the next retry continues appropriately. Multi-model mode is pretty much useless to me because of this problem, as long as I use slow hardware.
I admit it annoys me too. I'll keep it in mind for the next PR! This is exactly the kind of feedback we need to improve the user experience!
That, on the other hand, is very important! I also have Strix Halo devices at work, with people using llama.cpp in router mode on them. So I need to look into that!
Yep, I've had a Strix Halo doing "work" all night, which involved just reprocessing some 200k-token prompts over and over again. Because this hardware is mediocre for this kind of long-context work with the 122B model, prompt processing and token generation can get very slow, but that doesn't really bother me as long as it happens while I sleep. It's not as if I hear its fans screaming, and when I come back in the morning, the "night shift" has usually made lots of nice progress. But it does need a way to recover from a single HTTP timeout...
We'll find this bug, don't worry. I have the equipment to test it! Can you open GitHub issues for both, with what we've learned? That way we won't forget anything!
Yes, I suppose I will be doing that.
I made the MCP model=undefined ticket. I don't think I'm confident enough to create the ticket about the multi-model server and backend timeouts, at least not until I confirm the bad behavior with long contexts on the routing server again. I see there's been work on llama.cpp, such as steady context checkpoints, which could also influence the behavior.

One major issue I see is that prompt processing isn't interruptible. For instance, if the web client closes the HTTP request, llama.cpp continues processing the prompt regardless. I think a key improvement would be to cancel the prompt when the client no longer wants it.

I'm also seeing what are probably bugs in how concurrent requests targeting the same prompt on multiple slots are handled. The bug is pretty bad: all the work gets thrown away and prompt processing starts from scratch. This is probably what I was seeing with the routing server. When you have multiple copies of the same prompt, it seems the first inference finishes, then the new slot gets its turn, but it doesn't reuse the cache; it seems to delete everything and start from 0. My testing of this is quite rough, but I think I reproduced it multiple times tonight just by clicking cancel and continue in Kilocode.

This problem seems to go away with -np 1, which removes the parallel slots. I think what happens with -np 1 is that the next request in the queue isn't processed at all until the previous inference fully completes, and then it uses the KV cache appropriately. I don't think parallel prompts that share a prefix of about 99% of their length actually have to work; just cancelling when the client isn't listening anymore would seem a simple and sufficient solution to me.
You're right; I'm also limiting myself to -np 1 because I've encountered performance issues, but I haven't been able to pinpoint the cause well enough to troubleshoot; it was random. I don't think it's due to KV cache fragmentation with -kvu, but further testing is needed. However, interrupting an inference in progress works; have you tested it recently? With the webui or a custom client?
I think inference does interrupt appropriately, but it seems to me that prompt processing does not. It seems to be on rails and runs its course even when the HTTP client is no longer connected to the server. It probably terminates immediately afterwards without inferring a single token.
That's possible; the interrupt for prompt preprocessing is missing. On an RTX it's quite fast, but it should be fixed: it must stop at the boundaries of the batches.
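Conceptually (the server itself is C++, so this TypeScript sketch is only illustrative of the control flow, not of llama.cpp's actual code), stopping at batch boundaries means checking a cancel flag between batches of the prompt, so that on disconnect the work done so far is kept and a retry can resume:

```typescript
// Illustrative batch loop: process a prompt in chunks and stop at a
// batch boundary as soon as the client has gone away. Returns how many
// tokens were processed, i.e. the progress a retry could resume from.
function processPrompt(
  tokens: number[],
  batchSize: number,
  isCancelled: () => boolean
): number {
  let processed = 0;
  while (processed < tokens.length) {
    if (isCancelled()) break; // check at every batch boundary
    const batch = tokens.slice(processed, processed + batchSize);
    // ... feed `batch` to the model here ...
    processed += batch.length;
  }
  return processed;
}
```

The key property is that cancellation latency is bounded by one batch rather than by the whole prompt, which matters exactly in the 100k+ token scenario described above.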
Just chiming in that on a 3090, I'm pretty sure I've hit the bug of old prompts continuing to be processed in parallel with new prompts after a cancel. My solution was also to set -np 1.
Bothers me as well, tbh.
Interesting! I'm going to set it back to 4 to continue trying to pinpoint the bug! Note that -kvu (unified KV cache) is also enabled by default, which is supposed to make this transparent when using only one thread.
Please open an issue if one doesn't already exist, so that we don't forget it.
Oh yeah, sorry - for me I believe I hit it with Roo Code. It's been too long to remember the details well at this point, unfortunately. Usually something unexpected would happen, like a loop or a crash on Roo's side or an unloaded model; then I'd restart the server and find everything going at a snail's pace, with signs pointing to the old attempt still processing in the background (and usually stuck). The problem went away with -np 1 and I haven't thought much about it since.
Couldn't find one exactly, so sure. I think it's different enough from this one to warrant a separate issue, but it's also important: 😉
For all of you who have stdin/stdout binaries and need a bridge for this feature: https://github.com/AgentForgeEngine/mpc-bridge - more features coming soon.





New features
`llama-server` command with `--webui-mcp-proxy` flag to enable it)

UI Improvements
Architecture refactors/improvements:
`.service.ts` format

Important
This PR includes MCP-only changes, but it builds on a couple of PRs improving architecture and UI foundations in the codebase: #19689, #19685, #19596, #19586, #19571, #19556, #19551, #19541 and #20066
Video demos
Adding a new MCP Server and using it within an Agentic Loop
demo1.mp4
Using MCP Prompts
demo2.mp4
Using MCP Resources
demo3.mp4
Image Generation and Web Search using different MCP servers
demo4.mp4