webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts#18655

Merged
allozaur merged 374 commits into ggml-org:master from allozaur:allozaur/mcp-mvp
Mar 6, 2026

Conversation

@allozaur
Contributor

@allozaur allozaur commented Jan 7, 2026

New features

  • Adding System Message to conversation or injecting it to an existing one
  • CORS Proxy on the llama-server backend side (run the llama-server command with the --webui-mcp-proxy flag to enable it)
  • MCP
    • Servers Selector
    • Settings with Server cards showing capabilities, instructions and other information
    • Tool Calls
      • Agentic Loop
        • Logic
        • UI with processing stats
    • Prompts
      • Detection logic in the "Add" dropdown
      • Prompt Picker
      • Prompt Args Form
      • Prompt Attachments in Chat Form and Chat Messages
    • Resources
      • Browser with search & filetree view
      • Resource Attachments & Preview dialog
      • Use Resource Templates to attach resources
  • Show raw output switch under the assistant message
  • Favicon utility
  • Key-Value form component (used for MCP Server headers in add new/edit mode)

UI Improvements

  • Created TruncatedText component
  • Created Image CORS error fallback UI
  • Created DropdownMenuSearchable component
  • Created HorizontalScrollCarousel
  • Created CollapsibleContentBlock component + refactored Reasoning Content and Tool Calls
  • Improved code block UI + unfinished code block handling + syntax highlighting improvements
  • Max-height for components + autoscroll of overflowing content during streaming
  • New statistics UI + added new data for Tool Calls and Agentic summary
  • Better time formatting for Chat Message Statistics

Architecture refactors/improvements:

  • Autoscroll hook
  • Renamed all service files to have .service.ts format
  • Common types definitions
  • Abort Signal utility + refactor
  • API Fetch utility
  • Cache TTL
  • Components folder naming & restructuring
  • Context API for editing messages and message actions
  • Enums & constants cleanup
  • Markdown Rendering improvements
  • Chat Form components
  • Resolving images in Markdown Content
  • Chat Attachments API improvements
  • Removed Model Change validation logic
  • Removed File Upload validation logic
  • New formatters
  • Storybook upgrade
  • New prop to define initial section when opening Chat Settings

Important

This PR includes MCP-only changes, but it builds on a couple of PRs improving architecture and UI foundations in the codebase: #19689, #19685, #19596, #19586, #19571, #19556, #19551, #19541 and #20066

Video demos

Adding a new MCP Server and using it within an Agentic Loop

demo1.mp4

Using MCP Prompts

demo2.mp4

Using MCP Resources

demo3.mp4

Image Generation and Web Search using different MCP servers

demo4.mp4

@ServeurpersoCom
Contributor

ServeurpersoCom commented Jan 7, 2026

Thank you for the architectural unification! The SearchableDropdownMenu refactor is superb, we're making good progress!

Only remaining items/features for an MVP (& testing on my side):

  • Server naming based solely on domain name is problematic for those wanting to host multiple MCP services on web subdirectories or subdomains (like on our home/personal servers or a future integrated MCP backend!): we need to find a solution
  • Display on narrow widths needs improvement -> MCP selector drops below the model name (yes model names are long, plus I prefix MoE/Dense on all my models because this info really needs to be visible to everyone. Auto-detection from GGUF would be fantastic someday!)
  • Stress test context with heavy RAG, we've already done many long complex agentic loops on sandbox successfully (your webinterface automatic publication from agent was cool btw)
  • Global on/off button on the first server seems strange to me

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Jan 9, 2026

I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?

Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.

e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:

{
  "mcpServers": {
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    },
    "name": {
      "command": "npx",
      "args": [ "/path/index.js" ],
      "env": { "VAR": "VAL" }
    }   
  }
}

Lots of examples here.

I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

@ServeurpersoCom
Contributor

ServeurpersoCom commented Jan 9, 2026

I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?

Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.

e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:

{
  "mcpServers": {
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    },
    "name": {
      "command": "npx",
      "args": [ "/path/index.js" ],
      "env": { "VAR": "VAL" }
    }   
  }
}

Lots of examples here.

I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

This PR is browser-side (Svelte) -> TCP (streamable-http / sse / websocket), so no stdio support in the browser.
A backend relay in llama-server for stdio->HTTP MCP bridging would be possible but is not yet implemented.
I have a personal Node.js proxy doing this (an OAI->MCP (with stdio)->OAI hack); I can share it if useful.

EDIT: And once we have the MCP client in the browser, nothing prevents a small example script in Python or Node.js from relaying MCP to stdio :)
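The stdio side of such a relay script is simple to sketch: MCP's stdio transport exchanges newline-delimited JSON-RPC messages, so a relay only needs to split the child process's output into lines and parse each one. A minimal framing helper in TypeScript (all names hypothetical; this is not part of the PR):

```typescript
// Minimal framing helpers for a hypothetical MCP stdio<->HTTP relay.
// The MCP stdio transport exchanges newline-delimited JSON-RPC messages:
// split stdin/stdout into lines, parse each as JSON, forward over HTTP.

type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
};

// Accumulates raw output chunks from a child MCP server and yields
// complete JSON-RPC messages; partial lines are buffered until the
// terminating newline arrives.
class LineFramer {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const messages: JsonRpcMessage[] = [];
    let idx: number;
    while ((idx = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, idx).trim();
      this.buffer = this.buffer.slice(idx + 1);
      if (line.length > 0) messages.push(JSON.parse(line) as JsonRpcMessage);
    }
    return messages;
  }
}

// Serializes a message for the stdio side: one JSON document per line.
function frame(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}
```

The HTTP half (forwarding each parsed message to a streamable-http endpoint and back) is omitted; the framing is the part people usually get wrong.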

@allozaur
Contributor Author

allozaur commented Jan 9, 2026

I'm interested in this, so gave the PR a look and just wanted to ask, are local MCP servers planned to be supported?

Right now it looks like URL is required, without a local "command" "args" "env" type option. (for Node, NPX / UVX, Docker, etc) Might be able to get around this with a MCP proxy server, but built-in support of local servers like many MCP clients would be welcomed.

e.g. Cursor, VS Code, OpenCode, Roo Code, Antigravity, LM Studio, and others support the following with small variations:

{
  "mcpServers": {
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    },
    "name": {
      "command": "npx",
      "args": [ "/path/index.js" ],
      "env": { "VAR": "VAL" }
    }   
  }
}

Lots of examples here.

I know it's still WIP, but just wanted to ask. Or maybe I've missed it?

Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with a pure WebUI implementation. We will add further enhancements in the near future ;)

ServeurpersoCom added a commit to ServeurpersoCom/llama.cpp that referenced this pull request Jan 9, 2026
ServeurpersoCom added a commit to ServeurpersoCom/llama.cpp that referenced this pull request Jan 9, 2026
@strawberrymelonpanda
Contributor

This PR is browser-side (Svelte) -> TCP (streamable-http / sse / websocket), so no stdio support on browser.
Hey! We are introducing a solid basis for MCP support in llama.cpp, starting with pure WebUI implementation.

Thanks folks, makes sense. I'll make a small proxy script then, just wanted to make sure I wasn't overlooking a component.

I have a personal Node.js proxy doing this (OAI->MCP (with stdio)->OAI hack), can share if useful.

I think I can whip something up, but I wouldn't say no to a reference.

I would say that if the MCP button appears in the WebUI as is, this sort of question is probably bound to come up. A small example proxy script in the docs might not be a bad idea.

I appreciate the work being done on this and other WebUI PRs.

@ServeurpersoCom
Contributor

ServeurpersoCom commented Jan 16, 2026

Now we can work with image attachments properly, without context overloading.

  • An MCP server can return an attachment; the SvelteUI doesn't saturate its context, but informs the LLM that an attachment is available, and intuitively creates a Markdown link!

If the Markdown link points to an attachment, it's displayed.
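The attachment-link behavior described above can be sketched as a small matcher. The `attachment://` scheme and the function name here are hypothetical, for illustration only:

```typescript
// Sketch: decide which Markdown link targets in an assistant message
// refer to known attachments returned by an MCP tool call, so the UI
// can render them inline. The "attachment://" scheme is hypothetical.

const MARKDOWN_LINK = /!?\[([^\]]*)\]\(([^)\s]+)\)/g;

// Returns the attachment ids referenced by Markdown links in `text`
// that are present in `known`, in order of appearance.
function referencedAttachments(text: string, known: Set<string>): string[] {
  const found: string[] = [];
  for (const match of text.matchAll(MARKDOWN_LINK)) {
    const target = match[2];
    if (target.startsWith("attachment://")) {
      const id = target.slice("attachment://".length);
      if (known.has(id)) found.push(id);
    }
  }
  return found;
}
```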

Now we can do things like ChatGPT's DALL-E/Sora image generation, all locally. llama.cpp can query a stable-diffusion.cpp MCP server running Qwen3-Image, and from what I've tested (I'll make videos), the rendering is between DALL-E and Sora Image (an LLM prompts the image generator much better than a human, while capturing the intent without leaving any ambiguity for the image model).

MCPAttach0 MCPAttach1

@ServeurpersoCom
Contributor

Since this works, all that remains is to refactor the CoT with the new pre-rendering format (client-specific tags <<< ... >>>) to have complete control over the context and to send the "interleaving" back to the server during an agentic loop. This will answer several questions from llama.cpp users and developers about powerful models like MiniMax. And for Qwen, it will finally provide visibility into the CoT during the agentic loop!

@ServeurpersoCom
Contributor

Testing interleaved reasoning blocks and tool calls. I need to remove the "Filter reasoning after first turn" option, which is useless now.

InterleavedThinkingBlockAndImage.mp4

@ServeurpersoCom
Contributor

Obviously, with the refactoring of the CoT display, for now all the reasoning is sent back to the model with our proprietary UI tags included! A simple strip that reuses the regular expressions before sending it to the API will restore the previous functionality.
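A sketch of such a strip, under the assumption that the client-specific tags are delimited by `<<<` and `>>>` as described earlier in the thread (the function name is hypothetical, not the PR's actual code):

```typescript
// Sketch: remove client-specific display spans (delimited by <<< and >>>)
// from assistant text before sending it back to the API, so the model
// never sees the webui's internal pre-rendering markup. Non-greedy match
// so adjacent tag pairs are stripped independently.

const UI_TAG = /<<<[\s\S]*?>>>/g;

function stripUiTags(text: string): string {
  return text.replace(UI_TAG, "");
}
```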

Later, we'll need an option to choose whether to send this reasoning back to the backend, preserving the actual location of each reasoning block!

For the MCP tool responses, the model has access to everything during the agentic loop, but once a new loop is started, the previous loops are "compressed" to the last N lines set by the option, just like the display. This invalidates the token cache, so the backend has to rebuild the last modified agentic loop. This optimizes the context and doesn't degrade agentic performance, because the LLM is supposed to have already performed its synthesis.
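The "compressed to the last N lines" idea can be sketched as follows (function and marker text hypothetical, not the PR's actual implementation):

```typescript
// Sketch: when a new agentic loop starts, tool outputs from previous
// loops are truncated to their last N lines (mirroring the display),
// trading exact KV-cache reuse for a much smaller context.

function compressToolOutput(output: string, keepLines: number): string {
  const lines = output.split("\n");
  if (lines.length <= keepLines) return output;
  const dropped = lines.length - keepLines;
  return `[... ${dropped} earlier lines omitted ...]\n` + lines.slice(-keepLines).join("\n");
}
```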

@ServeurpersoCom
Contributor

A little fun with image generation through MCP (a server separate from the LLM server, dedicated to image generation; nothing prevents running both on one machine if there is enough VRAM for both inference instances)

MCP-Image-Gen2.mp4

@ServeurpersoCom
Contributor

The CI fails because we simplified the code path so that the modality type is detected post-upload (better) instead of pre-upload (the filepicker approach was too limiting and incompatible). The Storybook tests still check the old UX, where the Images/Audio buttons were disabled based on model modalities, but that logic was removed from ChatForm in the refactor. Now the filepicker accepts everything and validation happens client-side after upload, with a text fallback, so the tests are looking for DOM elements that no longer exist. The modality props became orphaned: ChatFormActions still receives them, but ChatForm no longer computes or passes them. We need to remove these obsolete UI tests and keep the actual modality validation logic in unit tests, where it belongs.

@ServeurpersoCom
Contributor

  • I really want to rebase this branch (but I'll hold off)!
  • The existing feature of graying out non-Vision models if there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM. Furthermore, the property is only loaded for models already in use. Personally, I don't like this feature; it unnecessarily restricts LLM use (Vision models are less powerful for some specific follow-up interactions, and there's also a wider selection of non-Vision models)
  • But I can fix this: an MCP tool-call response attachment should NOT be considered a user upload.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Jan 17, 2026

  • The existing feature of graying out non-Vision models if there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM.

I've certainly wondered about this scenario. There are cases where I can certainly see chaining vision and non-vision models to be useful, such as having a vision model do OCR and then a non-vision model follow-up. I'd love an omni-model that's on par, but in my tests, even the recent large Qwen vision models still lack in coding and logic compared to their non-vision counterparts.

That said, the "transformation" is incomplete and based on a prompt, isn't it? Like, describing the image or OCR'ing the image, rather than actually converting the image into non-vision text tokens? I assume it was grayed out because the non-vision model wouldn't be able to 'see' the image / make sense of the tokens.

Personally I'd totally support a PR to change it regardless.

@ServeurpersoCom
Contributor

ServeurpersoCom commented Jan 17, 2026

  • The existing feature of graying out non-Vision models if there's an image upload in the conversation isn't ideal: when you send an image to a Vision model, it "transforms" the projection into text that can be processed by any other non-Vision LLM.

I've certainly wondered about this scenario. There are cases where I can certainly see chaining vision and non-vision models to be useful, such as having a vision model do OCR and then a non-vision model follow-up. I'd love an omni-model that's on par, but in my tests, even the recent large Qwen vision models still lack in coding and logic compared to their non-vision counterparts.

That said, the "transformation" is incomplete and based on a prompt, isn't it? Like, describing the image or OCR'ing the image, rather than actually converting the image into non-vision text tokens? I assume it was grayed out because the non-vision model wouldn't be able to 'see' the image / make sense of the tokens.

Personally I'd totally support a PR to change it regardless.

As a precaution, I simply fixed the problem. But since the question was raised, I performed a simple test:

I sent an image and asked the model NOT to respond; it accepted.

After several exchanges, I requested a very detailed description of the image, and it managed to provide it without error! The projection is indeed present in the backend and is destroyed if the model is changed. This needs to be tested even when switching from one VL model to another. -> This works

@ServeurpersoCom
Contributor

ServeurpersoCom commented Jan 17, 2026

Actually, it's quite simple. The image remains attached in the client-side prompt. And of course, it's sent again on the next request. A non-VL model would throw an error. So the feature is simply a safety measure.

Alternatively, you could just filter out the image and continue in text mode using the user-requested description instead, though this is less precise than having a VL model process the actual image. In fact, it would reduce code complexity and give users more freedom. You'd just need a small notification that the image and its full description are no longer considered in the context.
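The alternative described here, dropping images and continuing in text mode with a notice, could be sketched like this, assuming OpenAI-style content parts (all names and the placeholder text are hypothetical):

```typescript
// Sketch: before sending history to a non-vision model, replace image
// content parts with a short text placeholder so the request stays valid
// and the model (and user) know what was dropped.

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type ChatMessage = { role: string; content: string | ContentPart[] };

function dropImagesForTextModel(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) => {
    if (typeof m.content === "string") return m;
    const parts = m.content.map((p): ContentPart =>
      p.type === "image_url"
        ? { type: "text", text: "[image omitted: current model has no vision support]" }
        : p
    );
    return { ...m, content: parts };
  });
}
```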

@alankila

alankila commented Mar 9, 2026

OK. I start to see the issue.

image

So the image embedding is either not available or is somehow being ignored during the reasoning phase immediately after the tool call returns. However, it actually works on my next turn when I talk to the model again. I can't entirely understand why this is, but that seems to be what is going on.

@ServeurpersoCom
Contributor

Okay, I've updated the entire Qwen3.5 series with the latest GGUF and mmproj from Unsloth, and it works well. But indeed, I'm asking the model again; it's not happening inside the agentic loop, although it should work: the LLM needs to be given a loop that expects a specific thing from the generator. Then I can also do it in the sandbox, which can return images like Claude's with the "View" command. I'll test it right away.

@ServeurpersoCom
Contributor

Agentic loop with exit condition based on image description :

Simply enter "an unspecified object" in the image generator prompt. Then describe the image. If it turns out to be something that can be a container, keep prompting the generator with "an unspecified object" until you get something other than something that can be a container.

Loop-On-Image.mp4

Works...

@ServeurpersoCom
Contributor

  • Simply enter "an unspecified object" in the image generator prompt. Then describe the image. If it turns out to be something that can be spherical or near spherical, keep prompting the generator with "an unspecified object" until you get something other than something that can be spherical or near spherical...
    Endless lol

More seriously, we need to try and see why it's not working for you.

@ServeurpersoCom
Contributor

ServeurpersoCom commented Mar 9, 2026

conversation_0d2d6fae-79f3-4a3b-a171-4c7b722a127f_agentic_loop_with_ex.json

Conversation with embedded images

My last commit 213c4a0

I updated from master and retried:

Untitled

No problem on the latest master

@alankila

alankila commented Mar 9, 2026

Yes, I've become convinced that it somehow mysteriously seems to only fail for me.

image

While LLMs are notorious liars, what Qwen says here seems to exactly match what is in the messages structure, so I'm inclined to believe it.

@ServeurpersoCom
Contributor

ServeurpersoCom commented Mar 9, 2026

Be aware that sometimes a model can mimic tool calls and responses; for MCP, remember to force the LLM to populate a description field in the JSON: it helps a lot! We have a fix included in this PR (stripping reserved display tags); you need to be on a very recent "master" version. Also verify that the /props endpoint correctly returns "modalities": {"vision": true}
We can also do a paranoid check: we exchange our MCP servers to see if we are exchanging the bug. Mine is exposed online, PM me if you want to test it, it's a PHP script of a few lines (which complies with the standard!) and it works in Claude's website in the same way as on the llama.cpp web interface!

@alankila

alankila commented Mar 9, 2026

On the next conversation iteration, the tool call is no longer present, and it's just "my prompt for generate x", "assistant reply explaining it couldn't see the image in the tool call but that text string, exactly as shown in screenshot", "image_url message part, holding a data url to png image", "my comment asking if the assistant can see the image now".

In its reply, Qwen is now somewhat confused, and confirms that it can see an image, but thinks it has been added from some different source and is probably not related to the tool call. So perhaps my own custom MCP server is not returning images correctly -- though from my reading of the MCP SDK docs, that is how you're supposed to do it.

@ServeurpersoCom
Contributor

Are you in router mode with multiple models or a single model?

@alankila

alankila commented Mar 9, 2026

Single model mode. Debugged it to this point: [AgenticStore] Skipping image attachment (model "undefined" does not support vision). This was in the console all along. Clearly this is wrong; the model is perfectly capable of handling images.
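That log line suggests the vision check ran before a model name was resolved in single-model mode. A defensive sketch of the check, falling back to the server-reported model instead of the literal "undefined" (names hypothetical; the `modalities` shape follows the /props payload mentioned earlier in the thread):

```typescript
// Sketch of a defensive vision-capability check for the agentic loop.
// `ServerProps` mirrors the /props payload shape discussed above
// ({"modalities": {"vision": true}}); everything else is hypothetical.

interface ServerProps {
  model?: string;
  modalities?: { vision?: boolean };
}

function resolveModel(selected: string | undefined, props: ServerProps): string | undefined {
  // In single-model mode there may be no explicit selection; fall back
  // to whatever the server reports instead of the literal "undefined".
  return selected ?? props.model;
}

function shouldAttachImage(selected: string | undefined, props: ServerProps): boolean {
  const model = resolveModel(selected, props);
  if (model === undefined) return false; // no model info at all: be safe
  // Only skip the attachment when vision support is explicitly absent.
  return props.modalities?.vision !== false;
}
```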

@ServeurpersoCom
Contributor

Cool, we finally found the bug! I'm going to make a patch and PR it!

@ServeurpersoCom
Contributor

In the meantime, you can switch to router mode and install just one model to test it. Then you'll see, you'll want more.
backend.ini
The command line simply requires creating an .ini file and passing it like --models-preset presets.ini

@alankila

alankila commented Mar 9, 2026

I am aware of the multi-model mode. The reason I don't use it there is an annoying usability snafu in the webui in that I always have to choose the loaded model because the select doesn't seem to default to one of the models already loaded.

@alankila

alankila commented Mar 9, 2026

Oh, there's another, more serious reason. The model routing proxy works very poorly with agentic coding, when you have something like Strix Halo only and the prompt is like 100k tokens long and for some reason it must be reprocessed. This takes a while -- at least 15 minutes, I think -- and every piece of software I've ever tried times out at 5 minutes. So they retry.

With the multi-model mode, what happens is that the client's connection to the proxy drops, but the proxy doesn't drop its connection to the backend server actually doing the inference, and soon the client sends a new request; at this point you'll have a bunch of inferences all going, all starting from the 1st token all over again.

With a single-model mode, when the client disconnects, the server stops, and saves progress in the prompt. So the next retry continues appropriately. Multi-model mode is pretty much useless to me because of this problem, as long as I use slow hardware.

@ServeurpersoCom
Contributor

I am aware of the multi-model mode. The reason I don't use it there is an annoying usability snafu in the webui in that I always have to choose the loaded model because the select doesn't seem to default to one of the models already loaded.

I admit it annoys me too. I'll keep it in mind for the next PR! This is exactly the kind of feedback we need to improve the user experience!
If I give you a patch for single-model, can you build and test it?

@ServeurpersoCom
Contributor

Oh, there's another, more serious reason. The model routing proxy works very poorly with agentic coding, when you have something like Strix Halo only and the prompt is like 100k tokens long and for some reason it must be reprocessed. This takes a while -- at least 15 minutes, I think -- and every piece of software I've ever tried times out at 5 minutes. So they retry.

With the multi-model mode, what happens is that the client's connection to the proxy drops, but the proxy doesn't drop its connection to the backend server actually doing the inference, and soon the client sends a new request; at this point you'll have a bunch of inferences all going, all starting from the 1st token all over again.

With a single-model mode, when the client disconnects, the server stops, and saves progress in the prompt. So the next retry continues appropriately. Multi-model mode is pretty much useless to me because of this problem, as long as I use slow hardware.

That, on the other hand, is very important! I also have Strix Halo devices at work with people working on them using llama.cpp in router mode. So I need to look into that!

@alankila

alankila commented Mar 9, 2026

Yep, I've had Strix Halo doing "work" all night, which involved just reprocessing some 200k token prompts over and over again. Because this hardware is mediocre for this kind of long-context work using the 122B model, prompt processing and token generation can get very slow, but it doesn't really bother me as long as it happens while I sleep. It's not like I hear its fans screaming or anything, and when I come back in the morning, the "night shift" has usually made lots of nice progress. But it does have to have a way to recover from a single HTTP timeout...

@ServeurpersoCom
Contributor

ServeurpersoCom commented Mar 9, 2026

We'll find this bug, don't worry. I have the equipment to test it! Can you open both GitHub issues with what we've learned? That way we won't forget anything!

@alankila

alankila commented Mar 9, 2026

Yes, I suppose I will be doing that.

@alankila

alankila commented Mar 9, 2026

I made the MCP model=undefined ticket. I think I am not confident enough to create the ticket concerning the multimodel server and backend timeouts issue. At least, not until I confirm the bad behavior with long contexts with the routing server again, I suppose. I see that there's been work done on llama.cpp, such as steady context checkpoints which also could influence the behavior.

One major issue that I see is that prompt processing isn't interruptible. For instance, if the web client closes the HTTP request, llama.cpp continues processing the prompt regardless. I think a key improvement would be to cancel the prompt if the client no longer wants it. I'm also seeing that there are probably bugs in how concurrent requests targeting the same prompt at multiple slots are handled. The bug is pretty bad: all the work gets thrown away and prompt processing starts from scratch. This is probably what was happening to me with the routing server.

When you have multiple copies of the same prompt, it seems that the first inference finishes, and then the new slot gets its turn, but it doesn't reuse the cache, it seems to delete everything and start from 0. My testing of this problem is quite rough, but I think I reproduced it multiple times tonight just by clicking cancel and continue in Kilocode. I also am seeing that this problem seems to go away if you specify -np 1, which removes the multiple parallel slots. I think what happens with -np 1 is that the next request in queue isn't processed at all until the previous inference fully completes, and then it uses the KV cache appropriately.

I don't think parallel prompts which have the exact same prefix to like 99 % their length have to actually work, and just canceling when the client isn't listening anymore would seem a simple and sufficient solution to me.
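The cache-reuse behavior under discussion hinges on matching the incoming prompt against the cached tokens by longest common prefix; a toy illustration of that idea (not llama.cpp's actual slot logic, which also handles batching and checkpoints):

```typescript
// Toy illustration: how much of a cached token sequence a new request
// could reuse, and how many tokens would still need prompt processing.
// Deleting the cache instead of reusing it (the suspected bug) means
// tokensToProcess becomes incoming.length regardless of the prefix.

function commonPrefixLength(cached: number[], incoming: number[]): number {
  const limit = Math.min(cached.length, incoming.length);
  let n = 0;
  while (n < limit && cached[n] === incoming[n]) n++;
  return n;
}

function tokensToProcess(cached: number[], incoming: number[]): number {
  return incoming.length - commonPrefixLength(cached, incoming);
}
```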

@ServeurpersoCom
Contributor

ServeurpersoCom commented Mar 9, 2026

You're right, I'm also limiting myself to -np 1 because I've encountered performance issues, but I haven't been able to pinpoint the cause to properly troubleshoot the issue; it was random. I don't think it's due to KV cache fragmentation with -kvu, but further testing is needed.

However, interrupting an inference process works; have you tested it recently? -> With the webui or a custom client?
When the HTTP connection is closed, the inference stops correctly. This has been the case with various clients I've tested, including Claude CLI.

@alankila

alankila commented Mar 9, 2026

I think inference does interrupt appropriately, but it seems to me that the prompt processing does not. It seems to be on rails and run its course even when the http client is no longer connected to the server. I think it terminates immediately afterwards without inferring a single token, probably.

@ServeurpersoCom
Contributor

That's possible; the interrupt for prompt preprocessing is missing. On RTX it's quite fast, but it should be fixed: it must stop at batch boundaries.

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Mar 11, 2026

Just chiming in that on a 3090, I'm pretty sure I've hit the bug of continuing to process old prompts in parallel to new prompts after a cancel. My solution was also to set parallel = 1, which I was kind of surprised wasn't the default.

an annoying usability snafu in the webui in that I always have to choose the loaded model because the select doesn't seem to default to one of the models already loaded.

Bothers me as well, tbh.

@ServeurpersoCom
Contributor

ServeurpersoCom commented Mar 11, 2026

Just chiming in that on a 3090, I'm pretty sure I've hit the bug of continuing to process old prompts in parallel to new prompts after a cancel. My solution was also to set parallel = 1, which I was kind of surprised wasn't the default.

Interesting. I'm going to set it back to 4 to continue trying to pinpoint the bug! Note that -kvu (unified KV cache) is also enabled by default, which is supposed to make it transparent when using only one thread.
But the prompt processing that doesn't stop is perhaps where we need to look; there might be a race condition on the backend? Reproducing the issue with an agent client making parallel requests was quick. The web UI is not the root cause.

an annoying usability snafu in the webui in that I always have to choose the loaded model because the select doesn't seem to default to one of the models already loaded.

Please open an issue if it doesn't already exist, so that we don't forget it.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Mar 11, 2026

The web UI is not the root cause.

Oh yeah, sorry - for me, I believe I hit it with Roo Code. It's been too long to remember the details very well at this point, unfortunately. Usually something unexpected would happen, like a loop or a crash on Roo's side or an unloaded model; then I'd restart the server and find everything going at a snail's pace, with signs pointing to the old attempt still processing in the background (and usually stuck).

Problem went away with np=1 and I haven't thought much about it since.

Please open an issue if it doesn't already exist

Couldn't find one exactly, so sure.
#20382

Different enough from this to open another issue, I think, but it's also important 😉
#18129

Managing default settings takes priority after the MCP client merge.

@audstanley

For all of you who have stdin/stdout binaries and need a bridge for this feature: https://github.com/AgentForgeEngine/mpc-bridge (more features coming soon).


Labels

build, devops, enhancement, examples, server/webui, server
