[FEATURE]: Implement Dynamic/Lazy Loading for MCP Tool Schemas to Prevent Context Bloat #17482

@wenbochang888


Feature hasn't been suggested before.

  • I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

Type

Performance Optimization / Token Efficiency

Description

Currently, OpenCode injects the entire input_schema of all connected MCP tools into the LLM context at the start of every session. While this follows a literal implementation of the MCP protocol, it creates a massive "Token Tax," especially when dealing with complex MCP servers.
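To make the contrast concrete, here is a minimal sketch of the two serialization strategies. All type and function names below are hypothetical, not OpenCode's actual internals:

```typescript
// Hypothetical shapes -- illustrative only, not OpenCode's real types.
interface McpTool {
  name: string;
  description: string;
  inputSchema: object; // full JSON Schema; can run to thousands of tokens
}

// Eager approach (current behavior): every tool's full schema is
// serialized into the context up front.
function eagerToolContext(tools: McpTool[]): string {
  return JSON.stringify(tools);
}

// Minimal approach (Cursor-style): only name + description go in at first.
function minimalToolContext(tools: McpTool[]): string {
  return JSON.stringify(tools.map(({ name, description }) => ({ name, description })));
}

const tools: McpTool[] = [
  {
    name: "docx.create",
    description: "Create a docx document",
    inputSchema: { type: "object", properties: { title: { type: "string" } } },
  },
];

// The eager serialization is strictly larger because it carries inputSchema.
console.log(eagerToolContext(tools).length > minimalToolContext(tools).length); // true
```

With a realistic server the gap is far larger than this toy example, since real schemas nest hundreds of properties.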

I performed a comparative test between OpenCode and Cursor using an MCP server with a large input_schema:

| Behavior | OpenCode | Cursor |
| --- | --- | --- |
| Initial Context | Full schema loaded immediately | Minimal metadata (name/description) only |
| On Tool Call | N/A | Fetches input_schema just-in-time |

The Problem

As discussed in previous issues like #2841, the current implementation causes the system prompt to grow linearly with the number and complexity of MCP tools. For power users with many MCP servers, this makes the tool almost unusable due to:

  • High Token Costs: Paying for the same large schema in every single turn
  • Context Exhaustion: Large schemas leave less room for actual code and reasoning
  • Model Confusion: Overloading the prompt with irrelevant schemas can lead to hallucinations
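As rough arithmetic for the first bullet, assume a fixed schema overhead resent on every turn. The overhead figure comes from the measurement later in this issue; the turn count and price are made-up examples:

```typescript
// Illustrative only: the overhead figure is from the measurement in this
// issue; the turn count and price-per-token are invented for the example.
const schemaOverheadTokens = 147_000; // extra input tokens per turn with MCP enabled
const turns = 20;                     // a modest multi-turn coding session
const totalTaxedTokens = schemaOverheadTokens * turns;
console.log(totalTaxedTokens); // 2940000

// At a hypothetical $3 per million input tokens:
const exampleCostUsd = (totalTaxedTokens / 1_000_000) * 3;
console.log(exampleCostUsd.toFixed(2));
```

Nearly three million tokens of pure schema re-transmission in one session, before any code or reasoning is exchanged.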

Past Evidence of Context Bloat (Screenshots Attached)

I have attached a comparison showing the extreme token consumption caused by connecting a single MCP server (lark-mcp-docx):

| Configuration | Token Count | Overhead |
| --- | --- | --- |
| Without MCP | ~21k tokens | - |
| With MCP Enabled | ~168k tokens | +147k tokens |

This demonstrates that the entire schema is being injected into the system prompt, consuming 86% of the context window before a single user message is even processed. This makes the tool unusable for large-scale MCP integrations.

Steps to Reproduce

  1. Connect an MCP server with a very large/complex input_schema
  2. Start a session and ask a simple question like "How is the weather?"
  3. Inspect the tokens/context. OpenCode will show high token usage because it loaded the entire tool definition, whereas Cursor remains lean

Proposed Solution

Adopt a "Lazy Loading" or "Two-Step Discovery" mechanism similar to Cursor or the latest Claude Code updates:

  1. Initial Context: Send only the name and a short description of the MCP tools
  2. Just-in-Time Injection: When the LLM's intent matches a tool's description, the client should then inject the detailed input_schema for that specific tool before the final inference step
  3. Toggle: Provide a setting to "Lazy Load" specific MCP servers
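The first two steps could be sketched as follows. Every name here (registry, initialContext, schemaForToolCall) is hypothetical; OpenCode's real tool registry will look different:

```typescript
// Sketch of the proposed two-step discovery. Names are hypothetical.
interface McpTool {
  name: string;
  description: string;
  inputSchema: object;
}

const registry = new Map<string, McpTool>();

// Step 1: the initial context carries only lightweight metadata.
function initialContext(): Array<{ name: string; description: string }> {
  return Array.from(registry.values()).map(({ name, description }) => ({ name, description }));
}

// Step 2: when the model's intent matches a tool, fetch and inject the
// full input_schema just-in-time, before the final inference step.
function schemaForToolCall(name: string): object {
  const tool = registry.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.inputSchema;
}

registry.set("weather.lookup", {
  name: "weather.lookup",
  description: "Look up current weather for a city",
  inputSchema: { type: "object", properties: { city: { type: "string" } } },
});

console.log(JSON.stringify(initialContext()));                    // no inputSchema here
console.log(JSON.stringify(schemaForToolCall("weather.lookup"))); // injected on demand
```

The per-server toggle in step 3 would then simply decide whether a given server's tools go through this lazy path or the current eager one.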

Additional Context

Addressing this would bring OpenCode's MCP efficiency on par with Cursor and Windsurf, making it much more viable for large-scale enterprise MCP integrations.

References

[Screenshot: token-count comparison with and without the lark-mcp-docx MCP server]

Labels

  • core: Anything pertaining to core functionality of the application (opencode server stuff)
  • discussion: Used for feature requests, proposals, ideas, etc. Open discussion
  • perf: Indicates a performance issue or need for optimization
