
model-bridge-mcp

This project modularizes the monolithic MCP server (archive/coder_ai_allocator_v1.0.py and archive/coder_ai_allocator_v1.1.py) into the src/model_bridge package structure.

Architecture

main.py (MCP tools)
  -> config/config_loader.py
  -> security/sanitizer.py
  -> core/failover_manager.py
  -> adapters/subprocess_adapter.py

Environment

  • Standard development environment: model-bridge-mcp_dev (conda)
  • Environment guide: ENVIRONMENT.md
  • Environment snapshot: environment/model-bridge-mcp_dev.yml
conda create -n model-bridge-mcp_dev python=3.11 -y
conda activate model-bridge-mcp_dev
python -m pip install mcp PyYAML pytest

Configuration

The default configuration file is src/model_bridge/config/default.yaml.

  • commands: codex/gemini/ollama/claude_code execution and health commands
  • routing.default_chains: default failover chain per tool
  • models: default/final-backup ollama models, catalog, aliases, local fallback chain
  • security: block patterns and sensitive paths
  • runtime.system_suffix: CLI prompt suffix
  • runtime.apply_system_suffix: per-service suffix application policy
  • runtime.transport_mode: subprocess (default) or sdk (scaffold)
  • runtime.extra_path: additional PATH directories for CLI discovery (see below)

CLI Path Discovery

The MCP server automatically discovers CLI tools installed via common version managers:

Auto-detected paths:

  • Node.js: nvm (~/.nvm/versions/node/*/bin), fnm, volta
  • Python: pyenv, conda (miniconda3/anaconda3)
  • Ruby: rbenv, rvm
  • Rust: cargo (~/.cargo/bin)
  • Go: ~/.go/bin, ~/go/bin
  • User local: ~/.local/bin

User-specified paths (highest priority):

If auto-discovery fails or you need custom paths, add them in config:

runtime:
  extra_path:
    - /custom/path/to/bin
    - ~/another/path

Priority order (highest first):

  1. User-specified extra_path from config
  2. Auto-discovered version manager paths
  3. System PATH at MCP server startup
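The priority rules above amount to an ordered, de-duplicated merge. A minimal sketch (build_path is a hypothetical helper for illustration, not the project's actual implementation):

```python
import os

def build_path(extra_path, discovered, system_path):
    """Merge PATH candidates by priority: user extra_path first, then
    auto-discovered version-manager dirs, then the system PATH at startup.
    A duplicate entry keeps its highest-priority position."""
    candidates = [os.path.expanduser(p) for p in extra_path]
    candidates += discovered
    candidates += system_path.split(os.pathsep)
    seen, merged = set(), []
    for entry in candidates:
        if entry and entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return os.pathsep.join(merged)
```

Because user entries are inserted first, a CLI found in extra_path shadows the same name anywhere later in the search order.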

Environment Variables

Some CLI providers require environment variables for authentication (e.g., Google Cloud Vertex AI). Because MCP servers typically run in a non-interactive shell, they may not inherit variables set in your shell profile.

Configure required environment variables:

runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
    # Other provider vars as needed:
    # OPENAI_API_KEY: "sk-..."
    # ANTHROPIC_API_KEY: "sk-ant-..."

Priority order (highest first):

  1. User-specified extra_env_vars from config
  2. Auto-discovered from login shell (if accessible)
  3. MCP server process environment

Local Config File

For machine-specific settings (API keys, project IDs, paths), use a local config file that won't be committed to git:

Location: ~/.model_bridge/local.yaml

# ~/.model_bridge/local.yaml
runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
  extra_path:
    - ~/custom/bin

Merge behavior: Local config is deep-merged on top of the default config. Only specify values you want to override.

Note: List fields (like extra_path) are replaced, not concatenated. If default has extra_path: ["/a"] and local has extra_path: ["~/b"], the result is ["~/b"] only.
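A minimal sketch of this merge behavior, including the documented list-replacement rule (deep_merge is illustrative, not the project's actual loader code):

```python
def deep_merge(base, override):
    """Deep-merge override on top of base: nested dicts merge key-by-key,
    while lists and scalars are replaced wholesale (not concatenated)."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    return override  # lists and scalars: the local value wins
```

With default `extra_path: ["/a"]` and local `extra_path: ["~/b"]`, this yields `["~/b"]`, while sibling keys not mentioned in the local file survive from the default config.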

Config loader verification:

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.config.config_loader --pretty'

Plugins

model-bridge-mcp supports a plugin architecture for extending with custom AI providers.

Plugin Locations

  • Built-in: src/model_bridge/plugins/builtins/
  • User: ~/.model_bridge/plugins/

Creating a Custom Plugin

# ~/.model_bridge/plugins/my_provider/plugin.py
from model_bridge.plugins import ProviderPlugin, register_provider

@register_provider
class MyProvider(ProviderPlugin):
    @property
    def provider_id(self) -> str:
        return "my_custom"

    async def execute(self, prompt: str, model, options, **kwargs) -> str:
        # Your implementation
        return "response"

See docs/PLUGIN_GUIDE.md for full documentation.

Run

Import smoke

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -c "from model_bridge.main import mcp; print(type(mcp).__name__)"'

Runtime initialization note:

  • Runtime dependencies (config, adapter, failover) are initialized lazily on first tool call.
  • Importing model_bridge.main no longer eagerly loads runtime configuration.
  • runtime.transport_mode=sdk currently supports direct API access for codex, gemini, claude_code, and ollama.
  • codex SDK auth priority: OPENAI_API_KEY (recommended) -> OPENAI_ACCESS_TOKEN (manual OAuth token path).
  • gemini SDK auth priority: GEMINI_API_KEY (recommended) -> GOOGLE_API_KEY.
  • claude_code SDK auth: ANTHROPIC_API_KEY -> ANTHROPIC_OAUTH_ACCESS_TOKEN/ANTHROPIC_ACCESS_TOKEN (model alias override via ANTHROPIC_MODEL* env supported).
  • ollama SDK auth is not required (local HTTP endpoint; configurable by OLLAMA_BASE_URL).
  • OAuth refresh automation (OpenAI/Gemini/Anthropic): set <PROVIDER>_OAUTH_TOKEN_FILE with token metadata and optionally <PROVIDER>_OAUTH_* refresh env (TOKEN_URL, REFRESH_TOKEN, CLIENT_ID, CLIENT_SECRET, SCOPE).

MCP run

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.main'

Ollama Model Selection

ask_ollama now uses alias-first model resolution.

  • Default call: model="default"
  • Alias examples: default, fast, coder
  • Direct model names are allowed only when they exist in models.ollama_catalog

Behavior:

  • Explicit model request (model="coder" etc.) performs local install precheck via ollama list.
  • If requested model is not installed, it returns:
    • [MODEL ERROR] ... Install with: ollama pull <model>
  • For model="default", local fallback chain (models.ollama_local_fallback_chain) is attempted before cloud fallback.
  • For model="auto", alias is selected by lightweight prompt heuristic (fast/coder/default).
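The README does not specify the heuristic itself. As a rough illustration of what a lightweight prompt heuristic for model="auto" could look like (pick_alias and its marker list are hypothetical; the real logic may differ):

```python
def pick_alias(prompt):
    """Toy heuristic: route code-looking prompts to "coder", very short
    prompts to "fast", and everything else to "default"."""
    code_markers = ("def ", "class ", "import ", "```", "traceback")
    if any(marker in prompt.lower() for marker in code_markers):
        return "coder"
    if len(prompt.split()) <= 8:
        return "fast"
    return "default"
```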

Unified Ask API

You can use the new unified tool:

  • ask(prompt, provider="auto|codex|gemini|ollama|claude_code", model="default|auto|...", reasoning_effort=None)
  • For codex/gemini/claude_code, set model="<provider-model-id>" to forward model selection to each CLI.
  • For codex/gemini/claude_code, model trial policy is:
    • explicit model => try that model first, then retry once without --model
    • no explicit model => try provider catalog models in order, then retry once without --model
  • reasoning_effort is currently supported for:
    • codex
    • gemini (sdk transport only)
    • claude_code
  • Provider-specific transport mapping:
    • codex sdk => payload.reasoning.effort
    • codex subprocess => codex exec -c model_reasoning_effort="..."
    • gemini sdk => generationConfig.thinkingConfig.thinkingLevel
    • gemini subprocess => unsupported in this MCP
    • claude_code sdk => output_config.effort plus thinking={"type":"adaptive"} for Claude 4.6 aliases
    • claude_code subprocess => claude --effort <level>
  • Claude effort handling also uses a transport-specific runtime probe:
    • confirmed unsupported on the current machine/account => hard fail
    • probe timeout or inconclusive transport/auth error => pass through to the normal request path
  • When reasoning_effort is set, provider fallback candidates are filtered to documented-compatible models only
  • Common options across ask tools:
    • timeout_seconds
    • max_output_tokens
    • response_format (text/json)
    • verbosity (brief/normal/detailed)
    • stream (fallback chunk mode)
    • session_id (for optional session continuity)

Runtime behavior:

  • Optional prompt cache (TTL + max entries).
  • Optional session memory (TTL + max turns).

Batch Ask API (MCP-internal Orchestration)

Use ask_batch(...) to process multiple prompts in one MCP call.

  • prompts: list[str] (required)
  • mode: sequential|parallel (default: sequential)
  • max_concurrency (used when mode=parallel)
  • Reuses existing ask options: provider, model, force_model, timeout_seconds, response_format, verbosity, stream, session_id

ask_batch executes within MCP server orchestration, so external client parallelism is not required.

Ollama safety behavior:

  • For provider="ollama" in mode="parallel", concurrency is automatically clamped by runtime resource guard.
  • Default conservative start is 1.
  • Guard uses runtime RAM/VRAM visibility and model memory profile from config:
    • runtime.ollama_resource_guard_*
    • runtime.ollama_model_memory_gb
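A conservative clamp along these lines could be sketched as follows (clamp_concurrency is hypothetical; the real guard also uses VRAM visibility and the per-model memory profile from config):

```python
def clamp_concurrency(requested, free_ram_gb, model_memory_gb, floor=1):
    """Clamp parallel ollama concurrency so that N simultaneous model
    instances fit in currently free RAM. Never below floor, never above
    the requested value."""
    if model_memory_gb <= 0:
        return floor
    fits = int(free_ram_gb // model_memory_gb)
    return max(floor, min(requested, fits))
```

With 12 GB free and a 5 GB model, a request for 8 parallel calls is clamped to 2; when nothing fits, the guard falls back to the conservative floor of 1.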

Skill Workflows

This repository now includes workflow-oriented skill definitions under skills/.

  • ask-general-workflow
  • ask-review-workflow
  • ask-code-writing-workflow
  • ask-strict-json-workflow
  • ask-batch-workflow
  • ask-provider-routing-workflow

Routing/trigger policy is documented in:

  • docs/skills/skill-routing-spec.md

Health Check Example

The current operational health check verifies CLI availability from config.

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import shutil
from model_bridge.config.config_loader import load_config
cfg = load_config()
print("--- CLI Health Check ---")
for name in ["codex", "gemini", "ollama", "claude_code"]:
    cmd = cfg["commands"][name]["health"][0]
    status = "Online" if shutil.which(cmd) else "Offline"
    print(f"[{name.capitalize()}]: {status}")
PY'

Example output:

--- CLI Health Check ---
[Codex]: Online
[Gemini]: Online
[Ollama]: Online
[Claude_code]: Online

Routing Log Example

Response format example for ask_chatgpt_cli(prompt, force_model=True):

[Task Execution Failed]
Forced Primary (codex) failed.
Error: <service error>

--- [Routing Log] ---
[1] Primary (codex): Trying...
    [FAILED]

Security block example:

[SECURITY BLOCK] Access to critical system path '/etc/' is strictly FORBIDDEN.

Ollama Inventory Tool

The MCP tool list_ollama_models() returns both configured and runtime availability info.

Includes:

  • default_model
  • effective_default
  • aliases
  • recommended_aliases
  • catalog
  • installed
  • missing
  • pull_commands
  • status / error

JSON output contract:

  • schemas/list_ollama_models.schema.json
  • Unit validation: tests/unit/test_response_contracts.py

Example invocation and parsing:

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import json
from model_bridge.main import list_ollama_models
payload = json.loads(list_ollama_models())
print("status:", payload["status"])
print("effective_default:", payload["effective_default"])
print("installed_count:", len(payload["installed"]))
print("missing:", payload["missing"])
print("pull_commands:", payload.get("pull_commands", []))
PY'

Interpretation guide:

  • status="ok": runtime ollama list succeeded.
  • status="unavailable": local runtime inventory failed; see error.
  • missing: configured models not currently installed.
  • pull_commands: ready-to-run install commands for missing models.

Provider Model Inventory Tool

Use list_provider_models(provider="all|codex|gemini|ollama|claude_code") to inspect model options per provider.

  • ollama: dynamic runtime inventory (installed, missing, pull_commands).
  • codex/gemini/claude_code: config-based catalog from:
    • models.codex_model_catalog
    • models.gemini_model_catalog
    • models.claude_code_model_catalog
  • Each non-ollama provider includes model_flag="--model", default_model, and configured command metadata.
  • Current default catalogs:
    • codex: gpt-5.4, gpt-5.3-codex, gpt-5.2-codex, gpt-5.1-codex-max, gpt-5.2, gpt-5.1-codex-mini
    • gemini: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.5-pro, gemini-2.5-flash-lite, gemini-2.5-flash, gemini-3-pro-preview
    • claude_code: haiku, sonnet, opus
  • Codex default_model is the first catalog entry, currently gpt-5.4.
  • Gemini default_model is the first catalog entry, currently gemini-3.1-pro-preview.
  • Gemini reasoning_effort guardrails are model-specific:
    • gemini-3.1-pro-preview, gemini-3-pro-preview: low, high
    • gemini-3-flash-preview: minimal, low, medium, high
    • gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite: disabled in this MCP
  • Codex reasoning_effort guardrails are model-specific:
    • gpt-5.4: none, low, medium, high, xhigh
    • gpt-5.3-codex, gpt-5.2-codex: low, medium, high, xhigh
    • gpt-5.1-codex-max, gpt-5.1-codex-mini: disabled in this MCP until documented support is confirmed
  • Claude 4.6 reasoning_effort guardrails are model-specific:
    • sonnet / claude-sonnet-4-6: low, medium, high
    • opus / claude-opus-4-6: low, medium, high, max
    • haiku: disabled in this MCP
  • Note: some Gemini preview models may require additional internal flags/account enablement.
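The guardrail tables above can be expressed as a simple per-model lookup (illustrative only; the authoritative values live in the config catalogs, and models absent from the table have reasoning_effort disabled):

```python
# Reasoning-effort guardrails as documented above.
EFFORT_GUARDRAILS = {
    "gpt-5.4": {"none", "low", "medium", "high", "xhigh"},
    "gpt-5.3-codex": {"low", "medium", "high", "xhigh"},
    "gpt-5.2-codex": {"low", "medium", "high", "xhigh"},
    "gemini-3.1-pro-preview": {"low", "high"},
    "gemini-3-pro-preview": {"low", "high"},
    "gemini-3-flash-preview": {"minimal", "low", "medium", "high"},
    "sonnet": {"low", "medium", "high"},
    "opus": {"low", "medium", "high", "max"},
}

def check_effort(model, effort):
    """Return True if reasoning_effort is allowed for this model."""
    return effort in EFFORT_GUARDRAILS.get(model, set())
```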

Orchestrator Capability Tool

Use list_orchestrator_capabilities() to inspect external orchestrator assumptions and the recommended execution policy.

  • Recommended default: one MCP call + internal fan-out via ask_batch(mode="parallel")
  • Capability matrix included for:
    • codex
    • gemini
    • claude_code
  • Fallback rule:
    • if external parallel behavior is uncertain, use MCP-internal parallel orchestration

Runtime Resource Tool

Use list_runtime_resources(model="default", requested_max_concurrency=1) to inspect the runtime resource snapshot and the ollama concurrency recommendation.

  • Returns:
    • ram_total_gb, ram_free_gb
    • vram_total_gb, vram_free_gb, vram_detector
    • ollama_recommendation.applied_max_concurrency

CLI Startup Prompt Policy Tool

Use list_cli_noninteractive_policy() to inspect startup/trust prompt handling for each provider.

  • codex
    • non-interactive path: codex exec
    • skip flag: --skip-git-repo-check (recommended in config)
  • gemini
    • non-interactive path: gemini -p
    • no documented workspace-trust skip flag in current CLI help output
    • if stalled, complete one-time trust/auth in interactive mode
  • claude_code
    • non-interactive path: claude -p
    • CLI help indicates workspace trust dialog is skipped in -p mode

Prompt Execution Policy Tool

Use list_prompt_execution_policy() to inspect available prompt policy presets for ask(...) / ask_batch(...).

  • Recommended deterministic call pattern:
    • instruction_preset="strict_once"
    • response_format="json"
  • strict_once injects a fixed policy block that enforces:
    • single-response execution
    • no follow-up questions
    • explicit Assumption labeling when context is missing
    • strict output format compliance

Output cleanliness option:

  • output_mode="clean" (default): strip known CLI startup/log noise lines from provider output.
  • output_mode="raw": keep original provider output (including startup/log lines).

Default policy without repeating args:

  • Configure once in runtime.ask_defaults:
    • instruction_preset: "strict_once"
    • output_mode: "clean"
  • Then ask(...) / ask_batch(...) automatically apply these when args are omitted.
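As a sketch, the two defaults above would look like this in the config file (key names taken from the bullets above; the exact nesting under runtime is assumed to match default.yaml):

```yaml
runtime:
  ask_defaults:
    instruction_preset: "strict_once"
    output_mode: "clean"
```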

Telemetry note:

  • model_bridge.telemetry logs structured events to stderr.
  • Current fields include request_id, routing_tier, status, error_category, and latency_ms.

Security Boundaries

  • Destructive pattern blocking (rm -rf, mkfs, dd if=, chmod 777, fork bomb)
  • Sensitive system path access blocking (/etc/, /var/, /boot/, /proc/, /root/)
  • Restricted save destinations (/etc, /var, /usr, /bin, /sbin, /root)
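A minimal sketch of how such blocking might work (patterns and messages here are illustrative; the real rules live in security/sanitizer.py and the security section of the config):

```python
import re

# Illustrative subset of destructive patterns and sensitive paths.
BLOCK_PATTERNS = [
    r"rm\s+-rf\s+/",
    r"\bmkfs(\.\w+)?\b",
    r"\bdd\s+if=",
    r"chmod\s+777",
    r":\(\)\s*\{.*\};\s*:",   # classic fork bomb
]
SENSITIVE_PATHS = ("/etc/", "/var/", "/boot/", "/proc/", "/root/")

def check_prompt(text):
    """Return a block message if the text trips a rule, else None."""
    for pattern in BLOCK_PATTERNS:
        if re.search(pattern, text):
            return f"[SECURITY BLOCK] Destructive pattern detected: {pattern}"
    for path in SENSITIVE_PATHS:
        if path in text:
            return (f"[SECURITY BLOCK] Access to critical system path "
                    f"'{path}' is strictly FORBIDDEN.")
    return None
```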

Tests

conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src pytest -q tests'

Integration smoke coverage:

  • tests/integration/test_tool_smoke.py
  • verifies ask_chatgpt_cli, ask_gemini_cli, ask_claude_code, ask_ollama, and list_ollama_models entrypoint paths with minimal mocks.

Migration Note

  • Legacy source:
    • archive/coder_ai_allocator_v1.0.py
    • archive/coder_ai_allocator_v1.1.py
  • New entrypoint:
    • src/model_bridge/main.py
  • Existing tool signatures are preserved:
    • ask_chatgpt_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)
    • ask_gemini_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)
    • ask_claude_code(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)
    • ask_ollama(prompt, save_path=None, model="default")

About

Modular MCP bridge server with provider routing, CLI/SDK failover, and plugin support
