This project modularizes the monolithic MCP server from archive/coder_ai_allocator_v1.0.py / archive/coder_ai_allocator_v1.1.py into the src/model_bridge structure.
main.py (MCP tools)
-> config/config_loader.py
-> security/sanitizer.py
-> core/failover_manager.py
-> adapters/subprocess_adapter.py
- Standard development environment: `model-bridge-mcp_dev` (conda)
- Environment guide: `ENVIRONMENT.md`
- Environment snapshot: `environment/model-bridge-mcp_dev.yml`

```bash
conda create -n model-bridge-mcp_dev python=3.11 -y
conda activate model-bridge-mcp_dev
python -m pip install mcp PyYAML pytest
```

The default configuration file is `src/model_bridge/config/default.yaml`.
- `commands`: codex/gemini/ollama/claude_code execution and health commands
- `routing.default_chains`: default failover chain per tool
- `models`: default/final-backup ollama models, catalog, aliases, local fallback chain
- `security`: block patterns and sensitive paths
- `runtime.system_suffix`: CLI prompt suffix
- `runtime.apply_system_suffix`: per-service suffix application policy
- `runtime.transport_mode`: `subprocess` (default) or `sdk` (scaffold)
- `runtime.extra_path`: additional PATH directories for CLI discovery (see below)
The MCP server automatically discovers CLI tools installed via version managers.

Auto-detected paths:
- Node.js: nvm (`~/.nvm/versions/node/*/bin`), fnm, volta
- Python: pyenv, conda (miniconda3/anaconda3)
- Ruby: rbenv, rvm
- Rust: cargo (`~/.cargo/bin`)
- Go: `~/.go/bin`, `~/go/bin`
- User local: `~/.local/bin`
User-specified paths (highest priority):

If auto-discovery fails or you need custom paths, add them in config:

```yaml
runtime:
  extra_path:
    - /custom/path/to/bin
    - ~/another/path
```

Priority order (highest first):
- User-specified `extra_path` from config
- Auto-discovered version manager paths
- System PATH at MCP server startup
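The PATH priority merge above can be sketched as follows. This is a minimal illustration, not the actual config-loader code; `build_search_path` and its argument names are assumptions.

```python
import os
from pathlib import Path

def build_search_path(extra_path, discovered, system_path):
    """Merge PATH sources in priority order: user config first, then
    auto-discovered version-manager dirs, then the inherited PATH.
    Duplicates keep their first (highest-priority) position."""
    merged = []
    seen = set()
    for entry in [*extra_path, *discovered, *system_path]:
        resolved = str(Path(entry).expanduser())
        if resolved not in seen:
            seen.add(resolved)
            merged.append(resolved)
    return os.pathsep.join(merged)

print(build_search_path(
    ["/custom/path/to/bin"],
    ["~/.cargo/bin"],
    ["/usr/bin", "/custom/path/to/bin"],  # duplicate of a config entry is dropped
))
```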
Some CLI providers require environment variables for authentication (e.g., Google Cloud Vertex AI). MCP servers may not inherit shell profile variables due to non-interactive shell execution.
Configure required environment variables:
```yaml
runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
    # Other provider vars as needed:
    # OPENAI_API_KEY: "sk-..."
    # ANTHROPIC_API_KEY: "sk-ant-..."
```

Priority order (highest first):
- User-specified `extra_env_vars` from config
- Auto-discovered from login shell (if accessible)
- MCP server process environment
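The three-level precedence above behaves like a layered lookup. A minimal sketch using the standard library's `collections.ChainMap` (the variable names and sample values are illustrative):

```python
from collections import ChainMap

# Highest priority first: ChainMap returns the value from the first
# mapping that contains a key, mirroring the documented precedence.
config_vars = {"GOOGLE_CLOUD_PROJECT": "your-project-id"}
login_shell_vars = {"GOOGLE_CLOUD_PROJECT": "shell-project", "PATH": "/usr/bin"}
process_env = {"HOME": "/home/user"}

effective = ChainMap(config_vars, login_shell_vars, process_env)
print(effective["GOOGLE_CLOUD_PROJECT"])  # config value wins
print(effective["PATH"])                  # falls through to the login shell layer
```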
For machine-specific settings (API keys, project IDs, paths), use a local config file that won't be committed to git:
Location: `~/.model_bridge/local.yaml`

```yaml
# ~/.model_bridge/local.yaml
runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
  extra_path:
    - ~/custom/bin
```

Merge behavior: the local config is deep-merged on top of the default config. Only specify values you want to override.

Note: list fields (like `extra_path`) are replaced, not concatenated. If the default has `extra_path: ["/a"]` and the local file has `extra_path: ["~/b"]`, the result is `["~/b"]` only.
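The deep-merge-with-list-replacement behavior can be sketched like this; `deep_merge` is an illustrative stand-in, not the project's actual loader function:

```python
def deep_merge(base, override):
    """Recursively merge dicts; non-dict values (including lists)
    from override replace base values outright."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

default = {"runtime": {"extra_path": ["/a"], "transport_mode": "subprocess"}}
local = {"runtime": {"extra_path": ["~/b"]}}
print(deep_merge(default, local))
# {'runtime': {'extra_path': ['~/b'], 'transport_mode': 'subprocess'}}
```

Note how `extra_path` is replaced wholesale while the untouched `transport_mode` survives from the default layer.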
Config loader verification:

```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.config.config_loader --pretty'
```

model-bridge-mcp supports a plugin architecture for extending with custom AI providers.
- Built-in: `src/model_bridge/plugins/builtins/`
- User: `~/.model_bridge/plugins/`
```python
# ~/.model_bridge/plugins/my_provider/plugin.py
from model_bridge.plugins import ProviderPlugin, register_provider

@register_provider
class MyProvider(ProviderPlugin):
    @property
    def provider_id(self) -> str:
        return "my_custom"

    async def execute(self, prompt: str, model, options, **kwargs) -> str:
        # Your implementation
        return "response"
```

See docs/PLUGIN_GUIDE.md for full documentation.
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -c "from model_bridge.main import mcp; print(type(mcp).__name__)"'
```

Runtime initialization note:
- Runtime dependencies (`config`, `adapter`, `failover`) are initialized lazily on first tool call.
- Importing `model_bridge.main` no longer eagerly loads runtime configuration.
- `runtime.transport_mode=sdk` currently supports direct API for `codex`, `gemini`, `claude_code`, and `ollama`.
- `codex` SDK auth priority: `OPENAI_API_KEY` (recommended) -> `OPENAI_ACCESS_TOKEN` (manual OAuth token path).
- `gemini` SDK auth priority: `GEMINI_API_KEY` (recommended) -> `GOOGLE_API_KEY`.
- `claude_code` SDK auth: `ANTHROPIC_API_KEY` -> `ANTHROPIC_OAUTH_ACCESS_TOKEN` / `ANTHROPIC_ACCESS_TOKEN` (model alias override via `ANTHROPIC_MODEL*` env supported).
- `ollama` SDK auth is not required (local HTTP endpoint; configurable via `OLLAMA_BASE_URL`).
- OAuth refresh automation (OpenAI/Gemini/Anthropic): set `<PROVIDER>_OAUTH_TOKEN_FILE` with token metadata and optionally `<PROVIDER>_OAUTH_*` refresh env (`TOKEN_URL`, `REFRESH_TOKEN`, `CLIENT_ID`, `CLIENT_SECRET`, `SCOPE`).
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.main'
```

ask_ollama now uses alias-first model resolution.
- Default call: `model="default"`
- Alias examples: `default`, `fast`, `coder`
- Direct model names are allowed only when they exist in `models.ollama_catalog`

Behavior:
- An explicit model request (`model="coder"` etc.) performs a local install precheck via `ollama list`.
- If the requested model is not installed, it returns: `[MODEL ERROR] ... Install with: ollama pull <model>`
- For `model="default"`, the local fallback chain (`models.ollama_local_fallback_chain`) is attempted before cloud fallback.
- For `model="auto"`, an alias is selected by a lightweight prompt heuristic (fast/coder/default).
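The alias-first lookup order can be sketched as below. The function and the concrete alias/model names are illustrative assumptions, not the actual resolver or catalog:

```python
def resolve_ollama_model(requested, aliases, catalog):
    """Alias-first resolution: aliases take precedence, then direct
    catalog names; anything else is rejected."""
    if requested in aliases:
        return aliases[requested]
    if requested in catalog:
        return requested
    raise ValueError(f"[MODEL ERROR] unknown model or alias: {requested}")

# Hypothetical alias table and catalog for illustration.
aliases = {"default": "llama3.1:8b", "fast": "phi3:mini", "coder": "qwen2.5-coder:7b"}
catalog = set(aliases.values())

print(resolve_ollama_model("coder", aliases, catalog))      # alias hit
print(resolve_ollama_model("phi3:mini", aliases, catalog))  # direct catalog hit
```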
You can use the new unified tool:

```
ask(prompt, provider="auto|codex|gemini|ollama|claude_code", model="default|auto|...", reasoning_effort=None)
```

- For codex/gemini/claude_code, set `model="<provider-model-id>"` to forward model selection to each CLI.
- For codex/gemini/claude_code, the model trial policy is:
  - explicit `model` => try that model first, then retry once without `--model`
  - no explicit `model` => try provider catalog models in order, then retry once without `--model`
- Explicit `reasoning_effort` is currently supported for:
  - `codex`
  - `gemini` (sdk transport only)
  - `claude_code`
- Provider-specific transport mapping:
  - `codex` sdk => `payload.reasoning.effort`
  - `codex` subprocess => `codex exec -c model_reasoning_effort="..."`
  - `gemini` sdk => `generationConfig.thinkingConfig.thinkingLevel`
  - `gemini` subprocess => unsupported in this MCP
  - `claude_code` sdk => `output_config.effort` plus `thinking={"type":"adaptive"}` for Claude 4.6 aliases
  - `claude_code` subprocess => `claude --effort <level>`
- Claude effort handling also uses a transport-specific runtime probe:
  - confirmed unsupported on the current machine/account => hard fail
  - probe timeout or inconclusive transport/auth error => pass through to the normal request path
  - when set, provider fallback candidates are filtered to documented-compatible models only
- Common options across ask tools:
  - `timeout_seconds`
  - `max_output_tokens`
  - `response_format` (text/json)
  - `verbosity` (brief/normal/detailed)
  - `stream` (fallback chunk mode)
  - `session_id` (for optional session continuity)
Runtime behavior:
- Optional prompt cache (TTL + max entries).
- Optional session memory (TTL + max turns).
Use ask_batch(...) to process multiple prompts in one MCP call.
- `prompts: list[str]` (required)
- `mode: sequential|parallel` (default: `sequential`)
- `max_concurrency` (used when `mode=parallel`)
- Reuses existing ask options: `provider`, `model`, `force_model`, `timeout_seconds`, `response_format`, `verbosity`, `stream`, `session_id`
ask_batch executes within MCP server orchestration, so external client parallelism is not required.
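The sequential/parallel split with a concurrency cap can be sketched with `asyncio`. The `run_batch` and `fake_ask` names are illustrative, not the actual ask_batch implementation:

```python
import asyncio

async def run_batch(prompts, worker, mode="sequential", max_concurrency=2):
    """Sketch of sequential vs parallel batch orchestration with a
    concurrency cap, in the spirit of ask_batch."""
    if mode == "sequential":
        return [await worker(p) for p in prompts]
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(p):
        async with sem:  # at most max_concurrency workers in flight
            return await worker(p)

    return await asyncio.gather(*(bounded(p) for p in prompts))

async def fake_ask(prompt):  # stand-in for a real provider call
    await asyncio.sleep(0)
    return f"echo: {prompt}"

print(asyncio.run(run_batch(["a", "b"], fake_ask, mode="parallel")))
# ['echo: a', 'echo: b']
```

`asyncio.gather` preserves input order, so results line up with the `prompts` list regardless of completion order.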
Ollama safety behavior:
- For `provider="ollama"` in `mode="parallel"`, concurrency is automatically clamped by the runtime resource guard.
- The default conservative start is `1`.
- The guard uses runtime RAM/VRAM visibility and the model memory profile from config:
  - `runtime.ollama_resource_guard_*`
  - `runtime.ollama_model_memory_gb`
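A memory-based clamp of this kind reduces to a small calculation. This sketch assumes free RAM divided by per-model memory as the bound; the real guard also considers VRAM and other config knobs:

```python
def clamp_concurrency(requested, free_ram_gb, model_memory_gb, floor=1):
    """Clamp parallel ollama workers to what free memory can hold,
    never below a conservative floor of 1."""
    by_memory = int(free_ram_gb // model_memory_gb) if model_memory_gb > 0 else floor
    return max(floor, min(requested, by_memory))

print(clamp_concurrency(requested=8, free_ram_gb=20.0, model_memory_gb=6.0))  # 3
print(clamp_concurrency(requested=8, free_ram_gb=4.0, model_memory_gb=6.0))   # 1
```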
This repository now includes workflow-oriented skill definitions under skills/.
- `ask-general-workflow`
- `ask-review-workflow`
- `ask-code-writing-workflow`
- `ask-strict-json-workflow`
- `ask-batch-workflow`
- `ask-provider-routing-workflow`
Routing/trigger policy is documented in:
docs/skills/skill-routing-spec.md
The current operational health check verifies CLI availability from config.
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import shutil
from model_bridge.config.config_loader import load_config

cfg = load_config()
print("--- CLI Health Check ---")
for name in ["codex", "gemini", "ollama", "claude_code"]:
    cmd = cfg["commands"][name]["health"][0]
    status = "Online" if shutil.which(cmd) else "Offline"
    print(f"[{name.capitalize()}]: {status}")
PY'
```

Example output:

```
--- CLI Health Check ---
[Codex]: Online
[Gemini]: Online
[Ollama]: Online
[Claude_code]: Online
```
Response format example for ask_chatgpt_cli(prompt, force_model=True):
```
[Task Execution Failed]
Forced Primary (codex) failed.
Error: <service error>

--- [Routing Log] ---
[1] Primary (codex): Trying...
[FAILED]
```
Security block example:
```
[SECURITY BLOCK] Access to critical system path '/etc/' is strictly FORBIDDEN.
```
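A check that produces such blocks can be sketched as below. The pattern list and function name are illustrative, drawn from the security features documented in this README, not the actual `security/sanitizer.py` code:

```python
import re

# Illustrative subsets of the documented block patterns and sensitive paths.
BLOCK_PATTERNS = [r"rm\s+-rf", r"\bmkfs\b", r"dd\s+if=", r"chmod\s+777"]
SENSITIVE_PREFIXES = ("/etc/", "/var/", "/boot/", "/proc/", "/root/")

def check_prompt(prompt):
    """Return a block message for dangerous prompts, else None."""
    for pattern in BLOCK_PATTERNS:
        if re.search(pattern, prompt):
            return f"[SECURITY BLOCK] Destructive pattern detected: {pattern}"
    for prefix in SENSITIVE_PREFIXES:
        if prefix in prompt:
            return (f"[SECURITY BLOCK] Access to critical system path "
                    f"'{prefix}' is strictly FORBIDDEN.")
    return None

print(check_prompt("please cat /etc/passwd"))   # blocked: sensitive path
print(check_prompt("summarize this article"))   # None: allowed
```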
The MCP tool list_ollama_models() returns both configured and runtime availability info.
Includes:
- `default_model`
- `effective_default`
- `aliases`
- `recommended_aliases`
- `catalog`
- `installed`
- `missing`
- `pull_commands`
- `status` / `error`

JSON output contract:
- Schema: `schemas/list_ollama_models.schema.json`
- Unit validation: `tests/unit/test_response_contracts.py`
Example invocation and parsing:
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import json
from model_bridge.main import list_ollama_models

payload = json.loads(list_ollama_models())
print("status:", payload["status"])
print("effective_default:", payload["effective_default"])
print("installed_count:", len(payload["installed"]))
print("missing:", payload["missing"])
print("pull_commands:", payload.get("pull_commands", []))
PY'
```

Interpretation guide:
- `status="ok"`: runtime `ollama list` succeeded.
- `status="unavailable"`: local runtime inventory failed; see `error`.
- `missing`: configured models not currently installed.
- `pull_commands`: ready-to-run install commands for missing models.
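The relationship between `missing` and `pull_commands` is a direct derivation from the configured and installed sets. A minimal sketch (the model names are illustrative assumptions):

```python
def pull_commands(configured, installed):
    """Derive missing models and ready-to-run install commands,
    matching the list_ollama_models payload fields."""
    missing = [m for m in configured if m not in installed]
    return missing, [f"ollama pull {m}" for m in missing]

missing, cmds = pull_commands(
    configured=["llama3.1:8b", "qwen2.5-coder:7b"],
    installed={"llama3.1:8b"},
)
print(missing)  # ['qwen2.5-coder:7b']
print(cmds)     # ['ollama pull qwen2.5-coder:7b']
```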
Use list_provider_models(provider="all|codex|gemini|ollama|claude_code") to inspect model options per provider.
- `ollama`: dynamic runtime inventory (`installed`, `missing`, `pull_commands`).
- `codex` / `gemini` / `claude_code`: config-based catalog from:
  - `models.codex_model_catalog`
  - `models.gemini_model_catalog`
  - `models.claude_code_model_catalog`
- Each non-ollama provider includes `model_flag="--model"`, `default_model`, and configured command metadata.
- Current default catalogs:
  - codex: `gpt-5.4`, `gpt-5.3-codex`, `gpt-5.2-codex`, `gpt-5.1-codex-max`, `gpt-5.2`, `gpt-5.1-codex-mini`
  - gemini: `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash-lite`, `gemini-2.5-flash`, `gemini-3-pro-preview`
  - claude_code: `haiku`, `sonnet`, `opus`
- Codex `default_model` is the first catalog entry, currently `gpt-5.4`.
- Gemini `default_model` is the first catalog entry, currently `gemini-3.1-pro-preview`.
- Gemini `reasoning_effort` guardrails are model-specific:
  - `gemini-3.1-pro-preview`, `gemini-3-pro-preview`: `low`, `high`
  - `gemini-3-flash-preview`: `minimal`, `low`, `medium`, `high`
  - `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`: disabled in this MCP
- Codex `reasoning_effort` guardrails are model-specific:
  - `gpt-5.4`: `none`, `low`, `medium`, `high`, `xhigh`
  - `gpt-5.3-codex`, `gpt-5.2-codex`: `low`, `medium`, `high`, `xhigh`
  - `gpt-5.1-codex-max`, `gpt-5.1-codex-mini`: disabled in this MCP until documented support is confirmed
- Claude 4.6 `reasoning_effort` guardrails are model-specific:
  - `sonnet` / `claude-sonnet-4-6`: `low`, `medium`, `high`
  - `opus` / `claude-opus-4-6`: `low`, `medium`, `high`, `max`
  - `haiku`: disabled in this MCP
- Note: some Gemini preview models may require additional internal flags/account enablement.
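Per-model effort guardrails of this kind amount to a lookup-and-validate step. This sketch uses a hypothetical table with a few of the documented entries; the real guardrail config keys and error messages may differ:

```python
# Hypothetical guardrail table mirroring a subset of the documented limits.
EFFORT_GUARDRAILS = {
    "gpt-5.4": {"none", "low", "medium", "high", "xhigh"},
    "gemini-3.1-pro-preview": {"low", "high"},
    "claude-opus-4-6": {"low", "medium", "high", "max"},
}

def validate_effort(model, effort):
    """Reject reasoning_effort values outside a model's documented set."""
    allowed = EFFORT_GUARDRAILS.get(model, set())
    if not allowed:
        raise ValueError(f"reasoning_effort is disabled for {model} in this MCP")
    if effort not in allowed:
        raise ValueError(f"{model} supports {sorted(allowed)}, got {effort!r}")
    return effort

print(validate_effort("gemini-3.1-pro-preview", "high"))  # high
```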
Use list_orchestrator_capabilities() to inspect external orchestrator assumptions and the recommended execution policy.
- Recommended default: one MCP call + internal fan-out via `ask_batch(mode="parallel")`
- Capability matrix included for:
  - `codex`
  - `gemini`
  - `claude_code`
- Fallback rule: if external parallel behavior is uncertain, use MCP-internal parallel orchestration
Use `list_runtime_resources(model="default", requested_max_concurrency=1)` to inspect the runtime resource snapshot and the ollama concurrency recommendation.

- Returns:
  - `ram_total_gb`, `ram_free_gb`
  - `vram_total_gb`, `vram_free_gb`, `vram_detector`
  - `ollama_recommendation.applied_max_concurrency`
Use list_cli_noninteractive_policy() to inspect startup/trust prompt handling for each provider.
- `codex`
  - non-interactive path: `codex exec`
  - skip flag: `--skip-git-repo-check` (recommended in config)
- `gemini`
  - non-interactive path: `gemini -p`
  - no documented workspace-trust skip flag in current CLI help output
  - if stalled, complete one-time trust/auth in interactive mode
- `claude_code`
  - non-interactive path: `claude -p`
  - CLI help indicates the workspace trust dialog is skipped in `-p` mode
Use list_prompt_execution_policy() to inspect available prompt policy presets for ask(...) / ask_batch(...).
- Recommended deterministic call pattern:
  - `instruction_preset="strict_once"`
  - `response_format="json"`
- `strict_once` injects a fixed policy block that enforces:
  - single-response execution
  - no follow-up questions
  - explicit `Assumption` labeling when context is missing
  - strict output format compliance

Output cleanliness option:
- `output_mode="clean"` (default): strip known CLI startup/log noise lines from provider output.
- `output_mode="raw"`: keep original provider output (including startup/log lines).

Default policy without repeating args:
- Configure once in `runtime.ask_defaults`:
  - `instruction_preset: "strict_once"`
  - `output_mode: "clean"`
- Then `ask(...)` / `ask_batch(...)` automatically apply these when args are omitted.
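The "explicit args win, omitted args fall back to `runtime.ask_defaults`" rule can be sketched as follows; `apply_ask_defaults` is an illustrative helper, not the project's actual function:

```python
def apply_ask_defaults(call_args, ask_defaults):
    """Explicit call arguments win; omitted (None) args fall back to
    runtime.ask_defaults from config."""
    return {
        key: value if value is not None else ask_defaults.get(key)
        for key, value in call_args.items()
    }

defaults = {"instruction_preset": "strict_once", "output_mode": "clean"}
print(apply_ask_defaults(
    {"instruction_preset": None, "output_mode": "raw"}, defaults
))
# {'instruction_preset': 'strict_once', 'output_mode': 'raw'}
```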
Telemetry note:
- `model_bridge.telemetry` logs structured events to stderr.
- Current fields include `request_id`, `routing_tier`, `status`, `error_category`, and `latency_ms`.
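A structured stderr event with those fields could look like the following sketch. The `log_event` helper and the extra `ts` field are illustrative assumptions; the real `model_bridge.telemetry` module may format events differently:

```python
import json
import sys
import time

def log_event(request_id, routing_tier, status, error_category=None, latency_ms=0):
    """Emit one telemetry event to stderr as a JSON line, using the
    documented field names."""
    event = {
        "request_id": request_id,
        "routing_tier": routing_tier,
        "status": status,
        "error_category": error_category,
        "latency_ms": latency_ms,
        "ts": time.time(),  # illustrative extra field
    }
    print(json.dumps(event), file=sys.stderr)
    return event

evt = log_event("req-001", "primary", "ok", latency_ms=412)
```

Writing one JSON object per line keeps the stream grep-able and easy to ship to any log collector without interfering with MCP stdout traffic.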
- Destructive pattern blocking (`rm -rf`, `mkfs`, `dd if=`, `chmod 777`, fork bomb)
- Sensitive system path access blocking (`/etc/`, `/var/`, `/boot/`, `/proc/`, `/root/`)
- Restricted save destinations (`/etc`, `/var`, `/usr`, `/bin`, `/sbin`, `/root`)
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src pytest -q tests'
```

Integration smoke coverage:
- `tests/integration/test_tool_smoke.py` verifies the `ask_chatgpt_cli`, `ask_gemini_cli`, `ask_claude_code`, `ask_ollama`, and `list_ollama_models` entrypoint paths with minimal mocks.
- Legacy source:
  - `archive/coder_ai_allocator_v1.0.py`
  - `archive/coder_ai_allocator_v1.1.py`
- New entrypoint: `src/model_bridge/main.py`
- Existing tool signatures are preserved:
  - `ask_chatgpt_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_gemini_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_claude_code(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_ollama(prompt, save_path=None, model="default")`