This project modularizes the monolithic MCP server from archive/coder_ai_allocator_v1.0.py / archive/coder_ai_allocator_v1.1.py into the src/model_bridge structure.
main.py (MCP tools)
-> config/config_loader.py
-> security/sanitizer.py
-> core/failover_manager.py
-> adapters/subprocess_adapter.py
- Standard development environment: `model-bridge-mcp_dev` (conda)
- Environment guide: `ENVIRONMENT.md`
- Environment snapshot: `environment/model-bridge-mcp_dev.yml`

```bash
conda create -n model-bridge-mcp_dev python=3.11 -y
conda activate model-bridge-mcp_dev
python -m pip install mcp PyYAML pytest
```

The default configuration file is `src/model_bridge/config/default.yaml`.
- `commands`: codex/gemini/ollama/claude_code execution and health commands
- `routing.default_chains`: default failover chain per tool
- `models`: default/final-backup ollama models, catalog, aliases, local fallback chain
- `security`: block patterns and sensitive paths
- `runtime.system_suffix`: CLI prompt suffix
- `runtime.apply_system_suffix`: per-service suffix application policy
- `runtime.transport_mode`: `subprocess` (default) or `sdk` (scaffold)
- `runtime.extra_path`: additional PATH directories for CLI discovery (see below)
The MCP server automatically discovers CLI tools installed via version managers.

Auto-detected paths:
- Node.js: nvm (`~/.nvm/versions/node/*/bin`), fnm, volta
- Python: pyenv, conda (miniconda3/anaconda3)
- Ruby: rbenv, rvm
- Rust: cargo (`~/.cargo/bin`)
- Go: `~/.go/bin`, `~/go/bin`
- User local: `~/.local/bin`
User-specified paths (highest priority):

If auto-discovery fails or you need custom paths, add them in config:

```yaml
runtime:
  extra_path:
    - /custom/path/to/bin
    - ~/another/path
```

Priority order (highest first):
- User-specified `extra_path` from config
- Auto-discovered version manager paths
- System PATH at MCP server startup
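The PATH priority merge above can be sketched as follows. This is a minimal illustration, not the actual config-loader code; `build_search_path` and its argument names are assumptions.

```python
import os
from pathlib import Path

def build_search_path(extra_path, discovered, system_path):
    """Merge PATH sources in priority order: user config first, then
    auto-discovered version-manager dirs, then the inherited PATH.
    Duplicates keep their first (highest-priority) position."""
    merged = []
    seen = set()
    for entry in [*extra_path, *discovered, *system_path]:
        resolved = str(Path(entry).expanduser())
        if resolved not in seen:
            seen.add(resolved)
            merged.append(resolved)
    return os.pathsep.join(merged)

print(build_search_path(
    ["/custom/path/to/bin"],
    ["~/.cargo/bin"],
    ["/usr/bin", "/custom/path/to/bin"],  # duplicate of a config entry is dropped
))
```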
Some CLI providers require environment variables for authentication (e.g., Google Cloud Vertex AI). MCP servers may not inherit shell profile variables due to non-interactive shell execution.
Configure required environment variables:
```yaml
runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
    # Other provider vars as needed:
    # OPENAI_API_KEY: "sk-..."
    # ANTHROPIC_API_KEY: "sk-ant-..."
```

Priority order (highest first):
- User-specified `extra_env_vars` from config
- Auto-discovered from login shell (if accessible)
- MCP server process environment
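The three-level precedence above behaves like a layered lookup. A minimal sketch using the standard library's `collections.ChainMap` (the variable names and sample values are illustrative):

```python
from collections import ChainMap

# Highest priority first: ChainMap returns the value from the first
# mapping that contains a key, mirroring the documented precedence.
config_vars = {"GOOGLE_CLOUD_PROJECT": "your-project-id"}
login_shell_vars = {"GOOGLE_CLOUD_PROJECT": "shell-project", "PATH": "/usr/bin"}
process_env = {"HOME": "/home/user"}

effective = ChainMap(config_vars, login_shell_vars, process_env)
print(effective["GOOGLE_CLOUD_PROJECT"])  # config value wins
print(effective["PATH"])                  # falls through to the login shell layer
```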
For machine-specific settings (API keys, project IDs, paths), use a local config file that won't be committed to git:
Location: `~/.model_bridge/local.yaml`

```yaml
# ~/.model_bridge/local.yaml
runtime:
  extra_env_vars:
    GOOGLE_CLOUD_PROJECT: "your-project-id"
    GOOGLE_CLOUD_LOCATION: "us-central1"
  extra_path:
    - ~/custom/bin
```

Merge behavior: the local config is deep-merged on top of the default config. Only specify values you want to override.

Note: list fields (like `extra_path`) are replaced, not concatenated. If the default has `extra_path: ["/a"]` and the local file has `extra_path: ["~/b"]`, the result is `["~/b"]` only.
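The deep-merge-with-list-replacement behavior can be sketched like this; `deep_merge` is an illustrative stand-in, not the project's actual loader function:

```python
def deep_merge(base, override):
    """Recursively merge dicts; non-dict values (including lists)
    from override replace base values outright."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

default = {"runtime": {"extra_path": ["/a"], "transport_mode": "subprocess"}}
local = {"runtime": {"extra_path": ["~/b"]}}
print(deep_merge(default, local))
# {'runtime': {'extra_path': ['~/b'], 'transport_mode': 'subprocess'}}
```

Note how `extra_path` is replaced wholesale while the untouched `transport_mode` survives from the default layer.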
Config loader verification:

```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.config.config_loader --pretty'
```

model-bridge-mcp supports a plugin architecture for extending with custom AI providers.
- Built-in: `src/model_bridge/plugins/builtins/`
- User: `~/.model_bridge/plugins/`
```python
# ~/.model_bridge/plugins/my_provider/plugin.py
from model_bridge.plugins import ProviderPlugin, register_provider

@register_provider
class MyProvider(ProviderPlugin):
    @property
    def provider_id(self) -> str:
        return "my_custom"

    async def execute(self, prompt: str, model, options, **kwargs) -> str:
        # Your implementation
        return "response"
```

See docs/PLUGIN_GUIDE.md for full documentation.
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -c "from model_bridge.main import mcp; print(type(mcp).__name__)"'
```

Runtime initialization note:
- Runtime dependencies (`config`, `adapter`, `failover`) are initialized lazily on first tool call.
- Importing `model_bridge.main` no longer eagerly loads runtime configuration.
- `runtime.transport_mode=sdk` currently supports direct API for `codex`, `gemini`, `claude_code`, and `ollama`.
- `codex` SDK auth priority: `OPENAI_API_KEY` (recommended) -> `OPENAI_ACCESS_TOKEN` (manual OAuth token path).
- `gemini` SDK auth priority: `GEMINI_API_KEY` (recommended) -> `GOOGLE_API_KEY`.
- `claude_code` SDK auth: `ANTHROPIC_API_KEY` -> `ANTHROPIC_OAUTH_ACCESS_TOKEN` / `ANTHROPIC_ACCESS_TOKEN` (model alias override via `ANTHROPIC_MODEL*` env supported).
- `ollama` SDK auth is not required (local HTTP endpoint; configurable via `OLLAMA_BASE_URL`).
- OAuth refresh automation (OpenAI/Gemini/Anthropic): set `<PROVIDER>_OAUTH_TOKEN_FILE` with token metadata and optionally `<PROVIDER>_OAUTH_*` refresh env (`TOKEN_URL`, `REFRESH_TOKEN`, `CLIENT_ID`, `CLIENT_SECRET`, `SCOPE`).
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python -m model_bridge.main'
```

ask_ollama now uses alias-first model resolution.
- Default call: `model="default"`
- Alias examples: `default`, `fast`, `coder`
- Direct model names are allowed only when they exist in `models.ollama_catalog`

Behavior:
- An explicit model request (`model="coder"` etc.) performs a local install precheck via `ollama list`.
- If the requested model is not installed, it returns: `[MODEL ERROR] ... Install with: ollama pull <model>`
- For `model="default"`, the local fallback chain (`models.ollama_local_fallback_chain`) is attempted before cloud fallback.
- For `model="auto"`, an alias is selected by a lightweight prompt heuristic (fast/coder/default).
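The alias-first lookup order can be sketched as below. The function and the concrete alias/model names are illustrative assumptions, not the actual resolver or catalog:

```python
def resolve_ollama_model(requested, aliases, catalog):
    """Alias-first resolution: aliases take precedence, then direct
    catalog names; anything else is rejected."""
    if requested in aliases:
        return aliases[requested]
    if requested in catalog:
        return requested
    raise ValueError(f"[MODEL ERROR] unknown model or alias: {requested}")

# Hypothetical alias table and catalog for illustration.
aliases = {"default": "llama3.1:8b", "fast": "phi3:mini", "coder": "qwen2.5-coder:7b"}
catalog = set(aliases.values())

print(resolve_ollama_model("coder", aliases, catalog))      # alias hit
print(resolve_ollama_model("phi3:mini", aliases, catalog))  # direct catalog hit
```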
You can use the new unified tool:

```
ask(prompt, provider="auto|codex|gemini|ollama|claude_code", model="default|auto|...", reasoning_effort=None)
```

- For codex/gemini/claude_code, set `model="<provider-model-id>"` to forward model selection to each CLI.
- For codex/gemini/claude_code, the model trial policy is:
  - explicit `model` => try that model first, then retry once without `--model`
  - no explicit `model` => try provider catalog models in order, then retry once without `--model`
- Explicit `reasoning_effort` is currently supported for:
  - `codex`
  - `gemini` (sdk transport only)
  - `claude_code`
- Provider-specific transport mapping:
  - `codex` sdk => `payload.reasoning.effort`
  - `codex` subprocess => `codex exec -c model_reasoning_effort="..."`
  - `gemini` sdk => `generationConfig.thinkingConfig.thinkingLevel`
  - `gemini` subprocess => unsupported in this MCP
  - `claude_code` sdk => `output_config.effort` plus `thinking={"type":"adaptive"}` for Claude 4.6 aliases
  - `claude_code` subprocess => `claude --effort <level>`
- Claude effort handling also uses a transport-specific runtime probe:
  - confirmed unsupported on the current machine/account => hard fail
  - probe timeout or inconclusive transport/auth error => pass through to the normal request path
  - when set, provider fallback candidates are filtered to documented-compatible models only
- Common options across ask tools:
  - `timeout_seconds`
  - `max_output_tokens`
  - `response_format` (text/json)
  - `verbosity` (brief/normal/detailed)
  - `stream` (fallback chunk mode)
  - `session_id` (for optional session continuity)
Runtime behavior:
- Optional prompt cache (TTL + max entries).
- Optional session memory (TTL + max turns).
Use ask_batch(...) to process multiple prompts in one MCP call.
- `prompts: list[str]` (required)
- `mode: sequential|parallel` (default: `sequential`)
- `max_concurrency` (used when `mode=parallel`)
- Reuses existing ask options: `provider`, `model`, `force_model`, `timeout_seconds`, `response_format`, `verbosity`, `stream`, `session_id`
ask_batch executes within MCP server orchestration, so external client parallelism is not required.
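The sequential/parallel split with a concurrency cap can be sketched with `asyncio`. The `run_batch` and `fake_ask` names are illustrative, not the actual ask_batch implementation:

```python
import asyncio

async def run_batch(prompts, worker, mode="sequential", max_concurrency=2):
    """Sketch of sequential vs parallel batch orchestration with a
    concurrency cap, in the spirit of ask_batch."""
    if mode == "sequential":
        return [await worker(p) for p in prompts]
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(p):
        async with sem:  # at most max_concurrency workers in flight
            return await worker(p)

    return await asyncio.gather(*(bounded(p) for p in prompts))

async def fake_ask(prompt):  # stand-in for a real provider call
    await asyncio.sleep(0)
    return f"echo: {prompt}"

print(asyncio.run(run_batch(["a", "b"], fake_ask, mode="parallel")))
# ['echo: a', 'echo: b']
```

`asyncio.gather` preserves input order, so results line up with the `prompts` list regardless of completion order.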
Ollama safety behavior:
- For `provider="ollama"` in `mode="parallel"`, concurrency is automatically clamped by the runtime resource guard.
- The default conservative start is `1`.
- The guard uses runtime RAM/VRAM visibility and the model memory profile from config:
  - `runtime.ollama_resource_guard_*`
  - `runtime.ollama_model_memory_gb`
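A memory-based clamp of this kind reduces to a small calculation. This sketch assumes free RAM divided by per-model memory as the bound; the real guard also considers VRAM and other config knobs:

```python
def clamp_concurrency(requested, free_ram_gb, model_memory_gb, floor=1):
    """Clamp parallel ollama workers to what free memory can hold,
    never below a conservative floor of 1."""
    by_memory = int(free_ram_gb // model_memory_gb) if model_memory_gb > 0 else floor
    return max(floor, min(requested, by_memory))

print(clamp_concurrency(requested=8, free_ram_gb=20.0, model_memory_gb=6.0))  # 3
print(clamp_concurrency(requested=8, free_ram_gb=4.0, model_memory_gb=6.0))   # 1
```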
This repository now includes workflow-oriented skill definitions under skills/.
- `ask-general-workflow`
- `ask-review-workflow`
- `ask-code-writing-workflow`
- `ask-strict-json-workflow`
- `ask-batch-workflow`
- `ask-provider-routing-workflow`
Routing/trigger policy is documented in:
docs/skills/skill-routing-spec.md
The current operational health check verifies CLI availability from config.
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import shutil
from model_bridge.config.config_loader import load_config

cfg = load_config()
print("--- CLI Health Check ---")
for name in ["codex", "gemini", "ollama", "claude_code"]:
    cmd = cfg["commands"][name]["health"][0]
    status = "Online" if shutil.which(cmd) else "Offline"
    print(f"[{name.capitalize()}]: {status}")
PY'
```

Example output:

```
--- CLI Health Check ---
[Codex]: Online
[Gemini]: Online
[Ollama]: Online
[Claude_code]: Online
```
Response format example for ask_chatgpt_cli(prompt, force_model=True):
```
[Task Execution Failed]
Forced Primary (codex) failed.
Error: <service error>

--- [Routing Log] ---
[1] Primary (codex): Trying...
[FAILED]
```
Security block example:
```
[SECURITY BLOCK] Access to critical system path '/etc/' is strictly FORBIDDEN.
```
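A check that produces such blocks can be sketched as below. The pattern list and function name are illustrative, drawn from the security features documented in this README, not the actual `security/sanitizer.py` code:

```python
import re

# Illustrative subsets of the documented block patterns and sensitive paths.
BLOCK_PATTERNS = [r"rm\s+-rf", r"\bmkfs\b", r"dd\s+if=", r"chmod\s+777"]
SENSITIVE_PREFIXES = ("/etc/", "/var/", "/boot/", "/proc/", "/root/")

def check_prompt(prompt):
    """Return a block message for dangerous prompts, else None."""
    for pattern in BLOCK_PATTERNS:
        if re.search(pattern, prompt):
            return f"[SECURITY BLOCK] Destructive pattern detected: {pattern}"
    for prefix in SENSITIVE_PREFIXES:
        if prefix in prompt:
            return (f"[SECURITY BLOCK] Access to critical system path "
                    f"'{prefix}' is strictly FORBIDDEN.")
    return None

print(check_prompt("please cat /etc/passwd"))   # blocked: sensitive path
print(check_prompt("summarize this article"))   # None: allowed
```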
The MCP tool list_ollama_models() returns both configured and runtime availability info.
Includes:
- `default_model`
- `effective_default`
- `aliases`
- `recommended_aliases`
- `catalog`
- `installed`
- `missing`
- `pull_commands`
- `status` / `error`

JSON output contract:
- Schema: `schemas/list_ollama_models.schema.json`
- Unit validation: `tests/unit/test_response_contracts.py`
Example invocation and parsing:
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src python - <<"PY"
import json
from model_bridge.main import list_ollama_models

payload = json.loads(list_ollama_models())
print("status:", payload["status"])
print("effective_default:", payload["effective_default"])
print("installed_count:", len(payload["installed"]))
print("missing:", payload["missing"])
print("pull_commands:", payload.get("pull_commands", []))
PY'
```

Interpretation guide:
- `status="ok"`: runtime `ollama list` succeeded.
- `status="unavailable"`: local runtime inventory failed; see `error`.
- `missing`: configured models not currently installed.
- `pull_commands`: ready-to-run install commands for missing models.
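The relationship between `missing` and `pull_commands` is a direct derivation from the configured and installed sets. A minimal sketch (the model names are illustrative assumptions):

```python
def pull_commands(configured, installed):
    """Derive missing models and ready-to-run install commands,
    matching the list_ollama_models payload fields."""
    missing = [m for m in configured if m not in installed]
    return missing, [f"ollama pull {m}" for m in missing]

missing, cmds = pull_commands(
    configured=["llama3.1:8b", "qwen2.5-coder:7b"],
    installed={"llama3.1:8b"},
)
print(missing)  # ['qwen2.5-coder:7b']
print(cmds)     # ['ollama pull qwen2.5-coder:7b']
```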
Use list_provider_models(provider="all|codex|gemini|ollama|claude_code") to inspect model options per provider.
- `ollama`: dynamic runtime inventory (`installed`, `missing`, `pull_commands`).
- `codex` / `gemini` / `claude_code`: config-based catalog from:
  - `models.codex_model_catalog`
  - `models.gemini_model_catalog`
  - `models.claude_code_model_catalog`
- Each non-ollama provider includes `model_flag="--model"`, `default_model`, and configured command metadata.
- Current default catalogs:
  - codex: `gpt-5.4`, `gpt-5.3-codex`, `gpt-5.2-codex`, `gpt-5.1-codex-max`, `gpt-5.2`, `gpt-5.1-codex-mini`
  - gemini: `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash-lite`, `gemini-2.5-flash`, `gemini-3-pro-preview`
  - claude_code: `haiku`, `sonnet`, `opus`
- Codex `default_model` is the first catalog entry, currently `gpt-5.4`.
- Gemini `default_model` is the first catalog entry, currently `gemini-3.1-pro-preview`.
- Gemini `reasoning_effort` guardrails are model-specific:
  - `gemini-3.1-pro-preview`, `gemini-3-pro-preview`: `low`, `high`
  - `gemini-3-flash-preview`: `minimal`, `low`, `medium`, `high`
  - `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`: disabled in this MCP
- Codex `reasoning_effort` guardrails are model-specific:
  - `gpt-5.4`: `none`, `low`, `medium`, `high`, `xhigh`
  - `gpt-5.3-codex`, `gpt-5.2-codex`: `low`, `medium`, `high`, `xhigh`
  - `gpt-5.1-codex-max`, `gpt-5.1-codex-mini`: disabled in this MCP until documented support is confirmed
- Claude 4.6 `reasoning_effort` guardrails are model-specific:
  - `sonnet` / `claude-sonnet-4-6`: `low`, `medium`, `high`
  - `opus` / `claude-opus-4-6`: `low`, `medium`, `high`, `max`
  - `haiku`: disabled in this MCP
- Note: some Gemini preview models may require additional internal flags/account enablement.
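Per-model effort guardrails of this kind amount to a lookup-and-validate step. This sketch uses a hypothetical table with a few of the documented entries; the real guardrail config keys and error messages may differ:

```python
# Hypothetical guardrail table mirroring a subset of the documented limits.
EFFORT_GUARDRAILS = {
    "gpt-5.4": {"none", "low", "medium", "high", "xhigh"},
    "gemini-3.1-pro-preview": {"low", "high"},
    "claude-opus-4-6": {"low", "medium", "high", "max"},
}

def validate_effort(model, effort):
    """Reject reasoning_effort values outside a model's documented set."""
    allowed = EFFORT_GUARDRAILS.get(model, set())
    if not allowed:
        raise ValueError(f"reasoning_effort is disabled for {model} in this MCP")
    if effort not in allowed:
        raise ValueError(f"{model} supports {sorted(allowed)}, got {effort!r}")
    return effort

print(validate_effort("gemini-3.1-pro-preview", "high"))  # high
```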
Use list_orchestrator_capabilities() to inspect external orchestrator assumptions and the recommended execution policy.
- Recommended default: one MCP call + internal fan-out via `ask_batch(mode="parallel")`
- Capability matrix included for:
  - `codex`
  - `gemini`
  - `claude_code`
- Fallback rule: if external parallel behavior is uncertain, use MCP-internal parallel orchestration
Use `list_runtime_resources(model="default", requested_max_concurrency=1)` to inspect the runtime resource snapshot and the ollama concurrency recommendation.

- Returns:
  - `ram_total_gb`, `ram_free_gb`
  - `vram_total_gb`, `vram_free_gb`, `vram_detector`
  - `ollama_recommendation.applied_max_concurrency`
Use list_cli_noninteractive_policy() to inspect startup/trust prompt handling for each provider.
- `codex`
  - non-interactive path: `codex exec`
  - skip flag: `--skip-git-repo-check` (recommended in config)
- `gemini`
  - non-interactive path: `gemini -p`
  - no documented workspace-trust skip flag in current CLI help output
  - if stalled, complete one-time trust/auth in interactive mode
- `claude_code`
  - non-interactive path: `claude -p`
  - CLI help indicates the workspace trust dialog is skipped in `-p` mode
Use list_prompt_execution_policy() to inspect available prompt policy presets for ask(...) / ask_batch(...).
- Recommended deterministic call pattern:
  - `instruction_preset="strict_once"`
  - `response_format="json"`
- `strict_once` injects a fixed policy block that enforces:
  - single-response execution
  - no follow-up questions
  - explicit `Assumption` labeling when context is missing
  - strict output format compliance

Output cleanliness option:
- `output_mode="clean"` (default): strip known CLI startup/log noise lines from provider output.
- `output_mode="raw"`: keep original provider output (including startup/log lines).

Default policy without repeating args:
- Configure once in `runtime.ask_defaults`:
  - `instruction_preset: "strict_once"`
  - `output_mode: "clean"`
- Then `ask(...)` / `ask_batch(...)` automatically apply these when args are omitted.
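The "explicit args win, omitted args fall back to `runtime.ask_defaults`" rule can be sketched as follows; `apply_ask_defaults` is an illustrative helper, not the project's actual function:

```python
def apply_ask_defaults(call_args, ask_defaults):
    """Explicit call arguments win; omitted (None) args fall back to
    runtime.ask_defaults from config."""
    return {
        key: value if value is not None else ask_defaults.get(key)
        for key, value in call_args.items()
    }

defaults = {"instruction_preset": "strict_once", "output_mode": "clean"}
print(apply_ask_defaults(
    {"instruction_preset": None, "output_mode": "raw"}, defaults
))
# {'instruction_preset': 'strict_once', 'output_mode': 'raw'}
```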
Telemetry note:
- `model_bridge.telemetry` logs structured events to stderr.
- Current fields include `request_id`, `routing_tier`, `status`, `error_category`, and `latency_ms`.
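A structured stderr event with those fields could look like the following sketch. The `log_event` helper and the extra `ts` field are illustrative assumptions; the real `model_bridge.telemetry` module may format events differently:

```python
import json
import sys
import time

def log_event(request_id, routing_tier, status, error_category=None, latency_ms=0):
    """Emit one telemetry event to stderr as a JSON line, using the
    documented field names."""
    event = {
        "request_id": request_id,
        "routing_tier": routing_tier,
        "status": status,
        "error_category": error_category,
        "latency_ms": latency_ms,
        "ts": time.time(),  # illustrative extra field
    }
    print(json.dumps(event), file=sys.stderr)
    return event

evt = log_event("req-001", "primary", "ok", latency_ms=412)
```

Writing one JSON object per line keeps the stream grep-able and easy to ship to any log collector without interfering with MCP stdout traffic.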
- Destructive pattern blocking (`rm -rf`, `mkfs`, `dd if=`, `chmod 777`, fork bomb)
- Sensitive system path access blocking (`/etc/`, `/var/`, `/boot/`, `/proc/`, `/root/`)
- Restricted save destinations (`/etc`, `/var`, `/usr`, `/bin`, `/sbin`, `/root`)
```bash
conda run -n model-bridge-mcp_dev bash -lc 'PYTHONPATH=src pytest -q tests'
```

Integration smoke coverage:
- `tests/integration/test_tool_smoke.py` verifies the `ask_chatgpt_cli`, `ask_gemini_cli`, `ask_claude_code`, `ask_ollama`, and `list_ollama_models` entrypoint paths with minimal mocks.
- Legacy source:
  - `archive/coder_ai_allocator_v1.0.py`
  - `archive/coder_ai_allocator_v1.1.py`
- New entrypoint: `src/model_bridge/main.py`
- Existing tool signatures are preserved:
  - `ask_chatgpt_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_gemini_cli(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_claude_code(prompt, save_path=None, force_model=False, model=None, reasoning_effort=None)`
  - `ask_ollama(prompt, save_path=None, model="default")`