Stop stuffing your LLM context windows. ToolFinder is a zero-hallucination routing middleware that dynamically connects local SLMs to massive Model Context Protocol (MCP) ecosystems without OOM crashes.
It separates tool selection from tool execution, so your model only sees the few schemas it actually needs. The result is lower latency, tighter prompts, and a much safer path from local-model experimentation to production-grade MCP orchestration.
Blindly binding 50 MCP tools to a small local model like llama3.2 is a systems mistake.
- The prompt fills with irrelevant schemas before reasoning even begins.
- The model spends tokens comparing tools instead of using them.
- Similar APIs start colliding in-context, which increases tool-selection errors and malformed calls.
- Even when the right tool is chosen, smaller models often emit partially invalid JSON under long prompt pressure.
This is the classic "lost in the middle" failure mode applied to MCP orchestration: the right tool may exist in context, but the model has to wade through too much irrelevant structure to use it reliably.
ToolFinder turns MCP tool usage into a retrieval problem first.
- A contrastive bi-encoder built on `sentence-transformers/all-mpnet-base-v2` embeds queries and MCP schemas into the same vector space.
- A FAISS-backed similarity index retrieves only the top-k candidate tools for the user's query.
- The model then reasons over a tiny, relevant tool surface instead of the entire ecosystem.
The rigorous data science, datasets, and semester-project evaluation pipeline live in `academic_research/`. That folder contains the training corpora, notebooks, and zero-shot evaluation stack behind the retrieval layer.
Technical note: the current runtime implementation uses FAISS `IndexFlatIP` for exact dense retrieval with very low observed latency. The architectural scaling story generalizes cleanly to ANN indexes when larger tool graphs justify sublinear search.
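The routing math reduces to exact maximum-inner-product search over normalized embeddings, which is what `IndexFlatIP` computes. A minimal pure-Python sketch of the same top-k logic, with toy hand-made vectors standing in for real sentence-transformer embeddings (tool names and dimensions are illustrative only):

```python
import math

def normalize(v):
    # Unit-normalize so inner product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy "embeddings" standing in for encoded MCP tool schemas.
tool_vectors = {
    "read_text_file": normalize([0.9, 0.1, 0.0]),
    "write_file":     normalize([0.1, 0.9, 0.0]),
    "list_directory": normalize([0.0, 0.2, 0.9]),
}

def route_top_k(query_vec, k=2):
    # Exact inner-product search: same semantics as FAISS IndexFlatIP,
    # just without the optimized C++ kernels.
    q = normalize(query_vec)
    scored = [(sum(a * b for a, b in zip(q, v)), name)
              for name, v in tool_vectors.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

print(route_top_k([0.8, 0.2, 0.1]))  # ['read_text_file', 'write_file']
```

With real embeddings, `normalize` and the scoring loop are replaced by the encoder and the FAISS index; the selection logic is otherwise identical.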
The repository contains two proof surfaces:
- A focused LangGraph A/B benchmark showing the first-turn efficiency win of semantic routing.
- A self-bootstrapping multi-task evaluator that continuously refreshes the benchmark table below.
- ~95% prompt payload reduction in the focused LangGraph proof (9,110 chars to 485 chars)
- ~84% latency reduction in the same first-turn comparison (85.52 s to 13.71 s)
- Sub-20 ms routing latency in the LangGraph routing path
The block below is updated only when you run `python examples/eval_toolfinder.py --update-readme`. Preserve the markers so the automated suite can continue injecting the latest metrics.
Last auto-updated: 2026-03-16 17:01:57
| Metric | Naive Baseline | ToolFinder Enabled |
|---|---|---|
| Tasks Run | 3 | 3 |
| Average Tools In Context | 14 | 2 |
| Average Context Payload (Chars) | 9106 | 1450 |
| Average Total Latency (s) | 57.51 | 14.47 |
| Average Inference Latency (s) | 57.47 | 14.39 |
| Successful Tool Calls | 3/3 | 3/3 |
| Expected Tool Matches | 3/3 | 3/3 |
| State Verified | 3/3 | 3/3 |
Task outcomes:
- T1_READ: naive=`read_text_file` verified=True, toolfinder=`read_text_file` verified=True
- T2_WRITE: naive=`write_file` verified=True, toolfinder=`write_file` verified=True
- T3_LIST: naive=`list_directory` verified=True, toolfinder=`list_directory` verified=True
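The marker-preserving auto-update described above can be sketched as a small regex substitution. The marker strings below are hypothetical; the actual markers used by `examples/eval_toolfinder.py` may differ:

```python
import re

# Hypothetical marker names; check the README for the real ones.
BEGIN = "<!-- TOOLFINDER_METRICS_START -->"
END = "<!-- TOOLFINDER_METRICS_END -->"

def inject_metrics(readme_text, table_markdown):
    # Replace everything between the markers, keeping the markers
    # themselves so the next run can inject again.
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    replacement = f"{BEGIN}\n{table_markdown}\n{END}"
    return pattern.sub(replacement, readme_text, count=1)

doc = f"Intro\n{BEGIN}\nold table\n{END}\nOutro"
print(inject_metrics(doc, "| Metric | Value |"))
```

This is why deleting the markers breaks the automation: without both sentinels the pattern no longer matches and the table silently stops refreshing.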
What changed between the headline proof and the live table?
The headline LangGraph benchmark measures a narrower, first-turn filesystem task and highlights the raw routing advantage. The auto-updating evaluator is broader: it runs multiple tasks, performs correctness checks, bootstraps and tears down the sandbox, and averages end-to-end inference time across the suite. The exact percentages shift, but the systems conclusion remains the same: routing a tiny tool subset is materially cheaper and safer than context stuffing.
ToolFinder hardens both selection and execution.
- Semantic routing narrows the prompt to the top-k MCP tools before inference.
- Strict schema enforcement injects `additionalProperties: false` into object schemas to reject speculative keys.
- AST recovery parsing salvages Python-style dicts and malformed local-model outputs when strict JSON fails.
- ReAct execution loops let the agent observe failures, retry, and continue rather than crash on the first malformed response.
- Idempotency guards and bounded scratchpads prevent repeated actions and runaway context growth.
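Two of these guards can be sketched with the standard library alone. This is an illustrative approximation of the schema-hardening and AST-recovery ideas, not ToolFinder's actual implementation:

```python
import ast
import json

def harden_schema(schema):
    # Recursively inject additionalProperties: false into every object
    # schema so speculative keys are rejected at validation time.
    if isinstance(schema, dict):
        if schema.get("type") == "object":
            schema.setdefault("additionalProperties", False)
        for value in schema.values():
            harden_schema(value)
    elif isinstance(schema, list):
        for item in schema:
            harden_schema(item)
    return schema

def parse_tool_args(raw):
    # Try strict JSON first; fall back to ast.literal_eval to salvage
    # Python-style dicts (single quotes, True/False) that small local
    # models often emit under long-prompt pressure.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)

schema = harden_schema({"type": "object",
                        "properties": {"path": {"type": "string"}}})
print(schema["additionalProperties"])          # False
print(parse_tool_args("{'path': 'out.txt'}"))  # {'path': 'out.txt'}
```

`ast.literal_eval` only evaluates literals, so the fallback cannot execute arbitrary model-generated code.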
CRITICAL: Install PyTorch with your required hardware acceleration (e.g., CUDA) before installing this package to avoid defaulting to slow CPU inference. Example: pip install torch --index-url https://download.pytorch.org/whl/cu121
Then install ToolFinder:
```bash
# 1. Install PyTorch with your specific hardware acceleration (e.g., CUDA)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# 2. Install ToolFinder
pip install toolfinder-mcp
```

Minimal integration with LangChain or LangGraph:
```python
from toolfinder.dynamic_faiss_router import UniversalMCPRouter
from langchain_ollama import ChatOllama

# Initialize and ingest tools
router = UniversalMCPRouter()
for tool in mcp_server_tools:  # your MCP server's tool list
    router.add_tool(tool)
router.build_index()

# Route and bind
llm = ChatOllama(model="llama3.2")
top_tools = router.route_top_k("Write a summary to output.txt", k=2)
llm_with_tools = llm.bind_tools(top_tools)  # bind only the routed subset
```
# ToolFinder
ToolFinder is a semantic routing and guarded execution layer for MCP tool ecosystems. It narrows a natural-language intent to a small set
of relevant tools (via semantic retrieval) and executes selected calls with schema validation, policy enforcement, and telemetry.
## Short status (accurate and agnostic)
- The code provides a local HTTP runtime that can be started from the repository root.
- The project includes a pair of workspace-level validation scripts (`start_server.py` and `run_e2e_client.py`) that exercise the end-to-end API.
- This repository implements a runtime and examples; it does not package an opinionated multi-node distributed deployment.
If you need a production, multi-node deployment you will need to add an operational layer (service discovery, secrets, auth, load balancing, etc.).
## What is in this repo
- `toolfinder/` – core semantic router and FAISS-backed retrieval utilities.
- `Enterprise/` – hybrid runtime, API, policy, executor, planner, telemetry, and OpenClaw bridge.
- `examples/` – scripted demos and benchmarks.
- `academic_research/` – notebooks and datasets used for evaluation.
- Root-level validation helpers: `start_server.py` and `run_e2e_client.py`.
## Installation (minimal)
Requires Python 3.10+. From the repository root:
```bash
python -m pip install -e .
```

Optional extras (examples and dev tools):

```bash
python -m pip install -e '.[dev]'
python -m pip install -e '.[langgraph]'
python -m pip install -e '.[enterprise]'
```

From a shell opened at the repository root:
- Start the local runtime:

  ```bash
  python start_server.py
  ```

- In a second shell (same working directory) run the client:

  ```bash
  python run_e2e_client.py
  ```

The client sends a JSON request with a single required field, `intent`, to `POST /execute` and prints the server response for inspection.
Notes:
- Run the commands from your repository root so relative imports and workspace checks work correctly.
- The validation scripts are convenience helpers; they are not intended to be production entry points.
The API is implemented by a factory function and should be launched via that factory rather than assuming a module-level `app` object. See `create_app(...)` and the request model `ExecuteIntentRequest`, both in `Enterprise/runtime/api.py`.
Endpoint:

```
POST /execute
```

Request JSON (required):

```json
{ "intent": "<user natural language intent>" }
```

Successful response (abridged):

```
{
  "session_id": "api-...",
  "execution_output": {
    "status": "complete|failed|partial",
    "answer": "...",
    "tool_calls": [ ... ],
    "telemetry": { ... }
  }
}
```

If the request fails schema validation or policy enforcement, the server returns an error payload (HTTP 4xx/5xx with an `error` field).
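A minimal client for this endpoint can be built with the standard library alone. The base URL below is an assumption; use whatever host and port `start_server.py` actually reports:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed default; check start_server.py output

def build_execute_request(intent, base_url=BASE_URL):
    # POST /execute takes a JSON body with the single required field "intent".
    body = json.dumps({"intent": intent}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_execute_request("Write a summary to output.txt")
print(req.get_method(), req.full_url)

# To actually send it (requires the runtime to be up):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["execution_output"]["status"])
```

This mirrors what `run_e2e_client.py` does end to end; the sketch only builds the request so it can be inspected without a running server.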
Core components:

- `HybridToolRegistry` – catalog + retrieval (FAISS/keyword fallback).
- `PolicyEngine` – path and argument guardrails.
- `HybridToolExecutor` – executes validated tool calls against MCP clients (or mocks).
- `OpenClawHybridPipeline` – end-to-end orchestration: routing → agent → tool execution → fallback → telemetry.
These are designed to be composable; the repo contains examples that wire them together for local validation.
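The composition order (routing, then policy, then execution, then telemetry) can be illustrated with simplified stand-ins. These toy classes are not the real `Enterprise/` interfaces; they only show how the pieces slot together:

```python
# Simplified stand-ins for HybridToolRegistry / PolicyEngine /
# HybridToolExecutor; the actual interfaces in Enterprise/ differ.
class Registry:
    def __init__(self, tools):
        self.tools = tools
    def route(self, intent):
        # Keyword fallback: pick tools whose name appears in the intent.
        return [t for t in self.tools if t in intent] or self.tools[:1]

class Policy:
    def check(self, tool, args):
        # Guardrail example: refuse sensitive path prefixes.
        return not str(args.get("path", "")).startswith("/etc")

class Executor:
    def run(self, tool, args):
        return {"tool": tool, "status": "complete"}

class Pipeline:
    def __init__(self, registry, policy, executor):
        self.registry, self.policy, self.executor = registry, policy, executor
        self.telemetry = []
    def execute(self, intent, args):
        results = []
        for tool in self.registry.route(intent):
            if not self.policy.check(tool, args):
                results.append({"tool": tool, "status": "blocked"})
                continue
            results.append(self.executor.run(tool, args))
        self.telemetry.append({"intent": intent, "calls": len(results)})
        return results

pipe = Pipeline(Registry(["read_text_file", "write_file"]), Policy(), Executor())
print(pipe.execute("write_file a summary", {"path": "out.txt"}))
```

Because each stage is a separate object, any one of them (e.g. the retrieval backend or the policy rules) can be swapped without touching the others, which is the composability the repo's examples exercise.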
This repository provides the runtime components and examples. A production distributed deployment requires additional operational components (deployment manifests, service discovery, secrets, authentication, logging/monitoring, scaling strategies) which are outside the scope of this codebase.
See `examples/` and `Enterprise/examples/` for many runnable demos. Typical commands (from repo root):

```bash
python examples/eval_toolfinder.py
python Enterprise/examples/run_e2e_hybrid.py --max-cycles 1
```

Additional reports:

- `ARCHITECTURE_REPORT.md`
- `SYSTEM_REALITY_REPORT.md`
- `ENTERPRISE_SYSTEM_REPORT.md`