A lightweight server for hosting and managing PyTerrier pipelines — with optional AI integration for dynamic pipeline selection via MCP.
PyTerrier Server provides a simple way to deploy, expose, and manage information retrieval pipelines built with PyTerrier.
It can run standalone or connect to an MCP server, which allows AI models to automatically choose the right pipeline for a given task.
Clone the repository and install dependencies:

```shell
git clone https://github.com/<your-username>/pyterrier-server.git
cd pyterrier-server
pip install -r requirements.txt
pip install -e .
```

Create a `.env` file based on the provided `.env.example` template:

```shell
cp .env.example .env
```

Then edit `.env` to include your configuration values.
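A minimal `.env` might look like the following sketch. The values shown are illustrative assumptions; consult `.env.example` for the actual set of variables, and see the pipeline configuration section below for `PYTERRIER_SERVER_PIPELINE`:

```shell
# Path to a single pipeline definition or a YAML file of pipelines
PYTERRIER_SERVER_PIPELINE=pipelines.yaml
# Optional: mark this process as the MCP server to separate logs
PYTERRIER_MCP=false
```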
Some versions of PyTerrier may not include the column_info attribute within pyterrier.model.
To ensure you’re using a compatible version, you can run the following script before starting the server:
```bash
#!/usr/bin/env bash
echo "🔍 Checking if pyterrier.model.column_info exists..."
python - <<'PYCODE'
import subprocess, sys
try:
    from pyterrier import model as pt_model
    _ = pt_model.column_info
    print("✅ pyterrier.model.column_info exists, skipping repo clone.")
except (ImportError, AttributeError):
    print("⚠️ pyterrier.model.column_info missing — cloning replacement repo.")
    subprocess.run(["git", "clone", "https://github.com/terrier-org/pyterrier.git", "pyterrier_src"], check=True)
    subprocess.run([sys.executable, "-m", "pip", "install", "--force-reinstall", "./pyterrier_src"], check=True)
PYCODE
```

💡 Tip: You can save this as a script (e.g., `check_pyterrier.sh`) and run it before deployment or server startup.
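If you prefer to run the same probe from Python instead of a shell heredoc, it can be sketched with `importlib` (the helper name here is hypothetical, not part of this project):

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# The shell script's check, expressed as a single call:
# module_has_attr("pyterrier.model", "column_info")
```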
You can define one or more PyTerrier pipelines that the server will serve.
Set the PYTERRIER_SERVER_PIPELINE environment variable to:
- A single pipeline definition, or
- A YAML file containing multiple pipelines.
Each pipeline in the YAML file must follow this structure:
```yaml
functions:
  - name: <pipeline name>
    task: <purpose>
    description: <detailed description>
    pipeline: |
      # Python code defining the pipeline
      # Must assign the final pipeline to a variable named 'p'
```

For example:

```yaml
functions:
  - name: MSMARCO-search
    task: search
    description: Use this function to retrieve relevant documents from the MSMARCO passage dataset using a BM25 index.
    pipeline: |
      import pyterrier_pisa, pyterrier as pt
      dataset = pt.get_dataset('irds:msmarco-passage')
      index = pyterrier_pisa.PisaIndex.from_hf('macavaney/msmarco-passage.pisa').bm25()
      p = index % 10 >> dataset.text_loader()
```

PyTerrier Server consists of two components:
- MCP Server (optional) — for AI-assisted pipeline selection.
- Main Server — the core service that executes your PyTerrier pipelines.
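To make the execution model concrete: each `pipeline:` block is plain Python that must leave the finished pipeline in a variable named `p`. A minimal sketch of how such a block could be loaded is shown below; the function name and the `exec`-based approach are assumptions for illustration, not the server's actual internals:

```python
def load_pipeline(code: str):
    """Run a pipeline definition and return the object bound to 'p'."""
    namespace: dict = {}
    exec(code, namespace)  # execute the user-supplied snippet
    if "p" not in namespace:
        raise ValueError("pipeline code must assign the result to 'p'")
    return namespace["p"]


# A toy definition standing in for a real PyTerrier pipeline:
toy = load_pipeline("p = lambda query: query.upper()")
print(toy("bm25"))  # → BM25
```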
If you want to enable AI-based pipeline selection, start the MCP server:
```shell
export PYTERRIER_MCP=true  # Optional: helps separate logs between servers
pyterrier-mcp
```

or

```shell
export PYTERRIER_MCP=true  # Optional: helps separate logs between servers
python -m pyterrier_server._mcp_server
```

This runs the MCP server locally.
To make it accessible to external AI models (e.g., OpenAI), you must expose it publicly. For local development, use ngrok:

```shell
ngrok http 8000
```

💡 Tip: Any method that exposes your localhost to the internet will work (e.g., localtunnel, cloud hosting).
In another terminal window:
```shell
pyterrier-server
```

or

```shell
python -m pyterrier_server._server
```

If the main server can’t reach the MCP server, it will automatically hide the AI-assisted features.
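The fallback just described can be sketched as a simple TCP reachability probe. The host, port, and function name below are illustrative assumptions; the real server's detection logic may differ:

```python
import socket


def mcp_reachable(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the MCP server succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# The main server could hide AI-assisted features when this returns False.
```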
| Version | Date | Changes |
|---|---|---|
| 0.1 | 2025-10-16 | Initial release |
This project is licensed under the MIT License — see the LICENSE.md file for details.
