Problem
`src/agentevals/mcp_server.py` exposes 5 MCP tools for Claude Code integration — a key part of the product's developer story — but there is zero test coverage for the MCP server:
- No unit tests for any of the 5 tool handlers
- No integration tests verifying the MCP server starts and responds correctly
- No tests for error cases (invalid session IDs, missing eval sets, etc.)
The MCP server is explicitly called out in the README as a primary interface. Shipping it untested is a reliability risk.
Tools That Need Test Coverage
The 5 MCP tools exposed (from `mcp_server.py`), with a registration smoke-test sketch after the list:
- `list_sessions` — list available sessions
- `get_session` — retrieve session detail
- `run_evaluation` — trigger evaluation against an eval set
- `list_eval_sets` — list configured eval sets
- `get_evaluation_result` — retrieve evaluation results
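If `mcp_server.py` is built on the MCP Python SDK's `FastMCP` class (an assumption; the issue does not say which server implementation is used), a cheap first test is a registration smoke test asserting that all five tools are exposed under the expected names:

```python
# Sketch only: assumes mcp_server exposes a FastMCP instance named `mcp`;
# adjust the import and attribute name to match the real module.
import pytest

from agentevals import mcp_server

EXPECTED_TOOLS = {
    "list_sessions",
    "get_session",
    "run_evaluation",
    "list_eval_sets",
    "get_evaluation_result",
}

@pytest.mark.asyncio
async def test_all_tools_registered():
    # FastMCP.list_tools() returns the registered tool definitions.
    tools = await mcp_server.mcp.list_tools()
    assert {tool.name for tool in tools} == EXPECTED_TOOLS
```

This catches renamed or silently dropped tools before any handler logic is exercised.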
Suggested Test Approach
Create `tests/test_mcp_server.py` with:
```python
# Unit tests using a mocked TraceManager
import pytest

@pytest.mark.asyncio
async def test_list_sessions_empty():
    ...

@pytest.mark.asyncio
async def test_run_evaluation_success():
    ...

@pytest.mark.asyncio
async def test_run_evaluation_invalid_session():
    ...
```
Use `pytest-asyncio` (already a dev dependency) and mock the `TraceManager` and evaluator pipeline.
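As a sketch of that mocking approach (the `trace_manager` attribute, the `get_session` handler signature, and the expected error message below are assumptions, since the issue does not show `mcp_server.py`'s internals), an `AsyncMock`-backed `TraceManager` can drive the invalid-session error path:

```python
# Sketch only: assumes mcp_server holds a module-level `trace_manager` and that
# `get_session` is an async tool handler taking a session_id keyword argument.
from unittest.mock import AsyncMock

import pytest

from agentevals import mcp_server

@pytest.mark.asyncio
async def test_get_session_invalid_id(monkeypatch):
    fake_tm = AsyncMock()
    fake_tm.get_session.return_value = None  # unknown session ID
    monkeypatch.setattr(mcp_server, "trace_manager", fake_tm)

    result = await mcp_server.get_session(session_id="does-not-exist")

    # Expected behavior is an assumption: the tool should return a structured
    # error message rather than raise, so Claude Code gets useful feedback.
    assert "not found" in str(result).lower()
```

The same pattern (patch the dependency, call the handler, assert on the returned payload) covers the missing-eval-set and evaluation-failure cases.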