Problem
`analyze_server` currently returns findings with `next_tools`: a list of MCP tools the AI client should call for deeper investigation. This is backwards. The engine has access to all the same data. It should do the drill-down itself and return a complete analysis.
Example: the engine detects a CPU spike and returns:
```json
{
"story_path": "CPU_SPIKE -> SOS_SCHEDULER_YIELD",
"next_tools": ["get_cpu_utilization", "get_top_queries_by_cpu"]
}
```
The AI client then has to make 2 more tool calls, interpret the results, and correlate them. The engine already knows WHERE to look — it should look there and include the results.
Proposed Behavior
`analyze_server` should:
- Run the inference engine (collect → score → traverse) as today
- For each finding, follow its own `next_tools` recommendations internally
- Include the drill-down results in the finding output — e.g., the top queries during a CPU spike, the deadlock graph for a deadlock finding, the blocking chain details for lock contention
- Return a self-contained analysis that an AI client can narrate directly
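The steps above could be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `TOOL_REGISTRY`, `enrich_finding`, the `drill_down` key, and the stubbed tool functions with their return shapes are all hypothetical. Only `story_path`, `next_tools`, and the tool names come from the example earlier in this issue.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the real tool implementations; the return
# shapes are invented for illustration only.
def get_cpu_utilization() -> Dict[str, Any]:
    return {"samples": [{"minute": 0, "cpu_pct": 97}]}

def get_top_queries_by_cpu() -> Dict[str, Any]:
    return {"queries": [{"query_hash": "0xABC", "cpu_ms": 120000}]}

# Assumed registry mapping tool names to in-process callables.
TOOL_REGISTRY: Dict[str, Callable[[], Dict[str, Any]]] = {
    "get_cpu_utilization": get_cpu_utilization,
    "get_top_queries_by_cpu": get_top_queries_by_cpu,
}

def enrich_finding(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the finding with drill-down results attached."""
    drill_down: Dict[str, Any] = {}
    for tool_name in finding.get("next_tools", []):
        tool = TOOL_REGISTRY.get(tool_name)
        if tool is None:
            # No tool/query support for this data: record the gap, don't fail.
            drill_down[tool_name] = {"error": "tool not available"}
            continue
        drill_down[tool_name] = tool()
    # next_tools is left in place for clients that investigate on their own.
    return {**finding, "drill_down": drill_down}

finding = {
    "story_path": "CPU_SPIKE -> SOS_SCHEDULER_YIELD",
    "next_tools": ["get_cpu_utilization", "get_top_queries_by_cpu"],
}
enriched = enrich_finding(finding)
```

In this sketch a missing tool is recorded as an error entry instead of raising, so a coverage gap in one drill-down does not discard the rest of the analysis.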
Benefits
- Single tool call produces a complete diagnostic picture
- AI client doesn't need to understand which follow-up tools to call
- Reduces round-trips from ~5-10 tool calls to 1
- The engine's expertise is in the drill-down logic, not just the symptom detection
Constraints
- Keep `next_tools` in output for clients that want to do their own investigation
- Response size: deep drill-down data could be large; may need a `detail_level` parameter (summary vs full)
- Individual tools (`get_deadlocks`, etc.) remain available for manual investigation
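One way the response-size constraint might be handled is to trim list-valued drill-down fields when a summary is requested. The sketch below assumes a hypothetical `drill_down` key on each finding and a hypothetical `max_rows` cutoff. Only the `detail_level` name and its summary/full values come from this issue, and the sample data is invented.

```python
from typing import Any, Dict

def apply_detail_level(finding: Dict[str, Any], detail_level: str = "summary",
                       max_rows: int = 3) -> Dict[str, Any]:
    """Trim drill-down payloads when detail_level is "summary"."""
    if detail_level not in ("summary", "full"):
        raise ValueError(f"unknown detail_level: {detail_level!r}")
    if detail_level == "full":
        return finding
    trimmed: Dict[str, Any] = {}
    for tool_name, result in finding.get("drill_down", {}).items():
        # Keep only the first max_rows entries of any list-valued field.
        trimmed[tool_name] = {
            key: value[:max_rows] if isinstance(value, list) else value
            for key, value in result.items()
        }
    return {**finding, "drill_down": trimmed}

# Invented sample finding with five drill-down rows.
sample = {
    "story_path": "CPU_SPIKE -> SOS_SCHEDULER_YIELD",
    "next_tools": ["get_top_queries_by_cpu"],
    "drill_down": {
        "get_top_queries_by_cpu": {
            "window": "15m",
            "queries": [{"rank": i} for i in range(1, 6)],
        }
    },
}
summary = apply_detail_level(sample, "summary")
full = apply_detail_level(sample, "full")
```

Both levels leave `next_tools` untouched, so clients that prefer to run their own investigation still get the recommendations.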
Relates to
- Lite MCP coverage gaps (#TBD) — the engine can only drill down into data that has tool/query support
- The `query_snapshots` gap is the most acute example: CPU spike detected but no way to find the causal query