fix: defer heavy imports in MCP server for first-connect reliability#380
fix: defer heavy imports in MCP server for first-connect reliability#380carlos-alm merged 5 commits intomainfrom
Conversation
…liability The MCP server was importing queries.js and better-sqlite3 before connecting the stdio transport, causing the client's `initialize` request to arrive before the server was listening on stdin. Now the transport connects immediately after the MCP SDK loads, and heavy modules are lazy-loaded on the first tool call (cached thereafter). Impact: 3 functions changed, 2 affected
|
Claude finished @carlos-alm's task —— View job PR Review: MCP Server First-Connect Reliability FixTodo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Race condition where MCP server's Why This Approach: The solution is justified and addresses the actual root cause. By connecting the stdio transport immediately after MCP SDK loads, the server can receive the Risk Assessment: LOW - The change preserves all functionality while improving reliability through proper async loading patterns. Phase 2: Configuration Change Detection ✅NO configuration changes detected - no linting rules modified, no test thresholds lowered, no build configuration changes. Phase 3: Test Integrity ✅NO test integrity issues - no test files modified, no assertions weakened, no test coverage compromised. Phase 4: Code Quality Assessment ✅Implementation Quality: The lazy loading pattern is well-implemented:
Potential Issues Identified:
Performance Impact: POSITIVE - eliminates ~100-500ms cold-start delay by deferring heavy SQLite native binding load. Security: No security concerns - the changes don't introduce new attack vectors. Maintainability: GOOD - the lazy loading pattern is clean and follows common Node.js practices. Final AssessmentMinor suggestion: Consider adding a comment in the code explaining the lazy loading rationale for future maintainers, but this is not blocking. |
Greptile SummaryThis PR successfully fixes the first-connect reliability problem in the MCP server by deferring Confidence Score: 5/5
Sequence DiagramsequenceDiagram
participant Client as MCP Client
participant Server as MCP Server (Node.js)
participant SQLite as better-sqlite3 (lazy via getDatabase)
Note over Server: startMCPServer() called
Note over Server: _Database var declared
Note over Server: server.setRequestHandler() registered (sync)
Server->>Client: server.connect(transport) — transport ready NOW
Client->>Server: initialize request
Server->>Client: initialize response (immediate)
Client->>Server: tool call (e.g. query)
Note over Server: First tool call enters handler
Server->>SQLite: require('better-sqlite3') → loads native addon
SQLite-->>Server: Database constructor cached in _Database
Server->>Server: execute tool logic
Server-->>Client: tool result
Client->>Server: subsequent tool calls
Note over Server: _Database already cached → no reload
Server-->>Client: tool result (fast path)
Last reviewed commit: 9250341 |
src/mcp.js
Outdated
| const { | ||
| impactAnalysisData, | ||
| moduleMapData, | ||
| fileDepsData, | ||
| exportsData, | ||
| fnDepsData, | ||
| fnImpactData, | ||
| pathData, | ||
| contextData, | ||
| childrenData, | ||
| explainData, | ||
| whereData, | ||
| diffImpactData, | ||
| listFunctionsData, | ||
| rolesData, | ||
| } = await getQueries(); | ||
| const Database = getDatabase(); |
There was a problem hiding this comment.
Lazy loads are outside the try-catch error boundary
getQueries() and getDatabase() are invoked at the top of the handler — before the try block on line 890. If import('./queries.js') fails (e.g. a missing dependency or a syntax error in that file) or if require('better-sqlite3') throws (native addon not found, platform mismatch — a common scenario), the error propagates as an unhandled promise rejection and bypasses the handler's own catch clause that returns a well-formed { content, isError: true } MCP response.
In the original code this was a startup failure (loud, obvious). With the new design it silently kills the first tool call and may crash the process, which is a harder regression to debug.
Move these two calls inside the try block:
| const { | |
| impactAnalysisData, | |
| moduleMapData, | |
| fileDepsData, | |
| exportsData, | |
| fnDepsData, | |
| fnImpactData, | |
| pathData, | |
| contextData, | |
| childrenData, | |
| explainData, | |
| whereData, | |
| diffImpactData, | |
| listFunctionsData, | |
| rolesData, | |
| } = await getQueries(); | |
| const Database = getDatabase(); | |
| try { | |
| const { | |
| impactAnalysisData, | |
| moduleMapData, | |
| fileDepsData, | |
| exportsData, | |
| fnDepsData, | |
| fnImpactData, | |
| pathData, | |
| contextData, | |
| childrenData, | |
| explainData, | |
| whereData, | |
| diffImpactData, | |
| listFunctionsData, | |
| rolesData, | |
| } = await getQueries(); | |
| const Database = getDatabase(); | |
| if (!multiRepo && args.repo) { |
There was a problem hiding this comment.
Fixed in 9250341 — moved await getQueries() and getDatabase() inside the try block. Load failures now return a well-formed MCP error response instead of surfacing as unhandled promise rejections.
src/mcp.js
Outdated
| const { | ||
| impactAnalysisData, | ||
| moduleMapData, | ||
| fileDepsData, | ||
| exportsData, | ||
| fnDepsData, | ||
| fnImpactData, | ||
| pathData, | ||
| contextData, | ||
| childrenData, | ||
| explainData, | ||
| whereData, | ||
| diffImpactData, | ||
| listFunctionsData, | ||
| rolesData, | ||
| } = await getQueries(); | ||
| const Database = getDatabase(); |
There was a problem hiding this comment.
Queries and Database loaded eagerly on every tool call
getQueries() and getDatabase() are now called unconditionally at the top of every CallToolRequestSchema invocation, regardless of whether the active tool needs them. Tools like semantic_search, structure, co_changes, sequence, complexity, communities, code_owners, branch_compare, cfg, dataflow, ast_query, and list_repos all rely exclusively on their own case-scoped await import(...) calls and never touch the destructured query functions or the top-level Database reference.
This means queries.js and better-sqlite3 will both be forced to load on the very first tool call — even if that call is, say, semantic_search. The original intent of deferring heavy imports until they are actually needed is partially defeated for those code paths.
Consider keeping the lazy-load calls inside only the switch cases that require them (or at minimum document why eagerly loading on first call is acceptable trade-off).
There was a problem hiding this comment.
Addressed in 9250341 — moved the calls inside the try-catch. Since both helpers cache after first load, the per-call overhead after the first invocation is just a variable check. Keeping them at handler level (inside try) is the right tradeoff: proper error handling without splitting them across dozens of switch cases.
|
Claude finished @carlos-alm's task —— View job PR Review: MCP Server First-Connect Reliability FixTodo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Race condition where MCP server's Why This Approach: The solution is justified and addresses the actual root cause. By connecting the stdio transport immediately after MCP SDK loads, the server can receive the Risk Assessment: LOW - The change preserves all functionality while improving reliability through proper async loading patterns. Phase 2: Configuration Change Detection ✅NO configuration changes detected - no linting rules modified, no test thresholds lowered, no build configuration changes. Phase 3: Test Integrity ✅NO test integrity issues - no test files modified, no assertions weakened, no test coverage compromised. Phase 4: Code Quality Assessment ✅Implementation Quality: The lazy loading pattern is well-implemented:
Potential Issues Identified:
Performance Impact: POSITIVE - eliminates ~100-500ms cold-start delay by deferring heavy SQLite native binding load. Security: No security concerns - the changes don't introduce new attack vectors. Maintainability: GOOD - the lazy loading pattern is clean and follows common Node.js practices. Final AssessmentMinor suggestion: Consider adding a comment in the code explaining the lazy loading rationale for future maintainers, but this is not blocking. |
Impact: 1 functions changed, 0 affected
|
Claude finished @carlos-alm's task —— View job PR Review: MCP Server First-Connect Reliability FixTodo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Race condition where MCP server's Why This Approach: The solution is justified and addresses the actual root cause. By connecting the stdio transport immediately after MCP SDK loads, the server can receive the Risk Assessment: LOW - The change preserves all functionality while improving reliability through proper async loading patterns. Phase 2: Configuration Change Detection ✅NO configuration changes detected - no linting rules modified, no test thresholds lowered, no build configuration changes. Phase 3: Test Integrity ✅NO test integrity issues - no test files modified, no assertions weakened, no test coverage compromised. Phase 4: Code Quality Assessment ✅Implementation Quality: The lazy loading pattern is well-implemented:
Potential Issues Identified:
Performance Impact: POSITIVE - eliminates ~100-500ms cold-start delay by deferring heavy SQLite native binding load. Security: No security concerns - the changes don't introduce new attack vectors. Maintainability: GOOD - the lazy loading pattern is clean and follows common Node.js practices. Final AssessmentMinor suggestion: Consider adding a comment in the code explaining the lazy loading rationale for future maintainers, but this is not blocking. |
|
Claude finished @carlos-alm's task —— View job PR Review: MCP Server First-Connect Reliability FixTodo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Race condition where MCP server's Why This Approach: The solution is justified and addresses the actual root cause. By connecting the stdio transport immediately after MCP SDK loads, the server can receive the Risk Assessment: LOW - The change preserves all functionality while improving reliability through proper async loading patterns. Phase 2: Configuration Change Detection ✅NO configuration changes detected - no linting rules modified, no test thresholds lowered, no build configuration changes. Phase 3: Test Integrity ✅NO test integrity issues - no test files modified, no assertions weakened, no test coverage compromised. Phase 4: Code Quality Assessment ✅Implementation Quality: The lazy loading pattern is well-implemented:
Potential Issues Identified:
Performance Impact: POSITIVE - eliminates ~100-500ms cold-start delay by deferring heavy SQLite native binding load. Security: No security concerns - the changes don't introduce new attack vectors. Maintainability: GOOD - the lazy loading pattern is clean and follows common Node.js practices. Final AssessmentMinor suggestion: Consider adding a comment in the code explaining the lazy loading rationale for future maintainers, but this is not blocking. |
|
Claude finished @carlos-alm's task —— View job PR Review: MCP Server First-Connect Reliability FixTodo List:
Phase 1: Root Cause Analysis ✅Problem Being Solved: Race condition where MCP server's Why This Approach: The solution is justified and addresses the actual root cause. By connecting the stdio transport immediately after MCP SDK loads, the server can receive the Risk Assessment: LOW - The change preserves all functionality while improving reliability through proper async loading patterns. Phase 2: Configuration Change Detection ✅NO configuration changes detected - no linting rules modified, no test thresholds lowered, no build configuration changes. Phase 3: Test Integrity ✅NO test integrity issues - no test files modified, no assertions weakened, no test coverage compromised. Phase 4: Code Quality Assessment ✅Implementation Quality: The lazy loading pattern is well-implemented:
Potential Issues Identified:
Performance Impact: POSITIVE - eliminates ~100-500ms cold-start delay by deferring heavy SQLite native binding load. Security: No security concerns - the changes don't introduce new attack vectors. Maintainability: GOOD - the lazy loading pattern is clean and follows common Node.js practices. Final AssessmentMinor suggestion: Consider adding a comment in the code explaining the lazy loading rationale for future maintainers, but this is not blocking. |
|
Re: the minor suggestion about adding a comment explaining lazy loading rationale — this is already covered in the code at lines 840-842: No additional changes needed. |
Summary
startMCPServerwasawait import()-ingqueries.jsandrequire()-ingbetter-sqlite3before connecting the stdio transport. On cold start these imports are slow, so the client'sinitializerequest arrives before the server is listening — the message is lost and connection fails.getQueries()/getDatabase()) that run on first tool call and are cached thereafter. The transport now connects immediately after the MCP SDK loads, so the server is ready to receiveinitializeright away.Test plan
initializeJSON-RPC request gets immediate response/mcpconnects on first try in Claude Code