-
Notifications
You must be signed in to change notification settings - Fork 2.8k
fix(mcp): properly terminate stdio processes when refreshing MCP servers #6887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Enhanced deleteConnection to properly terminate stdio processes (Docker containers) - Added process termination logic with SIGTERM followed by SIGKILL if needed - Added delays after cleanup to ensure processes are fully terminated - Fixes issue where refreshing MCP servers was creating duplicate instances This ensures that Docker-based MCP servers like Tavily are properly cleaned up when using the refresh button, preventing multiple instances from running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this code 5 minutes ago and already found 6 ways it could break.
| proc.kill("SIGTERM") | ||
|
|
||
| // Give it a moment to terminate gracefully | ||
| await new Promise((resolve) => setTimeout(resolve, 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition risk here. The 100ms delay between SIGTERM and SIGKILL might not be sufficient for all processes, especially Docker containers. Consider:
| await new Promise((resolve) => setTimeout(resolve, 100)) | |
| // Try to gracefully terminate the process first | |
| proc.kill("SIGTERM") | |
| // Give it more time to terminate gracefully (especially for Docker) | |
| await new Promise((resolve) => setTimeout(resolve, 500)) | |
| // If still not killed, force kill it | |
| if (!proc.killed) { | |
| proc.kill("SIGKILL") | |
| } |
Alternatively, could we listen to the process 'exit' event for a more deterministic approach?
| const proc = (connection.transport as any).proc | ||
| if (proc && !proc.killed) { | ||
| // Try to gracefully terminate the process first | ||
| proc.kill("SIGTERM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling for process termination. The proc.kill() calls could throw if the process has already exited. Should we wrap these in try-catch blocks?
| proc.kill("SIGTERM") | |
| // Try to gracefully terminate the process first | |
| try { | |
| proc.kill("SIGTERM") | |
| } catch (error) { | |
| // Process might have already exited | |
| console.debug(`Process already terminated: ${error}`) | |
| } |
| await connection.client.close() | ||
|
|
||
| // For stdio transports, we need to ensure the process is terminated | ||
| if (connection.transport && "proc" in connection.transport) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type safety concern: Using "proc" in connection.transport and casting to any bypasses TypeScript's type system. Could we add a proper type guard or extend the StdioClientTransport interface to include the proc property?
| await this.deleteConnection(serverName, connection.server.source) | ||
|
|
||
| // Add a small delay to ensure the process is fully terminated | ||
| await delay(200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These hardcoded delays (200ms here, 300ms in refreshAllConnections) seem arbitrary. Different systems and Docker containers might need different delays. Should we make these configurable or use a more deterministic approach like waiting for process exit events?
| const proc = (connection.transport as any).proc | ||
| if (proc && !proc.killed) { | ||
| // Try to gracefully terminate the process first | ||
| proc.kill("SIGTERM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross-platform concern: SIGTERM might not work properly on Windows. Should we use platform-specific termination methods or consider using a library like tree-kill for cross-platform process termination?
| try { | ||
| if (connection.type === "connected") { | ||
| await connection.transport.close() | ||
| // First close the client to stop any ongoing operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This critical process termination logic should have unit tests. Consider adding tests to verify:
- Processes are properly terminated on refresh
- Docker containers are cleaned up
- Error handling works when processes are already dead
- The delays are sufficient for cleanup
|
Closing this PR as it doesn't actually fix the root cause of duplicate MCP server instances. While the approach of forcefully terminating processes is more aggressive than PR #6885, it still relies on arbitrary delays and hacky access to internal transport properties. The real issue appears to be that the MCP SDK's transport layer doesn't properly clean up stdio processes, especially for Docker containers. This needs to be addressed at the SDK level or with a more comprehensive solution that properly tracks and manages process lifecycles rather than adding band-aid fixes with hardcoded delays. |
Problem
When hitting the "refresh MCP servers" button or the refresh button on individual MCP server lines, new instances of the servers were being started without properly closing the old ones. This was particularly noticeable with Docker-based MCP servers like Tavily, but not with others like the GitHub MCP server.
Root Cause
The
deleteConnectionmethod was not properly terminating stdio processes (child processes), especially for Docker containers. Whentransport.close()was called, it didn't ensure the underlying process was actually terminated.Solution
Enhanced the cleanup logic in three key areas:
deleteConnectionmethod: Added proper process termination logic that:restartConnectionmethod: Added a delay after cleanup to ensure processes are fully terminated before reconnectingrefreshAllConnectionsmethod: Added a delay after clearing all connections to ensure all processes are terminated before reinitializingChanges Made
src/services/mcp/McpHub.tsto add proper process termination logicTesting
npx vitest run services/mcp)Related Issues
This might be related to recent changes in PR #6878, but the issue appears to be a long-standing problem with process cleanup rather than a recent regression.
Fixes the issue where refreshing MCP servers was creating duplicate instances, particularly for Docker-based servers.
Important
Fixes MCP server refresh issue by ensuring proper termination of old server instances, particularly for Docker-based servers, in
McpHub.ts.deleteConnectioninMcpHub.tsnow ensures stdio processes are terminated with SIGTERM, then SIGKILL if needed.restartConnectionandrefreshAllConnectionsmethods inMcpHub.tsnow include delays to ensure processes are fully terminated before reconnecting.npx vitest run services/mcp).This description was created by
for 4d5b0d8. You can customize this summary. It will automatically update as commits are pushed.