smart-mcp-proxy · Dumbris · Dec 21, 2025
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -129,9 +129,21 @@ The event bus enables real-time communication between runtime and UI components:
 
 **Shutdown**:
 - Graceful context cancellation cascades to all background services
-- Upstream servers disconnected with proper Docker container cleanup
+- Upstream servers disconnected with proper subprocess and Docker container cleanup
 - Resources closed in dependency order (upstream → cache → index → storage)
 
+**Subprocess Shutdown Flow**:
+1. **Graceful Close** (10s timeout): Close MCP client connection, wait for subprocess to exit cleanly
+2. **Force Kill** (9s timeout): If graceful close fails, send SIGTERM to the process group, poll for exit, then send SIGKILL if the group is still running
+
+| Timeout | Value | Purpose |
+|---------|-------|---------|
+| MCP Client Close | 10s | Wait for graceful stdin/stdout close |
+| SIGTERM → SIGKILL | 9s | Grace period after SIGTERM before SIGKILL is sent |
+| Docker Cleanup | 30s | Container stop/kill timeout |
+
+See [Shutdown Behavior](/operations/shutdown-behavior) for detailed documentation.
+
 ## Tray Application Architecture
 
 The tray application uses a robust state machine architecture for reliable core management.

diff --git a/docs/configuration/upstream-servers.md b/docs/configuration/upstream-servers.md
@@ -95,6 +95,27 @@ For enhanced security, stdio servers can run in Docker containers:
 
 See [Docker Isolation](/features/docker-isolation) for complete documentation.
 
+## Process Lifecycle
+
+### Startup
+
+When MCPProxy starts, it:
+1. Loads server configurations from `mcp_config.json`
+2. Creates MCP clients for each enabled, non-quarantined server
+3. Connects to servers in the background (async)
+4. Indexes tools once connections are established
+
+### Shutdown
+
+When MCPProxy stops, it performs graceful shutdown of all subprocesses:
+
+1. **Graceful Close** (10s): Close MCP connection, wait for process to exit
+2. **Force Kill** (9s): If still running, SIGTERM → poll → SIGKILL
+
+**Process groups**: Child processes (spawned by npm/npx/uvx) are placed in a process group, ensuring all related processes are terminated together.
+
+See [Shutdown Behavior](/operations/shutdown-behavior) for detailed documentation.
+
 ## Quarantine System
 
 New servers added via AI clients are automatically quarantined for security review. See [Security Quarantine](/features/security-quarantine) for details.
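
Reviewer sketch for the **Startup** steps added in the hunk above (create clients only for enabled, non-quarantined servers, then connect concurrently). Every name here (`serverConfig`, `connectable`, `connectAll`, the server names) is hypothetical; the real MCPProxy types are not shown in this diff.

```go
package main

import (
	"fmt"
	"sync"
)

// serverConfig is a hypothetical, reduced view of one upstream server entry.
type serverConfig struct {
	Name        string
	Enabled     bool
	Quarantined bool
}

// connectable keeps only servers that are enabled and not quarantined.
func connectable(all []serverConfig) []serverConfig {
	var out []serverConfig
	for _, s := range all {
		if s.Enabled && !s.Quarantined {
			out = append(out, s)
		}
	}
	return out
}

// connectAll dials every server in its own goroutine and collects the
// per-server result. It waits for all of them so the demo can print a summary.
func connectAll(servers []serverConfig, connect func(serverConfig) error) map[string]error {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	results := make(map[string]error, len(servers))
	for _, s := range servers {
		wg.Add(1)
		go func(s serverConfig) {
			defer wg.Done()
			err := connect(s)
			mu.Lock()
			results[s.Name] = err
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return results
}

func main() {
	servers := []serverConfig{
		{Name: "github", Enabled: true},
		{Name: "untrusted", Enabled: true, Quarantined: true},
		{Name: "legacy", Enabled: false},
	}
	results := connectAll(connectable(servers), func(s serverConfig) error {
		return nil // a real client would dial the server here
	})
	fmt.Println("connection attempts:", len(results))
}
```

The sketch blocks on `wg.Wait()` only to make the demo deterministic; the docs describe the real connections as happening in the background while startup continues.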
diff --git a/docs/development/architecture.md b/docs/development/architecture.md
@@ -86,4 +86,19 @@ Disconnected → Connecting → Authenticating → Ready
                     (on error)
 ```
 
+## Subprocess Shutdown
+
+When MCPProxy stops, subprocesses are terminated using a two-phase approach:
+
+1. **Graceful Close** (10s): Close MCP connection, wait for process to exit
+2. **Force Kill** (9s): If still running, SIGTERM → poll → SIGKILL
+
+| Timeout | Value | Purpose |
+|---------|-------|---------|
+| MCP Client Close | 10s | Wait for graceful stdin/stdout close |
+| SIGTERM → SIGKILL | 9s | Grace period after SIGTERM before SIGKILL is sent |
+| Docker Cleanup | 30s | Container stop/kill timeout |
+
+See [Shutdown Behavior](/operations/shutdown-behavior) for detailed documentation.
+
 For complete architecture details, see [docs/architecture.md](https://github.com/smart-mcp-proxy/mcpproxy-go/blob/main/docs/architecture.md) in the repository.
diff --git a/docs/features/docker-isolation.md b/docs/features/docker-isolation.md
@@ -226,6 +226,39 @@ docker stats
 - Check container logs for specific error messages
 - Verify network access for package repositories
 
+## Container Lifecycle
+
+### Startup
+
+When a Docker-isolated server starts:
+1. MCPProxy detects runtime type (npm, uvx, python, etc.)
+2. Selects appropriate Docker image
+3. Runs container with stdio transport (`docker run -i`)
+4. Establishes MCP connection via stdin/stdout
+
+### Shutdown
+
+When MCPProxy stops, containers are cleaned up with a 30-second timeout:
+
+1. **Graceful Stop**: `docker stop` (sends SIGTERM to container)
+2. **Force Kill**: `docker kill` if container doesn't stop gracefully
+
+Containers are labeled with `mcpproxy.managed=true` for identification.
+
+### Manual Cleanup
+
+If containers remain after MCPProxy stops:
+
+```bash
+# List MCPProxy-managed containers
+docker ps --filter "label=mcpproxy.managed=true"
+
+# Remove all MCPProxy containers, including stopped ones
+docker rm -f $(docker ps -aq --filter "label=mcpproxy.managed=true")
+```
+
+See [Shutdown Behavior](/operations/shutdown-behavior) for detailed subprocess lifecycle documentation.
+
 ## Security Considerations
 
 Docker isolation provides strong security boundaries but consider:

diff --git a/docs/operations/shutdown-behavior.md b/docs/operations/shutdown-behavior.md
@@ -0,0 +1,232 @@
+---
+id: shutdown-behavior
+title: Shutdown Behavior
+sidebar_label: Shutdown Behavior
+sidebar_position: 1
+description: How MCPProxy handles graceful shutdown of upstream servers and subprocesses
+keywords: [shutdown, graceful, sigterm, sigkill, process, cleanup, timeout]
+---
+
+# Shutdown Behavior
+
+This document describes how MCPProxy handles graceful shutdown of upstream servers, including subprocess termination timeouts and cleanup procedures.
+
+## Overview
+
+When MCPProxy shuts down (via Ctrl+C, SIGTERM, or tray quit), it follows a structured cleanup process:
+
+1. Cancel application context (signals all background services to stop)
+2. Stop OAuth refresh manager
+3. Stop Supervisor (reconciliation loop)
+4. Shutdown all upstream servers (graceful → force)
+5. Close caches, indexes, and storage
+
+## Subprocess Shutdown Flow
+
+For stdio-based MCP servers (processes started via `command`), MCPProxy uses a two-phase shutdown approach:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    Graceful Close Phase                          │
+│                      (10 seconds max)                            │
+├─────────────────────────────────────────────────────────────────┤
+│  1. Close MCP client connection (stdin/stdout)                  │
+│  2. Subprocess receives EOF and should exit cleanly             │
+│  3. Wait up to 10 seconds for graceful exit                     │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼ (if timeout)
+┌─────────────────────────────────────────────────────────────────┐
+│                    Force Kill Phase                              │
+│                      (9 seconds max)                             │
+├─────────────────────────────────────────────────────────────────┤
+│  1. Send SIGTERM to entire process group                        │
+│  2. Poll every 100ms to check if process exited                 │
+│  3. After 9 seconds: send SIGKILL (force kill)                  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Timeout Constants
+
+| Constant | Value | Description |
+|----------|-------|-------------|
+| `mcpClientCloseTimeout` | 10s | Max time to wait for graceful MCP client close |
+| `processGracefulTimeout` | 9s | Max time after SIGTERM before SIGKILL |
+| `processTerminationPollInterval` | 100ms | How often to check if process exited |
+| `dockerCleanupTimeout` | 30s | Max time for Docker container cleanup |
+
+### Why 9 seconds for SIGTERM?
+
+The SIGTERM timeout (9s) is intentionally less than the MCP client close timeout (10s). This ensures that if graceful close times out, the force kill phase can complete within a reasonable time window.
+
+**Total worst case for stdio servers:** 10s (graceful) + 9s (force kill) = 19 seconds per server. Servers are shut down in parallel, so this stays within the 45s `ShutdownAll` timeout.
+
+## Docker Container Shutdown
+
+Docker containers follow a similar pattern but use Docker's native stop mechanism:
+
+1. `docker stop` (sends SIGTERM, waits for graceful exit)
+2. If container doesn't stop: `docker kill` (sends SIGKILL)
+
+Docker cleanup has a 30-second timeout to allow for container-specific cleanup procedures.
+
+## Process Groups
+
+MCPProxy uses Unix process groups to ensure all child processes are properly cleaned up:
+
+```go
+// All child processes are placed in a new process group
+cmd.SysProcAttr = &syscall.SysProcAttr{
+    Setpgid: true,  // Create new process group
+    Pgid:    0,     // Make this process the group leader
+}
+```
+
+When shutting down, MCPProxy sends signals to the entire process group by targeting the negated process group ID (`-pgid`), ensuring that:
+- Child processes spawned by npm/npx are terminated
+- Orphaned processes don't accumulate
+- All related processes receive the shutdown signal
+
+## What Happens During Shutdown
+
+### When `call_tool` is called during shutdown
+
+If an AI client tries to call a tool while MCPProxy is shutting down:
+
+```
+Error: "Server 'xxx' is not connected (state: Disconnected)"
+```
+
+Or if the server client was already removed:
+
+```
+Error: "No client found for server: xxx"
+```
+
+### When `retrieve_tools` is called during shutdown
+
+- If the search index is still open: Returns results (possibly stale)
+- After index is closed: Returns an error
+
+### When `tools/list_changed` notification arrives during shutdown
+
+The notification is safely ignored:
+- Callback context is cancelled
+- Discovery doesn't block shutdown
+- Logged as a warning, no user impact
+
+## Shutdown Order
+
+```
+Runtime.Close()
+    │
+    ├─► Cancel app context
+    │
+    ├─► Stop OAuth refresh manager
+    │       └─► Prevents token refresh during shutdown
+    │
+    ├─► Stop Supervisor
+    │       ├─► Cancel reconciliation context
+    │       ├─► Wait for goroutines to exit
+    │       └─► Close upstream adapter
+    │
+    ├─► ShutdownAll on upstream manager (45s total timeout)
+    │       └─► For each server (parallel):
+    │               ├─► Graceful close (10s)
+    │               └─► Force kill if needed (9s)
+    │
+    ├─► Close cache manager
+    │
+    ├─► Close index manager
+    │
+    ├─► Close storage manager
+    │
+    └─► Close config service
+```
+
+## Debugging Shutdown Issues
+
+### Check for orphaned processes
+
+```bash
+# After stopping MCPProxy, check for orphaned MCP server processes
+pgrep -f "npx.*mcp"
+pgrep -f "uvx.*mcp"
+pgrep -f "node.*server"
+
+# If found, kill them manually
+pkill -f "npx.*mcp"
+```
+
+### Enable debug logging for shutdown
+
+```bash
+mcpproxy serve --log-level=debug 2>&1 | grep -E "(Disconnect|shutdown|SIGTERM|SIGKILL|process group)"
+```
+
+### View shutdown logs
+
+Look for these log messages during shutdown:
+
+```
+INFO  Disconnecting from upstream MCP server
+DEBUG Attempting graceful MCP client close
+DEBUG MCP client closed gracefully               # Success!
+# OR
+WARN  MCP client close timed out                 # Graceful failed
+INFO  Graceful close failed, force killing process group
+DEBUG SIGTERM sent to process group
+INFO  Process group terminated gracefully        # SIGTERM worked
+# OR
+WARN  Process group still running after SIGTERM, sending SIGKILL
+INFO  SIGKILL sent to process group
+```
+
+## Troubleshooting
+
+### Server processes not terminating
+
+**Symptoms:** `npx` or `uvx` processes remain running after MCPProxy stops.
+
+**Possible causes:**
+1. Process ignoring SIGTERM (bad signal handling in MCP server)
+2. Process group not properly set up
+3. Orphaned processes left over from previous crashes
+
+**Solutions:**
+- Check server logs: `mcpproxy upstream logs <server-name>`
+- Manually kill orphaned processes
+- Report issue if consistently reproducible
+
+### Shutdown taking too long
+
+**Symptoms:** MCPProxy takes 20+ seconds to shut down.
+
+**Possible causes:**
+1. Many servers running in parallel
+2. Servers not responding to graceful shutdown
+3. Docker containers with slow cleanup
+
+**Solutions:**
+- Check which servers are slow: enable debug logging
+- Consider disabling problematic servers before shutdown
+- Report consistently slow servers as bugs
+
+### Docker containers not cleaning up
+
+**Symptoms:** Docker containers remain running after MCPProxy stops.
+
+**Solutions:**
+```bash
+# List MCPProxy containers
+docker ps --filter "label=mcpproxy.managed=true"
+
+# Force remove all MCPProxy containers, including stopped ones
+docker rm -f $(docker ps -aq --filter "label=mcpproxy.managed=true")
+```
+
+## Related Documentation
+
+- [Architecture](/development/architecture) - Runtime and lifecycle overview
+- [Docker Isolation](/features/docker-isolation) - Container management
+- [Upstream Servers](/configuration/upstream-servers) - Server configuration