diff --git a/README.md b/README.md index 9f22b67a..16e0b773 100644 --- a/README.md +++ b/README.md @@ -4,33 +4,7 @@ A gateway for Model Context Protocol (MCP) servers. This gateway is used with [GitHub Agentic Workflows](https://github.com/github/gh-aw) via the `sandbox.mcp` configuration to provide MCP server access to AI agents running in sandboxed environments. -📖 **[Full Configuration Specification](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md)** - Complete reference for all configuration options and validation rules. - -## Features - -- **Configuration Modes**: Supports both TOML files and JSON stdin configuration - - **Spec-Compliant Validation**: Fail-fast validation with detailed error messages - - **Variable Expansion**: Environment variable substitution with `${VAR_NAME}` syntax - - **Type Normalization**: Automatic conversion of legacy `"local"` type to `"stdio"` -- **Schema Normalization**: Automatic fixing of malformed JSON schemas from backend MCP servers - - Adds missing `properties` field to object schemas - - Prevents downstream validation errors - - Transparent to both backends and clients -- **Routing Modes**: - - **Routed**: Each backend server accessible at `/mcp/{serverID}` - - **Unified**: Single endpoint `/mcp` that routes to configured servers -- **Docker Support**: Launch backend MCP servers as Docker containers -- **Stdio Transport**: JSON-RPC 2.0 over stdin/stdout for MCP communication -- **HTTP Transport**: Full support for HTTP-based MCP backends with session state preserved across requests -- **Container Detection**: Automatic detection of containerized environments with security warnings -- **Enhanced Debugging**: Detailed error context and troubleshooting suggestions for command failures -- **Per-ServerID Logs**: Separate log files for each backend MCP server (`{serverID}.log`) for easier troubleshooting - -## Getting Started - -For detailed setup instructions, building from source, and 
local development, see [CONTRIBUTING.md](CONTRIBUTING.md). - -### Quick Start with Docker +## Quick Start 1. **Pull the Docker image** (when available): ```bash @@ -67,266 +41,124 @@ For detailed setup instructions, building from source, and local development, se ghcr.io/github/gh-aw-mcpg:latest < config.json ``` +The gateway starts in routed mode on `http://0.0.0.0:8000`, proxying MCP requests to your configured backend servers. + **Required flags:** - `-i`: Enables stdin for passing JSON configuration -- `-e MCP_GATEWAY_*`: Required environment variables - `-v /var/run/docker.sock`: Required for spawning backend MCP servers -- `-v /path/to/logs:/tmp/gh-aw/mcp-logs`: Mount for persistent gateway logs (or use `-e MCP_GATEWAY_LOG_DIR=/custom/path` with matching volume mount) - - `mcp-gateway.log`: Unified log with all messages - - `{serverID}.log`: Per-server logs for easier troubleshooting - - `gateway.md`: Markdown-formatted logs for GitHub workflow previews - - `rpc-messages.jsonl`: Machine-readable RPC message logs - - `tools.json`: Available tools from all backend MCP servers - `-p 8000:8000`: Port mapping must match `MCP_GATEWAY_PORT` -MCPG will start in routed mode on `http://0.0.0.0:8000` (using `MCP_GATEWAY_PORT`), proxying MCP requests to your configured backend servers. - -## Configuration +## Guard Policies -MCP Gateway supports two configuration formats: -1. **TOML format** - Use with `--config` flag for file-based configuration -2. **JSON stdin format** - Use with `--config-stdin` flag for dynamic configuration +Guard policies enforce **Data Information Flow Control (DIFC)** at the gateway level, restricting what data agents can access and where they can write. Each server can have either an `allow-only` or a `write-sink` policy. 
-### TOML Format (`config.toml`) +### allow-only (source servers) -TOML configuration requires `command = "docker"` for stdio-based MCP servers to ensure containerization: +Restricts which repositories a guard allows and at what integrity level: -```toml -[servers] - -[servers.github] -command = "docker" -args = ["run", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "-i", "ghcr.io/github/github-mcp-server:latest"] +```json +"github": { + "type": "stdio", + "container": "ghcr.io/github/github-mcp-server:latest", + "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "" }, + "guard-policies": { + "allow-only": { + "repos": ["github/gh-aw-mcpg", "github/gh-aw"], + "min-integrity": "unapproved" + } + } +} ``` -**Important**: Per [MCP Gateway Specification Section 3.2.1](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md#321-containerization-requirement), all stdio-based MCP servers MUST be containerized. The gateway enforces this requirement by rejecting configurations where `command` is not `"docker"`. 
+**`repos`** — Repository access scope: +- `"all"` — All repositories accessible by the token +- `"public"` — Public repositories only +- `["owner/repo"]` — Exact match +- `["owner/*"]` — All repos under owner +- `["owner/prefix*"]` — Repos matching prefix -**Why containerization is required:** -- Provides necessary process isolation and security boundaries -- Enables reproducible environments across different deployment contexts -- Container images provide versioning and dependency management -- Ensures portability and consistent behavior +**`min-integrity`** — Minimum integrity level based on `author_association`: +- `"none"` — All objects (FIRST_TIME_CONTRIBUTOR, FIRST_TIMER, NONE) +- `"unapproved"` — Contributors (CONTRIBUTOR, FIRST_TIME_CONTRIBUTOR) +- `"approved"` — Members (OWNER, MEMBER, COLLABORATOR) +- `"merged"` — Objects reachable from main branch -For HTTP-based MCP servers, use the `url` field instead of `command`: +### write-sink (output servers) -```toml -[servers.myhttp] -type = "http" -url = "https://example.com/mcp" -``` - -### JSON Stdin Format - -For the complete JSON configuration specification with all validation rules, see the **[MCP Gateway Configuration Reference](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md)**. +**Required for ALL output servers** when DIFC guards are enabled. 
Marks a server as a write-only channel that accepts writes from agents with matching secrecy labels: ```json -{ - "mcpServers": { - "github": { - "type": "stdio", - "container": "ghcr.io/github/github-mcp-server:latest", - "entrypoint": "/custom/entrypoint.sh", - "entrypointArgs": ["--verbose"], - "mounts": [ - "/host/config:/app/config:ro", - "/host/data:/app/data:rw" - ], - "env": { - "GITHUB_PERSONAL_ACCESS_TOKEN": "", - "EXPANDED_VAR": "${MY_HOME}/config" - }, - "guard-policies": { - "allow-only": { - "repos": ["github/gh-aw-mcpg", "github/gh-aw"], - "min-integrity": "unapproved" - } - } - }, - "safeoutputs": { - "type": "stdio", - "container": "ghcr.io/github/safe-outputs:latest", - "guard-policies": { - "write-sink": { - "accept": ["private:github/gh-aw-mcpg", "private:github/gh-aw"] - } - } +"safeoutputs": { + "type": "stdio", + "container": "ghcr.io/github/safe-outputs:latest", + "guard-policies": { + "write-sink": { + "accept": ["private:github/gh-aw-mcpg", "private:github/gh-aw"] } - }, - "gateway": { - "port": 8080, - "apiKey": "your-api-key", - "domain": "localhost", - "startupTimeout": 30, - "toolTimeout": 60, - "payloadDir": "/tmp/jq-payloads" } } ``` -For complete field-by-field reference, see **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)**. - -Key server fields: `type` (stdio/http), `container` (Docker image), `env` (environment variables with `${VAR}` expansion), `url` (HTTP endpoint), `tools` (tool filter list), `guard` (DIFC guard name), `guard-policies` (access control). - -##### guard-policies - -- **`allow-only`**: Restricts repository access for GitHub MCP servers — configures `repos` (scope) and `min-integrity` (none/unapproved/approved/merged) -- **`write-sink`**: Required for ALL output servers when DIFC guards are enabled — configures `accept` patterns matching agent secrecy tags - -See **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)** for guard-policy details including the allow-only → write-sink accept mapping table. 
+The `accept` entries must match the secrecy tags assigned by the guard. Key mappings: -#### Custom Schemas (`customSchemas`) +| `allow-only.repos` | `write-sink.accept` | +|---|---| +| `"all"` or `"public"` | `["*"]` | +| `["owner/repo"]` | `["private:owner/repo"]` | +| `["owner/*"]` | `["private:owner"]` | +| `["owner/prefix*"]` | `["private:owner/prefix*"]` | -Define custom server types beyond `"stdio"` and `"http"` by mapping type names to HTTPS schema URLs. See **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)** for details. +See **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)** for the complete mapping table and accept pattern reference. -#### Gateway Configuration Fields (Reserved) - -| Field | Description | Default | -|-------|-------------|---------| -| `port` | HTTP port (1-65535) | From `--listen` flag | -| `apiKey` | API key for authentication | (disabled) | -| `domain` | Gateway domain | `localhost` | -| `startupTimeout` | Backend startup timeout (seconds) | `60` | -| `toolTimeout` | Tool execution timeout (seconds) | `120` | -| `payloadDir` | Large payload storage directory | `/tmp/jq-payloads` | - -See **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)** for TOML-only/CLI-only options and variable expansion features. - -### Configuration Validation - -The gateway provides fail-fast validation with precise error locations (line/column for TOML parse errors), unknown key detection (catches typos like `prot` instead of `port`), and environment variable expansion validation. Check log files for warnings after startup. - -## Usage - -Run `./awmg --help` for full CLI options. 
Key flags: +## Architecture -```bash -./awmg --config config.toml # TOML config file -./awmg --config-stdin < config.json # JSON stdin -./awmg --config config.toml --routed # Routed mode (default) -./awmg --config config.toml --unified # Unified mode -./awmg --config config.toml --log-dir /path # Custom log directory ``` - -## Environment Variables - -For complete reference, see **[docs/ENVIRONMENT_VARIABLES.md](docs/ENVIRONMENT_VARIABLES.md)**. - -Key variables: - -| Variable | Description | Default | -|----------|-------------|---------| -| `MCP_GATEWAY_PORT` | Gateway listening port | `8000` | -| `MCP_GATEWAY_API_KEY` | API key (reference in config via `"${MCP_GATEWAY_API_KEY}"`) | (disabled) | -| `MCP_GATEWAY_LOG_DIR` | Log file directory | `/tmp/gh-aw/mcp-logs` | -| `MCP_GATEWAY_PAYLOAD_DIR` | Payload storage directory | `/tmp/jq-payloads` | -| `MCP_GATEWAY_GUARDS_MODE` | DIFC mode: `strict`/`filter`/`propagate` | `strict` | -| `MCP_GATEWAY_WASM_GUARDS_DIR` | WASM guard directory | (disabled) | -| `DEBUG` | Debug logging pattern (e.g., `*`, `server:*`) | (disabled) | - -## Containerized Mode - -For production deployments, use `run_containerized.sh` which validates the environment, requires essential env vars, and checks Docker socket access: - -```bash -docker run -i \ - -e MCP_GATEWAY_PORT=8080 \ - -e MCP_GATEWAY_DOMAIN=localhost \ - -e MCP_GATEWAY_API_KEY=your-key \ - -v /var/run/docker.sock:/var/run/docker.sock \ - -v /path/to/logs:/tmp/gh-aw/mcp-logs \ - -p 8080:8080 \ - ghcr.io/github/gh-aw-mcpg:latest < config.json + ┌─────────────────────────────────────┐ + │ MCP Gateway │ + Client ──────────▶ /mcp/{serverID} (routed mode) │ + (JSON-RPC 2.0) │ /mcp (unified mode) │ + │ │ + │ ┌─────────────┐ ┌──────────────┐ │ + │ │ DIFC Guards │ │ Auth (7.1) │ │ + │ │ (WASM) │ │ API Key │ │ + │ └──────┬──────┘ └──────────────┘ │ + │ │ │ + │ ┌──────▼──────┐ ┌──────────────┐ │ + │ │ GitHub MCP │ │ Safe Outputs │ │ + │ │ (stdio/ │ │ (write-sink) │ │ + │ │ Docker) │ │ │ │ + 
│ └─────────────┘ └──────────────┘ │ + └─────────────────────────────────────┘ ``` -Key flags: `-i` (required for stdin config), `-v .../docker.sock` (required for spawning backends), `-p` (must match `MCP_GATEWAY_PORT`). - -For local development, use `run.sh` which provides defaults and warns about missing env vars. +**Transport**: JSON-RPC 2.0 over stdio (containerized Docker) or HTTP (session state preserved) -## Logging +**Routing**: Routed mode (`/mcp/{serverID}`) exposes each backend at its own endpoint. Unified mode (`/mcp`) routes to all configured servers through a single endpoint. -The gateway creates log files in the configured log directory (default: `/tmp/gh-aw/mcp-logs`): +**Security**: WASM-based DIFC guards enforce secrecy and integrity labels per request. Guards are loaded from `MCP_GATEWAY_WASM_GUARDS_DIR` and assigned per-server. Authentication uses plain API keys per MCP spec 7.1 (`Authorization: `). -| File | Purpose | -|------|---------| -| `mcp-gateway.log` | Unified log with all messages | -| `{serverID}.log` | Per-server logs (e.g., `github.log`) | -| `gateway.md` | Markdown-formatted logs for workflow previews | -| `rpc-messages.jsonl` | Machine-readable RPC messages | -| `tools.json` | Available tools from all backends | +**Logging**: Per-server log files (`{serverID}.log`), unified `mcp-gateway.log`, markdown workflow previews (`gateway.md`), and machine-readable `rpc-messages.jsonl`. -Configure log location with `--log-dir` flag or `MCP_GATEWAY_LOG_DIR` env var. Logs include timestamps, levels (INFO/WARN/ERROR/DEBUG), categories, and contextual details. 
- -For debug logging: `DEBUG=* ./awmg --config config.toml` (supports pattern matching: `DEBUG=server:*,launcher:*`) ## API Endpoints -### Routed Mode (default) - -- `POST /mcp/{serverID}` - Send JSON-RPC request to specific server - - Example: `POST /mcp/github` with body `{"jsonrpc": "2.0", "method": "tools/list", "id": 1}` - -### Unified Mode - -- `POST /mcp` - Send JSON-RPC request (routed to first configured server) - -### Health Check - -- `GET /health` - Returns `OK` - -## MCP Methods - -Supported JSON-RPC 2.0 methods: - -- `tools/list` - List available tools -- `tools/call` - Call a tool with parameters -- Any other MCP method (forwarded as-is) - -## Security Features - -### Authentication - -Per MCP spec 7.1, the gateway uses plain API key authentication: -- Header format: `Authorization: ` (NOT Bearer scheme) -- Configure via `[gateway] api_key` in TOML, or `"gateway": {"apiKey": "${MCP_GATEWAY_API_KEY}"}` in JSON -- When configured, all endpoints except `/health` require authentication -- When not configured, authentication is disabled +- `POST /mcp/{serverID}` — Routed mode (default): JSON-RPC request to specific server +- `POST /mcp` — Unified mode: JSON-RPC request routed to configured servers +- `GET /health` — Health check (returns `OK`) -### Enhanced Error Debugging - -Command failures include full command details, environment variables, and context-specific troubleshooting suggestions (Docker connectivity, image availability, network issues, MCP compatibility). 
- -## Architecture - -Core MCP proxy with optional DIFC security: - -- TOML and JSON stdin configuration with spec-compliant validation -- Environment variable expansion (`${VAR_NAME}`) with fail-fast behavior -- Stdio transport (containerized) and HTTP transport (session state preserved) -- Routed (`/mcp/{serverID}`) and unified (`/mcp`) modes -- Docker container launching for backend servers -- DIFC guard system with WASM-based guards - -## MCP Server Compatibility - -The gateway supports MCP servers via stdio transport using Docker containers. Tested with GitHub MCP and Serena MCP servers. - -```json -{ - "mcpServers": { - "github": { - "type": "stdio", - "container": "ghcr.io/github/github-mcp-server:latest" - }, - "serena": { - "type": "stdio", - "container": "ghcr.io/github/serena-mcp-server:latest" - } - } -} -``` +Supported MCP methods: `tools/list`, `tools/call`, and any other method (forwarded as-is). -## Contributing +## Further Reading -For development setup, build instructions, testing guidelines, and project architecture details, see [CONTRIBUTING.md](CONTRIBUTING.md). 
+| Topic | Link | +|-------|------| +| **Configuration Reference** | [docs/CONFIGURATION.md](docs/CONFIGURATION.md) — Server fields, TOML/JSON formats, guard-policy details, custom schemas, gateway fields, validation rules | +| **Environment Variables** | [docs/ENVIRONMENT_VARIABLES.md](docs/ENVIRONMENT_VARIABLES.md) — All env vars for production, development, Docker, and DIFC configuration | +| **Full Specification** | [MCP Gateway Configuration Reference](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md) — Upstream spec with complete validation rules | +| **Guard Response Labeling** | [docs/GUARD_RESPONSE_LABELING.md](docs/GUARD_RESPONSE_LABELING.md) — How guards label MCP responses with secrecy/integrity tags | +| **HTTP Backend Sessions** | [docs/HTTP_BACKEND_SESSION_ID.md](docs/HTTP_BACKEND_SESSION_ID.md) — Session ID management for HTTP transport backends | +| **Architecture Patterns** | [docs/MCP_SERVER_ARCHITECTURE_PATTERNS.md](docs/MCP_SERVER_ARCHITECTURE_PATTERNS.md) — MCP server design patterns and compatibility | +| **Security Model** | [docs/aw-security.md](docs/aw-security.md) — Security architecture overview | +| **Contributing** | [CONTRIBUTING.md](CONTRIBUTING.md) — Development setup, building, testing, project structure | ## License diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index 2e60f6a7..ea424729 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -4,6 +4,103 @@ This document provides the complete field-by-field reference for MCP Gateway con For the upstream specification, see the **[MCP Gateway Configuration Reference](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md)**. +## Configuration Formats + +MCP Gateway supports two configuration formats: +1. **JSON stdin** — Use with `--config-stdin` flag (primary format for containerized deployments) +2. 
**TOML file** — Use with `--config` flag for file-based configuration + +### TOML Format (`config.toml`) + +TOML configuration requires `command = "docker"` for stdio-based MCP servers to ensure containerization: + +```toml +[gateway] +port = 3000 +api_key = "your-api-key" + +[servers.github] +command = "docker" +args = ["run", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "-i", "ghcr.io/github/github-mcp-server:latest"] + +[servers.github.guard_policies.allow-only] +repos = ["github/gh-aw-mcpg", "github/gh-aw"] +min-integrity = "unapproved" + +[servers.safeoutputs] +command = "docker" +args = ["run", "--rm", "-i", "ghcr.io/github/safe-outputs:latest"] + +[servers.safeoutputs.guard_policies.write-sink] +accept = ["private:github/gh-aw-mcpg", "private:github/gh-aw"] +``` + +**Important**: Per [MCP Gateway Specification Section 3.2.1](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md#321-containerization-requirement), all stdio-based MCP servers MUST be containerized. The gateway rejects configurations where `command` is not `"docker"`. + +For HTTP-based MCP servers, use the `url` field instead of `command`: + +```toml +[servers.myhttp] +type = "http" +url = "https://example.com/mcp" +``` + +> **Format note**: JSON format uses `"guard-policies"` (with hyphen), TOML uses `guard_policies` (with underscore). + +### JSON Stdin Format + +JSON configuration is the primary format for containerized deployments. 
Pass via stdin: + +```json +{ + "mcpServers": { + "github": { + "type": "stdio", + "container": "ghcr.io/github/github-mcp-server:latest", + "env": { + "GITHUB_PERSONAL_ACCESS_TOKEN": "" + }, + "guard-policies": { + "allow-only": { + "repos": ["github/gh-aw-mcpg", "github/gh-aw"], + "min-integrity": "unapproved" + } + } + }, + "safeoutputs": { + "type": "stdio", + "container": "ghcr.io/github/safe-outputs:latest", + "guard-policies": { + "write-sink": { + "accept": ["private:github/gh-aw-mcpg", "private:github/gh-aw"] + } + } + } + }, + "gateway": { + "port": 8080, + "apiKey": "${MCP_GATEWAY_API_KEY}", + "domain": "localhost" + } +} +``` + +### Configuration Validation + +The gateway provides fail-fast validation with precise error locations (line/column for TOML parse errors), unknown key detection (catches typos like `prot` instead of `port`), and environment variable expansion validation. Check log files for warnings after startup. + +### Usage + +Run `./awmg --help` for full CLI options. Key flags: + +```bash +./awmg --config config.toml # TOML config file +./awmg --config-stdin < config.json # JSON stdin +./awmg --config config.toml --routed # Routed mode (default) +./awmg --config config.toml --unified # Unified mode +./awmg --config config.toml --log-dir /path # Custom log directory +``` + ## Server Configuration Fields - **`type`** (optional): Server transport type