Merged
15 changes: 15 additions & 0 deletions README.md
@@ -147,10 +147,25 @@ See **[docs/CONFIGURATION.md](docs/CONFIGURATION.md)** for the complete mapping

Supported MCP methods: `tools/list`, `tools/call`, and any other method (forwarded as-is).

## Proxy Mode

The gateway can also run as an HTTP forward proxy (`awmg proxy`) that intercepts GitHub API requests from tools like `gh` CLI and applies the same DIFC filtering:

```bash
awmg proxy \
  --guard-wasm guards/github-guard/github_guard.wasm \
  --policy '{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}' \
  --github-token "$GITHUB_TOKEN" \
  --listen localhost:8080
```

Proxy mode maps about 25 REST URL patterns, plus GraphQL queries, to guard tool names, then runs the same 6-phase DIFC pipeline used by the MCP gateway. See [docs/PROXY_MODE.md](docs/PROXY_MODE.md) for full documentation.

## Further Reading

| Topic | Link |
|-------|------|
| **Proxy Mode** | [docs/PROXY_MODE.md](docs/PROXY_MODE.md) — HTTP forward proxy for DIFC filtering of `gh` CLI and REST/GraphQL requests |
| **Configuration Reference** | [docs/CONFIGURATION.md](docs/CONFIGURATION.md) — Server fields, TOML/JSON formats, guard-policy details, custom schemas, gateway fields, validation rules |
| **Environment Variables** | [docs/ENVIRONMENT_VARIABLES.md](docs/ENVIRONMENT_VARIABLES.md) — All env vars for production, development, Docker, and guard configuration |
| **Full Specification** | [MCP Gateway Configuration Reference](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/mcp-gateway.md) — Upstream spec with complete validation rules |
145 changes: 145 additions & 0 deletions docs/PROXY_MODE.md
@@ -0,0 +1,145 @@
# Proxy Mode

Proxy mode (`awmg proxy`) is an HTTP forward proxy that intercepts GitHub API requests and applies DIFC (Decentralized Information Flow Control) filtering using the same guard WASM module as the MCP gateway.

## Motivation

The MCP gateway enforces DIFC on MCP tool calls, but tools that call the GitHub API directly — such as `gh api`, `gh issue list`, or raw `curl` — bypass it entirely. Proxy mode closes this gap by sitting between the HTTP client and `api.github.com`, applying guard policies to REST and GraphQL requests.

## Quick Start

```bash
# Start the proxy
awmg proxy \
  --guard-wasm guards/github-guard/github_guard.wasm \
  --policy '{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}' \
  --github-token "$GITHUB_TOKEN" \
  --listen localhost:8080

# Point gh CLI at the proxy (gh forces HTTPS for GH_HOST, so this needs a
# TLS-terminating reverse proxy in front; see Known Limitations)
GH_HOST=localhost:8080 GH_TOKEN="$GITHUB_TOKEN" gh issue list -R org/repo

# Or use curl directly
curl -H "Authorization: token $GITHUB_TOKEN" \
  http://localhost:8080/api/v3/repos/org/repo/issues
```

## How It Works

```
HTTP client → awmg proxy (localhost:8080) → api.github.com
              6-phase DIFC pipeline
              (same guard WASM module)
```

1. The proxy receives an HTTP request (REST GET or GraphQL POST)
2. It maps the URL/query to a guard tool name (e.g., `/repos/:owner/:repo/issues` → `list_issues`)
3. The guard WASM module evaluates access based on the configured policy
4. If allowed, the request is forwarded to `api.github.com`
5. The response is filtered per-item based on secrecy/integrity labels
6. The filtered response is returned to the client

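REST write operations (PUT, POST, DELETE, PATCH) pass through unmodified. The one exception among POST requests is a GraphQL query sent to `/graphql`, which is filtered like a read.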
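The read/write split described above can be sketched as a small predicate. This is an illustrative sketch only: `isFilterCandidate` is a hypothetical name, and treating any path that ends in `/graphql` as the GraphQL endpoint is an assumption rather than the proxy's documented behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isFilterCandidate reports whether a request is eligible for DIFC filtering
// under the rules in this section: REST GETs and GraphQL POSTs are eligible,
// every other method passes through. Hypothetical helper, not the proxy's API.
// A GET whose URL is not recognized by the route table still passes through.
func isFilterCandidate(method, path string) bool {
	switch method {
	case http.MethodGet:
		return true
	case http.MethodPost:
		// Assumption: the GraphQL endpoint is identified by its path suffix.
		return strings.HasSuffix(path, "/graphql")
	default:
		return false
	}
}

func main() {
	fmt.Println(isFilterCandidate("GET", "/repos/org/repo/issues"))
	fmt.Println(isFilterCandidate("POST", "/graphql"))
	fmt.Println(isFilterCandidate("PATCH", "/repos/org/repo/issues/1"))
}
```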

## Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--guard-wasm` | *(required)* | Path to the guard WASM module |
| `--policy` | | Guard policy JSON (e.g., `{"allow-only":{"repos":["org/repo"]}}`) |
| `--github-token` | `$GITHUB_TOKEN` | GitHub API token for upstream requests; if empty, falls back to `$GH_TOKEN`, then `$GITHUB_PERSONAL_ACCESS_TOKEN` |
| `--listen` / `-l` | `127.0.0.1:8080` | HTTP listen address |
| `--log-dir` | `/tmp/gh-aw/mcp-logs` | Log file directory |
| `--guards-mode` | `filter` | DIFC mode: `strict`, `filter`, or `propagate` |
| `--github-api-url` | `https://api.github.com` | Upstream GitHub API URL |
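The token lookup order used by `runProxy` in `internal/cmd/proxy.go` (flag value, whose default is `GITHUB_TOKEN`, then `GH_TOKEN`, then `GITHUB_PERSONAL_ACCESS_TOKEN`) can be sketched as below. `resolveToken` is a hypothetical helper written for illustration; the environment lookup is injected so the order is easy to test.

```go
package main

import (
	"fmt"
	"os"
)

// resolveToken returns the first non-empty token, checking the flag value
// first and then the environment variables in fallback order.
// Hypothetical helper; the command inlines this logic.
func resolveToken(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	for _, name := range []string{"GITHUB_TOKEN", "GH_TOKEN", "GITHUB_PERSONAL_ACCESS_TOKEN"} {
		if v := getenv(name); v != "" {
			return v
		}
	}
	return ""
}

func main() {
	token := resolveToken("", os.Getenv)
	// Report only whether a token was found; never print the token itself.
	fmt.Println("token found:", token != "")
}
```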

## DIFC Pipeline

The proxy reuses the same 6-phase pipeline as the MCP gateway, with Phase 3 adapted for HTTP forwarding:

| Phase | Description | Shared with Gateway? |
|-------|-------------|---------------------|
| **0** | Extract agent labels from registry | ✅ |
| **1** | `Guard.LabelResource()` — coarse access check | ✅ |
| **2** | `Evaluator.Evaluate()` — secrecy/integrity evaluation | ✅ |
| **3** | Forward request to GitHub API | ❌ Proxy-specific |
| **4** | `Guard.LabelResponse()` — per-item labeling | ✅ |
| **5** | `Evaluator.FilterCollection()` — fine-grained filtering | ✅ |
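To make the phase ordering concrete, here is a minimal sketch that wires six stub stages together. Every type and function signature in it (`stages`, `labeledItem`, `runPipeline`, `demoStages`) is a hypothetical stand-in; the real `Guard` and `Evaluator` signatures are not shown in this document and will differ.

```go
package main

import (
	"errors"
	"fmt"
)

// labeledItem is a stand-in for one response item plus its filter verdict.
type labeledItem struct {
	Name    string
	Allowed bool
}

// stages bundles the six phases as plain functions so the order is visible.
// These signatures are illustrative assumptions, not the gateway's API.
type stages struct {
	agentLabels   func() []string                     // phase 0
	labelResource func(tool string) []string          // phase 1
	evaluate      func(agent, resource []string) bool // phase 2
	forward       func() ([]string, error)            // phase 3
	labelResponse func(items []string) []labeledItem  // phase 4
	filter        func(items []labeledItem) []string  // phase 5
}

var errDenied = errors.New("request denied before forwarding")

// runPipeline runs the phases in order and stops before forwarding
// when the phase 2 evaluation denies the request.
func runPipeline(s stages, tool string) ([]string, error) {
	agent := s.agentLabels()
	resource := s.labelResource(tool)
	if !s.evaluate(agent, resource) {
		return nil, errDenied
	}
	raw, err := s.forward()
	if err != nil {
		return nil, err
	}
	return s.filter(s.labelResponse(raw)), nil
}

// demoStages returns stubs: one tool name is denied up front, and one
// response item is marked as not allowed so that phase 5 drops it.
func demoStages() stages {
	return stages{
		agentLabels:   func() []string { return []string{"agent"} },
		labelResource: func(tool string) []string { return []string{tool} },
		evaluate: func(agent, resource []string) bool {
			return len(resource) > 0 && resource[0] != "forbidden_tool"
		},
		forward: func() ([]string, error) {
			return []string{"public-issue", "private-issue"}, nil
		},
		labelResponse: func(items []string) []labeledItem {
			out := make([]labeledItem, 0, len(items))
			for _, it := range items {
				out = append(out, labeledItem{Name: it, Allowed: it != "private-issue"})
			}
			return out
		},
		filter: func(items []labeledItem) []string {
			var kept []string
			for _, it := range items {
				if it.Allowed {
					kept = append(kept, it.Name)
				}
			}
			return kept
		},
	}
}

func main() {
	items, err := runPipeline(demoStages(), "list_issues")
	fmt.Println(items, err)
}
```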

## REST Route Mapping

The proxy maps ~25 GitHub REST API URL patterns to guard tool names:

| URL Pattern | Guard Tool |
|-------------|-----------|
| `/repos/:owner/:repo/issues` | `list_issues` |
| `/repos/:owner/:repo/issues/:number` | `get_issue` |
| `/repos/:owner/:repo/pulls` | `list_pull_requests` |
| `/repos/:owner/:repo/pulls/:number` | `get_pull_request` |
| `/repos/:owner/:repo/commits` | `list_commits` |
| `/repos/:owner/:repo/commits/:sha` | `get_commit` |
| `/repos/:owner/:repo/contents/:path` | `get_file_contents` |
| `/repos/:owner/:repo/branches` | `list_branches` |
| `/repos/:owner/:repo/releases` | `list_releases` |
| `/search/issues` | `search_issues` |
| `/search/code` | `search_code` |
| `/search/repositories` | `search_repositories` |
| `/user` | `get_me` |
| ... | See `internal/proxy/router.go` for full list |

Unrecognized URLs pass through without DIFC filtering.
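A segment-by-segment matcher is one way to implement a table like the one above. The sketch below is not the code in `internal/proxy/router.go`; `matchRoute` and the `route` type are hypothetical, only a few single-segment patterns are included, and stripping an `/api/v3` prefix is an assumption based on the `curl` example in Quick Start.

```go
package main

import (
	"fmt"
	"strings"
)

// route pairs a URL pattern with a guard tool name.
type route struct {
	pattern string
	tool    string
}

// A subset of the documented table, for illustration only.
var routes = []route{
	{"/repos/:owner/:repo/issues", "list_issues"},
	{"/repos/:owner/:repo/issues/:number", "get_issue"},
	{"/repos/:owner/:repo/pulls", "list_pull_requests"},
	{"/search/issues", "search_issues"},
	{"/user", "get_me"},
}

// matchRoute returns the guard tool for path, or "" when no pattern matches
// (the pass-through case). Segments starting with ":" match any non-empty value.
func matchRoute(path string) string {
	// Assumption: GitHub Enterprise style prefixes are stripped before matching.
	path = strings.TrimPrefix(path, "/api/v3")
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for _, r := range routes {
		pat := strings.Split(strings.Trim(r.pattern, "/"), "/")
		if len(pat) != len(parts) {
			continue
		}
		matched := true
		for i, seg := range pat {
			if strings.HasPrefix(seg, ":") {
				if parts[i] == "" {
					matched = false
					break
				}
				continue
			}
			if seg != parts[i] {
				matched = false
				break
			}
		}
		if matched {
			return r.tool
		}
	}
	return ""
}

func main() {
	for _, p := range []string{"/repos/org/repo/issues", "/repos/org/repo/issues/42", "/gists"} {
		fmt.Printf("%s -> %q\n", p, matchRoute(p))
	}
}
```

Patterns such as `/repos/:owner/:repo/contents/:path`, where the last parameter can span several segments, would need a catch-all rule that this sketch leaves out.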

## GraphQL Support

GraphQL queries to `/graphql` are parsed to extract the operation type and owner/repo context:

- **Repository-scoped queries** (issues, PRs, commits) — mapped to corresponding tool names
- **Search queries** — mapped to `search_issues` or `search_code`
- **Viewer queries** — mapped to `get_me`
- **Unknown queries** — passed through without filtering

Owner and repo are extracted from GraphQL variables (`$owner`, `$name`/`$repo`) or inline string arguments.
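Reading the owner and repo from request variables might look like the following sketch. It assumes the standard GraphQL-over-HTTP JSON body with `query` and `variables` fields; `ownerRepoFromVariables` is a hypothetical name, and the inline string argument case mentioned above is not handled here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// graphQLRequest models the JSON body of a GraphQL POST.
type graphQLRequest struct {
	Query     string                 `json:"query"`
	Variables map[string]interface{} `json:"variables"`
}

// ownerRepoFromVariables extracts the owner and repo from the request
// variables, accepting either "name" or "repo" for the repository.
// ok is false when the body is invalid or either value is missing.
func ownerRepoFromVariables(body []byte) (owner, repo string, ok bool) {
	var req graphQLRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", "", false
	}
	// Failed type assertions leave the strings empty, which is checked below.
	owner, _ = req.Variables["owner"].(string)
	repo, _ = req.Variables["name"].(string)
	if repo == "" {
		repo, _ = req.Variables["repo"].(string)
	}
	return owner, repo, owner != "" && repo != ""
}

func main() {
	body := []byte(`{"query":"query($owner:String!,$name:String!){repository(owner:$owner,name:$name){id}}","variables":{"owner":"org","name":"repo"}}`)
	owner, repo, ok := ownerRepoFromVariables(body)
	fmt.Println(owner, repo, ok)
}
```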

## Policy Notes

- **Repo names must be lowercase** in policies (e.g., `octocat/hello-world` not `octocat/Hello-World`). The guard performs case-insensitive matching against actual GitHub data.
- All policy formats supported by the MCP gateway work identically in proxy mode:
  - Specific repos: `{"allow-only":{"repos":["org/repo"]}}`
  - Owner wildcards: `{"allow-only":{"repos":["org/*"]}}`
  - Multiple repos: `{"allow-only":{"repos":["org/repo1","org/repo2"]}}`
  - Integrity filtering: `{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}`
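The repo-matching rules above (lowercase policy entries, owner wildcards, case-insensitive comparison against GitHub data) can be illustrated with a small sketch. The real matching happens inside the guard WASM module; `policy`, `parsePolicy`, and `repoAllowed` are hypothetical names used only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// policy mirrors the JSON shape of the examples in this section.
type policy struct {
	AllowOnly struct {
		Repos        []string `json:"repos"`
		MinIntegrity string   `json:"min-integrity"`
	} `json:"allow-only"`
}

// parsePolicy decodes a policy JSON string.
func parsePolicy(s string) (policy, error) {
	var p policy
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

// repoAllowed lowercases the repo name from GitHub and compares it with the
// (already lowercase) policy entries. An entry ending in "/*" allows every
// repo under that owner.
func repoAllowed(p policy, fullName string) bool {
	name := strings.ToLower(fullName)
	for _, entry := range p.AllowOnly.Repos {
		if strings.HasSuffix(entry, "/*") {
			owner := strings.TrimSuffix(entry, "/*")
			if strings.HasPrefix(name, owner+"/") {
				return true
			}
			continue
		}
		if entry == name {
			return true
		}
	}
	return false
}

func main() {
	p, err := parsePolicy(`{"allow-only":{"repos":["org/*","octocat/hello-world"]}}`)
	if err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	fmt.Println(repoAllowed(p, "octocat/Hello-World"), repoAllowed(p, "other/repo"))
}
```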

## Container Usage

The proxy is included in the same container image as the MCP gateway:

```bash
docker run --rm \
  --entrypoint /app/awmg \
  -p 8080:8080 \
  -e GITHUB_TOKEN \
  ghcr.io/github/gh-aw-mcpg:latest \
  proxy \
    --guard-wasm /guards/github/00-github-guard.wasm \
    --policy '{"allow-only":{"repos":["org/repo"],"min-integrity":"none"}}' \
    --github-token "$GITHUB_TOKEN" \
    --listen 0.0.0.0:8080 \
    --guards-mode filter
```

Note: The container entrypoint defaults to `run_containerized.sh` (MCP gateway mode). Use `--entrypoint /app/awmg` to run proxy mode directly.

## Guards Mode

| Mode | Behavior |
|------|----------|
| `strict` | Blocks entire response if any items are filtered |
| `filter` | Removes filtered items, returns remaining (default) |
| `propagate` | Labels accumulate on the agent; no filtering |
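The three modes in the table differ only in what happens once per-item verdicts are known. The sketch below shows that difference under simplifying assumptions: `item` and `applyMode` are hypothetical, and the label accumulation that `propagate` performs on the agent is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// item is a stand-in for a response item and its per-item verdict.
type item struct {
	Name    string
	Allowed bool
}

var errBlocked = errors.New("strict mode: response blocked because items were filtered")

// applyMode returns the items a client would receive under each guards mode.
func applyMode(mode string, items []item) ([]item, error) {
	var kept []item
	for _, it := range items {
		if it.Allowed {
			kept = append(kept, it)
		}
	}
	switch mode {
	case "strict":
		// Any filtered item blocks the whole response.
		if len(kept) != len(items) {
			return nil, errBlocked
		}
		return items, nil
	case "filter":
		return kept, nil
	case "propagate":
		// No filtering; labels would accumulate on the agent (not modeled).
		return items, nil
	default:
		return nil, fmt.Errorf("unknown guards mode %q", mode)
	}
}

func main() {
	items := []item{{"public-issue", true}, {"private-issue", false}}
	for _, mode := range []string{"strict", "filter", "propagate"} {
		out, err := applyMode(mode, items)
		fmt.Println(mode, len(out), err)
	}
}
```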

## Known Limitations

- **gh CLI HTTPS requirement**: `gh` forces HTTPS when connecting to `GH_HOST`. The proxy serves plain HTTP, so direct `gh` CLI interception requires a TLS-terminating reverse proxy in front. Use `curl` or `gh api --hostname` with HTTP for testing.
- **GraphQL nested filtering**: Deeply nested GraphQL response structures depend on guard support for item-level labeling.
- **Read-only filtering**: Only GET requests and GraphQL POST queries are filtered. Write operations pass through unmodified.
141 changes: 141 additions & 0 deletions internal/cmd/proxy.go
@@ -0,0 +1,141 @@
package cmd

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"github.com/github/gh-aw-mcpg/internal/logger"
	"github.com/github/gh-aw-mcpg/internal/proxy"
	"github.com/spf13/cobra"
)

// Proxy subcommand flag variables
var (
	proxyGuardWasm string
	proxyPolicy    string
	proxyToken     string
	proxyListen    string
	proxyLogDir    string
	proxyDIFCMode  string
	proxyAPIURL    string
)

func init() {
	rootCmd.AddCommand(newProxyCmd())
}

// newProxyCmd builds the "proxy" subcommand and registers its flags.
func newProxyCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "proxy",
		Short: "Run as a GitHub API filtering proxy",
		Long: `Run the gateway in proxy mode, an HTTP forward proxy that intercepts
gh CLI requests and applies DIFC filtering using the same guard WASM module.

Usage with the gh CLI:

  # Start the proxy
  awmg proxy \
    --guard-wasm guards/github-guard/github_guard.wasm \
    --policy '{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}' \
    --github-token "$GITHUB_TOKEN" \
    --listen localhost:8080

  # Point gh at the proxy
  GH_HOST=localhost:8080 GH_TOKEN="$GITHUB_TOKEN" gh issue list -R org/repo`,
		SilenceUsage: true,
		RunE:         runProxy,
	}

	cmd.Flags().StringVar(&proxyGuardWasm, "guard-wasm", "", "Path to the guard WASM module (required)")
	cmd.Flags().StringVar(&proxyPolicy, "policy", getDefaultGuardPolicyJSON(), "Guard policy JSON")
	cmd.Flags().StringVar(&proxyToken, "github-token", os.Getenv("GITHUB_TOKEN"), "GitHub API token")
	cmd.Flags().StringVarP(&proxyListen, "listen", "l", "127.0.0.1:8080", "HTTP proxy listen address")
	cmd.Flags().StringVar(&proxyLogDir, "log-dir", getDefaultLogDir(), "Log file directory")
	cmd.Flags().StringVar(&proxyDIFCMode, "guards-mode", "filter", "DIFC enforcement mode: strict, filter, propagate")
	cmd.Flags().StringVar(&proxyAPIURL, "github-api-url", proxy.DefaultGitHubAPIBase, "Upstream GitHub API URL")

	// MarkFlagRequired only returns an error for an unknown flag name;
	// "guard-wasm" is registered above, so the error is discarded explicitly.
	_ = cmd.MarkFlagRequired("guard-wasm")

	return cmd
}

// runProxy starts the filtering proxy and blocks until a shutdown signal
// arrives or the HTTP server fails.
func runProxy(cmd *cobra.Command, args []string) error {
	ctx, cancel := signal.NotifyContext(cmd.Context(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	// Initialize loggers; a logging failure is reported but is not fatal.
	if err := logger.InitFileLogger(proxyLogDir, "proxy.log"); err != nil {
		log.Printf("Warning: Failed to initialize file logger: %v", err)
	}
	if err := logger.InitJSONLLogger(proxyLogDir, "proxy-rpc.jsonl"); err != nil {
		log.Printf("Warning: Failed to initialize JSONL logger: %v", err)
	}

	logger.LogInfo("startup", "MCPG Proxy starting: listen=%s, guard=%s, mode=%s", proxyListen, proxyGuardWasm, proxyDIFCMode)

	// Resolve the GitHub token: --github-token (which defaults to GITHUB_TOKEN),
	// then GH_TOKEN, then GITHUB_PERSONAL_ACCESS_TOKEN.
	token := proxyToken
	if token == "" {
		token = os.Getenv("GH_TOKEN")
	}
	if token == "" {
		token = os.Getenv("GITHUB_PERSONAL_ACCESS_TOKEN")
	}

	// Create the proxy server
	proxySrv, err := proxy.New(ctx, proxy.Config{
		WasmPath:     proxyGuardWasm,
		Policy:       proxyPolicy,
		GitHubToken:  token,
		GitHubAPIURL: proxyAPIURL,
		DIFCMode:     proxyDIFCMode,
	})
	if err != nil {
		return fmt.Errorf("failed to create proxy server: %w", err)
	}

	// Bind the listener before serving so that a bad or busy address is
	// returned as an error instead of only being logged.
	listener, err := net.Listen("tcp", proxyListen)
	if err != nil {
		return fmt.Errorf("failed to listen on %s: %w", proxyListen, err)
	}

	httpServer := &http.Server{
		Addr:    proxyListen,
		Handler: proxySrv.Handler(),
	}

	actualAddr := listener.Addr().String()
	log.Printf("MCPG Proxy listening on %s", actualAddr)
	logger.LogInfo("startup", "Proxy listening on %s", actualAddr)

	// Print connection info
	fmt.Fprintf(os.Stderr, "\nMCPG GitHub API Proxy\n")
	fmt.Fprintf(os.Stderr, "  Listening: %s\n", actualAddr)
	fmt.Fprintf(os.Stderr, "  Mode:      %s\n", proxyDIFCMode)
	fmt.Fprintf(os.Stderr, "  Guard:     %s\n", proxyGuardWasm)
	fmt.Fprintf(os.Stderr, "\nConnect with:\n")
	fmt.Fprintf(os.Stderr, "  GH_HOST=%s GH_TOKEN=<token> gh ...\n\n", actualAddr)

	// Serve in the background and hand unexpected server errors back to
	// the caller through a buffered channel.
	serveErr := make(chan error, 1)
	go func() {
		if err := httpServer.Serve(listener); err != nil && err != http.ErrServerClosed {
			serveErr <- err
		}
	}()

	// Wait for a shutdown signal or a server failure
	select {
	case err := <-serveErr:
		log.Printf("HTTP server error: %v", err)
		return fmt.Errorf("HTTP server error: %w", err)
	case <-ctx.Done():
	}

	log.Println("Shutting down proxy...")
	logger.LogInfo("shutdown", "Proxy shutting down")

	return httpServer.Close()
}