kiyo-e · Aug 6, 2025
diff --git a/.dev.vars.example b/.dev.vars.example
@@ -6,8 +6,8 @@
 # ANTHROPIC_PROXY_BASE_URL=https://openrouter.ai/api/v1
 
 # Model configuration (optional)
-# REASONING_MODEL=deepseek/deepseek-r1-0528:free
-# COMPLETION_MODEL=deepseek/deepseek-r1-0528:free
+# REASONING_MODEL=z-ai/glm-4.5-air:free
+# COMPLETION_MODEL=z-ai/glm-4.5-air:free
 
 # Enable debug logging (optional)
 # DEBUG=true
diff --git a/.env.example b/.env.example
@@ -6,8 +6,8 @@
 # ANTHROPIC_PROXY_BASE_URL=https://openrouter.ai/api/v1
 
 # Model configuration (optional)
-# REASONING_MODEL=deepseek/deepseek-r1-0528:free
-# COMPLETION_MODEL=deepseek/deepseek-r1-0528:free
+# REASONING_MODEL=z-ai/glm-4.5-air:free
+# COMPLETION_MODEL=z-ai/glm-4.5-air:free
 
 # Enable debug logging (optional)
 # DEBUG=true

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -9,17 +9,28 @@ This is a Claude Code proxy service that translates between Anthropic's Claude A
 ## Architecture
 
 ### Core Components
-- **`src/index.ts`** - Main Hono application with API proxy logic
+- **`src/index.ts`** - Main Hono application with API proxy logic (src/index.ts:39-516)
 - **`src/server.ts`** - Node.js server wrapper for CLI distribution with argument parsing
+- **`src/transform.ts`** - API format transformation utilities between OpenAI and Claude formats
 
-### API Translation Logic
-The proxy service handles (in `src/index.ts:32-450`):
+### API Translation Logic (src/index.ts:39-516)
+The proxy handles `/v1/messages` POST requests and transforms between formats:
 - **Message normalization**: Converts Claude's nested content arrays to OpenAI's flat structure
 - **Tool call mapping**: Transforms Claude's `tool_use`/`tool_result` to OpenAI's `tool_calls`/`tool` roles
 - **Schema transformation**: Removes `format: 'uri'` constraints from JSON schemas for compatibility
-- **Model routing**: Dynamically selects models based on request type (reasoning vs completion)
+- **Model routing**: Dynamically selects models based on `thinking` flag in request
 - **Streaming support**: Handles both streaming and non-streaming responses with SSE
 
+### Transformation Module (src/transform.ts)
+Key exported functions:
+- `transformOpenAIToClaude()`: Main transformation from OpenAI to Claude format
+- `sanitizeRoot()`: Drops unsupported OpenAI parameters and ensures Claude requirements
+- `mapTools()`: Converts OpenAI tools/functions to Claude tool format
+- `mapToolChoice()`: Maps OpenAI function_call to Claude tool_choice
+- `transformMessages()`: Converts message roles and content blocks
+- `removeUriFormat()`: Recursively removes format:'uri' from JSON schemas
+- `transformClaudeToOpenAI()`: Converts Claude responses back to OpenAI format
+
 ### Dual Runtime Support
 - **Cloudflare Workers**: Uses Hono's built-in fetch handler (`src/index.ts`)
 - **Node.js**: Uses `@hono/node-server` adapter (`src/server.ts`)
@@ -30,38 +41,66 @@ The proxy service handles (in `src/index.ts:32-450`):
 # Install dependencies
 bun install
 
-# Local development server (hot reload)
+# Local development server (hot reload on port 3000)
 bun run start
 
-# Cloudflare Workers development
+# Cloudflare Workers development (local testing)
 bun run dev
 
-# Build CLI package
+# Build CLI binary to ./bin
 bun run build
 
+# Test the built CLI
+bun run bin --help
+./bin --help
+./bin -p 8080  # Run on different port
+
 # Deploy to Cloudflare Workers
 bun run deploy
+
+# Set environment variables for Cloudflare Workers
+npx wrangler secret put CLAUDE_CODE_PROXY_API_KEY
+npx wrangler env put REASONING_MODEL "z-ai/glm-4.5-air:free"
+
+# Publishing to npm
+npm version patch  # or minor/major
+npm publish
 ```
 
 ## CLI Package
 
 The project builds to an executable CLI via `bun run build`:
-- **Output**: `./bin` - Standalone Node.js executable
-- **Version management**: Reads from `package.json` dynamically
-- **CLI flags**: `-v/--version`, `--help`, `-p/--port`
+- **Output**: `./bin` - Standalone Node.js executable with ES module format
+- **Version management**: Reads dynamically from `package.json`
+- **CLI flags**: 
+  - `-v/--version`: Show version
+  - `--help`: Show help information
+  - `-p/--port <PORT>`: Set server port (default: 3000)
+- **Build process**: Uses Bun's native TypeScript compilation with executable permission
 
 ## Environment Variables
 
 Configure via `wrangler.toml` or environment:
-- `CLAUDE_CODE_PROXY_API_KEY` - Bearer token for upstream API
+
+### Required
+- `CLAUDE_CODE_PROXY_API_KEY` - Bearer token for upstream API authentication
+
+### Optional Configuration
 - `ANTHROPIC_PROXY_BASE_URL` - Upstream API URL (default: https://models.github.ai/inference)
 - `REASONING_MODEL` - Model for reasoning requests (default: openai/gpt-4.1)
 - `COMPLETION_MODEL` - Model for completion requests (default: openai/gpt-4.1)
-- `REASONING_MAX_TOKENS` - Max tokens for reasoning model (optional)
-- `COMPLETION_MAX_TOKENS` - Max tokens for completion model (optional)
+- `REASONING_MAX_TOKENS` - Max tokens for reasoning model (overrides request setting)
+- `COMPLETION_MAX_TOKENS` - Max tokens for completion model (overrides request setting)
+- `REASONING_EFFORT` - Reasoning effort level for reasoning model (values: "low", "medium", "high")
 - `DEBUG` - Enable debug logging (default: false)
 - `PORT` - Server port for Node.js mode (default: 3000)
 
+### Model Selection Logic
+- When request contains `thinking: true`, uses `REASONING_MODEL`
+- Otherwise uses `COMPLETION_MODEL`
+- `REASONING_EFFORT` only applies when using reasoning models
+- Max tokens overrides take precedence over request-provided values
+
 ## Deployment Options
 
 ### Cloudflare Workers
@@ -71,10 +110,19 @@ bun run deploy
 ```
 
 ### Docker
-Multi-stage build with production optimization:
+Multi-stage build with production optimization using distroless image:
 ```bash
+# Build image
 docker build -t claude-code-proxy .
-docker run -d -p 3000:3000 claude-code-proxy
+
+# Run with environment variables
+docker run -d -p 3000:3000 \
+  -e CLAUDE_CODE_PROXY_API_KEY=your_token \
+  -e ANTHROPIC_PROXY_BASE_URL=https://models.github.ai/inference \
+  claude-code-proxy
+
+# Development with hot reload (using compose.yml)
+docker compose up
 ```
 
 ### NPM Package
@@ -86,14 +134,28 @@ claude-code-proxy --help
 
 ## GitHub Actions Integration
 
-Service container setup for `@claude` mentions:
+Service container setup for `@claude` mentions in GitHub Actions:
 ```yaml
-services:
-  claude-code-proxy:
-    image: ghcr.io/kiyo-e/claude-code-proxy:latest
-    ports: [3000:3000]
-    env:
-      CLAUDE_CODE_PROXY_API_KEY: ${{ secrets.GITHUB_TOKEN }}
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    services:
+      claude-code-proxy:
+        image: ghcr.io/kiyo-e/claude-code-proxy:latest
+        ports:
+          - 3000:3000
+        env:
+          CLAUDE_CODE_PROXY_API_KEY: ${{ secrets.GITHUB_TOKEN }}
+          ANTHROPIC_PROXY_BASE_URL: https://models.github.ai/inference
+          REASONING_MODEL: openai/gpt-4.1
+          COMPLETION_MODEL: openai/gpt-4.1
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run Claude Code
+        run: |
+          export ANTHROPIC_BASE_URL=http://localhost:3000
+          claude "Review the changes in this PR"
 ```
 
 ## Local Usage with Claude Code
@@ -120,6 +182,41 @@ ANTHROPIC_BASE_URL=http://localhost:3000 claude "Review the API code and suggest
 ```bash
 # Using environment file
 echo "ANTHROPIC_PROXY_BASE_URL=https://openrouter.ai/api/v1" > .env
-echo "REASONING_MODEL=deepseek/deepseek-r1-0528:free" >> .env
+echo "REASONING_MODEL=z-ai/glm-4.5-air:free" >> .env
+echo "COMPLETION_MODEL=z-ai/glm-4.5-air:free" >> .env
+echo "REASONING_EFFORT=high" >> .env
 docker run -d -p 3000:3000 --env-file .env ghcr.io/kiyo-e/claude-code-proxy:latest
-```
+```
+
+### Development with Local Claude Code
+```bash
+# Start the proxy
+bun run start
+
+# In another terminal, use with Claude Code
+ANTHROPIC_BASE_URL=http://localhost:3000 \
+CLAUDE_CODE_PROXY_API_KEY=your_token \
+claude "Review this code and suggest improvements"
+
+# Debug mode
+DEBUG=true bun run start
+```
+
+## Important Implementation Notes
+
+### Request Flow
+1. Client sends Claude API format request to `/v1/messages`
+2. Proxy transforms to OpenAI format using `transformOpenAIToClaude()`
+3. Request forwarded to upstream API (configured via `ANTHROPIC_PROXY_BASE_URL`)
+4. Response transformed back to Claude format
+5. Streaming responses handled with Server-Sent Events (SSE)
+
+### Error Handling
+- HTTP errors from upstream API: Returns same status code with error details
+- API errors in response body: Returns 500 with error message
+- Dropped parameters tracked in `X-Dropped-Params` header
+
+### Debugging
+- Enable `DEBUG=true` to log request/response payloads
+- Bearer tokens are automatically masked in logs
+- Check health endpoint at `/` for configuration status
diff --git a/README.md b/README.md
@@ -56,8 +56,9 @@ docker run -d -p 3000:3000 -e CLAUDE_CODE_PROXY_API_KEY=your_github_token ghcr.i
 docker run -d -p 3000:3000 \
   -e CLAUDE_CODE_PROXY_API_KEY=your_openrouter_key \
   -e ANTHROPIC_PROXY_BASE_URL=https://openrouter.ai/api/v1 \
-  -e REASONING_MODEL=deepseek/deepseek-r1-0528:free \
-  -e COMPLETION_MODEL=deepseek/deepseek-r1-0528:free \
+  -e REASONING_MODEL=z-ai/glm-4.5-air:free \
+  -e COMPLETION_MODEL=z-ai/glm-4.5-air:free \
+  -e REASONING_EFFORT=high \
   ghcr.io/kiyo-e/claude-code-proxy:latest
 
 # Use with Claude Code
@@ -71,10 +72,11 @@ ANTHROPIC_BASE_URL=http://localhost:3000 claude "Help me review this code"
 cat > .env << EOF
 CLAUDE_CODE_PROXY_API_KEY=your_api_key
 ANTHROPIC_PROXY_BASE_URL=https://openrouter.ai/api/v1
-REASONING_MODEL=deepseek/deepseek-r1-0528:free
-COMPLETION_MODEL=deepseek/deepseek-r1-0528:free
+REASONING_MODEL=z-ai/glm-4.5-air:free
+COMPLETION_MODEL=z-ai/glm-4.5-air:free
 REASONING_MAX_TOKENS=4096
 COMPLETION_MAX_TOKENS=2048
+REASONING_EFFORT=high
 DEBUG=false
 EOF
 
@@ -129,9 +131,9 @@ npx wrangler secret put CLAUDE_CODE_PROXY_API_KEY
 npx wrangler secret put ANTHROPIC_PROXY_BASE_URL
 # Enter: https://openrouter.ai/api/v1
 npx wrangler secret put REASONING_MODEL
-# Enter: deepseek/deepseek-r1-0528:free
+# Enter: z-ai/glm-4.5-air:free
 npx wrangler secret put COMPLETION_MODEL
-# Enter: deepseek/deepseek-r1-0528:free
+# Enter: z-ai/glm-4.5-air:free
 ```
 
 3. **Test the deployment:**
@@ -190,6 +192,7 @@ npm publish
 - `COMPLETION_MODEL` - Model for completion requests (default: openai/gpt-4.1)
 - `REASONING_MAX_TOKENS` - Max tokens for reasoning model (optional)
 - `COMPLETION_MAX_TOKENS` - Max tokens for completion model (optional)
+- `REASONING_EFFORT` - Reasoning effort level for reasoning model (optional, e.g., "low", "medium", "high")
 - `DEBUG` - Enable debug logging (default: false)
 - `PORT` - Server port for CLI mode (default: 3000)
 
@@ -203,17 +206,19 @@ npx wrangler secret put CLAUDE_CODE_PROXY_API_KEY
 npx wrangler secret put ANTHROPIC_PROXY_BASE_URL
 
 # Set regular environment variables
-npx wrangler env put REASONING_MODEL "deepseek/deepseek-r1-0528:free"
-npx wrangler env put COMPLETION_MODEL "deepseek/deepseek-r1-0528:free"
+npx wrangler env put REASONING_MODEL "z-ai/glm-4.5-air:free"
+npx wrangler env put COMPLETION_MODEL "z-ai/glm-4.5-air:free"
+npx wrangler env put REASONING_EFFORT "high"
 npx wrangler env put DEBUG "false"
 ```
 
 Alternatively, configure via `wrangler.toml`:
 
 ```toml
 [env.production.vars]
-REASONING_MODEL = "deepseek/deepseek-r1-0528:free"
-COMPLETION_MODEL = "deepseek/deepseek-r1-0528:free"
+REASONING_MODEL = "z-ai/glm-4.5-air:free"
+COMPLETION_MODEL = "z-ai/glm-4.5-air:free"
+REASONING_EFFORT = "high"
 DEBUG = "false"
 ```
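
Note on the "Model Selection Logic" section this diff adds to CLAUDE.md: the four rules listed there can be sketched in TypeScript as below. This is a minimal illustration of the documented behavior, not the code in `src/index.ts`. The names `Env`, `ProxyRequest`, `Selection`, and `selectModel` are hypothetical, and treating any truthy `thinking` value as the flag is an assumption.

```typescript
// Hypothetical types: these names are illustrative and not taken from src/index.ts.
interface Env {
  REASONING_MODEL?: string;
  COMPLETION_MODEL?: string;
  REASONING_MAX_TOKENS?: string;
  COMPLETION_MAX_TOKENS?: string;
  REASONING_EFFORT?: string;
}

interface ProxyRequest {
  thinking?: unknown; // assumption: any truthy value selects the reasoning model
  max_tokens?: number;
}

interface Selection {
  model: string;
  maxTokens?: number;
  reasoningEffort?: string;
}

// Default model as documented in the Environment Variables section.
const DEFAULT_MODEL = "openai/gpt-4.1";

function selectModel(req: ProxyRequest, env: Env): Selection {
  const reasoning = Boolean(req.thinking);

  // Rules 1 and 2: `thinking` picks REASONING_MODEL, otherwise COMPLETION_MODEL.
  const model =
    (reasoning ? env.REASONING_MODEL : env.COMPLETION_MODEL) ?? DEFAULT_MODEL;

  // Rule 4: a configured max-tokens override wins over the request value.
  const override = reasoning
    ? env.REASONING_MAX_TOKENS
    : env.COMPLETION_MAX_TOKENS;
  const parsed = override !== undefined ? Number.parseInt(override, 10) : NaN;
  const maxTokens = Number.isNaN(parsed) ? req.max_tokens : parsed;

  // Rule 3: REASONING_EFFORT only applies on the reasoning path.
  const reasoningEffort = reasoning ? env.REASONING_EFFORT : undefined;

  return { model, maxTokens, reasoningEffort };
}

// Example call using the values from the updated .env.example.
console.log(
  selectModel(
    { thinking: true, max_tokens: 1024 },
    { REASONING_MODEL: "z-ai/glm-4.5-air:free", REASONING_EFFORT: "high" },
  ),
);
```

Keeping the selection in one pure function would make the precedence between environment overrides and request values easy to unit test, independent of the Hono handler.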