HTTP proxy server that transforms Anthropic Messages API requests to Z.ai GLM-4.7 API format, enabling Claude-compatible tools and applications to use GLM models.
GLM-4.7 is a powerful model, but it has limitations when used directly:
| Limitation | Impact |
|---|---|
| No web search | Can't access current information or search the web |
| No reasoning mode | Lacks step-by-step thinking like Claude's extended thinking |
| Manual model switching | Developers must manually route between text/vision models |
| Limited tool ecosystem | Official docs state "does not support custom tools" |
| Complex integration | Each AI tool needs custom GLM API integration |
The proxy addresses each of these limitations:

| Problem | Solution |
|---|---|
| No web search | MCP web_search/web_reader injection; intercepts Claude Code's native WebSearch/WebFetch |
| No reasoning | Automatic reasoning prompt injection with <reasoning_content> parsing to thinking blocks |
| Manual model switching | Auto-detects images/video in current message → routes to glm-4.6v, switches back to glm-4.7 for text |
| Limited tools | Dynamic MCP registry - add Playwright, Context7, or any MCP server via dashboard |
| Complex integration | Drop-in Anthropic API compatibility - works with any tool that supports custom base URLs |
- Claude Code users: Get web search without an Anthropic subscription
- Vision tasks: Automatic model switching - no manual configuration
- Reasoning: Step-by-step thinking blocks for complex problems
- Extensible: Add your own MCP servers for specialized tools
- Zero code changes: Point your tools at `http://127.0.0.1:4567` and go
- Web Dashboard: Settings panel and MCP management (vanilla JS, no dependencies)
- Smart Backend Routing: Automatically routes text requests via Anthropic endpoint and vision requests via OpenAI endpoint for optimal results
- API Translation: Transparent conversion between Anthropic Messages API and OpenAI-compatible GLM API
- Intelligent Model Selection: Automatic selection of text (glm-4.7) or vision (glm-4.6v) models based on current message content
- Video Analysis: Full video support with automatic file path detection - just mention a video file and it's analyzed
- Reasoning Injection: Automatic reasoning prompt injection for step-by-step thinking, with `<reasoning_content>` tag parsing
- Tool Execution: Internal tool loop for web_search and web_reader via Z.ai MCP servers, plus automatic interception of Claude Code's native WebSearch/WebFetch tools
- Client Tools: Pass-through support for client-defined tools
- Streaming: Full SSE streaming support for both backend paths
- Production Ready: Structured logging, error handling, graceful shutdown
- Node.js 18.0.0 or later
- Z.ai API key (get one at https://z.ai)
- Claude Code CLI (optional, for the `ccglm` command)
```bash
# Clone and install
git clone <repository-url>
cd glmproxy
npm install

# Install globally for ccglm command
npm install -g .
```

API keys are configured via a `.env` file in the project root. This file is automatically loaded on startup.
Create your .env file:
```bash
# Copy the example file
cp .env.example .env

# Edit with your API key
nano .env   # or use your preferred editor
```

Required keys:
| Variable | Description |
|---|---|
| `ZAI_API_KEY` | Your Z.ai API key (required) - get one at https://z.ai |
Optional keys for MCP servers:
| Variable | Description |
|---|---|
| `REF_API_KEY` | API key for Ref Tools MCP (documentation search) |
| `CONTEXT7_API_KEY` | API key for Context7 MCP (library docs lookup) |
Security best practices:
- Never commit `.env` to git - it is already in `.gitignore`
- Never share API keys - treat them like passwords
- Use environment variables in CI/CD - don't store keys in code or config files
- Rotate keys periodically - regenerate if you suspect exposure
- Dashboard API key entry - keys entered via the web UI are saved to `.env` automatically
If you don't have a .env.example file, create .env manually:
```bash
# .env
ZAI_API_KEY=your_api_key_here

# Optional: Server configuration
# PORT=4567
# HOST=127.0.0.1
# LOG_LEVEL=info
```

Alternatively, set the environment variable directly in your shell:

```bash
export ZAI_API_KEY="your-api-key-here"
```

The easiest way to use the proxy:
```bash
# Start proxy and launch Claude Code in one command
ccglm

# Skip permission prompts (use with caution)
ccglm yolo

# Open the web dashboard to configure settings
ccglm ui

# Check proxy status
ccglm status
```

Or start the proxy server manually:

```bash
# Start proxy server
npm start

# Development (with auto-reload)
npm run dev
```

The proxy will start on `http://127.0.0.1:4567` by default.
Open your browser and navigate to:
http://127.0.0.1:4567/
You'll see the settings dashboard where you can:
- Configure your Z.ai API key in the Settings panel
- Select endpoint mode (Anthropic, OpenAI, BigModel)
- Toggle features like web search, reasoning, and streaming
- Manage custom MCP servers
Test the proxy from the command line:

```bash
# Health check
curl http://127.0.0.1:4567/health
# Simple request
curl -X POST http://127.0.0.1:4567/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, what model are you?"}
]
}'
```

All configuration is done via environment variables:
| Variable | Default | Description |
|---|---|---|
| `ZAI_API_KEY` | (required) | Your Z.ai API key |
| `PORT` | `4567` | Server port |
| `HOST` | `127.0.0.1` | Server host |
| `LOG_LEVEL` | `info` | Logging level: `debug`, `info`, `warn`, `error` |
| `ZAI_BASE_URL` | `https://api.z.ai/api/paas/v4/chat/completions` | GLM API endpoint (OpenAI path) |
| `ZAI_ANTHROPIC_URL` | `https://api.z.ai/api/anthropic/v1/messages` | GLM API endpoint (Anthropic path) |
| `STREAMING_ENABLED` | `false` | Enable SSE streaming for responses |
| `STREAMING_CHUNK_SIZE` | `20` | Characters per streaming chunk |
| `STREAMING_CHUNK_DELAY` | `0` | Delay between chunks (ms) |
| `USE_ANTHROPIC_ENDPOINT` | `true` | Use native Anthropic-compatible endpoint for text requests |
| `WEB_SEARCH_ENABLED` | `true` | Enable web_search/web_reader tools and Claude Code tool interception |
The ccglm command provides a convenient way to use the proxy:
| Command | Description |
|---|---|
| `ccglm` | Start proxy and launch Claude Code |
| `ccglm yolo` | Same as above, with `--dangerously-skip-permissions` |
| `ccglm ui` | Open the web dashboard in browser |
| `ccglm start` | Start proxy server in foreground |
| `ccglm stop` | Stop background proxy server |
| `ccglm status` | Check if proxy is running |
| `ccglm activate` | Print shell exports for manual use |
| `ccglm help` | Show help message |
When you run ccglm, it:
- Starts the proxy server in the background (if not already running)
- Sets environment variables to route Claude Code through the proxy:
  - `ANTHROPIC_BASE_URL` → proxy URL
  - `ANTHROPIC_AUTH_TOKEN` → dummy token (the proxy uses your `ZAI_API_KEY`)
  - `ANTHROPIC_DEFAULT_*_MODEL` → glm-4.6 for all model tiers
- Launches Claude Code
Use ccglm yolo to skip permission prompts.
```bash
# Start proxy + Claude Code
ccglm

# Skip permission prompts (use with caution)
ccglm yolo

# Open settings UI to configure API key and features
ccglm ui

# Use with shell activation (for advanced users)
eval $(ccglm activate)
claude
```

The easiest way (using `ccglm`):

```bash
ccglm
```

Or configure manually:

```bash
# In your shell config (.bashrc, .zshrc, etc.)
export ANTHROPIC_BASE_URL="http://127.0.0.1:4567"
```

Or in the Claude Code settings, set the API base URL to `http://127.0.0.1:4567`.
Any tool that supports the Anthropic Messages API with a custom base URL can use this proxy. Simply configure:
- Base URL: `http://127.0.0.1:4567`
- API Key: any value (the proxy uses your configured `ZAI_API_KEY`)
`POST /v1/messages`: Anthropic Messages API compatible endpoint.
Request:
```json
{
"model": "claude-sonnet-4-20250514",
"max_tokens": 4096,
"system": "You are a helpful assistant.",
"messages": [
{"role": "user", "content": "Hello!"}
],
"tools": [...],
"stream": false
}
```

Response:
```json
{
"id": "msg_1245677890_abc123",
"type": "message",
"role": "assistant",
"content": [
{"type": "text", "text": "Hello! How can I help you?"}
],
"model": "glm-4.7",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 8
}
}
```

`GET /health`: Health check endpoint with status and configuration.
Response:
```json
{
"status": "ok",
"version": "1.0.0",
"uptime": 12345,
"config": {
"toolsEnabled": true,
"streamingEnabled": false,
"models": ["glm-4.7", "glm-4.6v"]
},
"validation": {
"isValid": true,
"errors": []
}
}
```

`GET /config`: Detailed configuration endpoint (for debugging).
Response:
```json
{
"port": 4567,
"host": "127.0.0.1",
"apiKeyConfigured": true,
"models": {
"text": "glm-4.7",
"vision": "glm-4.6v"
},
"toolExecution": {
"maxIterations": 5,
"timeout": 30000
}
}
```

`POST /config`: Update runtime configuration. Changes apply to all clients (Claude Code, Cline, dashboard, etc.).
Request:
```json
{
"streaming": false,
"webSearch": true,
"apiKey": "your-api-key",
"endpoint": "anthropic"
}
```

Response:
```json
{
"success": true,
"config": {
"streaming": false,
"webSearch": true,
"apiKeyConfigured": true,
"endpoint": "anthropic"
}
}
```

The proxy supports two backend paths to Z.ai with intelligent routing:
The proxy automatically selects the best endpoint based on content:
- Text-only requests → Anthropic endpoint (glm-4.7) - faster, native format
- Vision requests → OpenAI endpoint (glm-4.6v) - full image analysis
This avoids Z.ai's server_tool_use interception on the Anthropic endpoint which truncates image analysis results.
OpenAI path:

- Transforms Anthropic Messages API to OpenAI Chat Completions format
- Routes to `https://api.z.ai/api/paas/v4/chat/completions`
- Used automatically for vision requests (glm-4.6v)
- Provides complete, untruncated image analysis
Anthropic path:

- Native passthrough to Z.ai's Anthropic-compatible endpoint
- Routes to `https://api.z.ai/api/anthropic/v1/messages`
- Used for text-only requests when enabled (default)
- Faster with no format conversion overhead
Toggle the Anthropic endpoint:
- Set the `USE_ANTHROPIC_ENDPOINT=true/false` environment variable, or
- Use the dashboard Settings panel toggle, or
- POST to `/config` with `{"endpoint": "anthropic"}` or `{"endpoint": "openai"}` (see the sketch below)
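For instance, a runtime toggle could be issued from Node like this (a small sketch against the `POST /config` endpoint documented above):

```javascript
// Switch the proxy to the OpenAI-compatible backend path at runtime.
const res = await fetch('http://127.0.0.1:4567/config', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ endpoint: 'openai' }), // or 'anthropic'
});

console.log(await res.json()); // { success: true, config: { ... } }
```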
The proxy automatically selects the appropriate GLM model based on the current message:
- glm-4.7: Used for text-only messages (via Anthropic endpoint)
- glm-4.6v: Used when the current message contains images or videos (via OpenAI endpoint)
After processing an image or video, subsequent text-only messages automatically switch back to glm-4.7 for faster responses. Media earlier in the conversation history does not force the vision model.
Media detection scans for:
- Direct image/video content blocks
- Base64-encoded images and videos
- Tool results containing images (e.g., screenshots)
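As an illustration, the routing decision boils down to something like the following sketch (hypothetical helper names; the actual logic lives in `src/routing/model-router.js`):

```javascript
// Inspect only the *current* (last) message, so text-only follow-ups
// switch back to the text model even if earlier turns contained images.
function hasMedia(message) {
  const blocks = Array.isArray(message.content) ? message.content : [];
  return blocks.some(
    (block) =>
      block.type === 'image' ||
      block.type === 'video' ||
      (block.type === 'tool_result' &&
        Array.isArray(block.content) &&
        block.content.some((inner) => inner.type === 'image'))
  );
}

function selectModel(messages) {
  const current = messages[messages.length - 1];
  return hasMedia(current) ? 'glm-4.6v' : 'glm-4.7';
}
```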
GLM-4.6v supports video analysis with up to ~1 hour of video content (128K context). The proxy makes video analysis seamless:
When using Claude Code, simply mention a video file path in your message and the proxy will automatically:
- Detect the video file reference
- Read the file from your working directory
- Convert it to a video content block
- Route to the vision model for analysis
Supported patterns:
```
@video.mp4                    # File in current directory
./path/to/video.mp4           # Relative path
../downloads/clip.webm        # Parent directory
/home/user/videos/movie.mov   # Absolute path
~/Videos/recording.mp4        # Home directory
```
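A rough idea of how such paths can be picked out of a message (an illustrative regex only, not the exact patterns used by `src/utils/video-detector.js`):

```javascript
// Match @file, relative, absolute, and ~ paths ending in a supported video extension.
const VIDEO_PATH = /(?:@|\.{1,2}\/|~\/|\/)?[\w./~-]+\.(?:mp4|webm|mov|mpeg)\b/gi;

function findVideoPaths(text) {
  return [...text.matchAll(VIDEO_PATH)].map((match) => match[0]);
}

console.log(findVideoPaths("What's happening in @meeting-recording.mp4?"));
// => [ '@meeting-recording.mp4' ]
```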
Example usage in Claude Code:
```
User: What's happening in @meeting-recording.mp4?
User: Analyze the video at ~/Downloads/demo.mp4
User: Describe /tmp/screen-capture.webm
```
In the web dashboard, you can:
- Drag and drop video files directly into the chat
- Use the file picker to select videos
- Paste video file paths
- MP4 (video/mp4)
- WebM (video/webm)
- MOV (video/quicktime)
- MPEG (video/mpeg)
Body size limit: 50MB (supports most short-to-medium videos)
For programmatic use, send videos in Anthropic-like format:
```json
{
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What's in this video?"},
{
"type": "video",
"source": {
"type": "url",
"url": "https://example.com/video.mp4"
}
}
]
}]
}
```

Or with base64:
```json
{
"type": "video",
"source": {
"type": "base64",
"media_type": "video/mp4",
"data": "AAAAIGZ0eXBpc29t..."
}
}
```

The proxy automatically injects a reasoning prompt before the last user message to encourage step-by-step thinking. The model's reasoning output is:
- Captured from `<reasoning_content>` tags in the response
- Transformed to Anthropic `thinking` blocks (see the sketch below)
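A minimal sketch of that transformation (not the project's exact parser from `src/transformers/`):

```javascript
// Split a GLM completion into an Anthropic-style thinking block plus the answer text.
function splitReasoning(text) {
  const match = text.match(/<reasoning_content>([\s\S]*?)<\/reasoning_content>/);
  if (!match) return [{ type: 'text', text }];

  return [
    { type: 'thinking', thinking: match[1].trim() },
    { type: 'text', text: text.replace(match[0], '').trim() },
  ];
}
```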
Example response with reasoning:
```json
{
"content": [
{"type": "thinking", "thinking": "Let me think about this..."},
{"type": "text", "text": "The answer is 42."}
]
}
```

The proxy provides web search capabilities via Z.ai's MCP servers, with two internal tools:
- web_search: Search the web using Z.ai's search MCP
- web_reader: Read web page content using Z.ai's reader MCP
When WEB_SEARCH_ENABLED=true (the default), the proxy automatically intercepts Claude Code's native WebSearch and WebFetch tools. This is useful because:
- Claude Code's native web tools require an Anthropic API subscription
- The proxy routes these calls through Z.ai's MCP servers instead
- No changes needed to Claude Code - it works transparently
When Claude Code calls WebSearch or WebFetch, the proxy:
- Intercepts the tool call before it reaches the API
- Executes the equivalent MCP tool (`web_search` or `web_reader`)
- Returns the result to Claude Code as if the native tool worked
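Conceptually, the interception is a small mapping from native tool names to the proxy's MCP tools. The sketch below uses hypothetical names; the real tool loop lives in `src/tools/`:

```javascript
// Claude Code's native web tools are rewritten to MCP-backed equivalents.
const NATIVE_TO_MCP = {
  WebSearch: 'web_search',
  WebFetch: 'web_reader',
};

async function interceptToolCall(toolCall, executeMcpTool) {
  const mcpName = NATIVE_TO_MCP[toolCall.name];
  if (!mcpName) return null; // not a native web tool: pass through to the client

  const result = await executeMcpTool(mcpName, toolCall.input);
  return { type: 'tool_result', tool_use_id: toolCall.id, content: result };
}
```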
The proxy uses keyword-based triggers to inject web_search/web_reader tools only when the user explicitly requests web functionality. Trigger phrases include:
- "search the web", "search online", "look up online"
- "latest news", "current news", "recent news"
- "latest docs", "official documentation"
- "what is the latest", "what are the latest"
This prevents unwanted web searches on every request (e.g., during Claude Code startup).
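In spirit, the trigger check is a simple case-insensitive phrase match (an illustrative sketch using the phrases listed above, not the proxy's exact implementation):

```javascript
const WEB_TRIGGERS = [
  'search the web', 'search online', 'look up online',
  'latest news', 'current news', 'recent news',
  'latest docs', 'official documentation',
  'what is the latest', 'what are the latest',
];

function wantsWebTools(userText) {
  const text = userText.toLowerCase();
  return WEB_TRIGGERS.some((phrase) => text.includes(phrase));
}
```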
Toggle in the dashboard settings, or via environment:
```bash
# Disable web search interception (tools passed through to client)
WEB_SEARCH_ENABLED=false ccglm start
```

Client-defined tools are always passed through to the response for client handling.
Both backend paths support full SSE streaming with proper Anthropic event format:
```
event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_stop
data: {"type":"message_stop"}
```
Streaming properly handles:
- Text content blocks
- Reasoning/thinking blocks
- Tool use blocks
- Recursive tool execution loops
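For example, a small Node 18+ script (run as an ES module, with streaming enabled via `STREAMING_ENABLED` or `"stream": true`) can consume the stream using the built-in `fetch`. This sketch does not parse the SSE events; it just echoes them to stdout:

```javascript
const res = await fetch('http://127.0.0.1:4567/v1/messages', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 256,
    stream: true,
    messages: [{ role: 'user', content: 'Tell me a short joke.' }],
  }),
});

// Read the SSE stream chunk by chunk and echo it.
const reader = res.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```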
All errors are returned in Anthropic error format:
```json
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "messages is required"
}
}
```

Error types:
- `invalid_request_error` (400): Malformed request
- `authentication_error` (401): Invalid API key
- `rate_limit_error` (429): Rate limit exceeded
- `api_error` (500): Internal server error
- `overloaded_error` (529): API overloaded
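A client can branch on this envelope; a brief sketch, assuming `res` is a `fetch` Response from the proxy as in the earlier examples:

```javascript
const body = await res.json();

if (body.type === 'error') {
  // e.g. "429 rate_limit_error: Rate limit exceeded"
  console.error(`${res.status} ${body.error.type}: ${body.error.message}`);

  if (res.status === 429 || res.status === 529) {
    // rate_limit_error / overloaded_error: back off and retry later
  }
}
```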
Structured logging with configurable levels:
```bash
# Enable debug logging
LOG_LEVEL=debug node src/index.js
```

Log format:

```
[2024-01-15T10:30:00.000Z] [INFO] [server] Listening on http://127.0.0.1:4567
[2024-01-15T10:30:05.000Z] [INFO] [request] POST /v1/messages {"messages":3}
[2024-01-15T10:30:05.000Z] [INFO] [routing] Vision request detected in current message, using OpenAI endpoint
[2024-01-15T10:30:06.000Z] [INFO] [tool] web_search completed {"duration":"1234ms","success":true}
[2024-01-15T10:30:07.000Z] [INFO] [response] 200 end_turn {"duration":"2345ms"}
```
If the proxy reports a missing API key, set your Z.ai API key:

```bash
export ZAI_API_KEY="your-key-here"
```

A 401 `authentication_error` means your API key is invalid or expired. Get a new key from https://z.ai.
A 429 `rate_limit_error` means you've hit the rate limit. Wait and retry, or upgrade your Z.ai plan.
GLM-4.7 can take 10-30 seconds for complex requests. For faster responses:
- Use shorter prompts
- Reduce `max_tokens`
Truncated image analysis should be fixed automatically: the proxy routes vision requests through the OpenAI endpoint, which provides complete image analysis. If you still see truncated results, ensure you are using the latest version.
If a conversation seems stuck on the vision model, note that the proxy only checks the current message for images. After an image request, subsequent text-only messages automatically use glm-4.7; you don't need to start a new conversation.
Enable debug logs to see full request/response details:
```bash
LOG_LEVEL=debug node src/index.js
```

GLM Proxy is designed for localhost development use only. It is not intended for production deployment or multi-user environments.
- Development tool: For local development and testing with AI coding assistants
- Single-user localhost: Runs on `127.0.0.1` by default for local-only access
- Trusted environment: Assumes the localhost environment is trusted
This proxy operates under a localhost trust model:
- No authentication: The proxy itself has no authentication layer
- API key storage: Z.ai API keys are stored in memory (server) and localStorage (browser dashboard)
- No encryption: HTTP traffic on localhost is unencrypted (acceptable for local development)
- No rate limiting: Relies on upstream Z.ai rate limits
Do not expose this proxy to the public internet. Specifically:
- Do not bind to `0.0.0.0` or your public IP in production
- Do not expose port 4567 (or your configured port) through your firewall
- Do not use in shared hosting or multi-tenant environments
- Do not run in production or as a public service
The proxy handles API keys as follows:
- Environment variables: `ZAI_API_KEY` is read from the environment (recommended for CLI use)
- Dashboard configuration: API keys entered in the web UI are stored in browser localStorage
- Runtime updates: API keys can be updated via POST `/config` (stored in memory only)
- Upstream only: Keys are only sent to Z.ai's API endpoints (never logged or exposed)
- Not persisted: Runtime API keys are lost on server restart (use environment variables for persistence)
For safe localhost development:
- Use the default `HOST=127.0.0.1` binding
- Store your `ZAI_API_KEY` in your shell profile or `.env` file (not in version control)
- Use the `ccglm` command, which starts the proxy with safe defaults
- Keep your development environment secure (encrypted disk, screen lock, etc.)
If you must deploy this proxy in a production or shared environment, you will need to add:
- Authentication and authorization (e.g., API keys, OAuth)
- HTTPS/TLS encryption
- Rate limiting and DoS protection
- Input validation and sanitization hardening
- Security headers (CSP, HSTS, etc.)
- Audit logging
- Network isolation and firewall rules
We do not recommend production deployment as this is a development tool, but if you proceed, you assume full responsibility for security hardening.
```
glmproxy/
├── src/
│ ├── index.js # Entry point
│ ├── cli.js # CLI entry point (ccglm command)
│ ├── server.js # HTTP server with smart routing
│ ├── config.js # Configuration with runtime state
│ ├── middleware/
│ │ └── validate.js # Request validation
│ ├── transformers/
│ │ ├── request.js # Anthropic -> GLM (with reasoning injection)
│ │ ├── response.js # GLM -> Anthropic
│ │ ├── messages.js # Message conversion
│ │ ├── anthropic-request.js # Request preparer for Anthropic endpoint
│ │ └── anthropic-response.js # Response cleaner for Anthropic endpoint
│ ├── reasoning/
│ │ └── injector.js # Reasoning prompt injection
│ ├── routing/
│ │ └── model-router.js # Model selection (current message only)
│ ├── tools/
│ │ ├── definitions.js # Tool schemas (web_search, web_reader)
│ │ ├── executor.js # Tool loop with MCP integration (OpenAI path)
│ │ ├── anthropic-executor.js # Tool loop for Anthropic path
│ │ └── mcp-client.js # MCP client
│ ├── streaming/
│ │ ├── sse.js # SSE streaming support
│ │ ├── glm-stream.js # Real-time GLM API streaming
│ │ └── anthropic-stream.js # Anthropic endpoint streaming
│ └── utils/
│ ├── logger.js # Structured logging
│ ├── errors.js # Error classes (Anthropic format)
│ └── video-detector.js # Auto-detect video paths in messages
├── public/
│ ├── index.html # Dashboard entry point
│ ├── css/
│ │ └── styles.css # Styles with theme variables
│ └── js/
│ ├── app.js # Main application orchestrator
│ ├── api.js # API client
│ ├── settings.js # Settings panel
│ ├── mcp-manager.js # MCP server management
│ ├── theme.js # Theme switching
│ └── utils.js # Utility functions
├── package.json
└── README.md
```
License: MIT