
gproxy

gproxy is a Rust-based multi-channel LLM proxy that exposes OpenAI / Claude / Gemini-style APIs through a unified gateway, with a built-in admin console, user/key management, and request/usage auditing.

Chinese version: README.zh.md

For the full documentation, see the project docs site: https://gproxy.leenhawk.com

Key Features

  • Unified multi-channel gateway: route requests to different upstreams by channel (builtin + custom).
  • Multi-protocol compatibility: one upstream can accept OpenAI/Claude/Gemini requests (controlled by dispatch rules).
  • Credential pool and health states: supports healthy / partial / dead with model-level cooldown retry.
  • OAuth and API Key support: OAuth channels (Codex, ClaudeCode, GeminiCli, Antigravity) and API Key channels.
  • Built-in Web console: available at /, supports English and Chinese.
  • Observability: records upstream/downstream requests and usage metrics (filterable by user/model/time).
  • Async batched storage writes: queue + aggregation to reduce database pressure under load.

Built-in Channels

| Channel ID | Default Upstream | Auth Type |
| --- | --- | --- |
| openai | https://api.openai.com | API Key |
| anthropic | https://api.anthropic.com | API Key |
| aistudio | https://generativelanguage.googleapis.com | API Key |
| vertexexpress | https://aiplatform.googleapis.com | API Key |
| vertex | https://aiplatform.googleapis.com | GCP service account (builtin object) |
| geminicli | https://cloudcode-pa.googleapis.com | OAuth (builtin object) |
| claudecode | https://api.anthropic.com | OAuth/Cookie (builtin object) |
| codex | https://chatgpt.com/backend-api/codex | OAuth (builtin object) |
| antigravity | https://daily-cloudcode-pa.sandbox.googleapis.com | OAuth (builtin object) |
| nvidia | https://integrate.api.nvidia.com | API Key |
| deepseek | https://api.deepseek.com | API Key |
| custom (for example mycustom) | your configured base_url | API Key (secret) |

Quick Start

1. Prerequisites

  • Rust (must support edition = 2024)
  • SQLite (default DSN uses sqlite)
  • Optional: Node.js + pnpm (if you want to rebuild the admin frontend)

2. Prepare Config

cp gproxy.example.toml gproxy.toml

At minimum, set:

  • global.admin_key
  • at least one enabled channel credential (credentials.secret or builtin credential object)
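Put together, a minimal gproxy.toml sketch might look like the following (field names are taken from this README; the admin key and upstream secret are placeholders):

```toml
[global]
admin_key = "replace-with-a-strong-key"

[[channels]]
id = "openai"
enabled = true

[channels.settings]
base_url = "https://api.openai.com"

# One credential entry; API-key channels take a `secret`.
[[channels.credentials]]
secret = "sk-your-upstream-key"
```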

Bootstrap login defaults:

  • username: admin
  • password: value of global.admin_key

3. Run

cargo run -p gproxy

On startup, gproxy prints:

  • listening address (default http://127.0.0.1:8787)
  • the current admin key (printed after password:)

If ./gproxy.toml does not exist, gproxy starts with in-memory defaults and auto-generates a 16-digit admin key (printed to stdout).

4. Minimal Verification

curl -sS http://127.0.0.1:8787/openai/v1/models \
  -H "x-api-key: <your user key or admin key>"

Get a user/admin API key via password login:

curl -sS http://127.0.0.1:8787/login \
  -H "content-type: application/json" \
  -d '{
    "name": "admin",
    "password": "<your admin_key>"
  }'
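Since /login returns the key in an api_key field, the two steps can be chained. A sketch (the sed extraction assumes a flat JSON response body; jq -r '.api_key' works equally well if jq is installed):

```shell
# Log in, extract api_key from the JSON response, then call a provider route.
RESP=$(curl -sS http://127.0.0.1:8787/login \
  -H "content-type: application/json" \
  -d '{"name":"admin","password":"<your admin_key>"}')
# Pull the value of "api_key" out of the response.
API_KEY=$(printf '%s' "$RESP" | sed -n 's/.*"api_key" *: *"\([^"]*\)".*/\1/p')
curl -sS http://127.0.0.1:8787/openai/v1/models -H "x-api-key: $API_KEY"
```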

Deployment

Local deployment

Binary

  1. Download the binary from Releases.
  2. Prepare config:
cp gproxy.example.toml gproxy.toml
  3. Run the binary:
./gproxy

Docker

Pull prebuilt image (recommended):

docker pull ghcr.io/leenhawk/gproxy:latest

Build from local source (only if you need local code changes):

docker build -t gproxy:local .

Run:

docker run --rm -p 8787:8787 \
  -e GPROXY_HOST=0.0.0.0 \
  -e GPROXY_PORT=8787 \
  -e GPROXY_ADMIN_KEY=your-admin-key \
  -e DATABASE_SECRET_KEY='replace-with-long-random-string' \
  -e GPROXY_DSN='sqlite:///app/data/gproxy.db?mode=rwc' \
  -v $(pwd)/data:/app/data \
  ghcr.io/leenhawk/gproxy:latest

Set DATABASE_SECRET_KEY via env vars or your platform secret manager rather than committing it to the repo. Especially on free-tier or shared managed databases, configure it before the first bootstrap so sensitive fields are not stored in plaintext, and keep the same key on every instance using that database.

Cloud deployment

ClawCloud Run

Run on ClawCloud

  • Template file: claw.yaml
  • Use claw.yaml as a custom template in ClawCloud Run App Store -> My Apps -> Debugging.
  • Key inputs: admin_key (generated by default), proxy_url, rust_log, volume_size
  • Recommended persistence: mount /app/data as a persistent volume.

Release downloads and self-update (Cloudflare Pages)

  • Release CI publishes signed binaries and update manifests to a dedicated Cloudflare Pages downloads project.
  • Default public base URL: https://download-gproxy.leenhawk.com
  • Generated manifests:
    • /manifest.json — full download index used by the docs downloads page
    • /releases/manifest.json — stable self-update feed
    • /staging/manifest.json — staging self-update feed
  • The admin UI Cloudflare update source and /admin/system/self_update read from this downloads site.
  • Required GitHub Actions secrets for the downloads deployment:
    • CLOUDFLARE_API_TOKEN
    • CLOUDFLARE_ACCOUNT_ID
    • CLOUDFLARE_DOWNLOADS_PROJECT_NAME
  • Optional secrets:
    • DOWNLOAD_PUBLIC_BASE_URL — custom public domain or Pages URL exposed in docs/manifests
    • UPDATE_SIGNING_KEY_ID — manifest key id override (default gproxy-release-v1)
    • UPDATE_SIGNING_PRIVATE_KEY_B64 and UPDATE_SIGNING_PUBLIC_KEY_B64 — checksum signature generation and verification

Admin Frontend

  • Console entry: GET /
  • Static assets: /assets/*
  • Frontend build output: apps/gproxy/frontend/dist
  • Backend embeds dist into the binary via rust-embed

If you changed frontend code, rebuild first:

cd apps/gproxy/frontend
pnpm install
pnpm build
cd ../../..
cargo run -p gproxy

Configuration (gproxy.toml)

Reference files:

  • gproxy.example.toml (minimal)
  • gproxy.example.full.toml (full)

global

| Field | Description |
| --- | --- |
| host | Bind host, default 127.0.0.1 |
| port | Bind port, default 8787 |
| proxy | Upstream proxy (empty string means disabled) |
| hf_token | HuggingFace token (optional, for tokenizer download) |
| hf_url | HuggingFace base URL, default https://huggingface.co |
| admin_key | Admin bootstrap credential; used as admin password and admin API key on bootstrap; auto-generated if empty |
| mask_sensitive_info | Redact sensitive request/response payloads in logs/events |
| data_dir | Data directory, default ./data |
| dsn | Database DSN; if omitted and data_dir is changed, a sqlite DSN is derived automatically |
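As a TOML fragment, a [global] block mirroring the defaults above might look like this (values illustrative; the dsn shown is the documented sqlite default):

```toml
[global]
host = "127.0.0.1"
port = 8787
proxy = ""                 # empty string disables the upstream proxy
admin_key = "replace-me"   # auto-generated if left empty
mask_sensitive_info = true
data_dir = "./data"
dsn = "sqlite://./data/gproxy.db?mode=rwc"
```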

runtime

| Field | Default | Description |
| --- | --- | --- |
| storage_write_queue_capacity | 4096 | Storage write queue size |
| storage_write_max_batch_size | 1024 | Max events per aggregated storage batch |
| storage_write_aggregate_window_ms | 25 | Aggregation window (ms) |
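Spelled out as a TOML fragment, the defaults above correspond to:

```toml
[runtime]
storage_write_queue_capacity = 4096       # queued events before backpressure
storage_write_max_batch_size = 1024       # events per aggregated write
storage_write_aggregate_window_ms = 25    # flush window in milliseconds
```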

channels

Each channel is declared with [[channels]]:

  • id: channel id (for example openai, claude, mycustom)
  • enabled: runtime enable switch (false disables routing to this channel)
  • settings: channel settings (must include base_url)
  • dispatch: optional; defaults to the channel-specific dispatch table when omitted
  • credentials: credential list (supports multi-credential retry/fallback)

Anthropic/ClaudeCode Cache Rewrite (cache_breakpoints)

For anthropic and claudecode, configure cache-control rewrite with:

  • setting key: channels.settings.cache_breakpoints
  • max 4 rules
  • targets: top_level (global alias), tools, system, messages
  • messages indexing uses flattened messages[*].content blocks after normalizing Claude shorthands (content: "..." becomes one text block)
  • for messages, you may also set content_position / content_index; when either field is present, position / index first select a message, then content_* selects a block inside that message
  • ttl: auto / 5m / 1h (auto means no ttl field is injected)
  • existing request-side cache_control is always preserved and counts toward the 4-rule limit

No-ttl default note:

  • anthropic: upstream default is 5m
  • claudecode: upstream default is 5m
  • use explicit ttl when you need deterministic behavior

Example:

[[channels]]
id = "anthropic"
enabled = true

[channels.settings]
base_url = "https://api.anthropic.com"
cache_breakpoints = [
  { target = "top_level", ttl = "auto" },
  { target = "messages", position = "last_nth", index = 1, ttl = "5m" },
  { target = "messages", position = "last_nth", index = 1, content_position = "last_nth", content_index = 1, ttl = "5m" }
]

[[channels]]
id = "claudecode"
enabled = true

[channels.settings]
base_url = "https://api.anthropic.com"
cache_breakpoints = [
  { target = "top_level", ttl = "auto" },
  { target = "messages", position = "last_nth", index = 1, content_position = "last_nth", content_index = 1, ttl = "1h" }
]

channels.credentials

Each credential can include:

  • id / label: optional identifiers
  • secret: for API key channels
  • builtin: structured credential object for OAuth/service-account channels
  • state: optional health-state seed

state.health.kind supports:

  • healthy
  • partial (with model cooldown list)
  • dead
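A sketch of a multi-credential channel with a seeded health state (the credential fields come from the list above; treat the exact shape of the state object as illustrative rather than authoritative):

```toml
[[channels]]
id = "openai"
enabled = true

[channels.settings]
base_url = "https://api.openai.com"

[[channels.credentials]]
id = "primary"
label = "main key"
secret = "sk-primary"

[[channels.credentials]]
id = "backup"
secret = "sk-backup"
# Optional health-state seed; shape is an assumption based on the fields above.
state = { health = { kind = "healthy" } }
```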

Credential Selection and Cache Affinity

Provider credential routing is controlled by three settings in channels.settings:

  • credential_round_robin_enabled (default true)
  • credential_cache_affinity_enabled (default true, only effective when round-robin is enabled)
  • credential_cache_affinity_max_keys (default 4096, max retained affinity keys per channel)

Effective behavior:

  • credential_round_robin_enabled = false -> StickyNoCache
    • no round-robin
    • no cache affinity pool
    • picks the smallest available credential id and keeps using it until it becomes unavailable or enters cooldown
  • credential_round_robin_enabled = true and credential_cache_affinity_enabled = true -> RoundRobinWithCache
    • round-robin/random among eligible credentials
    • enables internal cache affinity pool for cache-key sticky routing
  • credential_round_robin_enabled = true and credential_cache_affinity_enabled = false -> RoundRobinNoCache
    • round-robin/random among eligible credentials
    • no cache affinity pool

Example:

[[channels]]
id = "openai"
enabled = true

[channels.settings]
base_url = "https://api.openai.com"
credential_round_robin_enabled = true
credential_cache_affinity_enabled = true
credential_cache_affinity_max_keys = 4096

Legacy compatibility:

  • credential_pick_mode is still accepted for backward compatibility.

Detailed design and cache-hit strategy (OpenAI/Claude/Gemini):
https://gproxy.leenhawk.com/guides/credential-selection-cache-affinity/

CLI and Environment Overrides

Priority: CLI flags / env vars > gproxy.toml > defaults

Supported overrides:

  • --config / GPROXY_CONFIG_PATH
  • --host / GPROXY_HOST
  • --port / GPROXY_PORT
  • --proxy / GPROXY_PROXY
  • --admin-key / GPROXY_ADMIN_KEY
  • --bootstrap-force-config / GPROXY_BOOTSTRAP_FORCE_CONFIG
  • --mask-sensitive-info / GPROXY_MASK_SENSITIVE_INFO
  • --data-dir / GPROXY_DATA_DIR
  • --dsn / GPROXY_DSN
  • --storage-write-queue-capacity / GPROXY_STORAGE_WRITE_QUEUE_CAPACITY
  • --storage-write-max-batch-size / GPROXY_STORAGE_WRITE_MAX_BATCH_SIZE
  • --storage-write-aggregate-window-ms / GPROXY_STORAGE_WRITE_AGGREGATE_WINDOW_MS
  • --database-secret-key / DATABASE_SECRET_KEY

Configure the database-at-rest encryption key with --database-secret-key or the DATABASE_SECRET_KEY environment variable.

When DATABASE_SECRET_KEY is unset, gproxy stores and reads DB values as plaintext. When it is set, gproxy transparently encrypts at rest for credential.secret_json, user API keys, user passwords, admin_key, and hf_token.

Recommendations:

  • set the key before the first database bootstrap and keep it identical on every instance using the same database;
  • on free-tier or shared managed databases, strongly prefer setting the key so sensitive values are not stored in plaintext;
  • inject it via env vars / platform secrets instead of committing it to the repo;
  • do not change it casually after encrypted data exists, or older ciphertext will become unreadable without migration / re-encryption.
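One way to generate a suitably long key before the first bootstrap (openssl is an assumption; any strong random source works):

```shell
# 32 random bytes rendered as 64 hex characters; export for the gproxy process.
DATABASE_SECRET_KEY="$(openssl rand -hex 32)"
export DATABASE_SECRET_KEY
echo "generated key of length ${#DATABASE_SECRET_KEY}"
```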

Bootstrap Source Mode

--bootstrap-force-config / GPROXY_BOOTSTRAP_FORCE_CONFIG is a startup-only switch (CLI/env only, not a gproxy.toml field).

  • default (false or unset):
    • if DB is not initialized, bootstrap from gproxy.toml as usual.
    • if DB is already initialized, prefer DB state and skip config-file channel/provider import.
    • startup admin_key override is still honored.
  • true:
    • force-apply config-file channels/settings/credentials/global values on boot.
    • useful when you intentionally want the config file to overwrite existing DB bootstrap state.

API Overview

All errors return:

{ "error": "..." }

Auth Headers

  • POST /login uses JSON body { "name": "...", "password": "..." } and returns api_key
  • Admin/User APIs (except /login): use x-api-key
  • Provider APIs also accept:
    • x-api-key
    • x-goog-api-key
    • Authorization: Bearer ...
    • Gemini query key ?key=... (normalized into x-api-key)

Provider Routes

1) Scoped (recommended)

Provider is explicit in path, examples:

  • POST /openai/v1/chat/completions
  • POST /anthropic/v1/messages
  • POST /aistudio/v1beta/models/{model}:generateContent

2) Unscoped (single unified entry)

Provider is resolved from model prefix:

  • POST /v1/chat/completions
  • POST /v1/responses
  • POST /v1/messages
  • GET /v1/models
  • GET /v1/models/{provider}/{model}

Constraints:

  • For OpenAI/Claude-style request bodies, model must be <provider>/<model>, for example openai/gpt-4.1.
  • For Gemini target paths, provider must be included, for example models/aistudio/gemini-2.5-flash:generateContent.

OAuth and Upstream Usage

  • GET /{provider}/v1/oauth
  • GET /{provider}/v1/oauth/callback
  • GET /{provider}/v1/usage

OAuth-capable channels: codex, claudecode, geminicli, antigravity

Admin APIs (/admin/*)

Main groups:

  • Global settings: /admin/global-settings, /admin/global-settings/upsert
  • Config export/import: /admin/config/export-toml, /admin/config/import-toml
  • Self update: /admin/system/self_update
  • Providers/Credentials/CredentialStatuses: query/upsert/delete
  • Users: query/upsert/delete (/admin/users/upsert requires password)
  • UserKeys: query/generate/delete
  • Requests: /admin/requests/upstream/query, /admin/requests/downstream/query
  • Usage: /admin/usages/query, /admin/usages/summary

User APIs (/user/*)

  • POST /user/keys/query
  • POST /user/keys/generate
  • POST /user/keys/delete
  • POST /user/usages/query
  • POST /user/usages/summary

Request Examples

Scoped OpenAI Chat

curl -sS http://127.0.0.1:8787/openai/v1/chat/completions \
  -H "x-api-key: <key>" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role":"user","content":"hello"}],
    "stream": false
  }'

Unscoped OpenAI Chat (model-prefixed routing)

curl -sS http://127.0.0.1:8787/v1/chat/completions \
  -H "x-api-key: <key>" \
  -H "content-type: application/json" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [{"role":"user","content":"hello"}],
    "stream": false
  }'

Scoped Gemini GenerateContent

curl -sS "http://127.0.0.1:8787/aistudio/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-api-key: <key>" \
  -H "content-type: application/json" \
  -d '{
    "contents":[{"role":"user","parts":[{"text":"hello"}]}]
  }'

Anthropic/ClaudeCode Prompt Cache Quick Check (4 curls)

Make sure both providers have at least one cache_breakpoints rule (for example { target = "top_level", ttl = "auto" }).

BASE="http://127.0.0.1:8787"
KEY="<your x-api-key>"
SYS="$(for i in $(seq 1 1800); do printf 'cache-prefix-%04d ' "$i"; done)"
# 1) Claude first request (cache write)
curl -sS "$BASE/anthropic/v1/messages" \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  --data-binary @- <<JSON | jq '.usage'
{
  "model": "claude-neptune-v3",
  "max_tokens": 128,
  "stream": false,
  "system": "$SYS",
  "messages": [
    { "role": "user", "content": "only reply: cache-ok" }
  ]
}
JSON
# 2) Claude second request (cache read)
curl -sS "$BASE/anthropic/v1/messages" \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  --data-binary @- <<JSON | jq '.usage'
{
  "model": "claude-neptune-v3",
  "max_tokens": 128,
  "stream": false,
  "system": "$SYS",
  "messages": [
    { "role": "user", "content": "only reply: cache-ok" }
  ]
}
JSON
# 3) ClaudeCode first request (cache write)
curl -sS "$BASE/claudecode/v1/messages" \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  --data-binary @- <<JSON | jq '.usage'
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 128,
  "stream": false,
  "system": "$SYS",
  "messages": [
    { "role": "user", "content": "only reply: cache-ok" }
  ]
}
JSON
# 4) ClaudeCode second request (cache read)
curl -sS "$BASE/claudecode/v1/messages" \
  -H "x-api-key: $KEY" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  --data-binary @- <<JSON | jq '.usage'
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 128,
  "stream": false,
  "system": "$SYS",
  "messages": [
    { "role": "user", "content": "only reply: cache-ok" }
  ]
}
JSON

Architecture

Workspace Layout

| Path | Responsibility |
| --- | --- |
| apps/gproxy | Executable service entry (Axum + embedded admin frontend) |
| crates/gproxy-core | AppState, router orchestration, auth, request execution |
| crates/gproxy-provider | Channel implementations, retry, OAuth, dispatch, tokenizers |
| crates/gproxy-middleware | Protocol transform middleware, usage extraction |
| crates/gproxy-protocol | OpenAI/Claude/Gemini typed protocol models and transforms |
| crates/gproxy-storage | SeaORM storage layer, query models, async write queue |
| crates/gproxy-admin | Admin/user domain operations |

Runtime Flow

  • On bootstrap:
    • load config and apply CLI/env overrides
    • choose bootstrap source mode (bootstrap_force_config): prefer DB state by default once initialized
    • connect database and sync schema automatically
    • initialize provider registry, credentials, and credential states
    • ensure admin principal (id=0) and admin key exist
  • On request:
    • authenticate user key
    • route + transform/forward according to dispatch table
    • pick eligible credentials and retry/fallback
    • persist upstream/downstream events and usage records

Credential States and Cooldown

  • healthy: available
  • partial: model-level cooldown
  • dead: unavailable

Default cooldowns:

  • rate limit: 60s
  • transient failure: 15s

Testing

Provider smoke/regression scripts:

  • tests/provider/curl_provider.sh
  • tests/provider/run_channel_regression.sh

Examples:

API_KEY='<key>' tests/provider/curl_provider.sh \
  --provider openai \
  --method openai_chat \
  --model gpt-4.1
API_KEY='<key>' tests/provider/run_channel_regression.sh \
  --provider openai \
  --model gpt-5-nano \
  --embedding-model text-embedding-3-small

Common Issues

1) 401 unauthorized

  • Ensure x-api-key is provided for key-protected routes, and both the key and its owner user are enabled.
  • If you don't have a key yet, call POST /login with username/password first.

2) 403 forbidden on admin routes

  • The key is not owned by the admin user (id=0).

3) 503 all eligible credentials exhausted

  • Check:
    • whether the channel has any available credential
    • whether credential status is dead or currently partial for the target model
    • whether upstream keeps returning 429/5xx

4) model must be prefixed as <provider>/...

  • You called an unscoped route without a provider-prefixed model (for example openai/gpt-4.1).

5) Realtime WebSocket unavailable

  • /v1/realtime is currently not implemented; use /v1/responses (HTTP) instead.

Security Notes

  • Set a strong admin_key in production.
  • Keep mask_sensitive_info = true unless you explicitly need full payload visibility for debugging.
  • If you configure an outbound proxy, ensure the proxy path is trusted and access-controlled.

Data and Directories

By default:

  • data dir: ./data
  • default DB: sqlite://./data/gproxy.db?mode=rwc
  • tokenizer cache: ./data/tokenizers

gproxy-storage supports sqlite / mysql / postgres via dsn.
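Illustrative DSN shapes (the sqlite form matches the documented default; the mysql/postgres URL forms follow standard database-URL conventions and are assumptions, not taken from gproxy docs):

```toml
dsn = "sqlite://./data/gproxy.db?mode=rwc"
# dsn = "postgres://user:pass@localhost:5432/gproxy"
# dsn = "mysql://user:pass@localhost:3306/gproxy"
```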

Development Commands

# backend format/lint/check
cargo fmt
cargo check
cargo clippy --workspace --all-targets

# tests
cargo test --workspace

# run service
cargo run -p gproxy

Frontend:

cd apps/gproxy/frontend
pnpm install
pnpm typecheck
pnpm build

License

This project is licensed under AGPL-3.0-or-later (see LICENSE).
