This is a Go implementation of an Ollama-compatible proxy server that connects to a LiteLLM backend. The proxy lets JetBrains AI Assistant talk to an internal LLM server through the Ollama API.
- Implements Ollama-compatible API endpoints:
  - `/api/generate` for text generation
  - `/api/chat` for chat-based interactions
  - `/api/tags` to list available models
- Proxies requests to a LiteLLM backend (a minimal handler sketch follows this list)
- Supports streaming responses
- Handles authentication with API keys
- Configurable base URL for the LiteLLM server
- Comprehensive logging system for monitoring and debugging
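As a rough illustration of how such a proxy can work, the sketch below converts an Ollama-style generate request into an OpenAI-style chat completion and forwards it to LiteLLM. This is not the project's actual source: it assumes the backend exposes an OpenAI-compatible `/chat/completions` endpoint, covers only the non-streaming path, and every name in it (including the backend URL) is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
)

// generateRequest mirrors the Ollama-style /api/generate payload documented below.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	System string `json:"system,omitempty"`
	Stream bool   `json:"stream"`
}

// handleGenerate converts an Ollama-style generate request into an
// OpenAI-style chat completion and forwards it to the LiteLLM backend.
func handleGenerate(baseURL, apiKey string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req generateRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}

		// Ollama's prompt/system pair becomes an OpenAI messages array.
		var messages []map[string]string
		if req.System != "" {
			messages = append(messages, map[string]string{"role": "system", "content": req.System})
		}
		messages = append(messages, map[string]string{"role": "user", "content": req.Prompt})

		body, err := json.Marshal(map[string]any{
			"model":    req.Model,
			"messages": messages,
			"stream":   false, // streaming needs chunk-by-chunk re-encoding, omitted here
		})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		upstream, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		upstream.Header.Set("Content-Type", "application/json")
		if apiKey != "" {
			upstream.Header.Set("Authorization", "Bearer "+apiKey)
		}

		resp, err := http.DefaultClient.Do(upstream)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		// A real implementation would re-encode the reply into Ollama's
		// response format; this sketch passes the upstream body through.
		w.Header().Set("Content-Type", "application/json")
		io.Copy(w, resp.Body)
	}
}

func main() {
	// Hypothetical backend URL, for illustration only.
	http.HandleFunc("/api/generate", handleGenerate("https://litellm.example.internal", ""))
	log.Fatal(http.ListenAndServe(":11434", nil))
}
```

A full implementation would also translate upstream responses (and, when `stream` is true, each streamed chunk) back into Ollama's response format.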
The proxy requires Go 1.20 or later to build.
- Clone the repository:

```bash
git clone <repository-url>
cd ollamaproxy
```

- Build the application:

```bash
go build -o ollamaproxy
```

Run the proxy server with the following command:

```bash
./ollamaproxy --port=11434 --api-key=xxxxxx --base-url=https://xxxxxx
```

- `--base-url`: (Required) Base URL for the LiteLLM API
- `--api-key`: (Optional) API key for the LiteLLM API
- `--port`: (Optional) Port to run the server on (default: 11434)
- `--host`: (Optional) Host to run the server on (default: 0.0.0.0)
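The options above map naturally onto Go's standard `flag` package. The following is a sketch based purely on the documented flags and defaults, not the program's actual source:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Flags and defaults mirror the documented command-line options.
	baseURL := flag.String("base-url", "", "Base URL for the LiteLLM API (required)")
	apiKey := flag.String("api-key", "", "API key for the LiteLLM API (optional)")
	port := flag.Int("port", 11434, "Port to run the server on")
	host := flag.String("host", "0.0.0.0", "Host to run the server on")
	flag.Parse()

	if *baseURL == "" {
		fmt.Fprintln(os.Stderr, "error: --base-url is required")
		os.Exit(1)
	}

	addr := fmt.Sprintf("%s:%d", *host, *port)
	fmt.Printf("listening on %s, proxying to %s (API key set: %t)\n", addr, *baseURL, *apiKey != "")
	// ... register handlers and start the HTTP server on addr ...
}
```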
POST /api/generate
Example request:
```json
{
  "model": "model-name",
  "prompt": "Hello, how are you?",
  "stream": true,
  "system": "You are a helpful assistant."
}
```

POST /api/chat
Example request:
```json
{
  "model": "model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "stream": true
}
```

GET /api/tags
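As a rough guide to how these payloads map onto Go, a proxy like this would decode them into structs along the following lines. All type and field names here are illustrative, and the `/api/tags` response shape shown follows Ollama's convention of returning a `models` list:

```go
package proxy

// GenerateRequest is the /api/generate payload.
type GenerateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
	System string `json:"system,omitempty"`
}

// ChatMessage is one turn in a /api/chat conversation.
type ChatMessage struct {
	Role    string `json:"role"` // "system", "user", or "assistant"
	Content string `json:"content"`
}

// ChatRequest is the /api/chat payload.
type ChatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// TagsResponse is the /api/tags reply, listing available models.
type TagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}
```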
The proxy server includes a streamlined logging system focused on access logs only, minimizing noise while providing essential information for monitoring and troubleshooting.
The server generates access logs in a standard format for every HTTP request. These access logs provide a concise record of all HTTP traffic and include:
- Client IP address
- HTTP method (GET, POST, etc.)
- Request path
- HTTP protocol version
- Response status code
- Response size in bytes
- Request processing duration
Access logs follow this format:

```
ACCESS: [timestamp] client_ip "method path protocol" status_code content_length duration
```

Example:

```
ACCESS: 2023/04/15 12:34:56 127.0.0.1 "POST /api/generate HTTP/1.1" 200 1024 45.6ms
```
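Access logging of this kind is typically implemented as middleware that wraps every handler. The sketch below reproduces the documented line format; the `statusRecorder` helper and all other names are illustrative rather than taken from the actual code:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// statusRecorder captures the status code and byte count that a handler
// writes, so the middleware can log them afterwards.
type statusRecorder struct {
	http.ResponseWriter
	status int
	bytes  int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	n, err := r.ResponseWriter.Write(b)
	r.bytes += n
	return n, err
}

// accessLog emits one line per request to stdout in the documented format.
func accessLog(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)

		// RemoteAddr is "ip:port"; keep just the IP for the log line.
		clientIP, _, err := net.SplitHostPort(req.RemoteAddr)
		if err != nil {
			clientIP = req.RemoteAddr
		}
		fmt.Printf("ACCESS: %s %s %q %d %d %s\n",
			time.Now().Format("2006/01/02 15:04:05"), clientIP,
			req.Method+" "+req.URL.Path+" "+req.Proto,
			rec.status, rec.bytes, time.Since(start))
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/tags", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"models":[]}`))
	})
	http.ListenAndServe(":11434", accessLog(mux))
}
```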
This focused logging system is valuable for:
- Monitoring server activity and traffic patterns
- Performance analysis and optimization
- Security auditing and compliance
- Traffic analysis and capacity planning
All logs are output to stdout by default.
- This implementation focuses on the core functionality needed to connect JetBrains AI Assistant to your internal LLM server.
- You may need to adjust the request/response format conversion to match the specific behavior of your LiteLLM server (a conversion sketch follows these notes).
- Error handling is included but may need to be enhanced for production use.
- Additional Ollama endpoints can be added as needed.
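As one example of such a conversion, if your LiteLLM backend returns OpenAI-style chat completions, mapping a reply back into Ollama's non-streaming generate format could look roughly like this (both the type names and the field selection are assumptions):

```go
package proxy

import "encoding/json"

// openAIResponse captures only the fields this sketch needs from an
// OpenAI-style /chat/completions reply.
type openAIResponse struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// ollamaGenerateResponse is the non-streaming Ollama /api/generate reply shape.
type ollamaGenerateResponse struct {
	Model    string `json:"model"`
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// convertResponse rewrites an upstream LiteLLM reply into the Ollama format.
func convertResponse(model string, raw []byte) ([]byte, error) {
	var in openAIResponse
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, err
	}
	out := ollamaGenerateResponse{Model: model, Done: true}
	if len(in.Choices) > 0 {
		out.Response = in.Choices[0].Message.Content
	}
	return json.Marshal(out)
}
```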