TriDefender/jina-embedding-server


Jina Embedding & Reranker Server

An OpenAI-compatible API server for Jina embedding and reranking models.

Features

  • Embedding: jina-embeddings-v5-text-small (with LoRA task adapters)
  • Reranking: jina-reranker-v3
  • OpenAI-compatible API: Works with existing OpenAI client libraries
  • Task-adaptive embeddings: Switch LoRA adapter per request (retrieval, text-matching, clustering, classification)

Installation

uv sync

Fetch_Models.bat  # Windows only: runs two 'hf download' commands; on Linux, run the equivalent commands in a shell

Usage

Start the server:

uv run jina_server.py

Or activate the virtual environment:

uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
python jina_server.py

API Endpoints

POST /v1/embeddings

Create embeddings for input text. Supports task-adaptive LoRA adapters.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| input | string or string[] | (required) | Text(s) to embed |
| task | string | "retrieval" | Task adapter: retrieval, text-matching, clustering, classification |
| prompt_name | string | null | Required when task="retrieval": "query" or "document" |
| model | string | "jina-embeddings-v5-text-small" | Model name |
| batch_size | int | 32 | Processing batch size (1–128) |
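
The parameter constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's own validation code: the helper name `build_embedding_payload` is hypothetical, and only the parameter names, defaults, and limits are taken from this README.

```python
from typing import Optional, Union

VALID_TASKS = {"retrieval", "text-matching", "clustering", "classification"}
VALID_PROMPT_NAMES = {"query", "document"}


def build_embedding_payload(
    input: Union[str, list],
    task: str = "retrieval",
    prompt_name: Optional[str] = None,
    model: str = "jina-embeddings-v5-text-small",
    batch_size: int = 32,
) -> dict:
    """Build a JSON body for POST /v1/embeddings (hypothetical client-side helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # prompt_name is required when the retrieval adapter is selected.
    if task == "retrieval" and prompt_name not in VALID_PROMPT_NAMES:
        raise ValueError('task="retrieval" requires prompt_name "query" or "document"')
    if not 1 <= batch_size <= 128:
        raise ValueError("batch_size must be between 1 and 128")

    payload = {"input": input, "task": task, "model": model, "batch_size": batch_size}
    # Only include prompt_name when it was given; how the server treats it for
    # non-retrieval tasks is not documented here.
    if prompt_name is not None:
        payload["prompt_name"] = prompt_name
    return payload
```

Calling `build_embedding_payload("What is machine learning?", prompt_name="query")` yields a body equivalent to the query-side curl example, plus the default `model` and `batch_size`.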

Examples

Retrieval (query side):

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?", "task": "retrieval", "prompt_name": "query"}'

Retrieval (document side):

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine learning is a field of AI...", "task": "retrieval", "prompt_name": "document"}'
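
Once the query and the documents have been embedded with their respective `prompt_name` values, retrieval comes down to comparing vectors. The sketch below ranks documents by cosine similarity to the query. It assumes the vectors have already been pulled out of the server's responses (the response format is not shown in this README, so that step is left out), and `rank_by_cosine` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, most similar document first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The indices in the result refer to positions in `doc_vecs`, so they can be mapped back to the original document strings.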

Text matching:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["Hello", "Hi"], "task": "text-matching"}'

Classification:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "This product is great!", "task": "classification"}'

Clustering:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["neural networks", "monetary policy"], "task": "clustering"}'

POST /v1/rerank

Rerank documents based on query relevance.

curl http://localhost:8000/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "documents": ["Doc 1", "Doc 2"],
    "top_n": 3
  }'
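
The same request can be made from Python using only the standard library. This is a sketch under a few assumptions: the base URL is copied from the curl examples, the helper names `build_rerank_request` and `send` are hypothetical, and because the response format is not documented here, the parsed JSON is returned unchanged.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # taken from the curl examples


def build_rerank_request(query, documents, top_n, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/rerank request."""
    body = {"query": query, "documents": list(documents), "top_n": top_n}
    return urllib.request.Request(
        url=f"{base_url}/v1/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_rerank_request("What is machine learning?", ["Doc 1", "Doc 2"], 3))` issues the request shown in the curl example.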

GET /v1/models

List available models.
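
This README does not show the response body. If the endpoint follows the OpenAI list convention (an object with a `data` array whose entries carry an `id`), which is an assumption here, the model names can be extracted as in the sketch below. `model_ids` is a hypothetical helper, and the sample dictionary is illustrative, not captured server output.

```python
def model_ids(models_response):
    """Extract model ids from an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in models_response.get("data", [])]


# Illustrative input only; the real response may contain other fields.
sample = {
    "object": "list",
    "data": [
        {"id": "jina-embeddings-v5-text-small", "object": "model"},
        {"id": "jina-reranker-v3", "object": "model"},
    ],
}
ids = model_ids(sample)
```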

GET /

Health check endpoint.
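
A script that starts the server and then sends requests can poll this endpoint until the models have loaded. The sketch below uses only the standard library. It assumes the health check answers with HTTP 200 once the server is ready, which this README does not state, and the names `is_up` and `wait_until_up` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_up(url="http://localhost:8000/", timeout=2.0):
    """Return True if GET url answers with HTTP 200 (assumed healthy status)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_up(probe=is_up, attempts=30, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop independent of the network, so it can be exercised without a running server.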

Testing

Run the test suite:

uv run test_server.py

WIP

AVX-512 support is in progress: I'm having opencode implement it for my 9900X rig.

License

MIT

About

I reinvented the wheel so you don't have to pay for embedding or reranking. The embedding model is quite lightweight, so it should be able to run locally.
