OpenRouter.jl is an unofficial, modern Julia wrapper around the OpenRouter REST API.
It focuses on:
- Streaming-first inference
- Always-fresh model metadata from OpenRouter
- Provider-aware cost & token accounting (including cache + discounts)
- A single, simple entrypoint function: `aigen`
- A modular core that is easy to extend from other packages
It does not try to be a full “AI framework” – it is a thin, opinionated layer over the REST API that makes it pleasant to use OpenRouter-hosted models from Julia.
The library is in active use at TodoFor.ai, so it will be maintained and extended over time.
Add the package from the Julia REPL:
```julia
] add OpenRouter
```
To use the package you need an OpenRouter API key:
1. Create an API key at https://openrouter.ai/settings/keys
2. Set the `OPENROUTER_API_KEY` environment variable, e.g.:

```shell
export OPENROUTER_API_KEY="sk-or-..."
```
The key is required to download the list of available models and providers; these endpoints are free to call. The models you actually run might require additional provider API keys to access provider endpoints, e.g. OpenAI models will require `OPENAI_API_KEY`.
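A quick way to fail fast when the key is missing is a small guard like the one below. This helper (`openrouter_api_key`) is illustrative, not part of the package's API — OpenRouter.jl reads the environment variable itself:

```julia
# Illustrative helper: error early with a useful message if the
# OPENROUTER_API_KEY environment variable is not set.
function openrouter_api_key()
    key = get(ENV, "OPENROUTER_API_KEY", "")
    isempty(key) && error(
        "OPENROUTER_API_KEY is not set; create one at https://openrouter.ai/settings/keys",
    )
    return key
end
```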
You can always inspect the raw OpenRouter API with curl:
```shell
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" https://openrouter.ai/api/v1/providers | jq '.data[0:3]' 2>/dev/null
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" https://openrouter.ai/api/v1/models | jq '.data[0:3]'
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" https://openrouter.ai/api/v1/models | jq '.data[0:10] | map({id, name, context_length, top_provider: {context_length: .top_provider.context_length, max_completion_tokens: .top_provider.max_completion_tokens}})'
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" https://openrouter.ai/api/v1/providers | jq '.data | map({name, slug})'
curl -s -H "Authorization: Bearer $OPENROUTER_API_KEY" https://openrouter.ai/api/v1/models/moonshotai/kimi-k2-thinking/endpoints | jq '.'
```
The library provides typed equivalents:
```julia
using OpenRouter

# 1. Endpoints / providers for a specific model
raw_ep_json = list_endpoints_raw("moonshotai/kimi-k2-thinking")
endpoints = list_endpoints("moonshotai/kimi-k2-thinking")  # -> ModelProviders

# 2. All models
raw_models_json = list_models_raw()
models = parse_models(raw_models_json)  # or simply:
models2 = list_models()                 # convenience wrapper

# 3. Filtered by provider (e.g. only Groq-hosted models)
groq_models = list_models("groq")

# 4. Known provider slugs
known = list_known_providers()

# 5. All endpoints hosted by a provider
groq_eps = list_provider_endpoints("groq")
```
For day-to-day usage you typically combine these with the cache layer, which automatically keeps you in sync with changes on OpenRouter:
```julia
# One-shot sync with OpenRouter (models + endpoints)
update_db(; fetch_endpoints = true)

# Explore locally from the cache
cached_models = list_cached_models()
search_models("llama")
```
OpenRouter.jl does not introduce its own model aliases.
Instead, models are always referred to explicitly as:
```
provider:author/model_id
```

- `provider` – the upstream provider slug (e.g. `openai`, `groq`, `anthropic`, `google-ai-studio`, `cerebras`, `moonshotai`, …)
- `author` – the model author / namespace (often the same as the provider, e.g. `openai/gpt-5.1`)
- `model_id` – the concrete model identifier on OpenRouter
Examples:
```
openai:openai/gpt-5.1
groq:moonshotai/kimi-k2-0905
anthropic:anthropic/claude-haiku-4.5
google-ai-studio:google/gemini-2.5-flash
cerebras:openai/gpt-oss-120b
```
This explicit style avoids hidden indirections and keeps logs, configs, and code aligned with OpenRouter’s own model catalog.
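Because the spec is just `provider:author/model_id`, it can be split with plain string operations. The sketch below (`split_model_spec` is a hypothetical name, not a package function) shows the shape of such a parse:

```julia
# Illustrative sketch: split an explicit "provider:author/model_id"
# spec into its three components.
function split_model_spec(spec::AbstractString)
    provider, rest = split(spec, ':'; limit = 2)      # "groq", "moonshotai/kimi-k2-0905"
    author, model_id = split(rest, '/'; limit = 2)    # "moonshotai", "kimi-k2-0905"
    return (provider = provider, author = author, model_id = model_id)
end

split_model_spec("groq:moonshotai/kimi-k2-0905")
# → (provider = "groq", author = "moonshotai", model_id = "kimi-k2-0905")
```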
PromptingTools.jl is a higher-level framework that covers prompting patterns, tooling, and multiple providers.
OpenRouter.jl has a narrower, complementary goal:
- It is a cleaned-up, modular REST wrapper for OpenRouter:
  - No internal prompt templating
  - No "one true" workflow
  - Minimal public surface (`aigen`, `aigen_raw`, model / cost utilities)
- It aims for full modularity:
  - You can layer your own prompting, RAG, orchestration, or logging on top
  - Third-party packages can extend it (e.g. alternative streaming backends, custom providers, extra metrics) without fighting the core design
  - There is no default system message – you are always in full control of `sys_msg`
- It focuses on OpenRouter's model catalog, pricing, and provider quirks:
  - Automatic model metadata and endpoint discovery
  - Provider-specific model ID mapping
  - Token + cost accounting that understands caching and discounts
In short: PromptingTools is an opinionated “batteries included” toolkit; OpenRouter.jl is a minimal, extensible building block.
```julia
using OpenRouter

long_text = "Some long user-specific context..."

# Same task across multiple providers / models
response = aigen("$long_text Count to 1-10 in 1 line:", "anthropic:anthropic/claude-haiku-4.5")
response = aigen("$long_text Count to 1-10 in 1 line:", "openai:openai/gpt-5.1")
response = aigen("$long_text Count to 1-10 in 1 line:", "google-ai-studio:google/gemini-2.5-flash")
response = aigen("$long_text Count to 1-10 in 1 line:", "groq:moonshotai/kimi-k2-0905")
response = aigen("$long_text Count to 1-10 in 1 line:", "cerebras:openai/gpt-oss-120b")
```
A more basic example:
```julia
using OpenRouter

# Basic call
msg = aigen(
    "Write a short introduction to Julia for Python users.",
    "openai:openai/gpt-5.1";
    sys_msg = "You are a concise technical writer.",  # no hidden default system prompt
)

println(msg.content)
println("Tokens: ", msg.tokens)
println("Cost (USD): ", msg.cost)
println("Elapsed (s): ", msg.elapsed)
```
Streaming swaps a single HTTP request for a streaming HTTP request; the rest of the pipeline is shared.
The built-in streaming callback is `HttpStreamCallback`, a drop-in "StreamCallback" replacement for HTTP-based streaming in this package.
```julia
using OpenRouter

callback = HttpStreamCallback(; out = stdout)

msg = aigen(
    "Count to 20, one number per line.",
    "cerebras:openai/gpt-oss-120b";
    streamcallback = callback,
)

println()
println("Final message: ", msg.content)
println("Tokens: ", msg.tokens)
println("Cost: ", msg.cost)
```
Under the hood:
- `aigen` calls `aigen_raw`
- `_aigen_core` branches only at the HTTP layer:
  - non-streaming ⇒ `HTTP.post`
  - streaming ⇒ `streamed_request!` + SSE parsing, then reconstructs a POST-like response (same shape as a non-streaming call)
This design keeps streaming and non-streaming behavior as similar as possible from the caller's perspective, while still allowing custom stream handlers via the `AbstractLLMStream` interface.
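To make the SSE side concrete: each server-sent event arrives as a `data: ...` line, and the stream ends with a `data: [DONE]` sentinel. The parser below is a simplified illustration of that framing, not the package's actual implementation:

```julia
# Illustrative SSE framing: collect the payloads of `data: ...` lines
# from a text chunk, stopping at the `[DONE]` sentinel.
function sse_data_lines(chunk::AbstractString)
    payloads = String[]
    for line in split(chunk, '\n')
        startswith(line, "data: ") || continue       # ignore comments / blank keep-alives
        payload = strip(line[length("data: ")+1:end])
        payload == "[DONE]" && break                 # end-of-stream sentinel
        push!(payloads, String(payload))
    end
    return payloads
end

sse_data_lines("data: one\ndata: two\ndata: [DONE]\n")  # → ["one", "two"]
```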
OpenRouter.jl normalizes provider-specific usage fields into a single struct:
```julia
TokenCounts(
    prompt_tokens::Int,
    completion_tokens::Int,
    total_tokens::Int,
    input_cache_read::Int,
    input_cache_write::Int,
    internal_reasoning::Int,
    input_audio_cache::Int,
)
```
`TokenCounts` is opinionated but tries hard to be universal: it chooses a small, consistent set of fields and maps provider-specific names into them.
Extraction is schema-specific:
- OpenAI-compatible (`ChatCompletionSchema`) – reads `usage.prompt_tokens`, `usage.completion_tokens`, and (if present) cached tokens
- Anthropic (`AnthropicSchema`) – maps `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, …
- Gemini (`GeminiSchema`) – maps `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`, …
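As a sketch of what such normalization looks like, the function below maps Anthropic-style usage fields (named above) into a unified shape. It is a simplified stand-in — the `total_tokens` arithmetic here is naive, and the real extraction lives inside the package's schema code:

```julia
# Illustrative sketch: normalize an Anthropic-style `usage` dict into
# a unified named tuple mirroring the TokenCounts fields.
function anthropic_usage_to_counts(usage::AbstractDict)
    prompt      = get(usage, "input_tokens", 0)
    completion  = get(usage, "output_tokens", 0)
    cache_read  = get(usage, "cache_read_input_tokens", 0)
    cache_write = get(usage, "cache_creation_input_tokens", 0)
    return (prompt_tokens = prompt,
            completion_tokens = completion,
            total_tokens = prompt + completion,   # simplification
            input_cache_read = cache_read,
            input_cache_write = cache_write)
end
```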
Pricing is modeled via:
- `Pricing` – per-token and per-request prices (often in USD / token), including:
  - `prompt`, `completion`
  - `input_cache_read`, `input_cache_write`
  - `internal_reasoning`, `input_audio_cache`
  - `discount` (global discount factor when present)
- `ProviderEndpoint` – combines `Pricing` with per-endpoint metadata (context length, parameters, status, etc.)
Cost is computed as:
```julia
cost = calculate_cost(endpoint::ProviderEndpoint, tokens::TokenCounts)
```
and exposed on `AIMessage.cost`.
Cache-related and reasoning-related prices, as well as discounts, are first-class in this calculation – they are not bolted on later.
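The arithmetic behind this is essentially a dot product of prices and token counts, scaled by any discount. The function below is a back-of-the-envelope sketch with made-up argument names (the real logic is `calculate_cost` in the package):

```julia
# Illustrative cost arithmetic: sum(price_i * tokens_i) * (1 - discount).
function approx_cost(price_prompt, price_completion, price_cache_read,
                     tokens_prompt, tokens_completion, tokens_cache_read;
                     discount = 0.0)
    raw = price_prompt * tokens_prompt +
          price_completion * tokens_completion +
          price_cache_read * tokens_cache_read
    return raw * (1 - discount)
end

# e.g. 1000 prompt tokens at $3/M, 200 completion tokens at $15/M, no cache reads:
approx_cost(3e-6, 15e-6, 0.3e-6, 1000, 200, 0)  # → 0.006
```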
A subtle but important OpenRouter quirk:
- OpenRouter exposes its own model IDs (e.g. `openai/gpt-5.1`)
- Individual providers (OpenAI, Anthropic, Groq, Cerebras, xAI, …) sometimes expose different IDs or slightly different naming/versioning for the same underlying model
Because of this, OpenRouter.jl includes a provider mapping layer in `src/model_mapping.jl`:

- `openai_model_transform`
- `anthropic_model_transform`
- `groq_model_transform`
- `cerebras_model_transform`
- `xai_model_transform`
- others…
These functions:
- Strip provider prefixes where needed (e.g. `openai/gpt-4o` → `gpt-4o`)
- Normalize versioning where required (e.g. Anthropic's dated model IDs)
- Map special or deprecated IDs into currently supported upstream IDs
- Are intended to be automatically updated by tooling / AI assistance as new models and edge cases appear
This allows you to use one consistent OpenRouter-flavored model ID and still transparently call the native provider APIs.
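A toy version of the prefix-stripping step looks like this (`strip_author_prefix` is a made-up name; the real transforms in `src/model_mapping.jl` also handle versioning and deprecated IDs):

```julia
# Illustrative transform: drop the `author/` prefix when an upstream
# provider expects a bare model ID.
strip_author_prefix(id::AbstractString) =
    occursin('/', id) ? split(id, '/'; limit = 2)[2] : id

strip_author_prefix("openai/gpt-4o")  # → "gpt-4o"
strip_author_prefix("gpt-4o")         # already bare → unchanged
```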
OpenRouter.jl keeps a lightweight local cache (via Scratch.jl):
- `update_db()` – refresh OpenRouter models (and optionally endpoints); automatically keeps you in sync with newly added or updated models
- `get_model(id; fetch_endpoints = false)` – look up a single cached model, optionally ensuring endpoints are fetched
- `list_cached_models()` – list all cached `OpenRouterModel`s
- `search_models(query)` – simple substring search in cached model IDs & names
You can also define custom models for local experimentation:
```julia
using OpenRouter

add_custom_model(
    "echo/100tps",
    "Echo 100 TPS",
    "Fast echo model for testing",
    8192,
)

add_model(  # alias form, same idea
    "ollama/llama3",
    "Local Llama 3",
    "Self-hosted Llama 3",
    4096,
)
```
These are purely local to your environment and can be used for testing alongside real OpenRouter-backed models.
Streaming is currently implemented using HTTP.jl and Server-Sent Events (SSE):
- `HttpStreamCallback` handles incremental chunks, printing or processing them as they arrive
- `streamed_request!` wraps the HTTP streaming loop, reconstructs a POST-like response body, and keeps the API compatible with non-streaming calls
The design goal is to keep streaming pluggable:
- The streaming abstraction (`AbstractLLMStream` + callbacks + schema-specific chunk parsing) is part of the package today
- It is intentionally written so that alternative transports (e.g. `LibCurl.jl` or other HTTP clients) can be dropped in with minimal changes
- Over time, this layer may be factored out into a small, independent streaming library if that proves useful
The library aims to handle modern OpenRouter + provider features as they appear, with a bias for:
- Correct streaming behavior across providers
- Accurate pricing / token accounting, including caching & discounts
- Robust handling of new model naming schemes via `model_mapping.jl`
Current limitations / intentional non-goals:
- No high-level embeddings API yet (even though embedding models are parsed and represented)
- No built-in prompt templating (keep that in separate, opinionated packages)
Planned / TODO items include:
- Ollama integration for local models
- Echo models for robust testing of client behavior
- Embedding APIs (on top of the already present `OpenRouterEmbeddingModel` types)
- Extended reasoning support as more providers expose explicit reasoning / thinking token accounting
- Licensed under the same terms as this repository (see `LICENSE`)
- Unofficial, but actively used and maintained by TodoFor.ai
- Issues and PRs are welcome on GitHub