
Releases: mostlydev/cllama

v0.3.6 — managed-prefix mixed tool serialization

16 Apr 19:00
270976e


Highlights

  • Managed-prefix mixed tool batches now serialize instead of hard-failing — when a single mediated model response contains a managed tool prefix followed by runner-native tool calls, the proxy drops the runner-native suffix, executes the managed prefix internally, feeds those results back upstream, and waits for the model to re-emit any runner-native step cleanly in a later response. Native-first or interleaved mixed batches still fail closed, but the returned proxy error now explicitly tells the agent to emit managed service tools first and runner-native tools in a later response. Both the OpenAI-compatible and Anthropic paths have regression coverage.
  • New intervention marker — the serialized path emits a managed_prefix_native_suffix_serialized intervention log so operators can observe when the proxy reordered a turn rather than failing it.
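The prefix check described above can be sketched as a one-pass shape classifier. All type and function names here are illustrative stand-ins, not cllama's actual internals:

```go
package main

import "fmt"

// toolCall is a stand-in for one tool call in a mediated model response.
// Managed is true when the name matches a compiled managed-tool manifest entry.
type toolCall struct {
	Name    string
	Managed bool
}

type batchShape int

const (
	allManaged batchShape = iota
	allNative
	managedPrefixNativeSuffix // serializable: run the prefix, drop the suffix, rerun
	interleaved               // fail closed with a corrective error
)

// classify walks the batch once: a managed call appearing after any
// native call makes the batch non-serializable; otherwise the shape
// follows from which kinds of calls appeared at all.
func classify(calls []toolCall) batchShape {
	sawNative, sawManaged := false, false
	for _, c := range calls {
		if c.Managed {
			if sawNative {
				return interleaved
			}
			sawManaged = true
		} else {
			sawNative = true
		}
	}
	switch {
	case sawManaged && sawNative:
		return managedPrefixNativeSuffix
	case sawManaged:
		return allManaged
	default:
		return allNative
	}
}

func main() {
	batch := []toolCall{{"memory_search", true}, {"bash", false}}
	fmt.Println(classify(batch) == managedPrefixNativeSuffix)
}
```

Note that a native-first mixed batch falls into the same non-serializable bucket as an interleaved one, matching the fail-closed behavior described above.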

Notes

  • Hidden continuity transcripts now persist only the executed managed prefix; the model's original mixed plan is not replayed back. This is intentional (the rerun asks the model for a clean native step) but means session-history dumps will not show the dropped runner-native suffix.

v0.3.5 — native tool handoff after managed rounds

14 Apr 18:07


Highlights

  • Native tool handoff after managed rounds — when a managed tool mediation round completes, the proxy now lets the runner's next response use runner-native tools directly without re-injecting the managed tool surface. This cleanly hands control back to the runner after managed tools have done their work, so OpenClaw and other drivers can drive a native tool call as the natural next step of the same turn. This builds on the v0.3.4 additive mediation contract.

v0.3.4 — additive managed tool mediation

14 Apr 01:56
0a7bb94


Highlights

  • Managed tool mediation is now additive (#7) — when an upstream runner already declares its own tools, cllama no longer overwrites them on its way out to the model. Compiled managed tools are appended to the runner's outbound tools[] (OpenAI format) or Anthropic tools array, so OpenClaw and other drivers keep their native tool surface even when managed mediation is active. Runner-native tool calls in the model's response pass straight back to the runner unchanged, managed tool calls are still executed inside cllama as before, and a response that mixes both fail-closes with a precise error rather than silently dropping or replacing tools. OpenAI and Anthropic surfaces have parity coverage. This unblocks pods that need both compiled managed tools and runner-native tool access from the same agent in the same session.
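The additive merge amounts to appending to the runner's declared tools rather than replacing them. A minimal sketch, assuming runner declarations win on a name clash (the collision rule is not stated in the release notes):

```go
package main

import "fmt"

// Tool is a minimal stand-in for an OpenAI-format tool declaration.
type Tool struct{ Name string }

// mergeTools appends compiled managed tools to the runner's outbound
// tools slice instead of overwriting it, so the runner keeps its native
// tool surface while managed mediation is active.
func mergeTools(runner, managed []Tool) []Tool {
	seen := make(map[string]bool, len(runner))
	out := make([]Tool, 0, len(runner)+len(managed))
	for _, t := range runner {
		seen[t.Name] = true
		out = append(out, t)
	}
	for _, t := range managed {
		if !seen[t.Name] { // assumption: runner declarations take precedence
			out = append(out, t)
		}
	}
	return out
}

func main() {
	out := mergeTools([]Tool{{"bash"}}, []Tool{{"memory_search"}})
	fmt.Println(len(out)) // runner-native tools stay first, managed tools follow
}
```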

v0.3.3 — native Google Gemini provider support

09 Apr 01:07


Highlights

  • Native google provider — adds first-class Google Gemini provider support to the cllama provider registry. Operators can now route models through google/<model> refs (e.g. google/gemini-2.5-flash) using a native GEMINI_API_KEY (with GOOGLE_API_KEY accepted as a lower-priority alias) instead of going through OpenRouter. Default API format is OpenAI-compatible against the Gemini OpenAI endpoint.
  • Gemini cost tracking — pricing entries for Gemini 2.5 Flash and Gemini 2.5 Pro are now wired into the cost telemetry path so direct-Google routing carries accurate per-token accounting alongside the existing OpenRouter route.

Companion to mostlydev/clawdapus#119.

v0.3.2 — Anthropic prompt cache fix

08 Apr 19:31


Fixes

  • Cache-friendly prefix ordering for feed injection — Feeds and timestamps are now appended after the system prompt instead of prepended before it. Anthropic prompt caching is prefix-matched with a 5-min TTL, so prepending dynamic content was invalidating the cache on every request. A 3-agent pod was paying ~$60/week in unnecessary cache_creation costs; this fix eliminates that. Fixes mostlydev/clawdapus#122.
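Why ordering matters for a prefix-matched cache can be shown with a toy model in which a request is a sequence of content blocks and the cache reuses only the shared leading run:

```go
package main

import "fmt"

// sharedPrefixLen counts how many leading blocks two requests have in
// common; a prefix-matched cache can reuse at most that much of the
// previous request.
func sharedPrefixLen(a, b []string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	// Prepending the dynamic timestamp invalidates the whole prefix...
	fmt.Println(sharedPrefixLen(
		[]string{"12:00", "system prompt"},
		[]string{"12:05", "system prompt"})) // 0 blocks reusable
	// ...appending it preserves the cached system prompt.
	fmt.Println(sharedPrefixLen(
		[]string{"system prompt", "12:00"},
		[]string{"system prompt", "12:05"})) // 1 block reusable
}
```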

v0.3.1 — managed tool manifest state observability

05 Apr 19:11


Highlights

  • log managed tool manifest state on every request: proxy telemetry now emits manifest_present (bool) and tools_count (int) so operators can verify at runtime whether a per-agent tool manifest was loaded and how many tools it contained

This closes an observability gap that made it hard to diagnose cases where compiled tools.json existed on disk but tools were not being injected into upstream requests — there was no runtime signal telling operators which agents had live tool manifests.

Artifacts

  • container image: `ghcr.io/mostlydev/cllama:v0.3.1`
  • rolling tag: `ghcr.io/mostlydev/cllama:latest`

Validation

  • `go test ./...`

v0.3.0 — managed tool mediation + memory plane + scoped history API

05 Apr 19:10


Highlights

Managed tool mediation

  • load and inject compiled tools.json manifests into upstream LLM requests
  • execute managed tools via HTTP against declared services (OpenAI-compatible format)
  • Anthropic-format tool mediation (parallel path to OpenAI)
  • cross-turn continuity: replay hidden tool rounds into subsequent upstream requests so the LLM sees the transcript that produced each runner-visible reply
  • re-stream final text as synthetic SSE after mediated loops complete; keepalive comments prevent runner timeouts during long loops
  • budget limits: max rounds, per-tool timeout, total timeout, result size truncation with explicit truncated: true flag
  • body_key execution: wrap tool arguments as {body_key: args} when declared in the tool descriptor
  • sanitize managed tool names for provider compatibility
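The result-size budget with its explicit flag can be sketched like this (field and function names are illustrative):

```go
package main

import "fmt"

// toolResult is a stand-in for a mediated tool result; Truncated
// carries the explicit flag mentioned in the budget-limits bullet.
type toolResult struct {
	Content   string `json:"content"`
	Truncated bool   `json:"truncated,omitempty"`
}

// truncateResult enforces the size budget while marking the result,
// so the model can tell a shortened result from a complete one.
func truncateResult(body string, maxBytes int) toolResult {
	if len(body) <= maxBytes {
		return toolResult{Content: body}
	}
	return toolResult{Content: body[:maxBytes], Truncated: true}
}

func main() {
	r := truncateResult("a very long tool result", 10)
	fmt.Println(r.Truncated, len(r.Content))
}
```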

Memory plane

  • pre-turn recall and post-turn best-effort retain hooks
  • memory_op telemetry events with recall/retain outcome, latency, block count, injected bytes, policy-removal counts
  • secret-shaped value scrubbing on both retain payloads and recalled blocks
  • tightened memory recall auth and history auth handling
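Secret-shaped scrubbing can be sketched with a pattern-based redactor. The patterns below (OpenAI-style `sk-` keys, AWS access key IDs) are assumptions for illustration; the shapes cllama actually scrubs are not documented here:

```go
package main

import (
	"fmt"
	"regexp"
)

// secretShaped matches two illustrative secret shapes; a real scrubber
// would cover more.
var secretShaped = regexp.MustCompile(`\b(sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})\b`)

// scrub redacts secret-shaped substrings; per the bullet above, this
// runs on both retain payloads and recalled blocks.
func scrub(s string) string {
	return secretShaped.ReplaceAllString(s, "[redacted]")
}

func main() {
	fmt.Println(scrub("token sk-abcdefghijklmnopqrstuvwx stored"))
}
```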

Session history API

  • scoped history read API for agents querying their own transcripts
  • dedicated replay auth tokens separate from agent bearer tokens
  • stable per-entry IDs
  • index replay for after queries (no full-rescan)
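Index replay for `after` queries can be sketched with a seek into the ID order instead of a full rescan (monotonically increasing IDs are an assumption of this sketch):

```go
package main

import (
	"fmt"
	"sort"
)

// entry models a transcript entry with a stable per-entry ID.
type entry struct {
	ID   int64
	Text string
}

// entriesAfter answers an after-style query by binary-searching the
// sorted log for the first ID past the cursor, avoiding a rescan of
// entries the caller has already seen.
func entriesAfter(log []entry, after int64) []entry {
	i := sort.Search(len(log), func(i int) bool { return log[i].ID > after })
	return log[i:]
}

func main() {
	log := []entry{{1, "hi"}, {2, "tool round"}, {3, "reply"}}
	fmt.Println(len(entriesAfter(log, 1)))
}
```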

Provider fixes

  • xAI env seeding regression coverage

Artifacts

  • container image: `ghcr.io/mostlydev/cllama:v0.3.0`
  • rolling tag: `ghcr.io/mostlydev/cllama:latest`

Validation

  • `go test ./...`

v0.2.5

28 Mar 00:20
2ff644e


Highlights

  • enforce declared per-agent model policy in cllama
  • normalize runner model requests against the compiled allowlist
  • restrict provider failover to the pod-declared fallback chain
  • add xAI routing/policy fixes needed for direct xai/... model refs
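Allowlist normalization can be sketched as below. Falling back to a declared default for out-of-policy refs is an assumption of this sketch; the real policy may reject the request instead:

```go
package main

import "fmt"

// normalizeModel maps a runner-requested model onto the compiled
// per-agent allowlist: an allowed ref passes through unchanged,
// anything else is rewritten to the pod-declared fallback.
func normalizeModel(requested string, allowlist []string, fallback string) string {
	for _, m := range allowlist {
		if m == requested {
			return requested
		}
	}
	return fallback
}

func main() {
	allow := []string{"xai/grok-3", "anthropic/claude-sonnet-4"}
	fmt.Println(normalizeModel("gpt-4o", allow, allow[0]))
}
```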

Artifacts

  • container image: `ghcr.io/mostlydev/cllama:v0.2.5`
  • rolling tag: `ghcr.io/mostlydev/cllama:latest`

Validation

  • `go test ./...`

v0.2.3

26 Mar 02:33


Changes

  • Unpriced request tracking: requests where the upstream provider returns no cost data are now counted separately as unpriced_requests in the cost API response and surfaced in the dashboard UI
  • Reported cost passthrough: CostInfo.CostUSD is now *float64 (nil = unpriced, not zero); provider-reported cost fields are propagated through the proxy
  • Timezone context: time_context.go injects timezone-aware current time for agents that declare a TZ environment variable
  • Dashboard: total_requests and unpriced_requests exposed in the costs API endpoint
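The nil-vs-zero distinction in CostInfo.CostUSD drives how totals are computed. A sketch of the aggregation (names are illustrative, not the actual cost-API code):

```go
package main

import "fmt"

// costInfo mirrors the *float64 convention: a nil CostUSD means the
// provider reported no cost (unpriced), not a free request.
type costInfo struct {
	CostUSD *float64
}

// aggregate totals priced requests and counts unpriced ones separately,
// matching the total vs. unpriced split surfaced in the costs API.
func aggregate(costs []costInfo) (totalUSD float64, unpriced int) {
	for _, c := range costs {
		if c.CostUSD == nil {
			unpriced++
			continue
		}
		totalUSD += *c.CostUSD
	}
	return totalUSD, unpriced
}

func main() {
	a, free := 0.02, 0.0
	total, unpriced := aggregate([]costInfo{{&a}, {&free}, {nil}})
	fmt.Println(total, unpriced) // a zero-cost request is priced; only nil is unpriced
}
```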

v0.2.2 — provider token pool + runtime provider add

25 Mar 02:15
b20e7e1


What's new

  • Provider token pool: Multi-key pool per provider with states ready/cooldown/dead/disabled. Proxy retries across keys on 401/429/5xx with failure classification and Retry-After support.
  • Runtime provider add: POST /providers/add UI route — add a new provider (name, base URL, auth type, API key) at runtime with no restart. Persists to .claw-auth/providers.json with source: runtime.
  • ProviderState.Source: New field (seed/runtime) survives JSON round-trips.
  • UI bearer auth: All routes gated by CLLAMA_UI_TOKEN when configured.
  • Key management routes: POST /keys/add and POST /keys/delete.
  • Webhook alerts: CLLAMA_ALERT_WEBHOOKS and CLLAMA_ALERT_MENTIONS for pool events.
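The failure classification behind the key-state machine can be sketched as a status-to-transition map. The exact mappings are illustrative (the notes name the statuses retried across keys, not which state each one produces):

```go
package main

import "fmt"

// keyState mirrors the pool states named in the release notes.
type keyState int

const (
	ready keyState = iota
	cooldown
	dead
	disabled
)

// classifyFailure maps an upstream HTTP status to a pool transition:
// an auth failure retires the key, rate limits and server errors put it
// in cooldown (Retry-After handling would set the cooldown duration).
func classifyFailure(status int) keyState {
	switch {
	case status == 401:
		return dead
	case status == 429 || status >= 500:
		return cooldown
	default:
		return ready
	}
}

func main() {
	fmt.Println(classifyFailure(429) == cooldown)
}
```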