Summary
When a model returns both managed and runner-native tool calls in the same response, cllama currently fails closed with:
mixed managed and runner-native tool calls are not supported in one model response
That is operationally too brittle now that additive managed tools are normal. Agents naturally emit both classes of call in one response, their turn aborts, and nothing tells them how to recover.
Current behavior
Today cllama/internal/proxy/toolmediation.go partitions the response into managed vs. runner-native calls and immediately returns HTTP 502 if both are present in the same model response.
This means:
- the turn stops at the proxy boundary
- the runner never receives an actionable instruction about how to retry
- operators have to prompt around proxy internals
- a normal plan like "call service tool, then local file/shell/search tool" fails if the model batches both into one response
Desired behavior
We need two improvements:
- Recovery feedback
  - If mixed ownership still cannot be executed safely, the agent/runner should receive an explicit response telling it to split the calls into separate responses in a safe order.
  - The failure should be visible in audit/telemetry instead of looking like a silent stop.
- Transparent execution when safe
  - Support mixed managed + runner-native tool-call responses without requiring agents to understand proxy ownership internals.
  - Prefer an ordering that preserves semantics and transcript continuity.
Proposed direction
Treat the mixed response as an ordered sequence rather than an invalid set when it can be reduced safely.
OpenAI / Anthropic
- Parse tool calls in the exact order emitted by the model.
- Execute the maximal leading run of managed tool calls inside cllama.
- If the first runner-native call appears after one or more managed calls:
  - append the managed tool results to the hidden transcript as usual
  - return a runner-visible response containing only the remaining runner-native tool calls, in original order
  - persist continuity so that, on the follow-up request, the upstream model sees the hidden managed rounds before the runner-native calls
- If a runner-native call appears before a later managed call in the same response, do not guess by reordering. Fail closed, but return a structured message that instructs the agent to retry with managed calls first and runner-native calls in a later response.
This keeps the proxy from silently reordering the model's plan while still making the common "managed first, native second" pattern work.
Constraints
- No silent semantic reordering.
- Maintain current managed-only mediation behavior.
- Maintain current native-only pass-through behavior.
- Preserve hidden continuity and session-history tool_trace behavior for managed rounds only.
- Support both OpenAI-compatible and Anthropic request paths.
- Streaming re-synthesis for runner-visible native tool-call responses must keep working.
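For the continuity constraint, one possible splice on the follow-up request is sketched below, assuming the proxy stored the hidden managed round keyed by the ID of the first runner-native call it returned. `Message` is a simplified provider-neutral stand-in and `spliceHiddenRound` is hypothetical; real OpenAI and Anthropic transcripts carry tool calls and results in provider-specific shapes (`tool_calls` plus `tool` role messages vs. `tool_use`/`tool_result` content blocks).

```go
package main

import "fmt"

// Message is a simplified, provider-neutral transcript entry (hypothetical).
type Message struct {
	Role        string   // "user", "assistant", or "tool" in this sketch
	ToolCallIDs []string // tool calls issued by an assistant message
	ToolCallID  string   // for tool results: the call this result answers
	Content     string
}

// spliceHiddenRound inserts the hidden managed round (assistant managed calls
// plus their results) immediately before the runner-visible assistant message
// whose first tool call ID is nativeAnchorID. It reports false and returns the
// history unchanged if no such message is found, leaving the caller to decide.
func spliceHiddenRound(history, hidden []Message, nativeAnchorID string) ([]Message, bool) {
	for i, m := range history {
		if m.Role == "assistant" && len(m.ToolCallIDs) > 0 && m.ToolCallIDs[0] == nativeAnchorID {
			out := make([]Message, 0, len(history)+len(hidden))
			out = append(out, history[:i]...)
			out = append(out, hidden...)
			out = append(out, history[i:]...)
			return out, true
		}
	}
	return history, false
}

func main() {
	history := []Message{
		{Role: "user", Content: "look up the record, then read the file"},
		{Role: "assistant", ToolCallIDs: []string{"native-1"}},
		{Role: "tool", ToolCallID: "native-1", Content: "file contents"},
	}
	hidden := []Message{
		{Role: "assistant", ToolCallIDs: []string{"managed-1"}},
		{Role: "tool", ToolCallID: "managed-1", Content: "service result"},
	}
	merged, found := spliceHiddenRound(history, hidden, "native-1")
	fmt.Println(found, len(merged))
}
```

Presenting the managed calls as a separate earlier round is one design choice; rebuilding the model's original single assistant message with all of its tool calls is another, and which one each provider accepts more reliably is worth confirming.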
Acceptance criteria
- Mixed responses where managed calls come first and runner-native calls follow no longer hard-fail.
- Mixed responses where native calls precede later managed calls still fail closed, but the returned error explicitly tells the agent to split the actions into separate responses and to place managed calls before runner-native calls.
- Audit/session-history shows a clear managed mediation failure message when the proxy refuses an unsafe mixed order.
- Regression tests cover OpenAI and Anthropic ordered-mix success and unsafe-order failure.
Likely files
- cllama/internal/proxy/toolmediation.go
- cllama/internal/proxy/handler_test.go
- cllama/internal/proxy/managedcontinuity*.go
- docs under site/guide/tools.md and site/changelog.md, if behavior changes land on master