Support mixed managed and runner-native tool-call responses with ordered execution and agent feedback #165

## Summary

When a model returns both managed and runner-native tool calls in the same response, `cllama` currently fails closed with:

```text
mixed managed and runner-native tool calls are not supported in one model response
```

That is too brittle now that additive managed tools are the norm. Agents naturally emit both classes in one response, the turn aborts, and nothing tells them how to recover.

## Current behavior

Today `cllama/internal/proxy/toolmediation.go` partitions the response into managed vs runner-native calls and immediately returns `502` if both are present in the same model response.

This means:

- the turn stops at the proxy boundary
- the runner never receives an actionable instruction about how to retry
- operators have to prompt around proxy internals
- a normal plan like "call service tool, then local file/shell/search tool" fails if the model batches both into one response

## Desired behavior

We need two improvements:

1. **Recovery feedback**
   - If mixed ownership still cannot be executed safely, the agent/runner should receive an explicit response telling it to split the calls into separate responses in a safe order.
   - The failure should be visible in audit/telemetry instead of looking like a silent stop.

2. **Transparent execution when safe**
   - Support mixed managed + runner-native tool-call responses without requiring agents to understand proxy ownership internals.
   - Prefer an ordering that preserves semantics and transcript continuity.

## Proposed direction

Treat the mixed response as an ordered sequence rather than an invalid set when it can be reduced safely.

### OpenAI / Anthropic

- Parse tool calls in the exact order emitted by the model.
- Execute the maximal leading run of managed tool calls inside `cllama`.
- If the first runner-native call appears after one or more managed calls:
  - append the managed tool results into the hidden transcript as usual
  - return a runner-visible response containing only the remaining runner-native tool calls in original order
  - persist continuity so the upstream model sees the hidden managed rounds before the runner-native call on the follow-up request
- If a runner-native call appears before a later managed call in the same response, do **not** guess by reordering. Fail closed, but return a structured message that instructs the agent to retry with managed calls first and runner-native calls in a later response.

This keeps the proxy from silently reordering the model's plan while still making the common "managed first, native second" pattern work.

## Constraints

- No silent semantic reordering.
- Maintain current managed-only mediation behavior.
- Maintain current native-only pass-through behavior.
- Preserve hidden continuity and session-history `tool_trace` behavior for managed rounds only.
- Support both OpenAI-compatible and Anthropic request paths.
- Streaming re-synthesis for runner-visible native tool-call responses must keep working.

## Acceptance criteria

- Mixed responses where managed calls come first and runner-native calls follow no longer hard-fail.
- Mixed responses where native calls precede later managed calls still fail closed, but the returned error explicitly tells the agent to split the actions into separate responses and to place managed calls before runner-native calls.
- Audit/session-history shows a clear managed mediation failure message when the proxy refuses an unsafe mixed order.
- Regression tests cover OpenAI and Anthropic ordered-mix success and unsafe-order failure.
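The cases the regression tests need to cover can be laid out as a table. The sketch below is standalone and uses a hypothetical `Classify` helper and `Outcome` names; in the real suite these rows would presumably become table-driven cases in `handler_test.go`, run once per provider path.

```go
package main

import "fmt"

// Outcome names are hypothetical labels for this sketch.
type Outcome string

const (
	NoCalls     Outcome = "no-calls"
	ManagedOnly Outcome = "managed-only"
	NativeOnly  Outcome = "native-only"
	OrderedMix  Outcome = "ordered-mix" // managed first, native after: should succeed
	UnsafeMix   Outcome = "unsafe-mix"  // managed after native: should fail closed
)

// Classify inspects ownership flags in emission order (true = managed).
func Classify(managed []bool) Outcome {
	sawManaged, sawNative := false, false
	for _, m := range managed {
		switch {
		case m && sawNative:
			return UnsafeMix
		case m:
			sawManaged = true
		default:
			sawNative = true
		}
	}
	switch {
	case sawManaged && sawNative:
		return OrderedMix
	case sawManaged:
		return ManagedOnly
	case sawNative:
		return NativeOnly
	}
	return NoCalls
}

func main() {
	cases := []struct {
		name  string
		calls []bool
		want  Outcome
	}{
		{"managed only", []bool{true, true}, ManagedOnly},
		{"native only", []bool{false}, NativeOnly},
		{"managed then native", []bool{true, false, false}, OrderedMix},
		{"native then managed", []bool{false, true}, UnsafeMix},
		{"managed, native, managed", []bool{true, false, true}, UnsafeMix},
	}
	for _, tc := range cases {
		got := Classify(tc.calls)
		fmt.Printf("%-26s got=%-12s ok=%v\n", tc.name, got, got == tc.want)
	}
}
```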

## Likely files

- `cllama/internal/proxy/toolmediation.go`
- `cllama/internal/proxy/handler_test.go`
- `cllama/internal/proxy/managedcontinuity*.go`
- docs under `site/guide/tools.md` and `site/changelog.md` if behavior changes land on `master`

