Spike: multi-model rollcall matrix for cllama ingress surfaces by mostlydev · Pull Request #139 · mostlydev/clawdapus

mostlydev · 2026-04-10T20:33:44Z

Summary

Expand TestSpikeRollCall into a per-runtime config matrix that exercises both canonical cllama ingress surfaces (/v1/messages and /v1/chat/completions) in a single run.
Add a dedicated openclaw + google/gemini variant as the end-to-end regression test for the original #127 incident (Fix OpenClaw Google model API when routing through cllama) that triggered ADR-023.
Add an ingress_surface_coverage subtest that asserts both surfaces were actually hit.
Gate pc-roll behind CLAW_SPIKE_ENABLE_PICOCLAW until #137 (picoclaw upstream gateway port=0 breaks spike test) is fixed; that breakage is pre-existing upstream and reproduces on master.
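A minimal sketch of the pc-roll opt-in gate follows. It assumes the spike treats `CLAW_SPIKE_ENABLE_PICOCLAW` as a boolean flag parsed with `strconv.ParseBool`. The helper name `picoclawEnabled` is hypothetical, and the real test would presumably call `t.Skip` where this sketch prints.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// picoclawEnabled is a hypothetical helper that reports whether the pc-roll
// entry should run. Assumption: the flag is parsed as a Go boolean
// ("1", "true", ...); unset or unparsable values leave pc-roll disabled.
func picoclawEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup("CLAW_SPIKE_ENABLE_PICOCLAW")
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	if !picoclawEnabled(os.LookupEnv) {
		// In a real spike test this would be t.Skip(...).
		fmt.Println("skipping pc-roll: set CLAW_SPIKE_ENABLE_PICOCLAW=1 to opt in (see #137)")
		return
	}
	fmt.Println("running pc-roll")
}
```

Injecting `lookup` instead of calling `os.LookupEnv` directly keeps the gate testable without mutating the process environment.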

Why

Follow-up to #136. The shared cllama ingress surface contract landed in that PR, but the rollcall spike still picked a single global (format, model) pair, so both ingress surfaces were rarely exercised in one run and there was no direct end-to-end regression test for the Gemini-behind-openclaw bug from #127. This PR closes that gap.

How the matrix is structured

Each matrix entry pins:

a model (modelOverride for real runners → x-claw.models.primary; proxyModel for stubs → ROLLCALL_CLLAMA_MODEL + matching policy override)
an expectedSurface, which the final ingress_surface_coverage subtest asserts was hit
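The pinning rules above can be sketched as follows. The struct shape, the `pin` method, and the entry names are hypothetical; only the field names `modelOverride`, `proxyModel`, and `expectedSurface`, the `x-claw.models.primary` key, and `ROLLCALL_CLLAMA_MODEL` come from this PR. The sketch assumes that both paths write the model into the `x-claw.models.primary` policy override and that stubs additionally receive it through the environment.

```go
package main

import "fmt"

// matrixEntry is a hypothetical shape for one rollcall matrix entry.
type matrixEntry struct {
	name            string
	modelOverride   string // set for real runners
	proxyModel      string // set for stub runners
	expectedSurface string // e.g. "anthropic-messages"
}

// pinnedModel records where an entry's model ends up.
type pinnedModel struct {
	primary string            // value written to x-claw.models.primary
	env     map[string]string // extra environment passed to the runner
}

// pin resolves the model for an entry. Both paths set the policy override so
// cllama's policy allows the model; stubs also get ROLLCALL_CLLAMA_MODEL.
func (e matrixEntry) pin() pinnedModel {
	p := pinnedModel{env: map[string]string{}}
	if e.proxyModel != "" {
		p.primary = e.proxyModel
		p.env["ROLLCALL_CLLAMA_MODEL"] = e.proxyModel
		return p
	}
	p.primary = e.modelOverride
	return p
}

func main() {
	entries := []matrixEntry{
		{name: "openclaw-gemini", modelOverride: "google/gemini-2.5-flash", expectedSurface: "openai-chat-completions"},
		{name: "stub-anthropic", proxyModel: "anthropic/claude-sonnet-4-6", expectedSurface: "anthropic-messages"},
	}
	for _, e := range entries {
		p := e.pin()
		fmt.Printf("%s: primary=%s env=%v surface=%s\n", e.name, p.primary, p.env, e.expectedSurface)
	}
}
```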

Models used across the matrix:

anthropic/claude-sonnet-4-6 (native Anthropic → /v1/messages)
google/gemini-2.5-flash (Gemini via cllama → /v1/chat/completions; the #127 regression)
openrouter/anthropic/claude-sonnet-4 (OpenAI-wire via OpenRouter)
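The model list routes traffic to one of two ingress paths. The following hypothetical classifier (`surfaceForPath` is not from the PR) turns a request path into the surface labels that the coverage subtest reports. It assumes that surfaces are identified by exact path match and that query strings are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// Surface labels, as printed by ingress_surface_coverage.
const (
	surfaceAnthropicMessages = "anthropic-messages"
	surfaceOpenAIChat        = "openai-chat-completions"
)

// surfaceForPath maps a request path seen at the cllama ingress to a surface
// label. Hypothetical helper: it strips any query string before matching.
func surfaceForPath(path string) (string, bool) {
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	switch path {
	case "/v1/messages":
		return surfaceAnthropicMessages, true
	case "/v1/chat/completions":
		return surfaceOpenAIChat, true
	default:
		return "", false
	}
}

func main() {
	for _, p := range []string{"/v1/messages", "/v1/chat/completions", "/v1/models"} {
		s, ok := surfaceForPath(p)
		fmt.Println(p, s, ok)
	}
}
```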

Tested

go test ./... and go vet -tags spike ./cmd/claw/... pass.
Full TestSpikeRollCall is green with 9 matrix entries plus 2 wrap-up subtests (picoclaw skipped per #137).
ingress_surface_coverage reports [anthropic-messages openai-chat-completions].

Closes nothing on its own; it's the spike-level regression protection for ADR-023 and #127.

…faces

Refactor TestSpikeRollCall around a per-runtime config matrix so the spike exercises both `/v1/messages` and `/v1/chat/completions` in a single run, and add a dedicated openclaw + google/gemini variant as the end-to-end regression test for the original #127 incident that motivated ADR-023.

Each matrix entry pins a model (real runners override `x-claw.models.primary` so cllama's policy actually allows it; stubs set both `ROLLCALL_CLLAMA_MODEL` and the same policy override) and an `expectedSurface`; a final `ingress_surface_coverage` subtest asserts that both canonical cllama surfaces have been hit.

`session_history_persistence` now skips matrix entries whose required keys were missing, so partial-coverage runs do not false-fail. `pc-roll` is gated behind `CLAW_SPIKE_ENABLE_PICOCLAW` while #137 is open; the picoclaw gateway port=0 failure reproduces on master and is unrelated to this change. The capability-wave-live spike grows its own `capabilityWaveProxyRequest` helper, since the old shared single-model picker was deleted.
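The skip-on-missing-keys behavior for `session_history_persistence` could look roughly like this. Everything here is hypothetical: the `keyedEntry` type, the `missingKeys` helper, and the `GEMINI_API_KEY`/`ANTHROPIC_API_KEY` names are placeholders, not the spike's actual keys. The sketch assumes that a key counts as missing when it is unset or empty.

```go
package main

import (
	"fmt"
	"os"
)

// keyedEntry is a hypothetical matrix entry plus the env keys it needs.
type keyedEntry struct {
	name         string
	requiredKeys []string
}

// missingKeys lists required keys that are unset or empty, in order.
func missingKeys(e keyedEntry, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range e.requiredKeys {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// Placeholder key names, for illustration only.
	entries := []keyedEntry{
		{name: "openclaw-gemini", requiredKeys: []string{"GEMINI_API_KEY"}},
		{name: "openclaw-anthropic", requiredKeys: []string{"ANTHROPIC_API_KEY"}},
	}
	for _, e := range entries {
		if m := missingKeys(e, os.LookupEnv); len(m) > 0 {
			// The real subtest would call t.Skipf here rather than fail.
			fmt.Printf("skip %s: missing %v\n", e.name, m)
			continue
		}
		fmt.Printf("check session history for %s\n", e.name)
	}
}
```

Skipping instead of failing means a run with only some provider keys configured still reports meaningful results for the entries that did run.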

mostlydev merged commit c27c29d into master Apr 10, 2026

mostlydev deleted the issue-134-cllama-ingress-surfaces branch April 10, 2026 20:33

mostlydev merged 1 commit into master from issue-134-cllama-ingress-surfaces