Skip to content

Spike: multi-model rollcall matrix for cllama ingress surfaces#139

Merged
mostlydev merged 1 commit intomasterfrom
issue-134-cllama-ingress-surfaces
Apr 10, 2026
Merged

Spike: multi-model rollcall matrix for cllama ingress surfaces#139
mostlydev merged 1 commit intomasterfrom
issue-134-cllama-ingress-surfaces

Conversation

@mostlydev
Copy link
Copy Markdown
Owner

Summary

  • Expand TestSpikeRollCall into a per-runtime config matrix that exercises both canonical cllama ingress surfaces (/v1/messages and /v1/chat/completions) in a single run.
  • Add a dedicated openclaw + google/gemini variant as the end-to-end regression test for the original Fix OpenClaw Google model API when routing through cllama #127 incident that triggered ADR-023.
  • Add an ingress_surface_coverage subtest that asserts both surfaces were actually hit.
  • Gate pc-roll behind CLAW_SPIKE_ENABLE_PICOCLAW until picoclaw upstream gateway port=0 breaks spike test #137 (pre-existing upstream picoclaw port=0 breakage, reproduces on master) is fixed.

Why

Follow-up to #136. The shared cllama ingress surface contract landed in that PR, but the rollcall spike was still picking a single global (format, model) pair — so both ingress surfaces were rarely exercised in one run, and there was no direct end-to-end regression test for the Gemini-behind-openclaw bug from #127. This PR closes that gap.

How the matrix is structured

Each matrix entry pins:

  • a model (modelOverride for real runners → x-claw.models.primary; proxyModel for stubs → ROLLCALL_CLLAMA_MODEL + matching policy override)
  • an expectedSurface that the final coverage subtest asserts has been hit

Models used across the matrix:

Tested

  • go test ./... and go vet -tags spike ./cmd/claw/... pass.
  • Full TestSpikeRollCall green with 9 matrix entries + 2 wrap-up subtests (picoclaw skipped per picoclaw upstream gateway port=0 breaks spike test #137).
  • ingress_surface_coverage reports [anthropic-messages openai-chat-completions].

Closes nothing on its own; it's the spike-level regression protection for ADR-023 and #127.

…faces

Refactor TestSpikeRollCall around a per-runtime config matrix so the spike
exercises both `/v1/messages` and `/v1/chat/completions` in a single run, and
add a dedicated openclaw + google/gemini variant as the end-to-end regression
test for the original #127 incident that motivated ADR-023.

Each matrix entry pins a model (real runners override `x-claw.models.primary`
so cllama's policy actually allows it; stubs set both `ROLLCALL_CLLAMA_MODEL`
and the same policy override) and an `expectedSurface` that a final
`ingress_surface_coverage` subtest asserts both canonical cllama surfaces have
been hit. `session_history_persistence` now skips matrix entries whose
required keys were missing so partial-coverage runs do not false-fail.

`pc-roll` is gated behind `CLAW_SPIKE_ENABLE_PICOCLAW` while #137 is open —
the picoclaw gateway port=0 failure reproduces on master and is unrelated to
this change. The capability-wave-live spike grows its own
`capabilityWaveProxyRequest` helper since the old shared single-model picker
was deleted.
@mostlydev mostlydev merged commit c27c29d into master Apr 10, 2026
@mostlydev mostlydev deleted the issue-134-cllama-ingress-surfaces branch April 10, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant