Spike: multi-model rollcall matrix for cllama ingress surfaces#139
Merged
…faces

Refactor TestSpikeRollCall around a per-runtime config matrix so the spike exercises both `/v1/messages` and `/v1/chat/completions` in a single run, and add a dedicated openclaw + google/gemini variant as the end-to-end regression test for the original #127 incident that motivated ADR-023.

Each matrix entry pins a model and an `expectedSurface`. Real runners override `x-claw.models.primary` so that cllama's policy actually allows the model; stubs set both `ROLLCALL_CLLAMA_MODEL` and the same policy override. A final `ingress_surface_coverage` subtest asserts that both canonical cllama surfaces have been hit.

`session_history_persistence` now skips matrix entries whose required keys are missing, so partial-coverage runs do not false-fail. `pc-roll` is gated behind `CLAW_SPIKE_ENABLE_PICOCLAW` while #137 is open: the picoclaw gateway port=0 failure reproduces on master and is unrelated to this change. The capability-wave-live spike gains its own `capabilityWaveProxyRequest` helper because the old shared single-model picker was deleted.
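The two guards described above (opt-in gating for `pc-roll` and skipping entries whose required keys are missing) can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper names `picoclawEnabled` and `missingKeys`, the key name `EXAMPLE_API_KEY`, and the rule that any non-empty value enables the gate are all assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// picoclawEnabled reports whether the opt-in variable is set.
// Assumption: any non-empty value counts as "enabled".
func picoclawEnabled(getenv func(string) string) bool {
	return getenv("CLAW_SPIKE_ENABLE_PICOCLAW") != ""
}

// missingKeys returns the required environment keys that are unset or empty.
// A matrix entry with a non-empty result would be skipped instead of failed.
func missingKeys(required []string, getenv func(string) string) []string {
	var missing []string
	for _, k := range required {
		if getenv(k) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// EXAMPLE_API_KEY is a hypothetical key name used only for illustration.
	fmt.Println("picoclaw enabled:", picoclawEnabled(os.Getenv))
	fmt.Println("missing keys:", missingKeys([]string{"EXAMPLE_API_KEY"}, os.Getenv))
}
```

In a real Go test, both checks would normally end in `t.Skip(...)` so that a partial-coverage run reports a skip and not a failure.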
Summary
- Refactor `TestSpikeRollCall` into a per-runtime config matrix that exercises both canonical cllama ingress surfaces (`/v1/messages` and `/v1/chat/completions`) in a single run.
- Add a dedicated `openclaw + google/gemini` variant as the end-to-end regression test for the original #127 incident (Fix OpenClaw Google model API when routing through cllama) that triggered ADR-023.
- Add a final `ingress_surface_coverage` subtest that asserts both surfaces were actually hit.
- Gate `pc-roll` behind `CLAW_SPIKE_ENABLE_PICOCLAW` until #137 (picoclaw upstream gateway port=0 breaks spike test; a pre-existing upstream breakage that reproduces on master) is fixed.

Why
Follow-up to #136. The shared cllama ingress surface contract landed in that PR, but the rollcall spike was still picking a single global `(format, model)` pair, so both ingress surfaces were rarely exercised in one run, and there was no direct end-to-end regression test for the Gemini-behind-openclaw bug from #127. This PR closes that gap.

How the matrix is structured
Each matrix entry pins:
- a model (`modelOverride` for real runners → `x-claw.models.primary`; `proxyModel` for stubs → `ROLLCALL_CLLAMA_MODEL` + matching policy override)
- an `expectedSurface` that the final coverage subtest asserts has been hit

Models used across the matrix:
- `anthropic/claude-sonnet-4-6` (native Anthropic → `/v1/messages`)
- `google/gemini-2.5-flash` (Gemini via cllama → `/v1/chat/completions`, the #127 regression)
- `openrouter/anthropic/claude-sonnet-4` (OpenAI-wire via OpenRouter)

Tested
- `go test ./...` and `go vet -tags spike ./cmd/claw/...` pass.
- `TestSpikeRollCall` green with 9 matrix entries + 2 wrap-up subtests (picoclaw skipped per #137).
- `ingress_surface_coverage` reports `[anthropic-messages openai-chat-completions]`.

Closes nothing on its own; it's the spike-level regression protection for ADR-023 and #127.
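For readers who have not opened the diff, the matrix and the coverage check can be pictured with a minimal sketch like the one below. It is not the code from this PR: the `matrixEntry` struct, its field names other than `expectedSurface`, the runtime labels, and the `missingSurfaces` helper are hypothetical. The surface names come from the coverage report above, and mapping the OpenRouter model to `openai-chat-completions` is an assumption based on its "OpenAI-wire" description. The sketch also simplifies by counting the expected surface of every non-skipped entry, whereas the real subtest asserts on surfaces that were actually hit.

```go
package main

import (
	"fmt"
	"sort"
)

// matrixEntry is a hypothetical shape for one row of the rollcall matrix.
type matrixEntry struct {
	name            string // hypothetical runtime label
	model           string // model pinned for this entry
	expectedSurface string // cllama ingress surface this entry should hit
	skipped         bool   // true when the entry did not run (e.g. missing keys)
}

// missingSurfaces returns the canonical surfaces that no executed entry covered.
func missingSurfaces(entries []matrixEntry, canonical []string) []string {
	hit := make(map[string]bool)
	for _, e := range entries {
		if !e.skipped {
			hit[e.expectedSurface] = true
		}
	}
	var missing []string
	for _, s := range canonical {
		if !hit[s] {
			missing = append(missing, s)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	canonical := []string{"anthropic-messages", "openai-chat-completions"}
	entries := []matrixEntry{
		{name: "runner-a", model: "anthropic/claude-sonnet-4-6", expectedSurface: "anthropic-messages"},
		{name: "runner-b", model: "google/gemini-2.5-flash", expectedSurface: "openai-chat-completions"},
		{name: "runner-c", model: "openrouter/anthropic/claude-sonnet-4", expectedSurface: "openai-chat-completions"},
	}
	// An empty result means both canonical surfaces were covered.
	fmt.Println("missing surfaces:", missingSurfaces(entries, canonical))
}
```

Keeping the expected surface on each entry means a run that skips every entry for one surface shows up as a coverage gap in the final subtest instead of passing silently.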