docs: Model Pinning for Cost Optimization (#49)#644
docs: Model Pinning for Cost Optimization (#49)#644diberry wants to merge 2 commits intobradygaster:devfrom
Conversation
…er#650) * feat: REPL UX polish — figlet banner, proper-case names, keyword emoji, 80-col wrap, activity hints - Add figlet ASCII art SQUAD banner in header (App.tsx) - Fix agent name casing: SessionRegistry uses case-insensitive keys, preserves display names (sessions.ts) - Add keyword-based role emoji fallback matching in getRoleEmoji (lifecycle.ts) - Add extractAgentHint() to parse coordinator text for specific task descriptions (index.ts) - Cap content width at 80 columns across App.tsx, MessageStream.tsx, AgentPanel.tsx - Remove [system] prefix from system messages in MessageStream - Add slash command passthrough while processing (InputPrompt.tsx) - Pin header to top via Ink Static block, simplified header content - Register agents with proper case names, use display names in activity hints - Update tests: 192 passing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat: wire REPL telemetry — EventBus bridge, StreamingPipeline, agent metrics, model display Wire all defined OTel instrumentation into the REPL code path: - Pass EventBus to initSquadTelemetry() so EventBus→OTel bridge activates (bradygaster#645) - Instantiate StreamingPipeline and feed message_delta/usage events for token and response latency metrics (TTFT, duration, tokens/sec) (bradygaster#646) - Call recordAgentSpawn/Duration/Error/Destroy from dispatch handlers (bradygaster#647) - Add model field to AgentSession, display model name in AgentPanel (bradygaster#648) - Enable shell metrics when OTel endpoint is configured (no longer requires SQUAD_TELEMETRY=1) (bradygaster#649) - Export RuntimeEventBus from SDK barrel for REPL consumption - Listen for usage events to capture model name and feed to StreamingPipeline - Update tests: replace stale [WORK]/[STREAM] assertions, update banner/header checks for figlet art, update source-code parsing tests for simplified header Closes bradygaster#644, bradygaster#645, bradygaster#646, bradygaster#647, bradygaster#648, bradygaster#649 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a new documentation page describing Squad’s “model pinning” system (persistent model preferences and per-agent overrides) and wires it into the docs site navigation and docs build validation.
Changes:
- Added new docs page:
Model Pinning for Cost Optimization(features/model-pinning). - Added navigation entry for the new page under Features.
- Updated docs build test expectations to include the new feature slug.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| docs/src/content/docs/features/model-pinning.md | New documentation page describing persistent model pinning, layers, fallbacks, and examples. |
| docs/src/navigation.ts | Adds “Model Pinning” to the Features navigation section. |
| test/docs-build.test.ts | Adds model-pinning to the expected Features docs list. |
| ```bash | ||
| # Set Sonnet as the default for all agents | ||
| squad config model claude-sonnet-4-5 | ||
|
|
||
| # Pin Scribe to Haiku (cheaper — it only writes docs) | ||
| squad config model claude-haiku-4-5 --agent Scribe | ||
|
|
||
| # Pin Tester to Haiku (cheaper — simple test scaffolding) | ||
| squad config model claude-haiku-4-5 --agent Tester |
There was a problem hiding this comment.
The CLI examples use model IDs like claude-sonnet-4-5 / claude-haiku-4-5, but the CLI validates against MODEL_CATALOG (e.g. claude-sonnet-4.5, claude-haiku-4.5) and will reject the hyphenated form as an unknown model. Update these commands to use the exact catalog IDs (dot version) to ensure the quick start works.
| | Lead / Architect | auto (per-task) | Mixed: code review needs quality, planning needs cost | Architecture proposals → premium; triage → haiku | | ||
| | Prompt Engineer | auto (per-task) | Prompt design is like code, research is not | Prompt architecture → sonnet; research → haiku | | ||
| | SDK Expert | `claude-sonnet-4.6` | Technical analysis that often touches code | Pure research → `claude-haiku-4.5` | | ||
| | Designer / Visual | `claude-opus-4.5` | Vision-capable model required | Never downgrade — vision is non-negotiable | |
There was a problem hiding this comment.
The role-to-model table lists Designer / Visual defaulting to claude-opus-4.5, but the model selector defaults visual work to claude-opus-4.6 and the catalog includes claude-opus-4.6-fast as well. Consider updating this default to claude-opus-4.6 to match runtime behavior.
| | Designer / Visual | `claude-opus-4.5` | Vision-capable model required | Never downgrade — vision is non-negotiable | | |
| | Designer / Visual | `claude-opus-4.6` | Vision-capable model required | Never downgrade — vision is non-negotiable | |
| ```bash | ||
| squad config model claude-haiku-4-5 | ||
| ``` | ||
|
|
||
| ### Bump quality for an architecture review | ||
|
|
||
| Override to premium for a high-stakes architecture discussion: | ||
|
|
||
| ``` | ||
| use opus for this architecture review | ||
| ``` | ||
|
|
||
| Or pin your architect permanently: | ||
|
|
||
| ```bash | ||
| squad config model claude-opus-4-6 --agent Flight | ||
| ``` |
There was a problem hiding this comment.
These CLI examples again use hyphenated model IDs (claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-6) which will fail CLI validation (the catalog IDs use dot versions like claude-haiku-4.5, claude-sonnet-4.5, claude-opus-4.6). Update these commands to use the exact IDs from the model catalog.
| # Haiku for non-code agents | ||
| squad config model claude-haiku-4-5 --agent Scribe | ||
| squad config model claude-haiku-4-5 --agent Tester | ||
| ``` |
There was a problem hiding this comment.
These examples use --agent Scribe / --agent Tester, but the CLI checks the agent name against .squad/agents/* directory names with a case-sensitive match. Since system agents are typically stored under lower-case directories (e.g. .squad/agents/scribe/), prefer --agent scribe / --agent tester here (or add a note that the flag expects the directory name exactly).
| ``` | ||
| Model configuration: | ||
| Default model: claude-sonnet-4-5 | ||
|
|
||
| Agent overrides: | ||
| Scribe → claude-haiku-4-5 | ||
| Tester → claude-haiku-4-5 | ||
| ``` |
There was a problem hiding this comment.
The sample squad config model output shows claude-sonnet-4-5 / claude-haiku-4-5, but the CLI prints the exact persisted values and only accepts catalog IDs (e.g. claude-sonnet-4.5, claude-haiku-4.5). Adjust the displayed output to match the valid model IDs used by the CLI.
| | Writing code (implementation, refactoring, tests, bug fixes) | `claude-sonnet-4.6` | Standard | Quality and accuracy matter for code | | ||
| | Writing prompts or agent designs | `claude-sonnet-4.6` | Standard | Prompts are executable — treat like code | | ||
| | Non-code work (docs, planning, triage, logs, changelogs) | `claude-haiku-4.5` | Fast | Cost first — Haiku handles non-code tasks | | ||
| | Visual/design work requiring image analysis | `claude-opus-4.5` | Premium | Vision capability required — overrides cost rule | |
There was a problem hiding this comment.
For visual/image-analysis tasks the runtime auto-selection uses claude-opus-4.6 as the default (see task type visual in the model selector). This row currently documents claude-opus-4.5, which makes the cost-first table inconsistent with actual behavior and with the rest of the docs.
| | Visual/design work requiring image analysis | `claude-opus-4.5` | Premium | Vision capability required — overrides cost rule | | |
| | Visual/design work requiring image analysis | `claude-opus-4.6` | Premium | Vision capability required — overrides cost rule | |
| ``` | ||
| Premium: claude-opus-4.6 → claude-opus-4.5 → claude-sonnet-4.6 → claude-sonnet-4.5 → (platform default) | ||
| Standard: claude-sonnet-4.6 → claude-sonnet-4.5 → gpt-5.4 → gpt-5.3-codex → claude-sonnet-4 → (platform default) | ||
| Fast: claude-haiku-4.5 → gpt-5.4-mini → gpt-5.1-codex-mini → gpt-4.1 → (platform default) | ||
| ``` |
There was a problem hiding this comment.
The documented fallback chains don’t match the runtime fallback chain constants and include a model ID (gpt-5.4-mini) that is not in the model catalog. Also, the implementation currently attempts the resolved model plus the configured chain and throws if exhausted (there isn’t a “platform default” step). Please align these chains and rules with MODELS.FALLBACK_CHAINS (and remove/clarify the “platform default” and “maximum 3 retries” claims).
| |------|--------| | ||
| | **Premium** | `claude-opus-4.6`, `claude-opus-4.5` | | ||
| | **Standard** | `claude-sonnet-4.6`, `claude-sonnet-4.5`, `claude-sonnet-4`, `gpt-5.4`, `gpt-5.3-codex`, `gpt-5.2-codex`, `gpt-5.2`, `gpt-5.1-codex`, `gpt-5.1`, `gemini-3-pro-preview` | | ||
| | **Fast/Cheap** | `claude-haiku-4.5`, `gpt-5.4-mini`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` | |
There was a problem hiding this comment.
The “Available models” list includes gpt-5.4-mini, but that model ID is not present in MODEL_CATALOG (while claude-opus-4.6-fast is). To avoid users configuring unsupported models, please make this list match the actual MODEL_CATALOG IDs (or clearly note which IDs are illustrative vs. validated).
| | **Fast/Cheap** | `claude-haiku-4.5`, `gpt-5.4-mini`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` | | |
| | **Fast/Cheap** | `claude-haiku-4.5`, `claude-opus-4.6-fast`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` | |
| # Pin Scribe to Haiku (cheaper — it only writes docs) | ||
| squad config model claude-haiku-4-5 --agent Scribe | ||
|
|
||
| # Pin Tester to Haiku (cheaper — simple test scaffolding) | ||
| squad config model claude-haiku-4-5 --agent Tester |
There was a problem hiding this comment.
The --agent flag is validated against the actual .squad/agents/* directory names with a case-sensitive match. Since the built-in Scribe charter path is .squad/agents/scribe/charter.md, examples like --agent Scribe (and --agent Tester) can fail on case-sensitive filesystems; prefer --agent scribe / --agent tester (or clarify that the flag expects the directory name).
|
@diberry letting you respond to review comments |
Add dedicated docs page covering the 4-layer model selection hierarchy, per-agent overrides via config.json and CLI, cost-first principle, role-to-model mapping, task complexity adjustments, session directives, and fallback chains. - Page: features/model-pinning.md - Nav: after model-selection in Features section - Test: updated docs-build.test.ts assertions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ec3f23c to
a954626
Compare
…nse protocol - Add comprehensive 4-phase fork-first pipeline workflow documentation - Include comment response protocol for Copilot review gates - Add anti-pattern: NEVER fix code without replying to Copilot comments - Reference commands for replying to PR comments with gh api Closes #xxx
|
Closing -- will re-open via fork-first pipeline when fully polished. |
What
New docs page covering the 5-layer model selection hierarchy, per-agent overrides, cost-first principle, role-to-model mapping, and fallback chains.
Closes diberry#49
Files
Review
@bradygaster Please review — this documents the model selection system you designed.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com