docs: Model Pinning for Cost Optimization (#49)#644

Closed
diberry wants to merge 2 commits into bradygaster:dev from diberry:squad/49-model-pinning-docs

@diberry (Collaborator) commented Mar 26, 2026

What

New docs page covering the 5-layer model selection hierarchy, per-agent overrides, cost-first principle, role-to-model mapping, and fallback chains.

Closes diberry#49

Files

  • `docs/src/content/docs/features/model-pinning.md` — new 284-line page
  • `docs/src/navigation.ts` — nav entry after model-selection
  • `test/docs-build.test.ts` — added to expected features

Review

  • Squad team approved (Flight ✅ technical accuracy, Procedures ✅ clarity)
  • All content grounded in squad.agent.md Per-Agent Model Selection section

@bradygaster Please review — this documents the model selection system you designed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

larsontim12 pushed a commit to larsontim12/squad that referenced this pull request Mar 27, 2026
…er#650)

* feat: REPL UX polish — figlet banner, proper-case names, keyword emoji, 80-col wrap, activity hints

- Add figlet ASCII art SQUAD banner in header (App.tsx)
- Fix agent name casing: SessionRegistry uses case-insensitive keys, preserves display names (sessions.ts)
- Add keyword-based role emoji fallback matching in getRoleEmoji (lifecycle.ts)
- Add extractAgentHint() to parse coordinator text for specific task descriptions (index.ts)
- Cap content width at 80 columns across App.tsx, MessageStream.tsx, AgentPanel.tsx
- Remove [system] prefix from system messages in MessageStream
- Add slash command passthrough while processing (InputPrompt.tsx)
- Pin header to top via Ink Static block, simplified header content
- Register agents with proper case names, use display names in activity hints
- Update tests: 192 passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: wire REPL telemetry — EventBus bridge, StreamingPipeline, agent metrics, model display

Wire all defined OTel instrumentation into the REPL code path:

- Pass EventBus to initSquadTelemetry() so EventBus→OTel bridge activates (bradygaster#645)
- Instantiate StreamingPipeline and feed message_delta/usage events for token
  and response latency metrics (TTFT, duration, tokens/sec) (bradygaster#646)
- Call recordAgentSpawn/Duration/Error/Destroy from dispatch handlers (bradygaster#647)
- Add model field to AgentSession, display model name in AgentPanel (bradygaster#648)
- Enable shell metrics when OTel endpoint is configured (no longer requires
  SQUAD_TELEMETRY=1) (bradygaster#649)
- Export RuntimeEventBus from SDK barrel for REPL consumption
- Listen for usage events to capture model name and feed to StreamingPipeline
- Update tests: replace stale [WORK]/[STREAM] assertions, update banner/header
  checks for figlet art, update source-code parsing tests for simplified header

Closes bradygaster#644, bradygaster#645, bradygaster#646, bradygaster#647, bradygaster#648, bradygaster#649

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bradygaster bradygaster requested a review from Copilot March 27, 2026 21:49

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new documentation page describing Squad’s “model pinning” system (persistent model preferences and per-agent overrides) and wires it into the docs site navigation and docs build validation.

Changes:

  • Added new docs page: Model Pinning for Cost Optimization (features/model-pinning).
  • Added navigation entry for the new page under Features.
  • Updated docs build test expectations to include the new feature slug.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| docs/src/content/docs/features/model-pinning.md | New documentation page describing persistent model pinning, layers, fallbacks, and examples. |
| docs/src/navigation.ts | Adds “Model Pinning” to the Features navigation section. |
| test/docs-build.test.ts | Adds model-pinning to the expected Features docs list. |

Comment on lines +23 to +31
```bash
# Set Sonnet as the default for all agents
squad config model claude-sonnet-4-5

# Pin Scribe to Haiku (cheaper — it only writes docs)
squad config model claude-haiku-4-5 --agent Scribe

# Pin Tester to Haiku (cheaper — simple test scaffolding)
squad config model claude-haiku-4-5 --agent Tester
```

Copilot AI Mar 27, 2026

The CLI examples use model IDs like claude-sonnet-4-5 / claude-haiku-4-5, but the CLI validates against MODEL_CATALOG (e.g. claude-sonnet-4.5, claude-haiku-4.5) and will reject the hyphenated form as an unknown model. Update these commands to use the exact catalog IDs (dot version) to ensure the quick start works.
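
The exact-match behavior described in this comment can be sketched as follows. This is a minimal illustration, not the CLI's actual code: `CATALOG_IDS` is a hypothetical subset standing in for `MODEL_CATALOG`, and `validateModelId` and its "did you mean" hint are assumptions made for the example.

```typescript
// Hypothetical subset of the catalog; the real MODEL_CATALOG and its shape
// are not shown in this PR.
const CATALOG_IDS: ReadonlySet<string> = new Set([
  "claude-sonnet-4.5",
  "claude-haiku-4.5",
  "claude-opus-4.6",
]);

type ValidationResult =
  | { ok: true; id: string }
  | { ok: false; id: string; suggestion?: string };

// Exact-match validation: "claude-sonnet-4-5" is rejected even though it
// differs from a catalog ID only by punctuation.
function validateModelId(id: string): ValidationResult {
  if (CATALOG_IDS.has(id)) {
    return { ok: true, id };
  }
  // Rewrite a trailing "-<major>-<minor>" as "-<major>.<minor>" to build a hint.
  const dotted = id.replace(/-(\d+)-(\d+)$/, "-$1.$2");
  if (dotted !== id && CATALOG_IDS.has(dotted)) {
    return { ok: false, id, suggestion: dotted };
  }
  return { ok: false, id };
}

// Usage: the hyphenated ID fails, and the result carries the dotted form as a hint.
console.log(validateModelId("claude-sonnet-4-5"));
```

A hint like this would be an optional convenience; the documentation fix requested here is still to use the dotted catalog IDs directly.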

| Lead / Architect | auto (per-task) | Mixed: code review needs quality, planning needs cost | Architecture proposals → premium; triage → haiku |
| Prompt Engineer | auto (per-task) | Prompt design is like code, research is not | Prompt architecture → sonnet; research → haiku |
| SDK Expert | `claude-sonnet-4.6` | Technical analysis that often touches code | Pure research → `claude-haiku-4.5` |
| Designer / Visual | `claude-opus-4.5` | Vision-capable model required | Never downgrade — vision is non-negotiable |

Copilot AI Mar 27, 2026

The role-to-model table lists Designer / Visual defaulting to claude-opus-4.5, but the model selector defaults visual work to claude-opus-4.6 and the catalog includes claude-opus-4.6-fast as well. Consider updating this default to claude-opus-4.6 to match runtime behavior.

Suggested change
| Designer / Visual | `claude-opus-4.5` | Vision-capable model required | Never downgrade — vision is non-negotiable |
| Designer / Visual | `claude-opus-4.6` | Vision-capable model required | Never downgrade — vision is non-negotiable |

Comment on lines +235 to +251
```bash
squad config model claude-haiku-4-5
```

### Bump quality for an architecture review

Override to premium for a high-stakes architecture discussion:

```
use opus for this architecture review
```

Or pin your architect permanently:

```bash
squad config model claude-opus-4-6 --agent Flight
```

Copilot AI Mar 27, 2026

These CLI examples again use hyphenated model IDs (claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-6) which will fail CLI validation (the catalog IDs use dot versions like claude-haiku-4.5, claude-sonnet-4.5, claude-opus-4.6). Update these commands to use the exact IDs from the model catalog.

Comment on lines +261 to +264
```bash
# Haiku for non-code agents
squad config model claude-haiku-4-5 --agent Scribe
squad config model claude-haiku-4-5 --agent Tester
```

Copilot AI Mar 27, 2026

These examples use --agent Scribe / --agent Tester, but the CLI checks the agent name against .squad/agents/* directory names with a case-sensitive match. Since system agents are typically stored under lower-case directories (e.g. .squad/agents/scribe/), prefer --agent scribe / --agent tester here (or add a note that the flag expects the directory name exactly).
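
To make the case-sensitivity point concrete, here is a small sketch of an exact-match lookup against agent directory names. It assumes the directory names are passed in as a list; `resolveAgent` and the near-miss hint are hypothetical and not taken from the CLI source.

```typescript
type AgentLookup =
  | { found: true; dir: string }
  | { found: false; hint?: string };

// `agentDirs` stands in for the directory names under .squad/agents/
// (hypothetical input; a real CLI would read them from disk).
function resolveAgent(name: string, agentDirs: readonly string[]): AgentLookup {
  // Case-sensitive comparison, as the review comment describes.
  if (agentDirs.some((dir) => dir === name)) {
    return { found: true, dir: name };
  }
  // Look for a directory that differs only by letter case, to report as a hint.
  const nearMiss = agentDirs.find(
    (dir) => dir.toLowerCase() === name.toLowerCase(),
  );
  return nearMiss === undefined
    ? { found: false }
    : { found: false, hint: nearMiss };
}

// Usage: "Scribe" does not match the lower-case directory "scribe".
console.log(resolveAgent("Scribe", ["scribe", "tester"]));
```

Under these assumptions, `--agent Scribe` fails while `--agent scribe` succeeds, which is why the examples should use the directory name exactly.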

Comment on lines +40 to +47
```
Model configuration:
Default model: claude-sonnet-4-5

Agent overrides:
Scribe → claude-haiku-4-5
Tester → claude-haiku-4-5
```

Copilot AI Mar 27, 2026

The sample squad config model output shows claude-sonnet-4-5 / claude-haiku-4-5, but the CLI prints the exact persisted values and only accepts catalog IDs (e.g. claude-sonnet-4.5, claude-haiku-4.5). Adjust the displayed output to match the valid model IDs used by the CLI.

| Writing code (implementation, refactoring, tests, bug fixes) | `claude-sonnet-4.6` | Standard | Quality and accuracy matter for code |
| Writing prompts or agent designs | `claude-sonnet-4.6` | Standard | Prompts are executable — treat like code |
| Non-code work (docs, planning, triage, logs, changelogs) | `claude-haiku-4.5` | Fast | Cost first — Haiku handles non-code tasks |
| Visual/design work requiring image analysis | `claude-opus-4.5` | Premium | Vision capability required — overrides cost rule |

Copilot AI Mar 27, 2026

For visual/image-analysis tasks the runtime auto-selection uses claude-opus-4.6 as the default (see task type visual in the model selector). This row currently documents claude-opus-4.5, which makes the cost-first table inconsistent with actual behavior and with the rest of the docs.

Suggested change
| Visual/design work requiring image analysis | `claude-opus-4.5` | Premium | Vision capability required — overrides cost rule |
| Visual/design work requiring image analysis | `claude-opus-4.6` | Premium | Vision capability required — overrides cost rule |

Comment on lines +210 to +214
```
Premium: claude-opus-4.6 → claude-opus-4.5 → claude-sonnet-4.6 → claude-sonnet-4.5 → (platform default)
Standard: claude-sonnet-4.6 → claude-sonnet-4.5 → gpt-5.4 → gpt-5.3-codex → claude-sonnet-4 → (platform default)
Fast: claude-haiku-4.5 → gpt-5.4-mini → gpt-5.1-codex-mini → gpt-4.1 → (platform default)
```

Copilot AI Mar 27, 2026

The documented fallback chains don’t match the runtime fallback chain constants and include a model ID (gpt-5.4-mini) that is not in the model catalog. Also, the implementation currently attempts the resolved model plus the configured chain and throws if exhausted (there isn’t a “platform default” step). Please align these chains and rules with MODELS.FALLBACK_CHAINS (and remove/clarify the “platform default” and “maximum 3 retries” claims).
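
The behavior described here (try the resolved model, then the configured chain, and throw once everything is exhausted) might look roughly like the sketch below. The chain contents, `pickModel`, and the `isAvailable` callback are all illustrative assumptions; the real constants are in `MODELS.FALLBACK_CHAINS`, which this PR does not show.

```typescript
// Hypothetical chain for illustration only; the real values live in
// MODELS.FALLBACK_CHAINS and may differ.
const EXAMPLE_CHAIN: readonly string[] = ["claude-sonnet-4.6", "claude-sonnet-4.5"];

// Tries the resolved model first, then each chain entry in order.
// `isAvailable` is a stand-in for whatever check the runtime performs.
function pickModel(
  resolved: string,
  chain: readonly string[],
  isAvailable: (id: string) => boolean,
): string {
  // Skip chain entries equal to the resolved model so it is not tried twice.
  const candidates = [resolved, ...chain.filter((id) => id !== resolved)];
  for (const id of candidates) {
    if (isAvailable(id)) {
      return id;
    }
  }
  // There is no implicit "platform default" step: exhausting the list is an error.
  throw new Error(`No available model; tried: ${candidates.join(", ")}`);
}

// Usage: the resolved model is unavailable, so the second chain entry is chosen.
console.log(
  pickModel("claude-opus-4.6", EXAMPLE_CHAIN, (id) => id === "claude-sonnet-4.5"),
);
```

If the runtime works this way, the docs should describe the failure mode as an error rather than a silent fall-through to a platform default.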

|------|--------|
| **Premium** | `claude-opus-4.6`, `claude-opus-4.5` |
| **Standard** | `claude-sonnet-4.6`, `claude-sonnet-4.5`, `claude-sonnet-4`, `gpt-5.4`, `gpt-5.3-codex`, `gpt-5.2-codex`, `gpt-5.2`, `gpt-5.1-codex`, `gpt-5.1`, `gemini-3-pro-preview` |
| **Fast/Cheap** | `claude-haiku-4.5`, `gpt-5.4-mini`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` |

Copilot AI Mar 27, 2026

The “Available models” list includes gpt-5.4-mini, but that model ID is not present in MODEL_CATALOG (while claude-opus-4.6-fast is). To avoid users configuring unsupported models, please make this list match the actual MODEL_CATALOG IDs (or clearly note which IDs are illustrative vs. validated).

Suggested change
| **Fast/Cheap** | `claude-haiku-4.5`, `gpt-5.4-mini`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` |
| **Fast/Cheap** | `claude-haiku-4.5`, `claude-opus-4.6-fast`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` |
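
One way to keep a documented list like this in step with the catalog is a small check that flags backticked IDs the catalog does not contain. The sketch below is an assumption-laden illustration: `KNOWN_IDS` is a hypothetical subset of `MODEL_CATALOG`, and `findUnknownModelIds` is not an existing helper in the repository.

```typescript
// Hypothetical catalog subset for illustration; the real MODEL_CATALOG
// is not shown in this PR.
const KNOWN_IDS: ReadonlySet<string> = new Set([
  "claude-haiku-4.5",
  "claude-opus-4.6-fast",
  "gpt-5.1-codex-mini",
  "gpt-5-mini",
  "gpt-4.1",
]);

// Returns every backticked token in a Markdown line that is not in the catalog.
function findUnknownModelIds(
  markdownLine: string,
  catalog: ReadonlySet<string>,
): string[] {
  // Splitting on backticks leaves the quoted tokens at the odd indexes.
  const quoted = markdownLine.split("`").filter((_, index) => index % 2 === 1);
  return quoted.filter((id) => !catalog.has(id));
}

// Usage: the original Fast/Cheap row from the page under review.
const fastRow =
  "| **Fast/Cheap** | `claude-haiku-4.5`, `gpt-5.4-mini`, `gpt-5.1-codex-mini`, `gpt-5-mini`, `gpt-4.1` |";
console.log(findUnknownModelIds(fastRow, KNOWN_IDS));
```

A check of this kind could sit next to the existing docs build test, though whether it belongs there is a decision for the maintainers.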

Comment on lines +27 to +31
# Pin Scribe to Haiku (cheaper — it only writes docs)
squad config model claude-haiku-4-5 --agent Scribe

# Pin Tester to Haiku (cheaper — simple test scaffolding)
squad config model claude-haiku-4-5 --agent Tester

Copilot AI Mar 27, 2026

The --agent flag is validated against the actual .squad/agents/* directory names with a case-sensitive match. Since the built-in Scribe charter path is .squad/agents/scribe/charter.md, examples like --agent Scribe (and --agent Tester) can fail on case-sensitive filesystems; prefer --agent scribe / --agent tester (or clarify that the flag expects the directory name).

@bradygaster (Owner) commented

@diberry letting you respond to review comments

@diberry diberry marked this pull request as draft March 28, 2026 14:13
Add dedicated docs page covering the 4-layer model selection hierarchy,
per-agent overrides via config.json and CLI, cost-first principle,
role-to-model mapping, task complexity adjustments, session directives,
and fallback chains.

- Page: features/model-pinning.md
- Nav: after model-selection in Features section
- Test: updated docs-build.test.ts assertions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@diberry diberry force-pushed the squad/49-model-pinning-docs branch from ec3f23c to a954626 Compare March 28, 2026 14:19
…nse protocol

- Add comprehensive 4-phase fork-first pipeline workflow documentation
- Include comment response protocol for Copilot review gates
- Add anti-pattern: NEVER fix code without replying to Copilot comments
- Reference commands for replying to PR comments with gh api

Closes #xxx

@diberry (Collaborator, Author) commented Mar 28, 2026

Closing -- will re-open via fork-first pipeline when fully polished.

@diberry diberry closed this Mar 28, 2026

Development

Successfully merging this pull request may close these issues.

content: Model Pinning in Squad: Control Which LLMs Power Your Agents

4 participants