Ask Copilot CLI for the list of models instead of hardcoding it#474
Ask Copilot CLI for the list of models instead of hardcoding it#474PureWeen merged 4 commits intoPureWeen:mainfrom
Conversation
PR #474 — R1 Multi-Model ReviewPR: Ask Copilot CLI for the list of models instead of hardcoding it Consensus Findings1. 🟡 MODERATE — Unbounded recursion in
|
| # | Sev | Finding | Models |
|---|---|---|---|
| 1 | 🟡 | Unbounded recursion in ParseDirectCliModelProbeOutput | Sonnet + Codex + Direct |
| 2 | 🟡 | FallbackModels safety net removed — single-model picker on failure | Sonnet + Direct |
| 3 | 🟡 | LLM probe can hallucinate model IDs, treated as primary over SDK | Sonnet + Codex |
| 4 | 🟡 | outputTask/errorTask abandoned on timeout; CTS edge case | Sonnet + Direct |
| 5 | 🟢 | No tests for ParseDirectCliModelProbeOutput or coalescing logic | Codex + Direct |
Recommended action:
- Must fix: Polish UI, Rename Sessions, Markdown Output Support, Queued Messages #1 (add recursion depth guard — trivial fix, prevents crash)
- Should fix: Add mobile sidebar top-bar & flyout UI #2 (restore FallbackModels as supplemental for empty-discovery case)
- Consider: Persistent sessions, streaming fixes, stop button #3 (prefer SDK over CLI when both available), Tab navigation + shared ChatMessageList component #4 (separate timeout CTS), Add DevTunnel, QR, and MauiDevFlow support #5 (add parser tests)
- Add maxDepth guard (5) to ParseDirectCliModelProbeOutput to prevent stack overflow from pathological nested LLM output - Restore FallbackModels safety net: when discovered models list has only 1 entry (just DefaultModel), fall back to the curated list - Fix CTS timeout race: read stdout/stderr without cancellation token so a successful process exit isn't discarded by a late CTS firing - Add IsValidModelSlug() regex filter to discard hallucinated model IDs from LLM probe output (only accepts lowercase alphanumeric with hyphens and dots) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #474 — R2 Multi-Model ReviewPR: Ask Copilot CLI for the list of models instead of hardcoding it R1 Findings Status
New Consensus Findings1. 🟡 MODERATE —
|
| # | Sev | Finding | Models | Status |
|---|---|---|---|---|
| 1 | 🟡 | Count > 1 heuristic bypasses FallbackModels |
Opus + Sonnet + Codex | NEW |
| 2 | 🟢 | Unobserved task exceptions (fire-and-forget + timeout) | Opus + Sonnet | NEW |
| 3 | 🟢 | No tests for ParseDirectCliModelProbeOutput | Opus + Sonnet + Codex | R1 carry |
Recommended action: Count > 1 with AvailableModels.Count > 0 check). The other items are minor improvements that can be done in a follow-up.
…heuristic The Count > 1 check incorrectly bypassed FallbackModels when selectedModel != DefaultModel (producing a 2-item list from required models alone). Also adds fallback guard to SessionListItem.GetSelectableModels() which had none. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #474 — R3 Multi-Model ReviewPR: Ask Copilot CLI for the list of models instead of hardcoding it R2 Findings Status — All Fixed ✅
R1 Findings Status — All Fixed ✅
Remaining Minor Items (no consensus for blocking)
Both are acknowledged minor gaps — neither blocks merge. The parser test gap is the more impactful one given the method's complexity, but can be addressed in a follow-up. What's Good
Recommended action: ✅ Approve All 6 findings from R1+R2 are resolved. The 2 remaining 🟢 items (fire-and-forget exception handling + parser tests) are acknowledged minor improvements suitable for follow-up. 3/3 models agree the PR is ready to merge. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #474 — R4 Review (post author commit)Reviewer: Multi-model consensus (Claude Opus 4.6 + Claude Sonnet 4.6 + GPT-5.3-Codex) New Commit ReviewedCommit 4: "Preserve selected model during usage updates" (by PureWeen)
R1/R2/R3 Fix Status
Consensus Findings (2+ of 3 models)🟢 MINOR —
|
| Finding | Model | Assessment |
|---|---|---|
"resumed" could leak into model picker dropdown for old sessions |
Sonnet | Edge case for legacy data only; low practical risk |
Race in check-then-set on state.Info.Model |
Codex | Explicitly dismissed by Sonnet — PR actually narrows the existing race window |
Unobserved fire-and-forget _ = FetchAvailableModelsAsync() |
Opus (R3 carry) | Internal try-catches make silent failure unlikely |
No unit tests for ParseDirectCliModelProbeOutput |
Opus (R3 carry) | Good follow-up item, not blocking |
New Code Assessment
ShouldAcceptObservedModel logic — All 3 models confirmed the decision matrix is correct:
- Empty/null current → accept (no user choice yet) ✅
- "resumed" placeholder → accept (legacy sentinel) ✅
- Same model after normalization → accept (no conflict) ✅
- Different model → reject (preserve user choice) ✅
Test coverage — 3 new tests cover the main paths (empty, resumed, same-model, different-model). NormalizeToSlug(null) path is untested but trivially correct.
Verdict
✅ Approve
The PR is in good shape after 4 commits and 3 rounds of fixes. The new ShouldAcceptObservedModel commit is well-designed and correctly solves the model-overwrite-on-usage-events problem. All prior R1/R2 findings are verified fixed. The one consensus finding (magic string duplication) is 🟢 minor and non-blocking.
Recommended follow-ups (not blocking merge):
- Extract
"resumed"sentinel to a shared constant - Add unit tests for
ParseDirectCliModelProbeOutput
…Ween#474) Co-authored-by: Shane Neuville <shneuvil@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No description provided.