Ask Copilot CLI for the list of models instead of hardcoding it by vitek-karas · Pull Request #474 · PureWeen/PolyPilot

vitek-karas · 2026-04-02T19:07:23Z

No description provided.

PureWeen · 2026-04-02T20:03:23Z

PR #474 — R1 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +335/−32 | Commits: 1
CI: ⚠️ No checks reported on branch
Prior reviews: None (R1)
Models: Claude Sonnet 4.6, GPT-5.3-Codex, direct analysis (Opus timed out)

Consensus Findings

1. 🟡 MODERATE — Unbounded recursion in `ParseDirectCliModelProbeOutput`

File: PolyPilot/Services/CopilotService.Utilities.cs — ParseDirectCliModelProbeOutput
Flagged by: Sonnet ✓ · Codex ✓ · Direct ✓

var nested = ParseDirectCliModelProbeOutput(content.GetString());

The method calls itself when it finds data.content with no depth guard. Since this parses LLM output (unpredictable), pathological nesting like {"data":{"content":"{\"data\":{\"content\":\"...\"}}"}} would recurse until StackOverflowException, crashing the process. Add a depth parameter capped at ~5.
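
A depth guard of the kind suggested might look roughly like the sketch below. `ParseProbeOutput` is a hypothetical stand-in, not the PR's method: it only handles a bare JSON array and the `data.content` wrapper, and it is written as top-level statements so it can be pasted into a scratch `Program.cs`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for ParseDirectCliModelProbeOutput. The real parser
// also handles markdown fences and other shapes that are not shown here.
static List<string> ParseProbeOutput(string? text, int maxDepth = 5)
{
    var models = new List<string>();
    if (maxDepth <= 0 || string.IsNullOrWhiteSpace(text))
        return models; // depth budget exhausted, or nothing to parse

    try
    {
        using var doc = JsonDocument.Parse(text);
        var root = doc.RootElement;

        if (root.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in root.EnumerateArray())
                if (item.ValueKind == JsonValueKind.String && item.GetString() is { Length: > 0 } id)
                    models.Add(id);
        }
        else if (root.ValueKind == JsonValueKind.Object
                 && root.TryGetProperty("data", out var data)
                 && data.ValueKind == JsonValueKind.Object
                 && data.TryGetProperty("content", out var content)
                 && content.ValueKind == JsonValueKind.String)
        {
            // Unwrap one level and spend one unit of the depth budget.
            models.AddRange(ParseProbeOutput(content.GetString(), maxDepth - 1));
        }
    }
    catch (JsonException)
    {
        // Not JSON: report "no models found" instead of failing.
    }
    return models;
}

string wrapped = "{\"data\":{\"content\":\"[\\\"model-a\\\"]\"}}";
Console.WriteLine(ParseProbeOutput(wrapped).Count);              // 1
Console.WriteLine(ParseProbeOutput(wrapped, maxDepth: 1).Count); // 0
```

With the budget threaded through the recursive call, arbitrarily deep wrapping yields an empty list rather than a stack overflow.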

2. 🟡 MODERATE — FallbackModels safety net silently removed

Files: Dashboard.razor:3353, SessionSidebar.razor:1439
Flagged by: Sonnet ✓ · Direct ✓

Before:

CopilotService.AvailableModels.Count > 0 ? CopilotService.AvailableModels : ModelHelper.FallbackModels;

After:

ModelHelper.BuildSelectionList(CopilotService.AvailableModels, CopilotService.DefaultModel);

When AvailableModels is empty (first launch, before fetch completes — up to 45s — or total fetch failure), the picker shows only 1 model (DefaultModel) instead of the previous ~10 FallbackModels. This degrades new-session UX during the common initial-load window. Consider passing ModelHelper.FallbackModels as supplemental to BuildSelectionList when discovery is empty.

3. 🟡 MODERATE — LLM probe can return hallucinated/invalid model IDs

File: CopilotService.Utilities.cs — GetDirectCliAvailableModelsAsync
Flagged by: Sonnet ✓ · Codex ✓

The probe asks an LLM to "return the list of model IDs as a JSON array." The model may return stale information or fabricate plausible but nonexistent model IDs. These are surfaced directly in the picker with no validation against known models. When direct CLI output is non-empty, it is treated as primary over SDK metadata, so hallucinated IDs take precedence over authoritative SDK data.

This is partly an architectural concern, but worth acknowledging. Consider: (a) validating returned IDs against a known-slug list, or (b) always preferring SDK results when available since those are authoritative.
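
A lighter variant of option (a) is a shape check that rejects anything that does not look like a slug. The sketch below assumes a slug is lowercase letters, digits, dots, and hyphens; the pattern and the sample IDs are illustrative assumptions, and a shape check cannot tell a well-formed fabricated ID from a real one.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed slug shape: starts with a lowercase letter or digit, followed by
// lowercase letters, digits, dots, or hyphens. \z (not $) is used so that a
// trailing newline is rejected as well.
var slugPattern = new Regex(@"^[a-z0-9][a-z0-9.\-]*\z", RegexOptions.CultureInvariant);

bool IsValidModelSlug(string? candidate) =>
    !string.IsNullOrEmpty(candidate) && slugPattern.IsMatch(candidate);

// Made-up probe output: one plausible slug and three malformed entries.
string[] probeOutput = { "model-a-4.6", "Model B (preview)", "model_c", "" };
string[] accepted = probeOutput.Where(IsValidModelSlug).ToArray();

Console.WriteLine(string.Join(", ", accepted)); // model-a-4.6
```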

4. 🟡 MODERATE — `outputTask`/`errorTask` abandoned on timeout; CTS edge case

File: CopilotService.Utilities.cs — GetDirectCliAvailableModelsAsync
Flagged by: Sonnet ✓ · Direct ✓

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(45));
var outputTask = process.StandardOutput.ReadToEndAsync(cts.Token);
var errorTask  = process.StandardError.ReadToEndAsync(cts.Token);
try { await process.WaitForExitAsync(cts.Token); }
catch (OperationCanceledException ex) {
    TryTerminateProcess(process);
    throw;  // outputTask/errorTask abandoned, unobserved faulted tasks
}
var output = await outputTask;  // CTS may fire between WaitForExit and here

Timeout path: outputTask and errorTask are abandoned and never observed; if either faults, the exception can surface later through TaskScheduler.UnobservedTaskException.
Normal-exit edge case: If the process exits at t≈44.9s and the CTS fires before await outputTask, the successful result is discarded because the stream reads share the timeout token. Consider separating the process timeout from the stream-read CTS, or draining streams before awaiting exit.
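
One way to separate the two concerns is sketched below: only the wait for exit is time-boxed, and both stream reads are observed on every path. `RunWithTimeoutAsync` is a hypothetical helper rather than the PR's code, and it inlines the kill step instead of calling `TryTerminateProcess`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs a command, returns its stdout, and throws
// OperationCanceledException if the process outlives the timeout.
static async Task<string> RunWithTimeoutAsync(string fileName, string arguments, TimeSpan timeout)
{
    var startInfo = new ProcessStartInfo(fileName, arguments)
    {
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    using var process = Process.Start(startInfo)
        ?? throw new InvalidOperationException($"Could not start {fileName}.");

    // The reads take no token: they complete when the process closes its pipes.
    Task<string> outputTask = process.StandardOutput.ReadToEndAsync();
    Task<string> errorTask = process.StandardError.ReadToEndAsync();

    using var cts = new CancellationTokenSource(timeout);
    try
    {
        await process.WaitForExitAsync(cts.Token); // only the wait is time-boxed
    }
    catch (OperationCanceledException)
    {
        // Best-effort kill; the process may already have exited.
        try { process.Kill(entireProcessTree: true); } catch (Exception) { }
        // Observe both reads before rethrowing so neither is left abandoned.
        try { await Task.WhenAll(outputTask, errorTask); } catch (Exception) { }
        throw;
    }

    await errorTask;         // drained, but ignored in this sketch
    return await outputTask; // a late timer can no longer discard this result
}

string version = await RunWithTimeoutAsync("dotnet", "--version", TimeSpan.FromSeconds(30));
Console.WriteLine(version.Trim());
```

The usage line assumes `dotnet` is on the PATH. Any command would do.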

5. 🟢 MINOR — No tests for `ParseDirectCliModelProbeOutput` or coalescing logic

Files: PolyPilot.Tests/ModelSelectionTests.cs
Flagged by: Codex ✓ · Direct ✓

The new BuildSelectionList has 3 good tests. However, the highest-risk new code (ParseDirectCliModelProbeOutput, GetJsonArrayCandidates, and the coalescing fetch loop) has zero test coverage. The parser handles complex LLM output formats (markdown fences, nested JSON, code-block extraction) and contains the recursion bug from finding 1; unit tests would catch problems in both areas.

Test Coverage Assessment

✅ BuildSelectionList — 3 new tests covering empty discovery, dedup, ordering
✅ StateChangeCoalescerTests — flaky test properly fixed with TCS pattern
❌ ParseDirectCliModelProbeOutput — no tests (handles LLM output parsing, recursive unwrapping, code fence extraction)
❌ GetJsonArrayCandidates — no tests
❌ RunAvailableModelsFetchLoopAsync / coalescing — no tests
❌ GetDirectCliAvailableModelsAsync — no tests (spawns process, timeout handling)

What's Good

BuildSelectionList is well-designed — normalize + dedup + append pattern is clean and testable
Coalescing lock pattern prevents redundant fetches during reconnect bursts
NormalizeToSlug on restored selectedModel prevents stale display names from persisting
InvokeOnUI for OnStateChanged — correct thread safety pattern
Flaky SeparateBursts_FireSeparately test replaced with proper TCS-based signaling — excellent fix
Process cleanup via TryTerminateProcess with entireProcessTree: true

Summary

#	Sev	Finding	Models
1	🟡	Unbounded recursion in ParseDirectCliModelProbeOutput	Sonnet + Codex + Direct
2	🟡	FallbackModels safety net removed — single-model picker on failure	Sonnet + Direct
3	🟡	LLM probe can hallucinate model IDs, treated as primary over SDK	Sonnet + Codex
4	🟡	outputTask/errorTask abandoned on timeout; CTS edge case	Sonnet + Direct
5	🟢	No tests for ParseDirectCliModelProbeOutput or coalescing logic	Codex + Direct

Recommended action: ⚠️ Request changes

Must fix: #1 (add recursion depth guard; trivial fix, prevents crash)
Should fix: #2 (restore FallbackModels as supplemental for empty-discovery case)
Consider: #3 (prefer SDK over CLI when both available), #4 (separate timeout CTS), #5 (add parser tests)

- Add maxDepth guard (5) to ParseDirectCliModelProbeOutput to prevent stack overflow from pathological nested LLM output
- Restore FallbackModels safety net: when discovered models list has only 1 entry (just DefaultModel), fall back to the curated list
- Fix CTS timeout race: read stdout/stderr without cancellation token so a successful process exit isn't discarded by a late CTS firing
- Add IsValidModelSlug() regex filter to discard hallucinated model IDs from LLM probe output (only accepts lowercase alphanumeric with hyphens and dots)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-02T20:43:19Z

PR #474 — R2 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +361/−45 | Commits: 2 (original + R1 fix commit)
CI: ⚠️ No checks reported on branch
Prior reviews: R1 posted (5 findings)
Models: Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.3-Codex

R1 Findings Status

#	R1 Finding	Status	Verification
1	🟡 Unbounded recursion in `ParseDirectCliModelProbeOutput`	✅ FIXED	`maxDepth=5` param added, guarded at `maxDepth <= 0` (3/3 models confirmed)
2	🟡 FallbackModels safety net removed	⚠️ PARTIALLY FIXED	`Count > 1` guard added but has edge case — see new finding #1
3	🟡 LLM probe can hallucinate model IDs	✅ FIXED	`IsValidModelSlug` regex + `.Where()` filter applied (3/3 confirmed)
4	🟡 CTS timeout race on stdout/stderr	✅ FIXED	`ReadToEndAsync()` without CTS; only `WaitForExitAsync` uses CTS (3/3 confirmed)
5	🟢 No tests for `ParseDirectCliModelProbeOutput`	❌ STILL OUTSTANDING	0 tests for JSON parser, markdown fence extraction, slug filter, maxDepth (3/3 noted)

New Consensus Findings

1. 🟡 MODERATE — `Count > 1` heuristic incorrectly bypasses FallbackModels

Files: SessionSidebar.razor:1441-1443, Dashboard.razor:3355-3357
Flagged by: Opus ✓ · Sonnet ✓ · Codex ✓

ModelHelper.BuildSelectionList(CopilotService.AvailableModels, selectedModel, CopilotService.DefaultModel)
    is { Count: > 1 } list ? list
    : ModelHelper.BuildSelectionList(ModelHelper.FallbackModels, selectedModel, CopilotService.DefaultModel);

Bug: When AvailableModels is empty but selectedModel ≠ DefaultModel (both non-null), BuildSelectionList returns [selectedModel, DefaultModel] — count 2. The Count > 1 check passes, so FallbackModels is never used. Users with a previously-saved non-default model see a 2-item picker instead of the full ~20-item fallback.

Fix: Replace the heuristic with an explicit check:

CopilotService.AvailableModels.Count > 0
    ? ModelHelper.BuildSelectionList(CopilotService.AvailableModels, selectedModel, CopilotService.DefaultModel)
    : ModelHelper.BuildSelectionList(ModelHelper.FallbackModels, selectedModel, CopilotService.DefaultModel);
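
The difference between the two checks can be reproduced in isolation. The sketch below uses a hypothetical, simplified `BuildSelectionList` (discovered models first, then any required model not already present) and made-up model slugs; the real method's normalization and ordering may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplification of ModelHelper.BuildSelectionList: discovered
// models first, then any required model not already present (case-insensitive).
static List<string> BuildSelectionList(IEnumerable<string> discovered, params string?[] required)
{
    var list = new List<string>();
    foreach (var model in discovered.Concat(required.OfType<string>()))
    {
        if (model.Length > 0 && !list.Contains(model, StringComparer.OrdinalIgnoreCase))
            list.Add(model);
    }
    return list;
}

// Heuristic under review: treats "more than one entry" as proof of discovery.
static List<string> PickByCount(IReadOnlyList<string> available, IReadOnlyList<string> fallback,
                                string? selected, string? defaultModel)
{
    var list = BuildSelectionList(available, selected, defaultModel);
    return list.Count > 1 ? list : BuildSelectionList(fallback, selected, defaultModel);
}

// Suggested fix: ask the discovery result directly.
static List<string> PickByDiscovery(IReadOnlyList<string> available, IReadOnlyList<string> fallback,
                                    string? selected, string? defaultModel)
    => available.Count > 0
        ? BuildSelectionList(available, selected, defaultModel)
        : BuildSelectionList(fallback, selected, defaultModel);

string[] none = Array.Empty<string>();
string[] fallbackModels = { "model-a", "model-b", "model-c", "model-d" };

// Discovery is empty, and the saved selection differs from the default.
Console.WriteLine(PickByCount(none, fallbackModels, "model-c", "model-a").Count);     // 2
Console.WriteLine(PickByDiscovery(none, fallbackModels, "model-c", "model-a").Count); // 4
```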

2. 🟢 MINOR — Unobserved task exceptions from fire-and-forget fetch

Files: CopilotService.cs:1276,1369,1468, CopilotService.Utilities.cs timeout path
Flagged by: Opus ✓ · Sonnet ✓

FetchAvailableModelsAsync() is started fire-and-forget (_ = FetchAvailableModelsAsync()) at 3 call sites. If an unexpected exception escapes RunAvailableModelsFetchLoopAsync (e.g., from InvokeOnUI, SequenceEqual), it becomes an unobserved faulted task. Additionally, on the timeout path, outputTask/errorTask are abandoned while process.Dispose() races to close the streams.

Consider wrapping the loop body in a top-level try/catch, and awaiting stream tasks after TryTerminateProcess before rethrowing.
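
The top-level try/catch can also live in a small wrapper so that every fire-and-forget call site gets it. This is only a sketch: `RunSafelyAsync` and `FailingFetchAsync` are hypothetical names, and the error callback stands in for whatever logging the service uses.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper: the returned task never faults, because any
// exception from the work is routed to onError instead.
static async Task RunSafelyAsync(Func<Task> work, Action<Exception> onError)
{
    try
    {
        await work();
    }
    catch (Exception ex)
    {
        onError(ex);
    }
}

// Stand-in for a fetch that fails unexpectedly.
static async Task FailingFetchAsync()
{
    await Task.Yield();
    throw new InvalidOperationException("probe failed");
}

// At a real call site this would be discarded with `_ =`. It is awaited
// here only so that the sketch runs to completion before exiting.
Task pending = RunSafelyAsync(
    FailingFetchAsync,
    ex => Console.Error.WriteLine($"Model fetch failed: {ex.Message}"));
await pending;
```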

3. 🟢 MINOR — No tests for `ParseDirectCliModelProbeOutput` (carried from R1)

Flagged by: Opus ✓ · Sonnet ✓ · Codex ✓

The method is internal static — ideal for unit testing. It handles: valid JSON array, nested data.content unwrapping, markdown fences, hallucinated ID filtering, maxDepth exhaustion, empty/garbage input. All untested. Tests for BuildSelectionList exist and are good, but the highest-risk parser code has zero coverage.

What's Good

All 4 core R1 fixes are solid and well-implemented
IsValidModelSlug regex is clean and correctly anchored
CTS fix comment explains the rationale clearly
maxDepth default of 5 is reasonable
StateChangeCoalescerTests flaky test fix (TCS-based) is excellent
InvokeOnUI for OnStateChanged — correct thread marshaling
Coalescing lock pattern prevents redundant concurrent fetches
3065 tests pass ✅

Summary

#	Sev	Finding	Models	Status
1	🟡	`Count > 1` heuristic bypasses FallbackModels	Opus + Sonnet + Codex	NEW
2	🟢	Unobserved task exceptions (fire-and-forget + timeout)	Opus + Sonnet	NEW
3	🟢	No tests for ParseDirectCliModelProbeOutput	Opus + Sonnet + Codex	R1 carry

Recommended action: ⚠️ Request changes — Fix #1 (replace Count > 1 with AvailableModels.Count > 0 check). The other items are minor improvements that can be done in a follow-up.

…heuristic

The Count > 1 check incorrectly bypassed FallbackModels when selectedModel != DefaultModel (producing a 2-item list from required models alone). Also adds a fallback guard to SessionListItem.GetSelectableModels(), which had none.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-02T21:11:16Z

PR #474 — R3 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +363/−45 | Commits: 3 (original + 2 review fix commits)
CI: ⚠️ No checks reported on branch
Tests: 3065 pass ✅
Models: Claude Opus 4.6 ✓ · Claude Sonnet 4.6 ✓ · GPT-5.3-Codex ✓

R2 Findings Status — All Fixed ✅

#	R2 Finding	Status	Verification
1	🟡 `Count > 1` heuristic bypasses FallbackModels	✅ FIXED	Now `AvailableModels.Count > 0` in all 3 sites (3/3 confirmed)
2	🟡 SessionListItem.GetSelectableModels() missing fallback	✅ FIXED	Same `Count > 0` guard added (3/3 confirmed)

R1 Findings Status — All Fixed ✅

#	R1 Finding	Status
1	🟡 Unbounded recursion in ParseDirectCliModelProbeOutput	✅ FIXED (maxDepth=5)
2	🟡 FallbackModels safety net removed	✅ FIXED (explicit Count > 0 guard)
3	🟡 LLM probe can hallucinate model IDs	✅ FIXED (IsValidModelSlug regex)
4	🟡 CTS timeout race on stdout/stderr	✅ FIXED (ReadToEndAsync without CTS)

Remaining Minor Items (no consensus for blocking)

#	Sev	Finding	Models
1	🟢	Unobserved task exceptions from `_ = FetchAvailableModelsAsync()` fire-and-forget	Opus + Sonnet + Codex
2	🟢	No tests for `ParseDirectCliModelProbeOutput` / `GetJsonArrayCandidates` / `IsValidModelSlug`	Opus + Sonnet + Codex

Both are acknowledged minor gaps — neither blocks merge. The parser test gap is the more impactful one given the method's complexity, but can be addressed in a follow-up.

What's Good

Clean BuildSelectionList API — normalize + dedup + append pattern
Coalescing lock pattern prevents redundant concurrent fetches
maxDepth recursion guard, IsValidModelSlug regex filter
CTS fix with clear explanatory comment
Consistent fallback pattern across all 3 UI components
Excellent flaky test fix (TCS-based StateChangeCoalescerTests)
Atomic reference swap for _localAvailableModels — no torn reads

Recommended action: ✅ Approve

All 6 findings from R1+R2 are resolved. The 2 remaining 🟢 items (fire-and-forget exception handling + parser tests) are acknowledged minor improvements suitable for follow-up. 3/3 models agree the PR is ready to merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-02T21:59:01Z

PR #474 — R4 Review (post author commit)

Reviewer: Multi-model consensus (Claude Opus 4.6 + Claude Sonnet 4.6 + GPT-5.3-Codex)
Diff: 677 lines across 10 files (4 commits total)
CI: No checks configured ⚠️

New Commit Reviewed

Commit 4: "Preserve selected model during usage updates" (by PureWeen)

Adds ModelHelper.ShouldAcceptObservedModel() — guards against backend usage events overwriting explicit user model selection
Applied at SessionUsageInfoEvent and AssistantUsageEvent handlers in Events.cs
3 new tests in ModelSelectionTests.cs

R1/R2/R3 Fix Status

Finding	Status
R1: ModelDisplayNames never populated	✅ Dead code — correctly skipped
R1: Unbounded recursion in ParseDirectCliModelProbeOutput	✅ Fixed (maxDepth=5)
R1: FallbackModels safety net removed	✅ Fixed (Count > 0 pattern)
R1: CTS timeout race	✅ Fixed (ReadToEndAsync without CTS)
R1: LLM probe output not validated	✅ Fixed (IsValidModelSlug regex)
R2: Count > 1 heuristic wrong	✅ Fixed (Count > 0 in all 3 UI files)

Consensus Findings (2+ of 3 models)

🟢 MINOR — `"resumed"` magic string duplicated without shared constant

Flagged by: Opus, Sonnet | Files: ModelHelper.cs, CopilotService.Events.cs:741, SessionManager.cs:68

ShouldAcceptObservedModel checks normalizedCurrent == "resumed" (after NormalizeToSlug), while Events.cs:741 checks the raw value state.Info.Model == "resumed". Both work today because "resumed" is always set as a lowercase literal, but if the sentinel definition ever changed, only one path would break silently. Consider extracting a const string ResumedModelPlaceholder = "resumed" to ModelHelper.

Non-blocking — the code is correct as-is. This is a maintainability suggestion.

Below Consensus Threshold (1/3 — noted for awareness)

Finding	Model	Assessment
`"resumed"` could leak into model picker dropdown for old sessions	Sonnet	Edge case for legacy data only; low practical risk
Race in check-then-set on `state.Info.Model`	Codex	Explicitly dismissed by Sonnet — PR actually narrows the existing race window
Unobserved fire-and-forget `_ = FetchAvailableModelsAsync()`	Opus (R3 carry)	Internal try-catches make silent failure unlikely
No unit tests for `ParseDirectCliModelProbeOutput`	Opus (R3 carry)	Good follow-up item, not blocking

New Code Assessment

ShouldAcceptObservedModel logic — All 3 models confirmed the decision matrix is correct:

Empty/null current → accept (no user choice yet) ✅
"resumed" placeholder → accept (legacy sentinel) ✅
Same model after normalization → accept (no conflict) ✅
Different model → reject (preserve user choice) ✅
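
The four-row decision matrix above can be written out as a short sketch. The signature, the `Normalize` stand-in for `NormalizeToSlug`, and the model slugs are assumptions for illustration; the shared constant reflects the review's maintainability suggestion, not the merged code.

```csharp
#nullable enable
using System;

// Single definition of the legacy placeholder, as the review suggests.
const string ResumedModelPlaceholder = "resumed";

// Hypothetical stand-in for ModelHelper.NormalizeToSlug (trim + lowercase only).
string Normalize(string? model) => (model ?? string.Empty).Trim().ToLowerInvariant();

// Assumed signature: returns true when a backend-reported model may replace
// the model currently recorded for the session.
bool ShouldAcceptObservedModel(string? currentModel, string? observedModel)
{
    string current = Normalize(currentModel);
    if (current.Length == 0) return true;                // no user choice yet
    if (current == ResumedModelPlaceholder) return true; // legacy sentinel
    return current == Normalize(observedModel);          // same: accept, different: reject
}

Console.WriteLine(ShouldAcceptObservedModel(null, "model-a"));      // True
Console.WriteLine(ShouldAcceptObservedModel("resumed", "model-a")); // True
Console.WriteLine(ShouldAcceptObservedModel("Model-A", "model-a")); // True
Console.WriteLine(ShouldAcceptObservedModel("model-a", "model-b")); // False
```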

Test coverage — 3 new tests cover the main paths (empty, resumed, same-model, different-model). NormalizeToSlug(null) path is untested but trivially correct.

Verdict

✅ Approve

The PR is in good shape after 4 commits and 3 rounds of fixes. The new ShouldAcceptObservedModel commit is well-designed and correctly solves the model-overwrite-on-usage-events problem. All prior R1/R2 findings are verified fixed. The one consensus finding (magic string duplication) is 🟢 minor and non-blocking.

Recommended follow-ups (not blocking merge):

Extract "resumed" sentinel to a shared constant
Add unit tests for ParseDirectCliModelProbeOutput

…Ween#474) Co-authored-by: Shane Neuville <shneuvil@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vitek-karas and others added 2 commits April 2, 2026 15:34

Ask Copilot CLI for the list of models instead of hardcoding it.

d226473

PureWeen force-pushed the fix/all-models branch from 84279bb to 931a17c Compare April 2, 2026 20:37

PureWeen marked this pull request as draft April 2, 2026 21:21

PureWeen marked this pull request as ready for review April 2, 2026 21:43

Preserve selected model during usage updates

dd2f704

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen merged commit 926265c into PureWeen:main Apr 2, 2026

vitek-karas deleted the fix/all-models branch April 3, 2026 12:33
