
Ask Copilot CLI for the list of models instead of hardcoding it#474

Merged
PureWeen merged 4 commits intoPureWeen:mainfrom
vitek-karas:fix/all-models
Apr 2, 2026

@vitek-karas
Contributor

No description provided.

@PureWeen
Owner

PureWeen commented Apr 2, 2026

PR #474 — R1 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +335/−32 | Commits: 1
CI: ⚠️ No checks reported on branch
Prior reviews: None (R1)
Models: Claude Sonnet 4.6, GPT-5.3-Codex, direct analysis (Opus timed out)


Consensus Findings

1. 🟡 MODERATE — Unbounded recursion in ParseDirectCliModelProbeOutput

File: PolyPilot/Services/CopilotService.Utilities.cs → ParseDirectCliModelProbeOutput
Flagged by: Sonnet ✓ · Codex ✓ · Direct ✓

var nested = ParseDirectCliModelProbeOutput(content.GetString());

The method calls itself when it finds data.content with no depth guard. Since this parses LLM output (unpredictable), pathological nesting like {"data":{"content":"{\"data\":{\"content\":\"...\"}}"}} would recurse until StackOverflowException, crashing the process. Add a depth parameter capped at ~5.
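A minimal sketch of the suggested depth guard, assuming the parser only needs to handle a plain JSON array or a `data.content` envelope. `ProbeParserSketch.Parse` is a hypothetical stand-in, not the PR's `ParseDirectCliModelProbeOutput`, and the markdown-fence handling is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ProbeParserSketch
{
    // Hypothetical stand-in for the PR's parser; only the depth guard is the point here.
    public static List<string> Parse(string? text, int maxDepth = 5)
    {
        var result = new List<string>();

        // Stop unwrapping once the depth budget is spent or there is nothing to parse.
        if (maxDepth <= 0 || string.IsNullOrWhiteSpace(text)) return result;

        try
        {
            using var doc = JsonDocument.Parse(text);
            var root = doc.RootElement;

            if (root.ValueKind == JsonValueKind.Array)
            {
                // Plain array of model IDs: keep the non-empty strings.
                foreach (var item in root.EnumerateArray())
                    if (item.ValueKind == JsonValueKind.String && item.GetString() is { Length: > 0 } id)
                        result.Add(id);
            }
            else if (root.ValueKind == JsonValueKind.Object
                     && root.TryGetProperty("data", out var data)
                     && data.ValueKind == JsonValueKind.Object
                     && data.TryGetProperty("content", out var content)
                     && content.ValueKind == JsonValueKind.String)
            {
                // Each nested envelope costs one unit of depth, so recursion is bounded.
                result.AddRange(Parse(content.GetString(), maxDepth - 1));
            }
        }
        catch (JsonException)
        {
            // Not valid JSON: report "no models found" instead of throwing.
        }

        return result;
    }
}
```

With a budget of 5, input nested more deeply than that yields an empty list rather than growing the call stack.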

2. 🟡 MODERATE — FallbackModels safety net silently removed

Files: Dashboard.razor:3353, SessionSidebar.razor:1439
Flagged by: Sonnet ✓ · Direct ✓

Before:

CopilotService.AvailableModels.Count > 0 ? CopilotService.AvailableModels : ModelHelper.FallbackModels;

After:

ModelHelper.BuildSelectionList(CopilotService.AvailableModels, CopilotService.DefaultModel);

When AvailableModels is empty (first launch, before fetch completes — up to 45s — or total fetch failure), the picker shows only 1 model (DefaultModel) instead of the previous ~10 FallbackModels. This degrades new-session UX during the common initial-load window. Consider passing ModelHelper.FallbackModels as supplemental to BuildSelectionList when discovery is empty.

3. 🟡 MODERATE — LLM probe can return hallucinated/invalid model IDs

File: CopilotService.Utilities.cs → GetDirectCliAvailableModelsAsync
Flagged by: Sonnet ✓ · Codex ✓

The probe asks an LLM to "return the list of model IDs as a JSON array." The model may hallucinate IDs, return stale info, or fabricate plausible-but-nonexistent model names. These are surfaced directly in the picker with no validation against known models. When direct CLI output is non-empty, it's treated as primary over SDK metadata — which means hallucinated IDs take precedence over authoritative SDK data.

This is partly an architectural concern, but worth acknowledging. Consider: (a) validating returned IDs against a known-slug list, or (b) always preferring SDK results when available since those are authoritative.
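As a rough illustration of option (a), a purely syntactic filter could look like the sketch below. The pattern (lowercase alphanumeric segments joined by single hyphens or dots) and the name `LooksLikeModelSlug` are assumptions for illustration. A shape check like this rejects display names and prose, but it cannot tell a well-formed hallucinated ID from a real one, so it complements rather than replaces preferring SDK data.

```csharp
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Assumed slug shape: lowercase alphanumeric segments separated by '-' or '.'.
    // \z (not $) is used so a trailing newline does not slip through.
    private static readonly Regex Slug =
        new(@"^[a-z0-9]+(?:[.-][a-z0-9]+)*\z", RegexOptions.CultureInvariant);

    // Hypothetical helper name; returns false for null, empty, or non-slug text.
    public static bool LooksLikeModelSlug(string? id) =>
        !string.IsNullOrEmpty(id) && Slug.IsMatch(id);
}
```

Filtering probe output would then be a matter of `ids.Where(SlugSketch.LooksLikeModelSlug)` before the list reaches the picker.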

4. 🟡 MODERATE — outputTask/errorTask abandoned on timeout; CTS edge case

File: CopilotService.Utilities.cs → GetDirectCliAvailableModelsAsync
Flagged by: Sonnet ✓ · Direct ✓

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(45));
var outputTask = process.StandardOutput.ReadToEndAsync(cts.Token);
var errorTask  = process.StandardError.ReadToEndAsync(cts.Token);
try { await process.WaitForExitAsync(cts.Token); }
catch (OperationCanceledException ex) {
    TryTerminateProcess(process);
    throw;  // outputTask/errorTask abandoned, unobserved faulted tasks
}
var output = await outputTask;  // CTS may fire between WaitForExit and here

  • Timeout path: outputTask and errorTask are abandoned as unobserved faulted tasks (risk for UnobservedTaskException).
  • Normal-exit edge case: If the process exits at t≈44.9s and the CTS fires before await outputTask, the successful result is discarded. Consider separating the process timeout from the stream-read CTS, or draining streams before awaiting exit.
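One way to address both points is sketched below, under the assumption that only the wait for exit should be subject to the timeout. `ProcessRunnerSketch.RunAsync` is a hypothetical helper, not the PR's code; it uses `Process.Kill(entireProcessTree: true)` where the PR uses its own `TryTerminateProcess`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ProcessRunnerSketch
{
    public static async Task<(int ExitCode, string Output, string Error)> RunAsync(
        string fileName, string arguments, TimeSpan timeout)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("Process failed to start.");

        // Start both reads up front and without a token: a late timeout can no longer
        // discard output from a process that already exited successfully.
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // Only the wait for exit is bounded by the timeout.
            await process.WaitForExitAsync(cts.Token);
        }
        catch (OperationCanceledException)
        {
            // Best-effort termination; the process may already be gone.
            try { process.Kill(entireProcessTree: true); } catch (Exception) { }

            // Killing the tree closes the pipes, so the reads finish. Awaiting them here
            // means they are observed before the process is disposed.
            try { await Task.WhenAll(outputTask, errorTask); } catch (Exception) { }
            throw;
        }

        return (process.ExitCode, await outputTask, await errorTask);
    }
}
```

The trade-off is that the stream reads themselves are unbounded; this sketch relies on killing the process tree to close the pipes on the timeout path.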

5. 🟢 MINOR — No tests for ParseDirectCliModelProbeOutput or coalescing logic

Files: PolyPilot.Tests/ModelSelectionTests.cs
Flagged by: Codex ✓ · Direct ✓

The new BuildSelectionList has 3 good tests. However, the highest-risk new code (ParseDirectCliModelProbeOutput, GetJsonArrayCandidates, and the coalescing fetch loop) has zero test coverage. The parser handles complex LLM output formats (markdown fences, nested JSON, code-block extraction), and unit tests would have caught issues such as the recursion bug (#1).


Test Coverage Assessment

  • BuildSelectionList — 3 new tests covering empty discovery, dedup, ordering
  • StateChangeCoalescerTests — flaky test properly fixed with TCS pattern
  • ParseDirectCliModelProbeOutput — no tests (handles LLM output parsing, recursive unwrapping, code fence extraction)
  • GetJsonArrayCandidates — no tests
  • RunAvailableModelsFetchLoopAsync / coalescing — no tests
  • GetDirectCliAvailableModelsAsync — no tests (spawns process, timeout handling)

What's Good

  • BuildSelectionList is well-designed — normalize + dedup + append pattern is clean and testable
  • Coalescing lock pattern prevents redundant fetches during reconnect bursts
  • NormalizeToSlug on restored selectedModel prevents stale display names from persisting
  • InvokeOnUI for OnStateChanged — correct thread safety pattern
  • Flaky SeparateBursts_FireSeparately test replaced with proper TCS-based signaling — excellent fix
  • Process cleanup via TryTerminateProcess with entireProcessTree: true

Summary

| # | Sev | Finding | Models |
|---|-----|---------|--------|
| 1 | 🟡 | Unbounded recursion in ParseDirectCliModelProbeOutput | Sonnet + Codex + Direct |
| 2 | 🟡 | FallbackModels safety net removed — single-model picker on failure | Sonnet + Direct |
| 3 | 🟡 | LLM probe can hallucinate model IDs, treated as primary over SDK | Sonnet + Codex |
| 4 | 🟡 | outputTask/errorTask abandoned on timeout; CTS edge case | Sonnet + Direct |
| 5 | 🟢 | No tests for ParseDirectCliModelProbeOutput or coalescing logic | Codex + Direct |

Recommended action: ⚠️ Request changes

vitek-karas and others added 2 commits April 2, 2026 15:34
- Add maxDepth guard (5) to ParseDirectCliModelProbeOutput to prevent
  stack overflow from pathological nested LLM output
- Restore FallbackModels safety net: when discovered models list has
  only 1 entry (just DefaultModel), fall back to the curated list
- Fix CTS timeout race: read stdout/stderr without cancellation token
  so a successful process exit isn't discarded by a late CTS firing
- Add IsValidModelSlug() regex filter to discard hallucinated model IDs
  from LLM probe output (only accepts lowercase alphanumeric with
  hyphens and dots)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner

PureWeen commented Apr 2, 2026

PR #474 — R2 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +361/−45 | Commits: 2 (original + R1 fix commit)
CI: ⚠️ No checks reported on branch
Prior reviews: R1 posted (5 findings)
Models: Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.3-Codex


R1 Findings Status

| # | R1 Finding | Status | Verification |
|---|------------|--------|--------------|
| 1 | 🟡 Unbounded recursion in ParseDirectCliModelProbeOutput | FIXED | maxDepth=5 param added, guarded at maxDepth <= 0 (3/3 models confirmed) |
| 2 | 🟡 FallbackModels safety net removed | ⚠️ PARTIALLY FIXED | Count > 1 guard added but has edge case — see new finding #1 |
| 3 | 🟡 LLM probe can hallucinate model IDs | FIXED | IsValidModelSlug regex + .Where() filter applied (3/3 confirmed) |
| 4 | 🟡 CTS timeout race on stdout/stderr | FIXED | ReadToEndAsync() without CTS; only WaitForExitAsync uses CTS (3/3 confirmed) |
| 5 | 🟢 No tests for ParseDirectCliModelProbeOutput | STILL OUTSTANDING | 0 tests for JSON parser, markdown fence extraction, slug filter, maxDepth (3/3 noted) |

New Consensus Findings

1. 🟡 MODERATE — Count > 1 heuristic incorrectly bypasses FallbackModels

Files: SessionSidebar.razor:1441-1443, Dashboard.razor:3355-3357
Flagged by: Opus ✓ · Sonnet ✓ · Codex ✓

ModelHelper.BuildSelectionList(CopilotService.AvailableModels, selectedModel, CopilotService.DefaultModel)
    is { Count: > 1 } list ? list
    : ModelHelper.BuildSelectionList(ModelHelper.FallbackModels, selectedModel, CopilotService.DefaultModel);

Bug: When AvailableModels is empty but selectedModel ≠ DefaultModel (both non-null), BuildSelectionList returns [selectedModel, DefaultModel] — count 2. The Count > 1 check passes, so FallbackModels is never used. Users with a previously-saved non-default model see a 2-item picker instead of the full ~20-item fallback.

Fix: Replace the heuristic with an explicit check:

CopilotService.AvailableModels.Count > 0
    ? ModelHelper.BuildSelectionList(CopilotService.AvailableModels, selectedModel, CopilotService.DefaultModel)
    : ModelHelper.BuildSelectionList(ModelHelper.FallbackModels, selectedModel, CopilotService.DefaultModel);
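The edge case can be reproduced with a simplified model of the helper. In the sketch below, `PickerSketch.BuildSelectionList` is a hypothetical approximation (discovered models first, then any required models not already present, with no slug normalization), so it illustrates the counting problem rather than the real `ModelHelper` behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PickerSketch
{
    // Hypothetical approximation: discovered models, then required ones not already listed.
    public static List<string> BuildSelectionList(IEnumerable<string> models, params string?[] required)
    {
        var list = new List<string>();
        var requiredPresent = required.Where(r => !string.IsNullOrEmpty(r)).Select(r => r!);
        foreach (var m in models.Concat(requiredPresent))
            if (!list.Contains(m, StringComparer.OrdinalIgnoreCase))
                list.Add(m);
        return list;
    }

    // Decide on the source list first, based on whether discovery returned anything,
    // instead of inferring it from the size of the merged result.
    public static List<string> ForPicker(
        IReadOnlyList<string> available, IReadOnlyList<string> fallback,
        string? selectedModel, string? defaultModel)
        => BuildSelectionList(available.Count > 0 ? available : fallback, selectedModel, defaultModel);
}
```

With an empty discovery list and two distinct required models, `BuildSelectionList` alone already returns two items, which is why a `Count > 1` test on its result cannot detect that discovery failed.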

2. 🟢 MINOR — Unobserved task exceptions from fire-and-forget fetch

Files: CopilotService.cs:1276,1369,1468, CopilotService.Utilities.cs timeout path
Flagged by: Opus ✓ · Sonnet ✓

_ = FetchAvailableModelsAsync() is invoked fire-and-forget at 3 call sites. If an unexpected exception escapes RunAvailableModelsFetchLoopAsync (e.g., from InvokeOnUI, SequenceEqual), it becomes an unobserved faulted task. Additionally, on the timeout path, outputTask/errorTask are abandoned while process.Dispose() races to close the streams.

Consider wrapping the loop body in a top-level try/catch, and awaiting stream tasks after TryTerminateProcess before rethrowing.
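A small wrapper is one way to give fire-and-forget calls a top-level catch. `FireAndForgetSketch.RunSafelyAsync` and its `onError` callback are hypothetical names for illustration; the PR may handle this inside the fetch loop instead.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForgetSketch
{
    // Runs the work and routes any exception to onError, so the returned task
    // never faults and nothing is left unobserved when the caller discards it.
    public static async Task RunSafelyAsync(Func<Task> work, Action<Exception> onError)
    {
        try
        {
            await work();
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

A call site would then read `_ = FireAndForgetSketch.RunSafelyAsync(FetchAvailableModelsAsync, ex => Log(ex));`, where `Log` stands for whatever logging the application already has.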

3. 🟢 MINOR — No tests for ParseDirectCliModelProbeOutput (carried from R1)

Flagged by: Opus ✓ · Sonnet ✓ · Codex ✓

The method is internal static — ideal for unit testing. It handles: valid JSON array, nested data.content unwrapping, markdown fences, hallucinated ID filtering, maxDepth exhaustion, empty/garbage input. All untested. Tests for BuildSelectionList exist and are good, but the highest-risk parser code has zero coverage.


What's Good

  • All 4 core R1 fixes are solid and well-implemented
  • IsValidModelSlug regex is clean and correctly anchored
  • CTS fix comment explains the rationale clearly
  • maxDepth default of 5 is reasonable
  • StateChangeCoalescerTests flaky test fix (TCS-based) is excellent
  • InvokeOnUI for OnStateChanged — correct thread marshaling
  • Coalescing lock pattern prevents redundant concurrent fetches
  • 3065 tests pass ✅

Summary

| # | Sev | Finding | Models | Status |
|---|-----|---------|--------|--------|
| 1 | 🟡 | Count > 1 heuristic bypasses FallbackModels | Opus + Sonnet + Codex | NEW |
| 2 | 🟢 | Unobserved task exceptions (fire-and-forget + timeout) | Opus + Sonnet | NEW |
| 3 | 🟢 | No tests for ParseDirectCliModelProbeOutput | Opus + Sonnet + Codex | R1 carry |

Recommended action: ⚠️ Request changes — Fix #1 (replace Count > 1 with AvailableModels.Count > 0 check). The other items are minor improvements that can be done in a follow-up.

…heuristic

The Count > 1 check incorrectly bypassed FallbackModels when
selectedModel != DefaultModel (producing a 2-item list from
required models alone). Also adds fallback guard to
SessionListItem.GetSelectableModels() which had none.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner

PureWeen commented Apr 2, 2026

PR #474 — R3 Multi-Model Review

PR: Ask Copilot CLI for the list of models instead of hardcoding it
Author: vitek-karas | Branch: fix/all-models → main
Files: 10 changed, +363/−45 | Commits: 3 (original + 2 review fix commits)
CI: ⚠️ No checks reported on branch
Tests: 3065 pass ✅
Models: Claude Opus 4.6 ✓ · Claude Sonnet 4.6 ✓ · GPT-5.3-Codex ✓


R2 Findings Status — All Fixed ✅

| # | R2 Finding | Status | Verification |
|---|------------|--------|--------------|
| 1 | 🟡 Count > 1 heuristic bypasses FallbackModels | FIXED | Now AvailableModels.Count > 0 in all 3 sites (3/3 confirmed) |
| 2 | 🟡 SessionListItem.GetSelectableModels() missing fallback | FIXED | Same Count > 0 guard added (3/3 confirmed) |

R1 Findings Status — All Fixed ✅

| # | R1 Finding | Status |
|---|------------|--------|
| 1 | 🟡 Unbounded recursion in ParseDirectCliModelProbeOutput | ✅ FIXED (maxDepth=5) |
| 2 | 🟡 FallbackModels safety net removed | ✅ FIXED (explicit Count > 0 guard) |
| 3 | 🟡 LLM probe can hallucinate model IDs | ✅ FIXED (IsValidModelSlug regex) |
| 4 | 🟡 CTS timeout race on stdout/stderr | ✅ FIXED (ReadToEndAsync without CTS) |

Remaining Minor Items (no consensus for blocking)

| # | Sev | Finding | Models |
|---|-----|---------|--------|
| 1 | 🟢 | Unobserved task exceptions from _ = FetchAvailableModelsAsync() fire-and-forget | Opus + Sonnet + Codex |
| 2 | 🟢 | No tests for ParseDirectCliModelProbeOutput / GetJsonArrayCandidates / IsValidModelSlug | Opus + Sonnet + Codex |

Both are acknowledged minor gaps — neither blocks merge. The parser test gap is the more impactful one given the method's complexity, but can be addressed in a follow-up.

What's Good

  • Clean BuildSelectionList API — normalize + dedup + append pattern
  • Coalescing lock pattern prevents redundant concurrent fetches
  • maxDepth recursion guard, IsValidModelSlug regex filter
  • CTS fix with clear explanatory comment
  • Consistent fallback pattern across all 3 UI components
  • Excellent flaky test fix (TCS-based StateChangeCoalescerTests)
  • Atomic reference swap for _localAvailableModels — no torn reads

Recommended action: Approve

All 6 findings from R1+R2 are resolved. The 2 remaining 🟢 items (fire-and-forget exception handling + parser tests) are acknowledged minor improvements suitable for follow-up. 3/3 models agree the PR is ready to merge.

@PureWeen PureWeen marked this pull request as draft April 2, 2026 21:21
@PureWeen PureWeen marked this pull request as ready for review April 2, 2026 21:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner

PureWeen commented Apr 2, 2026

PR #474 — R4 Review (post author commit)

Reviewer: Multi-model consensus (Claude Opus 4.6 + Claude Sonnet 4.6 + GPT-5.3-Codex)
Diff: 677 lines across 10 files (4 commits total)
CI: No checks configured ⚠️

New Commit Reviewed

Commit 4: "Preserve selected model during usage updates" (by PureWeen)

  • Adds ModelHelper.ShouldAcceptObservedModel() — guards against backend usage events overwriting explicit user model selection
  • Applied at SessionUsageInfoEvent and AssistantUsageEvent handlers in Events.cs
  • 3 new tests in ModelSelectionTests.cs

R1/R2/R3 Fix Status

| Finding | Status |
|---------|--------|
| R1: ModelDisplayNames never populated | ✅ Dead code — correctly skipped |
| R1: Unbounded recursion in ParseDirectCliModelProbeOutput | ✅ Fixed (maxDepth=5) |
| R1: FallbackModels safety net removed | ✅ Fixed (Count > 0 pattern) |
| R1: CTS timeout race | ✅ Fixed (ReadToEndAsync without CTS) |
| R1: LLM probe output not validated | ✅ Fixed (IsValidModelSlug regex) |
| R2: Count > 1 heuristic wrong | ✅ Fixed (Count > 0 in all 3 UI files) |

Consensus Findings (2+ of 3 models)

🟢 MINOR — "resumed" magic string duplicated without shared constant

Flagged by: Opus, Sonnet | Files: ModelHelper.cs, CopilotService.Events.cs:741, SessionManager.cs:68

ShouldAcceptObservedModel checks normalizedCurrent == "resumed" (after NormalizeToSlug), while Events.cs:741 checks the raw value state.Info.Model == "resumed". Both work today because "resumed" is always set as a lowercase literal, but if the sentinel definition ever changed, only one path would break silently. Consider extracting a const string ResumedModelPlaceholder = "resumed" to ModelHelper.

Non-blocking — the code is correct as-is. This is a maintainability suggestion.

Below Consensus Threshold (1/3 — noted for awareness)

| Finding | Model | Assessment |
|---------|-------|------------|
| "resumed" could leak into model picker dropdown for old sessions | Sonnet | Edge case for legacy data only; low practical risk |
| Race in check-then-set on state.Info.Model | Codex | Explicitly dismissed by Sonnet — PR actually narrows the existing race window |
| Unobserved fire-and-forget _ = FetchAvailableModelsAsync() | Opus (R3 carry) | Internal try-catches make silent failure unlikely |
| No unit tests for ParseDirectCliModelProbeOutput | Opus (R3 carry) | Good follow-up item, not blocking |

New Code Assessment

ShouldAcceptObservedModel logic — All 3 models confirmed the decision matrix is correct:

  • Empty/null current → accept (no user choice yet) ✅
  • "resumed" placeholder → accept (legacy sentinel) ✅
  • Same model after normalization → accept (no conflict) ✅
  • Different model → reject (preserve user choice) ✅

Test coverage — 3 new tests cover the main paths (empty, resumed, same-model, different-model). NormalizeToSlug(null) path is untested but trivially correct.
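The decision matrix above, together with the suggested shared constant, can be sketched roughly as follows. `ObservedModelSketch` is hypothetical: its `Normalize` is only a stand-in for `NormalizeToSlug` (trim, lowercase, spaces to hyphens), and the real `ShouldAcceptObservedModel` may differ in signature and details.

```csharp
using System;

public static class ObservedModelSketch
{
    // Single definition of the sentinel, as the maintainability suggestion proposes.
    public const string ResumedModelPlaceholder = "resumed";

    // Hypothetical stand-in for NormalizeToSlug; the real normalization may do more.
    public static string Normalize(string? model) =>
        (model ?? string.Empty).Trim().ToLowerInvariant().Replace(' ', '-');

    public static bool ShouldAccept(string? current, string? observed)
    {
        var normalizedCurrent = Normalize(current);

        // No user choice yet: take whatever the backend reports.
        if (normalizedCurrent.Length == 0) return true;

        // Legacy placeholder: it carries no real selection to protect.
        if (normalizedCurrent == ResumedModelPlaceholder) return true;

        // Same model after normalization is harmless; anything else would
        // overwrite an explicit user selection, so it is rejected.
        return normalizedCurrent == Normalize(observed);
    }
}
```

Comparing against the constant after normalization in every code path would remove the raw-versus-normalized asymmetry noted in the consensus finding.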

Verdict

✅ Approve

The PR is in good shape after 4 commits and 3 rounds of fixes. The new ShouldAcceptObservedModel commit is well-designed and correctly solves the model-overwrite-on-usage-events problem. All prior R1/R2 findings are verified fixed. The one consensus finding (magic string duplication) is 🟢 minor and non-blocking.

Recommended follow-ups (not blocking merge):

  1. Extract "resumed" sentinel to a shared constant
  2. Add unit tests for ParseDirectCliModelProbeOutput

@PureWeen PureWeen merged commit 926265c into PureWeen:main Apr 2, 2026
@vitek-karas vitek-karas deleted the fix/all-models branch April 3, 2026 12:33
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…Ween#474)

Co-authored-by: Shane Neuville <shneuvil@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>