feat: PolyPilot CI integration test workflow (Linux/GTK + MauiDevFlow)#615
Builds PolyPilot.Gtk in Debug mode, launches under xvfb virtual display, and runs MauiDevFlow smoke tests:
- Visual tree inspection
- Screenshot capture
- CDP/Blazor WebView status
- Application log capture

Triggered by workflow_dispatch with optional issue_number input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The first run showed the app launches successfully but the MauiDevFlow agent wasn't connecting. Added:
- Explicit broker start before app launch
- Port scanning fallback (9223-9230) if broker discovery fails
- Direct --agent-port on all smoke test commands
- Stderr capture from app launch
- Diagnostics step when agent not found (ps, ss, stderr log)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
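The port-scanning fallback described in this commit could look roughly like the sketch below. It is not the workflow's actual step: `probe_http` and `find_agent_port` are hypothetical helper names, and the sketch assumes the agent answers plain HTTP on localhost, so any HTTP response counts as "agent present".

```shell
# Sketch only: scan a port range for a listening DevFlow agent.

# probe_http PORT: succeeds if anything answers HTTP on that port.
# Without --fail, curl exits 0 for any HTTP response (even 404) and
# non-zero when the connection is refused or times out.
probe_http() {
  curl -s -o /dev/null --max-time 2 "http://localhost:$1/"
}

# find_agent_port START END [PROBE]
# Prints the first port for which PROBE succeeds; returns 1 if none does.
find_agent_port() {
  local port="$1"
  local end="$2"
  local probe="${3:-probe_http}"
  while [ "$port" -le "$end" ]; do
    if "$probe" "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

# Example use in a step (diagnostics mirror the ones named in the commit):
#   AGENT_PORT=$(find_agent_port 9223 9230) || { ps aux; ss -tlnp; exit 1; }
```

Passing the probe as an argument keeps the loop testable without a running agent; a fake probe can stand in for the real one.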
Expert Code Review — PR #615
Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement. Single-reviewer findings were challenged via follow-up and included only if 2+ ultimately agreed.
Findings Summary
| # | Severity | Consensus | Issue |
|---|---|---|---|
| 1 | 🟡 MODERATE | 3/3 | Binary name mismatch (PolyPilot.Gtk vs PolyPilot) + missing empty-binary guard |
| 2 | 🟡 MODERATE | 2/3 | Unquoted ${{ }} expression in run: block — breakage on empty/spaces |
| 3 | 🟡 MODERATE | 3/3 | \|\| true on tool install silently swallows install failure |
| 4 | 🟡 MODERATE | 2/3 | Wrong MAUI workload (maui-android instead of maui for GTK target) |
| 5 | 🟡 MODERATE | 3/3 | All smoke tests swallow failures — workflow always reports green |
| 6 | 🟡 MODERATE | 3/3 | No permissions: block — inherits excessive write-all token scope |
| 7 | 🟢 MINOR | 3/3 | issue_number and scenario inputs declared but never used |
| 8 | 🟢 MINOR | 2/3 | Stale action versions (@v4 vs @v6/@v7 used in build.yml) |
Key Interaction Chain
Findings 1, 2, and 3 compound into a complete workflow failure: the binary name mismatch (#1) guarantees the primary find always misses → the missing guard lets an empty value through → the unquoted expression (#2) silently backgrounds nothing → MauiDevFlow wait times out 60s later with a misleading error. Meanwhile, the wrong workload (#4) may fail the build entirely, and || true on tool install (#3) ensures that failure is also invisible.
Even after fixing these correctness issues, the swallowed smoke test failures (#5) mean the workflow would report false-positive green runs, defeating the purpose of integration testing.
Discarded Findings (no consensus)
- `/tmp` screenshot path: 1/3; correctly identified as standard CI runner practice, not applicable to the agent-workspace `/tmp` prohibition
- Wrong package ID (`Microsoft.Maui.Cli`): 1/3; verified incorrect; `Microsoft.Maui.Cli` is the correct package that wraps `maui devflow`
- Manual-only trigger: 1/3; follow-up reviewers unanimously disagreed; `workflow_dispatch` is intentional for heavyweight integration tests
CI Status
- No CI checks have run on this PR (new workflow file only — no build targets to validate)
Test Coverage
- N/A — this is a workflow file, not application code. No unit tests affected.
Prior Reviews
- No prior human reviews found on this PR.
Generated by Expert Code Review (auto) for issue #615
    - name: Find built binary
      id: find-binary
      run: |
        BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot.Gtk" -type f -executable | head -1)
🟡 MODERATE — Binary name mismatch: primary search always fails (Flagged by: 3/3 reviewers)
`PolyPilot.Gtk.csproj` declares `<AssemblyName>PolyPilot</AssemblyName>`, so the output binary is named `PolyPilot`, not `PolyPilot.Gtk`. This `find` will never match. The fallback on line 64 (`PolyPilot*`) is overly broad and may match `.dll`, `.runtimeconfig.json`, or other artifacts.
If both searches fail, BINARY is empty and line 79 becomes a bare & — a shell no-op that assigns a transient PID. kill -0 succeeds momentarily (subshell still in process table), printing a false "✅ PolyPilot is running" before failing 60s later at MauiDevFlow wait with a misleading error.
Suggested fix:
    BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot" -type f -executable | head -1)
    if [ -z "$BINARY" ]; then
      echo "❌ No executable binary found"; exit 1
    fi

    sleep 2

    # Launch PolyPilot in background
    ${{ steps.find-binary.outputs.binary }} &
🟡 MODERATE — Unquoted expression interpolation (Flagged by: 2/3 reviewers)
GitHub Actions substitutes `${{ steps.find-binary.outputs.binary }}` into the script text before the shell sees it. If the resolved path is empty (see binary name finding above) or contains spaces, the shell splits it incorrectly. An empty value makes this line `&`, backgrounding nothing but assigning a PID that briefly appears valid.
Suggested fix: Pass through an environment variable:
    env:
      BINARY: ${{ steps.find-binary.outputs.binary }}
    run: |
      if [ -z "$BINARY" ]; then echo "❌ No binary"; exit 1; fi
      "$BINARY" &

    run: |
      dotnet tool install --global Microsoft.Maui.Cli \
        --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
        --prerelease || true
🟡 MODERATE — || true silently swallows tool install failure (Flagged by: 3/3 reviewers)
If the MauiDevFlow CLI install fails (feed outage, version conflict, network error), the step exits 0 and the workflow proceeds. All subsequent maui devflow commands fail with "command not found" after burning 60s on the wait timeout, with a misleading "agent did not connect" error.
Suggested fix: Remove || true and add a verification:
    dotnet tool install --global Microsoft.Maui.Cli \
      --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
      --prerelease
    command -v maui || { echo "❌ MauiDevFlow CLI not installed"; exit 1; }

    sudo apt-get install -y libgtk-4-dev libadwaita-1-dev libwebkitgtk-6.0-dev xvfb

    - name: Install MAUI workload
      run: dotnet workload install maui-android
🟡 MODERATE — Wrong MAUI workload for GTK/Linux target (Flagged by: 2/3 reviewers)
maui-android installs only Android SDK tooling. The GTK project targets net10.0 (plain .NET) and needs the base MAUI workload. The existing build.yml uses dotnet workload install maui for its non-Android jobs.
Suggested fix:
    run: dotnet workload install maui

    - name: "Smoke Test: MAUI status"
      run: |
        export DISPLAY=:99
        echo "=== Agent Status ==="
        maui devflow MAUI status || echo "MAUI status failed"

    - name: "Smoke Test: Visual tree"
      run: |
        export DISPLAY=:99
        echo "=== Visual Tree ==="
        maui devflow MAUI tree --depth 5 || echo "Tree inspection failed"

    - name: "Smoke Test: Screenshot"
      run: |
        export DISPLAY=:99
        echo "=== Taking screenshot ==="
        maui devflow MAUI screenshot --output /tmp/polypilot-ci-screenshot.png || echo "Screenshot failed"

    - name: "Smoke Test: CDP/Blazor status"
      run: |
        export DISPLAY=:99
        echo "=== CDP Status ==="
        maui devflow cdp status || echo "CDP status failed"

    - name: "Smoke Test: Application logs"
      run: |
        export DISPLAY=:99
        echo "=== Recent logs ==="
        timeout 5 maui devflow MAUI logs || true
🟡 MODERATE — Smoke tests swallow all failures; CI always reports green (Flagged by: 3/3 reviewers)
Every smoke test uses || echo "... failed" or || true, exiting 0 regardless. A completely broken app produces a green checkmark. The only step that can fail the job is the MauiDevFlow wait — and that indicates infrastructure failure, not behavioral regression.
Suggested fix: Use continue-on-error: true at the step level for informational steps (logs) but let core validation steps (MAUI status, visual tree) fail the workflow:
    - name: "Smoke Test: MAUI status"
      run: |
        export DISPLAY=:99
        maui devflow MAUI status
    - name: "Smoke Test: Application logs"
      continue-on-error: true
      run: |
        export DISPLAY=:99
        timeout 5 maui devflow MAUI logs || true

    jobs:
      integration-test:
        runs-on: ubuntu-latest
🟡 MODERATE — No permissions: block; inherits excessive token scope (Flagged by: 3/3 reviewers after follow-up)
Every job in the existing build.yml declares an explicit permissions: block. This workflow has none, inheriting the repository default (write-all for workflow_dispatch). The workflow only needs read access.
Suggested fix:
    jobs:
      integration-test:
        runs-on: ubuntu-latest
        timeout-minutes: 30
        permissions:
          contents: read

    inputs:
      issue_number:
        description: 'Issue number to analyze (optional)'
        required: false
        type: number
      scenario:
        description: 'Test scenario to run (smoke, full)'
        required: false
        type: choice
        options:
          - smoke
          - full
        default: smoke
🟢 MINOR — Unused issue_number and scenario inputs (Flagged by: 3/3 reviewers)
Both inputs are declared but never referenced in any step. A user selecting scenario: full gets the same smoke tests — silently misleading.
Suggested fix: Remove both inputs until implemented, or wire scenario into conditional steps with if: inputs.scenario == 'full'.
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
🟢 MINOR — Stale action versions (Flagged by: 2/3 reviewers)
build.yml uses actions/checkout@v6 and actions/upload-artifact@v7. This workflow uses @v4 for both. Align with existing repo conventions.
Suggested fix: actions/checkout@v6 (line 26), actions/upload-artifact@v7 (line 141).
Previous run showed app running but DevFlow agent not listening.
- Increase startup wait from 15s to 30s for GTK+WebView+DevFlow init
- Capture stdout (agent may log port there)
- Search for .mauidevflow file
- Scan broader port range including 5000/5001/8080
- Remove broken broker start step
- Upload app logs as artifacts for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The app crashed with 'Unable to resolve CodespaceService'. The GTK MauiProgram.cs was missing 5 service registrations that were added to the main MAUI project but not synced to GTK:
- CodespaceService
- ScheduledTaskService
- PrLinkService
- AuditLogService
- EfficiencyAnalysisService

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The maui devflow CLI fails with 'Cannot connect to agent' even though curl confirms the agent responds on port 9223. The CLI may require broker registration that doesn't work in CI.

Switched to direct curl calls to the agent's HTTP API endpoints:
- /api/status, /api/maui/tree, /api/maui/screenshot
- /api/cdp/status, /api/maui/logs

Also kept CLI attempts with --verbose for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The /api/maui/tree and /api/cdp/status routes returned 'Route not found'. Need to discover the actual API routes the agent exposes. This commit scans ~25 common route patterns and reports which ones respond with non-404 status codes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
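A route scan of the kind this commit describes might be sketched as follows. The helper names `http_status` and `scan_routes` are hypothetical, and the candidate route list is whatever the caller passes in; nothing here claims to know which routes the agent really exposes.

```shell
# Sketch only: report which candidate routes answer with something other than 404.

# http_status URL: print the HTTP status code of a GET request.
# curl writes "000" for %{http_code} when no response was received.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# scan_routes BASE_URL FETCH ROUTE...
# Prints "STATUS ROUTE" for each route that is neither 404 nor unreachable.
scan_routes() {
  local base="$1"
  local fetch="$2"
  local route status
  shift 2
  for route in "$@"; do
    status=$("$fetch" "${base}${route}") || status=000
    case "$status" in
      404|000) ;;                      # not found or no response: skip
      *) echo "$status $route" ;;      # anything else is worth a closer look
    esac
  done
}

# Example use:
#   scan_routes "http://localhost:9223" http_status /api/status /api/maui/tree /api/cdp
```

A 405 in the output is still useful: it suggests the route exists but expects a different HTTP method, which a follow-up POST probe can confirm.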
Three parallel platform jobs:
- Linux/GTK (ubuntu-latest): working with xvfb + DevFlow on 9223
- Mac Catalyst (macos-15): native macOS runner, no virtual display
- Windows (windows-latest): WinUI, PowerShell-based steps

All use the same pattern: build Debug → launch → port scan for DevFlow agent → status/tree/screenshot smoke tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Mac: Add -p:ValidateXcodeVersion=false (macos-15 has Xcode 26.2, SDK wants 26.3)
- Windows: Rename $pid to $appPid ($pid is read-only in PowerShell)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Passes COPILOT_GITHUB_TOKEN via env (masked by GitHub Actions). PolyPilot's ServerManager auto-reads this env var and forwards it to the headless copilot process, so no manual CLI install is needed.

New steps:
- Check for copilot headless server on port 4321
- Verify copilot process running
- Check PID file and app logs for server startup
- CDP snapshot of Blazor UI to verify session state
- Post-startup screenshot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iterating toward creating a Copilot session in CI:
- Scan all DevFlow API routes to discover interaction endpoints
- Parse visual tree to find clickable buttons (New Session, etc.)
- Try POST-based tap/navigate/evaluate routes
- Attempt JS evaluation via DevFlow for Blazor UI interaction
- Multiple screenshots (before/after) for visual debugging
- App log capture for diagnostics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run confirmed:
- Copilot headless server running on port 4321
- Authenticated as PureWeen via env token
- 15 models loaded
- Persistent mode connected

Now trying to create a session via the bundled copilot CLI binary connecting to the running server. Also simplified the DevFlow verification to focus on session creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous attempt used non-existent CLI flags. This run:
- Checks copilot --help to discover actual args
- Tries stdin/stdout ACP protocol connection
- Dumps all session-related app logs
- Gets final screenshot for visual verification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
copilot --help revealed -p/--prompt flag for non-interactive scripting. Use this to create a session and send a test message directly via the bundled copilot CLI binary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run: copilot -p produced no output (--no-edit invalid). This run: removed --no-edit, added tee to capture full output, added grep check for expected response, log full output on failure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
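The tee-plus-grep check mentioned here can be sketched as a small wrapper. `run_and_check` is a hypothetical name, and the prompt and expected word in the usage comment are placeholders rather than the values the workflow uses.

```shell
# Sketch only: run a command, keep its full output, and assert on the content.

# run_and_check LOGFILE EXPECTED CMD [ARGS...]
# Mirrors CMD's combined stdout/stderr to LOGFILE and succeeds only if the
# fixed string EXPECTED appears in it. CMD's own exit status is deliberately
# ignored; the check is on what it printed.
run_and_check() {
  local log="$1"
  local expected="$2"
  shift 2
  # tee keeps the output visible in the job log while saving a copy.
  "$@" 2>&1 | tee "$log" || true
  if grep -qF -- "$expected" "$log"; then
    echo "✅ found expected text: $expected"
  else
    echo "❌ expected text not found; full output follows"
    cat "$log"
    return 1
  fi
}

# Example use (prompt and expected word are placeholders):
#   run_and_check "$RUNNER_TEMP/copilot-output.log" "pong" \
#     copilot -p "Reply with the single word: pong"
```

Dumping the whole log on failure matters in CI, where an empty or truncated response is otherwise indistinguishable from a wrong one.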
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key fix: use -ap flag (not --agent-port) for direct DevFlow connection without broker. This enables full CLI + CDP interaction.

New flow:
1. Verify CLI connects with -ap
2. Check Blazor UI state via CDP (document.title, session count)
3. Click New Session button via JS DOM manipulation
4. Fill chat input with test message
5. Click Send button
6. Wait 15s for response
7. Verify session state and capture screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CLI -ap flag doesn't work in CI (different CLI version or transport). Bypass CLI entirely — scan all HTTP API endpoints including POST methods. Need to find tap/fill/click/evaluate routes via HTTP. Also tries /api/query with different parameter formats. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Found that POST /api/cdp accepts Chrome DevTools Protocol messages. This is the key to driving the Blazor UI: Runtime.evaluate lets us execute JavaScript to click buttons, fill inputs, and read state.

Flow:
1. Runtime.evaluate to get page state (title, buttons list)
2. Runtime.evaluate to click New Session button via DOM
3. Runtime.evaluate to fill chat input with test message
4. Runtime.evaluate to click Send button
5. Wait 15s for Copilot response
6. Runtime.evaluate to read message state

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run successfully clicked 'Create something new' but it opened a popover menu. Now does the full two-step flow:
1. Click '+' button to open popover
2. Click 'Session' option in the popover
3. Find and fill session name input
4. Click Create/Submit button
5. Verify session appears

Also adds cdp_eval helper function for cleaner CDP calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After session creation (confirmed working), now:
7. Click session item to expand/select it
8. Find visible chat input (.card-input input/textarea)
9. Fill with test message + dispatch input/change events
10. Click .send-btn or dispatch Enter keydown
11. Wait 20s for Copilot response
12. Check for response messages in DOM

Multiple screenshots captured at each stage for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
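A `cdp_eval` helper along these lines is one plausible shape, not the one in the workflow. The sketch assumes the agent accepts a raw CDP message as the POST body on `/api/cdp` and that `python3` is available on the runner; `cdp_payload` and the `AGENT_PORT` variable are hypothetical names.

```shell
# Sketch only: send a Runtime.evaluate message to the agent's CDP endpoint.

# cdp_payload EXPRESSION: print a CDP Runtime.evaluate message as JSON.
# json.dumps does the escaping, so the expression may contain quotes,
# backslashes, or newlines without breaking the request body.
cdp_payload() {
  python3 -c 'import json, sys
print(json.dumps({"id": 1, "method": "Runtime.evaluate",
                  "params": {"expression": sys.argv[1], "returnByValue": True}}))' "$1"
}

# cdp_eval EXPRESSION: POST the message to the agent.
# The body format the endpoint expects is an assumption here.
cdp_eval() {
  curl -s --max-time 10 -X POST "http://localhost:${AGENT_PORT:-9223}/api/cdp" \
    -H 'Content-Type: application/json' \
    -d "$(cdp_payload "$1")"
}

# Example use:
#   cdp_eval 'document.title'
#   cdp_eval 'Array.from(document.querySelectorAll("button")).map(b => b.textContent)'
```

Building the JSON in Python rather than by string interpolation avoids the quoting failures that appear as soon as a JavaScript expression contains double quotes.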
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Code Review: PR #615
feat: PolyPilot CI integration test workflow
All 8 findings from Round 1: FIXED ✅
Verified Clean
Recommendation
✅ Approve: all findings addressed, CI green on all 3 platforms.
Addresses all review findings from PR #615:
1. Add missing if: guard on CDP session-creation step
2. Fix Windows artifact paths: $env:TEMP → $env:RUNNER_TEMP
3. Remove unnecessary COPILOT_GITHUB_TOKEN on CDP step
4. Add explicit guard for empty binary path
5. Use default Xcode instead of hardcoded versions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3. Replace sleep 30 with poll loop (up to 60s) on Linux and Mac; checks both process health and DevFlow agent readiness
5. Tool install: verify maui --version on failure instead of || true
6. cdp_eval: use python3 json.dumps for safe JSON construction instead of fragile shell string interpolation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
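The poll loop that replaces the fixed sleep could be sketched like this. `wait_for_agent` and `agent_ready` are hypothetical names, and probing `/api/status` on the agent port is an assumption about what "ready" means.

```shell
# Sketch only: wait for the agent while also watching the app process.

# agent_ready PORT: succeeds once the agent answers HTTP on that port.
agent_ready() {
  curl -s -o /dev/null --max-time 2 "http://localhost:$1/api/status"
}

# wait_for_agent PID PORT [TIMEOUT_SECONDS] [PROBE]
# Returns 0 once PROBE succeeds, 1 if the process exits first, 2 on timeout.
wait_for_agent() {
  local pid="$1"
  local port="$2"
  local timeout="${3:-60}"
  local probe="${4:-agent_ready}"
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # kill -0 sends no signal; it only checks that the process still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "❌ app exited before the agent became ready"
      return 1
    fi
    if "$probe" "$port"; then
      echo "✅ agent ready on port $port after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "❌ agent not ready after ${timeout}s"
  return 2
}

# Example use:
#   "$BINARY" & APP_PID=$!
#   wait_for_agent "$APP_PID" 9223 60 || exit 1
```

Distinct return codes for "process died" and "timed out" make the failure message in the job log point at the right cause, which a fixed sleep cannot do.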
The linker fails with MT0180 when Xcode version doesn't match the SDK. ValidateXcodeVersion=false bypasses the initial check but the linker still fails. MtouchLink=SdkOnly avoids linking against new APIs that require the newer SDK headers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Goal
A GitHub Action that builds and runs PolyPilot on Linux/GTK with MauiDevFlow for automated UI inspection and end-to-end testing.
Architecture
Current State
DO NOT MERGE — work in progress