feat: PolyPilot CI integration test workflow (Linux/GTK + MauiDevFlow) by PureWeen · Pull Request #615 · PureWeen/PolyPilot

PureWeen · 2026-04-18T18:30:26Z

Goal

A GitHub Action that builds and runs PolyPilot on Linux/GTK with MauiDevFlow for automated UI inspection and end-to-end testing.

Architecture

ubuntu-latest → Build PolyPilot.Gtk (Debug) → xvfb → MauiDevFlow agent → smoke tests

Current State

DO NOT MERGE — work in progress

Builds PolyPilot.Gtk in Debug mode, launches under xvfb virtual display, and runs MauiDevFlow smoke tests:
- Visual tree inspection
- Screenshot capture
- CDP/Blazor WebView status
- Application log capture

Triggered by workflow_dispatch with optional issue_number input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


The first run showed the app launches successfully but MauiDevFlow agent wasn't connecting. Added:
- Explicit broker start before app launch
- Port scanning fallback (9223-9230) if broker discovery fails
- Direct --agent-port on all smoke test commands
- Stderr capture from app launch
- Diagnostics step when agent not found (ps, ss, stderr log)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
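The port scanning fallback described in this commit could look roughly like the sketch below. The function names `probe_port` and `scan_ports` are hypothetical, and the `/api/status` path is borrowed from a later commit message in this PR, so it is an assumption that the agent answers on it.

```shell
#!/usr/bin/env bash
# Sketch only: probe_port and scan_ports are hypothetical helper names.

# probe_port PORT: succeed if something answers HTTP on PORT.
# The /api/status path is an assumption taken from a later commit message.
probe_port() {
  local port="$1"
  # --max-time keeps a closed or filtered port from stalling the scan.
  curl --silent --fail --max-time 2 --output /dev/null \
    "http://localhost:${port}/api/status"
}

# scan_ports FIRST LAST [PROBE]: print the first port for which PROBE succeeds.
# PROBE defaults to probe_port; passing another function makes this testable.
scan_ports() {
  local first="$1" last="$2" probe="${3:-probe_port}"
  local port
  for ((port = first; port <= last; port++)); do
    if "$probe" "$port"; then
      echo "$port"
      return 0
    fi
  done
  return 1
}
```

In a workflow step this might be used as `AGENT_PORT=$(scan_ports 9223 9230) || exit 1`, so that a missing agent fails the step instead of continuing with an empty port.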

github-actions

Expert Code Review — PR #615

Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement. Single-reviewer findings were challenged via follow-up and included only if 2+ ultimately agreed.

Findings Summary

#	Severity	Consensus	Issue
1	🟡 MODERATE	3/3	Binary name mismatch (`PolyPilot.Gtk` vs `PolyPilot`) + missing empty-binary guard
2	🟡 MODERATE	2/3	Unquoted `${{ }}` expression in `run:` block — breakage on empty/spaces
3	🟡 MODERATE	3/3	`|| true` on tool install silently swallows install failure
4	🟡 MODERATE	2/3	Wrong MAUI workload (`maui-android` instead of `maui` for GTK target)
5	🟡 MODERATE	3/3	All smoke tests swallow failures — workflow always reports green
6	🟡 MODERATE	3/3	No `permissions:` block — inherits excessive `write-all` token scope
7	🟢 MINOR	3/3	`issue_number` and `scenario` inputs declared but never used
8	🟢 MINOR	2/3	Stale action versions (`@v4` vs `@v6`/`@v7` used in `build.yml`)

Key Interaction Chain

Findings 1, 2, and 3 compound into a complete workflow failure: the binary name mismatch (#1) guarantees the primary find always misses → the missing guard lets an empty value through → the unquoted expression (#2) silently backgrounds nothing → MauiDevFlow wait times out 60s later with a misleading error. Meanwhile, the wrong workload (#4) may fail the build entirely, and || true on tool install (#3) ensures that failure is also invisible.

Even after fixing these correctness issues, the swallowed smoke test failures (#5) mean the workflow would report false-positive green runs, defeating the purpose of integration testing.

Discarded Findings (no consensus)

/tmp screenshot path — 1/3; correctly identified as standard CI runner practice, not applicable to the agent-workspace /tmp prohibition
Wrong package ID (Microsoft.Maui.Cli) — 1/3; verified incorrect — Microsoft.Maui.Cli is the correct package that wraps maui devflow
Manual-only trigger — 1/3; follow-up reviewers unanimously disagreed — workflow_dispatch is intentional for heavyweight integration tests

CI Status

No CI checks have run on this PR (new workflow file only — no build targets to validate)

Test Coverage

N/A — this is a workflow file, not application code. No unit tests affected.

Prior Reviews

No prior human reviews found on this PR.

Generated by Expert Code Review (auto) for issue #615

github-actions · 2026-04-18T18:47:53Z

+      - name: Find built binary
+        id: find-binary
+        run: |
+          BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot.Gtk" -type f -executable | head -1)


🟡 MODERATE — Binary name mismatch: primary search always fails (Flagged by: 3/3 reviewers)

PolyPilot.Gtk.csproj declares <AssemblyName>PolyPilot</AssemblyName>, so the output binary is named PolyPilot, not PolyPilot.Gtk. This find will never match. The fallback on line 64 (PolyPilot*) is overly broad and may match .dll, .runtimeconfig.json, or other artifacts.

If both searches fail, BINARY is empty and line 79 becomes a bare & — a shell no-op that assigns a transient PID. kill -0 succeeds momentarily (subshell still in process table), printing a false "✅ PolyPilot is running" before failing 60s later at MauiDevFlow wait with a misleading error.

Suggested fix:

    BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot" -type f -executable | head -1)
    if [ -z "$BINARY" ]; then
      echo "❌ No executable binary found"; exit 1
    fi
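As a slightly more reusable variant of the same idea, the lookup and the empty-result guard can be wrapped in one function. This is a sketch, not the workflow's actual step: `find_app_binary` is a hypothetical name, and `-executable` assumes GNU find, which is what Linux runners ship.

```shell
#!/usr/bin/env bash
# Sketch only: find_app_binary is a hypothetical helper name.

# find_app_binary DIR NAME: print the first executable file named exactly NAME
# under DIR, or fail with a message when nothing matches.
find_app_binary() {
  local dir="$1" name="$2" binary
  # -executable is a GNU find extension; an exact -name avoids matching
  # PolyPilot.dll or PolyPilot.runtimeconfig.json.
  binary=$(find "$dir" -name "$name" -type f -executable 2>/dev/null | head -1)
  if [ -z "$binary" ]; then
    echo "No executable named '$name' found under '$dir'" >&2
    return 1
  fi
  echo "$binary"
}
```

A step could then write the result to the step output with `echo "binary=$(find_app_binary PolyPilot.Gtk/bin/Debug PolyPilot)" >> "$GITHUB_OUTPUT"` after checking the function's exit status.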

github-actions · 2026-04-18T18:47:53Z

+          sleep 2
+
+          # Launch PolyPilot in background
+          ${{ steps.find-binary.outputs.binary }} &


🟡 MODERATE — Unquoted expression interpolation (Flagged by: 2/3 reviewers)

GitHub Actions substitutes ${{ steps.find-binary.outputs.binary }} into the script text before the shell sees it. If the resolved path is empty (see binary name finding above) or contains spaces, the shell splits it incorrectly. An empty value makes this line & — backgrounding nothing but assigning a PID that briefly appears valid.

Suggested fix: Pass through an environment variable:

    env:
      BINARY: ${{ steps.find-binary.outputs.binary }}
    run: |
      if [ -z "$BINARY" ]; then echo "❌ No binary"; exit 1; fi
      "$BINARY" &
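To illustrate why the quoted environment variable is safer, here is a minimal launch helper. `launch_in_background` and `APP_PID` are hypothetical names introduced for this sketch; the point is only that a quoted variable stays one word even when the path contains spaces, and that an empty path is rejected before anything is backgrounded.

```shell
#!/usr/bin/env bash
# Sketch only: launch_in_background and APP_PID are hypothetical names.

# launch_in_background PATH [ARGS...]: start PATH in the background and
# store its process ID in APP_PID. Refuses to run with an empty path.
launch_in_background() {
  local binary="$1"
  if [ -z "$binary" ]; then
    echo "No binary path provided" >&2
    return 1
  fi
  shift
  # Quoting keeps a path that contains spaces as a single word.
  "$binary" "$@" &
  APP_PID=$!
}
```

With the `env:` mapping from the suggested fix, the step body would reduce to `launch_in_background "$BINARY" || exit 1`.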

github-actions · 2026-04-18T18:47:53Z

+        run: |
+          dotnet tool install --global Microsoft.Maui.Cli \
+            --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
+            --prerelease || true


🟡 MODERATE — || true silently swallows tool install failure (Flagged by: 3/3 reviewers)

If the MauiDevFlow CLI install fails (feed outage, version conflict, network error), the step exits 0 and the workflow proceeds. All subsequent maui devflow commands fail with "command not found" after burning 60s on the wait timeout, with a misleading "agent did not connect" error.

Suggested fix: Remove || true and add a verification:

    dotnet tool install --global Microsoft.Maui.Cli \
      --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
      --prerelease
    command -v maui || { echo "❌ MauiDevFlow CLI not installed"; exit 1; }
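The verification half of this fix generalizes to any tool the later steps depend on. A small sketch, with `require_command` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: require_command is a hypothetical helper name.

# require_command NAME: fail with a clear message if NAME is not on PATH.
require_command() {
  local cmd="$1"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "Required command '$cmd' is not on PATH" >&2
    return 1
  fi
}
```

Calling `require_command maui || exit 1` right after the install step surfaces a failed install immediately rather than 60 seconds later. Note that `dotnet tool install --global` places tools in `$HOME/.dotnet/tools` by default, so that directory has to be on PATH for the check to pass.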

github-actions · 2026-04-18T18:47:53Z

+          sudo apt-get install -y libgtk-4-dev libadwaita-1-dev libwebkitgtk-6.0-dev xvfb
+
+      - name: Install MAUI workload
+        run: dotnet workload install maui-android


🟡 MODERATE — Wrong MAUI workload for GTK/Linux target (Flagged by: 2/3 reviewers)

maui-android installs only Android SDK tooling. The GTK project targets net10.0 (plain .NET) and needs the base MAUI workload. The existing build.yml uses dotnet workload install maui for its non-Android jobs.

Suggested fix:

run: dotnet workload install maui

github-actions · 2026-04-18T18:47:53Z

+      - name: "Smoke Test: MAUI status"
+        run: |
+          export DISPLAY=:99
+          echo "=== Agent Status ==="
+          maui devflow MAUI status || echo "MAUI status failed"
+
+      - name: "Smoke Test: Visual tree"
+        run: |
+          export DISPLAY=:99
+          echo "=== Visual Tree ==="
+          maui devflow MAUI tree --depth 5 || echo "Tree inspection failed"
+
+      - name: "Smoke Test: Screenshot"
+        run: |
+          export DISPLAY=:99
+          echo "=== Taking screenshot ==="
+          maui devflow MAUI screenshot --output /tmp/polypilot-ci-screenshot.png || echo "Screenshot failed"
+
+      - name: "Smoke Test: CDP/Blazor status"
+        run: |
+          export DISPLAY=:99
+          echo "=== CDP Status ==="
+          maui devflow cdp status || echo "CDP status failed"
+
+      - name: "Smoke Test: Application logs"
+        run: |
+          export DISPLAY=:99
+          echo "=== Recent logs ==="
+          timeout 5 maui devflow MAUI logs || true


🟡 MODERATE — Smoke tests swallow all failures; CI always reports green (Flagged by: 3/3 reviewers)

Every smoke test uses || echo "... failed" or || true, exiting 0 regardless. A completely broken app produces a green checkmark. The only step that can fail the job is the MauiDevFlow wait — and that indicates infrastructure failure, not behavioral regression.

Suggested fix: Use continue-on-error: true at the step level for informational steps (logs) but let core validation steps (MAUI status, visual tree) fail the workflow:

    - name: "Smoke Test: MAUI status"
      run: |
        export DISPLAY=:99
        maui devflow MAUI status
    - name: "Smoke Test: Application logs"
      continue-on-error: true
      run: |
        export DISPLAY=:99
        timeout 5 maui devflow MAUI logs || true
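If the smoke tests were kept in a single script instead of separate steps, the same split between required and informational checks could be sketched as follows. All three function names and the `FAILED_CHECKS` counter are hypothetical; the approach runs every check, so one failure does not hide the others, and fails only at the end.

```shell
#!/usr/bin/env bash
# Sketch only: run_check, run_informational, finish_checks are hypothetical.

FAILED_CHECKS=0

# run_check NAME COMMAND...: run a required check and record any failure.
run_check() {
  local name="$1"; shift
  echo "=== ${name} ==="
  if ! "$@"; then
    echo "FAILED: ${name}" >&2
    FAILED_CHECKS=$((FAILED_CHECKS + 1))
  fi
}

# run_informational NAME COMMAND...: run a step whose failure is only logged.
run_informational() {
  local name="$1"; shift
  echo "=== ${name} (informational) ==="
  "$@" || echo "warning: ${name} failed (ignored)" >&2
}

# finish_checks: succeed only if no required check failed.
finish_checks() {
  [ "$FAILED_CHECKS" -eq 0 ]
}
```

A script would call `run_check "MAUI status" maui devflow MAUI status`, then `run_informational "logs" timeout 5 maui devflow MAUI logs`, and end with `finish_checks || exit 1`.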

github-actions · 2026-04-18T18:47:53Z

+
+jobs:
+  integration-test:
+    runs-on: ubuntu-latest


🟡 MODERATE — No permissions: block; inherits excessive token scope (Flagged by: 3/3 reviewers after follow-up)

Every job in the existing build.yml declares an explicit permissions: block. This workflow has none, inheriting the repository default (write-all for workflow_dispatch). The workflow only needs read access.

Suggested fix:

    jobs:
      integration-test:
        runs-on: ubuntu-latest
        timeout-minutes: 30
        permissions:
          contents: read

github-actions · 2026-04-18T18:47:53Z

+    inputs:
+      issue_number:
+        description: 'Issue number to analyze (optional)'
+        required: false
+        type: number
+      scenario:
+        description: 'Test scenario to run (smoke, full)'
+        required: false
+        type: choice
+        options:
+          - smoke
+          - full
+        default: smoke


🟢 MINOR — Unused issue_number and scenario inputs (Flagged by: 3/3 reviewers)

Both inputs are declared but never referenced in any step. A user selecting scenario: full gets the same smoke tests — silently misleading.

Suggested fix: Remove both inputs until implemented, or wire scenario into conditional steps with if: inputs.scenario == 'full'.

github-actions · 2026-04-18T18:47:53Z

+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4


🟢 MINOR — Stale action versions (Flagged by: 2/3 reviewers)

build.yml uses actions/checkout@v6 and actions/upload-artifact@v7. This workflow uses @v4 for both. Align with existing repo conventions.

Suggested fix: actions/checkout@v6 (line 26), actions/upload-artifact@v7 (line 141).

Previous run showed app running but DevFlow agent not listening.
- Increase startup wait from 15s to 30s for GTK+WebView+DevFlow init
- Capture stdout (agent may log port there)
- Search for .mauidevflow file
- Scan broader port range including 5000/5001/8080
- Remove broken broker start step
- Upload app logs as artifacts for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The app crashed with 'Unable to resolve CodespaceService'. The GTK MauiProgram.cs was missing 5 service registrations that were added to the main MAUI project but not synced to GTK:
- CodespaceService
- ScheduledTaskService
- PrLinkService
- AuditLogService
- EfficiencyAnalysisService

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


The maui devflow CLI fails with 'Cannot connect to agent' even though curl confirms the agent responds on port 9223. The CLI may require broker registration that doesn't work in CI. Switched to direct curl calls to the agent's HTTP API endpoints:
- /api/status, /api/maui/tree, /api/maui/screenshot
- /api/cdp/status, /api/maui/logs

Also kept CLI attempts with --verbose for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The /api/maui/tree and /api/cdp/status routes returned 'Route not found'. Need to discover the actual API routes the agent exposes. This commit scans ~25 common route patterns and reports which ones respond with non-404 status codes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Three parallel platform jobs:
- Linux/GTK (ubuntu-latest): working with xvfb + DevFlow on 9223
- Mac Catalyst (macos-15): native macOS runner, no virtual display
- Windows (windows-latest): WinUI, PowerShell-based steps

All use the same pattern: build Debug → launch → port scan for DevFlow agent → status/tree/screenshot smoke tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Mac: Add -p:ValidateXcodeVersion=false (macos-15 has Xcode 26.2, SDK wants 26.3)
- Windows: Rename $pid to $appPid ($PID is a read-only automatic variable in PowerShell)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Passes COPILOT_GITHUB_TOKEN via env (masked by GitHub Actions). PolyPilot's ServerManager auto-reads this env var and forwards it to the headless copilot process, so no manual CLI install is needed. New steps:
- Check for copilot headless server on port 4321
- Verify copilot process running
- Check PID file and app logs for server startup
- CDP snapshot of Blazor UI to verify session state
- Post-startup screenshot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Iterating toward creating a Copilot session in CI:
- Scan all DevFlow API routes to discover interaction endpoints
- Parse visual tree to find clickable buttons (New Session, etc.)
- Try POST-based tap/navigate/evaluate routes
- Attempt JS evaluation via DevFlow for Blazor UI interaction
- Multiple screenshots (before/after) for visual debugging
- App log capture for diagnostics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previous run confirmed:
- Copilot headless server running on port 4321
- Authenticated as PureWeen via env token
- 15 models loaded
- Persistent mode connected

Now trying to create a session via the bundled copilot CLI binary connecting to the running server. Also simplified the DevFlow verification to focus on session creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previous attempt used non-existent CLI flags. This run:
- Checks copilot --help to discover actual args
- Tries stdin/stdout ACP protocol connection
- Dumps all session-related app logs
- Gets final screenshot for visual verification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

copilot --help revealed -p/--prompt flag for non-interactive scripting. Use this to create a session and send a test message directly via the bundled copilot CLI binary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previous run: copilot -p produced no output (--no-edit invalid). This run: removed --no-edit, added tee to capture full output, added grep check for expected response, log full output on failure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
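The tee-plus-grep pattern mentioned in this commit can be sketched generically. `run_and_expect` is a hypothetical helper name, and the expected string is whatever the test prompt is supposed to produce; nothing here reflects the actual workflow step.

```shell
#!/usr/bin/env bash
# Sketch only: run_and_expect is a hypothetical helper name.

# run_and_expect EXPECTED LOGFILE COMMAND...: run COMMAND, copy its combined
# output to LOGFILE while still printing it, then require EXPECTED in the log.
run_and_expect() {
  local expected="$1" log="$2"; shift 2
  # 2>&1 so that error output ends up in the log as well.
  "$@" 2>&1 | tee "$log"
  # -F treats EXPECTED as a fixed string rather than a regular expression.
  if grep -qF -- "$expected" "$log"; then
    return 0
  fi
  echo "Expected '$expected' not found; full output is in $log" >&2
  return 1
}
```

Because the check is done on the log file rather than on the pipeline's exit status, a command that exits 0 but prints nothing still fails the check, which is the situation the previous run hit.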


Key fix: use -ap flag (not --agent-port) for direct DevFlow connection without broker. This enables full CLI + CDP interaction. New flow:
1. Verify CLI connects with -ap
2. Check Blazor UI state via CDP (document.title, session count)
3. Click New Session button via JS DOM manipulation
4. Fill chat input with test message
5. Click Send button
6. Wait 15s for response
7. Verify session state and capture screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CLI -ap flag doesn't work in CI (different CLI version or transport). Bypass CLI entirely — scan all HTTP API endpoints including POST methods. Need to find tap/fill/click/evaluate routes via HTTP. Also tries /api/query with different parameter formats. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Found that POST /api/cdp accepts Chrome DevTools Protocol messages. This is the key to driving the Blazor UI: Runtime.evaluate lets us execute JavaScript to click buttons, fill inputs, and read state. Flow:
1. Runtime.evaluate to get page state (title, buttons list)
2. Runtime.evaluate to click New Session button via DOM
3. Runtime.evaluate to fill chat input with test message
4. Runtime.evaluate to click Send button
5. Wait 15s for Copilot response
6. Runtime.evaluate to read message state

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
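A Runtime.evaluate call against that endpoint might be built as in the sketch below. `cdp_payload` is a hypothetical name, and this `cdp_eval` is not necessarily the helper the PR adds. The message shape (`id`, `method`, `params.expression`, `params.returnByValue`) follows the Chrome DevTools Protocol, but whether the DevFlow agent expects exactly this body is an assumption. JSON is built with python3 `json.dumps`, as a later commit in this PR does, so that quotes inside the JavaScript expression are escaped correctly.

```shell
#!/usr/bin/env bash
# Sketch only: cdp_payload is a hypothetical name; the request body the agent
# accepts on /api/cdp is assumed to be a plain CDP message.

# cdp_payload EXPRESSION: print a Runtime.evaluate message as JSON.
cdp_payload() {
  local expression="$1"
  # The expression is passed as an argument, so json.dumps handles all
  # quoting and escaping instead of shell string interpolation.
  python3 -c '
import json, sys
print(json.dumps({
    "id": 1,
    "method": "Runtime.evaluate",
    "params": {"expression": sys.argv[1], "returnByValue": True},
}))
' "$expression"
}

# cdp_eval PORT EXPRESSION: POST the message to the agent and print the reply.
cdp_eval() {
  local port="$1" expression="$2"
  cdp_payload "$expression" | curl --silent --fail --max-time 10 \
    --header 'Content-Type: application/json' \
    --data-binary @- "http://localhost:${port}/api/cdp"
}
```

For example, `cdp_eval 9223 'document.title'` would ask the page for its title, assuming the agent is listening on 9223 as in the earlier runs.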

Previous run successfully clicked 'Create something new' but it opened a popover menu. Now does the full two-step flow:
1. Click '+' button to open popover
2. Click 'Session' option in the popover
3. Find and fill session name input
4. Click Create/Submit button
5. Verify session appears

Also adds cdp_eval helper function for cleaner CDP calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

After session creation (confirmed working), now:
7. Click session item to expand/select it
8. Find visible chat input (.card-input input/textarea)
9. Fill with test message + dispatch input/change events
10. Click .send-btn or dispatch Enter keydown
11. Wait 20s for Copilot response
12. Check for response messages in DOM

Multiple screenshots captured at each stage for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


PureWeen · 2026-04-22T04:00:13Z

🔍 Code Review — PR #615

feat: PolyPilot CI integration test workflow
CI Status: ✅ All 3 checks passing (Linux, Mac Catalyst, Windows)

All 8 findings from Round 1 — FIXED ✅

#	Finding	Status
1	Missing if guard on CDP step	✅ Fixed
2	Windows artifact path mismatch	✅ Fixed
3	sleep 30 → poll loop	✅ Fixed
4	Unnecessary token on CDP step	✅ Fixed
5	or-true swallowing failures	✅ Fixed
6	Fragile cdp_eval interpolation	✅ Fixed
7	Empty binary guard	✅ Fixed
8	Hardcoded Xcode	✅ Fixed

Verified Clean

Token security ✅
GTK DI registrations ✅
CDP injection safety ✅
Cleanup ✅

Recommendation

✅ Approve — all findings addressed, CI green on all 3 platforms.

Addresses all review findings from PR #615:
1. Add missing if: guard on CDP session-creation step
2. Fix Windows artifact paths: $env:TEMP → $env:RUNNER_TEMP
3. Remove unnecessary COPILOT_GITHUB_TOKEN on CDP step
4. Add explicit guard for empty binary path
5. Use default Xcode instead of hardcoded versions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3. Replace sleep 30 with poll loop (up to 60s) on Linux and Mac; checks both process health and DevFlow agent readiness
5. Tool install: verify maui --version on failure instead of || true
6. cdp_eval: use python3 json.dumps for safe JSON construction instead of fragile shell string interpolation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
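A poll loop of the kind item 3 describes, checking both process health and agent readiness, could be sketched like this. `wait_for_agent` is a hypothetical name and the readiness probe is passed in as a function, since the commit message does not say how readiness is detected.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_agent is a hypothetical helper name.

# wait_for_agent PID PROBE [TIMEOUT_SECONDS]
# Returns 0 when PROBE succeeds, 2 if the process exits first,
# and 1 if TIMEOUT_SECONDS (default 60) elapse.
wait_for_agent() {
  local pid="$1" probe="$2" timeout="${3:-60}"
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # kill -0 sends no signal; it only checks that the process still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "App process $pid exited before the agent became ready" >&2
      return 2
    fi
    if "$probe"; then
      echo "Agent ready after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "Timed out after ${timeout}s waiting for the agent" >&2
  return 1
}
```

Compared with a fixed `sleep 30`, this returns as soon as the agent answers and distinguishes a crashed app from a slow one, which makes the failure message in the job log less misleading.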

The linker fails with MT0180 when Xcode version doesn't match the SDK. ValidateXcodeVersion=false bypasses the initial check but the linker still fails. MtouchLink=SdkOnly avoids linking against new APIs that require the newer SDK headers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen and others added 3 commits April 18, 2026 13:30

Add push trigger for PR branch testing

3cdc2bc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot requested changes Apr 18, 2026


PureWeen and others added 21 commits April 18, 2026 14:14

Broaden push trigger paths to include GTK and PolyPilot source

a760c25

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix Mac Catalyst Xcode version check and Windows $pid cleanup

09fb7f2


Fix: macOS has no timeout command — use perl alarm

0e781c3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
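The perl alarm workaround named in this commit title can be wrapped so the same script runs on Linux and macOS. This is a sketch under the assumption that perl is available on the runner; `run_with_timeout` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_timeout is a hypothetical helper name.

# run_with_timeout SECONDS COMMAND...: run COMMAND, stopping it after SECONDS.
# Uses timeout(1) when present and falls back to perl's alarm otherwise.
run_with_timeout() {
  local seconds="$1"; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$seconds" "$@"
  else
    # alarm schedules SIGALRM; exec replaces perl with the command while the
    # pending alarm stays attached to the process.
    perl -e 'alarm shift; exec @ARGV' "$seconds" "$@"
  fi
}
```

The exit status on timeout differs between the two branches (timeout(1) reports 124, while a process killed by SIGALRM reports a signal exit), so callers should test only for non-zero.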

Trigger CI run

3e50120

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Trigger CI run

39bb1ac

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen and others added 3 commits April 21, 2026 23:01

PureWeen marked this pull request as ready for review April 22, 2026 04:51

PureWeen merged commit 60db6de into main Apr 22, 2026
3 checks passed

PureWeen deleted the feature/polypilot-ci-integration branch April 22, 2026 04:51

github-actions Bot mentioned this pull request Apr 22, 2026

[review-retro] Review Retrospective — PR #615 #700

Closed

PureWeen merged 27 commits into main from feature/polypilot-ci-integration
