feat: PolyPilot CI integration test workflow (Linux/GTK + MauiDevFlow) #615

Merged
PureWeen merged 27 commits into main from feature/polypilot-ci-integration
Apr 22, 2026

@PureWeen
Owner

Goal

A GitHub Action that builds and runs PolyPilot on Linux/GTK with MauiDevFlow for automated UI inspection and end-to-end testing.

Architecture

ubuntu-latest → Build PolyPilot.Gtk (Debug) → xvfb → MauiDevFlow agent → smoke tests

Current State

  • Workflow file created
  • Build passes in CI
  • App launches under xvfb
  • MauiDevFlow connects
  • Smoke tests pass
  • Copilot CLI integration
  • End-to-end scenario

DO NOT MERGE — work in progress

PureWeen and others added 3 commits April 18, 2026 13:30
Builds PolyPilot.Gtk in Debug mode, launches under xvfb virtual
display, and runs MauiDevFlow smoke tests:
- Visual tree inspection
- Screenshot capture
- CDP/Blazor WebView status
- Application log capture

Triggered by workflow_dispatch with optional issue_number input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The first run showed the app launches successfully but MauiDevFlow
agent wasn't connecting. Added:
- Explicit broker start before app launch
- Port scanning fallback (9223-9230) if broker discovery fails
- Direct --agent-port on all smoke test commands
- Stderr capture from app launch
- Diagnostics step when agent not found (ps, ss, stderr log)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
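
The port-scanning fallback described in this commit could look roughly like the sketch below. It is not the workflow's actual step: `probe_agent` and `find_agent_port` are hypothetical helper names, and the `/api/status` route is borrowed from a later commit in this PR on the assumption that the agent answers plain HTTP there.

```shell
# Hypothetical probe: succeeds if something answers HTTP on the given port.
# Assumes the agent exposes /api/status (route named in a later commit).
probe_agent() {
  curl --silent --fail --max-time 2 "http://localhost:$1/api/status" >/dev/null
}

# Print the first port in [start, end] for which the probe command succeeds.
# Returns non-zero if no port in the range answers.
find_agent_port() {
  local probe="$1" port="$2" end="$3"
  while [ "$port" -le "$end" ]; do
    if "$probe" "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}
```

In a workflow step this might be called as `AGENT_PORT=$(find_agent_port probe_agent 9223 9230) || exit 1`, so that a missing agent fails the step immediately instead of surfacing later as a timeout.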

@github-actions github-actions Bot left a comment


Expert Code Review — PR #615

Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement. Single-reviewer findings were challenged via follow-up and included only if 2+ ultimately agreed.

Findings Summary

| # | Severity | Consensus | Issue |
|---|----------|-----------|-------|
| 1 | 🟡 MODERATE | 3/3 | Binary name mismatch (PolyPilot.Gtk vs PolyPilot) + missing empty-binary guard |
| 2 | 🟡 MODERATE | 2/3 | Unquoted ${{ }} expression in run: block; breakage on empty/spaces |
| 3 | 🟡 MODERATE | 3/3 | `\|\| true` silently swallows tool install failure |
| 4 | 🟡 MODERATE | 2/3 | Wrong MAUI workload (maui-android instead of maui for GTK target) |
| 5 | 🟡 MODERATE | 3/3 | All smoke tests swallow failures; workflow always reports green |
| 6 | 🟡 MODERATE | 3/3 | No permissions: block; inherits excessive write-all token scope |
| 7 | 🟢 MINOR | 3/3 | issue_number and scenario inputs declared but never used |
| 8 | 🟢 MINOR | 2/3 | Stale action versions (@v4 vs @v6/@v7 used in build.yml) |

Key Interaction Chain

Findings 1, 2, and 3 compound into a complete workflow failure: the binary name mismatch (#1) guarantees the primary find always misses → the missing guard lets an empty value through → the unquoted expression (#2) silently backgrounds nothing → MauiDevFlow wait times out 60s later with a misleading error. Meanwhile, the wrong workload (#4) may fail the build entirely, and || true on tool install (#3) ensures that failure is also invisible.

Even after fixing these correctness issues, the swallowed smoke test failures (#5) mean the workflow would report false-positive green runs, defeating the purpose of integration testing.

Discarded Findings (no consensus)

  • /tmp screenshot path — 1/3; correctly identified as standard CI runner practice, not applicable to the agent-workspace /tmp prohibition
  • Wrong package ID (Microsoft.Maui.Cli) — 1/3; verified incorrect — Microsoft.Maui.Cli is the correct package that wraps maui devflow
  • Manual-only trigger — 1/3; follow-up reviewers unanimously disagreed — workflow_dispatch is intentional for heavyweight integration tests

CI Status

  • No CI checks have run on this PR (new workflow file only — no build targets to validate)

Test Coverage

  • N/A — this is a workflow file, not application code. No unit tests affected.

Prior Reviews

  • No prior human reviews found on this PR.

Generated by Expert Code Review (auto) for issue #615

- name: Find built binary
  id: find-binary
  run: |
    BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot.Gtk" -type f -executable | head -1)

🟡 MODERATE — Binary name mismatch: primary search always fails (Flagged by: 3/3 reviewers)

PolyPilot.Gtk.csproj declares <AssemblyName>PolyPilot</AssemblyName>, so the output binary is named PolyPilot, not PolyPilot.Gtk. This find will never match. The fallback on line 64 (PolyPilot*) is overly broad and may match .dll, .runtimeconfig.json, or other artifacts.

If both searches fail, BINARY is empty and line 79 becomes a bare & — a shell no-op that assigns a transient PID. kill -0 succeeds momentarily (subshell still in process table), printing a false "✅ PolyPilot is running" before failing 60s later at MauiDevFlow wait with a misleading error.

Suggested fix:

BINARY=$(find PolyPilot.Gtk/bin/Debug -name "PolyPilot" -type f -executable | head -1)
if [ -z "$BINARY" ]; then
  echo "❌ No executable binary found"; exit 1
fi

sleep 2

# Launch PolyPilot in background
${{ steps.find-binary.outputs.binary }} &

🟡 MODERATE — Unquoted expression interpolation (Flagged by: 2/3 reviewers)

GitHub Actions substitutes ${{ steps.find-binary.outputs.binary }} when it generates the step's script, before the shell sees it. If the resolved path is empty (see binary name finding above) or contains spaces, the shell splits it incorrectly. An empty value reduces this line to a bare &, backgrounding nothing but assigning a PID that briefly appears valid.

Suggested fix: Pass through an environment variable:

env:
  BINARY: ${{ steps.find-binary.outputs.binary }}
run: |
  if [ -z "$BINARY" ]; then echo "❌ No binary"; exit 1; fi
  "$BINARY" &
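
To illustrate the splitting problem, here is a small standalone sketch, not taken from the workflow. The path is a made-up example containing a space, and `count_args` is a hypothetical helper that only reports how many words the shell passed to it.

```shell
# Echo the number of arguments received.
count_args() { echo "$#"; }

# Hypothetical path containing a space, for illustration only.
BINARY="/tmp/my app/PolyPilot"

unquoted=$(count_args $BINARY)    # word splitting applies: two words
quoted=$(count_args "$BINARY")    # the path stays a single word

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
# prints "unquoted: 2 argument(s), quoted: 1 argument(s)"
```

An unquoted launch would try to run `/tmp/my` with `app/PolyPilot` as its argument, which is why the suggested fix routes the value through an environment variable and quotes it.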

run: |
  dotnet tool install --global Microsoft.Maui.Cli \
    --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
    --prerelease || true

🟡 MODERATE — || true silently swallows tool install failure (Flagged by: 3/3 reviewers)

If the MauiDevFlow CLI install fails (feed outage, version conflict, network error), the step exits 0 and the workflow proceeds. All subsequent maui devflow commands fail with "command not found" after burning 60s on the wait timeout, with a misleading "agent did not connect" error.

Suggested fix: Remove || true and add a verification:

dotnet tool install --global Microsoft.Maui.Cli \
  --add-source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet10/nuget/v3/index.json \
  --prerelease
command -v maui || { echo "❌ MauiDevFlow CLI not installed"; exit 1; }

sudo apt-get install -y libgtk-4-dev libadwaita-1-dev libwebkitgtk-6.0-dev xvfb

- name: Install MAUI workload
  run: dotnet workload install maui-android

🟡 MODERATE — Wrong MAUI workload for GTK/Linux target (Flagged by: 2/3 reviewers)

maui-android installs only Android SDK tooling. The GTK project targets net10.0 (plain .NET) and needs the base MAUI workload. The existing build.yml uses dotnet workload install maui for its non-Android jobs.

Suggested fix:

run: dotnet workload install maui

Comment on lines +109 to +137
- name: "Smoke Test: MAUI status"
  run: |
    export DISPLAY=:99
    echo "=== Agent Status ==="
    maui devflow MAUI status || echo "MAUI status failed"

- name: "Smoke Test: Visual tree"
  run: |
    export DISPLAY=:99
    echo "=== Visual Tree ==="
    maui devflow MAUI tree --depth 5 || echo "Tree inspection failed"

- name: "Smoke Test: Screenshot"
  run: |
    export DISPLAY=:99
    echo "=== Taking screenshot ==="
    maui devflow MAUI screenshot --output /tmp/polypilot-ci-screenshot.png || echo "Screenshot failed"

- name: "Smoke Test: CDP/Blazor status"
  run: |
    export DISPLAY=:99
    echo "=== CDP Status ==="
    maui devflow cdp status || echo "CDP status failed"

- name: "Smoke Test: Application logs"
  run: |
    export DISPLAY=:99
    echo "=== Recent logs ==="
    timeout 5 maui devflow MAUI logs || true

🟡 MODERATE — Smoke tests swallow all failures; CI always reports green (Flagged by: 3/3 reviewers)

Every smoke test uses || echo "... failed" or || true, exiting 0 regardless. A completely broken app produces a green checkmark. The only step that can fail the job is the MauiDevFlow wait — and that indicates infrastructure failure, not behavioral regression.

Suggested fix: Use continue-on-error: true at the step level for informational steps (logs) but let core validation steps (MAUI status, visual tree) fail the workflow:

- name: "Smoke Test: MAUI status"
  run: |
    export DISPLAY=:99
    maui devflow MAUI status

- name: "Smoke Test: Application logs"
  continue-on-error: true
  run: |
    export DISPLAY=:99
    timeout 5 maui devflow MAUI logs || true
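
The difference between the two patterns can be shown in isolation with the sketch below. `failing_check` is a hypothetical stand-in for a smoke-test command that fails; it is not the real `maui devflow` CLI.

```shell
# Stand-in for a failing smoke-test command (hypothetical; always fails).
failing_check() { return 1; }

# Pattern from the workflow under review: the fallback echo succeeds,
# so the step as a whole exits 0.
swallowed_step() { failing_check || echo "MAUI status failed"; }

# Pattern suggested above: the command's own exit code is the step's result.
strict_step() { failing_check; }

swallowed_rc=0; swallowed_step >/dev/null || swallowed_rc=$?
strict_rc=0;    strict_step || strict_rc=$?

echo "swallowed exit code: $swallowed_rc, strict exit code: $strict_rc"
# prints "swallowed exit code: 0, strict exit code: 1"
```

Only the strict form gives the runner a non-zero exit code to act on, which is what turns a broken app into a red check.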


jobs:
  integration-test:
    runs-on: ubuntu-latest

🟡 MODERATE — No permissions: block; inherits excessive token scope (Flagged by: 3/3 reviewers after follow-up)

Every job in the existing build.yml declares an explicit permissions: block. This workflow has none, inheriting the repository default (write-all for workflow_dispatch). The workflow only needs read access.

Suggested fix:

jobs:
  integration-test:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: read

Comment on lines +5 to +17
inputs:
  issue_number:
    description: 'Issue number to analyze (optional)'
    required: false
    type: number
  scenario:
    description: 'Test scenario to run (smoke, full)'
    required: false
    type: choice
    options:
      - smoke
      - full
    default: smoke

🟢 MINOR — Unused issue_number and scenario inputs (Flagged by: 3/3 reviewers)

Both inputs are declared but never referenced in any step. A user selecting scenario: full gets the same smoke tests — silently misleading.

Suggested fix: Remove both inputs until implemented, or wire scenario into conditional steps with if: inputs.scenario == 'full'.


steps:
  - name: Checkout code
    uses: actions/checkout@v4

🟢 MINOR — Stale action versions (Flagged by: 2/3 reviewers)

build.yml uses actions/checkout@v6 and actions/upload-artifact@v7. This workflow uses @v4 for both. Align with existing repo conventions.

Suggested fix: actions/checkout@v6 (line 26), actions/upload-artifact@v7 (line 141).

PureWeen and others added 21 commits April 18, 2026 14:14
Previous run showed app running but DevFlow agent not listening.
- Increase startup wait from 15s to 30s for GTK+WebView+DevFlow init
- Capture stdout (agent may log port there)
- Search for .mauidevflow file
- Scan broader port range including 5000/5001/8080
- Remove broken broker start step
- Upload app logs as artifacts for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The app crashed with 'Unable to resolve CodespaceService'. The GTK
MauiProgram.cs was missing 5 service registrations that were added
to the main MAUI project but not synced to GTK:
- CodespaceService
- ScheduledTaskService
- PrLinkService
- AuditLogService
- EfficiencyAnalysisService

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The maui devflow CLI fails with 'Cannot connect to agent' even
though curl confirms the agent responds on port 9223. The CLI
may require broker registration that doesn't work in CI.

Switched to direct curl calls to the agent's HTTP API endpoints:
- /api/status, /api/maui/tree, /api/maui/screenshot
- /api/cdp/status, /api/maui/logs
Also kept CLI attempts with --verbose for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The /api/maui/tree and /api/cdp/status routes returned 'Route not
found'. Need to discover the actual API routes the agent exposes.
This commit scans ~25 common route patterns and reports which ones
respond with non-404 status codes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
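
A route scan of the kind this commit describes might be sketched as follows. The helper names `http_status` and `scan_routes` are hypothetical, and the candidate routes are only the ones already named in these commit messages; the real step's route list is not shown in this conversation.

```shell
# Print the HTTP status code for a URL; curl reports 000 when the connection fails.
http_status() {
  curl --silent --output /dev/null --max-time 3 --write-out '%{http_code}' "$1" || true
}

# For each candidate route, print "<code> <route>" unless the agent answered 404.
# Args: status command name, base URL, then the routes to try.
scan_routes() {
  local status_cmd="$1" base="$2" route code
  shift 2
  for route in "$@"; do
    code=$("$status_cmd" "$base$route")
    if [ "$code" != "404" ]; then
      echo "$code $route"
    fi
  done
}
```

A call such as `scan_routes http_status http://localhost:9223 /api/status /api/maui/tree /api/cdp/status` would then list only the routes that did not come back as 404. Note that a refused connection shows up as `000`, so it is listed too.
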
Three parallel platform jobs:
- Linux/GTK (ubuntu-latest) — working with xvfb + DevFlow on 9223
- Mac Catalyst (macos-15) — native macOS runner, no virtual display
- Windows (windows-latest) — WinUI, PowerShell-based steps

All use the same pattern: build Debug → launch → port scan for
DevFlow agent → status/tree/screenshot smoke tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Mac: Add -p:ValidateXcodeVersion=false (macos-15 has Xcode 26.2,
  SDK wants 26.3)
- Windows: Rename $pid to $appPid ($PID is a read-only automatic variable in PowerShell)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Passes COPILOT_GITHUB_TOKEN via env (masked by GitHub Actions).
PolyPilot's ServerManager auto-reads this env var and forwards it
to the headless copilot process — no manual CLI install needed.

New steps:
- Check for copilot headless server on port 4321
- Verify copilot process running
- Check PID file and app logs for server startup
- CDP snapshot of Blazor UI to verify session state
- Post-startup screenshot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iterating toward creating a Copilot session in CI:
- Scan all DevFlow API routes to discover interaction endpoints
- Parse visual tree to find clickable buttons (New Session, etc.)
- Try POST-based tap/navigate/evaluate routes
- Attempt JS evaluation via DevFlow for Blazor UI interaction
- Multiple screenshots (before/after) for visual debugging
- App log capture for diagnostics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run confirmed:
- Copilot headless server running on port 4321
- Authenticated as PureWeen via env token
- 15 models loaded
- Persistent mode connected

Now trying to create a session via the bundled copilot CLI binary
connecting to the running server. Also simplified the DevFlow
verification to focus on session creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous attempt used non-existent CLI flags. This run:
- Checks copilot --help to discover actual args
- Tries stdin/stdout ACP protocol connection
- Dumps all session-related app logs
- Gets final screenshot for visual verification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
copilot --help revealed -p/--prompt flag for non-interactive
scripting. Use this to create a session and send a test message
directly via the bundled copilot CLI binary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run: copilot -p produced no output (--no-edit invalid).
This run: removed --no-edit, added tee to capture full output,
added grep check for expected response, log full output on failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key fix: use -ap flag (not --agent-port) for direct DevFlow
connection without broker. This enables full CLI + CDP interaction.

New flow:
1. Verify CLI connects with -ap
2. Check Blazor UI state via CDP (document.title, session count)
3. Click New Session button via JS DOM manipulation
4. Fill chat input with test message
5. Click Send button
6. Wait 15s for response
7. Verify session state and capture screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CLI -ap flag doesn't work in CI (different CLI version or transport).
Bypass CLI entirely — scan all HTTP API endpoints including POST
methods. Need to find tap/fill/click/evaluate routes via HTTP.

Also tries /api/query with different parameter formats.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Found that POST /api/cdp accepts Chrome DevTools Protocol messages.
This is the key to driving the Blazor UI — Runtime.evaluate lets us
execute JavaScript to click buttons, fill inputs, and read state.

Flow:
1. Runtime.evaluate to get page state (title, buttons list)
2. Runtime.evaluate to click New Session button via DOM
3. Runtime.evaluate to fill chat input with test message
4. Runtime.evaluate to click Send button
5. Wait 15s for Copilot response
6. Runtime.evaluate to read message state

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run successfully clicked 'Create something new' but it
opened a popover menu. Now does the full two-step flow:
1. Click '+' button to open popover
2. Click 'Session' option in the popover
3. Find and fill session name input
4. Click Create/Submit button
5. Verify session appears

Also adds cdp_eval helper function for cleaner CDP calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
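
The helper itself is not shown in this conversation. One way a `cdp_eval` function could be written is sketched below, building the request body with python3's `json.dumps` so that quotes in the JavaScript expression cannot break the JSON. The message shape is the standard CDP `Runtime.evaluate` call; that `POST /api/cdp` accepts exactly this body, the `AGENT_PORT` variable, and the `cdp_payload` helper name are all assumptions.

```shell
# Build a Runtime.evaluate message as JSON; the expression is passed as an
# argument to python3, so no shell string interpolation touches the JSON.
cdp_payload() {
  python3 -c 'import json, sys
print(json.dumps({"id": 1, "method": "Runtime.evaluate",
                  "params": {"expression": sys.argv[1], "returnByValue": True}}))' "$1"
}

# POST the message to the agent. The endpoint path comes from the commit above;
# AGENT_PORT is a hypothetical variable defaulting to 9223.
cdp_eval() {
  curl --silent --fail --max-time 10 -X POST \
    -H 'Content-Type: application/json' \
    --data "$(cdp_payload "$1")" \
    "http://localhost:${AGENT_PORT:-9223}/api/cdp"
}
```

For example, `cdp_eval 'document.title'` would ask the WebView for the page title, and an expression containing double quotes is escaped by `json.dumps` rather than by hand.
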
After session creation (confirmed working), now:
7. Click session item to expand/select it
8. Find visible chat input (.card-input input/textarea)
9. Fill with test message + dispatch input/change events
10. Click .send-btn or dispatch Enter keydown
11. Wait 20s for Copilot response
12. Check for response messages in DOM

Multiple screenshots captured at each stage for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Owner Author

PureWeen commented Apr 22, 2026

🔍 Code Review — PR #615

feat: PolyPilot CI integration test workflow
CI Status: ✅ All 3 checks passing (Linux, Mac Catalyst, Windows)


All 8 findings from Round 1 — FIXED ✅

| # | Finding | Status |
|---|---------|--------|
| 1 | Missing if guard on CDP step | ✅ Fixed |
| 2 | Windows artifact path mismatch | ✅ Fixed |
| 3 | sleep 30 → poll loop | ✅ Fixed |
| 4 | Unnecessary token on CDP step | ✅ Fixed |
| 5 | or-true swallowing failures | ✅ Fixed |
| 6 | Fragile cdp_eval interpolation | ✅ Fixed |
| 7 | Empty binary guard | ✅ Fixed |
| 8 | Hardcoded Xcode | ✅ Fixed |

Verified Clean

  • Token security ✅
  • GTK DI registrations ✅
  • CDP injection safety ✅
  • Cleanup ✅

Recommendation

Approve — all findings addressed, CI green on all 3 platforms.

PureWeen and others added 3 commits April 21, 2026 23:01
Addresses all review findings from PR #615:
1. Add missing if: guard on CDP session-creation step
2. Fix Windows artifact paths: $env:TEMP → $env:RUNNER_TEMP
3. Remove unnecessary COPILOT_GITHUB_TOKEN on CDP step
4. Add explicit guard for empty binary path
5. Use default Xcode instead of hardcoded versions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3. Replace sleep 30 with poll loop (up to 60s) on Linux and Mac —
   checks both process health and DevFlow agent readiness
5. Tool install: verify maui --version on failure instead of || true
6. cdp_eval: use python3 json.dumps for safe JSON construction
   instead of fragile shell string interpolation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
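
A poll loop that checks both process health and agent readiness could look like the sketch below. It is not the committed step: `wait_for_agent` is a hypothetical name, and the readiness probe is passed in as a command so the loop does not depend on any particular agent API.

```shell
# Poll until the probe succeeds, the app process dies, or the deadline passes.
# Args: app PID, probe command name, timeout in seconds, poll interval in seconds.
# Returns 0 when ready, 1 on timeout, 2 if the app exited first.
wait_for_agent() {
  local pid="$1" probe="$2" timeout="$3" interval="$4" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "app exited before the agent was ready" >&2
      return 2
    fi
    if "$probe"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for the agent" >&2
  return 1
}
```

Compared with a fixed `sleep 30`, this returns as soon as the agent answers and fails fast with a distinct exit code when the app has crashed, so the log points at the real cause.
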
The linker fails with MT0180 when Xcode version doesn't match
the SDK. ValidateXcodeVersion=false bypasses the initial check
but the linker still fails. MtouchLink=SdkOnly avoids linking
against new APIs that require the newer SDK headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen marked this pull request as ready for review April 22, 2026 04:51
@PureWeen PureWeen merged commit 60db6de into main Apr 22, 2026
3 checks passed
@PureWeen PureWeen deleted the feature/polypilot-ci-integration branch April 22, 2026 04:51