Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
256 changes: 256 additions & 0 deletions .github/integration-tests/scheduled-tasks.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
#!/usr/bin/env bash
# Scheduled Tasks Integration Tests
# Runs against a live PolyPilot app with MauiDevFlow agent connected.
# Usage: ./scheduled-tasks.sh <devflow-port> [--no-cleanup]
#
# Tests:
# 1. Navigate to /scheduled-tasks page
# 2. Create a new interval task via the form
# 3. Verify the task card appears with correct details
# 4. Run the task immediately (Run Now)
# 5. Verify run history shows success
# 6. Disable/enable the task
# 7. Delete the task
# 8. Verify form validation (invalid cron)
Comment on lines +7 to +14
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Header lists 8 tests but only 6 are implemented · Flagged by: 3/3 reviewers

The header claims tests for "Run the task immediately (Run Now)" (#4) and "Verify run history shows success" (#5), but neither is implemented. The actual tests are:

  1. Navigate to /scheduled-tasks
  2. Create interval task
  3. Verify task card details
  4. Toggle disable/enable
  5. Form validation (invalid cron)
  6. Delete task

This gives false confidence that "Run Now" and run-history paths have integration coverage when they don't.

Suggestion: Update the header to match the 6 implemented tests, or add the missing "Run Now" and run-history assertions.

#
# Requires: curl, jq

set -euo pipefail

PORT="${1:?Usage: $0 <devflow-port> [--no-cleanup]}"
NO_CLEANUP="${2:-}"
BASE="http://localhost:$PORT"
PASS=0
FAIL=0
ERRORS=""

log() { echo " $1"; }
pass() { PASS=$((PASS + 1)); echo " ✅ $1"; }
fail() { FAIL=$((FAIL + 1)); ERRORS="${ERRORS}\n ❌ $1"; echo " ❌ $1"; }

# Helper: evaluate JS in the Blazor WebView via CDP
cdp_eval() {
local expr="$1"
local result
result=$(curl -s --max-time 10 "$BASE/api/cdp/evaluate" \
-H "Content-Type: application/json" \
-d "{\"expression\": \"$expr\"}" 2>/dev/null) || true
echo "$result"
}
Comment on lines +35 to +39
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — cdp_eval builds JSON via string interpolation (latent injection risk) · Flagged by: 2/3 reviewers

-d "{\"expression\": \"$expr\"}" embeds $expr directly into JSON. Characters like ", \, or newlines in the expression would produce malformed JSON and a silent CDP failure (masked by || true).

Current callers are safe — all values are simple alphanumeric strings. However, this is a maintenance hazard: a future test with a value like He said "hi" or C:\path would silently break.

Suggestion: Use jq to construct the JSON payload safely:

cdp_eval() {
  local expr="$1"
  local payload
  payload=$(jq -n --arg e "$expr" '{"expression": $e}')
  curl -s --max-time 10 "$BASE/api/cdp/evaluate" \
    -H "Content-Type: application/json" \
    -d "$payload" 2>/dev/null || true
}


# Helper: click an element by CSS selector
cdp_click() {
local selector="$1"
cdp_eval "document.querySelector('$selector')?.click()"
}

# Helper: set input value and trigger change
cdp_set_value() {
local selector="$1"
local value="$2"
cdp_eval "(() => { const el = document.querySelector('$selector'); if (el) { el.value = '$value'; el.dispatchEvent(new Event('input', {bubbles:true})); el.dispatchEvent(new Event('change', {bubbles:true})); } return el ? 'ok' : 'not found'; })()"
}

# Helper: select dropdown option
Comment on lines +51 to +54
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — cdp_set_value / cdp_select embed $value into single-quoted JS · Flagged by: 2/3 reviewers

el.value = '$value' interpolates the value inside a JS single-quoted string. If any caller passes a value containing ' (e.g., it's a test), the JavaScript becomes syntactically invalid and the CDP call silently fails.

Current callers use safe values, but this is a general-purpose helper that invites reuse with arbitrary strings.

Suggestion: Escape single quotes before interpolation, or pass the value via a JS template literal/variable assignment pattern that avoids inline string quoting issues.

cdp_select() {
local selector="$1"
local value="$2"
cdp_eval "(() => { const el = document.querySelector('$selector'); if (el) { el.value = '$value'; el.dispatchEvent(new Event('change', {bubbles:true})); } return el ? 'ok' : 'not found'; })()"
}

# Helper: check element exists
cdp_exists() {
local selector="$1"
local result
result=$(cdp_eval "document.querySelector('$selector') !== null")
echo "$result" | jq -r '.result.value // false' 2>/dev/null
}

# Helper: get text content
cdp_text() {
Comment on lines +67 to +70
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — set -euo pipefail + jq pipe causes silent script abort on CDP failure · Flagged by: 2/3 reviewers

If the CDP endpoint returns empty/non-JSON (network hiccup, app restart, timeout), cdp_eval returns empty (curl has || true). Then echo "" | jq -r '.result.value // false' 2>/dev/null → jq fails on invalid JSON → with pipefail, the pipe returns non-zero. When called in a bare assignment like DISABLED=$(cdp_exists "...") (line 155), set -e kills the entire script instantly — no summary, no test results, cryptic CI output.

The same issue affects cdp_text (line 76).

Suggestion: Guard against empty/invalid JSON:

echo "$result" | jq -r '.result.value // false' 2>/dev/null || echo "false"

And for cdp_text:

echo "$result" | jq -r '.result.value // empty' 2>/dev/null || true

local selector="$1"
local result
result=$(cdp_eval "document.querySelector('$selector')?.textContent?.trim()")
echo "$result" | jq -r '.result.value // empty' 2>/dev/null
}

# Helper: wait for element to appear
wait_for() {
local selector="$1"
local desc="${2:-$selector}"
local timeout="${3:-15}"
for i in $(seq 1 "$timeout"); do
if [ "$(cdp_exists "$selector")" = "true" ]; then
return 0
fi
sleep 1
done
return 1
}

echo "═══════════════════════════════════════════"
echo " Scheduled Tasks Integration Tests"
echo " DevFlow agent: $BASE"
echo "═══════════════════════════════════════════"
echo ""

# Verify agent is connected
echo "▸ Verifying DevFlow agent..."
STATUS=$(curl -s --max-time 5 "$BASE/api/status" 2>/dev/null)
if [ -z "$STATUS" ]; then
echo " ❌ DevFlow agent not reachable at $BASE"
exit 1
fi
pass "DevFlow agent connected"
echo ""

# ─── Test 1: Navigate to Scheduled Tasks page ───
echo "▸ Test 1: Navigate to /scheduled-tasks"
cdp_eval "window.location.href = '/scheduled-tasks'"
sleep 2
if wait_for "#scheduled-tasks-page" "scheduled tasks page" 10; then
pass "Navigated to /scheduled-tasks page"
else
fail "Could not navigate to /scheduled-tasks page"
Comment on lines +111 to +114
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — No early bail-out on critical setup failures → cascading test noise · Flagged by: 2/3 reviewers

If Test 1 (navigation) or Test 2 (task creation) fails, fail() increments the counter but execution continues. All subsequent tests depend on the page being loaded and the task existing — they'll all fail too, producing 5+ cascading failures that obscure the root cause in CI logs.

Suggestion: After critical setup steps, bail out early:

if ! wait_for "#scheduled-tasks-page" ... 10; then
  fail "Could not navigate — aborting remaining tests"
  # print summary and exit
  exit $FAIL
fi

fi
echo ""

# ─── Test 2: Create a new task ───
echo "▸ Test 2: Create a new scheduled task"
cdp_click "#scheduled-task-new"
sleep 1

if wait_for "#scheduled-task-form" "task form" 5; then
pass "Task creation form opened"
else
fail "Task creation form did not open"
fi

TASK_NAME="CI-Test-$(date +%s)"
cdp_set_value "#scheduled-task-name" "$TASK_NAME"
cdp_set_value "#scheduled-task-prompt" "echo hello from integration test"
cdp_select "#scheduled-task-schedule" "Interval"
sleep 0.5
cdp_set_value "#scheduled-task-interval" "60"

cdp_click "#scheduled-task-save"
sleep 2

if wait_for ".task-card[data-task-name='$TASK_NAME']" "task card" 10; then
pass "Task '$TASK_NAME' created and visible"
else
fail "Task card did not appear after creation"
fi
echo ""

# ─── Test 3: Verify task card details ───
echo "▸ Test 3: Verify task card content"
SCHEDULE_TEXT=$(cdp_text ".task-card[data-task-name='$TASK_NAME'] .task-schedule")
if echo "$SCHEDULE_TEXT" | grep -qi "60 min\|1 hour\|every"; then
pass "Schedule description correct: '$SCHEDULE_TEXT'"
else
fail "Unexpected schedule description: '$SCHEDULE_TEXT'"
fi

PROMPT_TEXT=$(cdp_text ".task-card[data-task-name='$TASK_NAME'] .task-prompt-preview")
if echo "$PROMPT_TEXT" | grep -qi "hello"; then
Comment on lines +153 to +156
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Fixed sleep instead of polling creates race conditions · Flagged by: 3/3 reviewers

Multiple test steps use bare sleep 1 followed by an immediate state check, rather than the wait_for polling pattern used elsewhere in the script. On loaded CI runners, Blazor re-renders can exceed 1 second, causing flaky false negatives.

Affected locations:

  • Line 153: toggle disable check (sleep 1 → check .disabled class)
  • Line 161: toggle re-enable check
  • Line 190: validation error check after form submit
  • Line 183: Test 5 form readiness (uses sleep 1 instead of wait_for "#scheduled-task-form" like Test 2 does)

Suggestion: Replace bare sleep + immediate assertion with wait_for:

cdp_click "... [data-task-action='toggle-enabled']"
if wait_for ".task-card[data-task-name='$TASK_NAME'].disabled" "task disabled" 5; then
  pass "Task disabled"
else
  fail "Task not visually disabled after toggle"
fi

pass "Prompt preview shows task prompt"
else
fail "Prompt preview unexpected: '$PROMPT_TEXT'"
fi
echo ""

# ─── Test 4: Disable/Enable toggle ───
echo "▸ Test 4: Toggle task enabled/disabled"
cdp_click ".task-card[data-task-name='$TASK_NAME'] [data-task-action='toggle-enabled']"
sleep 1

DISABLED=$(cdp_exists ".task-card[data-task-name='$TASK_NAME'].disabled")
if [ "$DISABLED" = "true" ]; then
pass "Task disabled (has .disabled class)"
else
fail "Task not visually disabled after toggle"
fi

# Re-enable
cdp_click ".task-card[data-task-name='$TASK_NAME'] [data-task-action='toggle-enabled']"
sleep 1

ENABLED=$(cdp_exists ".task-card[data-task-name='$TASK_NAME']:not(.disabled)")
if [ "$ENABLED" = "true" ]; then
pass "Task re-enabled"
else
fail "Task not re-enabled after second toggle"
fi
echo ""

# ─── Test 5: Form validation (invalid cron) ───
echo "▸ Test 5: Form validation — invalid cron"
cdp_click "#scheduled-task-new"
sleep 1

cdp_set_value "#scheduled-task-name" "invalid-cron-test"
cdp_set_value "#scheduled-task-prompt" "test"
cdp_select "#scheduled-task-schedule" "Cron"
sleep 0.5
cdp_set_value "#scheduled-task-cron" "not a valid cron"
cdp_click "#scheduled-task-save"
sleep 1

ERROR_TEXT=$(cdp_text "#scheduled-task-form-error")
if echo "$ERROR_TEXT" | grep -qi "invalid\|cron\|error"; then
pass "Validation error shown: '$ERROR_TEXT'"
else
fail "No validation error for invalid cron: '$ERROR_TEXT'"
fi

# Cancel form
cdp_click "#scheduled-task-cancel"
sleep 0.5

# Verify invalid task was NOT created
INVALID_EXISTS=$(cdp_exists ".task-card[data-task-name='invalid-cron-test']")
if [ "$INVALID_EXISTS" = "false" ]; then
pass "Invalid task was not created"
else
fail "Invalid task was created despite validation error"
fi
echo ""

# ─── Test 6: Delete task ───
echo "▸ Test 6: Delete the test task"
if [ "$NO_CLEANUP" != "--no-cleanup" ]; then
cdp_click ".task-card[data-task-name='$TASK_NAME'] [data-task-action='delete']"
sleep 1

if wait_for ".task-card[data-task-name='$TASK_NAME'] .delete-confirm-bar" "delete confirmation" 5; then
pass "Delete confirmation bar appeared"
else
fail "Delete confirmation bar did not appear"
fi

cdp_click ".task-card[data-task-name='$TASK_NAME'] [data-task-action='confirm-delete']"
sleep 2

DELETED=$(cdp_exists ".task-card[data-task-name='$TASK_NAME']")
if [ "$DELETED" = "false" ]; then
pass "Task deleted successfully"
else
fail "Task still visible after deletion"
fi
else
log "Skipping cleanup (--no-cleanup)"
fi
echo ""

# ─── Summary ───
echo "═══════════════════════════════════════════"
echo " Results: $PASS passed, $FAIL failed"
if [ $FAIL -gt 0 ]; then
echo ""
echo " Failures:"
echo -e "$ERRORS"
fi
echo "═══════════════════════════════════════════"

exit $FAIL
18 changes: 17 additions & 1 deletion .github/workflows/polypilot-integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,13 @@ on:
type: string
default: ''
scenario:
description: 'Test scenario to run (smoke, full)'
description: 'Test scenario to run (smoke, full, scheduled-tasks)'
required: false
type: choice
options:
- smoke
- full
- scheduled-tasks
default: smoke
push:
branches:
Expand Down Expand Up @@ -259,6 +260,13 @@ jobs:
echo "=== App stderr (if captured) ==="
cat /tmp/polypilot-stderr.log 2>/dev/null || echo "No stderr log"

- name: "Feature Test: Scheduled Tasks"
if: steps.devflow.outputs.agent_port != '' && (inputs.scenario == 'scheduled-tasks' || inputs.scenario == 'full')
run: |
PORT=${{ steps.devflow.outputs.agent_port }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — $\{\{ }} expression injected directly into run: shell script · Flagged by: 2/3 reviewers

PORT=$\{\{ steps.devflow.outputs.agent_port }} injects the step output directly into the shell script before bash evaluates it. While agent_port is numeric in practice, this is the canonical GitHub Actions expression injection pattern — if the output were ever manipulated, it would allow arbitrary command execution. The same pattern appears in the Windows job (line 676).

Suggestion: Pass through environment variables instead:

env:
  PORT: $\{\{ steps.devflow.outputs.agent_port }}
run: |
  echo "Running scheduled tasks integration tests against port $PORT"
  bash .github/integration-tests/scheduled-tasks.sh "$PORT"

echo "Running scheduled tasks integration tests against port $PORT"
bash .github/integration-tests/scheduled-tasks.sh "$PORT"

- name: Upload screenshot
if: always()
uses: actions/upload-artifact@v4
Expand Down Expand Up @@ -661,6 +669,14 @@ jobs:
$size = (Get-Item "$env:RUNNER_TEMP\polypilot-ci-screenshot.png").Length
Write-Host "✅ Screenshot saved ($size bytes)"

- name: "Feature Test: Scheduled Tasks"
if: steps.devflow.outputs.agent_port != '' && (inputs.scenario == 'scheduled-tasks' || inputs.scenario == 'full')
shell: bash
run: |
PORT=${{ steps.devflow.outputs.agent_port }}
echo "Running scheduled tasks integration tests against port $PORT"
bash .github/integration-tests/scheduled-tasks.sh "$PORT"

- name: Upload artifacts
if: always()
uses: actions/upload-artifact@v4
Expand Down
Loading