feat: scheduled-tasks integration test scenario#714
Conversation
Adds a bash test script that validates the scheduled tasks feature end-to-end via MauiDevFlow CDP: 1. Navigate to /scheduled-tasks page 2. Create a task via the form (name, prompt, interval) 3. Verify task card appears with correct details 4. Toggle disable/enable 5. Validate invalid cron rejection 6. Delete the task Wired into polypilot-integration.yml as 'scheduled-tasks' scenario option, runs on both Linux and Windows when selected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🟡 Design-Level Finding: Missing Mac Catalyst coverageFlagged by: 1/3 reviewers → verified by 2 follow-up reviewers (both agree) The "Feature Test: Scheduled Tasks" step was added to both the If a Blazor rendering or interaction bug is specific to macOS WebKit, it would go undetected. Suggestion: Add the same step block to the - name: "Feature Test: Scheduled Tasks"
if: steps.devflow.outputs.agent_port != '' && (inputs.scenario == 'scheduled-tasks' || inputs.scenario == 'full')
run: |
PORT=${{ steps.devflow.outputs.agent_port }}
bash .github/integration-tests/scheduled-tasks.sh "$PORT"
|
There was a problem hiding this comment.
Expert Code Review — PR #714: scheduled-tasks integration test scenario
Methodology: 3 independent reviewers with adversarial consensus. Findings required 2/3+ agreement to be included. Disputed findings (1/3) were verified via targeted follow-up review. Model names are anonymized as Reviewer 1/2/3.
Findings Summary (ranked by severity)
| # | Severity | Finding | Flagged by | Location |
|---|---|---|---|---|
| 1 | 🟡 MODERATE | ${{ }} expression injection in workflow run: blocks |
2/3 | polypilot-integration.yml:266,676 |
| 2 | 🟡 MODERATE | set -euo pipefail + jq pipe → silent script abort on CDP failure |
2/3 | scheduled-tasks.sh:68,76 |
| 3 | 🟡 MODERATE | Header lists 8 tests, only 6 implemented (missing "Run Now" and run-history) | 3/3 | scheduled-tasks.sh:7-14 |
| 4 | 🟡 MODERATE | Fixed sleep instead of wait_for polling → flaky race conditions |
3/3 | scheduled-tasks.sh:153,161,183,190 |
| 5 | 🟡 MODERATE | cdp_set_value/cdp_select embed values into single-quoted JS (fragile) |
2/3 | scheduled-tasks.sh:52-54 |
| 6 | 🟡 MODERATE | No scheduled-tasks step in Mac Catalyst job (platform gap) | 1/3 + 2 follow-ups agree | polypilot-integration.yml (design) |
| 7 | 🟢 MINOR | cdp_eval builds JSON via string interpolation (latent injection risk) |
2/3 | scheduled-tasks.sh:37 |
| 8 | 🟢 MINOR | No early bail-out on critical setup failure → cascading test noise | 2/3 | scheduled-tasks.sh:111-114 |
Discarded Findings (single reviewer only, not confirmed)
echo -eportability concern (1/3 — bash shebang makes this safe)exit $FAILexit code semantics (1/3 — max 12 tests, well under 255)- CDP endpoint path mismatch speculation (1/3 — no evidence of API difference)
- Windows
jqavailability (1/3 → 0/3 after follow-up —jqis pre-installed onwindows-latest)
CI Status
- Commit status:
pending(0 statuses reported) - Check runs: 6 total — 4 completed (all ✅ success), 2 in progress (
agentjobs) - No prior reviews on this PR
Assessment
This PR adds solid integration test scaffolding with well-structured helpers and good test coverage for the scheduled-tasks CRUD flow. The main concerns are around test reliability (race conditions from fixed sleeps, silent abort from pipefail + jq) and maintainability (fragile string interpolation in CDP helpers, misleading header comments). The workflow integration pattern should use environment variables instead of inline ${{ }} expressions per GitHub security best practices. None of the findings indicate data loss or regression risk to existing functionality — this is net-new test code.
Generated by Expert Code Review (auto) for issue #714 · ● 8.9M
| # 1. Navigate to /scheduled-tasks page | ||
| # 2. Create a new interval task via the form | ||
| # 3. Verify the task card appears with correct details | ||
| # 4. Run the task immediately (Run Now) | ||
| # 5. Verify run history shows success | ||
| # 6. Disable/enable the task | ||
| # 7. Delete the task | ||
| # 8. Verify form validation (invalid cron) |
There was a problem hiding this comment.
🟡 MODERATE — Header lists 8 tests but only 6 are implemented · Flagged by: 3/3 reviewers
The header claims tests for "Run the task immediately (Run Now)" (#4) and "Verify run history shows success" (#5), but neither is implemented. The actual tests are:
- Navigate to /scheduled-tasks
- Create interval task
- Verify task card details
- Toggle disable/enable
- Form validation (invalid cron)
- Delete task
This gives false confidence that "Run Now" and run-history paths have integration coverage when they don't.
Suggestion: Update the header to match the 6 implemented tests, or add the missing "Run Now" and run-history assertions.
| fi | ||
|
|
||
| PROMPT_TEXT=$(cdp_text ".task-card[data-task-name='$TASK_NAME'] .task-prompt-preview") | ||
| if echo "$PROMPT_TEXT" | grep -qi "hello"; then |
There was a problem hiding this comment.
🟡 MODERATE — Fixed sleep instead of polling creates race conditions · Flagged by: 3/3 reviewers
Multiple test steps use bare sleep 1 followed by an immediate state check, rather than the wait_for polling pattern used elsewhere in the script. On loaded CI runners, Blazor re-renders can exceed 1 second, causing flaky false negatives.
Affected locations:
- Line 153: toggle disable check (
sleep 1→ check.disabledclass) - Line 161: toggle re-enable check
- Line 190: validation error check after form submit
- Line 183: Test 5 form readiness (uses
sleep 1instead ofwait_for "#scheduled-task-form"like Test 2 does)
Suggestion: Replace bare sleep + immediate assertion with wait_for:
cdp_click "... [data-task-action='toggle-enabled']"
if wait_for ".task-card[data-task-name='$TASK_NAME'].disabled" "task disabled" 5; then
pass "Task disabled"
else
fail "Task not visually disabled after toggle"
fi| result=$(curl -s --max-time 10 "$BASE/api/cdp/evaluate" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d "{\"expression\": \"$expr\"}" 2>/dev/null) || true | ||
| echo "$result" | ||
| } |
There was a problem hiding this comment.
🟢 MINOR — cdp_eval builds JSON via string interpolation (latent injection risk) · Flagged by: 2/3 reviewers
-d "{\"expression\": \"$expr\"}" embeds $expr directly into JSON. Characters like ", \, or newlines in the expression would produce malformed JSON and a silent CDP failure (masked by || true).
Current callers are safe — all values are simple alphanumeric strings. However, this is a maintenance hazard: a future test with a value like He said "hi" or C:\path would silently break.
Suggestion: Use jq to construct the JSON payload safely:
cdp_eval() {
local expr="$1"
local payload
payload=$(jq -n --arg e "$expr" '{"expression": $e}')
curl -s --max-time 10 "$BASE/api/cdp/evaluate" \
-H "Content-Type: application/json" \
-d "$payload" 2>/dev/null || true
}| - name: "Feature Test: Scheduled Tasks" | ||
| if: steps.devflow.outputs.agent_port != '' && (inputs.scenario == 'scheduled-tasks' || inputs.scenario == 'full') | ||
| run: | | ||
| PORT=${{ steps.devflow.outputs.agent_port }} |
There was a problem hiding this comment.
🟡 MODERATE — $\{\{ }} expression injected directly into run: shell script · Flagged by: 2/3 reviewers
PORT=$\{\{ steps.devflow.outputs.agent_port }} injects the step output directly into the shell script before bash evaluates it. While agent_port is numeric in practice, this is the canonical GitHub Actions expression injection pattern — if the output were ever manipulated, it would allow arbitrary command execution. The same pattern appears in the Windows job (line 676).
Suggestion: Pass through environment variables instead:
env:
PORT: $\{\{ steps.devflow.outputs.agent_port }}
run: |
echo "Running scheduled tasks integration tests against port $PORT"
bash .github/integration-tests/scheduled-tasks.sh "$PORT"| } | ||
|
|
||
| # Helper: get text content | ||
| cdp_text() { |
There was a problem hiding this comment.
🟡 MODERATE — set -euo pipefail + jq pipe causes silent script abort on CDP failure · Flagged by: 2/3 reviewers
If the CDP endpoint returns empty/non-JSON (network hiccup, app restart, timeout), cdp_eval returns empty (curl has || true). Then echo "" | jq -r '.result.value // false' 2>/dev/null → jq fails on invalid JSON → with pipefail, the pipe returns non-zero. When called in a bare assignment like DISABLED=$(cdp_exists "...") (line 155), set -e kills the entire script instantly — no summary, no test results, cryptic CI output.
The same issue affects cdp_text (line 76).
Suggestion: Guard against empty/invalid JSON:
echo "$result" | jq -r '.result.value // false' 2>/dev/null || echo "false"And for cdp_text:
echo "$result" | jq -r '.result.value // empty' 2>/dev/null || true| if wait_for "#scheduled-tasks-page" "scheduled tasks page" 10; then | ||
| pass "Navigated to /scheduled-tasks page" | ||
| else | ||
| fail "Could not navigate to /scheduled-tasks page" |
There was a problem hiding this comment.
🟢 MINOR — No early bail-out on critical setup failures → cascading test noise · Flagged by: 2/3 reviewers
If Test 1 (navigation) or Test 2 (task creation) fails, fail() increments the counter but execution continues. All subsequent tests depend on the page being loaded and the task existing — they'll all fail too, producing 5+ cascading failures that obscure the root cause in CI logs.
Suggestion: After critical setup steps, bail out early:
if ! wait_for "#scheduled-tasks-page" ... 10; then
fail "Could not navigate — aborting remaining tests"
# print summary and exit
exit $FAIL
fi| cdp_eval "(() => { const el = document.querySelector('$selector'); if (el) { el.value = '$value'; el.dispatchEvent(new Event('input', {bubbles:true})); el.dispatchEvent(new Event('change', {bubbles:true})); } return el ? 'ok' : 'not found'; })()" | ||
| } | ||
|
|
||
| # Helper: select dropdown option |
There was a problem hiding this comment.
🟡 MODERATE — cdp_set_value / cdp_select embed $value into single-quoted JS · Flagged by: 2/3 reviewers
el.value = '$value' interpolates the value inside a JS single-quoted string. If any caller passes a value containing ' (e.g., it's a test), the JavaScript becomes syntactically invalid and the CDP call silently fails.
Current callers use safe values, but this is a general-purpose helper that invites reuse with arbitrary strings.
Suggestion: Escape single quotes before interpolation, or pass the value via a JS template literal/variable assignment pattern that avoids inline string quoting issues.
Adds end-to-end integration tests for the scheduled tasks feature (PR #587).
Test script (
.github/integration-tests/scheduled-tasks.sh):/scheduled-taskspageUsage:
Runs on both Linux/GTK and Windows via MauiDevFlow CDP.