feat: task execution integration tests (Run Now + auto-fire) by PureWeen · Pull Request #730 · PureWeen/PolyPilot

PureWeen · 2026-04-22T21:08:16Z

Adds 3 tests that prove the scheduling engine actually works:

- RunNow_CreatesRunHistory — clicks Run Now, waits for run history
- ScheduledExecution_TaskFiresAutomatically — 1-min task, waits for auto-fire
- RunNow_TwiceCreatesUniqueSessionNames — two runs, unique sessions

These close the biggest gap: proving the timer fires, prompts are dispatched, and runs are recorded.

Three new tests that prove the scheduling engine works end-to-end:

1. RunNow_CreatesRunHistory — clicks Run Now, waits for completion, verifies run history entry with session name appears
2. ScheduledExecution_TaskFiresAutomatically — creates 1-min interval task, waits up to 120s for the timer to fire, verifies Last Run
3. RunNow_TwiceCreatesUniqueSessionNames — runs twice, verifies two distinct session names in history

These prove the actual task execution pipeline works: timer fires → session created → prompt sent → run recorded → history displayed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-22T21:20:17Z

Code Review — Reviewer 1

I reviewed the full ScheduledTaskTests.cs, IntegrationTestBase.cs, AppFixture.cs, and the workflow diff. 7 findings below, sorted by severity.

🔴 Finding 1 — Execution tests run twice in CI (trait filter overlap)

ScheduledTaskTests.cs:12 + lines 195, 252, 297

The class-level [Trait("Category", "ScheduledTasks")] is inherited by every method. The new execution tests add [Trait("Category", "ScheduledTaskExecution")] at the method level, so each carries both traits. In CI, --filter "Category=ScheduledTasks" (the "fast" step) matches any test whose Category trait includes "ScheduledTasks" — which includes all three execution tests.

Failing scenario: The "fast" CI step silently runs the 120–180 second execution tests, causing unexpected timeouts. Then the "slow" step runs them again, doubling wall-clock time.

Fix: Either (a) move execution tests to a separate class without the ScheduledTasks trait, or (b) change the first CI filter to --filter "Category=ScheduledTasks&Category!=ScheduledTaskExecution".
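
Option (a) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the class name `ScheduledTaskExecutionTests` and the lifecycle test name `CreateTask_ShowsCard` are hypothetical, the method bodies are placeholders, and the real classes would presumably still derive from IntegrationTestBase and share AppFixture.

```csharp
using System.Threading.Tasks;
using Xunit;

// Fast UI lifecycle tests keep the existing class-level trait.
[Trait("Category", "ScheduledTasks")]
public class ScheduledTaskTests
{
    // Hypothetical lifecycle test name; the body is a placeholder.
    [Fact]
    public Task CreateTask_ShowsCard() => Task.CompletedTask;
}

// Slow execution tests live in their own class (hypothetical name), so the
// only Category value they carry is "ScheduledTaskExecution".
[Trait("Category", "ScheduledTaskExecution")]
public class ScheduledTaskExecutionTests
{
    [Fact]
    public Task RunNow_CreatesRunHistory() => Task.CompletedTask;

    [Fact]
    public Task ScheduledExecution_TaskFiresAutomatically() => Task.CompletedTask;

    [Fact]
    public Task RunNow_TwiceCreatesUniqueSessionNames() => Task.CompletedTask;
}
```

With this layout, `--filter "Category=ScheduledTasks"` should select only the lifecycle class and `--filter "Category=ScheduledTaskExecution"` only the execution class, since neither class carries the other's trait.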

🟡 Finding 2 — `RunNow_CreatesRunHistory` timeout comment is wrong (180 s, not 90 s)

ScheduledTaskTests.cs:214

The loop header is `for (var i = 0; i < 90; i++) // 90 seconds max`, but each iteration awaits `Task.Delay(2000)`, so the maximum wait is 90 × 2 s = 180 seconds, not 90. Compare with ScheduledExecution_TaskFiresAutomatically (line 268), which correctly comments `// 120 seconds max (2s intervals)` for 60 iterations.

Fix: Change comment to // 180 seconds max (2s intervals), or reduce iterations to 45.
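
One way to keep the comment and the loop from drifting apart is to state the timeout directly rather than deriving it from an iteration count. The helper below is a hypothetical sketch: `Polling.UntilAsync` is not part of the PR, IntegrationTestBase, or xUnit, and its names and parameters are assumptions chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Polling
{
    // Hypothetical helper: re-checks `condition` every `interval` until it
    // returns true or `timeout` has elapsed. Returns whether the condition
    // was met. The total wait can exceed `timeout` by up to one interval
    // plus the duration of the final check.
    public static async Task<bool> UntilAsync(
        Func<Task<bool>> condition, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (await condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            await Task.Delay(interval);
        }
    }
}
```

A call such as `await Polling.UntilAsync(() => ExistsAsync($"{card} .run-status"), TimeSpan.FromSeconds(180), TimeSpan.FromSeconds(2))` would then carry its own timeout, assuming `ExistsAsync` returns `Task<bool>`.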

🟡 Finding 3 — No try-finally cleanup for scheduled tasks (leaked 1-min timer)

ScheduledTaskTests.cs:259–293

DeleteTaskAsync calls (lines 248, 293, 349) are not in try/finally. If an assertion fails before cleanup, the task is leaked. Most critically, ScheduledExecution_TaskFiresAutomatically creates a 1-minute interval task — a leaked task keeps auto-firing, creating phantom sessions that can pollute later test assertions.

Fix: Wrap each execution test body in try/finally with DeleteTaskAsync in the finally block.

🟡 Finding 4 — `RunNow_TwiceCreatesUniqueSessionNames` doesn't poll for second run

ScheduledTaskTests.cs:325–326

After triggering the second Run Now, the test uses a flat await Task.Delay(30_000) instead of polling. The first-run check (ExistsAsync($"{card} .run-status")) remains true from the first run, so there's no condition-based detection of second-run completion.

Failing scenario: If the second run takes >30 s (agent cold start, CI load), the test finds only 1 entry and fails intermittently.

Fix: Poll for .run-entry count ≥ 2 with a timeout loop, matching the pattern used in the other execution tests.

🟢 Finding 5 — Session uniqueness assertion silently skipped

ScheduledTaskTests.cs:345–347

if (names.Length >= 2)
    Assert.NotEqual(names[0], names[1]);

If CDP eval returns fewer than 2 pipe-delimited names (e.g., .run-session elements exist but have empty text), the core invariant — unique session names — is never checked.

Fix: Change the if to Assert.True(names.Length >= 2, "Should extract at least 2 session names").

🟢 Finding 6 — `|| true` removed from lifecycle tests, blocks execution tests on failure

Workflow diff

Before this PR, the single dotnet test had || true. After the split, only the execution step keeps it. If any lifecycle test fails, the shell exits non-zero before reaching execution tests.

Fix: Add || true to the lifecycle step, or split into separate workflow steps with continue-on-error: true.

🟢 Finding 7 — `EscapeForJs` is a weaker duplicate of base class `EscapeJs`

ScheduledTaskTests.cs:354 vs IntegrationTestBase.cs:191

EscapeForJs escapes \ and " but not '. The base class EscapeJs also escapes single quotes but is private. Currently safe because task names use only alphanumerics/hyphens, but it's a maintenance hazard.

Fix: Change IntegrationTestBase.EscapeJs to protected and remove the duplicate.
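
As a rough sketch of that fix: the types below are stand-ins, not the repository's code. `IntegrationTestBaseSketch` and `ScheduledTaskTestsSketch` are hypothetical names, and the body of `EscapeJs` is an assumption based only on the description above (it escapes \, ", and '); the real method may differ.

```csharp
public abstract class IntegrationTestBaseSketch
{
    // Assumed body: escapes backslashes first, then double and single quotes,
    // so the value can be embedded in a quoted JavaScript string literal.
    // `protected` (rather than `private`) lets derived test classes reuse it.
    protected static string EscapeJs(string value) =>
        value.Replace("\\", "\\\\")
             .Replace("\"", "\\\"")
             .Replace("'", "\\'");
}

public class ScheduledTaskTestsSketch : IntegrationTestBaseSketch
{
    // Builds the run-entry count expression using the inherited helper
    // instead of a local EscapeForJs duplicate.
    public static string RunEntryCountJs(string card) =>
        $"document.querySelectorAll(\"{EscapeJs(card)} .run-entry\").length.toString()";
}
```

Keeping a single escaping routine means a later change to task naming (for example, names containing apostrophes) only needs to be handled in one place.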

Summary: 1 critical (trait overlap defeating the fast/slow CI split), 3 moderate (wrong timeout comment, leaked timer tasks, race condition on second run), 3 minor.

Generated by Expert Code Review (auto) for issue #730 · ● 5.8M · ◷

github-actions

Code Review Summary — PR #730: Task Execution Integration Tests

Methodology: 3 independent reviewers with adversarial consensus. Findings included only when ≥2/3 reviewers agree. 1 disputed finding was escalated to follow-up review and confirmed unanimously.

Findings by Severity

| # | Severity | Finding | Reviewers |
|---|----------|---------|-----------|
| 1 | 🔴 CRITICAL | Class-level `[Trait("Category", "ScheduledTasks")]` causes all 3 execution tests to match both CI filter steps — they run twice, defeating the fast/slow split | 3/3 |
| 2 | 🟡 MODERATE | First batch lost `\|\| true`, making existing lifecycle tests a blocking gate | 2/3 |
| 3 | 🟡 MODERATE | Execution tests run with `\|\| true`, so they are permanently non-gating | 3/3 |
| 4 | 🟡 MODERATE | `RunNow_TwiceCreatesUniqueSessionNames` uses fixed `Task.Delay(30_000)` instead of polling — flaky on slow CI | 3/3 |
| 5 | 🟡 MODERATE | No `try/finally` around test bodies — orphaned 1-min interval tasks keep firing and pollute subsequent tests | 2/3 |
| 6 | 🟡 MODERATE | `if (names.Length >= 2)` guard silently skips the uniqueness assertion if CDP eval returns empty | 2/3 |
| 7 | 🟢 MINOR | Hardcoded `"Last run"` string check — inconsistent with other tests, causes misleading timeout on UI copy change | 3/3 (1 + 2 follow-up) |
| 8 | 🟢 MINOR | Comment says "90 seconds max" but loop is 90 × 2s = 180s | 2/3 |
| 9 | 🟢 MINOR | `EscapeForJs` duplicates `EscapeJs` from base class | 2/3 |

Key Recommendation

Finding #1 is the highest priority: the trait inheritance issue means execution tests run in the "fast" batch (now blocking due to finding #2) and the "slow" batch. Fix by moving execution tests to a separate class or adjusting the filter.

CI Status

Commit status: Pending (no statuses reported)
Check runs: 2 agent jobs in progress, activation/pre_activation completed successfully
Test coverage: PR adds 3 new integration tests covering Run Now execution, auto-fire scheduling, and session uniqueness. No existing tests removed.
Prior reviews: None

Generated by Expert Code Review (auto) for issue #730 · ● 5.8M

github-actions · 2026-04-22T21:20:18Z

+    // ─── Execution Tests ───
+
+    [Fact]
+    [Trait("Category", "ScheduledTaskExecution")]


🔴 CRITICAL — Tests run in BOTH CI batches due to dual trait inheritance

The class already has [Trait("Category", "ScheduledTasks")] at the class level. Adding [Trait("Category", "ScheduledTaskExecution")] per-method means xUnit matches both traits. The first CI step (--filter "Category=ScheduledTasks") picks up all 3 execution tests (they inherit the class trait), then the second step (--filter "Category=ScheduledTaskExecution") runs them again.

Impact: Slow execution tests (up to 120s each + real scheduler waits) execute twice per CI run — ~10+ wasted minutes.

Fix: Move the 3 execution tests to a separate class without the class-level ScheduledTasks trait, or add an exclusion to the first filter: --filter "Category=ScheduledTasks&Category!=ScheduledTaskExecution".

Flagged by: 3/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+        Output.WriteLine("Second Run Now triggered, waiting for completion...");
+
+        // Wait for second run to appear in history
+        await Task.Delay(30_000); // Give it 30 seconds


🟡 MODERATE — Fixed 30s sleep instead of polling for second run completion

After triggering the second Run Now, the test sleeps a hard 30 seconds rather than polling for .run-status as the first run does (90 iterations at 2 s intervals, so up to 180 s). On a loaded CI machine, the agent can easily exceed 30s. The test then asserts count >= 2 on a history that may show only 1 entry, producing a spurious failure.

Fix: Replace with a polling loop:

for (var i = 0; i < 45; i++)
{
    var currentCount = await CdpEvalAsync(
        $"document.querySelectorAll(\"{EscapeForJs(card)} .run-entry\").length.toString()");
    if (int.TryParse(currentCount, out var n) && n >= 2) break;
    await Task.Delay(2000);
}

Flagged by: 3/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+          # Run execution tests (slow — waits for tasks to fire)
+          POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
+            --filter "Category=ScheduledTaskExecution" \
            --nologo --verbosity normal 2>&1 || true


🟡 MODERATE — Execution tests are permanently non-gating (|| true)

The PR description says these tests "close the biggest gap: proving the timer fires, prompts are dispatched, and runs are recorded." But with || true, a complete scheduler regression passes CI green. These tests can never fail the build.

Fix: Remove || true once the tests are stable, or document explicitly that these are informational-only. Consider a separate CI job with continue-on-error: true for visibility without blocking.

Flagged by: 3/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+            Assert.False(string.IsNullOrWhiteSpace(sessionName), "Run entry should show session name");
+        }
+
+        await DeleteTaskAsync(taskName);


🟡 MODERATE — No try/finally cleanup; orphaned tasks accumulate on failure

DeleteTaskAsync(taskName) is only called at the end of the test body. If any assertion throws before reaching it (e.g., the Assert.True(hasHistory, ...) at line 231), the task remains in the scheduler. All tests share a single AppFixture, so orphaned tasks persist across test runs. The 1-minute interval task from ScheduledExecution_TaskFiresAutomatically is especially problematic — it keeps firing and can pollute UI state for subsequent tests.

Fix: Wrap each test body in try/finally:

try
{
    /* test body */
}
finally
{
    await DeleteTaskAsync(taskName);
}

Flagged by: 2/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+          # Run UI lifecycle tests first (fast)
          POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
            --filter "Category=ScheduledTasks" \
+            --nologo --verbosity normal 2>&1


🟡 MODERATE — First batch lost || true, making existing lifecycle tests a blocking gate

Before this PR, the entire ScheduledTasks run was non-blocking (|| true). This change removes that guard from the first batch. Any flaky existing test on a slow CI runner now fails the workflow step, preventing the artifact upload step (screenshots) from running.

Combined with the trait overlap issue above, the slow execution tests also run in this "fast" batch — a timing failure in the 120s scheduler wait aborts the step entirely.

Fix: Restore || true on the first batch, or deliberately ensure only stable tests run here.

Flagged by: 2/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+        // The task creates a new session, sends the prompt, and waits up to 11 minutes.
+        // For a simple "echo" prompt, it should complete in ~30 seconds.
+        var hasHistory = false;
+        for (var i = 0; i < 90; i++) // 90 seconds max


🟢 MINOR — Comment says "90 seconds max" but actual max is 180s

for (var i = 0; i < 90; i++) with await Task.Delay(2000) = 90 × 2s = 180 seconds max wait, not 90. Either use i < 45 for a true 90s timeout, or update the comment to "180 seconds max."

Flagged by: 2/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+        Output.WriteLine($"Session names: {sessions}");
+
+        var names = sessions.Split('|', StringSplitOptions.RemoveEmptyEntries);
+        if (names.Length >= 2)


🟡 MODERATE — Conditional guard silently skips the uniqueness assertion

If CdpEvalAsync returns an empty string (e.g., on CDP parse error), names.Length < 2 and the if guard swallows the Assert.NotEqual entirely. The test passes green having never verified uniqueness — which is the entire point of this test. The preceding count >= 2 assertion passes via the DOM count, but the session-name extraction can independently fail.

Fix: Assert unconditionally:

Assert.True(names.Length >= 2, $"Expected 2+ session names but got '{sessions}'");
Assert.NotEqual(names[0], names[1]);

Flagged by: 2/3 reviewers

github-actions · 2026-04-22T21:20:18Z

+            if (i % 10 == 0)
+                Output.WriteLine($"Poll {i * 2}s: lastRun='{lastRunText}', status={statusExists}");
+
+            if (statusExists && !string.IsNullOrWhiteSpace(lastRunText) && lastRunText.Contains("Last run"))


🟢 MINOR — Hardcoded "Last run" string couples test to UI copy

The other two tests check statusExists && !string.IsNullOrWhiteSpace(lastRunText) without matching specific text. This test adds .Contains("Last run") which means a UI wording change (e.g., "Ran at") causes a misleading 120s timeout failure instead of a clear assertion.

Fix: Drop the .Contains("Last run") clause — statusExists && !string.IsNullOrWhiteSpace(lastRunText) is sufficient and consistent with the other tests.

Flagged by: 3/3 reviewers (1 initial + 2 confirmed in follow-up)

github-actions · 2026-04-22T21:20:19Z

+
    // ─── Helpers ───

+    private static string EscapeForJs(string value) =>


🟢 MINOR — EscapeForJs duplicates EscapeJs from base class

IntegrationTestBase already has an EscapeJs method that escapes \, ", and '. This local duplicate omits single-quote escaping — technically correct here but creates a maintenance trap. Consider promoting the base class method to protected and reusing it.

Flagged by: 2/3 reviewers

PureWeen merged commit dbf7c81 into main Apr 22, 2026

PureWeen deleted the feat/task-execution-test branch April 22, 2026 21:08

PureWeen merged 1 commit into main from feat/task-execution-test
