Skip to content

feat: task execution integration tests (Run Now + auto-fire)#730

Merged
PureWeen merged 1 commit intomainfrom
feat/task-execution-test
Apr 22, 2026
Merged

feat: task execution integration tests (Run Now + auto-fire)#730
PureWeen merged 1 commit intomainfrom
feat/task-execution-test

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Adds 3 tests that prove the scheduling engine actually works:

  1. RunNow_CreatesRunHistory — clicks Run Now, waits for run history
  2. ScheduledExecution_TaskFiresAutomatically — 1-min task, waits for auto-fire
  3. RunNow_TwiceCreatesUniqueSessionNames — two runs, unique sessions

These close the biggest gap: proving the timer fires, prompts are dispatched, and runs are recorded.

Three new tests that prove the scheduling engine works end-to-end:

1. RunNow_CreatesRunHistory — clicks Run Now, waits for completion,
   verifies run history entry with session name appears
2. ScheduledExecution_TaskFiresAutomatically — creates 1-min interval
   task, waits up to 120s for the timer to fire, verifies Last Run
3. RunNow_TwiceCreatesUniqueSessionNames — runs twice, verifies
   two distinct session names in history

These prove the actual task execution pipeline works: timer fires →
session created → prompt sent → run recorded → history displayed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit dbf7c81 into main Apr 22, 2026
@PureWeen PureWeen deleted the feat/task-execution-test branch April 22, 2026 21:08
@github-actions
Copy link
Copy Markdown
Contributor

Code Review — Reviewer 1

I reviewed the full ScheduledTaskTests.cs, IntegrationTestBase.cs, AppFixture.cs, and the workflow diff. 7 findings below, sorted by severity.


🔴 Finding 1 — Execution tests run twice in CI (trait filter overlap)

ScheduledTaskTests.cs:12 + lines 195, 252, 297

The class-level [Trait("Category", "ScheduledTasks")] is inherited by every method. The new execution tests add [Trait("Category", "ScheduledTaskExecution")] at the method level, so each carries both traits. In CI, --filter "Category=ScheduledTasks" (the "fast" step) matches any test whose Category trait includes "ScheduledTasks" — which includes all three execution tests.

Failing scenario: The "fast" CI step silently runs the 120–180 second execution tests, causing unexpected timeouts. Then the "slow" step runs them again, doubling wall-clock time.

Fix: Either (a) move execution tests to a separate class without the ScheduledTasks trait, or (b) change the first CI filter to --filter "Category=ScheduledTasks&Category!=ScheduledTaskExecution".


🟡 Finding 2 — RunNow_CreatesRunHistory timeout comment is wrong (180 s, not 90 s)

ScheduledTaskTests.cs:214

for (var i = 0; i < 90; i++) // 90 seconds max with await Task.Delay(2000) per iteration = 180 seconds, not 90. Compare with ScheduledExecution_TaskFiresAutomatically (line 268) which correctly comments // 120 seconds max (2s intervals) for 60 iterations.

Fix: Change comment to // 180 seconds max (2s intervals), or reduce iterations to 45.


🟡 Finding 3 — No try-finally cleanup for scheduled tasks (leaked 1-min timer)

ScheduledTaskTests.cs:259–293

DeleteTaskAsync calls (lines 248, 293, 349) are not in try/finally. If an assertion fails before cleanup, the task is leaked. Most critically, ScheduledExecution_TaskFiresAutomatically creates a 1-minute interval task — a leaked task keeps auto-firing, creating phantom sessions that can pollute later test assertions.

Fix: Wrap each execution test body in try/finally with DeleteTaskAsync in the finally block.


🟡 Finding 4 — RunNow_TwiceCreatesUniqueSessionNames doesn't poll for second run

ScheduledTaskTests.cs:325–326

After triggering the second Run Now, the test uses a flat await Task.Delay(30_000) instead of polling. The first-run check (ExistsAsync($"{card} .run-status")) remains true from the first run, so there's no condition-based detection of second-run completion.

Failing scenario: If the second run takes >30 s (agent cold start, CI load), the test finds only 1 entry and fails intermittently.

Fix: Poll for .run-entry count ≥ 2 with a timeout loop, matching the pattern used in the other execution tests.


🟢 Finding 5 — Session uniqueness assertion silently skipped

ScheduledTaskTests.cs:345–347

if (names.Length >= 2)
    Assert.NotEqual(names[0], names[1]);

If CDP eval returns fewer than 2 pipe-delimited names (e.g., .run-session elements exist but have empty text), the core invariant — unique session names — is never checked.

Fix: Change the if to Assert.True(names.Length >= 2, "Should extract at least 2 session names").


🟢 Finding 6 — || true removed from lifecycle tests, blocks execution tests on failure

Workflow diff

Before this PR, the single dotnet test had || true. After the split, only the execution step keeps it. If any lifecycle test fails, the shell exits non-zero before reaching execution tests.

Fix: Add || true to the lifecycle step, or split into separate workflow steps with continue-on-error: true.


🟢 Finding 7 — EscapeForJs is a weaker duplicate of base class EscapeJs

ScheduledTaskTests.cs:354 vs IntegrationTestBase.cs:191

EscapeForJs escapes \ and " but not '. The base class EscapeJs also escapes single quotes but is private. Currently safe because task names use only alphanumerics/hyphens, but it's a maintenance hazard.

Fix: Change IntegrationTestBase.EscapeJs to protected and remove the duplicate.


Summary: 1 critical (trait overlap defeating the fast/slow CI split), 3 moderate (wrong timeout comment, leaked timer tasks, race condition on second run), 3 minor.

Generated by Expert Code Review (auto) for issue #730 · ● 5.8M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary — PR #730: Task Execution Integration Tests

Methodology: 3 independent reviewers with adversarial consensus. Findings included only when ≥2/3 reviewers agree. 1 disputed finding was escalated to follow-up review and confirmed unanimously.


Findings by Severity

# Severity Finding Reviewers
1 🔴 CRITICAL Class-level [Trait("Category", "ScheduledTasks")] causes all 3 execution tests to match both CI filter steps — they run twice, defeating the fast/slow split 3/3
2 🟡 MODERATE First batch lost `
3 🟡 MODERATE Execution tests run with `
4 🟡 MODERATE RunNow_TwiceCreatesUniqueSessionNames uses fixed Task.Delay(30_000) instead of polling — flaky on slow CI 3/3
5 🟡 MODERATE No try/finally around test bodies — orphaned 1-min interval tasks keep firing and pollute subsequent tests 2/3
6 🟡 MODERATE if (names.Length >= 2) guard silently skips the uniqueness assertion if CDP eval returns empty 2/3
7 🟢 MINOR Hardcoded "Last run" string check — inconsistent with other tests, causes misleading timeout on UI copy change 3/3 (1 + 2 follow-up)
8 🟢 MINOR Comment says "90 seconds max" but loop is 90 × 2s = 180s 2/3
9 🟢 MINOR EscapeForJs duplicates EscapeJs from base class 2/3

Key Recommendation

Finding #1 is the highest priority: the trait inheritance issue means execution tests run in the "fast" batch (now blocking due to finding #2) and the "slow" batch. Fix by moving execution tests to a separate class or adjusting the filter.

CI Status

  • Commit status: Pending (no statuses reported)
  • Check runs: 2 agent jobs in progress, activation/pre_activation completed successfully
  • Test coverage: PR adds 3 new integration tests covering Run Now execution, auto-fire scheduling, and session uniqueness. No existing tests removed.
  • Prior reviews: None

Generated by Expert Code Review (auto) for issue #730 · ● 5.8M

// ─── Execution Tests ───

[Fact]
[Trait("Category", "ScheduledTaskExecution")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — Tests run in BOTH CI batches due to dual trait inheritance

The class already has [Trait("Category", "ScheduledTasks")] at the class level. Adding [Trait("Category", "ScheduledTaskExecution")] per-method means xUnit matches both traits. The first CI step (--filter "Category=ScheduledTasks") picks up all 3 execution tests (they inherit the class trait), then the second step (--filter "Category=ScheduledTaskExecution") runs them again.

Impact: Slow execution tests (up to 120s each + real scheduler waits) execute twice per CI run — ~10+ wasted minutes.

Fix: Move the 3 execution tests to a separate class without the class-level ScheduledTasks trait, or add an exclusion to the first filter: --filter "Category=ScheduledTasks&Category!=ScheduledTaskExecution".

Flagged by: 3/3 reviewers

Output.WriteLine("Second Run Now triggered, waiting for completion...");

// Wait for second run to appear in history
await Task.Delay(30_000); // Give it 30 seconds
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Fixed 30s sleep instead of polling for second run completion

After triggering the second Run Now, the test sleeps a hard 30 seconds rather than polling for .run-status like the first run does (polled up to 90s). On a loaded CI machine, the agent can easily exceed 30s. The test then asserts count >= 2 on a history that may only show 1 entry, producing a spurious failure.

Fix: Replace with a polling loop:

for (var i = 0; i < 45; i++)
{
    var currentCount = await CdpEvalAsync(
        $"document.querySelectorAll(\"{EscapeForJs(card)} .run-entry\").length.toString()");
    if (int.TryParse(currentCount, out var n) && n >= 2) break;
    await Task.Delay(2000);
}

Flagged by: 3/3 reviewers

# Run execution tests (slow — waits for tasks to fire)
POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
--filter "Category=ScheduledTaskExecution" \
--nologo --verbosity normal 2>&1 || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Execution tests are permanently non-gating (|| true)

The PR description says these tests "close the biggest gap: proving the timer fires, prompts are dispatched, and runs are recorded." But with || true, a complete scheduler regression passes CI green. These tests can never fail the build.

Fix: Remove || true once the tests are stable, or document explicitly that these are informational-only. Consider a separate CI job with continue-on-error: true for visibility without blocking.

Flagged by: 3/3 reviewers

Assert.False(string.IsNullOrWhiteSpace(sessionName), "Run entry should show session name");
}

await DeleteTaskAsync(taskName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — No try/finally cleanup; orphaned tasks accumulate on failure

DeleteTaskAsync(taskName) is only called at the end of the test body. If any assertion throws before reaching it (e.g., the Assert.True(hasHistory, ...) at line 231), the task remains in the scheduler. All tests share a single AppFixture, so orphaned tasks persist across test runs. The 1-minute interval task from ScheduledExecution_TaskFiresAutomatically is especially problematic — it keeps firing and can pollute UI state for subsequent tests.

Fix: Wrap each test body in try/finally:

try { /* test body */ }
finally { await DeleteTaskAsync(taskName); }

Flagged by: 2/3 reviewers

# Run UI lifecycle tests first (fast)
POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
--filter "Category=ScheduledTasks" \
--nologo --verbosity normal 2>&1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — First batch lost || true, making existing lifecycle tests a blocking gate

Before this PR, the entire ScheduledTasks run was non-blocking (|| true). This change removes that guard from the first batch. Any flaky existing test on a slow CI runner now fails the workflow step, preventing the artifact upload step (screenshots) from running.

Combined with the trait overlap issue above, the slow execution tests also run in this "fast" batch — a timing failure in the 120s scheduler wait aborts the step entirely.

Fix: Restore || true on the first batch, or deliberately ensure only stable tests run here.

Flagged by: 2/3 reviewers

// The task creates a new session, sends the prompt, and waits up to 11 minutes.
// For a simple "echo" prompt, it should complete in ~30 seconds.
var hasHistory = false;
for (var i = 0; i < 90; i++) // 90 seconds max
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Comment says "90 seconds max" but actual max is 180s

for (var i = 0; i < 90; i++) with await Task.Delay(2000) = 90 × 2s = 180 seconds max wait, not 90. Either use i < 45 for a true 90s timeout, or update the comment to "180 seconds max."

Flagged by: 2/3 reviewers

Output.WriteLine($"Session names: {sessions}");

var names = sessions.Split('|', StringSplitOptions.RemoveEmptyEntries);
if (names.Length >= 2)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Conditional guard silently skips the uniqueness assertion

If CdpEvalAsync returns an empty string (e.g., on CDP parse error), names.Length < 2 and the if guard swallows the Assert.NotEqual entirely. The test passes green having never verified uniqueness — which is the entire point of this test. The preceding count >= 2 assertion passes via the DOM count, but the session-name extraction can independently fail.

Fix: Assert unconditionally:

Assert.True(names.Length >= 2,
    $"Expected 2+ session names but got '{sessions}'");
Assert.NotEqual(names[0], names[1]);

Flagged by: 2/3 reviewers

if (i % 10 == 0)
Output.WriteLine($"Poll {i * 2}s: lastRun='{lastRunText}', status={statusExists}");

if (statusExists && !string.IsNullOrWhiteSpace(lastRunText) && lastRunText.Contains("Last run"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Hardcoded "Last run" string couples test to UI copy

The other two tests check statusExists && !string.IsNullOrWhiteSpace(lastRunText) without matching specific text. This test adds .Contains("Last run") which means a UI wording change (e.g., "Ran at") causes a misleading 120s timeout failure instead of a clear assertion.

Fix: Drop the .Contains("Last run") clause — statusExists && !string.IsNullOrWhiteSpace(lastRunText) is sufficient and consistent with the other tests.

Flagged by: 3/3 reviewers (1 initial + 2 confirmed in follow-up)


// ─── Helpers ───

private static string EscapeForJs(string value) =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — EscapeForJs duplicates EscapeJs from base class

IntegrationTestBase already has an EscapeJs method that escapes \, ", and '. This local duplicate omits single-quote escaping — technically correct here but creates a maintenance trap. Consider promoting the base class method to protected and reusing it.

Flagged by: 2/3 reviewers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant