Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .github/workflows/polypilot-integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -563,8 +563,14 @@ jobs:
git checkout origin/main -- PolyPilot.IntegrationTests/
fi

# Run UI lifecycle tests first (fast)
POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
--filter "Category=ScheduledTasks" \
--nologo --verbosity normal 2>&1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — First batch lost || true, making existing lifecycle tests a blocking gate

Before this PR, the entire ScheduledTasks run was non-blocking (|| true). This change removes that guard from the first batch. Any flaky existing test on a slow CI runner now fails the workflow step, preventing the artifact upload step (screenshots) from running.

Combined with the trait overlap issue above, the slow execution tests also run in this "fast" batch — a timing failure in the 120s scheduler wait aborts the step entirely.

Fix: Restore || true on the first batch, or deliberately ensure only stable tests run here.

Flagged by: 2/3 reviewers


# Run execution tests (slow — waits for tasks to fire)
POLYPILOT_AGENT_PORT=$PORT dotnet test PolyPilot.IntegrationTests \
--filter "Category=ScheduledTaskExecution" \
--nologo --verbosity normal 2>&1 || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Execution tests are permanently non-gating (|| true)

The PR description says these tests "close the biggest gap: proving the timer fires, prompts are dispatched, and runs are recorded." But with || true, a complete scheduler regression passes CI green. These tests can never fail the build.

Fix: Remove || true once the tests are stable, or document explicitly that these are informational-only. Consider a separate CI job with continue-on-error: true for visibility without blocking.

Flagged by: 3/3 reviewers


- name: Upload artifacts
Expand Down
164 changes: 164 additions & 0 deletions PolyPilot.IntegrationTests/ScheduledTaskTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -189,8 +189,172 @@ public async Task ScheduledTasksPage_HasCorrectStructure()
Assert.False(string.IsNullOrWhiteSpace(pageText), "Page should have visible content");
}

// ─── Execution Tests ───

[Fact]
[Trait("Category", "ScheduledTaskExecution")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — Tests run in BOTH CI batches due to dual trait inheritance

The class already has [Trait("Category", "ScheduledTasks")] at the class level. Adding [Trait("Category", "ScheduledTaskExecution")] per-method means xUnit matches both traits. The first CI step (--filter "Category=ScheduledTasks") picks up all 3 execution tests (they inherit the class trait), then the second step (--filter "Category=ScheduledTaskExecution") runs them again.

Impact: Slow execution tests (up to 120s each + real scheduler waits) execute twice per CI run — ~10+ wasted minutes.

Fix: Move the 3 execution tests to a separate class without the class-level ScheduledTasks trait, or add an exclusion to the first filter: --filter "Category=ScheduledTasks&Category!=ScheduledTaskExecution".

Flagged by: 3/3 reviewers

public async Task RunNow_CreatesRunHistory()
{
await WaitForCdpReadyAsync();
await NavigateToAsync("Scheduled Tasks", "#scheduled-tasks-page");

var taskName = $"RunNow-Test-{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
await CreateIntervalTaskAsync(taskName, "echo run now test", 60);

var card = $".task-card[data-task-name=\"{taskName}\"]";

// Click Run Now
var runResult = await ClickAsync($"{card} [data-task-action=\"run-now\"]");
Output.WriteLine($"Run Now click: {runResult}");

// Wait for the run to complete — poll for run history to appear
// The task creates a new session, sends the prompt, and waits up to 11 minutes.
// For a simple "echo" prompt, it should complete in ~30 seconds.
var hasHistory = false;
for (var i = 0; i < 90; i++) // 90 seconds max
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Comment says "90 seconds max" but actual max is 180s

for (var i = 0; i < 90; i++) with await Task.Delay(2000) = 90 × 2s = 180 seconds max wait, not 90. Either use i < 45 for a true 90s timeout, or update the comment to "180 seconds max."

Flagged by: 2/3 reviewers

{
// Check if a run-status indicator appeared
var statusExists = await ExistsAsync($"{card} .run-status");
var lastRunText = await GetTextAsync($"{card} .last-run");
Output.WriteLine($"Poll {i}: statusExists={statusExists}, lastRun='{lastRunText}'");

if (statusExists && !string.IsNullOrWhiteSpace(lastRunText))
{
hasHistory = true;
break;
}
await Task.Delay(2000);
}

await ScreenshotAsync("after-run-now");

Assert.True(hasHistory, "Run Now should produce a run history entry with status indicator");

// Expand history and verify run entry
await ClickAsync($"{card} .history-toggle");
await Task.Delay(1000);

var historyVisible = await ExistsAsync($"{card} .run-history");
if (historyVisible)
{
var runEntryExists = await ExistsAsync($"{card} .run-entry");
Assert.True(runEntryExists, "Run history should contain at least one run entry");

var sessionName = await GetTextAsync($"{card} .run-entry .run-session");
Output.WriteLine($"Run session name: '{sessionName}'");
Assert.False(string.IsNullOrWhiteSpace(sessionName), "Run entry should show session name");
}

await DeleteTaskAsync(taskName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — No try/finally cleanup; orphaned tasks accumulate on failure

DeleteTaskAsync(taskName) is only called at the end of the test body. If any assertion throws before reaching it (e.g., the Assert.True(hasHistory, ...) at line 231), the task remains in the scheduler. All tests share a single AppFixture, so orphaned tasks persist across test runs. The 1-minute interval task from ScheduledExecution_TaskFiresAutomatically is especially problematic — it keeps firing and can pollute UI state for subsequent tests.

Fix: Wrap each test body in try/finally:

try { /* test body */ }
finally { await DeleteTaskAsync(taskName); }

Flagged by: 2/3 reviewers

}

[Fact]
[Trait("Category", "ScheduledTaskExecution")]
public async Task ScheduledExecution_TaskFiresAutomatically()
{
await WaitForCdpReadyAsync();
await NavigateToAsync("Scheduled Tasks", "#scheduled-tasks-page");

// Create a 1-minute interval task
var taskName = $"AutoRun-Test-{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
await CreateIntervalTaskAsync(taskName, "echo scheduled execution test", 1);

var card = $".task-card[data-task-name=\"{taskName}\"]";
Output.WriteLine("Waiting up to 120s for the scheduled task to fire automatically...");

// Wait for the task to fire — the scheduler evaluates every 30 seconds,
// so a 1-minute interval task should fire within ~90 seconds.
var hasFired = false;
for (var i = 0; i < 60; i++) // 120 seconds max (2s intervals)
{
var lastRunText = await GetTextAsync($"{card} .last-run");
var statusExists = await ExistsAsync($"{card} .run-status");

if (i % 10 == 0)
Output.WriteLine($"Poll {i * 2}s: lastRun='{lastRunText}', status={statusExists}");

if (statusExists && !string.IsNullOrWhiteSpace(lastRunText) && lastRunText.Contains("Last run"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Hardcoded "Last run" string couples test to UI copy

The other two tests check statusExists && !string.IsNullOrWhiteSpace(lastRunText) without matching specific text. This test adds .Contains("Last run") which means a UI wording change (e.g., "Ran at") causes a misleading 120s timeout failure instead of a clear assertion.

Fix: Drop the .Contains("Last run") clause — statusExists && !string.IsNullOrWhiteSpace(lastRunText) is sufficient and consistent with the other tests.

Flagged by: 3/3 reviewers (1 initial + 2 confirmed in follow-up)

{
hasFired = true;
Output.WriteLine($"Task fired! lastRun='{lastRunText}'");
break;
}
await Task.Delay(2000);
}

await ScreenshotAsync("after-scheduled-execution");

Assert.True(hasFired, "1-minute interval task should fire automatically within 120 seconds");

// Verify the next-run timer is shown
var nextRun = await GetTextAsync($"{card} .next-run");
Output.WriteLine($"Next run: '{nextRun}'");

await DeleteTaskAsync(taskName);
}

[Fact]
[Trait("Category", "ScheduledTaskExecution")]
public async Task RunNow_TwiceCreatesUniqueSessionNames()
{
await WaitForCdpReadyAsync();
await NavigateToAsync("Scheduled Tasks", "#scheduled-tasks-page");

var taskName = $"Multi-Run-{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
await CreateIntervalTaskAsync(taskName, "echo multi run test", 60);

var card = $".task-card[data-task-name=\"{taskName}\"]";

// Run Now — first execution
await ClickAsync($"{card} [data-task-action=\"run-now\"]");
Output.WriteLine("First Run Now triggered, waiting for completion...");

// Wait for first run
for (var i = 0; i < 45; i++)
{
if (await ExistsAsync($"{card} .run-status"))
break;
await Task.Delay(2000);
}
Assert.True(await ExistsAsync($"{card} .run-status"), "First run should complete");

// Run Now — second execution
await ClickAsync($"{card} [data-task-action=\"run-now\"]");
Output.WriteLine("Second Run Now triggered, waiting for completion...");

// Wait for second run to appear in history
await Task.Delay(30_000); // Give it 30 seconds
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Fixed 30s sleep instead of polling for second run completion

After triggering the second Run Now, the test sleeps a hard 30 seconds rather than polling for .run-status like the first run does (polled up to 90s). On a loaded CI machine, the agent can easily exceed 30s. The test then asserts count >= 2 on a history that may only show 1 entry, producing a spurious failure.

Fix: Replace with a polling loop:

for (var i = 0; i < 45; i++)
{
    var currentCount = await CdpEvalAsync(
        $"document.querySelectorAll(\"{EscapeForJs(card)} .run-entry\").length.toString()");
    if (int.TryParse(currentCount, out var n) && n >= 2) break;
    await Task.Delay(2000);
}

Flagged by: 3/3 reviewers


// Expand history
await ClickAsync($"{card} .history-toggle");
await Task.Delay(1000);

// Count run entries
var runCount = await CdpEvalAsync(
$"document.querySelectorAll(\"{EscapeForJs(card)} .run-entry\").length.toString()");
Output.WriteLine($"Run entries: {runCount}");

var count = int.TryParse(runCount, out var c) ? c : 0;
Assert.True(count >= 2, $"Should have at least 2 run entries after running twice, got {count}");

// Verify session names are different
var sessions = await CdpEvalAsync(
$"[...document.querySelectorAll(\"{EscapeForJs(card)} .run-entry .run-session\")].map(s => s.textContent.trim()).join('|')");
Output.WriteLine($"Session names: {sessions}");

var names = sessions.Split('|', StringSplitOptions.RemoveEmptyEntries);
if (names.Length >= 2)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Conditional guard silently skips the uniqueness assertion

If CdpEvalAsync returns an empty string (e.g., on CDP parse error), names.Length < 2 and the if guard swallows the Assert.NotEqual entirely. The test passes green having never verified uniqueness — which is the entire point of this test. The preceding count >= 2 assertion passes via the DOM count, but the session-name extraction can independently fail.

Fix: Assert unconditionally:

Assert.True(names.Length >= 2,
    $"Expected 2+ session names but got '{sessions}'");
Assert.NotEqual(names[0], names[1]);

Flagged by: 2/3 reviewers

Assert.NotEqual(names[0], names[1]);

await DeleteTaskAsync(taskName);
}

// ─── Helpers ───

private static string EscapeForJs(string value) =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — EscapeForJs duplicates EscapeJs from base class

IntegrationTestBase already has an EscapeJs method that escapes \, ", and '. This local duplicate omits single-quote escaping — technically correct here but creates a maintenance trap. Consider promoting the base class method to protected and reusing it.

Flagged by: 2/3 reviewers

value.Replace("\\", "\\\\").Replace("\"", "\\\"");


private async Task CreateIntervalTaskAsync(string name, string prompt, int intervalMinutes)
{
await ClickAsync("#scheduled-task-new");
Expand Down
Loading