fix(cortex): harden startup warmup and bulletin coordination#248

Merged
jamiepine merged 4 commits into spacedriveapp:main from vsumner:fix/warmup-bulletin-coordination
Feb 27, 2026

Conversation

vsumner (Contributor) commented Feb 27, 2026

Summary

  • switch startup warmup fanout to JoinSet and enforce a bounded wait that aborts unfinished warmup tasks on timeout
  • remove unbounded post-timeout drain so startup timeout semantics remain bounded even with non-cooperative warmup tasks
  • make run_warmup_once cancellation-safe with a guard that demotes stuck Warming state to Degraded with actionable error context
  • prevent duplicate startup warmup passes by requiring Warm + refresh timestamp for initial-pass completion
  • add explicit debug observability when bulletin-loop generation is skipped due to fresh warmup bulletin
  • update cortex/config docs to define stale fallback threshold: bulletin_age_secs >= max(1, warmup.refresh_secs)
  • add targeted regressions for timeout cancellation, non-cooperative timeout bounds, initial warmup completion gating, and cancellation demotion
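The cancellation-safety bullet can be illustrated with a drop guard. The following is a minimal sketch, not the PR's code: the state names Warming, Warm, and Degraded and the name WarmupRunGuard come from this PR, but the enum shape, the `begin`/`commit` methods, and the error string are assumptions made for illustration.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical state enum; only the variant names are taken from the PR text.
#[derive(Debug, Clone, PartialEq)]
enum WarmupState {
    Cold,
    Warming,
    Warm,
    Degraded { error: String },
}

// Drop guard (assumed shape): if the run ends without `commit`, a state that
// is still `Warming` is demoted to `Degraded`. Dropping also happens when the
// enclosing future is cancelled, which is the case the guard is meant to cover.
struct WarmupRunGuard {
    state: Arc<Mutex<WarmupState>>,
    committed: bool,
}

impl WarmupRunGuard {
    fn begin(state: Arc<Mutex<WarmupState>>) -> Self {
        *state.lock().unwrap() = WarmupState::Warming;
        Self { state, committed: false }
    }

    // Records the final state and disarms the guard.
    fn commit(mut self, final_state: WarmupState) {
        *self.state.lock().unwrap() = final_state;
        self.committed = true;
    }
}

impl Drop for WarmupRunGuard {
    fn drop(&mut self) {
        if self.committed {
            return;
        }
        // Avoid panicking inside drop if the mutex was poisoned.
        if let Ok(mut state) = self.state.lock() {
            if *state == WarmupState::Warming {
                *state = WarmupState::Degraded {
                    error: "warmup cancelled before completion".to_string(),
                };
            }
        }
    }
}

fn main() {
    let state = Arc::new(Mutex::new(WarmupState::Cold));

    // Simulated cancellation: the guard is dropped without commit.
    {
        let _guard = WarmupRunGuard::begin(Arc::clone(&state));
    }
    let after_cancel = state.lock().unwrap().clone();
    println!("after cancelled run: {after_cancel:?}");

    // Normal completion: commit stores the final state.
    let guard = WarmupRunGuard::begin(Arc::clone(&state));
    guard.commit(WarmupState::Warm);
    let after_commit = state.lock().unwrap().clone();
    println!("after committed run: {after_commit:?}");
}
```

The design point is that demotion happens in `Drop`, so no explicit cleanup path is needed at each early return or cancellation site.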

Testing

  • cargo fmt --all
  • cargo test -q startup_warmup_wait --bin spacebot
  • cargo test -q startup_warmup_wait_timeout_stays_bounded_for_non_cooperative_task --bin spacebot
  • cargo test -q cancelled_warmup_demotes_warming_state_to_degraded --lib
  • cargo test -q initial_warmup_completion_not_detected_when_timestamp_exists_but_state_is_not_warm --lib
  • cargo test -q bulletin_loop_generation_lock_snapshot_skips_after_fresh_update --lib
  • just preflight
  • just gate-pr

Notes (Optional)

  • Cargo surfaces a non-blocking future-incompat warning for dependency imap-proto v0.10.2; gates remain green.

Note

Changes Summary

This PR hardens the startup warmup and memory bulletin coordination by introducing a bounded timeout for warmup tasks and state machine guards for cancellation safety. Key changes include moving from unbounded async drains to a JoinSet-based approach with explicit abort on timeout, adding a WarmupRunGuard that demotes incomplete warmup states to degraded with error context, gating initial warmup completion by both state and timestamp to prevent duplicate passes, and introducing a bulletin loop snapshot check to skip redundant synthesis when the cached bulletin is still fresh. Documentation updates clarify the relationship between warmup refresh cadence and bulletin staleness thresholds. Five new regression tests cover timeout bounds for non-cooperative tasks, cancellation safety, and the initial-completion gating logic.
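The two gating rules described here can be written as small predicates. This is a sketch under assumptions: the threshold `bulletin_age_secs >= max(1, warmup.refresh_secs)` and the function name `has_completed_initial_warmup` appear in this PR, while the `WarmupStatus` struct, its field names, the simplified state enum, and `bulletin_is_stale` are hypothetical.

```rust
// Hypothetical, simplified state enum (the PR's Degraded state carries error context).
#[derive(Debug, Clone, Copy, PartialEq)]
enum WarmupState {
    Cold,
    Warming,
    Warm,
    Degraded,
}

// Hypothetical status snapshot; field names are assumptions.
struct WarmupStatus {
    state: WarmupState,
    last_refresh_unix_secs: Option<u64>,
}

// The initial pass counts as complete only when the state is Warm AND a
// refresh timestamp exists; a timestamp alone is not enough.
fn has_completed_initial_warmup(status: &WarmupStatus) -> bool {
    status.state == WarmupState::Warm && status.last_refresh_unix_secs.is_some()
}

// Stale fallback threshold from the PR description:
// bulletin_age_secs >= max(1, warmup.refresh_secs).
fn bulletin_is_stale(bulletin_age_secs: u64, warmup_refresh_secs: u64) -> bool {
    bulletin_age_secs >= warmup_refresh_secs.max(1)
}

fn main() {
    let candidates = [
        (WarmupState::Cold, None),
        (WarmupState::Warming, None),
        (WarmupState::Degraded, Some(1_700_000_000)),
        (WarmupState::Warm, Some(1_700_000_000)),
    ];
    for (state, last_refresh_unix_secs) in candidates {
        let status = WarmupStatus { state, last_refresh_unix_secs };
        println!(
            "{:?}: initial warmup complete = {}",
            status.state,
            has_completed_initial_warmup(&status)
        );
    }
    println!("age 59s, refresh 60s: stale = {}", bulletin_is_stale(59, 60));
    println!("age 60s, refresh 60s: stale = {}", bulletin_is_stale(60, 60));
}
```

The `max(1, ...)` clamp means a refresh interval configured as zero still yields a one-second threshold, so a bulletin generated in the same second is not treated as stale.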

Written by Tembo for commit 3563f11

coderabbitai bot (Contributor) commented Feb 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5a050e and f05226c.

📒 Files selected for processing (2)
  • docs/content/docs/(configuration)/config.mdx
  • src/main.rs

Walkthrough

Adds a bounded per-agent startup warmup pass and integrates warmup as the primary bulletin refresh when enabled; introduces guarded warmup-run lifecycle (WarmupRunGuard), lock-aware bulletin generation, updated bulletin/warmup loops, docs updates, and related tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/content/docs/(configuration)/config.mdx, docs/content/docs/(core)/architecture.mdx, docs/content/docs/(core)/cortex.mdx | Document warmup as primary bulletin refresher when enabled; update readiness/dispatch criteria to reference warmup state and bulletin freshness; describe reordered startup sequence including an initial warmup pass and updated cortex loop behavior. |
| Agent Cortex Logic: src/agent/cortex.rs | Introduce warmup gating helpers (should_generate_bulletin_from_bulletin_loop, has_completed_initial_warmup), apply_cancelled_warmup_status, WarmupRunGuard with commit/drop semantics, maybe_generate_bulletin_under_lock, and refactor bulletin & warmup loops to use guarded generation and lock snapshots; add tests for these behaviors. |
| Startup Init & Tests: src/main.rs | Add wait_for_startup_warmup_tasks to drain/timeout per-agent startup warmup tasks, spawn bounded (30s) startup warmup before adapters start receiving traffic, and add tests for normal completion, timeout, cancellation, and non-cooperative tasks. |
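The bounded-wait idea behind the startup change can be sketched with the standard library alone. Note the substitution: the PR uses tokio's JoinSet and aborts unfinished tasks on timeout, whereas this stand-in uses OS threads and a channel, and can only stop waiting because std threads cannot be aborted. The function name, the `WaitOutcome` type, and the durations are hypothetical.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical result type: how many tasks finished and whether the deadline hit.
#[derive(Debug, PartialEq)]
struct WaitOutcome {
    completed: usize,
    timed_out: bool,
}

// Spawns one simulated warmup task per delay and waits for them against a
// single shared deadline, so total wait time never exceeds `timeout`.
fn wait_for_tasks_bounded(task_delays: &[Duration], timeout: Duration) -> WaitOutcome {
    let (tx, rx) = mpsc::channel::<usize>();
    for (id, delay) in task_delays.iter().copied().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(delay); // stands in for the warmup work
            tx.send(id).ok(); // the receiver may be gone after a timeout
        });
    }
    drop(tx); // only task threads hold senders now

    let deadline = Instant::now() + timeout;
    let mut completed = 0;
    while completed < task_delays.len() {
        // Recompute the remaining budget each iteration instead of giving
        // every task its own full timeout.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(_id) => completed += 1,
            Err(RecvTimeoutError::Timeout) => {
                return WaitOutcome { completed, timed_out: true };
            }
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    WaitOutcome { completed, timed_out: false }
}

fn main() {
    // One fast task and one slow, non-cooperative task.
    let delays = [Duration::from_millis(20), Duration::from_secs(5)];
    let outcome = wait_for_tasks_bounded(&delays, Duration::from_millis(500));
    println!("{outcome:?}");
}
```

The key property mirrored from the PR is that there is no drain step after the timeout: once the deadline passes, the caller returns immediately regardless of what the slow task is doing.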

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: hardening startup warmup and bulletin coordination, which is the primary focus of the PR across multiple files. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear details about warmup timeout handling, state machine guards, bulletin coordination, and regression tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/main.rs (1)

1836-1836: Minor: Prefer .ok() over let _ = for channel sends.

Per coding guidelines, channel sends where the receiver may be dropped should use .ok() rather than let _ =.

🔧 Suggested change
-            let _ = locked_tx.send(());
+            locked_tx.send(()).ok();
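To show what the suggested `.ok()` form does when the receiver has been dropped, here is a small sketch. It uses `std::sync::mpsc` as a stand-in; the actual channel type behind `locked_tx` is not shown in this review, and the `notify` helper is hypothetical.

```rust
use std::sync::mpsc;

// Hypothetical helper: signal the receiver, deliberately ignoring the case
// where it has already been dropped.
fn notify(tx: &mpsc::Sender<()>) {
    // `send` returns Err when the receiver is gone; `.ok()` converts the
    // Result to an Option and discards it, making the intent explicit.
    tx.send(()).ok();
}

fn main() {
    let (tx, rx) = mpsc::channel::<()>();

    notify(&tx);
    println!("received while receiver is alive: {}", rx.try_recv().is_ok());

    drop(rx);
    notify(&tx); // does not panic; the send error is discarded
    println!("send after receiver dropped fails: {}", tx.send(()).is_err());
}
```

Both forms compile and behave the same at runtime; the difference is stylistic, with `.ok()` reading as an intentional discard of a possibly failing send.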
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.rs` at line 1836, Replace the discard pattern "let _ =
locked_tx.send(())" with the idiomatic ".ok()" on the send call; specifically
change the call site that invokes locked_tx.send(()) so it becomes
locked_tx.send(()).ok() to explicitly ignore the Result when the receiver may
have been dropped (refer to the locked_tx.send(()) usage).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3563f11 and 2cbe40f.

📒 Files selected for processing (1)
  • src/main.rs

vsumner force-pushed the fix/warmup-bulletin-coordination branch from b4248bb to b5a050e on February 27, 2026 05:18
jamiepine merged commit 636d880 into spacedriveapp:main on Feb 27, 2026
3 checks passed

2 participants