
feat: broker core with single-threaded scheduler loop #4

Merged
vieiralucas merged 3 commits into main from feat/1-4-broker-core-scheduler-loop on Feb 11, 2026

@vieiralucas vieiralucas commented Feb 11, 2026

Summary

  • Implement Redis-inspired single-threaded scheduler architecture with a dedicated std::thread running a tight event loop
  • Add SchedulerCommand enum with Enqueue, Ack, Nack, RegisterConsumer, UnregisterConsumer, Shutdown variants using tokio::sync::oneshot for request-response
  • Add Broker struct that manages the scheduler thread lifecycle with graceful shutdown and Drop safety
  • Add BrokerConfig with TOML deserialization and sensible defaults (server + scheduler sections)
  • Add tracing subscriber setup with JSON/pretty-print mode selection
  • Command handlers are stubs to be filled in by subsequent stories (1.5-1.8)
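The architecture in this summary can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `std::sync::mpsc::sync_channel` stands in for both the bounded crossbeam channel and the `tokio::sync::oneshot` reply channel, the command set is trimmed to two variants, and `run_scheduler`, `Broker::start`, and `Broker::enqueue` are hypothetical names. The loop here blocks on `recv()`; the PR's loop may poll differently.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender};
use std::thread::{self, JoinHandle};

// Hypothetical, trimmed-down command set; the PR's enum has more variants.
enum SchedulerCommand {
    // `reply` stands in for a oneshot sender: it carries exactly one response.
    Enqueue { payload: String, reply: SyncSender<u64> },
    Shutdown,
}

// Body of the scheduler thread: one command at a time, so no locks are needed.
fn run_scheduler(rx: Receiver<SchedulerCommand>) {
    let mut queue: Vec<String> = Vec::new();
    // recv() returns Err once every sender is dropped, which also ends the loop.
    while let Ok(cmd) = rx.recv() {
        match cmd {
            SchedulerCommand::Enqueue { payload, reply } => {
                queue.push(payload);
                // The caller may have stopped waiting; a failed reply is not fatal.
                let _ = reply.send(queue.len() as u64);
            }
            SchedulerCommand::Shutdown => break,
        }
    }
}

struct Broker {
    command_tx: SyncSender<SchedulerCommand>,
    handle: Option<JoinHandle<()>>,
}

impl Broker {
    // Spawns the named scheduler thread with a bounded command channel.
    fn start(capacity: usize) -> std::io::Result<Self> {
        let (command_tx, rx) = mpsc::sync_channel(capacity);
        let handle = thread::Builder::new()
            .name("fila-scheduler".to_string())
            .spawn(move || run_scheduler(rx))?;
        Ok(Broker { command_tx, handle: Some(handle) })
    }

    // Sends Enqueue and waits for the scheduler's reply (the queue depth here).
    fn enqueue(&self, payload: &str) -> Option<u64> {
        let (reply, reply_rx) = mpsc::sync_channel(1);
        let cmd = SchedulerCommand::Enqueue { payload: payload.to_string(), reply };
        self.command_tx.send(cmd).ok()?;
        reply_rx.recv().ok()
    }
}

impl Drop for Broker {
    fn drop(&mut self) {
        // Ask the loop to stop, then join so the thread never outlives the broker.
        let _ = self.command_tx.send(SchedulerCommand::Shutdown);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

In this sketch, `Broker::start(8)` followed by two `enqueue` calls yields replies in submission order, and dropping the broker sends `Shutdown` and joins the thread.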

Test plan

  • 5 scheduler tests (shutdown, FIFO ordering, enqueue/ack reply, channel disconnect)
  • 3 broker tests (start/shutdown, enqueue command processing, Drop cleanup)
  • 4 config tests (defaults, full TOML override, empty TOML defaults, partial config)
  • All 28 tests pass via cargo nextest run
  • cargo clippy -- -D warnings passes
  • cargo fmt --check passes

Summary by cubic

Implements a single-threaded scheduler core with a Broker, structured logging, and graceful shutdown, plus a RocksDB-backed storage layer with ordered keys and atomic batches. Satisfies Story 1.4 (Broker Core & Scheduler Loop) and completes Story 1.3 (Core Domain Types & Storage), laying groundwork for enqueue/ack/nack in later stories.

  • New Features

    • Broker spawns the "fila-scheduler" thread with bounded crossbeam commands and clean shutdown.
    • SchedulerCommand enum with oneshot replies where needed; ReadyMessage for consumer delivery.
    • BrokerConfig parses from TOML with defaults; telemetry::init_tracing uses EnvFilter (pretty in debug, JSON in release).
  • Bug Fixes and Refactors

    • Broker::send_command uses try_send to avoid blocking; returns ChannelFull/ChannelDisconnected.
    • Explicit, per-command error types and StorageError::ColumnFamilyNotFound; RocksDbStorage uses a cf() helper for safe CF access; documented explicit error mapping in CLAUDE.md.

Written for commit acc9695. Summary will update on new commits.

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 11 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="crates/fila-core/src/broker/mod.rs">

<violation number="1" location="crates/fila-core/src/broker/mod.rs:58">
P2: send_command blocks when the channel is full despite the method contract saying it returns an error on full. Use try_send and map Full vs Disconnected to avoid hanging callers.</violation>
</file>

<file name="_bmad-output/implementation-artifacts/1-4-broker-core-scheduler-loop.md">

<violation number="1" location="_bmad-output/implementation-artifacts/1-4-broker-core-scheduler-loop.md:15">
P3: Acceptance Criteria lists `Admin`, but the rest of the story uses `Shutdown`. This inconsistency makes the required command set unclear. Align AC #3 with the `Shutdown` variant used elsewhere.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread crates/fila-core/src/broker/mod.rs Outdated
Comment thread _bmad-output/implementation-artifacts/1-4-broker-core-scheduler-loop.md Outdated
Comment thread crates/fila-core/src/broker/mod.rs Outdated
        scheduler.run();
    })
    .map_err(|e| {
        FilaError::StorageError(format!("failed to spawn scheduler thread: {e}"))
Member Author

Using a StorageError here feels like using it as a fallback for any kind of error. I’d like us to be very diligent about defining and using meaningful, specific errors.

Comment thread crates/fila-core/src/broker/mod.rs Outdated
pub fn send_command(&self, cmd: SchedulerCommand) -> Result<()> {
    self.command_tx.try_send(cmd).map_err(|e| match e {
        crossbeam_channel::TrySendError::Full(_) => {
            FilaError::StorageError("scheduler command channel full".to_string())
Member Author

Using a StorageError here feels like using it as a fallback for any kind of error. I’d like us to be very diligent about defining and using meaningful, specific errors.

Comment thread crates/fila-core/src/broker/mod.rs Outdated
            FilaError::StorageError("scheduler command channel full".to_string())
        }
        crossbeam_channel::TrySendError::Disconnected(_) => {
            FilaError::StorageError("scheduler channel disconnected".to_string())
Member Author

Using a StorageError here feels like using it as a fallback for any kind of error. I’d like us to be very diligent about defining and using meaningful, specific errors.
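One way to address the review comments above is a dedicated error type for command submission, so that a full or disconnected channel is never reported as a storage failure. The sketch below is an assumption about the shape of such a fix, not the code that was merged: the variant names `ChannelFull` and `ChannelDisconnected` come from the bot summary, while the enum name `SendCommandError` is hypothetical and `std::sync::mpsc::SyncSender` stands in for the crossbeam sender.

```rust
use std::fmt;
use std::sync::mpsc::{SyncSender, TrySendError};

// Hypothetical error type scoped to command submission only.
#[derive(Debug, PartialEq, Eq)]
enum SendCommandError {
    ChannelFull,
    ChannelDisconnected,
}

impl fmt::Display for SendCommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SendCommandError::ChannelFull => write!(f, "scheduler command channel full"),
            SendCommandError::ChannelDisconnected => write!(f, "scheduler channel disconnected"),
        }
    }
}

impl std::error::Error for SendCommandError {}

// Non-blocking send: each channel failure maps to its own variant.
fn send_command<T>(tx: &SyncSender<T>, cmd: T) -> Result<(), SendCommandError> {
    tx.try_send(cmd).map_err(|e| match e {
        TrySendError::Full(_) => SendCommandError::ChannelFull,
        TrySendError::Disconnected(_) => SendCommandError::ChannelDisconnected,
    })
}
```

Because the match lists both `TrySendError` variants, callers can tell backpressure (retry later) apart from a dead scheduler (give up) by matching on the returned error.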

@vieiralucas vieiralucas force-pushed the feat/1-3-core-domain-types-storage-layer branch from 5c62cd7 to b468d09 Compare February 11, 2026 12:06
@vieiralucas vieiralucas force-pushed the feat/1-4-broker-core-scheduler-loop branch 2 times, most recently from 792b36c to 5ffa754 Compare February 11, 2026 13:18
@vieiralucas vieiralucas force-pushed the feat/1-3-core-domain-types-storage-layer branch from a6cdccc to 899b27b Compare February 11, 2026 15:54
Base automatically changed from feat/1-3-core-domain-types-storage-layer to main February 11, 2026 15:56
@vieiralucas vieiralucas force-pushed the feat/1-4-broker-core-scheduler-loop branch 3 times, most recently from e1f40eb to 8060632 Compare February 11, 2026 16:18
…ch-alls

add ColumnFamilyNotFound variant to StorageError and document the
explicit error mapping pattern in CLAUDE.md
implement the redis-inspired single-threaded scheduler architecture:
broker struct spawns a dedicated os thread running a scheduler event
loop that drains commands via crossbeam channel. includes scheduler
command enum with oneshot reply channels, broker config with toml
deserialization, tracing subscriber setup, and graceful shutdown.
command handlers are stubs to be filled in by subsequent stories.
@vieiralucas vieiralucas force-pushed the feat/1-4-broker-core-scheduler-loop branch from 8060632 to acc9695 Compare February 11, 2026 16:21
@vieiralucas vieiralucas merged commit eb43822 into main Feb 11, 2026
4 checks passed
@vieiralucas vieiralucas deleted the feat/1-4-broker-core-scheduler-loop branch February 11, 2026 16:22
vieiralucas added a commit that referenced this pull request Mar 18, 2026
- apply_to_broker_storage now returns Result and propagates StorageError
  instead of silently swallowing storage failures (cubic #1)
- add DeleteLeaseExpiry mutation in ack/nack replication paths to clean up
  orphaned lease expiry entries (cubic #3)
- fix no-op leased_msg_keys.retain in recovery — now properly clears
  entries for the recovering queue before rebuild (cubic #4)
- warn when create_group is called without broker_storage set (cubic #5)
- check send_command result in watch_leader_changes — only update leading
  state on success so next poll retries on failure (cubic #6, #7)
- trigger RecoverQueue on first-sight leader state to catch messages
  replicated between startup and first poll (cubic #8)
- replace catch-all _ => {} with explicit variant listing in
  apply_to_broker_storage for compiler-enforced exhaustiveness
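The last bullet of that commit message, together with the first, can be illustrated with a small sketch. Everything here is hypothetical apart from the `DeleteLeaseExpiry` name: the other `Mutation` variants, the `EmptyKey` error, and the `Vec<String>` store are stand-ins chosen only to show a `match` with no catch-all arm that returns a `Result`.

```rust
// Hypothetical mutation and error types, for illustration only.
enum Mutation {
    PutMessage { key: String },
    DeleteMessage { key: String },
    DeleteLeaseExpiry { key: String },
    LeaderChanged,
}

#[derive(Debug, PartialEq, Eq)]
enum StorageError {
    EmptyKey,
}

// Returns a Result so storage failures reach the caller instead of being dropped.
fn apply_to_storage(store: &mut Vec<String>, mutation: Mutation) -> Result<(), StorageError> {
    match mutation {
        Mutation::PutMessage { key } => {
            if key.is_empty() {
                return Err(StorageError::EmptyKey);
            }
            store.push(key);
            Ok(())
        }
        Mutation::DeleteMessage { key } | Mutation::DeleteLeaseExpiry { key } => {
            store.retain(|k| *k != key);
            Ok(())
        }
        // Listed by name instead of `_ => {}`: adding a new variant later
        // becomes a compile error here until it is handled deliberately.
        Mutation::LeaderChanged => Ok(()),
    }
}
```

With a wildcard arm, a newly added variant would silently fall through as a no-op; with the explicit listing, the compiler rejects the `match` until the new case is written.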
vieiralucas added a commit that referenced this pull request Mar 25, 2026
All 5 external SDKs updated to unified API surface:
- Go PR #4, Python PR #4, JS PR #4, Ruby PR #4, Java PR #4
- batch_enqueue removed, enqueue_many added in all SDKs
- BatchMode renamed to AccumulatorMode
- No batch prefix in any public API