
feat: 14.4 replication, failover & recovery #59

Merged

vieiralucas merged 11 commits into main from feat/14.4-replication-failover-recovery on Mar 18, 2026

Conversation

@vieiralucas (Member) commented Mar 11, 2026

Summary

  • Raft data replication: Queue-level Raft state machines now apply committed entries (enqueue, ack, nack) to broker storage on ALL nodes — not just the leader. Followers have replicated data at all times, enabling zero-loss failover.
  • Leader change detection: a watch_leader_changes() background task polls queue Raft groups for leadership transitions. On leader gain it sends RecoverQueue (rebuild scheduler state); on leader loss, DropQueueConsumers (close consumer streams). A sketch of this loop follows the list.
  • Per-queue scheduler recovery: recover_queue() rebuilds DRR keys, pending index, and leased_msg_keys from RocksDB for a single queue without disrupting other queues.
  • Consumer stream leader-awareness: consume() handler rejects non-leader nodes with UNAVAILABLE status, directing clients to reconnect to the leader.
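
A minimal sketch of the watcher loop described above. Only watch_leader_changes, RecoverQueue, and DropQueueConsumers come from this PR; the leadership query, channel, queue-id type, and poll interval are illustrative assumptions.

```rust
use std::{collections::HashMap, sync::Arc, time::Duration};
use tokio::sync::mpsc;

async fn watch_leader_changes(
    multi_raft: Arc<MultiRaftManager>, // project type; queried shape is assumed
    cmd_tx: mpsc::Sender<Command>,     // scheduler command channel (assumed)
) {
    let mut leading: HashMap<QueueId, bool> = HashMap::new();
    loop {
        for (queue_id, is_leader) in multi_raft.leadership_snapshot() {
            match (leading.get(&queue_id).copied(), is_leader) {
                // Promotion, or first sight of a queue this node already leads:
                (Some(false), true) | (None, true) => {
                    // Record the new state only when the send succeeds, so a
                    // full channel is retried on the next poll.
                    if cmd_tx
                        .send(Command::RecoverQueue { queue_id: queue_id.clone() })
                        .await
                        .is_ok()
                    {
                        leading.insert(queue_id, true);
                    }
                }
                // Demotion: close consumer streams so clients reconnect.
                (Some(true), false) => {
                    if cmd_tx
                        .send(Command::DropQueueConsumers { queue_id: queue_id.clone() })
                        .await
                        .is_ok()
                    {
                        leading.insert(queue_id, false);
                    }
                }
                // No transition observed: just remember the current state.
                _ => {
                    leading.insert(queue_id, is_leader);
                }
            }
        }
        tokio::time::sleep(Duration::from_millis(500)).await;
    }
}
```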

Acceptance Criteria Covered

  1. Raft leader replicates everything via log — quorum-committed writes
  2. Automatic failover with new leader election within 10 seconds
  3. Consumer disconnection and reconnection on node failure
  4. Node rejoin and catch-up via Raft log/snapshot
  5. Integration tests: kill → failover → zero message loss → rejoin
  6. Scheduler state rebuild on leader promotion
  7. Single-node mode behavior unchanged (316/316 tests pass)

Test Summary

  • 316 tests total (up from 313), 0 failures
  • 3 new integration tests (one sketched after this list):
    • test_cluster_failover_new_leader_elected — kill leader, verify new leader <10s
    • test_cluster_failover_zero_message_loss — enqueue 5, kill leader, consume all 5
    • test_cluster_node_rejoin_catchup — kill node, enqueue, restart, verify catch-up
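
An illustrative outline of the zero-loss test; the TestCluster harness and its methods are assumed names, not the real test API.

```rust
#[tokio::test]
async fn cluster_failover_zero_message_loss() {
    let mut cluster = TestCluster::start(3).await;        // 3-node cluster
    let queue = cluster.create_queue("orders").await;
    for i in 0..5 {
        queue.enqueue(format!("msg-{i}")).await.unwrap(); // quorum-committed
    }
    cluster.kill_leader(&queue).await;                    // hard-kill the queue leader
    cluster
        .wait_for_new_leader(&queue, std::time::Duration::from_secs(10))
        .await;
    let delivered = queue.consume_all().await;            // reconnects to the new leader
    assert_eq!(delivered.len(), 5, "no committed message may be lost");
}
```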

Files Changed

| File | Change |
| --- | --- |
| crates/fila-core/src/broker/command.rs | RecoverQueue, DropQueueConsumers commands |
| crates/fila-core/src/broker/scheduler/mod.rs | Command handlers |
| crates/fila-core/src/broker/scheduler/recovery.rs | recover_queue(), drop_queue_consumers() |
| crates/fila-core/src/cluster/mod.rs | watch_leader_changes() |
| crates/fila-core/src/cluster/multi_raft.rs | broker_storage, snapshot_groups() |
| crates/fila-core/src/cluster/store.rs | apply_to_broker_storage() |
| crates/fila-core/src/cluster/tests.rs | 3 failover/rejoin tests |
| crates/fila-server/src/service.rs | Leader check in consume() |
| crates/fila-server/src/main.rs | Wiring: leader watcher, broker storage |

🤖 Generated with Claude Code


Summary by cubic

Adds queue data replication and fast failover with zero‑loss recovery. Queue Raft groups apply committed enqueue/ack/nack to local storage on all nodes; on leader change the new leader rebuilds per‑queue scheduler state and consumers reconnect to the leader. Implements Story 14.4.

  • New Features

    • Queue Raft groups apply committed enqueue/ack/nack to broker RocksDB on all nodes; server wires broker storage into MultiRaft and passes it at construction to MultiRaftManager/FilaRaftStore::for_queue.
    • Leadership watcher: watch_leader_changes() sends RecoverQueue on promotion and DropQueueConsumers on loss; triggers recovery on first‑sight leader and retries on send failure. consume() rejects non‑leaders with UNAVAILABLE and returns NOT_FOUND if the group is missing.
    • Per‑queue recovery rebuilds DRR keys, pending index, and leased keys from RocksDB without touching other queues; delivers pending after recovery.
    • Hardening: propagate storage errors from apply_to_broker_storage(), delete orphaned lease‑expiry entries on ack/nack, warn if a group is created without broker storage, and use explicit match arms for exhaustiveness.
    • Integration tests cover leader election (<10s), zero‑loss failover, and node rejoin/catch‑up; single‑node mode unchanged.
  • Migration

    • Clients must handle UNAVAILABLE from consume() by reconnecting to the queue leader (see the sketch below).
    • No config changes; the leader watcher runs only in cluster mode.
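
A sketch of the reconnect loop a client might use, assuming a tonic-generated client; discover_leader and connect_to are hypothetical helpers, while tonic::Code::Unavailable is the real status code named above.

```rust
// Inside an async context with `queue_id` and `request` in scope.
let mut client = connect_to(initial_addr).await?;
let stream = loop {
    match client.consume(request.clone()).await {
        Ok(resp) => break resp.into_inner(),
        Err(status) if status.code() == tonic::Code::Unavailable => {
            // This node is not the queue leader: rediscover and reconnect.
            let leader_addr = discover_leader(&queue_id).await?;
            client = connect_to(leader_addr).await?;
        }
        Err(other) => return Err(other.into()),
    }
};
```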

Written for commit 149a325. Summary will update on new commits.

vieiralucas added a commit that referenced this pull request Mar 11, 2026

@cubic-dev-ai (Bot) left a comment

8 issues found across 12 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fila-core/src/cluster/store.rs">

<violation number="1" location="crates/fila-core/src/cluster/store.rs:413">
P1: `apply_to_broker_storage` silently swallows storage errors (log-only). Since this runs in the Raft `apply_to_state_machine` path, a failed mutation means the local broker storage diverges from the committed log. On failover, the new leader could be missing data. Consider returning a `Result` and propagating the error as a `StorageError` from `apply_to_state_machine`, which would let Raft handle the failure appropriately.</violation>

<violation number="2" location="crates/fila-core/src/cluster/store.rs:607">
P1: Ack and Nack both do a linear scan of all messages in the queue (`list_messages`) to find a single message by ID. This runs in the Raft `apply_to_state_machine` path on every node for every committed entry. Consider adding a secondary index (msg_id → storage key) or including the full storage key in the `ClusterRequest::Ack`/`Nack` variants so followers can do a direct lookup.</violation>

<violation number="3" location="crates/fila-core/src/cluster/store.rs:614">
P1: Lease expiry entries are not cleaned up on Ack (or Nack). The comment says "Also clean up any lease/lease_expiry entries" but only `DeleteLease` is emitted — no `DeleteLeaseExpiry`. Parse the expiry timestamp from the lease value (via `parse_expiry_from_lease_value`) to construct the `lease_expiry_key` and add a `Mutation::DeleteLeaseExpiry` to the batch. Without this, orphaned expiry entries will trigger spurious expiration attempts.</violation>
</file>

<file name="crates/fila-core/src/broker/scheduler/recovery.rs">

<violation number="1" location="crates/fila-core/src/broker/scheduler/recovery.rs:337">
P1: No-op retain on `leased_msg_keys` leaves stale entries after per-queue recovery. Unlike `pending` and `pending_by_id` (which are properly filtered to remove this queue's entries), `retain(|_, _| true)` removes nothing. After the rebuild loop, messages that are no longer leased will still have ghost entries in `leased_msg_keys`, causing inconsistent scheduler state (e.g. wrong lease counts in metrics, stale lookups in `reclaim_expired_leases`).</violation>
</file>

<file name="crates/fila-core/src/cluster/multi_raft.rs">

<violation number="1" location="crates/fila-core/src/cluster/multi_raft.rs:62">
P1: `create_group()` should fail when broker storage is unset; currently it silently creates a queue Raft store that skips applying committed entries to broker storage.</violation>
</file>

<file name="crates/fila-core/src/cluster/mod.rs">

<violation number="1" location="crates/fila-core/src/cluster/mod.rs:347">
P1: `let _ =` silently discards the `send_command` result, then `leading` is unconditionally updated. If the command channel is full during failover load, recovery is lost and never retried because `was_leader` will be `true` on the next poll.

Only update `leading` on success; log and skip the update on failure so the next poll retries.</violation>

<violation number="2" location="crates/fila-core/src/cluster/mod.rs:354">
P2: Same silent-discard pattern: if `DropQueueConsumers` fails to send, `leading` is set to `false` and the drop is never retried. Consumers would remain connected to a non-leader, receiving stale state or errors.</violation>

<violation number="3" location="crates/fila-core/src/cluster/mod.rs:360">
P1: When the watcher first discovers a queue where this node is already leader, it records the state but does not trigger `RecoverQueue`. Any messages replicated to RocksDB between initial startup recovery and the first poll will be missing from the in-memory scheduler (DRR, pending index), so they won't be delivered.

Trigger recovery on first sight when `is_leader` is true, matching the `is_leader && !was_leader` branch.</violation>
</file>
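
For reference, a minimal sketch of the error-propagation fix suggested in store.rs violation 1. The function and error names follow the review; the signature and the build_mutations helper are assumptions.

```rust
fn apply_to_broker_storage(
    storage: &dyn StorageEngine,
    req: &ClusterRequest,
) -> Result<(), StorageError> {
    let mutations = build_mutations(storage, req)?; // was: log the error and continue
    storage.write_batch(mutations)?;                // a divergence now fails the apply
    Ok(())
}
```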

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread: crates/fila-core/src/cluster/store.rs (outdated)
        }
        super::types::ClusterRequest::Ack { queue_id, msg_id } => {
            // Find the message and its lease, then delete them.
            let msg_prefix = crate::storage::keys::message_prefix(queue_id);
@cubic-dev-ai (Bot) commented Mar 11, 2026

P1: Ack and Nack both do a linear scan of all messages in the queue (list_messages) to find a single message by ID. This runs in the Raft apply_to_state_machine path on every node for every committed entry. Consider adding a secondary index (msg_id → storage key) or including the full storage key in the ClusterRequest::Ack/Nack variants so followers can do a direct lookup.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/fila-core/src/cluster/store.rs, line 607:

<comment>Ack and Nack both do a linear scan of all messages in the queue (`list_messages`) to find a single message by ID. This runs in the Raft `apply_to_state_machine` path on every node for every committed entry. Consider adding a secondary index (msg_id → storage key) or including the full storage key in the `ClusterRequest::Ack`/`Nack` variants so followers can do a direct lookup.</comment>

<file context>
@@ -541,6 +569,108 @@ impl RaftStorage<TypeConfig> for FilaRaftStore {
+            }
+            super::types::ClusterRequest::Ack { queue_id, msg_id } => {
+                // Find the message and its lease, then delete them.
+                let msg_prefix = crate::storage::keys::message_prefix(queue_id);
+                match storage.list_messages(&msg_prefix) {
+                    Ok(messages) => {
</file context>

@vieiralucas (Member, Author) replied:

Dismissed — accepted for now.

The ClusterRequest::Ack only carries (queue_id, msg_id) but the storage key requires (queue_id, fairness_key, enqueued_at, msg_id). A proper fix requires either a secondary index or carrying the full key in ClusterRequest, which changes the Raft log format. For current workloads (single-digit to low-thousands messages per queue), this is acceptable. Deferring to a future optimization pass.

@cubic-dev-ai (Bot) replied:

Understood—thanks for the context. I’ll treat this as acceptable for current workloads and leave it as a future optimization.
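
For the record, one hypothetical shape the deferred fix could take is a msg_id to full-key secondary index maintained at enqueue time, so Ack/Nack becomes a point lookup. The key layout and helpers below are not from this PR.

```rust
// Build the index key: "idx/msg/<queue_id>/<msg_id>" -> full message key.
fn msg_id_index_key(queue_id: &[u8], msg_id: &[u8]) -> Vec<u8> {
    [b"idx/msg/".as_slice(), queue_id, b"/".as_slice(), msg_id].concat()
}
// enqueue: storage.put(&msg_id_index_key(q, m), &full_message_key)
// ack:     let full_key = storage.get(&msg_id_index_key(q, m))?; // no scan
```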

Comment thread: crates/fila-core/src/cluster/store.rs
Comment thread: crates/fila-core/src/broker/scheduler/recovery.rs (outdated)
Comment thread: crates/fila-core/src/cluster/multi_raft.rs (outdated)
Comment thread: crates/fila-core/src/cluster/mod.rs (outdated)
Comment thread: crates/fila-core/src/cluster/mod.rs (outdated)
Comment thread: crates/fila-core/src/cluster/mod.rs (outdated)
@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: 62d6311
Time: 2026-03-11T20:57:59Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.473112 | ms |
| compaction_idle_p99 | 0.469832 | ms |
| compaction_p99_delta | 0.001124000000000014 | ms |
| consumer_concurrency_100_throughput | 1890.3333333333333 | msg/s |
| consumer_concurrency_10_throughput | 1975.0 | msg/s |
| consumer_concurrency_1_throughput | 356.6666666666667 | msg/s |
| e2e_latency_p50_light | 0.404447 | ms |
| e2e_latency_p95_light | 0.455877 | ms |
| e2e_latency_p99_light | 0.6094040000000001 | ms |
| enqueue_throughput_1kb | 2696.7418220281847 | msg/s |
| enqueue_throughput_1kb_mbps | 2.633536935574399 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1424.5716548601458 | msg/s |
| fairness_overhead_fifo_throughput | 1464.9117913130938 | msg/s |
| fairness_overhead_pct | 2.844595961565055 | % |
| key_cardinality_10_throughput | 2447.183884629227 | msg/s |
| key_cardinality_10k_throughput | 663.230675843288 | msg/s |
| key_cardinality_1k_throughput | 1272.2347427741768 | msg/s |
| lua_on_enqueue_overhead_us | 10.181648252500167 | us |
| lua_throughput_with_hook | 1186.2266369922424 | msg/s |
| memory_per_message_overhead | 3133.0304 | bytes/msg |
| memory_rss_idle | 166.2421875 | MB |
| memory_rss_loaded_10k | 195.85546875 | MB |

@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: 867f1fd
Time: 2026-03-11T20:58:30Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.490282 | ms |
| compaction_idle_p99 | 0.510239 | ms |
| compaction_p99_delta | -0.01657700000000001 | ms |
| consumer_concurrency_100_throughput | 1793.0 | msg/s |
| consumer_concurrency_10_throughput | 1702.6666666666667 | msg/s |
| consumer_concurrency_1_throughput | 350.6666666666667 | msg/s |
| e2e_latency_p50_light | 0.416558 | ms |
| e2e_latency_p95_light | 0.5014620000000001 | ms |
| e2e_latency_p99_light | 0.631899 | ms |
| enqueue_throughput_1kb | 2615.1615512165185 | msg/s |
| enqueue_throughput_1kb_mbps | 2.5538687023598814 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1388.190565800054 | msg/s |
| fairness_overhead_fifo_throughput | 1428.9010057383225 | msg/s |
| fairness_overhead_pct | 3.0067873685114543 | % |
| key_cardinality_10_throughput | 2377.567703581095 | msg/s |
| key_cardinality_10k_throughput | 653.7513964310592 | msg/s |
| key_cardinality_1k_throughput | 1261.9387661039598 | msg/s |
| lua_on_enqueue_overhead_us | 17.47565649940202 | us |
| lua_throughput_with_hook | 1173.9290324682 | msg/s |
| memory_per_message_overhead | 2870.4768 | bytes/msg |
| memory_rss_idle | 166.78125 | MB |
| memory_rss_loaded_10k | 194.08984375 | MB |

@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: 2b35b79
Time: 2026-03-11T21:08:59Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.477944 | ms |
| compaction_idle_p99 | 0.4936400000000001 | ms |
| compaction_p99_delta | -0.0156960000000001 | ms |
| consumer_concurrency_100_throughput | 1889.0 | msg/s |
| consumer_concurrency_10_throughput | 1977.3333333333333 | msg/s |
| consumer_concurrency_1_throughput | 360.6666666666667 | msg/s |
| e2e_latency_p50_light | 0.409485 | ms |
| e2e_latency_p95_light | 0.465177 | ms |
| e2e_latency_p99_light | 0.570052 | ms |
| enqueue_throughput_1kb | 2640.9942127013487 | msg/s |
| enqueue_throughput_1kb_mbps | 2.579095910841161 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1398.1921196577605 | msg/s |
| fairness_overhead_fifo_throughput | 1424.9603837263985 | msg/s |
| fairness_overhead_pct | 1.9245231522195592 | % |
| key_cardinality_10_throughput | 2397.0841516416017 | msg/s |
| key_cardinality_10k_throughput | 660.806842855799 | msg/s |
| key_cardinality_1k_throughput | 1246.8257274276002 | msg/s |
| lua_on_enqueue_overhead_us | 16.625616853850147 | us |
| lua_throughput_with_hook | 1171.486309320429 | msg/s |
| memory_per_message_overhead | 2862.6944 | bytes/msg |
| memory_rss_idle | 166.609375 | MB |
| memory_rss_loaded_10k | 193.74609375 | MB |

@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: ffeb0c7
Time: 2026-03-11T21:09:39Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.503222 | ms |
| compaction_idle_p99 | 0.504818 | ms |
| compaction_p99_delta | 0.012414000000000036 | ms |
| consumer_concurrency_100_throughput | 1833.3333333333333 | msg/s |
| consumer_concurrency_10_throughput | 1956.6666666666667 | msg/s |
| consumer_concurrency_1_throughput | 361.6666666666667 | msg/s |
| e2e_latency_p50_light | 0.404147 | ms |
| e2e_latency_p95_light | 0.454002 | ms |
| e2e_latency_p99_light | 0.564263 | ms |
| enqueue_throughput_1kb | 2650.9465931462855 | msg/s |
| enqueue_throughput_1kb_mbps | 2.5888150323694195 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1294.1419318985088 | msg/s |
| fairness_overhead_fifo_throughput | 1336.0393881901125 | msg/s |
| fairness_overhead_pct | 3.294447499028408 | % |
| key_cardinality_10_throughput | 2398.7436926372793 | msg/s |
| key_cardinality_10k_throughput | 639.6102266447645 | msg/s |
| key_cardinality_1k_throughput | 1243.9461868879553 | msg/s |
| lua_on_enqueue_overhead_us | 18.80144959855579 | us |
| lua_throughput_with_hook | 1070.4383043541943 | msg/s |
| memory_per_message_overhead | 2818.4576 | bytes/msg |
| memory_rss_idle | 168.70703125 | MB |
| memory_rss_loaded_10k | 195.9296875 | MB |

@vieiralucas (Member, Author) left a comment

Cubic findings addressed

Fixed in commit cca7587:

  • #1 (P1): apply_to_broker_storage now returns Result<(), StorageError> and propagates errors instead of silently logging
  • #3 (P1): Added DeleteLeaseExpiry mutations in both Ack and Nack paths
  • #4 (P1): Fixed no-op leased_msg_keys.retain(|_, _| true) — now properly clears entries for the recovering queue (sketched after the Dismissed note)
  • #5 (P1): Added warning log when create_group() is called without broker_storage set
  • #6 (P1): send_command result is now checked; leading state only updated on success so next poll retries
  • #7 (P2): Same fix as #6 for DropQueueConsumers
  • #8 (P1): First-sight leader now triggers RecoverQueue to catch messages replicated between startup and first poll

Dismissed

  • #2 (P1): O(n) linear scan in Ack/Nack apply_to_broker_storage — accepted for now. The ClusterRequest::Ack only carries (queue_id, msg_id), but the storage key requires (queue_id, fairness_key, enqueued_at, msg_id). Adding a secondary index or carrying the full key in ClusterRequest would be the proper fix, but is a design change that affects the Raft log format (serialized ClusterRequest). Deferring to a future optimization pass. For current workloads (single-digit to low-thousands messages per queue), this is acceptable.
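
The fix for #4 amounts to replacing the no-op closure with a real filter; a sketch under assumed names (LeasedKey and its queue_id field are illustrative):

```rust
fn clear_queue_leases(
    leased_msg_keys: &mut std::collections::HashMap<MsgId, LeasedKey>,
    queue_id: &QueueId,
) {
    // The original closure `retain(|_, _| true)` kept everything.
    // Drop only this queue's entries; other queues' leases stay untouched.
    leased_msg_keys.retain(|_, key| &key.queue_id != queue_id);
    // recover_queue() then re-populates entries for `queue_id` from RocksDB.
}
```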

@vieiralucas vieiralucas force-pushed the feat/14.3-request-routing-transparent-delivery branch from 95de06f to 8c8bcc1 on March 18, 2026 at 11:43
vieiralucas added a commit that referenced this pull request Mar 18, 2026
@vieiralucas vieiralucas force-pushed the feat/14.4-replication-failover-recovery branch from b1a911e to 157b463 on March 18, 2026 at 11:43
@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: 74a854a
Time: 2026-03-18T11:51:15Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.500209 | ms |
| compaction_idle_p99 | 0.526993 | ms |
| compaction_p99_delta | -0.02678400000000003 | ms |
| consumer_concurrency_100_throughput | 1763.0 | msg/s |
| consumer_concurrency_10_throughput | 2004.6666666666667 | msg/s |
| consumer_concurrency_1_throughput | 339.6666666666667 | msg/s |
| e2e_latency_p50_light | 0.421875 | ms |
| e2e_latency_p95_light | 0.4987 | ms |
| e2e_latency_p99_light | 0.6490680000000001 | ms |
| enqueue_throughput_1kb | 2580.231132098345 | msg/s |
| enqueue_throughput_1kb_mbps | 2.51975696493979 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1376.3734917722604 | msg/s |
| fairness_overhead_fifo_throughput | 1411.4410003042544 | msg/s |
| fairness_overhead_pct | 2.48451819979969 | % |
| key_cardinality_10_throughput | 2300.6137326121643 | msg/s |
| key_cardinality_10k_throughput | 637.1643396817127 | msg/s |
| key_cardinality_1k_throughput | 1243.2467847084158 | msg/s |
| lua_on_enqueue_overhead_us | 19.055038721792357 | us |
| lua_throughput_with_hook | 1149.6036036281848 | msg/s |
| memory_per_message_overhead | 2747.1872 | bytes/msg |
| memory_rss_idle | 166.56640625 | MB |
| memory_rss_loaded_10k | 192.6171875 | MB |

@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: d0048e2
Time: 2026-03-18T11:57:50Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.507293 | ms |
| compaction_idle_p99 | 0.506066 | ms |
| compaction_p99_delta | 0.0029319999999999347 | ms |
| consumer_concurrency_100_throughput | 1305.3333333333333 | msg/s |
| consumer_concurrency_10_throughput | 1527.3333333333333 | msg/s |
| consumer_concurrency_1_throughput | 273.3333333333333 | msg/s |
| e2e_latency_p50_light | 0.418446 | ms |
| e2e_latency_p95_light | 0.49676 | ms |
| e2e_latency_p99_light | 0.607382 | ms |
| enqueue_throughput_1kb | 2588.402565050548 | msg/s |
| enqueue_throughput_1kb_mbps | 2.5277368799321756 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1373.7525812071967 | msg/s |
| fairness_overhead_fifo_throughput | 1417.8567803961366 | msg/s |
| fairness_overhead_pct | 3.110624415578666 | % |
| key_cardinality_10_throughput | 2343.9744545850695 | msg/s |
| key_cardinality_10k_throughput | 619.9422824207743 | msg/s |
| key_cardinality_1k_throughput | 1237.256305879917 | msg/s |
| lua_on_enqueue_overhead_us | 26.565044982001837 | us |
| lua_throughput_with_hook | 1136.8130211625937 | msg/s |
| memory_per_message_overhead | 2782.0032 | bytes/msg |
| memory_rss_idle | 167.1171875 | MB |
| memory_rss_loaded_10k | 193.0546875 | MB |

@vieiralucas vieiralucas force-pushed the feat/14.3-request-routing-transparent-delivery branch from 8c8bcc1 to d9a736b on March 18, 2026 at 14:40
Base automatically changed from feat/14.3-request-routing-transparent-delivery to main March 18, 2026 20:45
- Add broker storage replication: queue-level Raft state machines now apply
  committed entries (enqueue, ack, nack) to the broker's RocksDB on ALL nodes,
  not just the leader. Followers have full data for zero-loss failover.

- Add LeaderChangeWatcher: background task monitors queue Raft groups for
  leadership changes. On leader promotion, sends RecoverQueue to rebuild
  in-memory scheduler state. On leader loss, sends DropQueueConsumers to
  close consumer streams so clients reconnect to the new leader.

- Add per-queue scheduler recovery: RecoverQueue command rebuilds DRR keys,
  pending index, and leased_msg_keys for a single queue from RocksDB without
  disrupting other queues.

- Add consumer stream leader-awareness: consume() handler rejects requests
  on non-leader nodes with UNAVAILABLE status (see the sketch after this list).

- 3 new integration tests: failover leader election, zero message loss after
  failover, node rejoin and catchup.
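
The leader check in consume() might look roughly like this; tonic's Status::not_found and Status::unavailable are real constructors, while the cluster accessors are assumed names.

```rust
// At the top of the consume() handler, before opening the stream.
let group = self
    .cluster
    .raft_group(&queue_id)
    .ok_or_else(|| Status::not_found("no Raft group for queue"))?;
if !group.is_leader() {
    // Clients handle UNAVAILABLE by reconnecting to the leader.
    return Err(Status::unavailable("not the queue leader"));
}
```
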
- apply_to_broker_storage now returns Result and propagates StorageError
  instead of silently swallowing storage failures (cubic #1)
- add DeleteLeaseExpiry mutation in ack/nack replication paths to clean up
  orphaned lease expiry entries (cubic #3)
- fix no-op leased_msg_keys.retain in recovery — now properly clears
  entries for the recovering queue before rebuild (cubic #4)
- warn when create_group is called without broker_storage set (cubic #5)
- check send_command result in watch_leader_changes — only update leading
  state on success so next poll retries on failure (cubic #6, #7)
- trigger RecoverQueue on first-sight leader state to catch messages
  replicated between startup and first poll (cubic #8)
- replace catch-all _ => {} with explicit variant listing in
  apply_to_broker_storage for compiler-enforced exhaustiveness (illustrated below)
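
A sketch of the exhaustive match: with every variant named, adding a ClusterRequest variant is a compile error until its replication behavior is decided. Variants other than Enqueue/Ack/Nack are assumptions.

```rust
match req {
    ClusterRequest::Enqueue { .. } => apply_enqueue(storage, &req)?,
    ClusterRequest::Ack { .. } | ClusterRequest::Nack { .. } => apply_ack_nack(storage, &req)?,
    // Metadata-only variants are listed explicitly instead of `_ => {}`.
    ClusterRequest::CreateQueue { .. } | ClusterRequest::DeleteQueue { .. } => {}
}
```
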
…eLock

RocksDB exists before both Broker and ClusterManager, so there's no
chicken-and-egg problem. Pass Arc<dyn StorageEngine> directly to
MultiRaftManager::new and make FilaRaftStore::for_queue take it
non-optionally.
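
The wiring this commit describes, sketched with assumed signatures beyond the two names the message mentions (MultiRaftManager::new and FilaRaftStore::for_queue):

```rust
// RocksDB is opened first, so both consumers get the same handle.
let storage: Arc<dyn StorageEngine> = Arc::new(RocksDbEngine::open(&data_dir)?);
let multi_raft = MultiRaftManager::new(node_id, Arc::clone(&storage));
// for_queue now takes the storage handle non-optionally:
let store = FilaRaftStore::for_queue(queue_id, Arc::clone(&storage));
```
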
@github-actions (Contributor)

Benchmark Results (median of 3 runs)

Commit: 62241aa
Time: 2026-03-18T21:29:03Z

| Benchmark | Value | Unit |
| --- | --- | --- |
| compaction_active_p99 | 0.491021 | ms |
| compaction_idle_p99 | 0.482967 | ms |
| compaction_p99_delta | 0.015488000000000002 | ms |
| consumer_concurrency_100_throughput | 1792.6666666666667 | msg/s |
| consumer_concurrency_10_throughput | 1945.0 | msg/s |
| consumer_concurrency_1_throughput | 352.0 | msg/s |
| e2e_latency_p50_light | 0.416752 | ms |
| e2e_latency_p95_light | 0.483415 | ms |
| e2e_latency_p99_light | 0.6693680000000001 | ms |
| enqueue_throughput_1kb | 2636.0487771399385 | msg/s |
| enqueue_throughput_1kb_mbps | 2.574266383925721 | MB/s |
| fairness_accuracy_max_deviation | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-1 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-2 | 0.1999999999999988 | % deviation |
| fairness_accuracy_tenant-3 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-4 | 0.099999999999989 | % deviation |
| fairness_accuracy_tenant-5 | 0.099999999999989 | % deviation |
| fairness_overhead_fair_throughput | 1395.6475839772509 | msg/s |
| fairness_overhead_fifo_throughput | 1430.5437157259312 | msg/s |
| fairness_overhead_pct | 2.324509129365193 | % |
| key_cardinality_10_throughput | 2402.2694041399504 | msg/s |
| key_cardinality_10k_throughput | 653.8055834278072 | msg/s |
| key_cardinality_1k_throughput | 1265.2077749572184 | msg/s |
| lua_on_enqueue_overhead_us | 18.82712041212744 | us |
| lua_throughput_with_hook | 1170.0446041564696 | msg/s |
| memory_per_message_overhead | 2960.1792 | bytes/msg |
| memory_rss_idle | 166.12109375 | MB |
| memory_rss_loaded_10k | 194.37890625 | MB |

@vieiralucas vieiralucas merged commit 8969ac9 into main Mar 18, 2026
9 checks passed
@vieiralucas vieiralucas deleted the feat/14.4-replication-failover-recovery branch March 18, 2026 21:41