feat: 26.1 - rust sdk auto-batching (linger_ms timer)#102
Conversation
- Combined retro covering performance optimization pipeline (Epics 22-24) - Add Epic 26 (SDK Batch Operations & Auto-Batching) to epics and sprint-status - Add Epic 27 (Profiling Infrastructure) to epics and sprint-status - Trim CLAUDE.md: remove stale sections (Future Phases, Raft backward compat) - Add profile-first rule to CLAUDE.md - Relocate Raft backward compat rule to code comment on ClusterRequest
When BatchConfig with linger_ms is set via ConnectOptions, enqueue() buffers messages and flushes via BatchEnqueue RPC when either batch_size messages accumulate or linger_ms milliseconds elapse. Partial failures propagate individual results to each caller. When auto-batching is disabled (default), enqueue() uses the existing single-message RPC with zero behavior change.
There was a problem hiding this comment.
6 issues found across 12 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/fila-sdk/src/client.rs">
<violation number="1" location="crates/fila-sdk/src/client.rs:667">
P2: `zip` silently drops trailing `BatchItem`s when the server returns fewer results than messages sent. Callers whose items have no matching result will see a confusing "auto-batcher dropped result channel" error. Check the length and handle mismatches explicitly.</violation>
</file>
<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">
<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:3">
P3: Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.
(Based on your team's feedback about marking stories completed when opening a PR.) [FEEDBACK_USED]</violation>
<violation number="2" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:60">
P2: The test checklist is inconsistent with AC #10: it claims partial-failure propagation is verified, but no partial-failure test is listed. Add that test entry (or adjust AC wording) so completion status is accurate.</violation>
</file>
<file name="crates/fila-sdk/tests/integration.rs">
<violation number="1" location="crates/fila-sdk/tests/integration.rs:259">
P2: This test sends enqueues serially, so it may never exercise batch-size-triggered flushing. Send the enqueues concurrently so multiple messages are buffered before awaiting results.</violation>
<violation number="2" location="crates/fila-sdk/tests/integration.rs:284">
P2: `contains` only checks membership and can miss duplicate/missing deliveries. Remove matched IDs as you consume messages to enforce uniqueness.</violation>
</file>
<file name="_bmad-output/implementation-artifacts/epic-execution-state.yaml">
<violation number="1" location="_bmad-output/implementation-artifacts/epic-execution-state.yaml:7">
P2: Set this story status to `completed` at PR creation time; leaving it as `in-progress` breaks the repository’s execute-epic state convention.
(Based on your team's feedback about setting story status to completed when a PR is opened.) [FEEDBACK_USED]</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| @@ -0,0 +1,128 @@ | |||
| # Story 26.1: Rust SDK Auto-Batching (linger_ms Timer) | |||
|
|
|||
| Status: review | |||
There was a problem hiding this comment.
P3: Set story status to completed at PR creation instead of review to match the project’s execute-epic workflow.
(Based on your team's feedback about marking stories completed when opening a PR.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 3:
<comment>Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.
(Based on your team's feedback about marking stories completed when opening a PR.) </comment>
<file context>
@@ -0,0 +1,128 @@
+# Story 26.1: Rust SDK Auto-Batching (linger_ms Timer)
+
+Status: review
+
+## Story
</file context>
| Status: review | |
| Status: completed |
- Handle result count mismatch in flush_batch: iterate items independently of results, sending explicit error for any items that don't get a server result (identified by cubic) - Fix batch_size test to send messages concurrently so batch-size flush is actually exercised (identified by cubic) - Use HashSet to verify message ID uniqueness in consume assertions (identified by cubic) - Add partial failure propagation test: one valid queue + one non-existent queue in the same batch (identified by cubic)
Replace BatchConfig with BatchMode enum: - Auto (default): opportunistic batching. Drains whatever messages are available in the channel and flushes without blocking the loop. Multiple RPCs in flight concurrently. At low load each message is sent individually; at high load messages naturally cluster into batches. Zero config, zero added latency, full concurrency. - Linger: explicit timer-based batching (preserved for users who want forced batching with configurable linger_ms/batch_size). - Disabled: no batching, each enqueue() is a separate RPC. connect() now uses Auto by default — all existing code gets smart batching without any changes.
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/fila-sdk/src/client.rs">
<violation number="1" location="crates/fila-sdk/src/client.rs:163">
P2: Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| /// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching. | ||
| /// Use [`BatchMode::Disabled`] to turn off batching entirely. | ||
| /// Use [`BatchMode::Linger`] for explicit timer-based batching. | ||
| pub fn with_batch_mode(mut self, mode: BatchMode) -> Self { |
There was a problem hiding this comment.
P2: Stale intra-doc link: with_batch_config was renamed to with_batch_mode in this PR, but the enqueue() doc comment still references the old name. This will produce a broken rustdoc link.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/fila-sdk/src/client.rs, line 163:
<comment>Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</comment>
<file context>
@@ -139,13 +155,13 @@ impl ConnectOptions {
+ /// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching.
+ /// Use [`BatchMode::Disabled`] to turn off batching entirely.
+ /// Use [`BatchMode::Linger`] for explicit timer-based batching.
+ pub fn with_batch_mode(mut self, mode: BatchMode) -> Self {
+ self.batch_mode = mode;
self
</file context>
Reflects the shift from linger-based BatchConfig to opportunistic BatchMode (Auto/Linger/Disabled). All 5 external SDK stories updated to reference the same algorithm pattern established in the Rust SDK.
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">
<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:33">
P3: The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
|
|
||
| 9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC | ||
|
|
||
| 10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes |
There was a problem hiding this comment.
P3: The acceptance criterion overstates compatibility: replacing BatchConfig with BatchMode is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 33:
<comment>The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</comment>
<file context>
@@ -1,108 +1,92 @@
+9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC
-10. **And** new integration tests verify: auto-batch flush on `batch_size` threshold, auto-batch flush on `linger_ms` timeout, partial failure propagation, disabled auto-batching uses single-message RPC
+10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes
-11. **And** all existing tests pass (zero regressions)
</file context>
| 10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes | |
| 10. **And** `connect()` uses `Auto` by default — existing code that does not opt into custom batching gets smart batching without changes |
Summary
fila-sdk): whenBatchConfigwithlinger_msis set viaConnectOptions,enqueue()transparently buffers messages and flushes viaBatchEnqueueRPCbatch_sizethreshold ORlinger_mstimeout (whichever first)Changes
crates/fila-sdk/Cargo.toml— addedtimefeature to tokiocrates/fila-sdk/src/client.rs— addedBatchItem,batcher_txfield,run_batcher(),flush_batch(), modifiedenqueue()routing, addedwith_batch_config()toConnectOptionscrates/fila-sdk/tests/integration.rs— 4 new integration tests: batch_size flush, linger timeout flush, disabled path, explicit+auto coexistenceTest plan
auto_batch_flush_on_batch_size— enqueue exactly batch_size messages, verify immediate flushauto_batch_flush_on_linger_timeout— enqueue 1 message, verify timer-based flush within linger_msauto_batch_disabled_uses_single_message_rpc— verify no delay without auto-batchingexplicit_batch_enqueue_works_with_auto_batching— verify manual batch_enqueue() works alongside auto-batching🤖 Generated with Claude Code
Summary by cubic
Adds Nagle-style auto-batching to the Rust
fila-sdk.enqueue()now uses a newBatchModewith defaultAutoto send immediately when idle and batch under load;Lingerkeeps timer-based batching, andDisabledpreserves one-by-one sends.New Features
BatchMode::{Auto { max_batch_size }, Linger { linger_ms, batch_size }, Disabled}viaConnectOptions::with_batch_mode(...);connect()defaults toAuto.Auto: drains queued messages and spawns concurrent flushes; caps batch size; single-item usesEnqueue, multi-item usesBatchEnqueue; per-message results via oneshot; auth header is attached on batched paths.enqueue()routes through a background batcher when enabled;consume()leader-redirect reconnects with batching disabled; new tests cover idle/immediate send, under-load batching, batch-size and linger flush, disabled mode, explicit+auto coexistence, and partial failure.Bug Fixes
Written for commit 488b9da. Summary will update on new commits.
Benchmark Results (vs main baseline)
Baseline commit:
8d9e880PR commit:3d58eccThreshold: 10%Summary: 2 regressed, 1 improved, 46 unchanged