feat: 26.1 - rust sdk auto-batching (linger_ms timer)#102

Merged

vieiralucas merged 11 commits into main from feat/26.1-rust-sdk-auto-batching on Mar 24, 2026

Conversation

vieiralucas (Member) commented Mar 24, 2026

Summary

  • Implements auto-batching in the Rust SDK (fila-sdk): when BatchConfig with linger_ms is set via ConnectOptions, enqueue() transparently buffers messages and flushes via BatchEnqueue RPC
  • Background batcher task flushes on batch_size threshold OR linger_ms timeout (whichever first)
  • Per-message result propagation via oneshot channels — partial failures fan individual results to each caller
  • Graceful shutdown: batcher flushes remaining messages when all client clones are dropped
  • Zero behavior change when auto-batching is disabled (default)
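The per-message result propagation described above can be sketched roughly as follows. This is a hypothetical std-only stand-in (the real SDK uses tokio `mpsc`/`oneshot` channels); `BatchItem` and the error message follow names used in this PR, but the exact fields and signatures are assumptions.

```rust
use std::sync::mpsc;

// Hypothetical sketch: a BatchItem pairs the payload with a per-message
// result sender so the batcher can fan each server result back to the
// exact caller that enqueued it. std mpsc stands in for tokio oneshot.
pub struct BatchItem {
    pub payload: Vec<u8>,
    pub result_tx: mpsc::Sender<Result<String, String>>,
}

pub fn enqueue_via_batcher(
    batcher_tx: &mpsc::Sender<BatchItem>,
    payload: Vec<u8>,
) -> Result<String, String> {
    let (result_tx, result_rx) = mpsc::channel();
    batcher_tx
        .send(BatchItem { payload, result_tx })
        .map_err(|_| "batcher shut down".to_string())?;
    // Block until the batcher flushes and delivers this message's result.
    result_rx
        .recv()
        .map_err(|_| "auto-batcher dropped result channel".to_string())?
}
```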

Changes

  • crates/fila-sdk/Cargo.toml — added time feature to tokio
  • crates/fila-sdk/src/client.rs — added BatchItem, batcher_tx field, run_batcher(), flush_batch(), modified enqueue() routing, added with_batch_config() to ConnectOptions
  • crates/fila-sdk/tests/integration.rs — 4 new integration tests: batch_size flush, linger timeout flush, disabled path, explicit+auto coexistence

Test plan

  • auto_batch_flush_on_batch_size — enqueue exactly batch_size messages, verify immediate flush
  • auto_batch_flush_on_linger_timeout — enqueue 1 message, verify timer-based flush within linger_ms
  • auto_batch_disabled_uses_single_message_rpc — verify no delay without auto-batching
  • explicit_batch_enqueue_works_with_auto_batching — verify manual batch_enqueue() works alongside auto-batching
  • All 3 existing SDK integration tests pass (zero regressions)
  • clippy clean, rustfmt clean

🤖 Generated with Claude Code


Summary by cubic

Adds Nagle-style auto-batching to the Rust fila-sdk. enqueue() now uses a new BatchMode with default Auto to send immediately when idle and batch under load; Linger keeps timer-based batching, and Disabled preserves one-by-one sends.

  • New Features

    • Introduce BatchMode::{Auto { max_batch_size }, Linger { linger_ms, batch_size }, Disabled} via ConnectOptions::with_batch_mode(...); connect() defaults to Auto.
    • Auto: drains queued messages and spawns concurrent flushes; caps batch size; single-item uses Enqueue, multi-item uses BatchEnqueue; per-message results via oneshot; auth header is attached on batched paths.
    • enqueue() routes through a background batcher when enabled; consume() leader-redirect reconnects with batching disabled; new tests cover idle/immediate send, under-load batching, batch-size and linger flush, disabled mode, explicit+auto coexistence, and partial failure.
  • Bug Fixes

    • Batch flush handles server result count mismatches and maps per-message errors back to the right callers.
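The Auto-mode behavior summarized above (send immediately when idle, batch under load) is essentially a non-blocking drain of the queue. A rough std-only sketch, where the channel payload type and the cap handling are assumptions:

```rust
use std::sync::mpsc;

// Hypothetical sketch of the Auto-mode drain: block for the first
// message, then opportunistically grab whatever else is already queued
// (up to a cap) without waiting. A lone message flushes immediately;
// bursts naturally cluster into one batch.
pub fn drain_available(rx: &mpsc::Receiver<Vec<u8>>, max_batch_size: usize) -> Vec<Vec<u8>> {
    let mut batch = Vec::new();
    if let Ok(first) = rx.recv() {
        batch.push(first);
        while batch.len() < max_batch_size {
            match rx.try_recv() {
                Ok(msg) => batch.push(msg),
                Err(_) => break, // queue empty: flush what we have
            }
        }
    }
    batch
}
```

A single-item batch would then go out via the `Enqueue` RPC, a multi-item batch via `BatchEnqueue`, matching the routing described in the summary.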

Written for commit 488b9da. Summary will update on new commits.

Benchmark Results (vs main baseline)

Baseline commit: 8d9e880 PR commit: 3d58ecc Threshold: 10%

| Benchmark | Baseline | Current | Change | Unit |
| --- | --- | --- | --- | --- |
| compaction_active_enqueue_max | 41.44 | 41.41 | -0.1% | ms |
| compaction_active_enqueue_p50 | 0.70 | 0.70 | +0.0% | ms |
| compaction_active_enqueue_p95 | 0.77 | 0.78 | +1.7% | ms |
| compaction_active_enqueue_p99 | 0.82 | 0.88 | +7.1% | ms |
| compaction_active_enqueue_p99_9 | 1.23 | 1.61 | +31.8% 🔴 | ms |
| compaction_active_enqueue_p99_99 | 41.22 | 41.22 | +0.0% | ms |
| compaction_idle_enqueue_max | 41.34 | 41.47 | +0.3% | ms |
| compaction_idle_enqueue_p50 | 0.36 | 0.36 | +1.1% | ms |
| compaction_idle_enqueue_p95 | 0.42 | 0.44 | +4.5% | ms |
| compaction_idle_enqueue_p99 | 0.46 | 0.50 | +8.7% | ms |
| compaction_idle_enqueue_p99_9 | 0.82 | 0.86 | +4.8% | ms |
| compaction_idle_enqueue_p99_99 | 41.22 | 41.28 | +0.2% | ms |
| compaction_p99_delta | 0.36 | 0.37 | +2.5% | ms |
| consumer_concurrency_100_throughput | 1782.33 | 1739.00 | -2.4% | msg/s |
| consumer_concurrency_10_throughput | 1245.67 | 1246.33 | +0.1% | msg/s |
| consumer_concurrency_1_throughput | 73.33 | 72.67 | -0.9% | msg/s |
| e2e_latency_light_max | 42.49 | 42.34 | -0.4% | ms |
| e2e_latency_light_p50 | 40.64 | 41.31 | +1.7% | ms |
| e2e_latency_light_p95 | 41.53 | 41.50 | -0.1% | ms |
| e2e_latency_light_p99 | 41.57 | 41.57 | +0.0% | ms |
| e2e_latency_light_p99_9 | 41.60 | 41.63 | +0.1% | ms |
| e2e_latency_light_p99_99 | 42.49 | 42.34 | -0.4% | ms |
| enqueue_throughput_1kb | 2701.37 | 2663.99 | -1.4% | msg/s |
| enqueue_throughput_1kb_mbps | 2.64 | 2.60 | -1.4% | MB/s |
| equal_weight_fairness_jains_index | 1.00 | 1.00 | +0.0% | index |
| equal_weight_fairness_max_deviation | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-1 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-2 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-3 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-4 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-5 | 0.00 | 0.00 | n/a | % deviation |
| fairness_accuracy_jains_index | 1.00 | 1.00 | +0.0% | index |
| fairness_accuracy_max_deviation | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-1 | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-2 | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-3 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_accuracy_tenant-4 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_accuracy_tenant-5 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_overhead_fair_throughput | 1123.25 | 1113.44 | -0.9% | msg/s |
| fairness_overhead_fifo_throughput | 1147.88 | 1139.80 | -0.7% | msg/s |
| fairness_overhead_pct | 1.11 | 2.31 | +109.2% 🔴 | % |
| key_cardinality_10_throughput | 1310.59 | 1304.43 | -0.5% | msg/s |
| key_cardinality_10k_throughput | 509.16 | 494.85 | -2.8% | msg/s |
| key_cardinality_1k_throughput | 792.96 | 778.07 | -1.9% | msg/s |
| lua_on_enqueue_overhead_us | 26.59 | 11.96 | -55.0% 🟢 | us |
| lua_throughput_with_hook | 922.83 | 912.79 | -1.1% | msg/s |
| memory_per_message_overhead | 2932.74 | 2931.51 | -0.0% | bytes/msg |
| memory_rss_idle | 335.71 | 331.47 | -1.3% | MB |
| memory_rss_loaded_10k | 363.68 | 359.68 | -1.1% | MB |

Summary: 2 regressed, 1 improved, 46 unchanged

⚠️ Performance regression detected — 2 metric(s) exceeded the 10% threshold

- Combined retro covering performance optimization pipeline (Epics 22-24)
- Add Epic 26 (SDK Batch Operations & Auto-Batching) to epics and sprint-status
- Add Epic 27 (Profiling Infrastructure) to epics and sprint-status
- Trim CLAUDE.md: remove stale sections (Future Phases, Raft backward compat)
- Add profile-first rule to CLAUDE.md
- Relocate Raft backward compat rule to code comment on ClusterRequest
When BatchConfig with linger_ms is set via ConnectOptions, enqueue()
buffers messages and flushes via BatchEnqueue RPC when either batch_size
messages accumulate or linger_ms milliseconds elapse. Partial failures
propagate individual results to each caller. When auto-batching is
disabled (default), enqueue() uses the existing single-message RPC
with zero behavior change.
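The linger loop described in this commit message can be sketched in std-only Rust (the real batcher is an async tokio task, so the timer mechanics differ; the function name and the channel payload type here are assumptions):

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

// Hypothetical sketch of the linger batcher: a batch flushes when
// batch_size messages accumulate OR linger_ms elapses since the first
// buffered message, whichever comes first.
pub fn collect_linger_batch(
    rx: &mpsc::Receiver<Vec<u8>>,
    batch_size: usize,
    linger_ms: u64,
) -> Vec<Vec<u8>> {
    let mut batch = Vec::new();
    // Wait for the first message; it starts the linger timer.
    let Ok(first) = rx.recv() else { return batch };
    batch.push(first);
    let deadline = Instant::now() + Duration::from_millis(linger_ms);
    while batch.len() < batch_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(msg) => batch.push(msg),
            Err(_) => break, // linger expired (or sender gone): flush now
        }
    }
    batch
}
```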

@cubic-dev-ai cubic-dev-ai Bot left a comment


6 issues found across 12 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fila-sdk/src/client.rs">

<violation number="1" location="crates/fila-sdk/src/client.rs:667">
P2: `zip` silently drops trailing `BatchItem`s when the server returns fewer results than messages sent. Callers whose items have no matching result will see a confusing "auto-batcher dropped result channel" error. Check the length and handle mismatches explicitly.</violation>
</file>

<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">

<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:3">
P3: Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.) [FEEDBACK_USED]</violation>

<violation number="2" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:60">
P2: The test checklist is inconsistent with AC #10: it claims partial-failure propagation is verified, but no partial-failure test is listed. Add that test entry (or adjust AC wording) so completion status is accurate.</violation>
</file>

<file name="crates/fila-sdk/tests/integration.rs">

<violation number="1" location="crates/fila-sdk/tests/integration.rs:259">
P2: This test sends enqueues serially, so it may never exercise batch-size-triggered flushing. Send the enqueues concurrently so multiple messages are buffered before awaiting results.</violation>

<violation number="2" location="crates/fila-sdk/tests/integration.rs:284">
P2: `contains` only checks membership and can miss duplicate/missing deliveries. Remove matched IDs as you consume messages to enforce uniqueness.</violation>
</file>

<file name="_bmad-output/implementation-artifacts/epic-execution-state.yaml">

<violation number="1" location="_bmad-output/implementation-artifacts/epic-execution-state.yaml:7">
P2: Set this story status to `completed` at PR creation time; leaving it as `in-progress` breaks the repository’s execute-epic state convention.

(Based on your team's feedback about setting story status to completed when a PR is opened.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@@ -0,0 +1,128 @@
# Story 26.1: Rust SDK Auto-Batching (linger_ms Timer)

Status: review

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P3: Set story status to completed at PR creation instead of review to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.)


Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 3:

<comment>Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.) </comment>

<file context>
@@ -0,0 +1,128 @@
+# Story 26.1: Rust SDK Auto-Batching (linger_ms Timer)
+
+Status: review
+
+## Story
</file context>
Suggested change:
- Status: review
+ Status: completed

- Handle result count mismatch in flush_batch: iterate items
  independently of results, sending explicit error for any items
  that don't get a server result (identified by cubic)
- Fix batch_size test to send messages concurrently so batch-size
  flush is actually exercised (identified by cubic)
- Use HashSet to verify message ID uniqueness in consume assertions
  (identified by cubic)
- Add partial failure propagation test: one valid queue + one
  non-existent queue in the same batch (identified by cubic)
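The first fix above (handling a server result count mismatch) amounts to iterating the items rather than zipping them with the results, so every caller gets an explicit answer. A hypothetical sketch, with `BatchItem` reduced to its result channel and std mpsc standing in for the SDK's oneshot senders:

```rust
use std::sync::mpsc;

// Minimal stand-in for the SDK's BatchItem; only the result channel
// matters for this sketch.
pub struct BatchItem {
    pub result_tx: mpsc::Sender<Result<String, String>>,
}

// Iterate items independently of results: any item without a matching
// server result receives an explicit mismatch error instead of being
// silently dropped by `zip`.
pub fn distribute_results(items: Vec<BatchItem>, results: Vec<Result<String, String>>) {
    let (expected, got) = (items.len(), results.len());
    let mut results = results.into_iter();
    for item in items {
        let outcome = results.next().unwrap_or_else(|| {
            Err(format!("server returned {got} results for {expected} messages"))
        });
        // Caller may have gone away; ignore send failures.
        let _ = item.result_tx.send(outcome);
    }
}
```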
Replace BatchConfig with BatchMode enum:
- Auto (default): opportunistic batching. Drains whatever messages are
  available in the channel and flushes without blocking the loop.
  Multiple RPCs in flight concurrently. At low load each message is
  sent individually; at high load messages naturally cluster into
  batches. Zero config, zero added latency, full concurrency.
- Linger: explicit timer-based batching (preserved for users who want
  forced batching with configurable linger_ms/batch_size).
- Disabled: no batching, each enqueue() is a separate RPC.

connect() now uses Auto by default — all existing code gets smart
batching without any changes.
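The three modes described above might look like the following enum. The variant fields come from the PR summary; the derive list and the default `max_batch_size` value are illustrative assumptions.

```rust
// Hypothetical shape of the BatchMode enum described in this commit.
#[derive(Clone, Debug, PartialEq)]
pub enum BatchMode {
    /// Opportunistic batching: drain whatever is queued, flush without
    /// blocking the loop, multiple RPCs in flight concurrently.
    Auto { max_batch_size: usize },
    /// Timer-based batching: flush on batch_size or linger_ms,
    /// whichever comes first.
    Linger { linger_ms: u64, batch_size: usize },
    /// No batching: each enqueue() is a separate RPC.
    Disabled,
}

impl Default for BatchMode {
    fn default() -> Self {
        // connect() defaults to Auto; the cap value here is made up.
        BatchMode::Auto { max_batch_size: 128 }
    }
}
```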

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fila-sdk/src/client.rs">

<violation number="1" location="crates/fila-sdk/src/client.rs:163">
P2: Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

/// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching.
/// Use [`BatchMode::Disabled`] to turn off batching entirely.
/// Use [`BatchMode::Linger`] for explicit timer-based batching.
pub fn with_batch_mode(mut self, mode: BatchMode) -> Self {

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P2: Stale intra-doc link: with_batch_config was renamed to with_batch_mode in this PR, but the enqueue() doc comment still references the old name. This will produce a broken rustdoc link.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/fila-sdk/src/client.rs, line 163:

<comment>Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</comment>

<file context>
@@ -139,13 +155,13 @@ impl ConnectOptions {
+    /// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching.
+    /// Use [`BatchMode::Disabled`] to turn off batching entirely.
+    /// Use [`BatchMode::Linger`] for explicit timer-based batching.
+    pub fn with_batch_mode(mut self, mode: BatchMode) -> Self {
+        self.batch_mode = mode;
         self
</file context>

Reflects the shift from linger-based BatchConfig to opportunistic
BatchMode (Auto/Linger/Disabled). All 5 external SDK stories updated
to reference the same algorithm pattern established in the Rust SDK.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">

<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:33">
P3: The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC

10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P3: The acceptance criterion overstates compatibility: replacing BatchConfig with BatchMode is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 33:

<comment>The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</comment>

<file context>
@@ -1,108 +1,92 @@
+9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC
 
-10. **And** new integration tests verify: auto-batch flush on `batch_size` threshold, auto-batch flush on `linger_ms` timeout, partial failure propagation, disabled auto-batching uses single-message RPC
+10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes
 
-11. **And** all existing tests pass (zero regressions)
</file context>
Suggested change:
- 10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes
+ 10. **And** `connect()` uses `Auto` by default — existing code that does not opt into custom batching gets smart batching without changes

@vieiralucas vieiralucas merged commit 0cc9b40 into main Mar 24, 2026
7 of 8 checks passed
@vieiralucas vieiralucas deleted the feat/26.1-rust-sdk-auto-batching branch March 24, 2026 13:20
