feat: 26.1 - rust sdk auto-batching (linger_ms timer)#102

Merged

vieiralucas merged 11 commits into main from feat/26.1-rust-sdk-auto-batching on Mar 24, 2026

Conversation

vieiralucas (Member) commented Mar 24, 2026

Summary

  • Implements auto-batching in the Rust SDK (fila-sdk): when BatchConfig with linger_ms is set via ConnectOptions, enqueue() transparently buffers messages and flushes via BatchEnqueue RPC
  • Background batcher task flushes on batch_size threshold OR linger_ms timeout (whichever first)
  • Per-message result propagation via oneshot channels — partial failures fan individual results to each caller
  • Graceful shutdown: batcher flushes remaining messages when all client clones are dropped
  • Zero behavior change when auto-batching is disabled (default)
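The per-message result propagation described above can be sketched roughly as follows. This is a hypothetical std-only stand-in (the real SDK uses tokio `mpsc`/`oneshot` channels); `BatchItem` and the error message follow names used in this PR, but the exact fields and signatures are assumptions.

```rust
use std::sync::mpsc;

// Hypothetical sketch: a BatchItem pairs the payload with a per-message
// result sender so the batcher can fan each server result back to the
// exact caller that enqueued it. std mpsc stands in for tokio oneshot.
pub struct BatchItem {
    pub payload: Vec<u8>,
    pub result_tx: mpsc::Sender<Result<String, String>>,
}

pub fn enqueue_via_batcher(
    batcher_tx: &mpsc::Sender<BatchItem>,
    payload: Vec<u8>,
) -> Result<String, String> {
    let (result_tx, result_rx) = mpsc::channel();
    batcher_tx
        .send(BatchItem { payload, result_tx })
        .map_err(|_| "batcher shut down".to_string())?;
    // Block until the batcher flushes and delivers this message's result.
    result_rx
        .recv()
        .map_err(|_| "auto-batcher dropped result channel".to_string())?
}
```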

Changes

  • crates/fila-sdk/Cargo.toml — added time feature to tokio
  • crates/fila-sdk/src/client.rs — added BatchItem, batcher_tx field, run_batcher(), flush_batch(), modified enqueue() routing, added with_batch_config() to ConnectOptions
  • crates/fila-sdk/tests/integration.rs — 4 new integration tests: batch_size flush, linger timeout flush, disabled path, explicit+auto coexistence

Test plan

  • auto_batch_flush_on_batch_size — enqueue exactly batch_size messages, verify immediate flush
  • auto_batch_flush_on_linger_timeout — enqueue 1 message, verify timer-based flush within linger_ms
  • auto_batch_disabled_uses_single_message_rpc — verify no delay without auto-batching
  • explicit_batch_enqueue_works_with_auto_batching — verify manual batch_enqueue() works alongside auto-batching
  • All 3 existing SDK integration tests pass (zero regressions)
  • clippy clean, rustfmt clean

🤖 Generated with Claude Code


Summary by cubic

Adds Nagle-style auto-batching to the Rust fila-sdk. enqueue() now uses a new BatchMode with default Auto to send immediately when idle and batch under load; Linger keeps timer-based batching, and Disabled preserves one-by-one sends.

  • New Features

    • Introduce BatchMode::{Auto { max_batch_size }, Linger { linger_ms, batch_size }, Disabled} via ConnectOptions::with_batch_mode(...); connect() defaults to Auto.
    • Auto: drains queued messages and spawns concurrent flushes; caps batch size; single-item uses Enqueue, multi-item uses BatchEnqueue; per-message results via oneshot; auth header is attached on batched paths.
    • enqueue() routes through a background batcher when enabled; consume() leader-redirect reconnects with batching disabled; new tests cover idle/immediate send, under-load batching, batch-size and linger flush, disabled mode, explicit+auto coexistence, and partial failure.
  • Bug Fixes

    • Batch flush handles server result count mismatches and maps per-message errors back to the right callers.
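The Auto-mode behavior summarized above (send immediately when idle, batch under load) is essentially a non-blocking drain of the queue. A rough std-only sketch, where the channel payload type and the cap handling are assumptions:

```rust
use std::sync::mpsc;

// Hypothetical sketch of the Auto-mode drain: block for the first
// message, then opportunistically grab whatever else is already queued
// (up to a cap) without waiting. A lone message flushes immediately;
// bursts naturally cluster into one batch.
pub fn drain_available(rx: &mpsc::Receiver<Vec<u8>>, max_batch_size: usize) -> Vec<Vec<u8>> {
    let mut batch = Vec::new();
    if let Ok(first) = rx.recv() {
        batch.push(first);
        while batch.len() < max_batch_size {
            match rx.try_recv() {
                Ok(msg) => batch.push(msg),
                Err(_) => break, // queue empty: flush what we have
            }
        }
    }
    batch
}
```

A single-item batch would then go out via the `Enqueue` RPC, a multi-item batch via `BatchEnqueue`, matching the routing described in the summary.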

Written for commit 488b9da. Summary will update on new commits.

Benchmark Results (vs main baseline)

Baseline commit: 8d9e880 PR commit: 3d58ecc Threshold: 10%

| Benchmark | Baseline | Current | Change | Unit |
| --- | --- | --- | --- | --- |
| compaction_active_enqueue_max | 41.44 | 41.41 | -0.1% | ms |
| compaction_active_enqueue_p50 | 0.70 | 0.70 | +0.0% | ms |
| compaction_active_enqueue_p95 | 0.77 | 0.78 | +1.7% | ms |
| compaction_active_enqueue_p99 | 0.82 | 0.88 | +7.1% | ms |
| compaction_active_enqueue_p99_9 | 1.23 | 1.61 | +31.8% 🔴 | ms |
| compaction_active_enqueue_p99_99 | 41.22 | 41.22 | +0.0% | ms |
| compaction_idle_enqueue_max | 41.34 | 41.47 | +0.3% | ms |
| compaction_idle_enqueue_p50 | 0.36 | 0.36 | +1.1% | ms |
| compaction_idle_enqueue_p95 | 0.42 | 0.44 | +4.5% | ms |
| compaction_idle_enqueue_p99 | 0.46 | 0.50 | +8.7% | ms |
| compaction_idle_enqueue_p99_9 | 0.82 | 0.86 | +4.8% | ms |
| compaction_idle_enqueue_p99_99 | 41.22 | 41.28 | +0.2% | ms |
| compaction_p99_delta | 0.36 | 0.37 | +2.5% | ms |
| consumer_concurrency_100_throughput | 1782.33 | 1739.00 | -2.4% | msg/s |
| consumer_concurrency_10_throughput | 1245.67 | 1246.33 | +0.1% | msg/s |
| consumer_concurrency_1_throughput | 73.33 | 72.67 | -0.9% | msg/s |
| e2e_latency_light_max | 42.49 | 42.34 | -0.4% | ms |
| e2e_latency_light_p50 | 40.64 | 41.31 | +1.7% | ms |
| e2e_latency_light_p95 | 41.53 | 41.50 | -0.1% | ms |
| e2e_latency_light_p99 | 41.57 | 41.57 | +0.0% | ms |
| e2e_latency_light_p99_9 | 41.60 | 41.63 | +0.1% | ms |
| e2e_latency_light_p99_99 | 42.49 | 42.34 | -0.4% | ms |
| enqueue_throughput_1kb | 2701.37 | 2663.99 | -1.4% | msg/s |
| enqueue_throughput_1kb_mbps | 2.64 | 2.60 | -1.4% | MB/s |
| equal_weight_fairness_jains_index | 1.00 | 1.00 | +0.0% | index |
| equal_weight_fairness_max_deviation | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-1 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-2 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-3 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-4 | 0.00 | 0.00 | n/a | % deviation |
| equal_weight_fairness_tenant-5 | 0.00 | 0.00 | n/a | % deviation |
| fairness_accuracy_jains_index | 1.00 | 1.00 | +0.0% | index |
| fairness_accuracy_max_deviation | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-1 | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-2 | 0.20 | 0.20 | +0.0% | % deviation |
| fairness_accuracy_tenant-3 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_accuracy_tenant-4 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_accuracy_tenant-5 | 0.10 | 0.10 | +0.0% | % deviation |
| fairness_overhead_fair_throughput | 1123.25 | 1113.44 | -0.9% | msg/s |
| fairness_overhead_fifo_throughput | 1147.88 | 1139.80 | -0.7% | msg/s |
| fairness_overhead_pct | 1.11 | 2.31 | +109.2% 🔴 | % |
| key_cardinality_10_throughput | 1310.59 | 1304.43 | -0.5% | msg/s |
| key_cardinality_10k_throughput | 509.16 | 494.85 | -2.8% | msg/s |
| key_cardinality_1k_throughput | 792.96 | 778.07 | -1.9% | msg/s |
| lua_on_enqueue_overhead_us | 26.59 | 11.96 | -55.0% 🟢 | us |
| lua_throughput_with_hook | 922.83 | 912.79 | -1.1% | msg/s |
| memory_per_message_overhead | 2932.74 | 2931.51 | -0.0% | bytes/msg |
| memory_rss_idle | 335.71 | 331.47 | -1.3% | MB |
| memory_rss_loaded_10k | 363.68 | 359.68 | -1.1% | MB |

Summary: 2 regressed, 1 improved, 46 unchanged

⚠️ Performance regression detected — 2 metric(s) exceeded the 10% threshold

- Combined retro covering performance optimization pipeline (Epics 22-24)
- Add Epic 26 (SDK Batch Operations & Auto-Batching) to epics and sprint-status
- Add Epic 27 (Profiling Infrastructure) to epics and sprint-status
- Trim CLAUDE.md: remove stale sections (Future Phases, Raft backward compat)
- Add profile-first rule to CLAUDE.md
- Relocate Raft backward compat rule to code comment on ClusterRequest
When BatchConfig with linger_ms is set via ConnectOptions, enqueue()
buffers messages and flushes via BatchEnqueue RPC when either batch_size
messages accumulate or linger_ms milliseconds elapse. Partial failures
propagate individual results to each caller. When auto-batching is
disabled (default), enqueue() uses the existing single-message RPC
with zero behavior change.
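The linger loop described in this commit message can be sketched in std-only Rust (the real batcher is an async tokio task, so the timer mechanics differ; the function name and the channel payload type here are assumptions):

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

// Hypothetical sketch of the linger batcher: a batch flushes when
// batch_size messages accumulate OR linger_ms elapses since the first
// buffered message, whichever comes first.
pub fn collect_linger_batch(
    rx: &mpsc::Receiver<Vec<u8>>,
    batch_size: usize,
    linger_ms: u64,
) -> Vec<Vec<u8>> {
    let mut batch = Vec::new();
    // Wait for the first message; it starts the linger timer.
    let Ok(first) = rx.recv() else { return batch };
    batch.push(first);
    let deadline = Instant::now() + Duration::from_millis(linger_ms);
    while batch.len() < batch_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(msg) => batch.push(msg),
            Err(_) => break, // linger expired (or sender gone): flush now
        }
    }
    batch
}
```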

@cubic-dev-ai cubic-dev-ai Bot left a comment


6 issues found across 12 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fila-sdk/src/client.rs">

<violation number="1" location="crates/fila-sdk/src/client.rs:667">
P2: `zip` silently drops trailing `BatchItem`s when the server returns fewer results than messages sent. Callers whose items have no matching result will see a confusing "auto-batcher dropped result channel" error. Check the length and handle mismatches explicitly.</violation>
</file>

<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">

<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:3">
P3: Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.) [FEEDBACK_USED]</violation>

<violation number="2" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:60">
P2: The test checklist is inconsistent with AC #10: it claims partial-failure propagation is verified, but no partial-failure test is listed. Add that test entry (or adjust AC wording) so completion status is accurate.</violation>
</file>

<file name="crates/fila-sdk/tests/integration.rs">

<violation number="1" location="crates/fila-sdk/tests/integration.rs:259">
P2: This test sends enqueues serially, so it may never exercise batch-size-triggered flushing. Send the enqueues concurrently so multiple messages are buffered before awaiting results.</violation>

<violation number="2" location="crates/fila-sdk/tests/integration.rs:284">
P2: `contains` only checks membership and can miss duplicate/missing deliveries. Remove matched IDs as you consume messages to enforce uniqueness.</violation>
</file>

<file name="_bmad-output/implementation-artifacts/epic-execution-state.yaml">

<violation number="1" location="_bmad-output/implementation-artifacts/epic-execution-state.yaml:7">
P2: Set this story status to `completed` at PR creation time; leaving it as `in-progress` breaks the repository’s execute-epic state convention.

(Based on your team's feedback about setting story status to completed when a PR is opened.) [FEEDBACK_USED]</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@@ -0,0 +1,128 @@
# Story 26.1: Rust SDK Auto-Batching (linger_ms Timer)

Status: review

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P3: Set story status to completed at PR creation instead of review to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.)


Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 3:

<comment>Set story status to `completed` at PR creation instead of `review` to match the project’s execute-epic workflow.

(Based on your team's feedback about marking stories completed when opening a PR.) </comment>

<file context>
@@ -0,0 +1,128 @@
+# Story 26.1: Rust SDK Auto-Batching (linger_ms Timer)
+
+Status: review
+
+## Story
</file context>
Suggested change:
- Status: review
+ Status: completed

- Handle result count mismatch in flush_batch: iterate items
  independently of results, sending explicit error for any items
  that don't get a server result (identified by cubic)
- Fix batch_size test to send messages concurrently so batch-size
  flush is actually exercised (identified by cubic)
- Use HashSet to verify message ID uniqueness in consume assertions
  (identified by cubic)
- Add partial failure propagation test: one valid queue + one
  non-existent queue in the same batch (identified by cubic)
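The first fix above (handling a server result count mismatch) amounts to iterating the items rather than zipping them with the results, so every caller gets an explicit answer. A hypothetical sketch, with `BatchItem` reduced to its result channel and std mpsc standing in for the SDK's oneshot senders:

```rust
use std::sync::mpsc;

// Minimal stand-in for the SDK's BatchItem; only the result channel
// matters for this sketch.
pub struct BatchItem {
    pub result_tx: mpsc::Sender<Result<String, String>>,
}

// Iterate items independently of results: any item without a matching
// server result receives an explicit mismatch error instead of being
// silently dropped by `zip`.
pub fn distribute_results(items: Vec<BatchItem>, results: Vec<Result<String, String>>) {
    let (expected, got) = (items.len(), results.len());
    let mut results = results.into_iter();
    for item in items {
        let outcome = results.next().unwrap_or_else(|| {
            Err(format!("server returned {got} results for {expected} messages"))
        });
        // Caller may have gone away; ignore send failures.
        let _ = item.result_tx.send(outcome);
    }
}
```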
Replace BatchConfig with BatchMode enum:
- Auto (default): opportunistic batching. Drains whatever messages are
  available in the channel and flushes without blocking the loop.
  Multiple RPCs in flight concurrently. At low load each message is
  sent individually; at high load messages naturally cluster into
  batches. Zero config, zero added latency, full concurrency.
- Linger: explicit timer-based batching (preserved for users who want
  forced batching with configurable linger_ms/batch_size).
- Disabled: no batching, each enqueue() is a separate RPC.

connect() now uses Auto by default — all existing code gets smart
batching without any changes.
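The three modes described above might look like the following enum. The variant fields come from the PR summary; the derive list and the default `max_batch_size` value are illustrative assumptions.

```rust
// Hypothetical shape of the BatchMode enum described in this commit.
#[derive(Clone, Debug, PartialEq)]
pub enum BatchMode {
    /// Opportunistic batching: drain whatever is queued, flush without
    /// blocking the loop, multiple RPCs in flight concurrently.
    Auto { max_batch_size: usize },
    /// Timer-based batching: flush on batch_size or linger_ms,
    /// whichever comes first.
    Linger { linger_ms: u64, batch_size: usize },
    /// No batching: each enqueue() is a separate RPC.
    Disabled,
}

impl Default for BatchMode {
    fn default() -> Self {
        // connect() defaults to Auto; the cap value here is made up.
        BatchMode::Auto { max_batch_size: 128 }
    }
}
```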

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fila-sdk/src/client.rs">

<violation number="1" location="crates/fila-sdk/src/client.rs:163">
P2: Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

/// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching.
/// Use [`BatchMode::Disabled`] to turn off batching entirely.
/// Use [`BatchMode::Linger`] for explicit timer-based batching.
pub fn with_batch_mode(mut self, mode: BatchMode) -> Self {

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P2: Stale intra-doc link: with_batch_config was renamed to with_batch_mode in this PR, but the enqueue() doc comment still references the old name. This will produce a broken rustdoc link.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/fila-sdk/src/client.rs, line 163:

<comment>Stale intra-doc link: `with_batch_config` was renamed to `with_batch_mode` in this PR, but the `enqueue()` doc comment still references the old name. This will produce a broken rustdoc link.</comment>

<file context>
@@ -139,13 +155,13 @@ impl ConnectOptions {
+    /// Default is [`BatchMode::Auto`] — Nagle-style adaptive batching.
+    /// Use [`BatchMode::Disabled`] to turn off batching entirely.
+    /// Use [`BatchMode::Linger`] for explicit timer-based batching.
+    pub fn with_batch_mode(mut self, mode: BatchMode) -> Self {
+        self.batch_mode = mode;
         self
</file context>

Reflects the shift from linger-based BatchConfig to opportunistic
BatchMode (Auto/Linger/Disabled). All 5 external SDK stories updated
to reference the same algorithm pattern established in the Rust SDK.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md">

<violation number="1" location="_bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md:33">
P3: The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC

10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes

@cubic-dev-ai cubic-dev-ai Bot Mar 24, 2026


P3: The acceptance criterion overstates compatibility: replacing BatchConfig with BatchMode is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At _bmad-output/implementation-artifacts/stories/26-1-rust-sdk-auto-batching.md, line 33:

<comment>The acceptance criterion overstates compatibility: replacing `BatchConfig` with `BatchMode` is a documented breaking API change, so "existing code ... without changes" is inaccurate as written.</comment>

<file context>
@@ -1,108 +1,92 @@
+9. **And** `BatchMode::Disabled` turns off batching — each `enqueue()` is a direct single-message RPC
 
-10. **And** new integration tests verify: auto-batch flush on `batch_size` threshold, auto-batch flush on `linger_ms` timeout, partial failure propagation, disabled auto-batching uses single-message RPC
+10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes
 
-11. **And** all existing tests pass (zero regressions)
</file context>
Suggested change:
- 10. **And** `connect()` uses `Auto` by default — existing code gets smart batching without changes
+ 10. **And** `connect()` uses `Auto` by default — existing code that does not opt into custom batching gets smart batching without changes

@vieiralucas vieiralucas merged commit 0cc9b40 into main Mar 24, 2026
7 of 8 checks passed
@vieiralucas vieiralucas deleted the feat/26.1-rust-sdk-auto-batching branch March 24, 2026 13:20
