perf: make IggyMessagesBatch::last_offset O(1) #2840

Open
cijiugechu wants to merge 3 commits into apache:master from cijiugechu:perf/last-offset-o1

@cijiugechu

Which issue does this PR close?

None.

Rationale

Avoid unnecessary O(n) lookup in IggyMessagesBatch::last_offset() to improve throughput.

Bench (single run each; Apple M4, macOS 15.7.4; pinned-producer/TCP; 8 producers, 8 streams; 1000B msgs; 1000 msgs/batch; total data 5GB):

Results:

Throughput delta: ~88.75Mb/s

master: [Chart: Throughput - Pinned Producer Benchmark (master)]

PR: [Chart: Throughput - Pinned Producer Benchmark (last_offset_o1)]

What changed?

IggyMessagesBatch::last_offset() previously used iter().last(), which walks the whole batch and makes the call O(n) even though the position of the last element is already known. In producer hot paths with a large messages_per_batch, that extra scan sits on the critical path.

The method now reads the last message via indexed get(count - 1) (O(1)), with an iterator fallback only if the indexed lookup fails.
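As a rough illustration of the shape of this change, here is a minimal sketch. The `MessageView` and `Batch` types below are hypothetical stand-ins and not the actual Iggy types; the real batch indexes into a byte buffer rather than a `Vec`.

```rust
/// Hypothetical stand-in for a message view; only the offset matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MessageView {
    offset: u64,
}

/// Hypothetical stand-in for the batch. A Vec is used only to make
/// indexed access O(1) in this sketch.
struct Batch {
    messages: Vec<MessageView>,
}

impl Batch {
    fn count(&self) -> usize {
        self.messages.len()
    }

    /// O(1) indexed lookup; returns None when the index is out of range.
    fn get(&self, index: usize) -> Option<MessageView> {
        self.messages.get(index).copied()
    }

    /// Forward-only iterator: `from_fn` implements only `next`, so
    /// `Iterator::last` on it uses the default implementation and
    /// visits every element.
    fn iter(&self) -> impl Iterator<Item = MessageView> + '_ {
        let mut index = 0;
        std::iter::from_fn(move || {
            let item = self.get(index);
            index += 1;
            item
        })
    }

    /// Old shape: walks the iterator to the end, O(n).
    fn last_offset_scan(&self) -> Option<u64> {
        self.iter().last().map(|m| m.offset)
    }

    /// New shape: indexed lookup first, iterator scan only as a fallback.
    fn last_offset(&self) -> Option<u64> {
        let count = self.count();
        if count == 0 {
            return None;
        }
        self.get(count - 1)
            .or_else(|| self.iter().last())
            .map(|m| m.offset)
    }
}

fn main() {
    let batch = Batch {
        messages: (100..105).map(|offset| MessageView { offset }).collect(),
    };
    // Both variants agree on the result; only the cost differs.
    assert_eq!(batch.last_offset(), Some(104));
    assert_eq!(batch.last_offset(), batch.last_offset_scan());
}
```

The empty-batch check guards the `count - 1` subtraction, which would otherwise underflow.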

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

None.

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.65%. Comparing base (29a27cc) to head (bdee7b9).

Files with missing lines Patch % Lines
core/common/src/types/message/messages_batch.rs 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2840      +/-   ##
============================================
- Coverage     67.69%   67.65%   -0.04%     
  Complexity      739      739              
============================================
  Files          1031     1031              
  Lines         83912    83918       +6     
  Branches      60706    60722      +16     
============================================
- Hits          56802    56778      -24     
- Misses        24760    24779      +19     
- Partials       2350     2361      +11     
Flag Coverage Δ
csharp 67.10% <ø> (-0.15%) ⬇️
go 6.84% <ø> (ø)
java 54.83% <ø> (ø)
node 92.26% <ø> (-0.15%) ⬇️
python 0.00% <ø> (ø)
rust 69.66% <71.42%> (-0.04%) ⬇️


Files with missing lines Coverage Δ
core/common/src/types/message/messages_batch.rs 60.29% <71.42%> (+0.19%) ⬆️

... and 17 files with indirect coverage changes


@numinnex
Contributor

numinnex commented Mar 1, 2026

Hi, thanks for the contribution.

I am pretty sure a simpler solution to that problem would be to provide the Iterator with a size hint; this should give the compiler enough information to generate code identical to yours. You can try doing that instead. Here is an example of that optimization being applied to a Vec<T>: https://godbolt.org/z/4erhYsxsG

The clue is those two lines

  mov rdi, qword ptr [rbx + 8]
  mov rbx, qword ptr [rdi + 8*rax - 8] ;Load the ptr + (len - 1)

@cijiugechu
Author

cijiugechu commented Mar 2, 2026

> Hi, thanks for the contribution.
>
> I am pretty sure a simpler solution to that problem would be to provide the Iterator with a size hint; this should give the compiler enough information to generate code identical to yours. You can try doing that instead. Here is an example of that optimization being applied to a Vec<T>: https://godbolt.org/z/4erhYsxsG
>
> The clue is those two lines
>
>   mov rdi, qword ptr [rbx + 8]
>   mov rbx, qword ptr [rdi + 8*rax - 8] ;Load the ptr + (len - 1)

The last() optimization for Vec here probably isn't achieved through size_hint alone. In std, the slice iterator's Iterator::last is implemented by calling DoubleEndedIterator::next_back, and next_back reaches O(1) pointer access via the unsafe pre_dec_end helper. By contrast, our IggyMessageViewIterator only implements Iterator::next, so last falls back to the default fold-style loop, which prevents this optimization.
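To illustrate the point, here is a small sketch with a hypothetical `ForwardOnly` iterator (not the real IggyMessageViewIterator) that implements only `next` plus an exact `size_hint`, and counts how often `next` is called while `last` runs.

```rust
use std::cell::Cell;

/// Hypothetical forward-only iterator that records every call to `next`.
struct ForwardOnly<'a> {
    items: &'a [u64],
    pos: usize,
    next_calls: &'a Cell<usize>,
}

impl<'a> Iterator for ForwardOnly<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        self.next_calls.set(self.next_calls.get() + 1);
        let item = self.items.get(self.pos).copied();
        self.pos += 1;
        item
    }

    /// Exact size hint. The default `Iterator::last` does not consult it.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.items.len().saturating_sub(self.pos);
        (remaining, Some(remaining))
    }
}

/// Runs the default `last` and reports how many `next` calls it needed.
fn count_next_calls_for_last(items: &[u64]) -> (Option<u64>, usize) {
    let calls = Cell::new(0);
    let last = ForwardOnly { items, pos: 0, next_calls: &calls }.last();
    (last, calls.get())
}

fn main() {
    let items: [u64; 4] = [1, 2, 3, 4];
    let (last, calls) = count_next_calls_for_last(&items);
    assert_eq!(last, Some(4));
    // One call per element plus the final call that returns None.
    assert_eq!(calls, items.len() + 1);
    // The slice iterator returns the same element without a full walk.
    assert_eq!(items.iter().last(), Some(&4));
}
```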

@cijiugechu
Author

Maybe we should consider implementing more efficient Iterator methods to override the default behavior, so we don't end up relying on the std fallback implementations...

@numinnex
Contributor

numinnex commented Mar 2, 2026

> Maybe we should consider implementing more efficient Iterator methods to override the default behavior, so we don't end up relying on the std fallback implementations...

Right, so maybe an idea would be to implement DoubleEndedIterator on top of the Iterator trait, since per the docs DoubleEndedIterator: Iterator

@cijiugechu
Author

> > Maybe we should consider implementing more efficient Iterator methods to override the default behavior, so we don't end up relying on the std fallback implementations...
>
> Right, so maybe an idea would be to implement DoubleEndedIterator on top of the Iterator trait, since per the docs DoubleEndedIterator: Iterator

I gave it a try, and implementing DoubleEndedIterator would require caching intermediate state, which would be a fairly substantial change. Also, making last O(1) doesn’t inherently depend on DoubleEndedIterator—it’s just how std happens to implement it. Given our current optimization goals, implementing an O(1) last for Iterator only is probably the more reasonable option.
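For reference, overriding `last` directly on a forward-only iterator could look roughly like the sketch below. The `Views` type and its layout (a flat payload plus a table of end positions) are assumptions made for illustration and do not mirror the actual batch layout.

```rust
/// Hypothetical iterator over variable-length messages stored back to back.
/// `ends[i]` is the exclusive end position of message `i` in `payload`.
struct Views<'a> {
    payload: &'a [u8],
    ends: &'a [usize],
    pos: usize,
}

impl<'a> Views<'a> {
    /// O(1) lookup of message `i` through the end-position table.
    fn view(&self, i: usize) -> Option<&'a [u8]> {
        let end = *self.ends.get(i)?;
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        self.payload.get(start..end)
    }
}

impl<'a> Iterator for Views<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.view(self.pos)?;
        self.pos += 1;
        Some(item)
    }

    /// Overrides the default fold-style `last` with a direct lookup.
    /// No DoubleEndedIterator impl and no cached state are needed.
    fn last(self) -> Option<Self::Item> {
        if self.pos >= self.ends.len() {
            return None; // nothing left to yield
        }
        self.view(self.ends.len() - 1)
    }
}

fn main() {
    let payload = b"abcdefg";
    let ends = [2, 5, 7]; // messages: "ab", "cde", "fg"

    let fresh = Views { payload, ends: &ends, pos: 0 };
    assert_eq!(fresh.last(), Some(&b"fg"[..]));

    // A partially consumed iterator still jumps straight to the end.
    let mut partial = Views { payload, ends: &ends, pos: 0 };
    assert_eq!(partial.next(), Some(&b"ab"[..]));
    assert_eq!(partial.last(), Some(&b"fg"[..]));
}
```

The `pos` check keeps the override consistent with the default behavior for an exhausted iterator, which returns None.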

@cijiugechu
Author

I got Iterator::last working, but the change is kind of big…
