
Externalize size cache: remove __buffa_cached_size from generated structs #22

Merged
iainmcgin merged 5 commits into main from size-cache-external
Apr 27, 2026

Conversation

@iainmcgin
Collaborator

@iainmcgin iainmcgin commented Mar 30, 2026

Closes #14.

Summary

Removes the in-struct __buffa_cached_size: CachedSize (AtomicU32) field from all generated owned and view types by threading an external SizeCache through the two-pass encode protocol.

Before

fn compute_size(&self) -> u32;       // stores into self.__buffa_cached_size via interior mutability
fn write_to(&self, buf);             // reads self.__buffa_cached_size
fn cached_size(&self) -> u32;

After

fn compute_size(&self, cache: &mut SizeCache) -> u32;   // pre-order reserve/set
fn write_to(&self, cache: &mut SizeCache, buf);         // pre-order consume
fn encoded_len(&self) -> u32;                            // provided: compute_size(&mut SizeCache::new())

encode(), encode_length_delimited(), encode_to_vec(), encode_to_bytes() construct a SizeCache internally — no caller-visible change for the common path. A new encode_with_cache(&self, &mut SizeCache, buf) provided method lets hot loops reuse a single cache (and its spill capacity) across many encodes.
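
For illustration, a hot loop reusing one cache could look like the following sketch. The `Telemetry` type and the plain `Vec<u8>` output handling are placeholders; only `SizeCache::new()` and `encode_with_cache` come from this PR, and the exact buffer parameter type is assumed:

```rust
use buffa::{Message, SizeCache};

// `Telemetry` stands in for any generated buffa message type.
fn encode_batch(records: &[Telemetry]) -> Vec<Vec<u8>> {
    // One cache for the whole loop: its inline slots, plus any spill capacity
    // grown by an earlier large message, are reused on every iteration.
    let mut cache = SizeCache::new();
    records
        .iter()
        .map(|msg| {
            let mut buf = Vec::new();
            // Provided method: clears the cache, then compute_size + write_to.
            msg.encode_with_cache(&mut cache, &mut buf);
            buf
        })
        .collect()
}
```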

Why

  • Generated structs contain only proto fields (+ __buffa_unknown_fields). Exhaustive struct destructuring works; struct literals don't need ..Default::default() for the cache.
  • No interior mutability. Send/Sync are structural instead of via AtomicU32 + Relaxed. &Message is genuinely read-only.
  • Smaller structs. 4 bytes (+ padding) per message and per nested sub-message gone.

SizeCache

buffa::SizeCache is a pre-order slot table: compute_size calls reserve() before recursing into each LEN-delimited sub-message and set() after; write_to calls consume_next() in the same pre-order. Groups (tag-delimited) thread the cache without reserving. Map compute and write iterate for (k, v) in &map identically, so slot order is correct by construction (no reliance on .values()/.iter() ordering agreement).
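
As a sketch of that discipline for a hand-written impl with one LEN-delimited sub-message (roughly what codegen emits in spirit); it assumes `reserve()` hands back a slot index that `set()` fills, and that buffers are plain `Vec<u8>`. Everything except the `SizeCache` calls (types, field numbers, varint helpers) is illustrative:

```rust
use buffa::SizeCache;

// Illustrative wire helpers; only the SizeCache calls reflect this PR.
fn varint_len(mut v: u64) -> u32 {
    let mut n = 1;
    while v >= 0x80 { v >>= 7; n += 1; }
    n
}

fn write_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 { buf.push((v & 0x7f) as u8 | 0x80); v >>= 7; }
    buf.push(v as u8);
}

struct Inner { id: u64 }            // field 1: VARINT
struct Outer { inner: Inner }       // field 1: LEN-delimited sub-message

impl Inner {
    fn compute_size(&self, _cache: &mut SizeCache) -> u32 {
        1 + varint_len(self.id)     // leaf message: threads the cache, reserves nothing
    }
    fn write_to(&self, _cache: &mut SizeCache, buf: &mut Vec<u8>) {
        buf.push(1 << 3);           // field 1, wire type VARINT
        write_varint(buf, self.id);
    }
}

impl Outer {
    fn compute_size(&self, cache: &mut SizeCache) -> u32 {
        // Pre-order: reserve the sub-message's slot before recursing into it,
        // then record its size once the recursion returns.
        let slot = cache.reserve();
        let inner_len = self.inner.compute_size(cache);
        cache.set(slot, inner_len);
        1 + varint_len(inner_len as u64) + inner_len
    }
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>) {
        // Same pre-order on the way out: consume this sub-message's slot first,
        // then let the recursion consume anything it reserved below it.
        let inner_len = cache.consume_next();
        buf.push((1 << 3) | 2);     // field 1, wire type LEN
        write_varint(buf, inner_len as u64);
        self.inner.write_to(cache, buf);
    }
}
```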

Storage is [u32; 16] inline + Vec<u32> spill, so SizeCache::new() is allocation-free for messages with ≤16 nested LEN sub-messages — which covers every benchmark dataset except analytics_event.
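
In shape, something like the following sketch (field names and exact signatures are assumptions, not the buffa source):

```rust
/// Sketch of the hybrid storage described above.
pub struct SizeCache {
    inline: [u32; 16],  // enough for ≤16 nested LEN sub-messages, no allocation
    spill: Vec<u32>,    // grows only for larger messages (e.g. analytics_event)
    len: usize,         // slots reserved so far
    next: usize,        // read cursor for consume_next() during write_to
}

impl SizeCache {
    pub fn new() -> Self {
        Self { inline: [0; 16], spill: Vec::new(), len: 0, next: 0 }
    }

    pub fn reserve(&mut self) -> usize {
        debug_assert!(self.len < u32::MAX as usize);
        let slot = self.len;
        self.len += 1;
        // 0 placeholder so a reserve-without-set still reads deterministically.
        if slot < 16 { self.inline[slot] = 0; } else { self.spill.push(0); }
        slot
    }

    pub fn set(&mut self, slot: usize, size: u32) {
        assert!(slot < self.len, "slot was never reserved");
        if slot < 16 { self.inline[slot] = size; } else { self.spill[slot - 16] = size; }
    }

    pub fn consume_next(&mut self) -> u32 {
        let slot = self.next;
        self.next += 1;
        if slot < 16 { self.inline[slot] } else { self.spill[slot - 16] }
    }

    pub fn clear(&mut self) {
        // Reset cursors only; reserve() re-writes placeholders as slots are reused.
        self.len = 0;
        self.next = 0;
        self.spill.clear();
    }
}
```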

Performance

vs main (criterion, this host's noise floor ~±10% per the JSON-encode control):

| dataset | encode | encode_view | build+encode | build+encode_view |
| --- | --- | --- | --- | --- |
| api_response | −7.5% | +2.7% | −1.1% | −0.2% |
| log_record | −1.0% | +0.2% | −4.1% | −3.7% |
| analytics_event¹ | +7.8% | +7.6% | +1.3% | −1.0% |
| google_message1 | −4.7% | −8.5% | −4.9% | −4.0% |
| media_frame | 0.0% | 0.0% | −7.4% | −2.6% |

¹ spills past the 16-slot inline cache; build+encode (the realistic construct→encode→drop path) is within noise.

The earlier Vec-only SizeCache regressed small-message encode by 16–28% (per-call heap allocation); the inline-storage hybrid recovers that to parity-or-better. compute_size itself with a reused cache is at parity.

Scope

Applies to both Message and ViewEncode — view structs also lose __buffa_cached_size (reversing the field addition from #55, which had not yet shipped in a release). MessageField/MessageFieldView forwarders, the map-write path (no longer recomputes value sizes during write), clear() codegen, and all hand-written test impls updated.

Breaking change

Yes — compute_size/write_to signatures change and cached_size() is removed on both traits. Targeted for v0.4.0 alongside the __buffa:: namespacing and DefaultInstance/DefaultViewInstance safety changes.

Migration

For typical users (calling encode*()): no change.

For hand-written Message/ViewEncode impls or callers of compute_size/write_to directly:

  • msg.compute_size() -> msg.encoded_len() (if you only want the size)
  • msg.compute_size(); msg.write_to(buf) -> msg.encode(buf) (or thread a SizeCache explicitly if reusing across calls; see the sketch after this list)
  • Drop the __buffa_cached_size field and cached_size() method from hand-written impls.
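
For the explicit-cache form of the old two-step call, a rough sketch (`Telemetry` is a placeholder message type and the `Vec<u8>` buffer parameter is assumed):

```rust
use buffa::{Message, SizeCache};

fn size_then_encode(msg: &Telemetry) -> Vec<u8> {
    // Old pattern (no longer compiles):
    //     let size = msg.compute_size();
    //     msg.write_to(&mut buf);

    // New pattern, threading one cache through both passes:
    let mut cache = SizeCache::new();
    let size = msg.compute_size(&mut cache);
    let mut buf = Vec::with_capacity(size as usize);
    msg.write_to(&mut cache, &mut buf);
    buf
}
```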

Generated message and view types no longer carry a __buffa_cached_size
field. Nested-message sizes are recorded in an external pre-order
Vec<u32> SizeCache that compute_size populates and write_to consumes,
constructed and discarded inside the provided encode* methods.

- Message::compute_size/write_to and ViewEncode::compute_size/write_to
  now take &mut SizeCache; cached_size() is removed; encoded_len() added.
- Generated structs hold only proto fields plus __buffa_unknown_fields:
  no interior mutability, structurally Send + Sync, concurrent encode()
  of the same &msg is sound (each thread uses its own cache).
- Codegen reserves a slot before recursing into each LEN-delimited
  sub-message and consumes it in the same pre-order during write_to;
  groups thread the cache without reserving. Map message-values are
  handled per-phase, also fixing the previous map_write_to_stmt
  recomputation of v.compute_size() during write.
- CachedSize type and __private re-export deleted.

Closes #14.
…ew()

Backs the cache with a 16-slot uninitialized inline array plus a Vec
spill, so SizeCache::new() does no heap allocation and no array zeroing
— effectively free for the per-encode fresh-cache path. reserve() writes
a 0 placeholder per slot so next_size()'s assume_init is sound regardless
of caller discipline (no UB possible from safe code); set() asserts the
slot was reserved.

Recovers the per-message encode_to_vec() overhead introduced by moving
size state out of the struct: log_record encode +12% -> -1%,
google_message1 encode +20% -> -5% vs the in-struct AtomicU32 baseline.
build+encode (the realistic construct-then-serialize path) improves
1-7% across the suite from the smaller, atomic-free struct layout.
@iainmcgin iainmcgin force-pushed the size-cache-external branch from bb2dee3 to 81eee09 on April 27, 2026 20:12
@iainmcgin iainmcgin changed the title from "Externalize size cache to remove __buffa_cached_size struct field" to "Externalize size cache: remove __buffa_cached_size from generated structs" on Apr 27, 2026
…Cache

- map_compute_size_stmt now iterates `for (k, v) in &self.field` in all
  non-constant arms, identical to map_write_to_stmt, so SizeCache slot
  order matches by construction rather than relying on .values()/.iter()
  walking the same table slots. The (true, true) constant-size arm keeps
  the len()*const fold (no slots reserved there).
- SizeCache.inline reverted to plain [u32; 16]: new() zero-inits,
  reserve() still writes the per-slot 0 placeholder so a reserve-without
  -set after clear() reads 0 deterministically, next_size() is a plain
  index. No unsafe.
- Tests: reserve_without_set_yields_zero,
  clear_then_reserve_without_set_yields_zero.
- L3: SizeCache::next_size -> consume_next; updated codegen, tests, docs.
- L4: Message::encode_with_cache / ViewEncode::encode_with_cache provided
  methods for hot-loop cache reuse (clear + compute_size + write_to).
- L5: SizeCache rustdoc notes the type is intentionally not Clone.
- L6: debug_assert on len < u32::MAX in reserve().
- L2: codegen names the cache parameter `__cache` only when the message
  has a sub-message/group field, oneof message variant, or message-typed
  map value; leaf messages get `_cache`. Removes the let _ = &__cache
  marker line. New helper message_uses_size_cache() encapsulates the
  check for both owned and view emission.
@iainmcgin iainmcgin requested a review from asacamano April 27, 2026 22:08
@iainmcgin iainmcgin marked this pull request as ready for review April 27, 2026 22:08
@iainmcgin iainmcgin merged commit c1fdbe1 into main Apr 27, 2026
7 checks passed
@iainmcgin iainmcgin deleted the size-cache-external branch April 27, 2026 22:22
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 27, 2026


Development

Successfully merging this pull request may close these issues.

__buffa_cached_size makes destructuring harder
