
Externalize size cache: remove __buffa_cached_size from generated structs #22

Merged
iainmcgin merged 5 commits into main from size-cache-external
Apr 27, 2026

Conversation

@iainmcgin
Collaborator

@iainmcgin iainmcgin commented Mar 30, 2026

Closes #14.

Summary

Removes the in-struct __buffa_cached_size: CachedSize (AtomicU32) field from all generated owned and view types by threading an external SizeCache through the two-pass encode protocol.

Before

fn compute_size(&self) -> u32;       // stores into self.__buffa_cached_size via interior mutability
fn write_to(&self, buf);             // reads self.__buffa_cached_size
fn cached_size(&self) -> u32;

After

fn compute_size(&self, cache: &mut SizeCache) -> u32;   // pre-order reserve/set
fn write_to(&self, cache: &mut SizeCache, buf);         // pre-order consume
fn encoded_len(&self) -> u32;                            // provided: compute_size(&mut SizeCache::new())

encode(), encode_length_delimited(), encode_to_vec(), encode_to_bytes() construct a SizeCache internally — no caller-visible change for the common path. A new encode_with_cache(&self, &mut SizeCache, buf) provided method lets hot loops reuse a single cache (and its spill capacity) across many encodes.
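
For illustration, a hot loop reusing one cache could look like the following sketch. The `Telemetry` type and the plain `Vec<u8>` output handling are placeholders; only `SizeCache::new()` and `encode_with_cache` come from this PR, and the exact buffer parameter type is assumed:

```rust
use buffa::{Message, SizeCache};

// `Telemetry` stands in for any generated buffa message type.
fn encode_batch(records: &[Telemetry]) -> Vec<Vec<u8>> {
    // One cache for the whole loop: its inline slots, plus any spill capacity
    // grown by an earlier large message, are reused on every iteration.
    let mut cache = SizeCache::new();
    records
        .iter()
        .map(|msg| {
            let mut buf = Vec::new();
            // Provided method: clears the cache, then compute_size + write_to.
            msg.encode_with_cache(&mut cache, &mut buf);
            buf
        })
        .collect()
}
```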

Why

  • Generated structs contain only proto fields (+ __buffa_unknown_fields). Exhaustive struct destructuring works; struct literals don't need ..Default::default() for the cache.
  • No interior mutability. Send/Sync are structural instead of via AtomicU32 + Relaxed. &Message is genuinely read-only.
  • Smaller structs. 4 bytes (+ padding) per message and per nested sub-message gone.

SizeCache

buffa::SizeCache is a pre-order slot table: compute_size calls reserve() before recursing into each LEN-delimited sub-message and set() after; write_to calls consume_next() in the same pre-order. Groups (tag-delimited) thread the cache without reserving. Map compute and write iterate for (k, v) in &map identically, so slot order is correct by construction (no reliance on .values()/.iter() ordering agreement).
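
As a sketch of that discipline for a hand-written impl with one LEN-delimited sub-message (roughly what codegen emits in spirit); it assumes `reserve()` hands back a slot index that `set()` fills, and that buffers are plain `Vec<u8>`. Everything except the `SizeCache` calls (types, field numbers, varint helpers) is illustrative:

```rust
use buffa::SizeCache;

// Illustrative wire helpers; only the SizeCache calls reflect this PR.
fn varint_len(mut v: u64) -> u32 {
    let mut n = 1;
    while v >= 0x80 { v >>= 7; n += 1; }
    n
}

fn write_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 { buf.push((v & 0x7f) as u8 | 0x80); v >>= 7; }
    buf.push(v as u8);
}

struct Inner { id: u64 }            // field 1: VARINT
struct Outer { inner: Inner }       // field 1: LEN-delimited sub-message

impl Inner {
    fn compute_size(&self, _cache: &mut SizeCache) -> u32 {
        1 + varint_len(self.id)     // leaf message: threads the cache, reserves nothing
    }
    fn write_to(&self, _cache: &mut SizeCache, buf: &mut Vec<u8>) {
        buf.push(1 << 3);           // field 1, wire type VARINT
        write_varint(buf, self.id);
    }
}

impl Outer {
    fn compute_size(&self, cache: &mut SizeCache) -> u32 {
        // Pre-order: reserve the sub-message's slot before recursing into it,
        // then record its size once the recursion returns.
        let slot = cache.reserve();
        let inner_len = self.inner.compute_size(cache);
        cache.set(slot, inner_len);
        1 + varint_len(inner_len as u64) + inner_len
    }
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>) {
        // Same pre-order on the way out: consume this sub-message's slot first,
        // then let the recursion consume anything it reserved below it.
        let inner_len = cache.consume_next();
        buf.push((1 << 3) | 2);     // field 1, wire type LEN
        write_varint(buf, inner_len as u64);
        self.inner.write_to(cache, buf);
    }
}
```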

Storage is [u32; 16] inline + Vec<u32> spill, so SizeCache::new() is allocation-free for messages with ≤16 nested LEN sub-messages — which covers every benchmark dataset except analytics_event.
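
In shape, something like the following sketch (field names and exact signatures are assumptions, not the buffa source):

```rust
/// Sketch of the hybrid storage described above.
pub struct SizeCache {
    inline: [u32; 16],  // enough for ≤16 nested LEN sub-messages, no allocation
    spill: Vec<u32>,    // grows only for larger messages (e.g. analytics_event)
    len: usize,         // slots reserved so far
    next: usize,        // read cursor for consume_next() during write_to
}

impl SizeCache {
    pub fn new() -> Self {
        Self { inline: [0; 16], spill: Vec::new(), len: 0, next: 0 }
    }

    pub fn reserve(&mut self) -> usize {
        debug_assert!(self.len < u32::MAX as usize);
        let slot = self.len;
        self.len += 1;
        // 0 placeholder so a reserve-without-set still reads deterministically.
        if slot < 16 { self.inline[slot] = 0; } else { self.spill.push(0); }
        slot
    }

    pub fn set(&mut self, slot: usize, size: u32) {
        assert!(slot < self.len, "slot was never reserved");
        if slot < 16 { self.inline[slot] = size; } else { self.spill[slot - 16] = size; }
    }

    pub fn consume_next(&mut self) -> u32 {
        let slot = self.next;
        self.next += 1;
        if slot < 16 { self.inline[slot] } else { self.spill[slot - 16] }
    }

    pub fn clear(&mut self) {
        // Reset cursors only; reserve() re-writes placeholders as slots are reused.
        self.len = 0;
        self.next = 0;
        self.spill.clear();
    }
}
```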

Performance

vs main (criterion, this host's noise floor ~±10% per the JSON-encode control):

| dataset | encode | encode_view | build+encode | build+encode_view |
| --- | --- | --- | --- | --- |
| api_response | −7.5% | +2.7% | −1.1% | −0.2% |
| log_record | −1.0% | +0.2% | −4.1% | −3.7% |
| analytics_event¹ | +7.8% | +7.6% | +1.3% | −1.0% |
| google_message1 | −4.7% | −8.5% | −4.9% | −4.0% |
| media_frame | 0.0% | 0.0% | −7.4% | −2.6% |

¹ spills past the 16-slot inline cache; build+encode (the realistic construct→encode→drop path) is within noise.

The earlier Vec-only SizeCache regressed small-message encode by 16–28% (per-call heap allocation); the inline-storage hybrid recovers that to parity-or-better. compute_size itself with a reused cache is at parity.

Scope

Applies to both Message and ViewEncode — view structs also lose __buffa_cached_size (reversing the field addition from #55, which had not yet shipped in a release). MessageField/MessageFieldView forwarders, the map-write path (no longer recomputes value sizes during write), clear() codegen, and all hand-written test impls updated.

Breaking change

Yes — compute_size/write_to signatures change and cached_size() is removed on both traits. Targeted for v0.4.0 alongside the __buffa:: namespacing and DefaultInstance/DefaultViewInstance safety changes.

Migration

For typical users (calling encode*()): no change.

For hand-written Message/ViewEncode impls or callers of compute_size/write_to directly:

  • msg.compute_size() -> msg.encoded_len() (if you only want the size)
  • msg.compute_size(); msg.write_to(buf) -> msg.encode(buf) (or thread a SizeCache explicitly if reusing across calls; see the sketch after this list)
  • Drop the __buffa_cached_size field and cached_size() method from hand-written impls.
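
For the explicit-cache form of the old two-step call, a rough sketch (`Telemetry` is a placeholder message type and the `Vec<u8>` buffer parameter is assumed):

```rust
use buffa::{Message, SizeCache};

fn size_then_encode(msg: &Telemetry) -> Vec<u8> {
    // Old pattern (no longer compiles):
    //     let size = msg.compute_size();
    //     msg.write_to(&mut buf);

    // New pattern, threading one cache through both passes:
    let mut cache = SizeCache::new();
    let size = msg.compute_size(&mut cache);
    let mut buf = Vec::with_capacity(size as usize);
    msg.write_to(&mut cache, &mut buf);
    buf
}
```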

Generated message and view types no longer carry a __buffa_cached_size
field. Nested-message sizes are recorded in an external pre-order
Vec<u32> SizeCache that compute_size populates and write_to consumes,
constructed and discarded inside the provided encode* methods.

- Message::compute_size/write_to and ViewEncode::compute_size/write_to
  now take &mut SizeCache; cached_size() is removed; encoded_len() added.
- Generated structs hold only proto fields plus __buffa_unknown_fields:
  no interior mutability, structurally Send + Sync, concurrent encode()
  of the same &msg is sound (each thread uses its own cache).
- Codegen reserves a slot before recursing into each LEN-delimited
  sub-message and consumes it in the same pre-order during write_to;
  groups thread the cache without reserving. Map message-values are
  handled per-phase, also fixing the previous map_write_to_stmt
  recomputation of v.compute_size() during write.
- CachedSize type and __private re-export deleted.

Closes #14.
…ew()

Backs the cache with a 16-slot uninitialized inline array plus a Vec
spill, so SizeCache::new() does no heap allocation and no array zeroing
— effectively free for the per-encode fresh-cache path. reserve() writes
a 0 placeholder per slot so next_size()'s assume_init is sound regardless
of caller discipline (no UB possible from safe code); set() asserts the
slot was reserved.

Recovers the per-message encode_to_vec() overhead introduced by moving
size state out of the struct: log_record encode +12% -> -1%,
google_message1 encode +20% -> -5% vs the in-struct AtomicU32 baseline.
build+encode (the realistic construct-then-serialize path) improves
1-7% across the suite from the smaller, atomic-free struct layout.
@iainmcgin iainmcgin force-pushed the size-cache-external branch from bb2dee3 to 81eee09 on April 27, 2026 20:12
@iainmcgin iainmcgin changed the title from "Externalize size cache to remove __buffa_cached_size struct field" to "Externalize size cache: remove __buffa_cached_size from generated structs" on Apr 27, 2026
…Cache

- map_compute_size_stmt now iterates `for (k, v) in &self.field` in all
  non-constant arms, identical to map_write_to_stmt, so SizeCache slot
  order matches by construction rather than relying on .values()/.iter()
  walking the same table slots. The (true, true) constant-size arm keeps
  the len()*const fold (no slots reserved there).
- SizeCache.inline reverted to plain [u32; 16]: new() zero-inits,
  reserve() still writes the per-slot 0 placeholder so a reserve-without
  -set after clear() reads 0 deterministically, next_size() is a plain
  index. No unsafe.
- Tests: reserve_without_set_yields_zero,
  clear_then_reserve_without_set_yields_zero.
- L3: SizeCache::next_size -> consume_next; updated codegen, tests, docs.
- L4: Message::encode_with_cache / ViewEncode::encode_with_cache provided
  methods for hot-loop cache reuse (clear + compute_size + write_to).
- L5: SizeCache rustdoc notes the type is intentionally not Clone.
- L6: debug_assert on len < u32::MAX in reserve().
- L2: codegen names the cache parameter `__cache` only when the message
  has a sub-message/group field, oneof message variant, or message-typed
  map value; leaf messages get `_cache`. Removes the let _ = &__cache
  marker line. New helper message_uses_size_cache() encapsulates the
  check for both owned and view emission.
@iainmcgin iainmcgin requested a review from asacamano April 27, 2026 22:08
@iainmcgin iainmcgin marked this pull request as ready for review April 27, 2026 22:08
@iainmcgin iainmcgin merged commit c1fdbe1 into main Apr 27, 2026
7 checks passed
@iainmcgin iainmcgin deleted the size-cache-external branch April 27, 2026 22:22
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 27, 2026


Development

Successfully merging this pull request may close these issues.

__buffa_cached_size makes destructuring harder
