Externalize size cache: remove __buffa_cached_size from generated structs #22
Merged
Conversation
Generated message and view types no longer carry a `__buffa_cached_size` field. Nested-message sizes are recorded in an external pre-order `Vec<u32>` `SizeCache` that `compute_size` populates and `write_to` consumes, constructed and discarded inside the provided `encode*` methods.

- `Message::compute_size`/`write_to` and `ViewEncode::compute_size`/`write_to` now take `&mut SizeCache`; `cached_size()` is removed; `encoded_len()` is added.
- Generated structs hold only proto fields plus `__buffa_unknown_fields`: no interior mutability, structurally `Send + Sync`, and concurrent `encode()` of the same `&msg` is sound (each thread uses its own cache).
- Codegen reserves a slot before recursing into each LEN-delimited sub-message and consumes it in the same pre-order during `write_to`; groups thread the cache without reserving. Map message-values are handled per-phase, also fixing the previous `map_write_to_stmt` recomputation of `v.compute_size()` during write.
- The `CachedSize` type and its `__private` re-export are deleted.

Closes #14.
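The two-pass shape described above can be sketched as follows. Everything here is a toy stand-in modeled on this description, not the real buffa codegen output; the point is only the reserve-before-recurse / consume-in-the-same-pre-order contract.

```rust
/// Minimal stand-in for buffa::SizeCache: a pre-order Vec<u32> slot table.
struct SizeCache {
    slots: Vec<u32>,
    next: usize, // consume cursor used by write_to
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { slots: Vec::new(), next: 0 }
    }
    /// Reserve a slot (0 placeholder) and return its index.
    fn reserve(&mut self) -> usize {
        self.slots.push(0);
        self.slots.len() - 1
    }
    fn set(&mut self, slot: usize, size: u32) {
        self.slots[slot] = size;
    }
    /// write_to consumes slots in the same pre-order compute_size reserved them.
    fn consume_next(&mut self) -> u32 {
        let size = self.slots[self.next];
        self.next += 1;
        size
    }
}

struct Inner { payload: Vec<u8> }
struct Outer { child: Inner }

impl Inner {
    // Leaf message: threads the cache but reserves no slots.
    fn compute_size(&self, _cache: &mut SizeCache) -> u32 {
        self.payload.len() as u32
    }
    fn write_to(&self, _cache: &mut SizeCache, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.payload);
    }
}

impl Outer {
    fn compute_size(&self, cache: &mut SizeCache) -> u32 {
        let slot = cache.reserve(); // reserve BEFORE recursing
        let child = self.child.compute_size(cache);
        cache.set(slot, child); // fill AFTER recursing
        1 + child // toy one-byte length prefix
    }
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>) {
        let child = cache.consume_next(); // no recomputation during write
        buf.push(child as u8);
        self.child.write_to(cache, buf);
    }
}

fn main() {
    let msg = Outer { child: Inner { payload: vec![1, 2, 3] } };
    let mut cache = SizeCache::new(); // fresh cache per encode, then discarded
    let total = msg.compute_size(&mut cache);
    let mut buf = Vec::with_capacity(total as usize);
    msg.write_to(&mut cache, &mut buf);
    assert_eq!(buf, vec![3, 1, 2, 3]);
}
```

Because the cache is constructed inside the encode call and never stored in the message, two threads encoding the same `&msg` each build their own cache, which is what makes the structural `Send + Sync` claim work.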
…ew()

Backs the cache with a 16-slot uninitialized inline array plus a `Vec` spill, so `SizeCache::new()` does no heap allocation and no array zeroing: effectively free for the per-encode fresh-cache path. `reserve()` writes a 0 placeholder per slot so `next_size()`'s `assume_init` is sound regardless of caller discipline (no UB possible from safe code); `set()` asserts the slot was reserved.

Recovers the per-message `encode_to_vec()` overhead introduced by moving size state out of the struct: log_record encode +12% -> -1%, google_message1 encode +20% -> -5% vs the in-struct `AtomicU32` baseline. build+encode (the realistic construct-then-serialize path) improves 1-7% across the suite from the smaller, atomic-free struct layout.
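A minimal model of the allocation claim: with an inline-16-plus-`Vec`-spill layout, `new()` touches no heap, and the spill only allocates once a 17th slot is reserved. The struct below is a sketch under assumptions (zero-initialized for simplicity, where this commit uses an uninitialized array), not buffa's actual code.

```rust
/// Toy inline-plus-spill cache layout (names modeled on the commit text).
struct SizeCache {
    inline: [u32; 16],  // first 16 slots live on the stack
    spill: Vec<u32>,    // only used past 16 reserved slots
    len: usize,         // slots reserved so far
}

impl SizeCache {
    fn new() -> Self {
        // Vec::new() is guaranteed allocation-free, so new() does no heap work.
        SizeCache { inline: [0; 16], spill: Vec::new(), len: 0 }
    }
    /// Reserve a slot with a 0 placeholder, returning its index.
    fn reserve(&mut self) -> usize {
        let slot = self.len;
        if slot < 16 {
            self.inline[slot] = 0;
        } else {
            self.spill.push(0); // first push past 16 triggers the only allocation
        }
        self.len += 1;
        slot
    }
}

fn main() {
    let mut c = SizeCache::new();
    assert_eq!(c.spill.capacity(), 0); // no allocation at construction
    for _ in 0..16 {
        c.reserve();
    }
    assert_eq!(c.spill.capacity(), 0); // ≤16 nested LEN slots: still no allocation
    assert_eq!(c.inline[0], 0);        // placeholder visible in the inline array
    c.reserve();                       // 17th slot spills to the heap
    assert!(c.spill.capacity() > 0);
}
```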
Force-pushed from bb2dee3 to 81eee09
…Cache

- `map_compute_size_stmt` now iterates `for (k, v) in &self.field` in all non-constant arms, identical to `map_write_to_stmt`, so `SizeCache` slot order matches by construction rather than relying on `.values()`/`.iter()` walking the same table slots. The (true, true) constant-size arm keeps the `len()*const` fold (no slots reserved there).
- `SizeCache.inline` reverted to plain `[u32; 16]`: `new()` zero-inits, `reserve()` still writes the per-slot 0 placeholder so a reserve-without-set after `clear()` reads 0 deterministically, and `next_size()` is a plain index. No `unsafe`.
- Tests: `reserve_without_set_yields_zero`, `clear_then_reserve_without_set_yields_zero`.
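The reverted storage can be sketched without `unsafe`: a zero-initialized plain `[u32; 16]` plus a `Vec` spill, with `reserve()` writing a 0 placeholder so a reserve-without-set after `clear()` deterministically reads 0. Names follow the text above; the real implementation's details are assumed.

```rust
/// Plain-array variant of the cache: no MaybeUninit, no unsafe.
struct SizeCache {
    inline: [u32; 16], // new() zero-inits
    spill: Vec<u32>,
    len: usize,  // slots reserved so far
    next: usize, // consume cursor
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { inline: [0; 16], spill: Vec::new(), len: 0, next: 0 }
    }
    fn clear(&mut self) {
        // Cheap reset: reserve() re-writes placeholders, so only the
        // counters and spill length need touching.
        self.len = 0;
        self.next = 0;
        self.spill.clear();
    }
    fn reserve(&mut self) -> usize {
        debug_assert!(self.len < u32::MAX as usize);
        let slot = self.len;
        if slot < 16 {
            self.inline[slot] = 0; // placeholder: unset slots read 0
        } else {
            self.spill.push(0);
        }
        self.len += 1;
        slot
    }
    fn set(&mut self, slot: usize, size: u32) {
        assert!(slot < self.len, "slot was never reserved");
        if slot < 16 {
            self.inline[slot] = size;
        } else {
            self.spill[slot - 16] = size;
        }
    }
    fn consume_next(&mut self) -> u32 {
        let slot = self.next;
        self.next += 1;
        if slot < 16 { self.inline[slot] } else { self.spill[slot - 16] }
    }
}

fn main() {
    // reserve_without_set_yields_zero
    let mut c = SizeCache::new();
    c.reserve();
    assert_eq!(c.consume_next(), 0);

    // clear_then_reserve_without_set_yields_zero
    let s = c.reserve();
    c.set(s, 7);
    c.clear();
    c.reserve();
    assert_eq!(c.consume_next(), 0);
}
```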
- L3: `SizeCache::next_size` -> `consume_next`; updated codegen, tests, docs.
- L4: `Message::encode_with_cache` / `ViewEncode::encode_with_cache` provided methods for hot-loop cache reuse (clear + compute_size + write_to).
- L5: `SizeCache` rustdoc notes the type is intentionally not `Clone`.
- L6: `debug_assert` on `len < u32::MAX` in `reserve()`.
- L2: codegen names the cache parameter `__cache` only when the message has a sub-message/group field, oneof message variant, or message-typed map value; leaf messages get `_cache`. Removes the `let _ = &__cache` marker line. New helper `message_uses_size_cache()` encapsulates the check for both owned and view emission.
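The `encode_with_cache` shape from item L4 above (clear the caller's cache, then run the two passes) might look like this as a provided method; all types here are toy stand-ins modeled on the description, not buffa's actual traits.

```rust
/// Toy slot-table cache (stand-in for buffa::SizeCache).
struct SizeCache {
    slots: Vec<u32>,
    next: usize,
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { slots: Vec::new(), next: 0 }
    }
    fn clear(&mut self) {
        self.slots.clear(); // keeps capacity: reuse avoids reallocation
        self.next = 0;
    }
    fn reserve(&mut self) -> usize {
        self.slots.push(0);
        self.slots.len() - 1
    }
    fn set(&mut self, slot: usize, size: u32) {
        self.slots[slot] = size;
    }
    fn consume_next(&mut self) -> u32 {
        let size = self.slots[self.next];
        self.next += 1;
        size
    }
}

trait Message {
    fn compute_size(&self, cache: &mut SizeCache) -> u32;
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>);

    /// Provided method: one clear + the two passes over a caller-owned cache.
    fn encode_with_cache(&self, cache: &mut SizeCache, buf: &mut Vec<u8>) {
        cache.clear();
        self.compute_size(cache);
        self.write_to(cache, buf);
    }
}

/// Toy message with one nested LEN slot (a length-prefixed payload).
struct Wrapped(Vec<u8>);

impl Message for Wrapped {
    fn compute_size(&self, cache: &mut SizeCache) -> u32 {
        let slot = cache.reserve();
        let inner = self.0.len() as u32;
        cache.set(slot, inner);
        1 + inner
    }
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>) {
        buf.push(cache.consume_next() as u8);
        buf.extend_from_slice(&self.0);
    }
}

fn main() {
    let batch = [Wrapped(vec![1]), Wrapped(vec![2, 2])];
    let mut cache = SizeCache::new(); // one cache (and spill capacity) per loop
    let mut out = Vec::new();
    for msg in &batch {
        msg.encode_with_cache(&mut cache, &mut out);
    }
    assert_eq!(out, vec![1, 1, 2, 2, 2]);
}
```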
asacamano approved these changes on Apr 27, 2026
Closes #14.
Summary
Removes the in-struct `__buffa_cached_size: CachedSize(AtomicU32)` field from all generated owned and view types by threading an external `SizeCache` through the two-pass encode protocol.

Before

After

`encode()`, `encode_length_delimited()`, `encode_to_vec()`, and `encode_to_bytes()` construct a `SizeCache` internally, so there is no caller-visible change for the common path. A new `encode_with_cache(&self, &mut SizeCache, buf)` provided method lets hot loops reuse a single cache (and its spill capacity) across many encodes.

Why
- Generated structs hold only proto fields (plus `__buffa_unknown_fields`). Exhaustive struct destructuring works; struct literals don't need `..Default::default()` for the cache.
- `Send`/`Sync` are structural instead of via `AtomicU32` + Relaxed.
- `&Message` is genuinely read-only.

SizeCache
`buffa::SizeCache` is a pre-order slot table: `compute_size` calls `reserve()` before recursing into each LEN-delimited sub-message and `set()` after; `write_to` calls `consume_next()` in the same pre-order. Groups (tag-delimited) thread the cache without reserving. Map compute and write iterate `for (k, v) in &map` identically, so slot order is correct by construction (no reliance on `.values()`/`.iter()` ordering agreement).

Storage is `[u32; 16]` inline + `Vec<u32>` spill, so `SizeCache::new()` is allocation-free for messages with ≤16 nested LEN sub-messages, which covers every benchmark dataset except analytics_event.

Performance
vs `main` (criterion, this host's noise floor ~±10% per the JSON-encode control):

¹ spills past the 16-slot inline cache; build+encode (the realistic construct→encode→drop path) is within noise.
The earlier `Vec`-only `SizeCache` regressed small-message encode by 16–28% (per-call heap allocation); the inline-storage hybrid recovers that to parity-or-better. `compute_size` itself with a reused cache is at parity.

Scope
Applies to both `Message` and `ViewEncode`: view structs also lose `__buffa_cached_size` (reversing the field addition from #55, which had not yet shipped in a release). The `MessageField`/`MessageFieldView` forwarders, the map-write path (which no longer recomputes value sizes during write), `clear()` codegen, and all hand-written test impls are updated.

Breaking change
Yes: the `compute_size`/`write_to` signatures change and `cached_size()` is removed on both traits. Targeted for v0.4.0 alongside the `__buffa::` namespacing and `DefaultInstance`/`DefaultViewInstance` safety changes.

Migration
For typical users (calling `encode*()`): no change.

For hand-written `Message`/`ViewEncode` impls or callers of `compute_size`/`write_to` directly:

- `msg.compute_size()` → `msg.encoded_len()` (if you only want the size)
- `msg.compute_size(); msg.write_to(buf)` → `msg.encode(buf)` (or thread a `SizeCache` explicitly if reusing across calls)
- Remove the `__buffa_cached_size` field and `cached_size()` method from hand-written impls.
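For hand-written impls, the migration might look like the following toy model, with `encoded_len()` and `encode()` as provided methods built on the new `&mut SizeCache` signatures. The trait shape is assumed from the description above, not copied from buffa.

```rust
/// Stand-in for buffa::SizeCache; only construction matters for this sketch.
struct SizeCache {
    slots: Vec<u32>,
    next: usize,
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { slots: Vec::new(), next: 0 }
    }
}

trait Message {
    // New signatures: both passes thread the external cache.
    fn compute_size(&self, cache: &mut SizeCache) -> u32;
    fn write_to(&self, cache: &mut SizeCache, buf: &mut Vec<u8>);

    /// Replaces the old size-only use of compute_size() (no cache kept).
    fn encoded_len(&self) -> u32 {
        self.compute_size(&mut SizeCache::new())
    }
    /// Replaces the old compute_size(); write_to(buf) pair: the cache is
    /// constructed and discarded inside the call.
    fn encode(&self, buf: &mut Vec<u8>) {
        let mut cache = SizeCache::new();
        self.compute_size(&mut cache);
        self.write_to(&mut cache, buf);
    }
}

/// Hand-written impl after migration: no __buffa_cached_size field,
/// no cached_size() method.
struct Ping {
    seq: u8,
}

impl Message for Ping {
    fn compute_size(&self, _cache: &mut SizeCache) -> u32 {
        1 // leaf message: no slots reserved
    }
    fn write_to(&self, _cache: &mut SizeCache, buf: &mut Vec<u8>) {
        buf.push(self.seq);
    }
}

fn main() {
    let p = Ping { seq: 42 };
    assert_eq!(p.encoded_len(), 1);
    let mut buf = Vec::new();
    p.encode(&mut buf);
    assert_eq!(buf, vec![42]);
}
```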