Improve Model Deserialization Speed #136
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Inconsistent compact mode check in byte_size
  - Updated `byte_size()` to use the same `heads_.empty() && compactHeads_` compact-mode guard as other accessors so it cannot read compact metadata after runtime heads are materialized.
Or push these changes by commenting:
@cursor push 536d836a13
Preview (536d836a13)
diff --git a/include/simfil/model/arena.h b/include/simfil/model/arena.h
--- a/include/simfil/model/arena.h
+++ b/include/simfil/model/arena.h
@@ -122,7 +122,7 @@
* @return The current size, in bytes, of the array arena if serialized.
*/
[[nodiscard]] size_t byte_size() const {
- if (compactHeads_) {
+ if (heads_.empty() && compactHeads_) {
return compactHeads_->byte_size() + data_.byte_size();
}
auto result = heads_.size() * sizeof(CompactArrayChunk);
result += head.size * sizeof(ElementType_);
}
return result;
}
Inconsistent compact mode check in byte_size
Low Severity
`byte_size()` checks `if (compactHeads_)` to select the compact path, while all other methods (`size()`, `size(a)`, `at_impl`, `iterate`) consistently use `if (heads_.empty() && compactHeads_)`. During the window between `ensure_runtime_heads_from_compact` populating `heads_` and the subsequent `compactHeads_.reset()`, `byte_size()` would incorrectly take the compact path even though the arena has transitioned to runtime mode, potentially returning stale size information.
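One way to keep such guards from drifting apart is to route every accessor through a single mode predicate. The following is a minimal sketch under stated assumptions: `TinyArena`, `Chunk`, `in_compact_mode()`, `materialize_heads()`, and `push_head()` are hypothetical names for illustration and are not the simfil `ArrayArena` API. `materialize_heads()` deliberately leaves `compactHeads_` set to model the transition window described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an arena head; not the simfil type.
struct Chunk { std::uint32_t offset; std::uint32_t size; };

class TinyArena {
public:
    // Build an arena that starts in compact (just-deserialized) mode.
    static TinyArena from_compact(std::vector<Chunk> heads) {
        TinyArena a;
        a.compactHeads_ = std::make_unique<std::vector<Chunk>>(std::move(heads));
        return a;
    }

    // Single source of truth for the mode check used by every accessor.
    bool in_compact_mode() const { return heads_.empty() && compactHeads_; }

    // Copy compact heads into runtime heads. compactHeads_ is intentionally
    // NOT reset here, so both representations coexist afterwards.
    void materialize_heads() {
        if (in_compact_mode()) heads_ = *compactHeads_;
    }

    // Mutation forces the switch to runtime heads.
    void push_head(Chunk c) { materialize_heads(); heads_.push_back(c); }

    std::size_t head_count() const {
        return in_compact_mode() ? compactHeads_->size() : heads_.size();
    }

    // Uses the same predicate, so it cannot report stale compact metadata.
    std::size_t byte_size() const { return head_count() * sizeof(Chunk); }

private:
    std::vector<Chunk> heads_;
    std::unique_ptr<std::vector<Chunk>> compactHeads_;
};
```

With a bare `if (compactHeads_)` check, `byte_size()` in this sketch would still report the two compact heads after `push_head()` added a third.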
…storage Further Model Size Reductions
TODO for merge:
…olding Rework Diagnostics Folding
Summary
Use `std::vector<uint8_t>` as the column buffer type to skip segmented vector page allocations during deserialization.
Result: 10-20x speed improvement for model deserialization in erdblick. This greatly improves completion/search performance and cuts tile render time roughly in half.
Sacrifice: Big-endian compatibility.
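To illustrate why a flat byte buffer makes deserialization cheap and why it costs big-endian support, here is a minimal sketch of a raw-byte column round trip. The layout (`uint64` count followed by raw element bytes) and the names `write_column`, `read_column`, and `host_is_little_endian` are assumptions for illustration, not the format or API used in this PR. Because bytes are copied verbatim, a buffer written on a little-endian host would be misread on a big-endian one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Runtime probe: true if the lowest-order byte is stored first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Append a column as [uint64 count][raw element bytes] (assumed layout).
template <class T>
void write_column(std::vector<std::uint8_t>& out, const std::vector<T>& column) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    const std::uint64_t count = column.size();
    const std::size_t start = out.size();
    out.resize(start + sizeof(count) + column.size() * sizeof(T));
    std::memcpy(out.data() + start, &count, sizeof(count));
    if (!column.empty())
        std::memcpy(out.data() + start + sizeof(count), column.data(), column.size() * sizeof(T));
}

// Read one column starting at `offset` and advance `offset` past it.
// One allocation and one memcpy per column, no per-element decoding.
template <class T>
std::vector<T> read_column(const std::vector<std::uint8_t>& in, std::size_t& offset) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    std::uint64_t count = 0;
    assert(offset + sizeof(count) <= in.size());
    std::memcpy(&count, in.data() + offset, sizeof(count));
    offset += sizeof(count);
    assert(count <= (in.size() - offset) / sizeof(T));
    std::vector<T> column(static_cast<std::size_t>(count));
    if (count != 0)
        std::memcpy(column.data(), in.data() + offset, column.size() * sizeof(T));
    offset += column.size() * sizeof(T);
    return column;
}
```

A real reader would turn the `assert` bounds checks into proper error handling, since the input buffer may be truncated or corrupted.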
Note
High Risk
High risk because it changes the binary serialization/deserialization format and data layout assumptions (raw byte payloads, compacted arenas, little-endian only), which can break compatibility or expose subtle corruption/offset bugs.
Overview
Speeds up model (de)serialization by introducing `ModelColumn` (paged column storage with Bitsery support that reads/writes raw byte payloads via `memcpy`) and switching `ModelPool`'s primitive columns to it.
Reworks `ArrayArena` serialization to a compact representation (packed heads + contiguous element buffer) with lazy expansion back to runtime chunks on mutation/access, and adds `byte_size()` helpers for size stats.
Updates deserialization APIs: `ModelPool::read` and `StringPool::read` now accept `std::vector<uint8_t>` (plus optional offset) using Bitsery's buffer adapter instead of `std::istream`; tests are adjusted accordingly, and several node/field structs are tagged with `MODEL_COLUMN_TYPE` plus a new little-endian-only constraint.
Written by Cursor Bugbot for commit e9c17b1. This will update automatically on new commits.
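The compact arena representation mentioned in the overview (packed heads plus one contiguous element buffer) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `PackedHead`, `CompactArrays`, `pack`, `element_at`, `expand`, and `byte_size` are hypothetical names, and the element type is fixed to `int32_t` for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One head per logical array: where it starts in `data` and how long it is.
struct PackedHead { std::uint32_t offset; std::uint32_t size; };

struct CompactArrays {
    std::vector<PackedHead> heads;   // packed heads
    std::vector<std::int32_t> data;  // all elements, back to back
};

// Flatten a list of arrays into the compact form.
CompactArrays pack(const std::vector<std::vector<std::int32_t>>& arrays) {
    CompactArrays c;
    for (const auto& a : arrays) {
        c.heads.push_back({static_cast<std::uint32_t>(c.data.size()),
                           static_cast<std::uint32_t>(a.size())});
        c.data.insert(c.data.end(), a.begin(), a.end());
    }
    return c;
}

// Read-only access works directly on the compact form, no expansion needed.
std::int32_t element_at(const CompactArrays& c, std::size_t array, std::size_t i) {
    const PackedHead& h = c.heads.at(array);
    if (i >= h.size) throw std::out_of_range("element index out of range");
    return c.data.at(h.offset + i);
}

// Expansion back to independently growable arrays, done only when a
// caller needs to mutate (the "lazy" part of the scheme).
std::vector<std::vector<std::int32_t>> expand(const CompactArrays& c) {
    std::vector<std::vector<std::int32_t>> out;
    out.reserve(c.heads.size());
    for (const PackedHead& h : c.heads)
        out.emplace_back(c.data.begin() + h.offset, c.data.begin() + h.offset + h.size);
    return out;
}

// Serialized size of the compact form: heads plus raw elements.
std::size_t byte_size(const CompactArrays& c) {
    return c.heads.size() * sizeof(PackedHead) + c.data.size() * sizeof(std::int32_t);
}
```

Both vectors hold trivially copyable elements, which is what would allow each to be written and read as a single raw byte run.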