
Improve Model Deserialization Speed #136

Merged
josephbirkner merged 25 commits into v0.6.3 from noserde
Mar 16, 2026

Conversation

@josephbirkner
Collaborator

@josephbirkner josephbirkner commented Feb 24, 2026

Summary

  1. Move to memcpy-style deserialization for Model columns, which are arrays of primitive elements.
  2. Allow std::vector<uint8_t> as column buffer type to skip segmented vector page allocations during deserialization.
  3. Introduce compact mode for ArrayArena.
  4. Deserialize from vector<uint8_t> instead of stringstream, whose Emscripten implementation appears to be slow.

Result: 10-20x faster model deserialization in erdblick. This greatly improves completion/search performance and roughly halves the tile render time.

Sacrifice: Big endian compatibility.
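
To illustrate points 1, 2 and 4 above, here is a minimal sketch of memcpy-style column (de)serialization over a std::vector<uint8_t>. It is not the ModelColumn implementation from this PR: writeColumn, readColumn, hostIsLittleEndian and the [count][raw bytes] layout are hypothetical names and choices made up for the example. It also shows why big-endian compatibility is lost, since the element bytes are copied in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not simfil API): true if the host stores the
// least significant byte first.
inline bool hostIsLittleEndian() {
    const std::uint16_t probe = 1;
    std::uint8_t firstByte = 0;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

// Append a column as [uint64 element count][raw element bytes].
// The bytes are written in host order, so the payload is only portable
// between hosts of the same endianness.
template <typename T>
void writeColumn(std::vector<std::uint8_t>& out, const std::vector<T>& column) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires trivially copyable elements");
    const std::uint64_t count = column.size();
    const std::size_t start = out.size();
    const std::size_t payload = column.size() * sizeof(T);
    out.resize(start + sizeof(count) + payload);
    std::memcpy(out.data() + start, &count, sizeof(count));
    if (payload != 0)
        std::memcpy(out.data() + start + sizeof(count), column.data(), payload);
}

// Read a column starting at `offset` and advance `offset` past it.
// Returns false (leaving `offset` untouched) if the buffer is truncated.
template <typename T>
bool readColumn(const std::vector<std::uint8_t>& in, std::size_t& offset,
                std::vector<T>& column) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires trivially copyable elements");
    std::uint64_t count = 0;
    if (offset > in.size() || in.size() - offset < sizeof(count))
        return false;
    std::memcpy(&count, in.data() + offset, sizeof(count));
    const std::size_t remaining = in.size() - offset - sizeof(count);
    if (count > remaining / sizeof(T))
        return false;
    const std::size_t payload = static_cast<std::size_t>(count) * sizeof(T);
    column.resize(static_cast<std::size_t>(count));
    if (payload != 0)
        std::memcpy(column.data(), in.data() + offset + sizeof(count), payload);
    offset += sizeof(count) + payload;
    return true;
}
```

A caller would fill a buffer with writeColumn, then call readColumn with an offset that starts at 0 and is advanced past each column. The single memcpy per column is what replaces per-element decoding; a real reader would also want to reject payloads on big-endian hosts, for example by checking hostIsLittleEndian() up front.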


Note

High Risk
High risk because it changes the binary serialization/deserialization format and data layout assumptions (raw byte payloads, compacted arenas, little-endian only), which can break compatibility or expose subtle corruption/offset bugs.

Overview
Speeds up model (de)serialization by introducing ModelColumn (paged column storage with Bitsery support that reads/writes raw byte payloads via memcpy) and switching ModelPool’s primitive columns to it.

Reworks ArrayArena serialization to a compact representation (packed heads + contiguous element buffer) with lazy expansion back to runtime chunks on mutation/access, and adds byte_size() helpers for size stats.
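
The compact representation described above can be sketched roughly as follows. This is an illustration under assumptions, not the ArrayArena code from this PR: PackedHead, CompactArrays, RuntimeArrays, toCompact, toRuntime and LazyArena are hypothetical names, a plain vector per array stands in for the runtime chunk lists, and expansion here is triggered only by mutation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical packed head: where one array starts in the shared buffer
// and how many elements it has.
struct PackedHead {
    std::uint32_t offset;
    std::uint32_t size;
};

// Compact form: one head per array plus a single contiguous element buffer.
struct CompactArrays {
    std::vector<PackedHead> heads;
    std::vector<std::int32_t> data;
};

// Runtime form (stand-in for chunked storage): one growable vector per array.
using RuntimeArrays = std::vector<std::vector<std::int32_t>>;

CompactArrays toCompact(const RuntimeArrays& arrays) {
    CompactArrays result;
    result.heads.reserve(arrays.size());
    for (const auto& array : arrays) {
        result.heads.push_back({static_cast<std::uint32_t>(result.data.size()),
                                static_cast<std::uint32_t>(array.size())});
        result.data.insert(result.data.end(), array.begin(), array.end());
    }
    return result;
}

RuntimeArrays toRuntime(const CompactArrays& compacted) {
    RuntimeArrays result;
    result.reserve(compacted.heads.size());
    for (const auto& head : compacted.heads) {
        const auto first = compacted.data.begin() + head.offset;
        result.emplace_back(first, first + head.size);
    }
    return result;
}

// Reads are served from the compact form; the first mutation expands it.
class LazyArena {
public:
    explicit LazyArena(CompactArrays compacted)
        : compact_(std::move(compacted)), isCompact_(true) {}

    std::int32_t at(std::size_t array, std::size_t index) const {
        if (isCompact_) {
            assert(array < compact_.heads.size());
            assert(index < compact_.heads[array].size);
            return compact_.data[compact_.heads[array].offset + index];
        }
        assert(array < runtime_.size() && index < runtime_[array].size());
        return runtime_[array][index];
    }

    void push_back(std::size_t array, std::int32_t value) {
        ensureRuntime();
        assert(array < runtime_.size());
        runtime_[array].push_back(value);
    }

    bool isCompact() const { return isCompact_; }

private:
    void ensureRuntime() {
        if (!isCompact_)
            return;
        runtime_ = toRuntime(compact_);
        compact_ = CompactArrays{};  // drop the compact copy after expansion
        isCompact_ = false;
    }

    CompactArrays compact_;
    RuntimeArrays runtime_;
    bool isCompact_;
};
```

The point of the design is that a freshly deserialized arena needs only two contiguous buffers, so read-only consumers never pay for rebuilding per-array storage.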

Updates deserialization APIs: ModelPool::read and StringPool::read now accept std::vector<uint8_t> (plus optional offset) using Bitsery’s buffer adapter instead of std::istream; tests are adjusted accordingly, and several node/field structs are tagged with MODEL_COLUMN_TYPE plus a new little-endian-only constraint.

Written by Cursor Bugbot for commit e9c17b1.

@josephbirkner josephbirkner changed the title from "ModelColumn migration with automatic column-type validation" to "Improve Model Deserialization Speed" on Feb 24, 2026
@github-actions

github-actions bot commented Feb 24, 2026

Test Results

 1 files  ±0   1 suites  ±0   6m 49s ⏱️ +3s
90 tests +2  90 ✅ +2  0 💤 ±0  0 ❌ ±0 
95 runs  +2  95 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit 202d867. ± Comparison against base commit c97ea20.

♻️ This comment has been updated with latest results.

cursor[bot]

This comment was marked as spam.

@johannes-wolf johannes-wolf self-requested a review March 4, 2026 11:50

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Inconsistent compact mode check in byte_size
    • Updated byte_size() to use the same heads_.empty() && compactHeads_ compact-mode guard as other accessors so it cannot read compact metadata after runtime heads are materialized.

Preview (536d836a13)
diff --git a/include/simfil/model/arena.h b/include/simfil/model/arena.h
--- a/include/simfil/model/arena.h
+++ b/include/simfil/model/arena.h
@@ -122,7 +122,7 @@
      * @return The current size, in bytes, of the array arena if serialized.
      */
     [[nodiscard]] size_t byte_size() const {
-        if (compactHeads_) {
+        if (heads_.empty() && compactHeads_) {
             return compactHeads_->byte_size() + data_.byte_size();
         }
         auto result = heads_.size() * sizeof(CompactArrayChunk);

            result += head.size * sizeof(ElementType_);
        }
        return result;
    }

Inconsistent compact mode check in byte_size

Low Severity

byte_size() checks if (compactHeads_) to select the compact path, while all other methods (size(), size(a), at_impl, iterate) consistently use if (heads_.empty() && compactHeads_). During the window between ensure_runtime_heads_from_compact populating heads_ and the subsequent compactHeads_.reset(), byte_size() would incorrectly take the compact path even though the arena has transitioned to runtime mode, potentially returning stale size information.
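
The transition window described above can be reproduced with a small stand-alone sketch. ArenaSketch and its members are hypothetical stand-ins rather than the real ArrayArena: each head is reduced to an element count, and the two-step transition is split into materializeHeads() and dropCompact() so that the intermediate state is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the arena: a head is just an element count here.
struct ArenaSketch {
    std::vector<std::size_t> heads_;                          // runtime heads
    std::unique_ptr<std::vector<std::size_t>> compactHeads_;  // compact heads

    // The guard used by the other accessors: compact metadata is only
    // authoritative while no runtime heads exist.
    bool inCompactMode() const {
        return heads_.empty() && compactHeads_ != nullptr;
    }

    // What a check on compactHeads_ alone would conclude.
    bool naiveCompactCheck() const { return compactHeads_ != nullptr; }

    std::size_t arrayCount() const {
        return inCompactMode() ? compactHeads_->size() : heads_.size();
    }

    // Step 1 of the transition: runtime heads are populated while the
    // compact heads are still alive.
    void materializeHeads() {
        if (compactHeads_ && heads_.empty())
            heads_ = *compactHeads_;
    }

    // Step 2 of the transition: the compact heads are released.
    void dropCompact() { compactHeads_.reset(); }
};
```

Between materializeHeads() and dropCompact(), naiveCompactCheck() still returns true even though heads_ is now the source of truth, which is the inconsistency flagged for byte_size(). Using one shared predicate such as inCompactMode() in every accessor keeps them in agreement.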


@josephbirkner
Collaborator Author

josephbirkner commented Mar 16, 2026

TODO for merge:

  • Add docs for ModelColumn, TwoPart<A,B> storage.
  • Add docs for ArrayArena with singleton support.
  • Generally update dev docs.

@sonarqubecloud

@github-actions

Package                 Line Rate             Branch Rate           Health
include.simfil          24%                   10%
include.simfil.model    75%                   46%
src                     79%                   47%
src.model               82%                   46%
Summary                 46% (7858 / 17023)    27% (4770 / 17363)

@josephbirkner josephbirkner merged commit 87a0469 into v0.6.3 Mar 16, 2026
6 checks passed
@josephbirkner josephbirkner deleted the noserde branch March 16, 2026 14:26