feat: use Arc cloning in extend_ref instead of deep cloning BranchNodeCompact #82
base: cliff/triedb
Conversation
```rust
use alloy_primitives::map::DefaultHashBuilder;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use reth_trie_common::{updates::TrieUpdates, BranchNodeCompact, Nibbles};
use std::{collections::HashMap, sync::Arc};
```
Reviewer: Please follow the standard Rust import order: std first, then third-party crates, then workspace crates. Please update the other places as well.

Author: Updated the order of imports.
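The requested grouping, applied to this benchmark's imports (a sketch using only the imports already shown in the diff; it is not compilable on its own since it depends on the workspace crates):

```rust
// 1. standard library
use std::{collections::HashMap, sync::Arc};

// 2. third-party crates
use alloy_primitives::map::DefaultHashBuilder;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

// 3. workspace crates
use reth_trie_common::{updates::TrieUpdates, BranchNodeCompact, Nibbles};
```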
```rust
let mut updates = TrieUpdates::default();

for i in 0..num_nodes {
    let path = Nibbles::from_nibbles(&[i as u8 % 16, (i / 16) as u8 % 16]);
```
Reviewer: For Ethereum accounts, the nibble path should be 64 `u8` values. Better to check against the actual case.

Author: Updated to 64 nibbles.
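For context, a self-contained sketch of why account paths are 64 nibbles: a 32-byte hashed key (e.g. keccak256 of the address) expands into two nibbles per byte. The `to_nibbles` helper here is illustrative only; in reth this expansion is provided by the `Nibbles` type itself.

```rust
/// Expand each byte of a 32-byte key into two nibbles (high, then low),
/// yielding the 64-nibble path length used for hashed account keys.
fn to_nibbles(key: &[u8; 32]) -> Vec<u8> {
    let mut nibbles = Vec::with_capacity(64);
    for &byte in key {
        nibbles.push(byte >> 4);   // high nibble
        nibbles.push(byte & 0x0f); // low nibble
    }
    nibbles
}

fn main() {
    let key = [0xab_u8; 32]; // stand-in for a keccak256(address) hash
    let path = to_nibbles(&key);
    assert_eq!(path.len(), 64);
    assert!(path.iter().all(|&n| n < 16));
}
```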
```rust
// the important part is the Arc cloning behavior, not node content
let node = BranchNodeCompact::default();

updates.account_nodes.insert(path, Arc::new(node));
```
Reviewer: Can you add `storage_tries` as well? The storage trie is usually much bigger.

Author: Added storage updates to the benchmark tests.
```diff
-if !branch_nodes_equal(task.as_ref(), regular.as_ref(), database.as_ref())? {
+if !branch_nodes_equal(task.as_ref().map(|n| &**n), regular.as_ref().map(|n| &**n), database.as_ref())? {
     diff.account_nodes.insert(key, EntryDiff { task, regular, database });
```
Reviewer: `&**n` can just be written as `n.as_ref()`.

Author: Updated `&**n` to `n.as_ref()` at all occurrences.
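Both spellings produce the same `Option<&T>`, since `Arc<T>` implements `AsRef<T>`. A minimal sketch with a stand-in payload type:

```rust
use std::sync::Arc;

/// Borrow the inner value out of an optional Arc. Because `Arc<T>: AsRef<T>`,
/// `n.as_ref()` is equivalent to the explicit double deref `&**n`.
fn inner(node: &Option<Arc<u32>>) -> Option<&u32> {
    node.as_ref().map(|n| n.as_ref())
}

fn main() {
    let node = Some(Arc::new(7_u32));
    let explicit: Option<&u32> = node.as_ref().map(|n| &**n);
    assert_eq!(explicit, inner(&node));
    assert_eq!(inner(&None), None);
}
```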
```diff
 if !branch_nodes_equal(task.as_ref().map(|n| &**n), regular.as_ref().map(|n| &**n), database.as_ref())? {
-    diff.account_nodes.insert(key, EntryDiff { task, regular, database });
+    diff.account_nodes.insert(key, EntryDiff {
+        task: task.map(|n| (*n).clone()),
```
Reviewer: Would it be possible to define this as `Arc`, so we don't have to clone?

Author: Yes, we can change `EntryDiff` to use `Arc` to avoid cloning. This is diagnostic code that only runs when there are differences, but we should still avoid the unnecessary clones.
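A hypothetical sketch of what the reply proposes: store `Arc`s in the diff entry so the diagnostic path shares nodes instead of deep-cloning them. The field layout here is illustrative, not the crate's actual `EntryDiff` definition, and `Node` stands in for `BranchNodeCompact`.

```rust
use std::sync::Arc;

type Node = Vec<u8>; // stand-in for BranchNodeCompact (~112 bytes)

/// Hypothetical Arc-holding variant of the diagnostic diff entry:
/// mismatching nodes are shared, not deep-cloned.
#[derive(Debug)]
struct EntryDiff {
    task: Option<Arc<Node>>,
    regular: Option<Arc<Node>>,
    database: Option<Node>, // the DB value is read as an owned node
}

fn record_diff(task: &Option<Arc<Node>>, regular: &Option<Arc<Node>>) -> EntryDiff {
    EntryDiff {
        task: task.clone(),       // clones the Arc handle, not the node payload
        regular: regular.clone(), // likewise
        database: None,
    }
}

fn main() {
    let node = Arc::new(vec![0_u8; 112]);
    let task = Some(Arc::clone(&node));
    let diff = record_diff(&task, &None);
    // original handle + the one in `task` + the one in `diff.task` = 3
    assert_eq!(Arc::strong_count(&node), 3);
    assert!(diff.regular.is_none());
}
```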
```diff
 // Create account trie updates: one Some (update) and one None (removal)
 let account_nodes = vec![
-    (account_nibbles1, Some(node1.clone())), // This will update existing node
+    (account_nibbles1, Some(Arc::new(node1.clone()))), // This will update existing node
```
Reviewer: This could just be `Arc::new(node1)`, without the clone.

Author: Updated to use `&node1`.
```diff
 storage_nodes: vec![
-    (storage_nibbles1, Some(storage_node1.clone())), // Updated node already in db
-    (storage_nibbles2, Some(storage_node2.clone())), /* Updated node not in db
+    (storage_nibbles1, Some(Arc::new(storage_node1.clone()))), // Updated node already in db
```
Reviewer: Can we clone once and then use `Arc::clone()`?

```rust
let storage_node1 = Arc::new(storage_node1.clone());
Arc::clone(&storage_node1)
Arc::clone(&storage_node1)
```

Author: Removed the redundancy by cloning into a local variable.
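The pattern the reviewer suggests, as a self-contained sketch: clone the payload into an `Arc` once, then hand out pointer copies.

```rust
use std::sync::Arc;

/// Wrap the payload once, then share it: each `Arc::clone` copies a
/// pointer and bumps the refcount instead of duplicating the payload.
fn share_twice(payload: Vec<u8>) -> usize {
    let shared = Arc::new(payload);        // one allocation
    let _in_db = Arc::clone(&shared);      // cheap pointer copy
    let _in_overlay = Arc::clone(&shared); // cheap pointer copy
    Arc::strong_count(&shared)
}

fn main() {
    // A 112-byte stand-in for BranchNodeCompact.
    assert_eq!(share_twice(vec![0_u8; 112]), 3);
}
```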
```diff
-    (storage_nibbles1, Some(storage_node1.clone())), // Updated node from overlay
-    (storage_nibbles2, Some(storage_node2.clone())), /* Updated node not in overlay
+    (storage_nibbles1, Some(Arc::new(storage_node1.clone()))), // Updated node from overlay
+    (storage_nibbles2, Some(Arc::new(storage_node2.clone()))), /* Updated node not in overlay
```
Reviewer: There are many clones here; please check whether any of them can be avoided.
```diff
 fn from(value: &'a super::TrieUpdates) -> Self {
     Self {
-        account_nodes: Cow::Borrowed(&value.account_nodes),
+        account_nodes: Cow::Owned(
```
Reviewer: Why do we need to clone and own here? What would be the overall impact? Is it possible to just borrow?

Author: The clone happens because:
- We have `Arc<BranchNodeCompact>` in the source
- Bincode serialization expects an owned `BranchNodeCompact`, not an `Arc`
- We must unwrap the `Arc` (`.as_ref()`) and clone the inner value

Bincode serialization is infrequent (persistence/snapshots), so the clone cost is acceptable for this use case. The Arc optimization still wins massively on the hot path (`extend_ref`, aggregation).
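The conversion described above, sketched with stand-in types (`Path` and `Node` are placeholders for `Nibbles` and `BranchNodeCompact`; the function name is illustrative):

```rust
use std::{borrow::Cow, collections::HashMap, sync::Arc};

type Path = u64;     // stand-in for Nibbles
type Node = Vec<u8>; // stand-in for BranchNodeCompact

/// Build the owned map a serializer expects from the Arc-backed source:
/// each Arc is unwrapped with `as_ref()` and its inner value cloned.
fn to_serializable(src: &HashMap<Path, Arc<Node>>) -> Cow<'static, HashMap<Path, Node>> {
    Cow::Owned(src.iter().map(|(k, v)| (*k, v.as_ref().clone())).collect())
}

fn main() {
    let mut src = HashMap::new();
    src.insert(1, Arc::new(vec![0_u8; 112]));
    let owned = to_serializable(&src);
    assert_eq!(owned.get(&1).map(|n| n.len()), Some(112));
    // The source Arcs are untouched; only this cold path pays the clone.
    assert_eq!(Arc::strong_count(&src[&1]), 1);
}
```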
crates/trie/common/src/updates.rs (outdated)
```diff
 is_deleted: value.is_deleted,
-storage_nodes: Cow::Borrowed(&value.storage_nodes),
+storage_nodes: Cow::Owned(
+    value.storage_nodes.iter().map(|(k, v)| (*k, (**v).clone())).collect()
```
Reviewer: The previous version did not need to clone and own.

Author: The trade-off for clone & own:

What we gained:
- Hot path (`extend_ref`, aggregation): 3.08x speedup; each node now costs an `Arc::clone` (8 bytes) instead of a full clone (112 bytes)
- This happens thousands of times per block

What we lost:
- Cold path (bincode serialization): must clone the entire trie for serialization
- This happens once per persistence/snapshot operation

This is the correct trade-off for production because trie aggregation happens continuously (hot), while bincode serialization happens rarely (cold: snapshots, persistence).
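The 8-byte vs 112-byte figures can be checked directly: on a 64-bit target, an `Arc` handle to a sized payload is one pointer wide regardless of payload size, so cloning the handle copies 8 bytes rather than 112.

```rust
use std::{mem::size_of, sync::Arc};

fn main() {
    // An Arc to a 112-byte payload is still just one pointer wide, so
    // cloning it on the hot path copies a pointer, not the payload.
    assert_eq!(size_of::<Arc<[u8; 112]>>(), size_of::<usize>());
    assert_eq!(size_of::<[u8; 112]>(), 112);
}
```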
crates/trie/db/src/trie_cursor.rs (outdated)
```diff
 self.cursor.upsert(
     self.hashed_address,
-    &StorageTrieEntry { nibbles, node: node.clone() },
+    &StorageTrieEntry { nibbles, node: (**node).clone() },
```
Reviewer: Better to rewrite `(**node).clone()` as `node.as_ref().clone()`; check the other places as well.

Author: Updated to get a new owned value by calling `as_ref()` and cloning it.
```diff
 cursor_entry: Option<(Nibbles, BranchNodeCompact)>,
 /// Forward-only in-memory cursor over storage trie nodes.
-in_memory_cursor: ForwardInMemoryCursor<'a, Nibbles, Option<BranchNodeCompact>>,
+in_memory_cursor: ForwardInMemoryCursor<'a, Nibbles, Option<std::sync::Arc<BranchNodeCompact>>>,
```
Reviewer: You can import `std::sync::Arc` at the top of the file instead of spelling out the full path.

Author: Reordered the imports.
```diff
-// then we return the overlay's node.
-return Ok(Some((mem_key, node)))
+// then we return the overlay's node. Clone the Arc to get the actual node.
+return Ok(Some((mem_key, (*node).clone())))
```
Reviewer: That's an additional clone; what is the overall impact?

Author: The added cost:
- Frequency: lower; this only happens during state root calculation or proof generation
- What's added: the `BranchNodeCompact` (112 bytes) must be cloned when returning it out of the `Arc`
- Cost: one 112-byte clone per node read from the overlay

The savings outweigh the costs:
- Saving: `Arc` cloning in `extend_ref()` calls when aggregating blocks into the RPC cache
- What's saved: cloning a `BranchNodeCompact` (112 bytes) becomes a clone of the `Arc` pointer (8 bytes)

Without Arc: 1,000 blocks × 50 nodes × 112 bytes = 5.6 MB, plus 50,000 expensive clones.
With Arc: 1,000 blocks × 50 nodes × 8 bytes = 0.4 MB, plus 50,000 cheap clones and some read clones.
Savings: 5.2 MB of memory and 50,000 fast aggregations.
Cost: roughly 50-500 read clones during proof generation (depending on trie structure).
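The two aggregation strategies being weighed, reduced to a self-contained sketch (`Path` and `Node` are stand-ins for `Nibbles` and `BranchNodeCompact`; the function names are illustrative):

```rust
use std::{collections::HashMap, sync::Arc};

type Path = u64;
type Node = Vec<u8>;

/// Arc-based extend: one pointer copy + refcount bump per entry.
fn extend_arc(acc: &mut HashMap<Path, Arc<Node>>, block: &HashMap<Path, Arc<Node>>) {
    acc.extend(block.iter().map(|(k, v)| (*k, Arc::clone(v))));
}

/// Deep-clone extend: the full node payload is copied per entry.
fn extend_deep(acc: &mut HashMap<Path, Node>, block: &HashMap<Path, Arc<Node>>) {
    acc.extend(block.iter().map(|(k, v)| (*k, v.as_ref().clone())));
}

fn demo() -> (bool, usize) {
    // One "block" of 50 nodes, 112 bytes each.
    let block: HashMap<Path, Arc<Node>> =
        (0..50).map(|i| (i, Arc::new(vec![0_u8; 112]))).collect();

    let mut cheap = HashMap::new();
    extend_arc(&mut cheap, &block);

    let mut owned = HashMap::new();
    extend_deep(&mut owned, &block);

    // Arc aggregation shares storage with the source; deep clone does not.
    (Arc::ptr_eq(&cheap[&0], &block[&0]), owned.len())
}

fn main() {
    assert_eq!(demo(), (true, 50));
}
```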
```diff
 let entry = match (mem_entry, &self.cursor_entry) {
     (Some((mem_key, entry_inner)), _) if mem_key == key => {
-        entry_inner.map(|node| (key, node))
+        entry_inner.as_ref().map(|node| (key, (**node).clone()))
```
Reviewer: Here also, an extra clone.

Author: Updated to get a new owned value by calling `as_ref()` and cloning it.
```diff
 // collect account updates and sort them in descending order, so that when we pop them
 // off the Vec they are popped in ascending order.
-self.account_nodes.extend(updates.account_nodes);
+self.account_nodes.extend(updates.account_nodes.into_iter().map(|(k, v)| (k, (*v).clone())));
```
Reviewer: Another additional clone.

Author: Updated to get a new owned value by calling `as_ref()` and cloning it.
Arc Optimization for TrieUpdates

Summary

This PR optimizes `TrieUpdates` aggregation by using `Arc<BranchNodeCompact>` instead of owned `BranchNodeCompact` values, eliminating expensive deep cloning during block processing.

Performance Impact
- 3.08x faster `extend_ref()` operations (440 microseconds vs 1,357 microseconds for 1,024 blocks)

Changes Overview
Core Optimization

Primary change: `crates/trie/common/src/updates.rs` - changed `HashMap<Nibbles, BranchNodeCompact>` to `HashMap<Nibbles, Arc<BranchNodeCompact>>`

Propagation to trie components:
- `crates/trie/sparse/src/traits.rs` - updated `SparseTrieUpdates.updated_nodes` to use `Arc`
- `crates/trie/sparse/src/trie.rs` - wrap branch nodes in `Arc::new()` on insertion
- `crates/trie/trie/src/trie.rs` - map hash builder updates to `Arc`
- `crates/trie/db/src/trie_cursor.rs` - handle `Arc` in trie cursor operations
- `crates/trie/sparse-parallel/src/trie.rs` - `Arc` support in the parallel trie implementation

Benchmark & Profiling
New additions:
- `crates/trie/common/benches/extend_ref_benchmark.rs` - benchmark demonstrating the 3.08x speedup
- `crates/trie/common/Cargo.toml` - added criterion dependency for benchmarking
- `crates/chain-state/src/trie_profiler.rs` - profiling instrumentation (318 lines)
- `crates/chain-state/src/lib.rs` - export the profiler module

Run benchmark:
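The command itself is elided in the description; given the benchmark file added at `crates/trie/common/benches/extend_ref_benchmark.rs`, it is presumably along these lines (the package name `reth-trie-common` is an assumption inferred from the `reth_trie_common` crate name):

```shell
# run the criterion benchmark for extend_ref (package name assumed)
cargo bench -p reth-trie-common --bench extend_ref_benchmark
```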
Propagation Fixes

Required updates to support the Arc change throughout the codebase:
- `crates/storage/provider/src/providers/database/provider.rs` - wrap DB nodes in `Arc`, update static arrays
- `crates/trie/trie/src/trie_cursor/in_memory.rs` - update the in-memory cursor for `Arc` types
- `crates/trie/trie/src/node_iter.rs` - unwrap `Arc` in test helper functions
- `crates/trie/trie/src/verify.rs` - handle `Arc` in trie verification
- `crates/trie/db/tests/trie.rs` - update test assertions for `Arc` types
- `crates/engine/tree/src/tree/trie_updates.rs` - handle `Arc` in trie update comparison

Test Fixes

Required changes to make tests work with `Arc` types:
- `crates/engine/invalid-block-hooks/src/witness.rs` - wrap test nodes in `Arc`
- `crates/exex/test-utils/src/lib.rs` - add TriedbProvider to the test setup
- `crates/storage/db-common/src/init.rs` - add TriedbProvider to the test setup
- `crates/chain-state/src/in_memory.rs` - import PlainPostState for tests

Testing

All Core Trie Tests Pass

Test Commands
Technical Details

Before (deep clone): each `extend_ref()` call deep clones all `BranchNodeCompact` values.

After (Arc): each `extend_ref()` call clones `Arc` pointers (cheap).

Memory Layout Comparison
Benchmark Details

What the benchmark tests: it compares two implementations of trie node aggregation (an Arc-based approach versus a deep-clone approach), running 16 test cases across two scenarios.

Measured results cover block accumulation (50 nodes per block) and single extend operations, along with key observations. Statistical data is written to `target/criterion/` with HTML reports.

Validation