Rework storage iterators#13284
Rework storage iterators#13284paritytech-processbot[bot] merged 27 commits intoparitytech:masterfrom
Conversation
| prefix: Option<StorageKey>, | ||
| current_key: Vec<u8>, | ||
| _phantom: PhantomData<Block>, | ||
| skip_if_first: Option<StorageKey>, |
There was a problem hiding this comment.
Note: the iterators on the storage backend and on Client work differently regarding the start_at.
For the backend if a start_at is specified then it will seek to that key and it will always include that key in the iterator's output. For the Client if the key specified in start_at matches exactly the first key it encounters then it skips that key.
This inconsistency is a little confusing, but that's how it works.
I've added some extra tests that verify this behavior, and I've also checked that they pass before my changes.
There was a problem hiding this comment.
Yes that's a point to be extremely careful with, retaining the sometime ~ behavior of iteration calls from the existing host functions (all that is touched by clear_prefix from sp_externalities).
I remember something ~ with the way error are handled, but I may be confusing with something else.
There was a problem hiding this comment.
Yeah, I'll probably run a full burn-in before merging this just to make sure nothing's broken.
| } | ||
|
|
||
| fn was_complete(&self) -> bool { | ||
| matches!(self.state, IterState::FinishedComplete) |
There was a problem hiding this comment.
what about FinishedIncomplete?
There was a problem hiding this comment.
If it finished incomplete it didn't finish completely :D
| }; | ||
| if let Err(e) = result { | ||
| debug!(target: "trie", "Error while iterating by prefix: {}", e); | ||
| if self.root == Default::default() { |
There was a problem hiding this comment.
It's always a bit awkward to tell if self.root is vec![0; 32] or hash(trie_empty_node) (eg MemoryDb::null_node_data), actually it looks like self.empty could be use instead (that is the root of a empty trie).
There was a problem hiding this comment.
Hmmm... why do we have self.empty anyway? In practice won't H::default always point to an empty storage root and a value which doesn't exist? (And idiomatically T::default() is essentially always used to denote an empty something, e.g. Vec::default, HashMap::default, etc. are all empty, so it's kinda weird to have two things which mean "empty".)
But anyhow, as far as I can see changing this to self.root == self.empty will fail at least one test (pairs_are_empty_on_empty_storage) because the root there is initialized with Default::default, so the fake empty key self.empty won't match, so it will go and call TrieDBRawIterator::new_prefixed, and that will fail with an Invalid state root error. (This worked previously without special-casing it like this because pairs just ignored the errors, and this new iterator doesn't do that anymore.)
There was a problem hiding this comment.
H::default is [0; 32], that's an invalid trie root. Could have given the actual empty trie value but that's not convenient at all (makes the H depends on trie internals). In practice it 's just one step from saying any invalid root is an empty trie, but that would be confusing (like if you got a root and no content you cannot say if it is empty).
But yes that's rather awkward that some place allows H::default.
| }); | ||
|
|
||
| if let Err(error) = result { | ||
| log::debug!(target: "trie", "Error while iterating the storage: {}", error); |
There was a problem hiding this comment.
Could be good to add a comment or insist on the fact that ignoring error is a host function related behavior. Don't really know how to word it.
| .map_err(client_err) | ||
| } | ||
|
|
||
| // TODO: This is horribly broken; either remove it, or make it streaming. |
There was a problem hiding this comment.
I think the paged variant is supposed to replace this rpc, so was planed to be removed.
There was a problem hiding this comment.
Yep, I know. But technically we don't necessarily need to remove those non-paged variants. We could just fix them and make them stream the data. (It's out of scope of this PR though.)
| type Backend = RecordStatsState<S, B>; | ||
| type Error = S::Error; | ||
|
|
||
| fn next_key(&mut self, backend: &Self::Backend) -> Option<Result<StorageKey, Self::Error>> { |
There was a problem hiding this comment.
Not for this PR, but could be good to record state (at least adding it looks straight forward).
Sure thing. |
|
Thanks a lot 🙏 |
cheme
left a comment
There was a problem hiding this comment.
I forgot to approve btw :) (IIRC only the question of he empty root was a bit bugging me but it is mainly me being finicky).
|
@cheme Done. Do you want to get this merged as soon as possible, or can it wait a little while? As I've said, I wanted to run a full burn-in before merging this just to be on a safe side, but since running those takes time and is a pain I've planned to get my other storage iterator related changes finished too and run them in the same burn-in (even though I'm not going to push them to this PR) and only merge this once that passes. But if you want to get this merged sooner I can just start a burn-in only with these changes present. |
|
@tomaka probably want it as soon as possible, but alternatively I can publish 0.24.1 with the fix. |
Okay; then let me start a burn in with only the changes from this PR then. (Not sure if it matters but as a side effect this will also test your changes.) |
|
My burnin finished successfully, so this should be good to go I think. |
|
The CI pipeline was cancelled due to failure one of the required jobs. |
|
bot merge |
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
* Rework storage iterators * Make sure storage iteration is also accounted for when benchmarking * Use `trie-db` from crates.io * Appease clippy * Bump `trie-bench` to 0.35.0 * Fix tests' compilation * Update comment to clarify how `IterArgs::start_at` works * Add extra tests * Fix iterators on `Client` so that they behave as before * Add extra `unwrap`s in tests * More clippy fixes * Come on clippy, give me a break already * Rename `allow_missing` to `stop_on_incomplete_database` * Add `#[inline]` to `with_recorder_and_cache` * Use `with_recorder_and_cache` in `with_trie_db`; add doc comment * Simplify code: use `with_trie_db` in `next_storage_key_from_root` * Remove `expect`s in the benchmarking CLI * Add extra doc comments * Move `RawIter` before `TrieBackendEssence` (no code changes; just cut-paste) * Remove a TODO in tests * Update comment for `StorageIterator::was_complete` * Update `trie-db` to 0.25.1
This PR reworks the way we iterate over storage.
Summary of the changes:
Vec. There's no real good reason to have those (except maybe in tests), and unless extremely careful using them just results in things exploding.sp_state_machine::Backendtrait now has methods which return proper iterators. All of the iteration methods in there (apply_to_key_values_while,for_keys_with_prefix,pairs,keys,child_pairs, etc.) are now default implemented on the trait itself, and are channeled through theraw_itermethod. This both is simpler to use (you can just use an iterator now, instead of having to use one of the methods which take a callback) and simpler to implement (you don't have to reimplement the whole zoo of methods which all do the same thing - iterate over the storage - but in subtly different ways).This requires a new version of
trie-db(paritytech/trie#181).Potential future work
for_keys_with_prefix, etc. Since the trait now exposes proper iterators those are now completely unnecessary. I would have done this in this PR, but the diff's already ~1.5k lines long, so I didn't want to bloat it up any further. I'll do it in the next PR once this one is merged.polkadot companion: paritytech/polkadot#6653