Important
This solution changes the `tests_dir`, so please either pass it explicitly as an argument or modify `contest/grader/grader.cpp`
We focused mainly on single-threaded performance optimization and achieved a 25-30% performance gain
Replacing `std::map` with `absl::flat_hash_map` in `CellStorageStat::add_used_storage` significantly improved performance.
This is an obvious but highly effective optimization
Warning
Replacing `std::map` with `std::unordered_map` could be exploited by crafting a cache with colliding keys (hash flooding),
while `absl::flat_hash_map` mitigates this risk through built-in hash randomization
Additionally, I created a specialized method CellStorageStat::add_used_storage_fast, which:
- Always adds the cell to the `seen` set (no `kill_dup` param)
- Always counts bits and cells (no `skip_count_root` param)
- Optimizes the cell and bits overflow checks
After exploring various approaches for several days, the current implementation of this method proved the most efficient
I implemented caching for cells that have already been seen in the current block, even if they appeared in another transaction (or even another account) within the same block
- Caching is activated only for accounts with more than one transaction. For accounts with a single transaction, building the cache takes more time than it saves
- Only cells with no more than 8 children are cached. Caching cells with more children would take more time and space, and such cells almost never repeat within the same block
- Cache is cleared at the start of each block processing
I optimized the abseil hash (`AbslHashValue`) of `CellHash` by eliminating intermediate steps
and performing a direct conversion of bytes [8, 16) into `td::uint64`
The default implementation `std::hash<vm::CellHash>()(cell_hash)` performs the following steps:
- Slice hash calculation: `cell_hash_slice_hash(s.as_slice())`
- Substring extraction: `hash.substr(8, 8)`
- Conversion: `td::as<size_t>`
Optimization:
All these steps are now consolidated into a single zero-copy call, improving efficiency
To further improve the efficiency of `CellStorageStat::add_used_storage_fast`,
I calculate the number of cells and the `merkle_depth` during `DataCell::create`.
These values can be reused later. This change can enable (but is not yet implemented):
- Elimination of `merkle_depth` from `CacheInfo`
- Conversion of `seen` from a `flat_hash_map` to a `flat_hash_set`
- Determining whether a cell should be cached at the beginning of cell processing
We also tried parallelizing `ContestValidateQuery::check_transactions` by processing each account in a separate thread,
since transaction checking takes a lot of time. However, this approach yielded a smaller performance improvement and led to
exceptions in random tests due to various non-thread-safe places. So we decided to focus on
single-threaded performance optimization
It was observed that most cells appear only once per block, making it ineffective to cache every cell
Platform specific
PS: I forgot to delete the additional logs, `timer1`, and `timer2` from the code.
They take some additional time during testing, so the results could be even better