
[DRAFT] Add pure/Queue benchmark #119

Draft
Kamirus wants to merge 1 commit into main from kamil/add-pure-queue-benchmark

Conversation

@Kamirus Kamirus commented Mar 19, 2025

No description provided.

@github-actions

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| hashmap | 196_203 ($\textcolor{red}{1.07\%}$) | 8_206_679_241 ($\textcolor{red}{0.16\%}$) | 56_000_256 | 339_059 ($\textcolor{green}{-1.09\%}$) | 6_475_651_249 ($\textcolor{red}{0.09\%}$) | 364_721 ($\textcolor{green}{-1.01\%}$) | 10_762_547_991 ($\textcolor{green}{-0.04\%}$) |
| triemap | 201_637 ($\textcolor{red}{0.98\%}$) | 13_763_528_194 ($\textcolor{red}{0.68\%}$) | 68_228_576 | 252_802 ($\textcolor{red}{0.04\%}$) | 657_949 ($\textcolor{red}{0.01\%}$) | 648_383 ($\textcolor{red}{0.03\%}$) | 15_613_982_698 ($\textcolor{red}{0.49\%}$) |
| orderedmap | 199_971 ($\textcolor{red}{0.76\%}$) | 5_958_270_613 ($\textcolor{red}{0.68\%}$) | 36_000_524 | 120_192 ($\textcolor{red}{0.07\%}$) | 287_622 ($\textcolor{red}{0.03\%}$) | 326_637 ($\textcolor{red}{0.05\%}$) | 4_511_038_595 ($\textcolor{green}{-1.58\%}$) |
| rbtree | 191_685 ($\textcolor{red}{0.79\%}$) | 7_058_685_157 ($\textcolor{red}{0.93\%}$) | 52_000_464 | 116_503 ($\textcolor{red}{0.07\%}$) | 317_344 ($\textcolor{red}{0.01\%}$) | 330_464 ($\textcolor{red}{0.05\%}$) | 6_888_881_359 ($\textcolor{green}{-1.43\%}$) |
| splay | 196_744 ($\textcolor{red}{1.02\%}$) | 13_116_527_298 ($\textcolor{red}{0.49\%}$) | 48_000_400 | 625_897 ($\textcolor{red}{0.01\%}$) | 657_150 ($\textcolor{red}{0.02\%}$) | 920_358 ($\textcolor{red}{0.01\%}$) | 4_342_200_463 ($\textcolor{red}{0.47\%}$) |
| btree | 236_042 ($\textcolor{red}{0.69\%}$) | 10_241_086_717 ($\textcolor{red}{0.21\%}$) | 25_108_416 | 357_667 ($\textcolor{red}{0.02\%}$) | 485_549 ($\textcolor{red}{0.02\%}$) | 539_636 ($\textcolor{red}{0.02\%}$) | 2_860_091_551 ($\textcolor{green}{-1.60\%}$) |
| zhenya_hashmap | 194_499 ($\textcolor{red}{0.80\%}$) | 2_337_530_147 ($\textcolor{green}{-1.02\%}$) | 16_777_504 | 58_447 ($\textcolor{red}{0.25\%}$) | 66_606 ($\textcolor{red}{0.02\%}$) | 79_843 ($\textcolor{red}{0.08\%}$) | 3_081_184_159 ($\textcolor{green}{-0.10\%}$) |
| btreemap_rs | 611_851 | 1_809_789_841 | 27_590_656 | 74_098 | 124_626 | 85_214 | 3_208_130_200 |
| imrc_hashmap_rs | 613_202 | 2_634_915_707 | 244_908_032 | 35_894 | 198_252 | 96_520 | 6_383_840_797 |
| hashmap_rs | 601_477 | 438_103_157 | 73_138_176 | 20_788 | 25_678 | 23_645 | 1_545_701_419 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---|---|---|---|---|---|---|
| heap | 173_713 ($\textcolor{red}{1.55\%}$) | 5_571_564_835 ($\textcolor{red}{0.25\%}$) | 24_000_360 | 621_886 ($\textcolor{red}{0.02\%}$) | 227_339 ($\textcolor{red}{0.02\%}$) | 592_744 ($\textcolor{red}{0.01\%}$) | 3_163_816_019 ($\textcolor{green}{-2.38\%}$) |
| heap_rs | 596_953 | 143_262_451 | 18_284_544 | 58_563 | 21_622 | 58_466 | 647_923_463 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---|---|---|---|---|---|---|
| buffer | 180_076 ($\textcolor{red}{1.10\%}$) | 2_633_692 ($\textcolor{red}{1.25\%}$) | 65_652 | 96_182 ($\textcolor{red}{0.64\%}$) | 819_834 ($\textcolor{red}{2.03\%}$) | 173_682 ($\textcolor{red}{0.06\%}$) | 3_148_596 ($\textcolor{red}{0.06\%}$) |
| vector | 177_461 ($\textcolor{red}{0.88\%}$) | 1_952_489 ($\textcolor{green}{-0.01\%}$) | 24_588 | 126_306 ($\textcolor{red}{0.08\%}$) | 187_015 ($\textcolor{red}{0.25\%}$) | 176_306 ($\textcolor{red}{0.06\%}$) | 4_706_963 ($\textcolor{green}{-1.53\%}$) |
| vec_rs | 588_969 | 287_516 | 1_376_256 | 16_494 | 30_089 | 22_346 | 3_806_788 |

Stable structures

Note
Same as main branch, skipping.

Statistics

  • binary_size: 0.96% [0.82%, 1.11%]
  • max_mem: no change
  • cycles: -0.04% [-0.22%, 0.14%]

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 199_441 ($\textcolor{red}{0.09\%}$) | 266_971_573 ($\textcolor{green}{-5.62\%}$) | 247_077_677 ($\textcolor{green}{-6.04\%}$) | 33_989 ($\textcolor{green}{-1.11\%}$) | 24_814 ($\textcolor{green}{-2.06\%}$) |
| Rust | 596_836 | 82_782_948 | 56_788_520 | 42_522 | 41_228 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 250_279 ($\textcolor{red}{0.90\%}$) | 352_762_609 ($\textcolor{green}{-3.51\%}$) | 342_396 | 383_575 ($\textcolor{green}{-3.54\%}$) | 269_168 ($\textcolor{red}{0.53\%}$) | 21_573_318 ($\textcolor{green}{-3.68\%}$) |
| Rust | 640_537 | 489_666_578 | 1_310_720 | 660_965 | 220_622 | 450_827_450 |

Statistics

  • binary_size: 0.49% [-2.04%, 3.03%]
  • max_mem: no change
  • cycles: -3.13% [-4.60%, -1.65%]

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 282_582 ($\textcolor{red}{1.33\%}$) | 512_488 ($\textcolor{green}{-0.17\%}$) | 22_924 ($\textcolor{green}{-1.54\%}$) | 18_605 ($\textcolor{green}{-3.33\%}$) | 19_997 ($\textcolor{green}{-2.27\%}$) | 152_786 ($\textcolor{green}{-5.76\%}$) |
| Rust | 902_362 | 516_184 | 92_673 | 118_753 | 113_669 | 1_499_571 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 226_700 ($\textcolor{red}{0.75\%}$) | 482_842 ($\textcolor{red}{0.13\%}$) | 28_977 ($\textcolor{green}{-6.84\%}$) | 8_823 ($\textcolor{green}{-0.64\%}$) | 86_040 ($\textcolor{green}{-6.91\%}$) |
| Rust | 931_779 | 205_310 | 309_520 | 73_609 | 1_635_207 ($\textcolor{red}{0.00\%}$) |

Statistics

  • binary_size: 1.04% [-0.77%, 2.86%]
  • max_mem: no change
  • cycles: -2.73% [-4.37%, -1.09%]

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 142_343 ($\textcolor{red}{0.07\%}$) | 22_236 ($\textcolor{green}{-19.12\%}$) |
| Rust | 26_684 | 1_201 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 150_627 ($\textcolor{red}{0.37\%}$) | 50_009 ($\textcolor{green}{-10.95\%}$) | 4_781 ($\textcolor{red}{1.83\%}$) |
| Rust | 554_248 | 64_790 | 12_216 |

Statistics

  • binary_size: 0.37%
  • max_mem: no change
  • cycles: -4.56% [-44.91%, 35.79%]

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_139_200_903 ($\textcolor{red}{6.06\%}$) | 47_793_792 | 119 | 119 | 119 |
| copying | 1_139_200_785 ($\textcolor{red}{6.06\%}$) | 47_793_792 | 1_138_938_356 ($\textcolor{red}{6.06\%}$) | 1_139_023_441 ($\textcolor{red}{6.06\%}$) | 1_138_939_831 ($\textcolor{red}{6.06\%}$) |
| compacting | 1_544_780_593 ($\textcolor{green}{-0.61\%}$) | 47_793_792 | 1_115_364_819 ($\textcolor{green}{-7.11\%}$) | 1_425_910_550 ($\textcolor{red}{0.13\%}$) | 1_458_704_611 ($\textcolor{red}{0.74\%}$) |
| generational | 2_135_063_388 ($\textcolor{green}{-8.24\%}$) | 47_802_256 | 760_508_882 ($\textcolor{green}{-15.41\%}$) | 1_136_802 ($\textcolor{green}{-6.42\%}$) | 1_038_713 ($\textcolor{green}{-6.18\%}$) |
| incremental | 31_472_749 ($\textcolor{red}{6.67\%}$) | 976_079_036 ($\textcolor{green}{-0.00\%}$) | 416_341_828 ($\textcolor{green}{-11.23\%}$) | 441_797_881 ($\textcolor{green}{-11.02\%}$) | 1_197_069_644 ($\textcolor{green}{-6.68\%}$) |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 420_705 ($\textcolor{green}{-0.17\%}$) | 557_314 ($\textcolor{green}{-26.55\%}$) | 16_357 ($\textcolor{red}{0.05\%}$) | 16_981 ($\textcolor{red}{0.38\%}$) |

Statistics

  • binary_size: no change
  • max_mem: -0.00%
  • cycles: -2.92% [-6.16%, 0.32%]

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 171_414 ($\textcolor{red}{3.39\%}$) | 151_117 ($\textcolor{red}{0.67\%}$) | 26_730 ($\textcolor{green}{-18.66\%}$) | 11_196 ($\textcolor{green}{-8.23\%}$) | 21_536 ($\textcolor{green}{-20.43\%}$) | 6_503 ($\textcolor{green}{-1.80\%}$) |
| Rust | 593_655 | 629_046 | 59_348 | 39_106 | 74_039 | 43_504 |

Statistics

  • binary_size: 2.03% [-6.57%, 10.62%]
  • max_mem: no change
  • cycles: -12.28% [-22.66%, -1.90%]

Overall Statistics

  • binary_size: 1.01% [0.71%, 1.30%]
  • max_mem: -0.00%
  • cycles: -1.83% [-2.73%, -0.94%]

@github-actions

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

  • generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger it.
  • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements to the collection.
  • batch_remove 50. Remove 50 elements from the collection.
  • upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing and deserializing states into stable memory. For stable benchmarks, the upgrade takes no cycles, as the state is already in the stable memory.
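
As a rough illustration of the fixed-seed setup described above, the sketch below fills two different collections from one deterministic generator so that both end up with the same keys. The generator (SplitMix64), the seed value, and the helper name generate_keys are assumptions made for this example; the benchmark's actual generator is not shown in this report.

```rust
use std::collections::{BTreeMap, HashMap};

/// SplitMix64: a small deterministic generator, used here only as a stand-in.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical seed; any fixed value works as long as every collection uses it.
const SEED: u64 = 42;

/// Produces the same sequence of n keys on every call.
fn generate_keys(n: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(SEED);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    let keys = generate_keys(1_000);
    let mut hash_map: HashMap<u64, u64> = HashMap::new();
    let mut btree_map: BTreeMap<u64, u64> = BTreeMap::new();
    for &k in &keys {
        hash_map.insert(k, k);
        btree_map.insert(k, k);
    }
    // Both collections were fed the same sequence, so they hold the same keys.
    let same = hash_map.len() == btree_map.len()
        && btree_map.keys().all(|k| hash_map.contains_key(k));
    println!("collections hold the same keys: {same}");
}
```

Because the sequence depends only on the seed, the later batch_get, batch_put and batch_remove queries can be replayed identically against every collection.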

💎 Takeaways

  • The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage from it.
  • There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hit the limit while an $O(n^2)$ algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
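
To make the point about the per-round limit concrete, here is a minimal sketch with two toy cost models and a fixed budget. The budget value and the function names are hypothetical and chosen only for illustration; they are not the platform's actual limit or any measured cost.

```rust
/// Hypothetical per-round instruction budget (not the platform's real limit).
const BUDGET: f64 = 5.0e7;

/// Toy cost model: n log n work with a constant factor of 10000.
fn cost_nlogn(n: f64) -> f64 {
    10_000.0 * n * n.log2()
}

/// Toy cost model: n^2 work with a constant factor of 1.
fn cost_quadratic(n: f64) -> f64 {
    n * n
}

/// True when the given cost stays within the budget of a single round.
fn fits(cost: f64) -> bool {
    cost <= BUDGET
}

fn main() {
    for n in [100.0, 1_000.0, 5_000.0] {
        println!(
            "n = {n}: n log n fits: {}, n^2 fits: {}",
            fits(cost_nlogn(n)),
            fits(cost_quadratic(n))
        );
    }
}
```

Under these assumed numbers, the quadratic model stays within the budget at n = 1000 and n = 5000 while the n log n model with the large constant does not. The asymptotically better model only becomes cheaper at much larger n, by which point both are far over this budget.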

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get a better cycle cost by using a different serialization format. Another source of slowdown in Rust is that ic-stable-structures tends to be slower than region memory in Motoko.
  • Libraries persist data across upgrades in different ways, which fall into three main categories:
    • Use stable variables directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterparts because the stable memory API is much more expensive. The benefit is a fast upgrade, although the upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so batch_put 50 costs much more than in the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which behaves the same as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hash map in Rust.
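
The note on amortized data structures can be illustrated with a toy growable array that counts how many elements are copied when it runs out of room. The doubling policy and the names CountingArray and copy_stats are assumptions for this sketch, not the growth policy of any library measured here.

```rust
/// Toy growable array that doubles its capacity when full and reports how
/// many existing elements each push had to copy.
struct CountingArray {
    data: Vec<u64>,
    capacity: usize,
}

impl CountingArray {
    fn new(capacity: usize) -> Self {
        CountingArray {
            data: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Pushes a value and returns the number of elements copied into a new
    /// backing array (0 when there was still room).
    fn push(&mut self, value: u64) -> usize {
        let mut copied = 0;
        if self.data.len() == self.capacity {
            let new_capacity = (self.capacity * 2).max(1);
            let mut bigger = Vec::with_capacity(new_capacity);
            bigger.extend_from_slice(&self.data); // copies the whole array
            copied = self.data.len();
            self.data = bigger;
            self.capacity = new_capacity;
        }
        self.data.push(value);
        copied
    }
}

/// Returns (total copies, worst single-push copies) over n pushes.
fn copy_stats(initial_capacity: usize, n: usize) -> (usize, usize) {
    let mut arr = CountingArray::new(initial_capacity);
    let mut total = 0;
    let mut worst = 0;
    for i in 0..n {
        let copied = arr.push(i as u64);
        total += copied;
        worst = worst.max(copied);
    }
    (total, worst)
}

fn main() {
    let (total, worst) = copy_stats(4, 1_000);
    println!("total copies: {total}, worst single push: {worst}");
}
```

Averaged over all pushes the copying is cheap, but one unlucky push pays for the whole array at once. A batch that happens to cross a capacity boundary can therefore cost far more than the same batch on a structure without such a resize step.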

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| hashmap | 196_203 | 8_206_679_241 | 56_000_256 | 339_059 | 6_475_651_249 | 364_721 | 10_762_547_991 |
| triemap | 201_637 | 13_763_528_194 | 68_228_576 | 252_802 | 657_949 | 648_383 | 15_613_982_698 |
| orderedmap | 199_971 | 5_958_270_613 | 36_000_524 | 120_192 | 287_622 | 326_637 | 4_511_038_595 |
| rbtree | 191_685 | 7_058_685_157 | 52_000_464 | 116_503 | 317_344 | 330_464 | 6_888_881_359 |
| splay | 196_744 | 13_116_527_298 | 48_000_400 | 625_897 | 657_150 | 920_358 | 4_342_200_463 |
| btree | 236_042 | 10_241_086_717 | 25_108_416 | 357_667 | 485_549 | 539_636 | 2_860_091_551 |
| zhenya_hashmap | 194_499 | 2_337_530_147 | 16_777_504 | 58_447 | 66_606 | 79_843 | 3_081_184_159 |
| btreemap_rs | 611_851 | 1_809_789_841 | 27_590_656 | 74_098 | 124_626 | 85_214 | 3_208_130_200 |
| imrc_hashmap_rs | 613_202 | 2_634_915_707 | 244_908_032 | 35_894 | 198_252 | 96_520 | 6_383_840_797 |
| hashmap_rs | 601_477 | 438_103_157 | 73_138_176 | 20_788 | 25_678 | 23_645 | 1_545_701_419 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---|---|---|---|---|---|---|
| heap | 173_713 | 5_571_564_835 | 24_000_360 | 621_886 | 227_339 | 592_744 | 3_163_816_019 |
| heap_rs | 596_953 | 143_262_451 | 18_284_544 | 58_563 | 21_622 | 58_466 | 647_923_463 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---|---|---|---|---|---|---|
| buffer | 180_076 | 2_633_692 | 65_652 | 96_182 | 819_834 | 173_682 | 3_148_596 |
| vector | 177_461 | 1_952_489 | 24_588 | 126_306 | 187_015 | 176_306 | 4_706_963 |
| vec_rs | 588_969 | 287_516 | 1_376_256 | 16_494 | 30_089 | 22_346 | 3_806_788 |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| btreemap_rs | 611_851 | 77_021_026 | 2_555_904 | 63_656 | 96_504 | 84_265 | 139_792_280 |
| btreemap_stable_rs | 616_876 | 4_773_834_814 | 2_031_616 | 2_893_685 | 5_266_123 | 8_870_300 | 729_405 |
| heap_rs | 596_953 | 7_230_201 | 2_293_760 | 50_652 | 21_870 | 50_383 | 33_581_842 |
| heap_stable_rs | 576_040 | 283_742_492 | 458_752 | 2_526_262 | 246_537 | 2_506_863 | 729_375 |
| vec_rs | 588_969 | 3_077_883 | 2_293_760 | 16_494 | 17_489 | 16_734 | 31_302_411 |
| vec_stable_rs | 572_835 | 63_993_021 | 458_752 | 66_549 | 80_266 | 85_639 | 729_377 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

  • SHA-2 benchmarks
    • SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
    • account_id. Compute the ledger account id from principal, based on SHA-224.
    • neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
  • Certified map. A Merkle tree that stores key-value pairs and generates witnesses according to the IC Interface Specification.
    • generate 10k. Insert 10k 7-character words as both key and value into the certified map.
    • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
    • inc. Increment a counter and insert the counter value into the map.
    • witness. Generate the root hash and a witness for the counter.
    • upgrade. Upgrade the canister with the same Wasm. In Motoko, we use a stable variable. In Rust, we convert the tree to a vector before serialization.
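
For the Rust max mem column, the following sketch shows the page arithmetic under the assumption that the figure is a page count multiplied by the WebAssembly page size, which the WebAssembly specification fixes at 64 KiB. The helper names are hypothetical, and the two input values are taken from the Certified map table in this report.

```rust
/// Size of one WebAssembly linear-memory page in bytes (64 KiB).
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Converts a page count into a size in bytes.
fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}

/// Returns the page count if `bytes` is a whole number of pages, else None.
fn bytes_to_pages(bytes: u64) -> Option<u64> {
    if bytes % WASM_PAGE_BYTES == 0 {
        Some(bytes / WASM_PAGE_BYTES)
    } else {
        None
    }
}

fn main() {
    // Rust max mem reported for the certified map benchmark.
    println!("Rust: {:?}", bytes_to_pages(1_310_720));
    // Motoko reports a heap size instead, which need not be page-aligned.
    println!("Motoko: {:?}", bytes_to_pages(342_396));
}
```

The Rust figure divides evenly into pages while the Motoko figure does not, which matches the two columns being measured in different ways (memory pages versus rts_max_heap_size).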

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 199_441 | 266_971_573 | 247_077_677 | 33_989 | 24_814 |
| Rust | 596_836 | 82_782_948 | 56_788_520 | 42_522 | 41_228 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 250_279 | 352_762_609 | 342_396 | 383_575 | 269_168 | 21_573_318 |
| Rust | 640_537 | 489_666_578 | 1_310_720 | 660_965 | 220_622 | 450_827_450 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 282_582 | 512_488 | 22_924 | 18_605 | 19_997 | 152_786 |
| Rust | 902_362 | 516_184 | 92_673 | 118_753 | 113_669 | 1_499_571 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 226_700 | 482_842 | 28_977 | 8_823 | 86_040 |
| Rust | 931_779 | 205_310 | 309_520 | 73_609 | 1_635_207 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) method and the execution of an empty job.
  • It is not easy to reliably capture both events in one flamegraph, as implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 142_343 | 22_236 |
| Rust | 26_684 | 1_201 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 150_627 | 50_009 | 4_781 |
| Rust | 554_248 | 64_790 | 12_216 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases due to how the optimizer affects function names, making profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
    • incremental. Compile with --force-gc --incremental-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_139_200_903 | 47_793_792 | 119 | 119 | 119 |
| copying | 1_139_200_785 | 47_793_792 | 1_138_938_356 | 1_139_023_441 | 1_138_939_831 |
| compacting | 1_544_780_593 | 47_793_792 | 1_115_364_819 | 1_425_910_550 | 1_458_704_611 |
| generational | 2_135_063_388 | 47_802_256 | 760_508_882 | 1_136_802 | 1_038_713 |
| incremental | 31_472_749 | 976_079_036 | 416_341_828 | 441_797_881 | 1_197_069_644 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 420_705 | 557_314 | 16_357 | 16_981 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 171_414 | 151_117 | 26_730 | 11_196 | 21_536 | 6_503 |
| Rust | 593_655 | 629_046 | 59_348 | 39_106 | 74_039 | 43_504 |

Environment

  • dfx 0.25.0
  • Motoko compiler 0.13.7 (source 0r070mg1-rx9fyax2-n7yqcfm2-xqryza9p)
  • rustc 1.81.0 (eeb90cda1 2024-09-04)
  • ic-repl 0.7.6
  • ic-wasm 0.9.0
