Conversation
|
/benchmark runtime pallet pallet_contracts |
|
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command cargo run --quiet --release --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs Toolchain: stable-x86_64-unknown-linux-gnu (default) Results |
Why is this only a problem with It is trivial to create a thin wrapper around this struct in case you want to mimic the old behavior. To answer your question: You have 2 options:
Prefer 2) if you are operating in a hot loop, otherwise it won't matter too much what you do there. If you already do have a reusable byte buffer for other common operations you could start using it here. |
|
It isn't a problem, really. It was just a convenience API to save myself this |
…path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
|
Should we run a burn-in just in case? |
|
I put it on ice because it turns out that we have a performance regression. While the weights with regards to instantiation time and host functions go down we have almost 50% regression for instruction weights. @Robbepop confirmed that with a coremark benchmark just now. burn-in is a good idea after we resolved the regression. |
|
Turns out this is more complex than stating that I tested [profile.release]
lto = false # or "fat"
codegen-units = 16 # or 1The Wasm
Summary
For more information about the huge difference in |
|
Runtime uses |
I'd welcome that we at least run benchmarks in order to see what is really happening. |
|
What do you mean? Weights are the benchmark. |
Sorry, i mean benchmarking wasmi version 0.10.0 instead of version 0.11.0. With this we can better locate the slowdown. |
|
/benchmark runtime pallet pallet_contracts |
|
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command cargo run --quiet --release --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs Toolchain: stable-x86_64-unknown-linux-gnu (default) Results |
|
I am currently running a benchmark here with |
…path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
|
With |
|
That's great news! Especially for the upcoming Ideally we'd check if using |
| let mut release_profile = Table::new(); | ||
| release_profile.insert("panic".into(), "abort".into()); | ||
| release_profile.insert("lto".into(), true.into()); | ||
| release_profile.insert("codegen-units".into(), 1.into()); |
There was a problem hiding this comment.
Do you think it makes sense to have a production profile here as to reduce compile time for the normal dev-flow?
Polkadot opted for it, and Substrate in general as well.
There was a problem hiding this comment.
Yeah this would make sense. This is just an ad-hoc commit that will not be merged. However, it is already build with lto for every release build. Do you think cgu=1 is so much slower that it is worth an extra profile?
There was a problem hiding this comment.
I just tested it on my laptop, the compile time was always ~25s.
So it probably makes no difference for small binaries and is not worth the effort here.
There was a problem hiding this comment.
I did some benchmarks in #1262, it does make things significantly slower (tested node-runtime and Polkadot compilation there).
Also note that release_profile.insert("lto".into(), true.into()); is -C lto, not -C lto=fat!
There was a problem hiding this comment.
to=true and lto="fat" are the same: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
There was a problem hiding this comment.
I reproducibly get 2 different sizes when switching between -C lto and -C lto=fat. Created zulip thread about this: https://rust-lang.zulipchat.com/#narrow/stream/122652-new-members/topic/-C.20lto.20and.20-C.20lto-fat.20are.20different.3F
There was a problem hiding this comment.
I just tested wasmi with lto="fat" and lto=true and I received the same outputs everytime. In some cases Cargo did not even bother to recompile the crate. So I cannot confirm your observations unfortunately.
There was a problem hiding this comment.
I was adding .append_to_rust_flags("-C lto=fat") to build.rs, I guess you changed those lines above instead? Not sure how there could be any difference though.
There was a problem hiding this comment.
Yes I just changed the lines that @athei also changed.
There was a problem hiding this comment.
If you append it to the build flags in the build.rs you will have essentially both: true in the profile (set by the substrate-wasm-builder as per my patch) and then the hard coded flags (as set by you). Not sure why this makes a difference. Unless, the substrate-wasm-builder uses its dev profile 😱
|
I compared the benchmark results of this PR with the ones of #10709. With We merge this as soon as the runtime is build with those flags for production/benchmarks @ggwpez and we ran a burn in. |
This reverts commit fd792e7.
|
/benchmark runtime pallet pallet_contracts |
|
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command cargo run --quiet --profile=production --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs Toolchain: stable-x86_64-unknown-linux-gnu (default) Results |
There was a problem hiding this comment.
Is there a reason that we use panic = "abort" for both release and dev profiles but not for production profile? Defaults to panic = "unwind".
There was a problem hiding this comment.
Offtopic: The tons of .into() calls kinda irk me, we should really think about changing the API of Table::insert to take a generic parameter T: Into<WhateverThisIs>.
There was a problem hiding this comment.
production inherits from release. It uses abort implicitly. We know for sure because we would get a build error if we'd try to build a runtime with unwind support (we don't implement unwind support).
There was a problem hiding this comment.
Ah so overriding profile.release also overrides all inherited profiles. Makes sense.
|
#10747 is merged. Runtime is build with |
…--manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
athei
left a comment
There was a problem hiding this comment.
Inspected the weights:
- Setup weights went up quite a bit.
- Load/Store instructions almost doubled in weight.
- Other instructions stayed the same or improved slightly
| fn instr_i64load(r: u32, ) -> Weight { | ||
| (74_212_000 as Weight) | ||
| // Standard Error: 0 | ||
| .saturating_add((1_325_000 as Weight).saturating_mul(r as Weight)) | ||
| (118_782_000 as Weight) | ||
| // Standard Error: 2_000 | ||
| .saturating_add((2_711_000 as Weight).saturating_mul(r as Weight)) | ||
| } | ||
| fn instr_i64store(r: u32, ) -> Weight { | ||
| (74_505_000 as Weight) | ||
| // Standard Error: 1_000 | ||
| .saturating_add((1_423_000 as Weight).saturating_mul(r as Weight)) | ||
| (118_311_000 as Weight) | ||
| // Standard Error: 5_000 | ||
| .saturating_add((2_378_000 as Weight).saturating_mul(r as Weight)) | ||
| } |
There was a problem hiding this comment.
Load and store instruction performance almost halved (look at r).
| // Storage: System Account (r:1 w:1) | ||
| fn call() -> Weight { | ||
| (140_169_000 as Weight) | ||
| (187_520_000 as Weight) |
There was a problem hiding this comment.
General call overhead increased quite a bit. Most likely setting up a new module and instance just takes longer in the new version. Do we have a benchmark for this in wasmi (creating a new instance) @Robbepop ?
There was a problem hiding this comment.
yes there are benchmarks for checking the efficiency of module instantiation.
| (210_420_000 as Weight) | ||
| // Standard Error: 133_000 | ||
| .saturating_add((145_601_000 as Weight).saturating_mul(c as Weight)) | ||
| // Standard Error: 8_000 | ||
| .saturating_add((1_760_000 as Weight).saturating_mul(s as Weight)) | ||
| (312_769_000 as Weight) | ||
| // Standard Error: 122_000 | ||
| .saturating_add((142_000_000 as Weight).saturating_mul(c as Weight)) | ||
| // Standard Error: 7_000 | ||
| .saturating_add((1_798_000 as Weight).saturating_mul(s as Weight)) |
There was a problem hiding this comment.
The overhead of instantiating per code byte seems to stay the same. General overhead goes up. I assume setting up the wasm instance is slower now.
|
Performance regressed too much. We wait for the new wasmi_v1 to try again. |
Only minor changes in wasmi. Most importantly we got rid of that
corefeature which allows us to make thesp-sandboxmanifest much nicer.@Robbepop The deprecation of
memory.getdoesn't feel good. It just requires the user to introduce the tedious extra step of allocating a buffer in the common case where we want to allow a dynamically sized buffer. Or how else would be deal with this code:substrate/client/executor/wasmi/src/lib.rs
Lines 175 to 182 in 33c518e
Why didn't the dependabot pick this up?