benchmark parachain and standalone chain

based on my review of a previous discussion between Alan S, Basti and Sergei in Element's "Parachain Technical" room, Alan S shared how he profiled their parachain's block authoring execution time (benchmarking and stack analysis with trace debugging) as follows:

**profiling a parachain's block authoring execution time for benchmarking and stack analysis with trace debugging**
* run your node using the flags `--dev`, `-lsync=trace`, `-lsub-libp2p=trace`
* run `perf record -F 999 -p <pid_of_your_node> --call-graph dwarf`
* wait for your node to produce a block, then press Ctrl+C to stop `perf` (you can keep the node running to repeat later)
* export the perf script with `perf script --no-inline > perf.script.data`
* open it at https://www.speedscope.app to view the execution (e.g. perf.basti-cache-runtime-fix.data from PR #9611, shared in Element's "Parachain Technical" room)


they were using the default cumulus authorship deadline of 500ms (i.e. 12000 * (1/24) = SLOT_DURATION * block_proposal_slot_portion), where SLOT_DURATION equals their MILLISECS_PER_BLOCK.
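the deadline arithmetic above can be sketched as follows. this is a minimal illustration only: `authorship_deadline_ms` is a hypothetical helper (not part of cumulus or Substrate), and the slot portion is passed as an integer fraction to keep the arithmetic exact.

```rust
/// Authorship deadline in milliseconds for a slot duration and a slot
/// portion given as the fraction `numerator / denominator`.
/// Hypothetical helper for illustration; not a cumulus/Substrate API.
fn authorship_deadline_ms(slot_duration_ms: u64, numerator: u64, denominator: u64) -> u64 {
    slot_duration_ms * numerator / denominator
}

fn main() {
    const MILLISECS_PER_BLOCK: u64 = 12_000;
    const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;

    // block_proposal_slot_portion = 1/24
    let deadline = authorship_deadline_ms(SLOT_DURATION, 1, 24);
    // max_block_proposal_slot_portion = 1/16 (used when slots are skipped)
    let max_deadline = authorship_deadline_ms(SLOT_DURATION, 1, 16);

    println!("deadline: {} ms, max deadline: {} ms", deadline, max_deadline);
}
```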

but for the DataHighway's Westlake, we're currently using 4320 for MILLISECS_PER_BLOCK, so with the same 1/24 slot portion our authorship deadline would be much shorter at 180ms (i.e. 4320 * (1/24)). if we want 500ms as our cumulus authorship deadline too, maybe we need to change the portions to roughly 500/4320 and 750/4320, for example:

```
// We get ~540ms for proposing (4320 * 1/8), close to the 500ms target
block_proposal_slot_portion: SlotProportion::new(1f32 / 8f32),
// And a maximum of ~720ms if slots are skipped (4320 * 1/6), close to the 750ms target
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 6f32)),
```
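to check how close the rounded fractions are to the targets, here is a small sketch that computes the exact portion needed for a target deadline and the deadline that a chosen portion actually gives. `required_slot_portion` and `deadline_for_portion_ms` are hypothetical helper names used only for this illustration.

```rust
/// Fraction of the slot needed to get `target_ms` of proposing time.
/// Hypothetical helper for illustration only.
fn required_slot_portion(target_ms: f32, slot_duration_ms: f32) -> f32 {
    target_ms / slot_duration_ms
}

/// Proposing time in milliseconds that a given slot portion yields.
/// Hypothetical helper for illustration only.
fn deadline_for_portion_ms(portion: f32, slot_duration_ms: f32) -> f32 {
    portion * slot_duration_ms
}

fn main() {
    // Westlake value quoted in this issue.
    const SLOT_DURATION_MS: f32 = 4320.0;

    // (label, target deadline in ms, rounded portion proposed above)
    let cases = [
        ("block_proposal_slot_portion", 500.0_f32, 1.0_f32 / 8.0),
        ("max_block_proposal_slot_portion", 750.0, 1.0 / 6.0),
    ];

    for (label, target_ms, chosen) in cases {
        let exact = required_slot_portion(target_ms, SLOT_DURATION_MS);
        let actual_ms = deadline_for_portion_ms(chosen, SLOT_DURATION_MS);
        println!(
            "{}: exact portion {:.4}, chosen {:.4} gives {:.0} ms (target {:.0} ms)",
            label, exact, chosen, actual_ms, target_ms
        );
    }
}
```

rounding to 1/8 and 1/6 overshoots the 500ms target by about 40ms and undershoots the 750ms target by about 30ms.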

Note that in the polkadot repo https://github.com/paritytech/polkadot, both millau and rialto use 6000 for MILLISECS_PER_BLOCK, together with `block_proposal_slot_portion: SlotProportion::new(2f32 / 3f32),` (i.e. a 4000ms deadline) and `max_block_proposal_slot_portion: None,`

Alan S discovered that their 500ms was split up as follows:

* 500ms - parachain block authoring
  * 140ms - reserved for initialization/finalization (i.e. `sc_basic_authorship::basic_authorship`)
    * 65% - block production (including verifying extrinsic signatures for inclusion)
    * 35% - block finalization
  * 360ms - applying extrinsics and overhead (`apply_extrinsic`)
    * 25% - overhead of retrieving `runtime_code()` from the storage cache (i.e. `sc_client_db::storage_cache`); this applies only if there is no new runtime code, otherwise it is fetched from `TrieBackend`
    * 50% - blake2-related overhead of `runtime_code()` execution before each extrinsic is applied, `apply_extrinsic_call_at...contextual_call/runtime_code with blake2` (when running the node with `--dev` there isn't this overhead)
    * 25% - applying extrinsics, `extrinsic.check` (i.e. ecdsa signature verification), which requires ~100ms for 100 extrinsics using `system::remark`
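to make the percentages concrete, the split above can be converted to milliseconds. this is a minimal sketch using only the numbers quoted above; `share_ms` is a hypothetical helper name.

```rust
/// Share of a time budget in milliseconds for a whole-number percentage.
/// Hypothetical helper for illustration only.
fn share_ms(budget_ms: u32, percent: u32) -> u32 {
    budget_ms * percent / 100
}

fn main() {
    // Budgets reported in the discussion (out of the 500ms deadline).
    let init_finalize_ms: u32 = 140;
    let apply_ms: u32 = 360;

    // (label, parent budget in ms, percentage of that budget)
    let rows = [
        ("block production", init_finalize_ms, 65),
        ("block finalization", init_finalize_ms, 35),
        ("runtime_code() from storage cache", apply_ms, 25),
        ("runtime_code() blake2 overhead", apply_ms, 50),
        ("extrinsic.check (signature verification)", apply_ms, 25),
    ];

    let mut total = 0;
    for (label, budget, percent) in rows {
        let ms = share_ms(budget, percent);
        total += ms;
        println!("{:>4} ms  {}", ms, label);
    }
    println!("{:>4} ms  total", total);
}
```

on these figures only about 90ms of the 500ms goes to actually checking and applying extrinsics, which is consistent with the reported ~100ms for 100 `system::remark` extrinsics (roughly 1ms each).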

then Basti created PR https://github.com/paritytech/substrate/pull/9611, which improved the maximum throughput with basic extrinsics from 180 tx/block to 450 tx/block

i believe we need to:

* profile our parachain using `perf` as described above, with the kinds of extrinsics we'll be using, to benchmark and analyse the stack of the block authoring execution time, and use trace debugging to determine whether we need to:
  * increase the cumulus block proposal deadline (i.e. `block_proposal_slot_portion`) to compensate for production overhead (see https://github.com/paritytech/substrate/pull/9611, which increased the number of transactions per block by ~2.5x, from 180 to 450)
  * re-evaluate the `ExtrinsicBaseWeight` used in the fork of Substrate that we depend on
  * check whether we need to change the leniency strategy applied to `block_proposal_slot_portion` in the fork of Substrate that we depend on (i.e. change from `Exponential` to `Linear` for `sc_consensus_slots::SlotLenienceType` in `sc_consensus_slots::proposing_remaining_duration`)
  * learn about benchmarking and apply it https://substrate.dev/docs/en/knowledgebase/runtime/benchmarking

note: a user mentioned that "transactions take progressively longer the later they go into a block in a linear way"


here are extracts of the relevant parts of the codebases that we should consider when making changes in our 'ilya/parachain-update' branch:

* extract from https://github.com/substrate-developer-hub/substrate-parachain-template

```
pub const MILLISECS_PER_BLOCK: u64 = 12000;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;

// We got around 500ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 24f32),
// And a maximum of 750ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 16f32)),

...

/// We assume that ~10% of the block weight is consumed by `on_initalize` handlers.
/// This is used to limit the maximal weight of a single extrinsic.
const AVERAGE_ON_INITIALIZE_RATIO: Perbill = Perbill::from_percent(10);
/// We allow `Normal` extrinsics to fill up the block up to 75%, the rest can be used
/// by  Operational  extrinsics.
const NORMAL_DISPATCH_RATIO: Perbill = Perbill::from_percent(75);
/// We allow for 0.5 of a second of compute with a 12 second average block time.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
```

* extract from https://github.com/paritytech/substrate

```
pub const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
pub const WEIGHT_PER_MILLIS: Weight = WEIGHT_PER_SECOND / 1000; // 1_000_000_000
pub const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_MILLIS / 1000; // 1_000_000

/// Executing 10,000 System remarks (no-op) txs takes ~1.26 seconds -> ~125 µs per tx
pub const ExtrinsicBaseWeight: Weight = 125 * WEIGHT_PER_MICROS;
```
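as a rough cross-check when re-evaluating `ExtrinsicBaseWeight`, the constants from the two extracts can be combined to estimate how many extrinsics fit into a block by base weight alone. this is a simplified sketch: it redefines the constants locally instead of importing them from Substrate, models `NORMAL_DISPATCH_RATIO` as a plain percentage, and ignores the dispatch weight of each call, the block execution weight and the `on_initialize` reservation. `max_normal_extrinsics` is a hypothetical helper name.

```rust
type Weight = u64;

// Values copied from the extracts above, redefined locally for this sketch.
const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
const WEIGHT_PER_MILLIS: Weight = WEIGHT_PER_SECOND / 1000;
const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_MILLIS / 1000;
const EXTRINSIC_BASE_WEIGHT: Weight = 125 * WEIGHT_PER_MICROS;
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
// NORMAL_DISPATCH_RATIO (75%) modelled as a plain percentage.
const NORMAL_DISPATCH_PERCENT: Weight = 75;

/// Upper bound on the number of `Normal` extrinsics per block when only the
/// base weight is counted. Hypothetical helper for illustration only.
fn max_normal_extrinsics(max_block: Weight, normal_percent: Weight, base: Weight) -> u64 {
    max_block * normal_percent / 100 / base
}

fn main() {
    let upper_bound = max_normal_extrinsics(
        MAXIMUM_BLOCK_WEIGHT,
        NORMAL_DISPATCH_PERCENT,
        EXTRINSIC_BASE_WEIGHT,
    );
    println!("upper bound by base weight alone: {} extrinsics/block", upper_bound);
}
```

under these assumptions the weight limit alone would allow up to 3000 no-op extrinsics per block, well above the 180 to 450 tx/block reported above, which would suggest that the authorship deadline rather than the block weight was the limiting factor in that profile.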

