benchmark parachain and standalone chain #232

@ltfschoen

based on my review of a previous discussion between Alan S, Basti and Sergei in Element's Parachain Technical room, Alan S shared how he profiled their parachain's block authoring execution time for benchmarking and stack analysis with trace debugging, as follows:

  • run your node using flags --dev, -lsync=trace, -lsub-libp2p=trace
  • run perf record -F 999 -p <pid_of_your_node> --call-graph dwarf
  • wait for a block to be produced by your node and then press Ctrl+C to stop perf (you can keep the node running to repeat later)
  • export the recorded samples with perf script --no-inline > perf.script.data
  • open it at https://www.speedscope.app to view execution (i.e. perf.basti-cache-runtime-fix.data from PR #9611 shared in Element's "Parachain Technical" room)

they were using the default cumulus authorship deadline of 500ms (i.e. 12000 * (1/24) = SLOT_DURATION * block_proposal_slot_portion), where SLOT_DURATION equals their MILLISECS_PER_BLOCK.
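A minimal sketch of that arithmetic, assuming the deadline is simply the slot duration multiplied by the slot portion. The helper name proposal_deadline_ms is hypothetical, and the node may apply further adjustments that are not modelled here.

```rust
/// Time budget (in ms) for proposing a block: slot duration times slot portion.
/// `portion` mirrors the f32 value passed to `SlotProportion::new`.
/// Hypothetical helper for illustration only, not a cumulus/substrate API.
fn proposal_deadline_ms(slot_duration_ms: u64, portion: f32) -> f32 {
    slot_duration_ms as f32 * portion
}

fn main() {
    const MILLISECS_PER_BLOCK: u64 = 12_000;
    const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;

    // block_proposal_slot_portion = 1/24
    let normal = proposal_deadline_ms(SLOT_DURATION, 1f32 / 24f32);
    // max_block_proposal_slot_portion = 1/16
    let max = proposal_deadline_ms(SLOT_DURATION, 1f32 / 16f32);

    println!("deadline: {:.0} ms, max if slots are skipped: {:.0} ms", normal, max);
}
```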

but for the DataHighway's Westlake, we're currently using 4320 for MILLISECS_PER_BLOCK, so with the same 1/24 portion our authorship deadline would be much shorter at 180ms (4320 * 1/24). so maybe we need to change it to the following (i.e. roughly 500/4320 and 750/4320) if we want 500ms as our cumulus authorship deadline too:

// We get around 540ms for proposing (4320 / 8)
block_proposal_slot_portion: SlotProportion::new(1f32 / 8f32),
// And a maximum of around 720ms if slots are skipped (4320 / 6)
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 6f32)),
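To sanity-check these proportions, here is a small sketch that computes the exact fraction of a 4320ms slot needed for a target deadline, and the deadline that a 1/n portion actually yields. Both helper names are hypothetical, and it assumes the deadline is just slot duration times portion.

```rust
/// Fraction of the slot needed to get `target_ms` of proposing time.
fn required_portion(target_ms: f64, slot_duration_ms: f64) -> f64 {
    target_ms / slot_duration_ms
}

/// Proposing time obtained with a slot portion of `1 / denominator`.
fn deadline_ms(slot_duration_ms: f64, denominator: f64) -> f64 {
    slot_duration_ms / denominator
}

fn main() {
    // MILLISECS_PER_BLOCK quoted in this issue for Westlake.
    let slot = 4320.0;

    println!("500 ms needs a portion of {:.4}", required_portion(500.0, slot));
    println!("750 ms needs a portion of {:.4}", required_portion(750.0, slot));

    // Compare a few whole-number denominators around those fractions.
    for denominator in [6.0, 8.0, 9.0] {
        println!("1/{} of the slot = {:.0} ms", denominator, deadline_ms(slot, denominator));
    }
}
```

With a 4320ms slot, 1/8 gives 540ms and 1/6 gives 720ms, so both are approximations of the 500ms and 750ms targets rather than exact matches.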

Note that in the polkadot repo https://github.com/paritytech/polkadot, both millau and rialto are using 6000 for MILLISECS_PER_BLOCK, with block_proposal_slot_portion: SlotProportion::new(2f32 / 3f32) and max_block_proposal_slot_portion: None (i.e. 6000 * 2/3 = 4000ms for proposing).

Alan S discovered that their 500ms was split up as follows:

500ms - parachain block authoring

  • 140ms - reserved for initialization/finalization (i.e. sc_basic_authorship::basic_authorship)
      • 65% - block production (including verifying extrinsic signatures for inclusion)
      • 35% - block finalization
  • 360ms - applying extrinsics and overhead (apply_extrinsic)
      • 25% - overhead of retrieving runtime_code() from the storage cache (i.e. sc_client_db::storage_cache); the cache is only used if there is no new runtime code, otherwise it is fetched from TrieBackend
      • 50% - blake2-related overhead of runtime_code() before each extrinsic is applied (apply_extrinsic_call_at...contextual_call/runtime_code); this overhead is absent when running the node with --dev
      • 25% - applying extrinsics, extrinsic.check (i.e. ecdsa signature verification); requires ~100ms for 100 extrinsics using system::remark
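To make the split easier to compare against our own perf output, this sketch converts the percentages into milliseconds. The figures are the ones quoted in this issue; the phase labels are shortened and the helper share_ms is hypothetical.

```rust
/// Share of a time budget in milliseconds, given a whole-number percentage.
fn share_ms(budget_ms: u64, percent: u64) -> u64 {
    budget_ms * percent / 100
}

fn main() {
    // (phase, budget in ms, [(sub-phase, percent of that budget)])
    let phases = [
        (
            "initialization/finalization",
            140u64,
            vec![("block production", 65u64), ("block finalization", 35)],
        ),
        (
            "applying extrinsics and overhead",
            360,
            vec![
                ("runtime_code() from storage cache", 25),
                ("runtime_code() blake2 overhead", 50),
                ("extrinsic.check (signature verification)", 25),
            ],
        ),
    ];

    for (name, budget, parts) in phases {
        println!("{} ms - {}", budget, name);
        for (part, percent) in parts {
            println!("  {} ms ({}%) - {}", share_ms(budget, percent), percent, part);
        }
    }
}
```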

then Basti created PR paritytech/substrate#9611, which improved the maximum throughput with basic extrinsics from 180tx/block to 450tx/block

i believe we need to:

  • profile our parachain using perf as mentioned previously with the kinds of extrinsics we'll be using to undertake benchmarking and stack analysis of the block authoring execution time, and use trace debugging to determine whether we need to:

note: some user mentioned that "transactions take progressively longer the later they go into a block in a linear way"
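If that observation holds, a rough way to reason about it is a linear cost model. This is only a sketch: the model (transaction i costs base + slope * i) and the slope value are assumptions, not measurements. The 125 µs base and the 360ms budget are taken from figures quoted elsewhere in this issue.

```rust
/// Number of transactions that fit in `budget_us` when the i-th transaction
/// (0-based) costs `base_us + slope_us * i` microseconds.
/// Hypothetical model for illustration; the slope must be measured with perf.
fn max_txs(budget_us: u64, base_us: u64, slope_us: u64) -> u64 {
    assert!(base_us > 0, "base cost must be positive so the loop terminates");
    let mut used = 0u64;
    let mut n = 0u64;
    loop {
        let cost = base_us + slope_us * n;
        if used + cost > budget_us {
            return n;
        }
        used += cost;
        n += 1;
    }
}

fn main() {
    let budget_us = 360_000; // 360 ms spent applying extrinsics
    let base_us = 125; // ~125 µs per no-op extrinsic

    // slope = 0 is the constant-cost case; slope = 1 is an assumed growth rate.
    for slope_us in [0, 1] {
        println!(
            "slope {} µs/tx -> {} txs fit",
            slope_us,
            max_txs(budget_us, base_us, slope_us)
        );
    }
}
```

Even a small per-position slope cuts the number of transactions that fit considerably, so it is worth measuring how the cost grows across a block rather than only the average.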

here are extracts of relevant parts of codebases that we should consider in possible changes in our 'ilya/parachain-update' branch:

pub const MILLISECS_PER_BLOCK: u64 = 12000;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;

// We got around 500ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 24f32),
// And a maximum of 750ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 16f32)),

...

/// We assume that ~10% of the block weight is consumed by `on_initialize` handlers.
/// This is used to limit the maximal weight of a single extrinsic.
const AVERAGE_ON_INITIALIZE_RATIO: Perbill = Perbill::from_percent(10);
/// We allow `Normal` extrinsics to fill up the block up to 75%, the rest can be used
/// by `Operational` extrinsics.
const NORMAL_DISPATCH_RATIO: Perbill = Perbill::from_percent(75);
/// We allow for 0.5 of a second of compute with a 12 second average block time.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
pub const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
pub const WEIGHT_PER_MILLIS: Weight = WEIGHT_PER_SECOND / 1000; // 1_000_000_000
pub const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_MILLIS / 1000; // 1_000_000

/// Executing 10,000 System remarks (no-op) txs takes ~1.26 seconds -> ~125 µs per tx
pub const ExtrinsicBaseWeight: Weight = 125 * WEIGHT_PER_MICROS;
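As a rough check of what these constants imply, the sketch below derives the weight available to Normal extrinsics and an upper bound on no-op extrinsics per block. It uses plain u64 in place of the frame Weight and Perbill types, the function names are hypothetical, and it ignores the base block weight, per-call weights and block length limits. The result is a ceiling, not a prediction.

```rust
type Weight = u64;

const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
const WEIGHT_PER_MILLIS: Weight = WEIGHT_PER_SECOND / 1000;
const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_MILLIS / 1000;

/// 0.5 seconds of compute per block.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
/// ~125 µs per no-op extrinsic.
const EXTRINSIC_BASE_WEIGHT: Weight = 125 * WEIGHT_PER_MICROS;

/// Whole-number percentage of a weight (stand-in for `Perbill::from_percent`).
fn percent_of(weight: Weight, percent: u64) -> Weight {
    weight / 100 * percent
}

/// Weight available to `Normal` extrinsics (NORMAL_DISPATCH_RATIO = 75%).
fn normal_budget() -> Weight {
    percent_of(MAXIMUM_BLOCK_WEIGHT, 75)
}

/// Upper bound on extrinsics per block if each cost only the base weight.
fn max_base_weight_extrinsics() -> u64 {
    normal_budget() / EXTRINSIC_BASE_WEIGHT
}

fn main() {
    println!("normal budget: {}", normal_budget());
    // AVERAGE_ON_INITIALIZE_RATIO = 10%
    println!("assumed on_initialize share: {}", percent_of(MAXIMUM_BLOCK_WEIGHT, 10));
    println!("max no-op extrinsics: {}", max_base_weight_extrinsics());
}
```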
