Node Support for Aura -> Babe Runtime Upgrades#1927
Conversation
7d8246c to
483729e
Compare
eaba58d to
5d2d0e2
Compare
update Cargo.lock
9d0ea5d to
914030d
Compare
After the conversation with @gztensor we confirmed that we use "-subtensor-" for such cases. @liamaharon , please, rename the crates when you have time. |
3a7e6d4 to
8f2b47e
Compare
|
Thanks @gregzaitsev and @shamil-gadelshin, I've replaced the |
shamil-gadelshin
left a comment
There was a problem hiding this comment.
Thank you!
Consider merging the latest devnet-ready to counter conflicts created by the benchmarking bot.
|
Thanks @shamil-gadelshin , I have merged devnet-ready and also updated the Polkadot SDK fork to include your |
|
just some more conflicts, btw will prob want to hold off on deploying this until next week because we are doing 2 quick deploys this week that need to be low-risk |
25fd226 to
c93bebd
Compare
Phase 2 in #1887
Phase 2 additions:
-otfsuffix to subtensor crates that share the same name as a polkadot-sdk crate. This is to prevent name collisions when applying patches, and ensure we don't accidentally use the wrong crate.Update Aug 26
I discovered there is an issue warp syncing a chain that contains an Aura to Babe migration. The warp sync would succeed up until the first Babe block is mined, and then revert to a regular sync.
Root Cause
Polkadot SDK does not allow starting a service with warp sync if the database is in a partially synced state: https://github.com/opentensor/polkadot-sdk/blob/d13f915d8a1f55af53fd51fdb4544c47badddc7e/substrate/client/network/sync/src/strategy/warp.rs#L235-L252
In the initial implementation of this PR, while a node is warp syncing, it will restart part-way through the sync when it detects the first Babe block.
At the time of the service restart, the node has a partially synced database. This causes the previously referenced code to terminate the warp sync, and fall back to a regular sync.
Solution
To resolve this issue, we must warp sync the entire chain (Aura AND Babe blocks) without restarting the service.
This is achieved in two parts:
First, I prevented the Aura service switching to a Babe service while the node is syncing. This was achieved by adding new code here:
subtensor/node/src/consensus/aura_consensus.rs
Line 222 in a234096
Second, I added support for the node to import Babe blocks while running an Aura service. This was achieved by replacing the
AuraWrappedImportQueuewith aHybridImportQueuethat containsHybridBlockImportthat contains inner full implementations ofAuraBlockImportandBabeBlockImportHybridVerifierthat contains inner full implementations of theAuraVerifierandBabeVerifierimport_queuefunction that builds anImportQueueimplementation capable of completely importing both Aura and Babe blocks.The Aura service is required to construct a
BabeConfigurationto pass to the hybridimport_queue, so it can import the first Babe block it encounters. This required me to pull in some Babe runtime configuration from #1708 into this PR, specifically theBABE_GENESIS_EPOCH_CONFIGandEPOCH_DURATION_IN_BLOCKS.With these runtime constants, we are able to construct what our initial Babe configuration will be while running an Aura service:
subtensor/node/src/consensus/aura_consensus.rs
Lines 249 to 285 in a234096
Summary
With the two changes described in "Solution", when a node is warp syncing, it will warp sync the entire chain (all Aura and Babe blocks) and import the state entirely before it switches to running a Babe service. This resolves the issue root cause of the issue, which is that the node would restart mid-warp-sync.
It is important this phase is merged prior to phase 3, so node operators have time to upgrade in advance of the runtime upgrade.
Steps to simulate Aura -> Babe migration with finney state
$ git checkout node-decentralization $ cargo b -r -p node-subtensor && cp ./target/release/wbuild/node-subtensor-runtime/node_subtensor_runtime.compact.compressed.wasm ./babe-npos.wasm$ git checkout hybrid-node $ rm ./target/release/node-subtensor && cargo b -r -p node-subtensor$ cd ../baedeker-for-subtensor $ ./localnet-baedeker.shQA Checklist
--initial-consensus auragracefully switches to Babe post-upgrade--initial-consensus babegracefully switches to Aura pre-upgradedevnet-readystate/runtimehybrid-nodestate/runtimebaedeker-finneystate/runtimedevnet-readystate/runtimehybrid-nodestate/runtimedevnet-readystate/runtimehybrid-nodestate/runtimebaedeker-finneystate/runtimedevnet-readyhybrid-nodedevnet-readystate/runtimehybrid-nodestate/runtimebaedeker-finneystate/runtime