Add exponential backoff config to AURA #458

@sam0x17

Right now, major migrations in subtensor are perilous. Under AURA consensus, the validators use a round-robin system to pick one validator at a time to try to complete the migration. If that validator fails to do so within the 12-second time limit, a new validator is selected and the process continues. For migrations like the recent 1.0 upgrade, where the migration itself almost always takes more than 12 seconds, this causes huge delays: the validators have to partially complete pieces of the migration, gossip those blocks to each other, and eventually, more or less by chance, cobble together a complete version of the migration before finalization can continue. That can take hours.

Instead, it would be much better if, with each successive failed round, the 12-second time limit were increased by some scaling factor such as 1.2x, so that the time limit eventually becomes long enough to complete any migration.
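To make the proposal concrete, here is a minimal sketch of the scaling rule, assuming the limit after `n` failed rounds is `base * factor^n`. `BackoffConfig`, its fields, and the `max_limit` cap are hypothetical names for illustration; none of this is AURA's actual API or config.

```rust
use std::time::Duration;

/// Hypothetical backoff settings (illustrative only, not AURA's real config).
#[derive(Clone, Copy, Debug)]
struct BackoffConfig {
    /// Time limit for the first attempt, e.g. 12 seconds.
    base_limit: Duration,
    /// Multiplier applied per failed round, e.g. 1.2.
    scaling_factor: f64,
    /// Assumed upper bound so the limit cannot grow without end.
    max_limit: Duration,
}

impl BackoffConfig {
    /// Time limit in seconds after `failed_rounds` consecutive failures:
    /// base * factor^failed_rounds, never below the base and capped at `max_limit`.
    fn limit_secs(&self, failed_rounds: u32) -> f64 {
        let exponent = failed_rounds.min(i32::MAX as u32) as i32;
        let base = self.base_limit.as_secs_f64();
        let scaled = base * self.scaling_factor.powi(exponent);
        scaled.max(base).min(self.max_limit.as_secs_f64())
    }

    /// Same value as `limit_secs`, expressed as a `Duration`.
    fn limit_after(&self, failed_rounds: u32) -> Duration {
        Duration::from_secs_f64(self.limit_secs(failed_rounds))
    }

    /// Number of failed rounds before the limit first covers `required`,
    /// or `None` if the limit can never reach it.
    fn rounds_needed(&self, required: Duration) -> Option<u32> {
        if required > self.max_limit {
            return None;
        }
        // Without a factor above 1 the limit never grows past the base.
        if !(self.scaling_factor > 1.0) && self.base_limit < required {
            return None;
        }
        let mut rounds = 0;
        while self.limit_secs(rounds) < required.as_secs_f64() {
            rounds += 1;
        }
        Some(rounds)
    }
}
```

With a 12-second base and a 1.2x factor, a migration that needs 60 seconds would fit after nine failed rounds under this rule, since 12 × 1.2^9 ≈ 61.9 seconds. Whether a cap is wanted at all, and what value it should take, is left open here.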

Presumably AURA already has some backoff setting, which may or may not do what I describe above, that simply needs to be turned on. If so, we should definitely turn it on.

AC:

  • find out whether AURA's backoff setting does what we want
  • if it does, turn it on; if not, implement something similar, where successive round-robin failures result in progressively higher time limits using some fixed scaling factor.
  • profit?

Labels

blue team (defensive programming, CI, etc)
