Skip to content

Conversation

@saeeddawod
Copy link
Contributor

@saeeddawod saeeddawod commented Dec 1, 2025

Summary

This PR fixes a critical bug that prevents successful IBFT2 → QBFT consensus migration in Besu. During migration, nodes would stall and fail to produce blocks because they could only speak one BFT wire protocol instead of both.

Problem

When running a consensus migration from IBFT2 to QBFT using a transitions configuration in genesis, the network would randomly stall. The migration worked sometimes by luck, but failed other times.

Root Cause

In ConsensusScheduleBesuControllerBuilder.createSubProtocolConfiguration(), the original code was:

return besuControllerBuilderSchedule
    .get(besuControllerBuilderSchedule.keySet().stream().skip(1).findFirst().orElseThrow())
    .createSubProtocolConfiguration(ethProtocolManager, maybeSnapProtocolManager);

Issues with this approach:

  1. Non-deterministic ordering: besuControllerBuilderSchedule is a HashMap, so keySet().stream().skip(1) produces unpredictable results depending on hash ordering
  2. Only one protocol registered: Only the sub-protocol configuration from ONE consensus mechanism was registered
  3. Random failures: If HashMap ordering picked QBFT protocol while consensus was still IBFT2, nodes couldn't exchange consensus messages → network stalls. Sometimes by luck it would pick the matching protocol and work.

Solution

The fix registers all BFT sub-protocols from all scheduled consensus mechanisms:

  • IBF/1 (IBFT2) - for pre-migration communication
  • istanbul/100 (QBFT) - for post-migration communication

This allows nodes to communicate using both protocols during the transition period. The fix:

  1. Iterates through all consensus builders (sorted by block number for determinism)
  2. Merges their sub-protocol configurations
  3. Avoids duplicates (e.g., EthProtocol which is common to both)

Testing

Migration Test Results

Phase Block Consensus Status
Pre-fork #0-49 IBFT2 ✅ IbftBesuControllerBuilder
Fork #50 QBFT ✅ QbftBesuControllerBuilder
Post-fork #50-100+ QBFT ✅ Continuous block production

Log Evidence

# Pre-fork (IBFT2)
IbftBesuControllerBuilder | Produced #49 / 0 tx / ...

# Fork block (QBFT takes over)  
QbftBesuControllerBuilder | Imported empty block #50 / 0 tx / ...

# Post-fork (QBFT continues)
QbftBesuControllerBuilder | Produced empty block #73 / 0 tx / ...

Checklist

  • Checked out contribution guidelines
  • Considered documentation - migration docs may need update
  • Considered changelog - This is a bug fix for consensus migration
  • No database changes

Local Tests

  • Manual migration test (IBFT2 → QBFT) - PASSED
  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest

Signed-off-by: saeeddawod <saeed.dawod@gmail.com>
@saeeddawod saeeddawod force-pushed the fix/consensus-migration-protocol-selection branch from c0f000d to aff5bb4 Compare December 1, 2025 11:47
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saeeddawod can you add a changelog entry since this is a bugfix?

Signed-off-by: saeeddawod <saeed.dawod@gmail.com>
@saeeddawod
Copy link
Contributor Author

@saeeddawod can you add a changelog entry since this is a bugfix?

@macfarla done !

@github-project-automation github-project-automation bot moved this to Backlog in RC 25.12.0 Dec 4, 2025
@macfarla macfarla moved this from Backlog to In review in RC 25.12.0 Dec 4, 2025
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@macfarla
Copy link
Contributor

macfarla commented Dec 5, 2025

@saeeddawod spotless check is failing, and I can't push to your branch to fix it - can you run spotlessApply gradle target. Assigning back to you.

@macfarla macfarla marked this pull request as draft December 5, 2025 01:25
@macfarla macfarla moved this from In review to In progress in RC 25.12.0 Dec 5, 2025
@macfarla macfarla moved this from In progress to CFI in RC 25.12.0 Dec 5, 2025
Signed-off-by: saeeddawod <saeed.dawod@gmail.com>
@saeeddawod saeeddawod force-pushed the fix/consensus-migration-protocol-selection branch from a92af05 to f80d25d Compare December 5, 2025 05:03
@saeeddawod
Copy link
Contributor Author

Thanks @jframe @macfarla Addressed the review:

  • Fixed imports (HashSet, Set instead of fully qualified names)
  • Added createsSubProtocolConfigurationWithAllSubProtocols() test
  • Ran spotlessApply

@saeeddawod saeeddawod marked this pull request as ready for review December 5, 2025 05:11
@jframe jframe enabled auto-merge (squash) December 5, 2025 05:37
@jframe jframe merged commit 108024b into hyperledger:main Dec 5, 2025
46 checks passed
@github-project-automation github-project-automation bot moved this from CFI to Done in RC 25.12.0 Dec 5, 2025
AliZDev-v0 pushed a commit to AliZDev-v0/besu that referenced this pull request Dec 10, 2025
…tion (hyperledger#9516)

* Fix: Ensure deterministic sub-protocol registration in ConsensusSchedule

Signed-off-by: saeeddawod <saeed.dawod@gmail.com>
Signed-off-by: Ali Zhagparov <alijakparov.kz@gmail.com>
pinges pushed a commit to pinges/besu that referenced this pull request Dec 15, 2025
…tion (hyperledger#9516)

* Fix: Ensure deterministic sub-protocol registration in ConsensusSchedule

Signed-off-by: saeeddawod <saeed.dawod@gmail.com>
Signed-off-by: stefan.pingel@consensys.net <stefan.pingel@consensys.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants