feat(flashblocks): shift atomic gossip op, wspub on seq after p2p gossip by sieniven · Pull Request #210 · okx/xlayer-reth

sieniven · 2026-03-20T05:22:33Z

Summary

Moves the websocket publishing logic from the flashblocks builder into the p2p node service. This ensures zero re-orgs on subscribed flashblocks RPC nodes: p2p gossip to follower sequencers must complete as a blocking, successful operation before the flashblocks are published to RPC nodes.

This makes flashblocks replay atomic during sequencer switches or leader sequencer failures, so the new leader replays the same gossiped flashblocks that RPC nodes have already seen. RPC nodes are therefore protected against flashblocks re-orgs.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sieniven · 2026-03-22T05:58:28Z

@claude review this PR

github-actions · 2026-03-22T05:58:47Z

Claude finished @sieniven's task in 3m 31s

PR Review: feat(flashblocks): shift atomic gossip op, wspub on seq after p2p gossip

Gather context and understand the PR
Read changed files
Analyze logic correctness and atomicity guarantees
Check for bugs, edge cases, and error handling
Provide review feedback

Summary

Good architectural direction — removing ws_pub from the builder and consolidating publish responsibility into the p2p Node simplifies the builder and keeps WS publishing coordinated with p2p gossip. The Message trait → concrete Message enum refactor also reduces unnecessary generics. However, there are several issues worth addressing:

🔴 Critical: WS publishes before p2p gossip in `Node::run()` — opposite of the stated guarantee

crates/builder/src/p2p/mod.rs, lines ~157–162:

```rust
if let Message::OpFlashblockPayload(ref fb_payload) = message {
    let flashblock_byte_size = ws_pub.publish(fb_payload)?;   // ← WS first
    metrics.flashblock_byte_size_histogram.record(flashblock_byte_size as f64);
}
if let Err(e) = outgoing_streams_handler.broadcast_message(message).await {  // ← p2p second
    warn!(target: "flashblocks-p2p", "...");
}
```

The PR description states: "p2p gossip to follower sequencers is a blocking successful operation first before gossiping the flashblocks to RPC nodes."

But the code publishes to WebSocket subscribers before broadcast_message() to peers. This means the leader's own WS subscribers (RPC nodes) receive the flashblock before follower sequencers do, which is the exact re-org risk the PR claims to fix. The broadcast_message() call should come first, and ws_pub.publish() only after it returns (or at least returns without error).
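The suggested ordering can be sketched as follows. This is a minimal illustration under the stricter reading, where the WebSocket publish is skipped entirely if gossip fails. The closures `gossip` and `ws_publish` are hypothetical stand-ins for `broadcast_message()` and `ws_pub.publish()`, and `Outcome` and `gossip_then_publish` are names invented for this sketch, not code from `Node::run()`.

```rust
/// Outcome of handling one outgoing flashblock (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Gossip succeeded and the payload was then published to WS subscribers.
    Published { bytes: usize },
    /// Gossip succeeded but the WS publish failed.
    WsFailed(String),
    /// Gossip failed, so the WS publish was never attempted.
    GossipFailed(String),
}

/// Gossip to peers first; publish to WebSocket subscribers only afterwards.
pub fn gossip_then_publish<G, W>(payload: &[u8], mut gossip: G, mut ws_publish: W) -> Outcome
where
    G: FnMut(&[u8]) -> Result<(), String>,
    W: FnMut(&[u8]) -> Result<usize, String>,
{
    // Step 1: p2p gossip to follower sequencers. Bail out before touching WS on failure.
    if let Err(e) = gossip(payload) {
        return Outcome::GossipFailed(e);
    }
    // Step 2: only now expose the flashblock to RPC subscribers.
    match ws_publish(payload) {
        Ok(bytes) => Outcome::Published { bytes },
        Err(e) => Outcome::WsFailed(e),
    }
}
```

Returning an outcome instead of propagating with `?` lets the caller decide whether a failure is fatal, which also touches on the error-handling concern raised next.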


🟡 Medium: `ws_pub.publish()?` kills the p2p node if broadcast channel has no receivers

crates/builder/src/p2p/mod.rs:

```rust
let flashblock_byte_size = ws_pub.publish(fb_payload)?;
```

WebSocketPublisher::publish() calls self.pipe.send(utf8_bytes) (a broadcast::Sender), which returns Err when there are no active receivers. The initial receiver created in WebSocketPublisher::new() is held by the listener_loop task, so in normal operation this is unlikely to fire. However, if the listener_loop task ever panics or exits unexpectedly, subsequent calls to ws_pub.publish() will kill the p2p node entirely via ?.

The previous code in builder.rs handled this with map_err(PayloadBuilderError::other)?, which only terminated the current build job (restartable), not the p2p node service. The handler's follower path (in handler.rs) correctly uses if let Err(e) = ws_pub.publish(...) { warn!(...) }. The Node path should follow the same pattern.
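A minimal sketch of the warn-and-continue pattern, using only the standard library. `std::sync::mpsc` stands in for the broadcast channel here (its `send` also fails once the receiving side is gone), and `Publisher` and `run` are hypothetical names, not the real `WebSocketPublisher` or `Node::run()`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the WS publisher: forwards bytes into a channel.
pub struct Publisher {
    pipe: Sender<Vec<u8>>,
}

impl Publisher {
    /// Creates a publisher together with the receiver that keeps the channel open.
    pub fn new() -> (Self, Receiver<Vec<u8>>) {
        let (pipe, rx) = channel();
        (Self { pipe }, rx)
    }

    /// Returns the number of bytes sent, or an error if the receiver has been dropped.
    pub fn publish(&self, payload: &[u8]) -> Result<usize, String> {
        self.pipe
            .send(payload.to_vec())
            .map(|_| payload.len())
            .map_err(|e| format!("no receiver: {}", e))
    }
}

/// Processes every payload. A failed publish is logged and counted instead of
/// being propagated with `?`, so the loop keeps running.
/// Returns `(published, failed)`.
pub fn run(publisher: &Publisher, payloads: &[&[u8]]) -> (usize, usize) {
    let (mut published, mut failed) = (0, 0);
    for payload in payloads {
        match publisher.publish(payload) {
            Ok(_bytes) => published += 1,
            Err(e) => {
                // Stand-in for `warn!` in the real service.
                eprintln!("warn: ws publish failed: {}", e);
                failed += 1;
            }
        }
    }
    (published, failed)
}
```

With `?` in place of the `match`, the first failed publish would end `run` and, in the real service, the p2p node along with it.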


🟡 Medium: `flashblock_byte_size_histogram` metric silently drops when p2p is disabled

crates/builder/src/flashblocks/builder.rs (removed) vs crates/builder/src/p2p/mod.rs (added):

Previously, ctx.metrics.flashblock_byte_size_histogram.record(...) was called directly in the builder — always, regardless of p2p state. Now it's only recorded inside Node::run(), which only runs when p2p_enabled = true. Deployments with p2p_enabled = false will silently lose this metric after this change.

🟢 Good: behavior fix for `no_tx_pool` path

crates/builder/src/flashblocks/builder.rs:

Previously, built_fb_payload_tx.try_send(fb_payload) for the initial flashblock was guarded only by !rebuild_external_payload, meaning syncing FCUs (where no_tx_pool=true) would still trigger p2p gossip. The new code correctly gates it with !no_tx_pool && !rebuild_external_payload, consistent with the comment "not emitting flashblock if no_tx_pool in FCU, it's just syncing". This is a correct fix.
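The change in the guard can be illustrated with two small predicates. Both function names are hypothetical; in the builder the condition wraps the `built_fb_payload_tx.try_send(fb_payload)` call rather than living in a helper.

```rust
/// Guard before the fix: only `rebuild_external_payload` was checked,
/// so `no_tx_pool` is ignored here on purpose.
pub fn emitted_before_fix(_no_tx_pool: bool, rebuild_external_payload: bool) -> bool {
    !rebuild_external_payload
}

/// Guard after the fix: a syncing FCU (`no_tx_pool = true`) no longer emits
/// the initial flashblock.
pub fn should_emit_initial_flashblock(no_tx_pool: bool, rebuild_external_payload: bool) -> bool {
    !no_tx_pool && !rebuild_external_payload
}
```

The two predicates differ only for `no_tx_pool = true, rebuild_external_payload = false`, which is the syncing case the review calls out.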

🟢 Good: `Message` trait → enum refactor

Removing the Message generic trait in favor of a concrete Message enum in types.rs is the right call for an internal protocol — reduces boilerplate, removes the generic parameter from Node<M>, and makes deserialization straightforward (serde_json::from_str::<Message>). The types.rs module is well-organized.

Minor: `Default` removed for `NodeBuilder`

Since NodeBuilder::new() now requires ws_pub and metrics, removing Default is correct. It is worth verifying that no external call sites use NodeBuilder::default() (there do not appear to be any in the diff, but a quick grep is worthwhile if not already done).

Test coverage note

The test in p2p/mod.rs now uses OpFlashblockPayload::default() as the round-trip message and binds WebSocketPublisher to 127.0.0.1:0. Consider adding a test case that verifies the WS publish is called (or not) at the right point relative to p2p gossip, once the ordering fix is applied.

TL;DR: The main concern is the WS-before-p2p ordering in Node::run(), which inverts the core atomicity guarantee the PR is trying to establish. Suggest swapping ws_pub.publish() to run after broadcast_message(), and converting the ? to a warn! to avoid killing the p2p node on publish failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… flag

Rename p2p/ module to broadcast/ to better reflect its dual role (p2p gossip + WS publishing). Move wspub.rs into broadcast/, remove the p2p_enabled config flag so the broadcast node always runs, fix WS publish ordering to run after p2p gossip, and improve error handling so ws_pub failures warn instead of crashing the node. Update stale comments referencing the removed p2p_enabled flag.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sieniven · 2026-03-23T09:01:36Z

@claude please review again and check that all your previously mentioned points have been resolved

github-actions · 2026-03-23T09:01:51Z

Claude finished @sieniven's task in 2m 13s

PR Re-Review: feat(flashblocks): shift atomic gossip op, wspub on seq after p2p gossip

Gather context and read changed files
Check: WS publish ordering vs p2p gossip (🔴 Critical)
Check: ws_pub.publish()? error propagation (🟡 Medium)
Check: flashblock_byte_size_histogram metric with p2p disabled (🟡 Medium)
Check: Minor notes (NodeBuilder Default, test coverage)
Provide updated review feedback

All three previously raised issues have been resolved. Details below.

✅ Fixed: WS publish ordering (was 🔴 Critical)

crates/builder/src/broadcast/mod.rs, lines 169–181:

```rust
if let Err(e) = outgoing_streams_handler.broadcast_message(message.clone()).await {
    warn!(...)  // p2p first
}
if let Message::OpFlashblockPayload(ref fb_payload) = message {
    match ws_pub.publish(fb_payload) { ... }  // WS second
}
```

broadcast_message (p2p gossip) now runs before ws_pub.publish (WS). The ordering now matches the stated atomicity guarantee. ✅

✅ Fixed: `ws_pub.publish()?` error propagation (was 🟡 Medium)

The ? is replaced with a match that logs a warn! on failure — consistent with the follower path in handler.rs. The p2p node will no longer terminate if the WS broadcast channel has no receivers. ✅

✅ Fixed: Metric coverage when p2p disabled (was 🟡 Medium)

The p2p_enabled flag has been removed from args/op.rs. The broadcast node is now always spawned unconditionally in service.rs, so flashblock_byte_size_histogram is always recorded via Node::run(). ✅

✅ Resolved: `NodeBuilder::default()` call sites

No usages of NodeBuilder::default() exist in the codebase. service.rs correctly uses NodeBuilder::new(ws_pub, metrics). ✅

Data flow looks correct end-to-end

Leader path: builder → built_fb_payload_tx → handler built_fb_payload_rx → p2p_tx → Node::run() outgoing_message_rx → p2p gossip first → WS publish second ✅
Follower path: Node::run() incoming stream → handler p2p_rx → cache update → WS publish ✅

No double-publishing between the two paths (outgoing vs. incoming are separate channels). ✅

Open (non-blocking): Test coverage

The test in broadcast/mod.rs validates node connectivity and message round-tripping. A dedicated test asserting WS publish is called after p2p gossip would make the ordering guarantee explicit and regression-proof, but this is not a blocker.
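One possible shape for such a test is sketched below, assuming the gossip and WS steps can be put behind mockable seams. The `Gossip` and `WsPublish` traits, the mocks, and `handle_outgoing` are all hypothetical; the real `Node::run()` is async and calls `broadcast_message` and `ws_pub.publish` directly. The sketch mirrors the reviewed shape in which a gossip error is logged and the WS publish still runs afterwards.

```rust
use std::sync::{Arc, Mutex};

/// Shared, ordered log of side effects observed during a test.
#[derive(Clone, Default)]
pub struct CallLog(Arc<Mutex<Vec<&'static str>>>);

impl CallLog {
    pub fn record(&self, event: &'static str) {
        self.0.lock().unwrap().push(event);
    }
    pub fn events(&self) -> Vec<&'static str> {
        self.0.lock().unwrap().clone()
    }
}

/// Hypothetical seams standing in for the p2p broadcast and the WS publisher.
pub trait Gossip {
    fn broadcast(&mut self, payload: &[u8]) -> Result<(), String>;
}
pub trait WsPublish {
    fn publish(&mut self, payload: &[u8]) -> Result<usize, String>;
}

pub struct MockGossip {
    pub log: CallLog,
    pub fail: bool,
}
pub struct MockWs {
    pub log: CallLog,
}

impl Gossip for MockGossip {
    fn broadcast(&mut self, _payload: &[u8]) -> Result<(), String> {
        self.log.record("p2p");
        if self.fail { Err("peer unreachable".into()) } else { Ok(()) }
    }
}

impl WsPublish for MockWs {
    fn publish(&mut self, payload: &[u8]) -> Result<usize, String> {
        self.log.record("ws");
        Ok(payload.len())
    }
}

/// Gossip first, then WS publish; both failures are logged rather than propagated.
pub fn handle_outgoing(gossip: &mut impl Gossip, ws: &mut impl WsPublish, payload: &[u8]) {
    if let Err(e) = gossip.broadcast(payload) {
        eprintln!("warn: p2p gossip failed: {}", e);
    }
    if let Err(e) = ws.publish(payload) {
        eprintln!("warn: ws publish failed: {}", e);
    }
}
```

A test would then call `handle_outgoing` with both mocks sharing one `CallLog` and assert that `events()` equals `["p2p", "ws"]`, so that swapping the two calls fails the test.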

Summary: All three critical/medium issues from the previous review are fully addressed. The PR achieves its stated goal — p2p gossip to follower sequencers runs as a blocking operation before WS publish to RPC subscribers. LGTM.

feat(flashblocks): shift atomic gossip op, wspub on seq after p2p gossip

039d7e7

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sieniven and others added 2 commits March 23, 2026 15:22

fix(flashblocks): publish to websocket after p2p gossip broadcast

5ef3f42

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sieniven wants to merge 3 commits into main from niven/flashblocks-no-reorg
