Skip to content

client: reconcile_topic_map silently invents ChannelIds on parse failure (same split-brain shape as #115) #141

@intendednull

Description

@intendednull

Summary

crates/client/src/lib.rs:287-289 in reconcile_topic_map still uses the same silent-split-brain pattern that issue #115 fixed in accept_invite:

let cid = willow_channel::ChannelId(
    uuid::Uuid::parse_str(id_str).unwrap_or_else(|_| uuid::Uuid::new_v4()),
);

If id_str from state.event_state.channels ever fails to parse as a UUID, the reconciler silently invents a brand-new random ChannelId and uses it to overwrite entries in topic_map. This is the same failure mode that #115 was opened for — on untrusted input, it split-brains the joiner from the rest of the network; on trusted-but-corrupted input (here: locally persisted state), it silently corrupts the in-memory topic map so the client talks to the "wrong" channel forever.

Why this wasn't fixed in #139

#139 explicitly scoped itself to accept_invite in joining.rs (the untrusted-invite path). The reconciler runs over locally persisted event state — a different trust context — so the argument goes that parse failure "can't happen" in practice. That's probably true today: channel IDs in event_state.channels come from events we authored or received and applied, and they're always serialized from Uuid. But:

  1. unwrap_or_else(|_| Uuid::new_v4()) is the opposite of an assertion. If the invariant ever breaks (migration, replay of corrupt events, bug elsewhere), this code corrupts state silently instead of surfacing the problem.
  2. The parser takes a &str from a HashMap<String, _>, so there's no type-level guarantee that it's a valid UUID.
  3. Even if we're happy trusting the invariant, the fallback should be an assertion/log + skip, not a fresh random ID.

Proposed fix

Either:

  • (preferred) If channel IDs in event_state.channels are always UUIDs, change the map key type from String to ChannelId so the parse happens at event apply time where we can fail loudly, and reconcile_topic_map just clones instead of reparsing.

  • (minimal) In reconcile_topic_map, log + skip on parse failure instead of minting a fresh random ID:

    let cid = match uuid::Uuid::parse_str(id_str) {
        Ok(u) => willow_channel::ChannelId(u),
        Err(e) => {
            tracing::warn!(id_str, error = %e, "skipping unparseable channel id in reconcile_topic_map");
            continue;
        }
    };

Context

Found during the audit pass over PRs #135#140 that closed the rest of #108's critical tier. PR #139 (closes #114, #115) fixed the untrusted-invite instance of this pattern; this issue tracks the remaining instance in the reconciler path so it doesn't get forgotten.

Progresses #108.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions