Background
When nodes reconnect or restart, they need some way to see what was received by the other end so they can re-sync state. For the normal (or shutting-down) state, this naturally follows the batching done by commit_signature/revoke_and_ack messages.
The c-lightning prototype used a scheme where init messages contained a single counter: the total number of commit_signature and revoke_and_ack messages it had received. On disconnect, it would also forget any updates for which it had not received commit_signature.
In Milan, @adiabat argued for simply retransmitting and discarding duplicates, rather than using an explicit ack number. More recently, @pm47 asked to avoid the compulsory discard and to require an exact retransmission of previous messages; @rustyrussell instead asked for a strict superset. But further consideration has raised issues with these approaches.
The Uncertain Signature Problem
- A sends:
  - update1
  - commitment_signed
  - disconnect
- B is in one of three possible states:
  - B1: received nothing. Still at previous commit.
  - B2: received update1. Has one update pending.
  - B3: received update1 and commitment_signed. Has sent revoke_and_ack.
Now, when A reconnects, it does an exact retransmission:
- A sends:
  - update1
  - commitment_signed
- B1 is fine. B2 is fine if it ignores the duplicate. B3 either considers the commit to have changed nothing (currently illegal), or, if it ignores that, finds that the signature is bad (it expects A to be using the next_per_commitment_point it sent in revoke_and_ack).
There is also the case where A adds another change on reconnect (e.g. a fee change, or another update).
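The three outcomes can be replayed with a toy model. This is only a sketch under loud assumptions: the `Receiver` class, its fields and the `replay` helper are hypothetical, and a "signature" is reduced to the commitment number it was made for, standing in for the per-commitment point a real signature depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy model of B (hypothetical, not any implementation's real state)."""
    commit_num: int = 0                           # commitments accepted so far
    committed: set = field(default_factory=set)   # updates already committed
    pending: set = field(default_factory=set)     # updates received, not yet committed

    def on_update(self, update: str) -> str:
        if update in self.committed or update in self.pending:
            return "duplicate ignored"
        self.pending.add(update)
        return "ok"

    def on_commitment_signed(self, signed_for: int, allow_empty: bool = False) -> str:
        # An empty commit is currently illegal unless we choose to tolerate it.
        if not self.pending and not allow_empty:
            return "reject: empty commit"
        # Assumption: the signature only verifies for the next commitment number.
        if signed_for != self.commit_num + 1:
            return "reject: bad signature"
        self.committed |= self.pending
        self.pending.clear()
        self.commit_num += 1
        return "ok: send revoke_and_ack"


def replay(b: Receiver, allow_empty: bool = False) -> str:
    """A's exact retransmission: update1, then commitment_signed for commitment 1."""
    b.on_update("update1")
    return b.on_commitment_signed(signed_for=1, allow_empty=allow_empty)


b1 = Receiver()                                        # received nothing
b2 = Receiver(pending={"update1"})                     # received update1 only
b3 = Receiver(commit_num=1, committed={"update1"})     # already sent revoke_and_ack

results = [replay(b1), replay(b2), replay(b3)]
b3_if_empty_allowed = replay(
    Receiver(commit_num=1, committed={"update1"}), allow_empty=True
)
```

In this model B1 and B2 accept the retransmission, while B3 fails either way: first on the empty commit, and then, if that check is skipped, on the signature.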
Possible solutions:
- Insist on an exact retransmission and allow an empty commit: as a special case, an "empty" commitment uses the previous per-commitment-point (and the receiver replies with the same revoke_and_ack as before).
- Allow a changed retransmission (it must be a superset!). If the signature check fails, retry it using the previous per_commitment_point; if that succeeds, reply with the same revoke_and_ack as before.
- Send the explicit counter of updates + revocations so we don't encounter this situation.
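To make the counter option concrete, here is a minimal sketch of how a sender might decide what to retransmit. It assumes the prototype's behaviour described in the background (the peer forgets uncommitted updates on disconnect and reports one total count of commitment_signed and revoke_and_ack messages received). The `sent_log` structure and the `messages_to_resend` function are hypothetical names for illustration only.

```python
from typing import List, Tuple

# One entry per commitment_signed or revoke_and_ack we have sent, in order.
# A commitment_signed entry also carries the updates it covered, because the
# peer is assumed to have forgotten any uncommitted updates on disconnect.
SentEntry = Tuple[str, Tuple[str, ...]]


def messages_to_resend(sent_log: List[SentEntry], peer_received: int) -> List[str]:
    """Return the wire messages to retransmit, given the peer's counter."""
    if not 0 <= peer_received <= len(sent_log):
        raise ValueError("peer acknowledged messages we never sent")
    out: List[str] = []
    for msg, updates in sent_log[peer_received:]:
        out.extend(updates)   # updates first, then the message that commits them
        out.append(msg)
    return out


sent_log: List[SentEntry] = [("commitment_signed", ("update1",))]

resend_if_b_saw_nothing = messages_to_resend(sent_log, 0)   # B1 or B2
resend_if_b_saw_commit = messages_to_resend(sent_log, 1)    # B3
```

With the counter, A can tell B3 apart from B1 and B2 before sending anything, so the ambiguous signature never arises.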
The Persistence Problem
It's important that an optimal implementation only be required to remember state at the minimal number of points, as a robust implementation will need to synchronously write to disk(s). A node must remember when it receives revoke_and_ack (to create penalty transactions later), and when it sends commitment_signed (as it is committed to the HTLCs at that point, so it must remember them), so these are the minimal "sync" points possible.
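As a rough illustration of the two sync points, the sketch below counts disk syncs in a toy store. The `ChannelStore` class and its methods are hypothetical, and the "disk" is just a dict; the only point is which events trigger a sync and what is lost on restart.

```python
class ChannelStore:
    """Toy store that syncs only on sending commitment_signed and receiving revoke_and_ack."""

    def __init__(self):
        self.disk = {"committed_updates": [], "revocations": []}
        self.disk_syncs = 0
        self.uncommitted = []          # held in memory only

    def send_update(self, update):
        self.uncommitted.append(update)            # no disk sync here

    def send_commitment_signed(self):
        # We are now committed to these updates, so they must survive a crash.
        self.disk["committed_updates"].extend(self.uncommitted)
        self.uncommitted = []
        self.disk_syncs += 1

    def recv_revoke_and_ack(self, secret):
        # Needed later to build penalty transactions.
        self.disk["revocations"].append(secret)
        self.disk_syncs += 1

    def restart(self):
        self.uncommitted = []                      # memory is gone


store = ChannelStore()
store.send_update("update_add_htlc#0")
store.send_update("update_add_htlc#1")
store.send_commitment_signed()
store.recv_revoke_and_ack("secret0")
store.send_update("update_fee")                    # sent but never committed
store.restart()
```

After the restart the store has synced twice and no longer knows about the uncommitted update_fee.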
Thus, requiring a node to persistently remember updates it has sent but not yet committed to is a poor idea. Some of this state can be reconstructed: we have to remember incoming HTLCs and fulfills/fails which were going to the reconnecting peer anyway, so we can just re-send them. But we would not normally remember fee changes we have not committed to: requiring these to be recorded when sending update_fee adds a disk sync. Nor would we normally remember the order in which we sent the updates, which is imperative for the update_add_htlc id fields to match.
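The ordering point can be seen in a small sketch. It assumes ids are handed out sequentially in send order, and it uses `sorted` as an arbitrary stand-in for whatever order the durable HTLC records happen to come back in after a restart. `assign_ids` and the HTLC names are hypothetical.

```python
def assign_ids(updates_in_send_order, first_id):
    """Give each update_add_htlc the next sequential id, in the order sent."""
    return {u: first_id + i for i, u in enumerate(updates_in_send_order)}


# Order actually used before the disconnect (never written to disk).
original = assign_ids(["htlc_b", "htlc_a"], first_id=7)

# Order after reconstruction from durable records, which kept no send order.
reconstructed = assign_ids(sorted(["htlc_b", "htlc_a"]), first_id=7)

mismatched = {name for name in original if original[name] != reconstructed[name]}
```

Here both HTLCs end up with different ids after reconstruction, so an exact retransmission is not possible unless the send order is also persisted.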
ECLAIR seems to require remembering the state and not rolling back. c-lightning (the old, pre-Milan daemon) used reconstruction on reconnect/restart, but assumed the other side would roll back and used a total counter, and thus didn't have an issue if the order or fees changed. lnd goes even further, and doesn't even remember id across reconnections: HTLCs are implicitly renumbered from 0 at that point. I don't find this 8-byte on-disk saving convincing: once HTLCs are no longer in the commitment transaction the ID can indeed be forgotten, but so can the amount and routing information: only the cltv and RIPEMD of the payment hash need to be remembered for creating the penalty transaction.