The great tlv wire migration of 2021 by halseth · Pull Request #4923 · lightningnetwork/lnd

halseth · 2021-01-15T14:03:11Z

This is a continuation/slight rewrite of multi: the great tlv wire migration #4747.

There are a few key differences:

instead of making the lnwire package version aware in order to use it in migrations for the old format, we copy the legacy lnwire code into the migration package. This allow us to change lnwire without worrying about breaking migrations.
~~Instead of having UpfrontShutdownScript be its own field on the [Open|Acccept]Channel messages, we embed it as a proper TLV field withing the ExtraData, and fetch it from there when needed.~~ We are back to having a separate field.

Replaces #4747

Roasbeef

Many thanks for taking over this PR!

The end diff looks large, but it's mostly code movement, with the core logic migration logic residing in a single commit. In the end, copying the lnwire package is def the way to go, to ensure we have proper isolation.

As the PR changed slightly, I'll start to test this out on on eof my nodes, much easier now that I don't need to manually creates a new breanch that combines the lnwire TLV changes along with the migration itself.

Roasbeef · 2021-02-19T03:13:09Z

The dry run migration went well, but upon boot I get:

2021-02-18 21:05:58.628 [ERR] LTND: unable to create server: unable to parse message of unknown type: <unknown>

I dug in a bit and turns out the migration isn't using the proper encoding when writing all the data back again. It uses what's in enclosed_codec.go, which is the old encode/decode logic. Namely, it's missing the new logic for encoding/decoding lnwire.Message w/ the proper length prefix.

I have a commit that fixes this, but before pushing it up, I want to se why this wasn't picked up by the existing tests.

Roasbeef · 2021-02-19T03:50:14Z

Here's the commit: 239f19e

Also the reason why the tests passed was that since the migration logic was using the same encode/decode as the legacy logic (before the length prefix), it wasn't actually doing anything. One way to detect this in the future is to assert in the tests that the bytes are actually different.

Roasbeef · 2021-02-19T04:09:29Z

Also I realized now where I left off last w/ it all: migrating the forwarding packages as well. As is, the migration will try to read out the forwarding packages in loadHltcs, using the LogUpdate.Decode method, which uses ReadElements. This'll fail as we'll read two more extra bytes than is actually encoded, as we're trying to read the length prefix, but one isn't present there yet.

halseth · 2021-02-19T09:21:57Z

Here's the commit: 239f19e

Also the reason why the tests passed was that since the migration logic was using the same encode/decode as the legacy logic (before the length prefix), it wasn't actually doing anything. One way to detect this in the future is to assert in the tests that the bytes are actually different.

Wow, can't believe I missed that! This migration wasn't really doing much that's for sure 🤦

Also I realized now where I left off last w/ it all: migrating the forwarding packages as well. As is, the migration will try to read out the forwarding packages in loadHltcs, using the LogUpdate.Decode method, which uses ReadElements. This'll fail as we'll read two more extra bytes than is actually encoded, as we're trying to read the length prefix, but one isn't present there yet.

Will to a sweep and make sure there aren't more places where we missed this!

halseth · 2021-02-19T15:11:27Z

I updated the PR to address the concerns brought up:

I split legacy and current serialization code used by the migration into two different pacakges, such that we should avoid calling into the wrong method unintended.

I went through the code base looking for all places where a wire message is written to the DB, found two more not already covered. One of them being the forwarding packages @Roasbeef mentioned, the other one the remote unisigned log updates that was recently added: ea91b35

In addition I looked for places where the wire messages gets written to the DB directly, outside the codec. Here I also found to cases, but since they both are written under their own key, we should be fine (but I wouldn't mind also migrating them just to be consistent):

lnd/discovery/message_store.go

Line 123 in 203a669

if _, err := lnwire.WriteMessage(&b, msg, 0); err != nil {

lnd/channeldb/waitingproof.go

Line 235 in 203a669

if err := p.AnnounceSignatures.Encode(w, 0); err != nil {

Roasbeef · 2021-02-19T23:36:09Z

In addition I looked for places where the wire messages gets written to the DB directly, outside the codec. Here I also found to cases,

Yeah I identified those in the past as well, but left them as they're written under an isolated key.

Roasbeef · 2021-02-20T00:08:56Z

Started up with the latest diff, and so far so good 👌

Roasbeef · 2021-02-20T00:09:07Z

Accidentally closed...

Roasbeef

It lives!!!

Really close now, final comment is re if we should expose the TLV details (as is now), or attempt to hide more of them from the user (as it was before).

Roasbeef · 2021-02-20T00:32:59Z

+func (a *AcceptChannel) UpfrontShutdownScript() (DeliveryAddress, error) {
+	var addr DeliveryAddress
+
+	tlvs, err := a.ExtraData.ExtractRecords(addr.NewRecord())


Related to the comment above, we could have the callers use methods to "attach" these fields, which exposes less details to them, and is an alternative to setting nil values. I still loosely favor letting the callers set the fields themselves, and hiding everything behind Encode/Decode re if it's actually a TLV or not. As we evolve the messages, I imagine we'll have more and more "TLV native" fields, which can start to clutter the call sites if they need to always manually pack/encode them.

If the caller alters the exposed field without modifying the blob, the extra data blob gets out of sync with the field. A possible solution is to make the extraData field private, and only create it at time of encoding, packing the set fields in the process. The problem with this however, is that fields unknown to us still must be stored in the blob (imagine reading a message with a blob having unknown fields, and then attempting to add the UpfrontShutdown field to it. At this point we would have to add the data to the blob when encoding it, which is currently not that easy (or something we want?). We also don't know for sure whether the field is already present in the blob.

So I think there are a few avenues we can go down:

(In this PR currently) Making the caller TLV aware such that the whole extra data blob must be set by the caller.

Only exposing the UpfrontShutdownField and not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one.

Adding Setters/Getters for the field. The getter will simply try to fetch it from the blob, the Setter will add it to the blob (if the blob is non-empty with unknown fields we must either return and error at this point, or extend our TLV code to allow adding fields to an existing stream).

Create a constructor that takes the optional fields. This way we can make sure the fields cannot be altered, and the blob cannot change after creation.

All of these are just ways to avoid the field getting out of sync with the blob of course. If we make sure to never alter the field after creation, then it doesn't matter that much, but would be nice to have it somewhat user-mistake proof .

Thoughts? cc @wpaulino

Only exposing the UpfrontShutdownField and not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one.

I favor this option personally. I don't think we should idle long on this though, either direction can easily be reversed. I'm mostly imagining the future messages related to dynamic commitments, which will likely be entirely TLV based. If we keep the current API through those messages then we'll end up with a bunch of boiler plate just to create the message.

Crypt-iQ · 2021-02-22T21:53:53Z

One small comment: WriteMessage in lnwire/message.go should check against MaxMsgBody instead of MaxMessagePayload

wpaulino

Looks pretty good! Confirmed there aren't any other unaddressed cases within the codebase and also had a successful migration on my node.

I agree with @Roasbeef regarding not exposing the TLV details to callers, it seems much cleaner. Imagine a scenario where a message goes through multiple subsystems and each subsystem attaches its own TLV field. With the current approach, both subsystems would need to be aware of all TLVs when not required, potentially causing import cycles.

halseth · 2021-02-23T12:15:11Z

Imagine a scenario where a message goes through multiple subsystems and each subsystem attaches its own TLV field.

This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want? cc @cfromknecht

Roasbeef · 2021-02-23T19:10:56Z

Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want?

I don't think that's required. The encoding would happen only at the very end, so callers can add new TLV (which are just fields), then they're all assembled into a single stream and encoded at the end.

With this change, errors from migrations will have the proper local line number.

To avoid code changing underneath the static migrations, we copy the lnwire dependency in its current form into the migration directory. Ideally the migrations doesn't depend on any code that might change, this takes us a step closer.

…pdate To more easily use different version of it post-/pre-migration, we rename the method and make it take the LogUpdate as an argument.

halseth · 2021-02-24T14:50:48Z

Update the PR to keep the shutdown script as a separate field again.

To make this work we read out all TLV data, then snip out the script bytes (that MUST be there) and set the script field on the struct: 7ec3067#diff-682634355d285b253b49554f4ea4a4a15ce3a07d98b55bb978944bb5e0e62ab7R235-R245

When writing the message, we again concatenate the script field with the rest of the data.

This enables us to change the shutdown script on the struct without invalidating the extra data in the message.

The only case that wont be entirely correct doing it this way, is if we read a legacy message (that have no upfront shutdown script set at all), then write it out again. In this case the encoding would change since we add the zero-length field in that case. But I figured this was okay, since we were doing this already before this PR.

The legacy encoding depends on the lnwire21 version of lnwire, so it will let us change lnwire after the migration. To make sure it is separated from the new encoding, we add it to a new package 'legacy'. We also put common types in a new package 'common', which will house types that won't change during the migration, and can be used by both legacy and current serialization code.

In this commit, we modify the way we write wire messages across the entire database. We'll now ensure that we always write wire messages with a length prefix. We update the `codec.go` file to always write a 2 byte length prefix, this affects the way we write the `CommitDiff` and `LogUpdates` struct to disk, and the network results bucket in the switch as it includes a wire message.

We copy the new serialization togic to a new package 'current' and export the methods we will need from the migration. This ensures that we have isolation between the legacy and current serialization methods, cleanly inidcating what type we are using during the migration.

In this commit, we migrate all wire messages in the database from the `legacy` to the `current` encoding. This affects the way we write the `CommitDiff` and `LogUpdates` struct to disk. We also need to migrate the network results bucket in the switch as it includes a wire message without a length prefix.

In order to prep for allowing TLV extensions for the `ReplyChannelRange` and `QueryChannelRange` messages, we'll need to remove the struct embedding as is. If we don't remove this, then we'll attempt to decode TLV extensions from both the embedded and outer struct. All relevant call sites have been updated to reflect this minor change.

In this commit, we create a new `ExtraOpaqueData` based on the field with the same name that's present in all the announcement related messages. In later commits, we'll embed this new type in each message, so we'll have a generic way to add/parse TLV extensions from messages.

Messages: - UpdateFulfillHTLC - UpdateFee - UpdateFailMalformedHTLC - UpdateFailHTLC - UpdateAddHTLC - Shutdown - RevokeAndAck - ReplyShortChanIDsEnd - ReplyChannelRange - QueryShortChanIDs - QueryChannelRange - NodeAnnouncement - Init - GossipTimestampRange - FundingSigned - FundingLocked - FundingCreated - CommitSig - ClosingSigned - ChannelUpdate - ChannelReestablish - ChannelAnnouncement - AnnounceSignatures lnwire: update quickcheck tests, use constant for Error multi: update unit tests to pass deep equal assertions with messages In this commit, we update a series of unit tests in the code base to now pass due to the new wire message encode/decode logic. In many instances, we'll now manually set the extra bytes to an empty byte slice to avoid comparisons that fail due to one message having an empty byte slice and the other having a nil pointer.

…pe in ExtraData In this commit, we convert the delivery address in the open and accept channel methods to be a TLV type. This works as an "empty" delivery address is encoded using a two zero bytes (uint16 length zero), and a tlv type of 0 is encoded in the same manner (byte for type, byte for zero length). This change allows us to easily extend these messages in the future, in a uniform manner. When decoding the message we snip the bytes from the read TLV data. Similarly, when encoding we concatenate the TLV record for the shutdown script with the rest of the TLV data.

cfromknecht

@halseth nice work!! Even though it is a lot of code we need to keep around, I find this diff easy to follow 👍

This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want?

That would definitely be a nice addition, shouldn't be too hard to add that.

cfromknecht · 2021-02-25T20:13:17Z

 	reply *lnwire.ReplyChannelRange) bool {

-	return reply.QueryChannelRange == *query
+	return (reply.ChainHash == query.ChainHash &&


unnecessary parens?

cfromknecht · 2021-02-25T20:16:35Z

 func (c *UpdateFulfillHTLC) MaxPayloadLength(uint32) uint32 {
-	// 32 + 8 + 32
-	return 72
+	return MaxMsgBody


this entire method can be removed from the interface, since every message now has a constant max length

cfromknecht · 2021-02-25T20:37:35Z

 		return err
 	}
+
+	a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript(


i think it makes sense to use ExtraOpaqueData deserialization for those cases where we don't have any defined TLV fields, but in cases where we have known fields I had imagined using a custom tlv.Stream that would container a pointer to the fields that need to be [de]serialized. In the long run, i don't think it will be scalable to have a parseXXX field for everything we want to end up extracting

What about something like:

// Assumes all records being parsed start at the beginning of the TLV stream and follow one another. // If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record. func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error { for _, p := range producers { r := p.Record() records, err := e.ExtractRecords(r) if err != nil { return err } rawRecordVal, ok := records[r.Type()] if !ok || rawRecordVal != nil { return fmt.Errorf("record with type %v not parsed", r.Type()) } // Remove the bytes corresponding to the record we just parsed // from our TLV stream. parsedRecord := p.Record() valSize := parsedRecord.Size() lenSize := tlv.VarIntSize(valSize) typeSize := tlv.VarIntSize(uint64(parsedRecord.Type())) *e = []byte(*e)[typeSize+lenSize+valSize:] } return nil }

In the case of lnwire.OpenChannel, it'd be used as:

var tlvRecords ExtraOpaqueData if err := ReadElements(r, &tlvRecords); err != nil { return err } err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript) if err != nil { return err } o.ExtraData = tlvRecords

I agree that this approach won't be scalable in the generic case, but the shutdown script is really a special case, since it must always be there.

I think we should defer making this more generic until we have actual new fields we want to add, we'll know more about the requirements then.

Agreed, we can move ahead with the bulk of this PR given once we are properly writing/parsing all the data, we can make any future in-memory TLV representation changes at a later point.

wpaulino · 2021-02-25T23:13:22Z


+// packShutdownScript takes an upfront shutdown script and an opaque data blob
+// and concatenates them.
+func packShutdownScript(addr DeliveryAddress, extraData ExtraOpaqueData) (


We could make this a bit more generic so that it applies for all wire messages and their TLV types with something like PackTLVRecords(ExtraOpaqueData, ...tlv.RecordProducer). This would assume all records follow one another, so we should also sort them by type before sending them to peers.

Don't think this is non-blocking for this PR, once we start to add more TLV fields to messages, and also messages that are purely TLV, we'll have a better idea w.r.t ergonomics of the main interface.

wpaulino · 2021-02-25T23:51:19Z

+	}
+
+	// Not among TLV records, this means the data was invalid.
+	if _, ok := tlvs[DeliveryAddrType]; !ok {


From the tlv.TypeMap docs:

// TypeMap is a map of parsed Types. The map values are byte slices. If the byte
// slice is nil, the type was successfully parsed. Otherwise the value is byte
// slice containing the encoded data.

Not sure if that's still the case, but if so, don't we need to check that the map value is also nil?

Hm, looks like that doc is not entirely correct? @cfromknecht

This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238

If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.

I see. I don't think it should matter then, since we are only checking a known type, so we'll know that it is nil always.

wpaulino · 2021-02-26T00:21:33Z

 		return err
 	}
+
+	a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript(


What about something like:

// Assumes all records being parsed start at the beginning of the TLV stream and follow one another. // If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record. func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error { for _, p := range producers { r := p.Record() records, err := e.ExtractRecords(r) if err != nil { return err } rawRecordVal, ok := records[r.Type()] if !ok || rawRecordVal != nil { return fmt.Errorf("record with type %v not parsed", r.Type()) } // Remove the bytes corresponding to the record we just parsed // from our TLV stream. parsedRecord := p.Record() valSize := parsedRecord.Size() lenSize := tlv.VarIntSize(valSize) typeSize := tlv.VarIntSize(uint64(parsedRecord.Type())) *e = []byte(*e)[typeSize+lenSize+valSize:] } return nil }

In the case of lnwire.OpenChannel, it'd be used as:

var tlvRecords ExtraOpaqueData if err := ReadElements(r, &tlvRecords); err != nil { return err } err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript) if err != nil { return err } o.ExtraData = tlvRecords

Roasbeef

LGTM 🍙

Updated another older testnet node, w/ this migration, and it's been running fine w/ no problems for the past few days. This unblocks a ton of other projects, so really excited to finally get this in!

Roasbeef · 2021-03-02T03:05:57Z

 func (a *AcceptChannel) Encode(w io.Writer, pver uint32) error {
+	// Since the upfront script is encoded as a TLV record, concatenate it
+	// with the ExtraData, and write them as one.
+	tlvRecords, err := packShutdownScript(


Roasbeef · 2021-03-02T03:12:02Z

+	}
+
+	// Not among TLV records, this means the data was invalid.
+	if _, ok := tlvs[DeliveryAddrType]; !ok {


This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238

If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.

halseth requested a review from Roasbeef January 15, 2021 14:03

halseth added this to the 0.13.0 milestone Jan 15, 2021

halseth mentioned this pull request Jan 15, 2021

multi: the great tlv wire migration #4747

Closed

halseth commented Jan 15, 2021

View reviewed changes

Comment thread channeldb/migration21/migration.go Outdated

Roasbeef requested a review from wpaulino January 20, 2021 23:28

Roasbeef added database Related to the database/storage of LND migration tlv labels Jan 20, 2021

Roasbeef added the P1 MUST be fixed or reviewed label Jan 28, 2021

Roasbeef reviewed Feb 9, 2021

View reviewed changes

Comment thread channeldb/channel.go Outdated

Comment thread channeldb/codec.go Outdated

halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from e8b09e1 to c51c546 Compare February 11, 2021 14:15

cfromknecht added the v0.13 label Feb 18, 2021

halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from 541f842 to 44cfceb Compare February 19, 2021 15:01

Roasbeef closed this Feb 20, 2021

Roasbeef reopened this Feb 20, 2021

Roasbeef reviewed Feb 20, 2021

View reviewed changes

Roasbeef mentioned this pull request Feb 22, 2021

lnwire: correct payload length checks in WriteMessage #5050

Closed

wpaulino reviewed Feb 22, 2021

View reviewed changes

Comment thread lnwire/extra_bytes.go

Comment thread lnwire/lnwire_test.go Outdated

Roasbeef and others added 3 commits February 24, 2021 14:34

channeldb: mark ApplyMigration as a test helper

c6a8816

With this change, errors from migrations will have the proper local line number.

channeldb: rename Encode/Decode to serializeLogUpdate/deserializeLogU…

90d36db

…pdate To more easily use different version of it post-/pre-migration, we rename the method and make it take the LogUpdate as an argument.

halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from b84778b to 7ec3067 Compare February 24, 2021 14:43

halseth requested review from Roasbeef and wpaulino February 24, 2021 14:50

halseth and others added 8 commits February 24, 2021 17:15

halseth force-pushed the the-great-tlv-wire-migration branch from 7ec3067 to 6842c0b Compare February 24, 2021 16:46

cfromknecht reviewed Feb 25, 2021

View reviewed changes

wpaulino reviewed Feb 26, 2021

View reviewed changes

Roasbeef approved these changes Mar 2, 2021

View reviewed changes

wpaulino approved these changes Mar 2, 2021

View reviewed changes

Roasbeef merged commit 75eca3a into lightningnetwork:master Mar 4, 2021

Conversation

halseth commented Jan 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Roasbeef left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Roasbeef commented Feb 19, 2021

Uh oh!

Roasbeef commented Feb 19, 2021

Uh oh!

Roasbeef commented Feb 19, 2021

Uh oh!

halseth commented Feb 19, 2021

Uh oh!

halseth commented Feb 19, 2021

Uh oh!

Roasbeef commented Feb 19, 2021

Uh oh!

Roasbeef commented Feb 20, 2021

Uh oh!

Roasbeef commented Feb 20, 2021

Uh oh!

Roasbeef left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Crypt-iQ commented Feb 22, 2021

Uh oh!

wpaulino left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

halseth commented Feb 23, 2021

Uh oh!

Roasbeef commented Feb 23, 2021

Uh oh!

halseth commented Feb 24, 2021

Uh oh!

cfromknecht left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

halseth commented Jan 15, 2021 •

edited

Loading