Skip to content

The great tlv wire migration of 2021#4923

Merged
Roasbeef merged 11 commits into
lightningnetwork:masterfrom
halseth:the-great-tlv-wire-migration
Mar 4, 2021
Merged

The great tlv wire migration of 2021#4923
Roasbeef merged 11 commits into
lightningnetwork:masterfrom
halseth:the-great-tlv-wire-migration

Conversation

@halseth
Copy link
Copy Markdown
Contributor

@halseth halseth commented Jan 15, 2021

This is a continuation/slight rewrite of multi: the great tlv wire migration #4747.

There are a few key differences:

  • instead of making the lnwire package version aware in order to use it in migrations for the old format, we copy the legacy lnwire code into the migration package. This allow us to change lnwire without worrying about breaking migrations.
  • Instead of having UpfrontShutdownScript be its own field on the [Open|Acccept]Channel messages, we embed it as a proper TLV field withing the ExtraData, and fetch it from there when needed. We are back to having a separate field.

Replaces #4747

@halseth halseth requested a review from Roasbeef January 15, 2021 14:03
@halseth halseth added this to the 0.13.0 milestone Jan 15, 2021
Comment thread channeldb/migration21/migration.go Outdated
@Roasbeef Roasbeef requested a review from wpaulino January 20, 2021 23:28
@Roasbeef Roasbeef added database Related to the database/storage of LND migration tlv labels Jan 20, 2021
@Roasbeef Roasbeef added the P1 MUST be fixed or reviewed label Jan 28, 2021
Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for taking over this PR!

The end diff looks large, but it's mostly code movement, with the core logic migration logic residing in a single commit. In the end, copying the lnwire package is def the way to go, to ensure we have proper isolation.

As the PR changed slightly, I'll start to test this out on on eof my nodes, much easier now that I don't need to manually creates a new breanch that combines the lnwire TLV changes along with the migration itself.

Comment thread channeldb/channel.go Outdated
Comment thread channeldb/codec.go Outdated
@halseth halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from e8b09e1 to c51c546 Compare February 11, 2021 14:15
@Roasbeef
Copy link
Copy Markdown
Member

The dry run migration went well, but upon boot I get:

2021-02-18 21:05:58.628 [ERR] LTND: unable to create server: unable to parse message of unknown type: <unknown>

I dug in a bit and turns out the migration isn't using the proper encoding when writing all the data back again. It uses what's in enclosed_codec.go, which is the old encode/decode logic. Namely, it's missing the new logic for encoding/decoding lnwire.Message w/ the proper length prefix.

I have a commit that fixes this, but before pushing it up, I want to se why this wasn't picked up by the existing tests.

@Roasbeef
Copy link
Copy Markdown
Member

Here's the commit: 239f19e

Also the reason why the tests passed was that since the migration logic was using the same encode/decode as the legacy logic (before the length prefix), it wasn't actually doing anything. One way to detect this in the future is to assert in the tests that the bytes are actually different.

@Roasbeef
Copy link
Copy Markdown
Member

Also I realized now where I left off last w/ it all: migrating the forwarding packages as well. As is, the migration will try to read out the forwarding packages in loadHltcs, using the LogUpdate.Decode method, which uses ReadElements. This'll fail as we'll read two more extra bytes than is actually encoded, as we're trying to read the length prefix, but one isn't present there yet.

@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Feb 19, 2021

Here's the commit: 239f19e

Also the reason why the tests passed was that since the migration logic was using the same encode/decode as the legacy logic (before the length prefix), it wasn't actually doing anything. One way to detect this in the future is to assert in the tests that the bytes are actually different.

Wow, can't believe I missed that! This migration wasn't really doing much that's for sure 🤦

Also I realized now where I left off last w/ it all: migrating the forwarding packages as well. As is, the migration will try to read out the forwarding packages in loadHltcs, using the LogUpdate.Decode method, which uses ReadElements. This'll fail as we'll read two more extra bytes than is actually encoded, as we're trying to read the length prefix, but one isn't present there yet.

Will to a sweep and make sure there aren't more places where we missed this!

@halseth halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from 541f842 to 44cfceb Compare February 19, 2021 15:01
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Feb 19, 2021

I updated the PR to address the concerns brought up:

I split legacy and current serialization code used by the migration into two different pacakges, such that we should avoid calling into the wrong method unintended.

I went through the code base looking for all places where a wire message is written to the DB, found two more not already covered. One of them being the forwarding packages @Roasbeef mentioned, the other one the remote unisigned log updates that was recently added: ea91b35

In addition I looked for places where the wire messages gets written to the DB directly, outside the codec. Here I also found to cases, but since they both are written under their own key, we should be fine (but I wouldn't mind also migrating them just to be consistent):

if _, err := lnwire.WriteMessage(&b, msg, 0); err != nil {

if err := p.AnnounceSignatures.Encode(w, 0); err != nil {

@Roasbeef
Copy link
Copy Markdown
Member

In addition I looked for places where the wire messages gets written to the DB directly, outside the codec. Here I also found to cases,

Yeah I identified those in the past as well, but left them as they're written under an isolated key.

@Roasbeef
Copy link
Copy Markdown
Member

Started up with the latest diff, and so far so good 👌

@Roasbeef Roasbeef closed this Feb 20, 2021
@Roasbeef Roasbeef reopened this Feb 20, 2021
@Roasbeef
Copy link
Copy Markdown
Member

Accidentally closed...

Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It lives!!!

Really close now, final comment is re if we should expose the TLV details (as is now), or attempt to hide more of them from the user (as it was before).

Comment thread channeldb/channel.go
Comment thread funding/manager.go Outdated
Comment thread lnwire/accept_channel.go Outdated
Comment thread lnwire/accept_channel.go Outdated
func (a *AcceptChannel) UpfrontShutdownScript() (DeliveryAddress, error) {
var addr DeliveryAddress

tlvs, err := a.ExtraData.ExtractRecords(addr.NewRecord())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the comment above, we could have the callers use methods to "attach" these fields, which exposes less details to them, and is an alternative to setting nil values. I still loosely favor letting the callers set the fields themselves, and hiding everything behind Encode/Decode re if it's actually a TLV or not. As we evolve the messages, I imagine we'll have more and more "TLV native" fields, which can start to clutter the call sites if they need to always manually pack/encode them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller alters the exposed field without modifying the blob, the extra data blob gets out of sync with the field. A possible solution is to make the extraData field private, and only create it at time of encoding, packing the set fields in the process. The problem with this however, is that fields unknown to us still must be stored in the blob (imagine reading a message with a blob having unknown fields, and then attempting to add the UpfrontShutdown field to it. At this point we would have to add the data to the blob when encoding it, which is currently not that easy (or something we want?). We also don't know for sure whether the field is already present in the blob.

So I think there are a few avenues we can go down:

  • (In this PR currently) Making the caller TLV aware such that the whole extra data blob must be set by the caller.
  • Only exposing the UpfrontShutdownField and not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one.
  • Adding Setters/Getters for the field. The getter will simply try to fetch it from the blob, the Setter will add it to the blob (if the blob is non-empty with unknown fields we must either return and error at this point, or extend our TLV code to allow adding fields to an existing stream).
  • Create a constructor that takes the optional fields. This way we can make sure the fields cannot be altered, and the blob cannot change after creation.

All of these are just ways to avoid the field getting out of sync with the blob of course. If we make sure to never alter the field after creation, then it doesn't matter that much, but would be nice to have it somewhat user-mistake proof .

Thoughts? cc @wpaulino

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only exposing the UpfrontShutdownField and not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one.

I favor this option personally. I don't think we should idle long on this though, either direction can easily be reversed. I'm mostly imagining the future messages related to dynamic commitments, which will likely be entirely TLV based. If we keep the current API through those messages then we'll end up with a bunch of boiler plate just to create the message.

@Crypt-iQ
Copy link
Copy Markdown
Collaborator

One small comment: WriteMessage in lnwire/message.go should check against MaxMsgBody instead of MaxMessagePayload

Copy link
Copy Markdown
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Confirmed there aren't any other unaddressed cases within the codebase and also had a successful migration on my node.

I agree with @Roasbeef regarding not exposing the TLV details to callers, it seems much cleaner. Imagine a scenario where a message goes through multiple subsystems and each subsystem attaches its own TLV field. With the current approach, both subsystems would need to be aware of all TLVs when not required, potentially causing import cycles.

Comment thread lnwire/extra_bytes.go
Comment thread lnwire/lnwire_test.go Outdated
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Feb 23, 2021

Imagine a scenario where a message goes through multiple subsystems and each subsystem attaches its own TLV field.

This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want? cc @cfromknecht

@Roasbeef
Copy link
Copy Markdown
Member

Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want?

I don't think that's required. The encoding would happen only at the very end, so callers can add new TLV (which are just fields), then they're all assembled into a single stream and encoded at the end.

Roasbeef and others added 3 commits February 24, 2021 14:34
With this change, errors from migrations will have the proper local line
number.
To avoid code changing underneath the static migrations, we copy the
lnwire dependency in its current form into the migration directory.

Ideally the migrations doesn't depend on any code that might change,
this takes us a step closer.
…pdate

To more easily use different version of it post-/pre-migration, we
rename the method and make it take the LogUpdate as an argument.
@halseth halseth force-pushed the the-great-tlv-wire-migration branch 2 times, most recently from b84778b to 7ec3067 Compare February 24, 2021 14:43
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Feb 24, 2021

Update the PR to keep the shutdown script as a separate field again.

To make this work we read out all TLV data, then snip out the script bytes (that MUST be there) and set the script field on the struct: 7ec3067#diff-682634355d285b253b49554f4ea4a4a15ce3a07d98b55bb978944bb5e0e62ab7R235-R245

When writing the message, we again concatenate the script field with the rest of the data.

This enables us to change the shutdown script on the struct without invalidating the extra data in the message.

The only case that wont be entirely correct doing it this way, is if we read a legacy message (that have no upfront shutdown script set at all), then write it out again. In this case the encoding would change since we add the zero-length field in that case. But I figured this was okay, since we were doing this already before this PR.

@halseth halseth requested review from Roasbeef and wpaulino February 24, 2021 14:50
halseth and others added 8 commits February 24, 2021 17:15
The legacy encoding depends on the lnwire21 version of lnwire, so it will
let us change lnwire after the migration. To make sure it is separated
from the new encoding, we add it to a new package 'legacy'.

We also put common types in a new package 'common', which will house
types that won't change during the migration, and can be used by both
legacy and current serialization code.
In this commit, we modify the way we write wire messages across the
entire database. We'll now ensure that we always write wire messages
with a length prefix. We update the `codec.go` file to always write a 2
byte length prefix, this affects the way we write the `CommitDiff` and
`LogUpdates` struct to disk, and the network results bucket in the
switch as it includes a wire message.
We copy the new serialization togic to a new package 'current' and
export the methods we will need from the migration. This ensures that we
have isolation between the legacy and current serialization methods,
cleanly inidcating what type we are using during the migration.
In this commit, we migrate all wire messages in the database from the
`legacy` to the `current` encoding.

This affects the way we write the `CommitDiff` and `LogUpdates` struct
to disk. We also need to migrate the network results bucket in the
switch as it includes a wire message without a length prefix.
In order to prep for allowing TLV extensions for the `ReplyChannelRange`
and `QueryChannelRange` messages, we'll need to remove the struct
embedding as is. If we don't remove this, then we'll attempt to decode
TLV extensions from both the embedded and outer struct.

All relevant call sites have been updated to reflect this minor change.
In this commit, we create a new `ExtraOpaqueData` based on the field
with the same name that's present in all the announcement related
messages. In later commits, we'll embed this new type in each message,
so we'll have a generic way to add/parse TLV extensions from messages.
Messages:
- UpdateFulfillHTLC
- UpdateFee
- UpdateFailMalformedHTLC
- UpdateFailHTLC
- UpdateAddHTLC
- Shutdown
- RevokeAndAck
- ReplyShortChanIDsEnd
- ReplyChannelRange
- QueryShortChanIDs
- QueryChannelRange
- NodeAnnouncement
- Init
- GossipTimestampRange
- FundingSigned
- FundingLocked
- FundingCreated
- CommitSig
- ClosingSigned
- ChannelUpdate
- ChannelReestablish
- ChannelAnnouncement
- AnnounceSignatures

lnwire: update quickcheck tests, use constant for Error

multi: update unit tests to pass deep equal assertions with messages

In this commit, we update a series of unit tests in the code base to now
pass due to the new wire message encode/decode logic. In many instances,
we'll now manually set the extra bytes to an empty byte slice to avoid
comparisons that fail due to one message having an empty byte slice and
the other having a nil pointer.
…pe in ExtraData

In this commit, we convert the delivery address in the open and accept
channel methods to be a TLV type. This works as an "empty" delivery
address is encoded using a two zero bytes (uint16 length zero), and a
tlv type of 0 is encoded in the same manner (byte for type, byte for
zero length). This change allows us to easily extend these messages in
the future, in a uniform manner.

When decoding the message we snip the bytes from the read TLV data.
Similarly, when encoding we concatenate the TLV record for the shutdown
script with the rest of the TLV data.
@halseth halseth force-pushed the the-great-tlv-wire-migration branch from 7ec3067 to 6842c0b Compare February 24, 2021 16:46
Copy link
Copy Markdown
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halseth nice work!! Even though it is a lot of code we need to keep around, I find this diff easy to follow 👍

This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want?

That would definitely be a nice addition, shouldn't be too hard to add that.

Comment thread discovery/syncer.go
reply *lnwire.ReplyChannelRange) bool {

return reply.QueryChannelRange == *query
return (reply.ChainHash == query.ChainHash &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary parens?

func (c *UpdateFulfillHTLC) MaxPayloadLength(uint32) uint32 {
// 32 + 8 + 32
return 72
return MaxMsgBody
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this entire method can be removed from the interface, since every message now has a constant max length

Comment thread lnwire/accept_channel.go
return err
}

a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it makes sense to use ExtraOpaqueData deserialization for those cases where we don't have any defined TLV fields, but in cases where we have known fields I had imagined using a custom tlv.Stream that would container a pointer to the fields that need to be [de]serialized. In the long run, i don't think it will be scalable to have a parseXXX field for everything we want to end up extracting

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like:

// Assumes all records being parsed start at the beginning of the TLV stream and follow one another.
// If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record.
func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error {
	for _, p := range producers {
		r := p.Record()
		records, err := e.ExtractRecords(r)
		if err != nil {
			return err
		}
		rawRecordVal, ok := records[r.Type()]
		if !ok || rawRecordVal != nil {
			return fmt.Errorf("record with type %v not parsed",
				r.Type())
		}

		// Remove the bytes corresponding to the record we just parsed
		// from our TLV stream.
		parsedRecord := p.Record()
		valSize := parsedRecord.Size()
		lenSize := tlv.VarIntSize(valSize)
		typeSize := tlv.VarIntSize(uint64(parsedRecord.Type()))
		*e = []byte(*e)[typeSize+lenSize+valSize:]
	}

	return nil
}

In the case of lnwire.OpenChannel, it'd be used as:

	var tlvRecords ExtraOpaqueData
	if err := ReadElements(r, &tlvRecords); err != nil {
		return err
	}
	err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript)
	if err != nil {
		return err
	}
	o.ExtraData = tlvRecords

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this approach won't be scalable in the generic case, but the shutdown script is really a special case, since it must always be there.

I think we should defer making this more generic until we have actual new fields we want to add, we'll know more about the requirements then.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can move ahead with the bulk of this PR given once we are properly writing/parsing all the data, we can make any future in-memory TLV representation changes at a later point.

Comment thread lnwire/accept_channel.go

// packShutdownScript takes an upfront shutdown script and an opaque data blob
// and concatenates them.
func packShutdownScript(addr DeliveryAddress, extraData ExtraOpaqueData) (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this a bit more generic so that it applies for all wire messages and their TLV types with something like PackTLVRecords(ExtraOpaqueData, ...tlv.RecordProducer). This would assume all records follow one another, so we should also sort them by type before sending them to peers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is non-blocking for this PR, once we start to add more TLV fields to messages, and also messages that are purely TLV, we'll have a better idea w.r.t ergonomics of the main interface.

Comment thread lnwire/accept_channel.go
}

// Not among TLV records, this means the data was invalid.
if _, ok := tlvs[DeliveryAddrType]; !ok {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tlv.TypeMap docs:

// TypeMap is a map of parsed Types. The map values are byte slices. If the byte
// slice is nil, the type was successfully parsed. Otherwise the value is byte
// slice containing the encoded data.

Not sure if that's still the case, but if so, don't we need to check that the map value is also nil?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looks like that doc is not entirely correct? @cfromknecht

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238

If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I don't think it should matter then, since we are only checking a known type, so we'll know that it is nil always.

Comment thread lnwire/accept_channel.go
return err
}

a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like:

// Assumes all records being parsed start at the beginning of the TLV stream and follow one another.
// If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record.
func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error {
	for _, p := range producers {
		r := p.Record()
		records, err := e.ExtractRecords(r)
		if err != nil {
			return err
		}
		rawRecordVal, ok := records[r.Type()]
		if !ok || rawRecordVal != nil {
			return fmt.Errorf("record with type %v not parsed",
				r.Type())
		}

		// Remove the bytes corresponding to the record we just parsed
		// from our TLV stream.
		parsedRecord := p.Record()
		valSize := parsedRecord.Size()
		lenSize := tlv.VarIntSize(valSize)
		typeSize := tlv.VarIntSize(uint64(parsedRecord.Type()))
		*e = []byte(*e)[typeSize+lenSize+valSize:]
	}

	return nil
}

In the case of lnwire.OpenChannel, it'd be used as:

	var tlvRecords ExtraOpaqueData
	if err := ReadElements(r, &tlvRecords); err != nil {
		return err
	}
	err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript)
	if err != nil {
		return err
	}
	o.ExtraData = tlvRecords

Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🍙

Updated another older testnet node, w/ this migration, and it's been running fine w/ no problems for the past few days. This unblocks a ton of other projects, so really excited to finally get this in!

Comment thread lnwire/accept_channel.go
func (a *AcceptChannel) Encode(w io.Writer, pver uint32) error {
// Since the upfront script is encoded as a TLV record, concatenate it
// with the ExtraData, and write them as one.
tlvRecords, err := packShutdownScript(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread lnwire/accept_channel.go
}

// Not among TLV records, this means the data was invalid.
if _, ok := tlvs[DeliveryAddrType]; !ok {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238

If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.

@Roasbeef Roasbeef merged commit 75eca3a into lightningnetwork:master Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

database Related to the database/storage of LND migration P1 MUST be fixed or reviewed tlv v0.13

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants