The great tlv wire migration of 2021#4923
Conversation
Roasbeef
left a comment
There was a problem hiding this comment.
Many thanks for taking over this PR!
The end diff looks large, but it's mostly code movement, with the core logic migration logic residing in a single commit. In the end, copying the lnwire package is def the way to go, to ensure we have proper isolation.
As the PR changed slightly, I'll start to test this out on on eof my nodes, much easier now that I don't need to manually creates a new breanch that combines the lnwire TLV changes along with the migration itself.
e8b09e1 to
c51c546
Compare
|
The dry run migration went well, but upon boot I get: I dug in a bit and turns out the migration isn't using the proper encoding when writing all the data back again. It uses what's in I have a commit that fixes this, but before pushing it up, I want to se why this wasn't picked up by the existing tests. |
|
Here's the commit: 239f19e Also the reason why the tests passed was that since the migration logic was using the same encode/decode as the legacy logic (before the length prefix), it wasn't actually doing anything. One way to detect this in the future is to assert in the tests that the bytes are actually different. |
|
Also I realized now where I left off last w/ it all: migrating the forwarding packages as well. As is, the migration will try to read out the forwarding packages in |
Wow, can't believe I missed that! This migration wasn't really doing much that's for sure 🤦
Will to a sweep and make sure there aren't more places where we missed this! |
541f842 to
44cfceb
Compare
|
I updated the PR to address the concerns brought up: I split I went through the code base looking for all places where a wire message is written to the DB, found two more not already covered. One of them being the forwarding packages @Roasbeef mentioned, the other one the remote unisigned log updates that was recently added: ea91b35 In addition I looked for places where the wire messages gets written to the DB directly, outside the lnd/discovery/message_store.go Line 123 in 203a669 Line 235 in 203a669 |
Yeah I identified those in the past as well, but left them as they're written under an isolated key. |
|
Started up with the latest diff, and so far so good 👌 |
|
Accidentally closed... |
Roasbeef
left a comment
There was a problem hiding this comment.
It lives!!!
Really close now, final comment is re if we should expose the TLV details (as is now), or attempt to hide more of them from the user (as it was before).
| func (a *AcceptChannel) UpfrontShutdownScript() (DeliveryAddress, error) { | ||
| var addr DeliveryAddress | ||
|
|
||
| tlvs, err := a.ExtraData.ExtractRecords(addr.NewRecord()) |
There was a problem hiding this comment.
Related to the comment above, we could have the callers use methods to "attach" these fields, which exposes less details to them, and is an alternative to setting nil values. I still loosely favor letting the callers set the fields themselves, and hiding everything behind Encode/Decode re if it's actually a TLV or not. As we evolve the messages, I imagine we'll have more and more "TLV native" fields, which can start to clutter the call sites if they need to always manually pack/encode them.
There was a problem hiding this comment.
If the caller alters the exposed field without modifying the blob, the extra data blob gets out of sync with the field. A possible solution is to make the extraData field private, and only create it at time of encoding, packing the set fields in the process. The problem with this however, is that fields unknown to us still must be stored in the blob (imagine reading a message with a blob having unknown fields, and then attempting to add the UpfrontShutdown field to it. At this point we would have to add the data to the blob when encoding it, which is currently not that easy (or something we want?). We also don't know for sure whether the field is already present in the blob.
So I think there are a few avenues we can go down:
- (In this PR currently) Making the caller TLV aware such that the whole extra data blob must be set by the caller.
- Only exposing the
UpfrontShutdownFieldand not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one. - Adding
Setters/Gettersfor the field. The getter will simply try to fetch it from the blob, theSetterwill add it to the blob (if the blob is non-empty with unknown fields we must either return and error at this point, or extend our TLV code to allow adding fields to an existing stream). - Create a constructor that takes the optional fields. This way we can make sure the fields cannot be altered, and the blob cannot change after creation.
All of these are just ways to avoid the field getting out of sync with the blob of course. If we make sure to never alter the field after creation, then it doesn't matter that much, but would be nice to have it somewhat user-mistake proof .
Thoughts? cc @wpaulino
There was a problem hiding this comment.
Only exposing the UpfrontShutdownField and not the blob, and documenting that altering this field will invalidate any data already in the blob. At encoding we discard what's already in the blob and create a new one.
I favor this option personally. I don't think we should idle long on this though, either direction can easily be reversed. I'm mostly imagining the future messages related to dynamic commitments, which will likely be entirely TLV based. If we keep the current API through those messages then we'll end up with a bunch of boiler plate just to create the message.
|
One small comment: |
wpaulino
left a comment
There was a problem hiding this comment.
Looks pretty good! Confirmed there aren't any other unaddressed cases within the codebase and also had a successful migration on my node.
I agree with @Roasbeef regarding not exposing the TLV details to callers, it seems much cleaner. Imagine a scenario where a message goes through multiple subsystems and each subsystem attaches its own TLV field. With the current approach, both subsystems would need to be aware of all TLVs when not required, potentially causing import cycles.
This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want? cc @cfromknecht |
I don't think that's required. The encoding would happen only at the very end, so callers can add new TLV (which are just fields), then they're all assembled into a single stream and encoded at the end. |
With this change, errors from migrations will have the proper local line number.
To avoid code changing underneath the static migrations, we copy the lnwire dependency in its current form into the migration directory. Ideally the migrations doesn't depend on any code that might change, this takes us a step closer.
…pdate To more easily use different version of it post-/pre-migration, we rename the method and make it take the LogUpdate as an argument.
b84778b to
7ec3067
Compare
|
Update the PR to keep the To make this work we read out all TLV data, then snip out the script bytes (that MUST be there) and set the script field on the struct: 7ec3067#diff-682634355d285b253b49554f4ea4a4a15ce3a07d98b55bb978944bb5e0e62ab7R235-R245 When writing the message, we again concatenate the script field with the rest of the data. This enables us to change the shutdown script on the struct without invalidating the extra data in the message. The only case that wont be entirely correct doing it this way, is if we read a legacy message (that have no upfront shutdown script set at all), then write it out again. In this case the encoding would change since we add the zero-length field in that case. But I figured this was okay, since we were doing this already before this PR. |
The legacy encoding depends on the lnwire21 version of lnwire, so it will let us change lnwire after the migration. To make sure it is separated from the new encoding, we add it to a new package 'legacy'. We also put common types in a new package 'common', which will house types that won't change during the migration, and can be used by both legacy and current serialization code.
In this commit, we modify the way we write wire messages across the entire database. We'll now ensure that we always write wire messages with a length prefix. We update the `codec.go` file to always write a 2 byte length prefix, this affects the way we write the `CommitDiff` and `LogUpdates` struct to disk, and the network results bucket in the switch as it includes a wire message.
We copy the new serialization togic to a new package 'current' and export the methods we will need from the migration. This ensures that we have isolation between the legacy and current serialization methods, cleanly inidcating what type we are using during the migration.
In this commit, we migrate all wire messages in the database from the `legacy` to the `current` encoding. This affects the way we write the `CommitDiff` and `LogUpdates` struct to disk. We also need to migrate the network results bucket in the switch as it includes a wire message without a length prefix.
In order to prep for allowing TLV extensions for the `ReplyChannelRange` and `QueryChannelRange` messages, we'll need to remove the struct embedding as is. If we don't remove this, then we'll attempt to decode TLV extensions from both the embedded and outer struct. All relevant call sites have been updated to reflect this minor change.
In this commit, we create a new `ExtraOpaqueData` based on the field with the same name that's present in all the announcement related messages. In later commits, we'll embed this new type in each message, so we'll have a generic way to add/parse TLV extensions from messages.
Messages: - UpdateFulfillHTLC - UpdateFee - UpdateFailMalformedHTLC - UpdateFailHTLC - UpdateAddHTLC - Shutdown - RevokeAndAck - ReplyShortChanIDsEnd - ReplyChannelRange - QueryShortChanIDs - QueryChannelRange - NodeAnnouncement - Init - GossipTimestampRange - FundingSigned - FundingLocked - FundingCreated - CommitSig - ClosingSigned - ChannelUpdate - ChannelReestablish - ChannelAnnouncement - AnnounceSignatures lnwire: update quickcheck tests, use constant for Error multi: update unit tests to pass deep equal assertions with messages In this commit, we update a series of unit tests in the code base to now pass due to the new wire message encode/decode logic. In many instances, we'll now manually set the extra bytes to an empty byte slice to avoid comparisons that fail due to one message having an empty byte slice and the other having a nil pointer.
…pe in ExtraData In this commit, we convert the delivery address in the open and accept channel methods to be a TLV type. This works as an "empty" delivery address is encoded using a two zero bytes (uint16 length zero), and a tlv type of 0 is encoded in the same manner (byte for type, byte for zero length). This change allows us to easily extend these messages in the future, in a uniform manner. When decoding the message we snip the bytes from the read TLV data. Similarly, when encoding we concatenate the TLV record for the shutdown script with the rest of the TLV data.
7ec3067 to
6842c0b
Compare
cfromknecht
left a comment
There was a problem hiding this comment.
@halseth nice work!! Even though it is a lot of code we need to keep around, I find this diff easy to follow 👍
This would be pretty cool functionality. Looks like our TLV code doesn't currently supports adding fields to an existing stream though, maybe that is something we want?
That would definitely be a nice addition, shouldn't be too hard to add that.
| reply *lnwire.ReplyChannelRange) bool { | ||
|
|
||
| return reply.QueryChannelRange == *query | ||
| return (reply.ChainHash == query.ChainHash && |
| func (c *UpdateFulfillHTLC) MaxPayloadLength(uint32) uint32 { | ||
| // 32 + 8 + 32 | ||
| return 72 | ||
| return MaxMsgBody |
There was a problem hiding this comment.
this entire method can be removed from the interface, since every message now has a constant max length
| return err | ||
| } | ||
|
|
||
| a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript( |
There was a problem hiding this comment.
i think it makes sense to use ExtraOpaqueData deserialization for those cases where we don't have any defined TLV fields, but in cases where we have known fields I had imagined using a custom tlv.Stream that would container a pointer to the fields that need to be [de]serialized. In the long run, i don't think it will be scalable to have a parseXXX field for everything we want to end up extracting
There was a problem hiding this comment.
What about something like:
// Assumes all records being parsed start at the beginning of the TLV stream and follow one another.
// If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record.
func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error {
for _, p := range producers {
r := p.Record()
records, err := e.ExtractRecords(r)
if err != nil {
return err
}
rawRecordVal, ok := records[r.Type()]
if !ok || rawRecordVal != nil {
return fmt.Errorf("record with type %v not parsed",
r.Type())
}
// Remove the bytes corresponding to the record we just parsed
// from our TLV stream.
parsedRecord := p.Record()
valSize := parsedRecord.Size()
lenSize := tlv.VarIntSize(valSize)
typeSize := tlv.VarIntSize(uint64(parsedRecord.Type()))
*e = []byte(*e)[typeSize+lenSize+valSize:]
}
return nil
}
In the case of lnwire.OpenChannel, it'd be used as:
var tlvRecords ExtraOpaqueData
if err := ReadElements(r, &tlvRecords); err != nil {
return err
}
err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript)
if err != nil {
return err
}
o.ExtraData = tlvRecords
There was a problem hiding this comment.
I agree that this approach won't be scalable in the generic case, but the shutdown script is really a special case, since it must always be there.
I think we should defer making this more generic until we have actual new fields we want to add, we'll know more about the requirements then.
There was a problem hiding this comment.
Agreed, we can move ahead with the bulk of this PR given once we are properly writing/parsing all the data, we can make any future in-memory TLV representation changes at a later point.
|
|
||
| // packShutdownScript takes an upfront shutdown script and an opaque data blob | ||
| // and concatenates them. | ||
| func packShutdownScript(addr DeliveryAddress, extraData ExtraOpaqueData) ( |
There was a problem hiding this comment.
We could make this a bit more generic so that it applies for all wire messages and their TLV types with something like PackTLVRecords(ExtraOpaqueData, ...tlv.RecordProducer). This would assume all records follow one another, so we should also sort them by type before sending them to peers.
There was a problem hiding this comment.
Don't think this is non-blocking for this PR, once we start to add more TLV fields to messages, and also messages that are purely TLV, we'll have a better idea w.r.t ergonomics of the main interface.
| } | ||
|
|
||
| // Not among TLV records, this means the data was invalid. | ||
| if _, ok := tlvs[DeliveryAddrType]; !ok { |
There was a problem hiding this comment.
From the tlv.TypeMap docs:
// TypeMap is a map of parsed Types. The map values are byte slices. If the byte
// slice is nil, the type was successfully parsed. Otherwise the value is byte
// slice containing the encoded data.
Not sure if that's still the case, but if so, don't we need to check that the map value is also nil?
There was a problem hiding this comment.
Hm, looks like that doc is not entirely correct? @cfromknecht
There was a problem hiding this comment.
This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238
If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.
There was a problem hiding this comment.
I see. I don't think it should matter then, since we are only checking a known type, so we'll know that it is nil always.
| return err | ||
| } | ||
|
|
||
| a.UpfrontShutdownScript, a.ExtraData, err = parseShutdownScript( |
There was a problem hiding this comment.
What about something like:
// Assumes all records being parsed start at the beginning of the TLV stream and follow one another.
// If this isn't the case, we'd need the custom `tlv.Stream` that contains pointers to each record.
func (e *ExtraOpaqueData) ParseKnownRecords(producers ...tlv.RecordProducer) error {
for _, p := range producers {
r := p.Record()
records, err := e.ExtractRecords(r)
if err != nil {
return err
}
rawRecordVal, ok := records[r.Type()]
if !ok || rawRecordVal != nil {
return fmt.Errorf("record with type %v not parsed",
r.Type())
}
// Remove the bytes corresponding to the record we just parsed
// from our TLV stream.
parsedRecord := p.Record()
valSize := parsedRecord.Size()
lenSize := tlv.VarIntSize(valSize)
typeSize := tlv.VarIntSize(uint64(parsedRecord.Type()))
*e = []byte(*e)[typeSize+lenSize+valSize:]
}
return nil
}
In the case of lnwire.OpenChannel, it'd be used as:
var tlvRecords ExtraOpaqueData
if err := ReadElements(r, &tlvRecords); err != nil {
return err
}
err = tlvRecords.ParseKnownRecords(&o.UpfrontShutdownScript)
if err != nil {
return err
}
o.ExtraData = tlvRecords
Roasbeef
left a comment
There was a problem hiding this comment.
LGTM 🍙
Updated another older testnet node, w/ this migration, and it's been running fine w/ no problems for the past few days. This unblocks a ton of other projects, so really excited to finally get this in!
| func (a *AcceptChannel) Encode(w io.Writer, pver uint32) error { | ||
| // Since the upfront script is encoded as a TLV record, concatenate it | ||
| // with the ExtraData, and write them as one. | ||
| tlvRecords, err := packShutdownScript( |
| } | ||
|
|
||
| // Not among TLV records, this means the data was invalid. | ||
| if _, ok := tlvs[DeliveryAddrType]; !ok { |
There was a problem hiding this comment.
This is the output of DecodeWithParsedTypes, which'll tally all the types parsed as it goes along: https://github.com/lightningnetwork/lnd/blob/master/tlv/stream.go#L234-L238
If the type is known, it'll be a nice slice (as the TLV record will be populated). If the type is unknown, but odd, then we'll retain the bytes in the map.
This is a continuation/slight rewrite of
multi: the great tlv wire migration#4747.There are a few key differences:
lnwirepackage version aware in order to use it in migrations for the old format, we copy the legacy lnwire code into the migration package. This allow us to changelnwirewithout worrying about breaking migrations.Instead of havingWe are back to having a separate field.UpfrontShutdownScriptbe its own field on the[Open|Acccept]Channelmessages, we embed it as a proper TLV field withing theExtraData, and fetch it from there when needed.Replaces #4747