channeldb: write the CommitSig in CommitDiff in LN wire format#4099

Closed
Roasbeef wants to merge 3 commits into lightningnetwork:master from Roasbeef:db-wire-msg-length-prefix

@Roasbeef
Member

In this commit, we modify the way we write the CommitSig in certain locations to match the way it's encoded on the wire. With this change, we'll now properly add a length prefix in front of the message, meaning we'll be able to read the extra data included (if present), and still be able to serialize content after this message.
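As a rough illustration of why the length prefix matters, here is a minimal sketch using only the Go standard library. It assumes a 2-byte big-endian prefix; the helper names `writePrefixed`, `readPrefixed`, and `roundTrip` are hypothetical and are not lnd's actual codec functions. The point is that the reader consumes exactly one message, so whatever was serialized after it stays intact.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writePrefixed writes a 2-byte big-endian length followed by the raw
// message bytes. Hypothetical helper, not part of lnd.
func writePrefixed(w io.Writer, msg []byte) error {
	if len(msg) > 0xFFFF {
		return fmt.Errorf("message too large: %d bytes", len(msg))
	}
	var l [2]byte
	binary.BigEndian.PutUint16(l[:], uint16(len(msg)))
	if _, err := w.Write(l[:]); err != nil {
		return err
	}
	_, err := w.Write(msg)
	return err
}

// readPrefixed reads exactly one length-prefixed message and leaves the
// reader positioned at whatever was serialized after it.
func readPrefixed(r io.Reader) ([]byte, error) {
	var l [2]byte
	if _, err := io.ReadFull(r, l[:]); err != nil {
		return nil, err
	}
	msg := make([]byte, binary.BigEndian.Uint16(l[:]))
	if _, err := io.ReadFull(r, msg); err != nil {
		return nil, err
	}
	return msg, nil
}

// roundTrip stores a message followed by a trailing field, then returns
// both as recovered from the combined byte stream.
func roundTrip(msg, trailer []byte) ([]byte, []byte, error) {
	var buf bytes.Buffer
	if err := writePrefixed(&buf, msg); err != nil {
		return nil, nil, err
	}
	buf.Write(trailer)

	gotMsg, err := readPrefixed(&buf)
	if err != nil {
		return nil, nil, err
	}
	// buf.Bytes() now holds only the unread remainder.
	return gotMsg, buf.Bytes(), nil
}

func main() {
	msg, rest, err := roundTrip([]byte("commit_sig+extra"), []byte("next-field"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("msg=%q rest=%q\n", msg, rest)
}
```

Because the message boundary is explicit, the message itself can grow (for example with extra data) without disturbing the fields that follow it.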

We add a simple migration to migrate all the current pending commitments, along with a test. We opt to do the simple enclose-copy method for the migration, which results in a larger diff, but IMO an easier to understand migration test.

This PR is needed for #3966 to pass the tests properly.

Comment thread channeldb/channel.go Outdated
Comment thread channeldb/migration14/enclosed_codec.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from d0342c1 to 49a8422 Compare March 20, 2020 01:21
@Roasbeef Roasbeef requested a review from cfromknecht March 20, 2020 01:22
@cfromknecht
Contributor

@Roasbeef linter is failing https://travis-ci.org/github/lightningnetwork/lnd/jobs/664666715#L737, latest version looking pretty close tho 👍

@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from 49a8422 to ed4a948 Compare March 24, 2020 00:51
Comment thread channeldb/migration14/enclosed_codec.go Outdated
Contributor

One drive-by comment: here we are still relying on packages that may change and break the migration. These are basic types, so it is unlikely, but the migration isn't fully self-contained the way, for example, a SQL migration script or migration 13 is.

Member Author

The main package here that impacts the migration is lnwire, specifically the way we write wire messages to disk. I don't think this will be changing any time soon. If the attributes or types of any of the imported packages change in a way that materially affects the migration, this PR should no longer compile.

Contributor

Especially the wire format is something that isn't set in stone, but of course it depends on what our support period is for this migration. Also a future change may be detected because the system doesn't build anymore, but that does mean that more copying needs to happen at that point. Touching an old migration to make it build again is a risk. I favor the fully isolated low-level approach.

Member Author

Copying is essentially zero cost, it's all effectively dead code anyway. As mentioned offline, making migration14 into a module should resolve any issues with the referenced types/packages changing under us (so it should always build given the deps are static).

Comment thread channeldb/migration14/legacy_decoding.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from 5192edf to 63ef302 Compare March 26, 2020 01:46
@Roasbeef
Member Author

Just pushed an update to this PR to make it now work properly with #3966. If you merge these two PRs into each other, all the unit tests should now pass. The changes in this new version are:

  • Modify codec.go to always include a length prefix. This changes the way we need to write the unsigned unacked updates to disk, so that's migrated as well.
  • Ensure we write the network results with that same length prefix.

The "migration" commits themselves are still small, as channeldb/codec.go does all the heavy lifting.

@Roasbeef Roasbeef requested review from halseth and joostjager March 26, 2020 01:48
Contributor

@joostjager joostjager left a comment

I've looked at stack traces of calls to WriteMessage while running the itest suite. Aren't these additional places where a migration is needed?

  • Discovery message store
  • Channel close summary (last sync message)
  • Forwarding package (separate putLogUpdate)

Maybe there is more.

Comment thread channeldb/channel.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
Comment thread channeldb/codec.go Outdated
Comment thread channeldb/migration14/enclosed_codec.go Outdated
@joostjager
Contributor

I would reconsider merging this for 0.10. Touching the persistent channel state is risky regardless of how big or small the migration is. If we need to, it is safer to do it at the beginning of the cycle. As far as I can see, this PR and #3966 are only refactorings and don't add new functionality or fixes, so why take the risk?

Comment thread channeldb/migration14/migration.go Outdated
Comment thread channeldb/migration14/migration.go Outdated
@Roasbeef
Member Author

Updated to now migrate the channel close summaries. We don't need to migrate the forwarding package or the discovery message store as those wire messages are the sole value being stored in the key. The channel summary migration itself is forward-looking as we don't store anything after the message, but this change allows us to append new fields to the close summary in the future.

Please see the fixup commits as I had to modify some of the legacy decode/encode methods to bypass Read/WriteElements and use lnwire.Read/WriteMessage directly as the imported codec includes the new serialization for wire messages with the length prefix.

@Roasbeef Roasbeef requested review from halseth and joostjager March 27, 2020 22:34
Comment thread channeldb/migration14/migration.go Outdated
Contributor

Updated to now migrate the channel close summaries. We don't need to migrate the forwarding package or the discovery message store as those wire messages are the sole value being stored in the key. The channel summary migration itself is forward-looking as we don't store anything after the message, but this change allows us to append new fields to the close summary in the future.

Good news that the other messages are stored with their own key. But why do the migration of the close summary now then? Why take the risk if we don't need it now and possibly never? For the single-message keys, you could also argue that we should migrate them in case we add something more to it in the future. I'd think either future-proof them all or future-proof none (preferred).

Member Author

But why do the migration of the close summary now then? Why take the risk if we don't need it now and possibly never?

You brought it up, and it falls under the same case as the other instances fixed in this PR. IMO we might as well bundle these up so we can have one less migration in the future.

Member Author

For the single-message keys, you could also argue that we should migrate them in case we add something more to it in the future.

Now that we're aware of the issue, a migration isn't the only way to resolve this issue for the single-value use case. They can either use another key or some other option.

Contributor

I brought them up as something to check, but didn't mean to say that they should be migrated if it isn't strictly needed.

Migration isn't the only way to resolve this issue for the single-value use case, but neither is it the only way to resolve the issue for multi-valued use cases. Depending on db structure, another key could be added there too.

Comment thread channeldb/migration14/legacy_decoding.go Outdated
Comment thread channeldb/migration14/legacy_decoding.go Outdated
Comment thread channeldb/migration14/legacy_decoding.go Outdated
Comment thread channeldb/codec.go Outdated
@Roasbeef Roasbeef removed the request for review from halseth March 31, 2020 18:26
@Roasbeef Roasbeef modified the milestones: 0.10.0, 0.11.0 Apr 7, 2020
@Roasbeef Roasbeef added database Related to the database/storage of LND migration and removed v0.10 labels Apr 7, 2020
Comment thread channeldb/codec.go Outdated
Contributor

do we know that CommitSig is the only lnwire.Message that isn't length-prefixed?

These are all the messages I found passing through this codec in the channeldb unit tests:

DECODING CommitSig
DECODING UpdateAddHTLC
DECODING UpdateFailHTLC
DECODING UpdateFulfillHTLC
ENCODING CommitSig
ENCODING UpdateAddHTLC
ENCODING UpdateFailHTLC
ENCODING UpdateFulfillHTLC

Member Author

I think one other message is the ChannelReestablish that we store within the channel close summary. In the past I think I had portions of this in this PR, but it's been a bit since I've rebased the PR that adds the actual change in the lnwire logic.

I'm gonna rebase that on top of this, and run the tests there, which in the past have let me catch other issues in this base migration.

Member Author

Oh I think those are the LogUpdates in the greater commit diff.

Member Author

ChannelReestablish should be handled when we migrate the closed channel summaries using the same construct:

  • read out using the old non-length prefixed version
  • write out using the new length prefixed version
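The two steps above can be sketched as a single re-encoding function. This is only an illustration under stated assumptions: the legacy message is taken to have a fixed, known size (`legacyMsgLen`, a made-up constant), the new prefix is assumed to be 2 bytes big-endian, and `migrateRecord` is a hypothetical name rather than anything from the actual migration package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// legacyMsgLen is a hypothetical fixed size for the legacy message. In the
// old format the decoder has to know this implicitly, because nothing on
// disk marks where the message ends.
const legacyMsgLen = 4

// migrateRecord reads a record in the old layout (message immediately
// followed by trailing fields) and rewrites it with a 2-byte length
// prefix in front of the message.
func migrateRecord(old []byte) ([]byte, error) {
	if len(old) < legacyMsgLen {
		return nil, errors.New("legacy record too short")
	}

	// Step 1: read out using the old, non-length-prefixed layout.
	msg, rest := old[:legacyMsgLen], old[legacyMsgLen:]

	// Step 2: write out using the new, length-prefixed layout.
	out := make([]byte, 2, 2+len(old))
	binary.BigEndian.PutUint16(out, uint16(len(msg)))
	out = append(out, msg...)
	out = append(out, rest...)
	return out, nil
}

func main() {
	migrated, err := migrateRecord([]byte("MSG!tail"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", migrated)
}
```

The important property is that the read side and the write side use different codecs, which is why the migration needs its own copy (or a pinned version) of the legacy decoding logic.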

Comment thread channeldb/migration15/migration.go Outdated
Member Author

Need to test this, but I think it's ok @cfromknecht. We'll read things out using the old format here, which includes the log updates without the length prefix. Down below, we'll then write things out using the new format that includes the length prefix for each message.

The ChannelReestablish migration is still missing though, from what I can tell.

Member Author

We migrated the CommitSig message in isolation because we used the raw Encode/Decode methods on it rather than lnwire.[Read/Write]Message.

@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from 41fe926 to 482f0cc Compare June 16, 2020 23:23
@Roasbeef
Member Author

Pushed an update as migration17 (no longer 15).

@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from 482f0cc to e8b496a Compare June 16, 2020 23:25
@Roasbeef
Member Author

Roasbeef commented Jun 16, 2020

So I ran this PR in isolation against an existing node. Everything checked out. I then ran the combined branch with #3966 which shone light on an existing issue (brought up earlier I believe):

  • When the migration is run with #3966 (lnwire: create new ExtraOpaqueData type for parsing TLV extensions), it uses the new lnwire methods that cause it to attempt to continue reading until an EOF is hit.
  • This causes the migration to read out too much data during the translation process, which garbles the wire messages, rendering them unreadable after the migration runs.

If the prior sub-module idea worked like we wanted it to, then we could've pinned this dependency, and avoided this issue.

In order to avoid needing to copy over pretty much all of lnwire, I think we can instead add an intermediate commit to create a slimmed down lnwire.ReadMessage that manually reads in each field for the set of messages we care about. This method would be used in the migration, and would let us avoid having to copy over all the structs.

Thoughts?

@Roasbeef
Member Author

Alternatively, I can instead modify #3966 to actually use the protocol version (pver) argument of lnwire.[Read/Write]Message. As the pver is already threaded down all the way to the individual Encode/Decode methods, we could then make the "TLV read until EOF" only apply to pver=1. The migration would use pver=0 when reading and pver=1 when writing. Everything else would then be updated to use pver=1.

This doesn't add any new copied code and uses an existing extension/backwards-compat feature that has been unused until now. I'm leaning towards this approach myself.
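A minimal sketch of the version-gated decoding idea, assuming a toy message type: the `msg` struct, the constants `legacyVersion` and `tlvVersion`, and the helper `remainingAfterDecode` are all hypothetical names chosen for illustration, not lnwire definitions. Only the shape of the idea (a `pver` argument threaded into `Decode` that switches the read-until-EOF behavior on or off) comes from the comment above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Named versions, so callers don't pass a bare 0 or 1. Both names are
// hypothetical.
const (
	legacyVersion uint32 = 0 // fixed fields only
	tlvVersion    uint32 = 1 // fixed fields, then extra data until EOF
)

// msg is a toy message: a fixed 4-byte body plus optional extra data.
type msg struct {
	Body  [4]byte
	Extra []byte
}

// Decode reads the fixed fields and, only for tlvVersion and later,
// consumes the rest of the reader as extra data.
func (m *msg) Decode(r io.Reader, pver uint32) error {
	if _, err := io.ReadFull(r, m.Body[:]); err != nil {
		return err
	}
	if pver < tlvVersion {
		return nil
	}
	extra, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	m.Extra = extra
	return nil
}

// remainingAfterDecode reports how many bytes of the record are still
// unread after decoding one message with the given version.
func remainingAfterDecode(record []byte, pver uint32) (int, error) {
	r := bytes.NewReader(record)
	var m msg
	if err := m.Decode(r, pver); err != nil {
		return 0, err
	}
	return r.Len(), nil
}

func main() {
	record := []byte("BODYnext-field")
	for _, pver := range []uint32{legacyVersion, tlvVersion} {
		n, err := remainingAfterDecode(record, pver)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pver=%d leaves %d bytes unread\n", pver, n)
	}
}
```

In this model a migration would decode legacy records with `legacyVersion`, so the trailing fields survive, and everything else would use `tlvVersion`.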

@Roasbeef
Member Author

For future migrations, I think this could've been caught if we add a "post-condition" to the dry run mode. In this case, it would've tried to read out a commit diff or some other field, hit this error, then rolled everything back.
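One way to picture the post-condition idea is the sketch below. It is deliberately simplified and makes several assumptions: the database is modeled as an in-memory map (`store`), rollback is modeled by discarding a scratch copy, and `runMigration`, `garble`, and `mustBeReadable` are hypothetical names. It is not how lnd's dry-run mode is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// store is an in-memory stand-in for the database.
type store map[string][]byte

// runMigration applies migrate to a scratch copy of db and then runs the
// verify post-condition against the result. If either step fails, the
// original db is returned untouched, which models a rollback.
func runMigration(db store, migrate, verify func(store) error) (store, error) {
	scratch := make(store, len(db))
	for k, v := range db {
		scratch[k] = append([]byte(nil), v...)
	}
	if err := migrate(scratch); err != nil {
		return db, fmt.Errorf("migration failed: %w", err)
	}
	if err := verify(scratch); err != nil {
		return db, fmt.Errorf("post-condition failed, rolled back: %w", err)
	}
	return scratch, nil
}

// garble is a deliberately broken migration that empties every record.
func garble(s store) error {
	for k := range s {
		s[k] = nil
	}
	return nil
}

// mustBeReadable is a toy post-condition: every record must be non-empty.
// A real check would try to decode each record with the new codec.
func mustBeReadable(s store) error {
	for k, v := range s {
		if len(v) == 0 {
			return errors.New("unreadable record " + k)
		}
	}
	return nil
}

func main() {
	db := store{"chan-1": []byte("old")}
	out, err := runMigration(db, garble, mustBeReadable)
	fmt.Printf("value=%q err=%v\n", out["chan-1"], err)
}
```

The value of the post-condition is that it exercises the read path with the new code before anything is committed, which is the step that would have surfaced the garbled records here.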

@Roasbeef
Member Author

Need to also add a migration for the forwarding packages since they use codec.ReadElements.

@cfromknecht
Contributor

Alternatively, I can instead modify #3966 to actually use the protocol version (pver) argument of lnwire.[Read/Write]Message. As the pver is already threaded down all the way to the individual Encode/Decode methods, we could then make the "TLV read until EOF" only apply to pver=1. The migration would use pver=0 when reading and pver=1 when writing. Everything else would then be updated to use pver=1.

This indeed sounds like the best approach. If we go this route, maybe it makes sense to have a global constant to prevent us from accidentally forgetting what version we are on. Currently it's just a phantom argument that people don't think about, so we would want to make sure that people don't forget to use the right one in the future.

@cfromknecht cfromknecht modified the milestones: 0.11.0, 0.12.0 Jun 27, 2020
@cfromknecht cfromknecht added v0.12 and removed v0.11 labels Jun 27, 2020
@Roasbeef Roasbeef force-pushed the db-wire-msg-length-prefix branch from e8b496a to 8d69392 Compare July 24, 2020 23:22
@Roasbeef Roasbeef requested review from cfromknecht and halseth July 25, 2020 01:42
@Roasbeef
Member Author

Alternatively, I can instead modify #3966 to actually use the protocol version (pver) argument of lnwire.[Read/Write]Message.

Updated the other PR to implement this. This change ties these two PRs closer to each other once more, as #3966 won't pass the unit tests without this PR. I want to combine them, but then the resulting diff would be rather large.

@Roasbeef
Member Author

Need to also add a migration for the forwarding packages since they use codec.ReadElements.

Correction, the forwarding packages are OK since they write out to a value that only has the log updates and nothing else. The issue was with lnwire, which is now resolved in #3966.

@Roasbeef Roasbeef requested a review from joostjager July 25, 2020 01:45
err = closedChanBucket.ForEach(func(k, v []byte) error {
	closedChans = append(closedChans, closedChan{
		chanKey:      k,
		summaryBytes: v,
Contributor

is this safe without copying the slice?

Member Author

The bytes aren't used outside the transaction, so I think so.
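For context on the question: bolt-style stores document that key and value slices handed to an iteration callback are only valid for the life of the transaction, so retaining them beyond that requires a copy. The sketch below illustrates the aliasing hazard in general terms. It is an assumption-heavy model: `forEach` is a made-up iterator that reuses one buffer between callbacks (this is not a claim about how bbolt manages memory), and `collect` is a hypothetical helper.

```go
package main

import "fmt"

// closedChan mirrors the shape of the struct in the excerpt above.
type closedChan struct {
	chanKey      []byte
	summaryBytes []byte
}

// forEach is a stand-in iterator whose k and v slices are only valid
// during the callback: it reuses the same backing buffers on every call.
func forEach(pairs [][2]string, fn func(k, v []byte)) {
	kbuf, vbuf := make([]byte, 16), make([]byte, 16)
	for _, p := range pairs {
		nk := copy(kbuf, p[0])
		nv := copy(vbuf, p[1])
		fn(kbuf[:nk], vbuf[:nv])
	}
}

// collect gathers every pair, optionally cloning the slices so they stay
// valid after the iterator moves on.
func collect(pairs [][2]string, clone bool) []closedChan {
	var out []closedChan
	forEach(pairs, func(k, v []byte) {
		if clone {
			k = append([]byte(nil), k...)
			v = append([]byte(nil), v...)
		}
		out = append(out, closedChan{chanKey: k, summaryBytes: v})
	})
	return out
}

func main() {
	pairs := [][2]string{{"k1", "v1"}, {"k2", "v2"}}
	fmt.Printf("aliased first key: %q\n", collect(pairs, false)[0].chanKey)
	fmt.Printf("cloned first key:  %q\n", collect(pairs, true)[0].chanKey)
}
```

If the collected slices are consumed before the transaction ends, as the reply above says is the case here, the copy can be skipped; cloning is the conservative choice whenever that lifetime is in doubt.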

@joostjager
Contributor

Updated the other PR to implement this. This change ties these two PRs closer to each other once more, as #3966 won't pass the unit tests without this PR. I want to combine them, but then the resulting diff would be rather large.

And how about combining this PR with #3966, excluding the addition of ExtraOpaqueData in all messages?

Contributor

@joostjager joostjager left a comment

What do you think about making these 'pure copy' commits? That would make review a lot easier. Something like commit 1: direct copy of legacy serialization, 2: modify main serialization code, 3: direct copy of new serialization code, 4: add migration. Then only 2 and 4 need to be reviewed.

With this change, errors from migrations will have the proper local line
number.
In this commit, we modify the way we write wire messages across the
entire database. We'll now ensure that we always write wire messages
with a length prefix. We update the `codec.go` file to always write a
2-byte length prefix; this affects the way we write the `CommitDiff` and
`LogUpdates` structs to disk. We also need to migrate the network results
bucket in the switch as it includes a wire message without a length
prefix.
@Roasbeef
Member Author

Roasbeef commented Nov 5, 2020

This PR is going away now (in favor of the combined version), but I pushed up a rebased version for posterity.

Labels

database Related to the database/storage of LND migration v0.12

4 participants