channeldb: write the CommitSig in CommitDiff in LN wire format#4099
channeldb: write the CommitSig in CommitDiff in LN wire format#4099Roasbeef wants to merge 3 commits into
Conversation
89a60c1 to
d0342c1
Compare
d0342c1 to
49a8422
Compare
|
@Roasbeef linter is failing https://travis-ci.org/github/lightningnetwork/lnd/jobs/664666715#L737, latest version looking pretty close tho 👍 |
49a8422 to
ed4a948
Compare
There was a problem hiding this comment.
One drive-by comment: here we are still relying on packages that may change and break the migration. These are basic types, so it is unlikely, but the migration isn't fully self-contained like for example a sql migration script would be or migration 13.
There was a problem hiding this comment.
The main package here that impacts the migration is lnwire, specifically the way we write wire messages to disk. I don't think this will be changing any time soon. If they attributes or types of any of the imported packages change, then if they materially affect the migration, this PR should no longer compile.
There was a problem hiding this comment.
Especially the wire format is something that isn't set in stone, but of course it depends on what our support period is for this migration. Also a future change may be detected because the system doesn't build anymore, but that does mean that more copying needs to happen at that point. Touching an old migration to make it build again is a risk. I favor the fully isolated low-level approach.
There was a problem hiding this comment.
Copying is essentially zero cost, it's all effectively dead code anyway. As mentioned offline, making migration14 into a module should resolve any issues with the referenced types/packages changing under us (so it should always build given the deps are static).
5192edf to
63ef302
Compare
|
Just pushed an update to this PR to make it now work properly with #3966. If you merge these two PRs into each other, all the unit tests should now pass. The changes in this new version are:
The "migration" commits themselves are still small, as |
joostjager
left a comment
There was a problem hiding this comment.
I've looked at stack traces of calls to WriteMessage while running the itest suite. Aren't these additional places where a migration is needed?
- Discovery message store
- Channel close summary (last sync message)
- Forwarding package (separate
putLogUpdate)
Maybe there is more.
There was a problem hiding this comment.
Especially the wire format is something that isn't set in stone, but of course it depends on what our support period is for this migration. Also a future change may be detected because the system doesn't build anymore, but that does mean that more copying needs to happen at that point. Touching an old migration to make it build again is a risk. I favor the fully isolated low-level approach.
|
I would reconsider merging this for 0.10. Touching the persistent channel state is risky regardless of how big or small the migration is. If we need to, it is safer to do it at the beginning of the cycle. As far as I can see, this pr and #3966 are only refactorings and don't add new functionality or fixes, so why take the risk? |
|
Updated to now migrate the channel close summaries. We don't need to migrate the forwarding package or the discovery message store as those wire messages are the sole value being stored in the key. The channel summary migration itself is forwarding looking as we don't store anything after the message, but this change allows us to append new fields to the close summary in the future. Please see the fixup commits as I had to modify some of the legacy decode/encode methods to bypass |
There was a problem hiding this comment.
Updated to now migrate the channel close summaries. We don't need to migrate the forwarding package or the discovery message store as those wire messages are the sole value being stored in the key. The channel summary migration itself is forwarding looking as we don't store anything after the message, but this change allows us to append new fields to the close summary in the future.
Good news that the other messages are stored with their own key. But why do the migration of the close summary now then? Why take the risk if we don't need it now and possibly never? For the single-message keys, you could also argue that we should migrate them in case we add something more to it in the future. I'd think either future-proof them all or future-proof none (preferred).
There was a problem hiding this comment.
y. But why do the migration of the close summary now then? Why take the risk if we don't need it now and possibly never?
You brought it up, and it falls under the same case as the other instances fixed in this PR. IMO we might as well bundle these up so we can have one less migration in the future.
There was a problem hiding this comment.
For the single-message keys, you could also argue that we should migrate them in case we add something more to it in the future.
Now that we're aware of the issue, a migration isn't the only way to resolve this issue for the single-value use case. They can either use another key or some other option.
There was a problem hiding this comment.
I brought them up as something to check, but didn't mean to say that they should be migrated if it isn't strictly needed.
Migration isn't the only way to resolve this issue for the single-value use case, but neither is it the only way to resolve the issue for multi-valued use cases. Depending on db structure, another key could be added there too.
There was a problem hiding this comment.
do we know that CommitSig is the only lnwire.Message that isn't length-prefixed?
These are all the messages I found passing through this codec in channeldb unit tests
DECODING CommitSig
DECODING UpdateAddHTLC
DECODING UpdateFailHTLC
DECODING UpdateFulfillHTLC
ENCODING CommitSig
ENCODING UpdateAddHTLC
ENCODING UpdateFailHTLC
ENCODING UpdateFulfillHTLC
There was a problem hiding this comment.
I think one other message is the ChannelReestablish that we store within the channel close summary. In the pst I think I had portions of this in this PR, but it's been a bit since I've rebased the PR that adds the actual change in the lnwire logic.
I'm gonna rebase that on top of this, and run the tests there, which in the past have let me catch other issues in this base migration.
There was a problem hiding this comment.
Oh I think those are the LogUpdates in the greater commit diff.
There was a problem hiding this comment.
ChannelReestablish should be handled when we migrate the closed channel summaries using the same construct:
- read out using the old non-length prefixed version
- write out using the new length prefixed version
There was a problem hiding this comment.
Need to test this, but I think it's ok @cfromknecht. We'll read things out using the old format here, which includes the log updates without the length prefix. Down below, we'll then write things out using the new format that includes the length prefix for each message.
The chan reest migration is still missing though from what I can tell.
There was a problem hiding this comment.
We migrated the CommitSig message in isolation because we used the raw Encode/Decode methods on it rather than lnwire.[Read/Write]Message.
41fe926 to
482f0cc
Compare
|
Pushed an update as |
482f0cc to
e8b496a
Compare
|
So I ran this PR in isolation against an existing node. Everything checked out. I then ran the combined branch with #3966 which shone light on an existing issue (brought up earlier I believe):
If the prior sub-module idea worked like we wanted it to, then we could've pinned this dependnacy, and avoided this issue. In order to avoid needing to copy over pretty much all of Thoughts? |
|
Alternatively, I can instead modify #3966 to actually use the protocol version ( This doesn't add any new copied code and uses an existing extension/backwards-compat feature that has been unused until now. I'm leaning towards this approach myself. |
|
For future migrations, I think this could've been caught if we add a "post-condition" to the dry run mode. In this case, it would've tried to read out a commit diff or some other field, hit this error, then rolled everything back. |
|
Need to also add a migration for the forwarding packages since they use |
This indeed sounds like the best approach. If we go this route, maybe it makes sense to have a global constant to prevent us from accidentally forgetting what version we are on. Currently it's just a phantom argument that people don't think about, so we would want to make sure that people don't forget to use the right one in the future. |
e8b496a to
8d69392
Compare
Updated the other PR to implement this. This change ties these two PRs closer to each other once more, as #3966 won't pass the unit tests without this PR. I want to combine them, but then the resulting diff would be rather large. |
Correction, the forwarding packages are OK since they write out to a value that only has the log updates and nothing else. The issue was with |
| err = closedChanBucket.ForEach(func(k, v []byte) error { | ||
| closedChans = append(closedChans, closedChan{ | ||
| chanKey: k, | ||
| summaryBytes: v, |
There was a problem hiding this comment.
is this safe without copying the slice?
There was a problem hiding this comment.
The bytes aren't used outside the transaction, so I think so.
And how about combine this PR with #3966 excluding the addition of |
joostjager
left a comment
There was a problem hiding this comment.
What do you think about making these 'pure copy' commits? That would make review a lot easier. Something like commit 1: direct copy of legacy serialization, 2: modify main serialization code, 3: direct copy of new serialization code, 4: add migration. Then only 2 and 4 need to be reviewed.
With this change, errors from migrations will have the proper local line number.
In this commit, we modify the way we write wire messages across the entire database. We'll now ensure that we always write wire messages with a length prefix. We update the `codec.go` file to always write a 2 byte length prefix, this affects the way we write the `CommitDiff` and `LogUpdates` struct to disk. We also need to migrate the network results bucket in the switch as it includes a wire message without a length prefix.
|
This PR is going away now (in favor of the combined version), but I pushed up a rebased version for posterity. |
8d69392 to
b5876b2
Compare
In this commit, we modify the way we write the CommitSIg in certain locations to match the way it's encoded on the wire. With this change, we'll now properly add a length prefix in front of the message, meaning we'll be able to read the extra data included (if present), and still be able to serialize content after this message.
We add a simple migration to migrate all the current pending commitments, along with a test. We opt to do the simple enclose-copy method for the migration, which results in a larger diff, but IMO an easier to understand migration test.
This PR is needed for #3966 to pass the tests properly.