MPP: Enable MultiPathPayments for payment lifecycle #3970
Force-pushed: dd1b592 to 389e145
Force-pushed: 389e145 to 97268ac
joostjager left a comment
Haven't reviewed the pr in total, but wanted to drop this high level comment right away.
I think this is a good idea. Please hold off review until I've pushed a version doing it this way (or found that it is not feasible).
Can be rebased now (on top of the changes for the modification referenced above).
Force-pushed: e276580 to 02f9757
New version pushed, review can be resumed. (Unit test changes still WIP.)
To move towards how we will handle existing attempts in the MPP case (their outcomes will be collected in goroutines separate from the payment loop), we now collect their outcomes first. To easily fetch HTLCs that are still unresolved, we add the utility method InFlightHTLCs to channeldb.MPPayment.
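As a rough illustration of what such a helper does, here is a minimal sketch. The types below are simplified stand-ins invented for this example (boolean Settled/Failed flags instead of the richer records the real channeldb types carry), so treat it as an assumption about the shape, not as the code from this PR.

```go
package main

// HTLCAttempt is a simplified, hypothetical stand-in for a stored payment
// attempt. The real channeldb type carries route and outcome details.
type HTLCAttempt struct {
	AttemptID uint64
	Settled   bool // a preimage has been recorded for this attempt
	Failed    bool // a failure has been recorded for this attempt
}

// MPPayment is a simplified stand-in for channeldb.MPPayment.
type MPPayment struct {
	HTLCs []HTLCAttempt
}

// InFlightHTLCs returns the attempts that have neither settled nor failed,
// i.e. the ones whose outcome still has to be collected.
func (m *MPPayment) InFlightHTLCs() []HTLCAttempt {
	var inFlight []HTLCAttempt
	for _, h := range m.HTLCs {
		if h.Settled || h.Failed {
			continue
		}
		inFlight = append(inFlight, h)
	}
	return inFlight
}
```

With a helper like this, the payment loop can first resume collecting results for every unresolved attempt before deciding whether to launch anything new.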
In preparation for MPP we return the terminal errors recorded with the control tower. The reason is that we cannot return immediately when a shard fails for MPP, since there might be more shards in flight that we must wait for. For that reason we instead mark the payment failed in the control tower, then return this error when we inspect the payment, seeing it has been failed and there are no shards in flight.
Force-pushed: 2d68fb3 to a426f1e
Got this on shutdown: Maybe exempt logging for router shut down
joostjager left a comment
I am ready to roll with this monster PR. Only functional outstanding issue is the inconsistency of the state reporting between TrackPayment and ListPayments. Discussed offline to address in a follow-up.
There is probably a bit more unification work to be done to bring those two RPC calls closer together. Also, the response of routerrpc.SendToRoute could be a message of type lnrpc.HTLCAttempt.
Overall great work to make the switch-over to multi-part sends possible. Definitely worth a mention in the Lightning history books 📖
Probably still needed for the legacy sendpayment response that returns the success route on the payment level.
This commit redefines how the control tower handles shard and payment level settles and failures. We now consider the payment in flight as long as it has active shards, or it has no active shards but has not reached a terminal condition (a settle of one of the shards, or a payment level failure). We also make it possible to settle/fail shards regardless of the payment level status (since we must allow late shards to record their status even though we have already settled/failed the payment). Finally, we make it possible to Fail the payment when it is already failed. This allows multiple concurrent shards that reach terminal errors to mark the payment failed without having to synchronize.
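The rules above can be sketched as a small status function. All names here are hypothetical and the data model is reduced to booleans; in particular, keeping the first recorded failure reason on repeated Fail calls is an assumption made for the example, not something the commit message states.

```go
package main

// PaymentStatus is a hypothetical three-state status for this sketch.
type PaymentStatus int

const (
	StatusInFlight PaymentStatus = iota
	StatusSucceeded
	StatusFailed
)

// shard is a simplified attempt: neither flag set means still active.
type shard struct {
	settled bool
	failed  bool
}

type payment struct {
	shards        []shard
	failureReason *string // payment level failure, nil if none recorded
}

// Fail records a payment level failure. It may be called repeatedly by
// concurrent shards. Assumption: the first recorded reason is kept.
func (p *payment) Fail(reason string) {
	if p.failureReason == nil {
		p.failureReason = &reason
	}
}

// status derives the payment status from shard and payment level state.
func (p *payment) status() PaymentStatus {
	active, settled := false, false
	for _, s := range p.shards {
		switch {
		case s.settled:
			settled = true
		case !s.failed:
			active = true
		}
	}

	switch {
	case active:
		// Outstanding shards must finish before a final verdict.
		return StatusInFlight
	case settled:
		// A settle wins even if a failure was recorded earlier.
		return StatusSucceeded
	case p.failureReason != nil:
		return StatusFailed
	default:
		// No active shards, but no terminal condition reached yet.
		return StatusInFlight
	}
}
```

Because the status is computed on read, shards never need to coordinate on writes: each one records its own outcome, and any of them may call Fail.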
(almost) PURE CODE MOVE. The only code change is to change a few select cases from case _ <- channel: to case <- channel: to please the linter. The test is testing the payment lifecycle, so move it to payment_lifecycle_test.go.
This also fixes a test bug that the manually created route didn't match the actual payment amount in the test cases, and adds some fees to the route.
Also rename Success to SettleAttempt in the tests.
And modify the MissionControl mock to return a non-nil failure reason in this case.
This commit finally enables MP payments within the payment lifecycle (used for SendPayment). This is done by letting the loop launch shards as long as there is value remaining to send, inspecting the outcomes for the sent shards when the full payment amount has been filled. The method channeldb.MPPayment.SentAmt() is added to easily look up how much value we have sent for the payment.
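A minimal, synchronous sketch of that loop follows. The types, the single-value SentAmt, and the requestRoute callback are all hypothetical simplifications for illustration; the real lifecycle launches shards asynchronously and works with routes, not bare amounts.

```go
package main

// attempt is a simplified shard record for this sketch.
type attempt struct {
	amt    int64
	failed bool
}

// mpPayment is a simplified stand-in for channeldb.MPPayment.
type mpPayment struct {
	totalAmt int64
	htlcs    []attempt
}

// SentAmt sums the value of all attempts that have not failed, i.e. the
// value that is either settled or still in flight.
func (p *mpPayment) SentAmt() int64 {
	var sent int64
	for _, h := range p.htlcs {
		if h.failed {
			continue
		}
		sent += h.amt
	}
	return sent
}

// launchShards registers shards until the full payment amount is covered.
// requestRoute is a hypothetical callback: given the maximum amount still
// needed, it returns the amount the found route carries, or false if no
// route is available. It reports whether the amount was fully covered.
func launchShards(p *mpPayment, requestRoute func(maxAmt int64) (int64, bool)) bool {
	for {
		remaining := p.totalAmt - p.SentAmt()
		if remaining <= 0 {
			return true
		}

		amt, ok := requestRoute(remaining)
		if !ok || amt <= 0 || amt > remaining {
			return false
		}

		p.htlcs = append(p.htlcs, attempt{amt: amt})
	}
}
```

When a shard later fails, its value drops out of SentAmt, so calling launchShards again naturally requests a replacement for exactly the missing amount.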
This commit enables MPP sends for SendToRoute, by allowing another payment attempt to be launched if the hash is already registered with the ControlTower. We also set the total payment amount of the payment from the MPP record, to indicate that the shard value might be different from the total payment value. We only mark non-MPP payments as failed in the database after encountering a failure, since we might want to try more shards for MPP. For now this means that MPP SendToRoute payments will be failed only after a restart has happened.
We add validation making sure we are not trying to register MPP shards for non-MPP payments, and vice versa. We also add validation of the total sent amount against the payment value, and of matching MPP options. We also add methods for copying Route/Hop, since they are useful for modifying the route amount in the test.
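The checks described here might look roughly like the sketch below. The error variables, struct layouts, and the rule that a non-MPP attempt must carry the exact payment value are assumptions made for this example; only the categories of checks come from the commit message.

```go
package main

import "errors"

// Hypothetical sentinel errors for this sketch.
var (
	errMPPayment        = errors.New("non-MPP attempt for MPP payment")
	errNonMPPayment     = errors.New("MPP shard for non-MPP payment")
	errMPPAddrMismatch  = errors.New("MPP payment address mismatch")
	errMPPTotalMismatch = errors.New("MPP total amount mismatch")
	errValueMismatch    = errors.New("non-MPP attempt must carry full value")
	errValueExceedsAmt  = errors.New("shards exceed payment value")
)

// mppRecord holds the MPP options carried by a shard.
type mppRecord struct {
	paymentAddr [32]byte
	totalAmt    int64
}

// shard is a simplified attempt; mpp is nil for a non-MPP attempt.
type shard struct {
	amt int64
	mpp *mppRecord
}

// payment holds the payment value and its attempts that have not failed.
type payment struct {
	value  int64
	shards []shard
}

// validateShard checks a new shard against the already registered ones.
func validateShard(p *payment, s shard) error {
	var sent int64
	for _, prev := range p.shards {
		sent += prev.amt

		switch {
		case s.mpp == nil && prev.mpp != nil:
			return errMPPayment
		case s.mpp != nil && prev.mpp == nil:
			return errNonMPPayment
		case s.mpp == nil:
			continue
		}

		// Both are MPP shards, so their options must match.
		if s.mpp.paymentAddr != prev.mpp.paymentAddr {
			return errMPPAddrMismatch
		}
		if s.mpp.totalAmt != prev.mpp.totalAmt {
			return errMPPTotalMismatch
		}
	}

	// Assumption: a non-MPP attempt always carries the full value.
	if s.mpp == nil && s.amt != p.value {
		return errValueMismatch
	}
	if sent+s.amt > p.value {
		return errValueExceedsAmt
	}
	return nil
}
```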
testSendToRouteMultiPath tests that we are able to successfully route a payment using multiple shards across different paths, by using SendToRoute. Co-authored-by: Joost Jager <joost.jager@gmail.com>
We whitelist a set of "expected" errors that can be returned from RequestRoute, by converting them into a new type noRouteError. For any other error returned by RequestRoute, we'll now exit immediately.
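One way to picture this is a dedicated error type that the payment loop can test for. The type name noRouteError comes from the text above; the specific values, the errDatabase example, and the shouldExit helper are hypothetical and only illustrate the pattern.

```go
package main

import "errors"

// noRouteError marks errors from RequestRoute that are "expected", such as
// path finding coming up empty. The concrete values are hypothetical.
type noRouteError uint8

const (
	errNoPathFound noRouteError = iota
	errInsufficientBalance
)

// Error implements the error interface.
func (e noRouteError) Error() string {
	switch e {
	case errNoPathFound:
		return "unable to find a path to destination"
	case errInsufficientBalance:
		return "insufficient local balance"
	default:
		return "unknown no-route error"
	}
}

// errDatabase is a hypothetical example of an unexpected error.
var errDatabase = errors.New("database failure")

// shouldExit reports whether the payment loop should abort immediately.
// Expected no-route errors return false, so the caller can handle them
// without tearing down the loop. Any other non-nil error returns true.
func shouldExit(err error) bool {
	if err == nil {
		return false
	}
	var nre noRouteError
	return !errors.As(err, &nre)
}
```

Using errors.As instead of a direct type assertion means the classification still works if a caller wraps the error with extra context.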
Fixed
Fixed.
Force-pushed: c18af65 to 5e72a4b
cfromknecht left a comment
Latest version LGTM, congrats @halseth on topping the leaderboard with a new most-commented PR to lnd! 🏆 Super excited to get this deployed and get rolling with the follow-ups to complete the MPP picture.
This PR makes the changes needed to support multi path payments from the payment lifecycle's perspective, by sending and collecting multiple payment shards simultaneously.
These are the parts that see the most significant change in this PR:
PaymentSession
The RequestRoute call now takes maxAmt and feelimit as arguments. The idea here is that it will use these to create a route that carries an amount up to this maximum. The actual splitting is handled in #3967, so note that in this PR the PaymentSession will always create a single shard with the full amount.
ControlTower (channeldb.payment_control)
To allow each payment to have multiple HTLCs in flight at the same time, and to handle all the cases of the shards failing/settling individually, we redefine the ControlTower (and loosen up its strictness a bit).
We now allow registering new attempts for a payment as long as the payment is in flight and has not reached a terminal condition. A terminal condition is either a settled shard, or one of the shards encountering a failure that should fail the payment. In these cases we want to wait for all outstanding shards to finish, but we don't want to register any new ones.
Settling/failing HTLC attempts is now handled independently of the payment level status.
Failing the payment is now allowed as long as the payment is known. The idea here is that any shard that encounters a terminal error can fail the payment, but the payment status is not actually failed until all shards have finished and none of them has settled. This simplifies the interaction between shards, as they don't have to coordinate to record the ultimate payment outcome in the database. It also handles the case where the payment is in the process of being failed, but then one of the late shards settles, overriding the recorded failure.
tl;dr: We record shard settles/fails individually for each HTLC, and any shard can fail the payment, but when we read the payment status we do it in this order:
1. PaymentStatus=InFlight
2. PaymentStatus=Succeeded
3. PaymentStatus=Failed
SendPayment/PaymentLifecycle
The result returned from SendPayment is changed slightly. In cases of terminal payment failures, we would previously return a string describing the last error we encountered during path finding. This was mainly done to be able to get the failure outcome for a SendToRoute attempt, since for a regular SendPayment many attempts can be made before an error is returned, and in that case the last error isn't worth much more than any of the other errors encountered.
With this insight we now remove this lastError "hack" (see SendToRoute section below), and return a generic no_route error from SendPayment when path finding ultimately fails. With the introduction of HTLC attempt tracking in #3989, we can instead get the individual HTLC errors from a ListPayments call.
The main reason we wanted to remove this lastError tracking is that it makes even less sense in a multi-shard scenario. When several concurrent shards fail, there is really no such thing as a "last error", and it makes more sense to look up the HTLC outcomes individually.
To enable multi-shard sending, the payment lifecycle will now request shards up to a given amount from the PaymentSession, and use the resulting route to create a shard. Shards will be launched as long as there is remaining value to be sent. A payment is considered successful the moment a preimage comes back, and failed once at least one of the shards has encountered a terminal error and all outstanding shards have failed.
SendToRoute
With the abstractions done in the payment lifecycle, SendToRoute no longer needs to be launched using a payment lifecycle. Instead we call launchShard and collectResult directly, avoiding having to use a prebuilt payment session. Note that for SendToRoute payments, we always record a failed shard as a "terminal error" in the database, since we can never know whether it is the last SendToRoute call that will be made to this payment hash. This works out nicely with the new ControlTower setup.
TODO:
Builds on #4000