
Fix the multi-part payments #3857

Merged · 24 commits · Jul 23, 2020

Conversation

@cdecker (Member) commented Jul 19, 2020

Fixes #3852
Fixes #3851
Fixes #3855

Removes the scary warning about bcli not being able to auto-connect to the JSON-RPC:
Fixes #3553
Fixes #3859

@cdecker cdecker added this to the v0.9.0 milestone Jul 19, 2020
@cdecker cdecker self-assigned this Jul 19, 2020
Review threads (all resolved): lightningd/pay.c, plugins/libplugin-pay.c ×3
@cdecker (Member, Author) commented Jul 20, 2020

Cherry-picked the contents of #3856 as suggested by @ZmnSCPxj but maintained authorship, since credit goes to @vincenzopalazzo and @ZmnSCPxj for finding this and fixing it before I could even look at it 👍

@cdecker (Member, Author) commented Jul 21, 2020

Rebased and squashed after merging #3860.

@rustyrussell (Contributor) commented Jul 21, 2020

OK, this passed the Twitter test!

Minor issue: it splits 2x too aggressively; proposed fix here:
rustyrussell@49cb5ba

Actually, I've just pushed it on the end, along with another fix I lost somehow :(

@rustyrussell (Contributor) commented:

Ack d6bb191

@ZmnSCPxj (Collaborator) commented Jul 22, 2020

Perhaps late to bring this up, but: do we need to actually arrange payments in a tree as opposed to a flat array of sub-payments?

The reason I bring this up is because we might want to merge sub-payments. For example, consider the case where the only viable paths for two sub-payments cost too much (i.e. fee budget is violated), but if we merge them (and their fee budgets) we could succeed.

But if the sub-payments we want to merge are cousins instead of siblings in the tree of payments, how do we viably merge them? For example, consider the case below:

        0
       / \
      /   \
     1     2
    / \   / \
   3   4 5   6

Suppose 4 and 6 are currently ongoing (possibly because they reached the destination, so the destination is waiting on the other parts). And 3 and 5 are failing getroute because the viable routes are too expensive (e.g. 4 and 6 have locked up funds on cheaper routes). However, if we merge 3 and 5, it may be feasible (because they would pool their fee budgets together).

As each subpayment needs a unique numeric id, we could use a monotonically-increasing counter and add an explicit field. Merged payments can get any of the ids of the merged payments.

And of course modules have to handle merges as well, if their data structures are not void.
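
A minimal illustrative sketch of the flat-array idea (hypothetical types and names, not the actual libplugin-pay.c structures): each sub-payment carries an explicit id from a monotonically-increasing counter, and merging two parts, whether siblings or cousins, is just pooling their amounts and fee budgets:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

struct subpayment {
	uint64_t id;          /* explicit id from the monotonic counter */
	uint64_t amount_msat; /* amount this part carries */
	uint64_t fee_budget;  /* fee budget allotted to this part */
	int live;             /* still an active part of the payment? */
};

static uint64_t next_id = 1;

/* Merge b into a: pool amounts and fee budgets; keep a's id (any of the
 * merged ids would do, as noted above). */
static void subpayment_merge(struct subpayment *a, struct subpayment *b)
{
	assert(a->live && b->live);
	a->amount_msat += b->amount_msat;
	a->fee_budget += b->fee_budget;
	b->live = 0;
}

int main(void)
{
	/* Parts 3 and 5 from the example: cousins in the tree, but plain
	 * neighbors in a flat array, so merging them is trivial. */
	struct subpayment p3 = {next_id++, 250000, 100, 1};
	struct subpayment p5 = {next_id++, 250000, 120, 1};
	subpayment_merge(&p3, &p5);
	printf("merged part %" PRIu64 ": %" PRIu64 "msat, budget %" PRIu64 "\n",
	       p3.id, p3.amount_msat, p3.fee_budget);
	return 0;
}
```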

@rustyrussell (Contributor) commented:

Trivial rebase for typo fix.

@cdecker (Member, Author) commented Jul 22, 2020

> Perhaps late to bring this up, but: do we need to actually arrange payments in a tree as opposed to a flat array of sub-payments?
>
> The reason I bring this up is because we might want to merge sub-payments. For example, consider the case where the only viable paths for two sub-payments cost too much (i.e. fee budget is violated), but if we merge them (and their fee budgets) we could succeed.
>
> But if the sub-payments we want to merge are cousins instead of siblings in the tree of payments, how do we viably merge them? For example, consider the case below:
>
>         0
>        / \
>       /   \
>      1     2
>     / \   / \
>    3   4 5   6
>
> Suppose 4 and 6 are currently ongoing (possibly because they reached the destination, so the destination is waiting on the other parts). And 3 and 5 are failing getroute because the viable routes are too expensive (e.g. 4 and 6 have locked up funds on cheaper routes). However, if we merge 3 and 5, it may be feasible (because they would pool their fee budgets together).
>
> As each subpayment needs a unique numeric id, we could use a monotonically-increasing counter and add an explicit field. Merged payments can get any of the ids of the merged payments.
>
> And of course modules have to handle merges as well, if their data structures are not void.

Discussions are always welcome, though I don't think that this should be discussed in a PR where it is likely to get lost, so I opened a new issue for this.

The TL;DR is that I think the merging or not of payments is independent of how we link the payments internally. We can represent the tree in an array just as well, and as you point out the partids are consecutive already, so we could just also stash them in an additional array.

I'm wondering whether we'd want to merge at all, or whether other techniques such as fee stealing from other subpayments that we consider to "have worked" (TM) are more promising.
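
For illustration (hypothetical layout, not the real payment structs): the tree can live in a flat array by giving each part a parent index, so the linkage question is orthogonal to the merge question:

```c
#include <stdint.h>

/* Hypothetical flat layout of the payment tree: partids stay consecutive,
 * and an explicit parent index (-1 for the root) preserves the tree shape. */
struct part {
	uint64_t partid;
	int parent;
};

/* The tree from the quoted comment: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6). */
static const struct part tree[] = {
	{0, -1}, {1, 0}, {2, 0}, {3, 1}, {4, 1}, {5, 2}, {6, 2},
};
```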

With the presplitter in particular we would have n attempts but the array
contains n+1 entries, which is kinda weird.
This was highlighted in #3851, so I added an assertion. After the rewrite in
the next commit we would simply skip if any of the constraints were not
maintained, but this serves as the canary in the coal mine, so we don't paper over the issue.
We now check against both constraints on the modifier and the payment before
applying either. This "fixes" the assert that was causing the crash in #3851,
but we are still looking for the source of the inconsistency where the
modifier constraints, initialized to 1/4th of the payment, suddenly get more
permissive than the payment itself.
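A hedged sketch of the double check described above (illustrative names, not the real libplugin-pay.c API): a deduction is applied only if it fits both the modifier's constraints and the payment's, so neither can silently become more permissive than the other:

```c
#include <stdbool.h>
#include <stdint.h>

struct constraints {
	uint64_t fee_budget_msat;
};

/* Apply a fee deduction only if it fits BOTH the modifier's constraints
 * and the payment's; otherwise leave both untouched and report failure. */
static bool deduct_fee_checked(struct constraints *modifier,
			       struct constraints *payment,
			       uint64_t fee_msat)
{
	if (fee_msat > modifier->fee_budget_msat ||
	    fee_msat > payment->fee_budget_msat)
		return false;
	modifier->fee_budget_msat -= fee_msat;
	payment->fee_budget_msat -= fee_msat;
	return true;
}
```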
Mainly used for testing so we make sure we exclude or constrain the correct
channels. Test to follow.
We skip most payment steps and all sub-payments, so consolidate the skip
conditions in one if-statement. We also now use `payment_set_step` to skip any
modifiers that run after us once the step changes.
When using MPP we need to always have partid > 0; since we bumped the partid
for the root but not the next_id, we'd end up with partid=1 being
duplicated. Not a big problem, since we never ended up sending the root to
lightningd, instead skipping it, but it was confusing me while trying to trace
a sub-payment's ancestry.
cdecker and others added 9 commits July 22, 2020 16:14
We were using the current constraints, including any shadow route and other
modifications, when computing the remainder that the second child should
use. Instead we should use the `start_constraints` on the parent payment,
which is a copy of `constraints` created in `payment_start` exactly for this
purpose.

Also added an assert for the invariant on the multiplier.
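
A sketch of that invariant (the `constraints`/`start_constraints` field names follow the commit message; the rest is illustrative): snapshot the constraints in `payment_start`, then compute the second child's remainder from the snapshot rather than from the possibly-modified values:

```c
#include <assert.h>
#include <stdint.h>

struct constraints {
	uint64_t fee_budget_msat;
};

struct payment {
	struct constraints constraints;       /* mutated by modifiers */
	struct constraints start_constraints; /* snapshot from payment_start */
};

/* Taken at payment_start, before any modifier (e.g. the shadow route)
 * has a chance to spend part of the budget. */
static void snapshot_constraints(struct payment *p)
{
	p->start_constraints = p->constraints;
}

/* The second child gets the remainder of the ORIGINAL budget; the assert
 * mirrors the invariant mentioned above. */
static uint64_t second_child_fee_budget(const struct payment *parent,
					uint64_t first_child_budget)
{
	assert(parent->start_constraints.fee_budget_msat >= first_child_budget);
	return parent->start_constraints.fee_budget_msat - first_child_budget;
}
```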
Because listchannels responses are async, with MPP we can end up
fighting over use of the parent's data, giving results ranging from incorrect
final CLTVs to assertion failures like the one below:

```
pay: plugins/libplugin-pay.c:1964: shadow_route_listchannels: Assertion `amount_msat_greater_eq(p->constraints.fee_budget, d->constraints.fee_budget)' failed.
pay: FATAL SIGNAL 6 (version v0.9.0rc2-11-g29d32e0-modded)
0x563129eb6a15 send_backtrace
	common/daemon.c:38
0x563129eb6abf crashdump
	common/daemon.c:51
0x7fe37c8b920f ???
	???:0
0x7fe37c8b918b ???
	???:0
0x7fe37c898858 ???
	???:0
0x7fe37c898728 ???
	???:0
0x7fe37c8a9f35 ???
	???:0
0x563129ea5c63 shadow_route_listchannels
	plugins/libplugin-pay.c:1964
0x563129e9c56a handle_rpc_reply
	plugins/libplugin.c:547
0x563129e9cbfa rpc_read_response_one
	plugins/libplugin.c:662
0x563129e9ccf0 rpc_conn_read_response
	plugins/libplugin.c:681
0x563129ece660 next_plan
	ccan/ccan/io/io.c:59
0x563129ecf245 do_plan
	ccan/ccan/io/io.c:407
0x563129ecf287 io_ready
	ccan/ccan/io/io.c:417
0x563129ed151f io_loop
	ccan/ccan/io/poll.c:445
0x563129e9ef03 plugin_main
	plugins/libplugin.c:1284
0x563129e9b099 main
	plugins/pay.c:2010
0x7fe37c89a0b2 ???
	???:0
0x563129e9452d ???
	???:0
0xffffffffffffffff ???
	???:0
```
Seems like we get *4* failures, since failing to find a route counts now?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The code had incorrect assertions, partially because it didn't clearly
distinguish between errors from the final node (which, barring blockheight
issues, mean a complete failure) and errors from intermediate nodes.

In particular, we can't trust the *values*, so we need to distinguish
these by the *sender*.

If a route is of length 2 (A, B):
- erring_index == 0 means us complaining about channel A.
- erring_index == 1 means A.node complaining about channel B.
- erring_index == 2 means the final destination node B.node.

This is particularly of note because Travis does NOT run test_pay_routeboost!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
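
The rule above, as a tiny self-contained example (illustrative only, not the plugin's actual error handling): for a route (A, B), erring_index identifies the complaining sender, which is the one thing we can trust:

```c
#include <stdio.h>

int main(void)
{
	/* Route of length 2: hop 0 is channel A, hop 1 is channel B. */
	const char *channels[] = {"A", "B"};
	const int route_len = 2;

	for (int erring_index = 0; erring_index <= route_len; erring_index++) {
		if (erring_index == 0)
			printf("erring_index=0: we complain about channel %s\n",
			       channels[0]);
		else if (erring_index < route_len)
			printf("erring_index=%d: %s.node complains about channel %s\n",
			       erring_index, channels[erring_index - 1],
			       channels[erring_index]);
		else
			printf("erring_index=%d: error from the final destination %s.node\n",
			       erring_index, channels[route_len - 1]);
	}
	return 0;
}
```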
Somehow, we occasionally set the wrong amount field?

Doesn't happen all the time, but when it does:

b'2020-07-20T05:40:15.510Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10839569msat < 500000000msat'
b'2020-07-20T05:40:15.510Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.510Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 5 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.513Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC'
b'2020-07-20T05:40:15.514Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9710103msat < 500000000msat'
b'2020-07-20T05:40:15.514Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.514Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 10 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.518Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10915092msat < 500000000msat'
b'2020-07-20T05:40:15.518Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.518Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.521Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9143652msat < 500000000msat'
b'2020-07-20T05:40:15.521Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.521Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 3 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.524Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9840417msat < 500000000msat'
b'2020-07-20T05:40:15.524Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.524Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 6 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.527Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10524535msat < 500000000msat'
b'2020-07-20T05:40:15.527Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.527Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 8 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.536Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9579583msat < 500000000msat'
b'2020-07-20T05:40:15.536Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.536Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 1 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.541Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9048144msat < 500000000msat'
b'2020-07-20T05:40:15.541Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.541Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 7 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.544Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10858167msat < 500000000msat'
b'2020-07-20T05:40:15.544Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.544Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 9 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.548Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10137155msat < 500000000msat'
b'2020-07-20T05:40:15.548Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.548Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 2 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.551Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10002298msat < 500000000msat'
b'2020-07-20T05:40:15.551Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113'
b'2020-07-20T05:40:15.551Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 4 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC'
b'2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 14 = RCVD_ADD_HTLC/SENT_ADD_HTLC'
b'2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC'
b'2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 15 = RCVD_ADD_HTLC/SENT_ADD_HTLC'
b'2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC'
b'2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 16 = RCVD_ADD_HTLC/SENT_ADD_HTLC'
This PR includes the fix discussed in PR #3855. The fix was tested with the use case described in the issue and works.

Fixes: #3855

Changelog-None
Changelog-Fixed: plugin: `bcli` no longer logs a harmless warning about being unable to connect to the JSON-RPC interface.
Changelog-Added: plugin: Plugins can opt out of having an RPC connection automatically initialized on startup.
With MPP we almost always end up with more than one attempt.
It turns out that by aggressively splitting payments we may quickly exhaust
the number of HTLCs we can add to a channel. By tracking the number of
HTLCs we can still add, and excluding the channels to which we cannot add any
more, we increase the route diversity and avoid quickly exhausting the HTLC
budget.

In the next commit we'll also implement an early abort if we've exhausted all
channels, so we don't end up splitting indefinitely and we can also optimize
the initial split to not run afoul of that limit.
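
A minimal sketch of that budget tracking (hypothetical struct, not the actual implementation): count the HTLC slots we may still add per channel, and have route selection skip exhausted channels:

```c
#include <assert.h>
#include <stdbool.h>

struct chan_budget {
	const char *scid;  /* short channel id, e.g. "103x1x0" */
	int htlcs_left;    /* HTLC slots we may still add to this channel */
};

/* Route selection excludes channels with no slots left. */
static bool chan_usable(const struct chan_budget *c)
{
	return c->htlcs_left > 0;
}

/* Each in-flight sub-payment consumes one slot until it settles or fails. */
static void chan_take_htlc_slot(struct chan_budget *c)
{
	assert(c->htlcs_left > 0);
	c->htlcs_left--;
}
```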
The presplit modifier could end up exceeding the maximum number of HTLCs we
can add to a channel right out of the gate, so we switch to a dynamic presplit
if that is the case. The presplit will now use at most 1/3rd of the available
HTLCs on the channels if the normal split would exceed the number of available
HTLCs. We also abort early if we don't have sufficient HTLCs available.
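
Sketched decision logic for the dynamic presplit (the 1/3rd cap comes from the commit message; the function and its shape are illustrative):

```c
/* Decide how many parts the presplit may create; -1 signals the early
 * abort when no HTLC slots are available at all. */
static int presplit_parts(int desired_parts, int htlcs_available)
{
	if (htlcs_available <= 0)
		return -1;            /* abort early: nothing to split into */
	if (desired_parts <= htlcs_available)
		return desired_parts; /* the normal split fits, keep it */
	/* The normal split would exceed the budget: use at most a third of
	 * the available slots, but always at least one part. */
	int capped = htlcs_available / 3;
	return capped > 0 ? capped : 1;
}
```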
There is little point in trying to split if the resulting HTLCs exceed the
maximum number of HTLCs we can add to our channels. So abort if a split would
result in more HTLCs than our channels can support.
This may be related to issue #3862; however, the water was muddied by it
being the wrong error to return, and the node should not expect this courtesy
feature to be present at all...
We were removing the current hint from the list and not inheriting the current
routehint, so we'd be forgetting a hint at each retry. Now we keep the array
unchanged in the root, and simply skip the ones that are not usable given the
current information we have about the channels (in the form of channel_hints).

Fixes #3861
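
A sketch of the new behavior (hypothetical types; `usable` stands in for the real channel_hints lookup): the root's routehint array stays intact, and retries merely skip hints that current channel knowledge rules out:

```c
#include <stdbool.h>
#include <stddef.h>

struct routehint {
	const char *entry_node; /* node the hint routes through */
	bool usable;            /* verdict derived from our channel_hints */
};

/* Find a usable hint WITHOUT removing anything from the root's array, so
 * no hint is forgotten on later retries. */
static const struct routehint *first_usable_hint(const struct routehint *hints,
						 size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (hints[i].usable)
			return &hints[i];
	return NULL; /* none usable right now: try without a routehint */
}
```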
As the hints get new fields added, it is easy to forget to amend one of the
places we create them. Since we already have an update method, let's use that
to handle all additions to the array of known channel hints.
The adaptive MPP test was showing an issue with always using a routehint, even
when it wasn't necessary: we would insist on routing to the entrypoint of the
routehint, even through the actual destination. If a channel on that loop
ended up over capacity we'd slam below 0, and then increase again by
unapplying the route. The solution really is not to insist on routing through
a routehint, so we implement random skipping of routehints, and we rotate them
if we have multiple.
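
And a sketch of that selection (illustrative; the plugin's actual randomness and rotation details may differ): sometimes attempt without any routehint at all, otherwise rotate through the hints across retries:

```c
#include <stddef.h>
#include <stdlib.h>

struct routehint {
	const char *entry_node;
};

/* Pick a hint for this attempt: with some probability use none (the
 * destination may be reachable directly), otherwise rotate so we never
 * insist on a single entrypoint across retries. */
static const struct routehint *pick_routehint(const struct routehint *hints,
					      size_t n, unsigned attempt)
{
	if (n == 0 || rand() % 2 == 0)
		return NULL;
	return &hints[attempt % n];
}
```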
We're using it in a couple of places to see if we even performed the attempt,
so we need to make sure it's initialized.
@rustyrussell (Contributor) left a comment:

Ack aad62fc

@rustyrussell rustyrussell marked this pull request as ready for review July 23, 2020 00:44
@rustyrussell rustyrussell merged commit 6ada56c into master Jul 23, 2020
@rustyrussell rustyrussell deleted the fixup-0.9.0 branch August 15, 2022 00:43