Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate PaymentSent events for MPP payments #1053

Merged

Conversation

valentinewallace
Copy link
Contributor

@valentinewallace valentinewallace commented Aug 20, 2021

TODOs

- [x] dedup PaymentFailed events

  • add serialization for pending_outbound_htlc's MPP IDs
    - [x] test that PaymentFailed events are deduplicated

Also, track whether all paths have failed in PaymentFailed

@codecov
Copy link

codecov bot commented Aug 20, 2021

Codecov Report

Merging #1053 (c828ff4) into main (24065c8) will increase coverage by 0.64%.
The diff coverage is 95.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1053      +/-   ##
==========================================
+ Coverage   91.00%   91.64%   +0.64%     
==========================================
  Files          65       65              
  Lines       33695    36410    +2715     
==========================================
+ Hits        30663    33367    +2704     
- Misses       3032     3043      +11     
Impacted Files Coverage Δ
lightning/src/routing/network_graph.rs 92.12% <ø> (ø)
lightning/src/util/events.rs 27.48% <25.00%> (-0.06%) ⬇️
lightning/src/util/ser.rs 92.03% <92.85%> (+0.34%) ⬆️
lightning/src/ln/functional_test_utils.rs 95.07% <95.65%> (-0.06%) ⬇️
lightning/src/ln/channelmanager.rs 89.56% <96.52%> (+3.66%) ⬆️
lightning/src/ln/channel.rs 92.36% <100.00%> (+3.70%) ⬆️
lightning/src/ln/functional_tests.rs 97.54% <100.00%> (+0.01%) ⬆️
lightning/src/ln/onion_route_tests.rs 97.09% <100.00%> (+<0.01%) ⬆️
lightning/src/ln/onion_utils.rs 94.91% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24065c8...c828ff4. Read the comment docs.

@TheBlueMatt TheBlueMatt added this to the 0.0.101 milestone Aug 23, 2021
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
@valentinewallace valentinewallace force-pushed the 2021-08-dedup-payment-sent branch 4 times, most recently from c29ff9f to cbd2317 Compare September 2, 2021 18:15
@valentinewallace valentinewallace marked this pull request as ready for review September 3, 2021 15:49
lightning/src/ln/functional_test_utils.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
@valentinewallace valentinewallace force-pushed the 2021-08-dedup-payment-sent branch 2 times, most recently from 388a0e9 to 500f8e9 Compare September 13, 2021 15:34
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Show resolved Hide resolved
@TheBlueMatt
Copy link
Collaborator

Should this get a new test for the new event?

@valentinewallace
Copy link
Contributor Author

valentinewallace commented Sep 16, 2021

Should this get a new test for the new event?

mpp_failure() test in functional_tests tests the new event!

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was quite distracted when I looked at this yesterday, sorry about that :(.

In any case, it looks good, one note just a rebase change and otherwise minor gripes.

lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took largely a high-level pass. Looking pretty good.

lightning/src/util/events.rs Outdated Show resolved Hide resolved
/// The hash which was given to ChannelManager::send_payment.
payment_hash: PaymentHash,
/// The path that failed.
payment_path: Vec<RouteHop>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a PR that adds the path to PaymentFailed for scoring purpose. It will also include the index of the failed hop. Given that, do you think we need a separate event? Could we add a marker for multi-path payments if needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. Matt pointed out in this comment that there could be value in users knowing whether to retry a portion of a payment vs. retrying the entire payment. @TheBlueMatt I tend to agree that this might make MPPFragmentFailed a bit redundant, thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could break out the relevant commits into their own PR if it helps. I think from a retry perspective, though, we'd need to know the part amount unless InvoicePayer tracks that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this one offline, but we just need to make sure that we expose to users somehow the distinction between a payment failing and an MPP fragment failing, as they have significant implication for what the user does in response. We really don't want users to think that a PaymentFailed event implies the payment failed, when it was only one mpp fragment that failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drafted #1077

lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/util/events.rs Outdated Show resolved Hide resolved
lightning/src/util/events.rs Outdated Show resolved Hide resolved
if let Some(mut sessions) = outbounds.remove(&mpp_id) {
if sessions.remove(&session_priv_bytes) {
self.pending_events.lock().unwrap().push(
events::Event::PaymentSent { payment_preimage }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, an MPP will only result in one PaymentSent event -- let me know if I misunderstood. With that in mind, I was thinking of adding the successful path to PaymentSent for scoring. That way a channel's score could be improved if successfully routing through it. But if there is only one PaymentSent event for MPP, then such improvements will only be possible for one path. Would it be possible to include all the paths for PaymentSent event? Or would that require waiting for all paths to succeed before sending such an event?

Copy link
Contributor Author

@valentinewallace valentinewallace Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, an MPP will only result in one PaymentSent event

Yup! We generate it as soon as one path of the MPP succeeds (i.e., we do not wait for all paths to succeed). Reasoning: as soon as we have the preimage, the rest of the paths are irrevocably "solved."

Would it be possible to include all the paths for PaymentSent event?

Hmm so we could iiuc, but I don't think it makes sense to feed paths to the scorer until they actually succeed. Not sure the solution here, will have to think on it a bit. Ideally I guess we'd generate additional events after the first PaymentSent, that contain the path that succeeded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as the receipient is "honest" (or at least wants their money and isn't buggy), we know all the paths succeeded when we get the first preimage here, so I think it would be "correct", but code-wise its probably quite tricky - the actual HTLCSource is stored away in the HTLC set of some other channel so not readily available here. Maybe we'll need a separate event here to indicate further paths that succeeded?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... yeah, that may be a good argument for separating path succeeding into its own event. Alternatively, ChannelManager could be parameterized by the scorer, which it would call as needed instead of using events. Though that could introduce the possibility of reentrancy. Maybe not the best idea and would be inconsistent with network updates.

Related to the other discussion, I guess for multi-path payments the user (or InvoicePayer) ultimately decides if a payment has failed when it no longer wants to retry failed paths (and hasn't received the preimage). Likewise for single-path payments. Either the user has the preimage (success) or doesn't (possible failure) and needs to decided whether to give up (i.e., no longer retry or wait).

With that in mind, maybe having a failed or successful payment at the event level isn't the right abstraction? Instead, there's PreimageReceived, SuccessfulPaymentPath, and FailedPaymentPath. Maybe the last two could be named in terms of HTLCs instead? Sorta thinking out loud on this, so let me know if I'm off on something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... yeah, that may be a good argument for separating path succeeding into its own event.

I think that makes sense. It's a bit awkward to do something different for success and failure but the semantics are pretty different and Val just removed the separate failure event from path event. For path success the probability increases can happen later, we're not trying to make sure we get them in before retrying a new path so I think it's fine.

Alternatively, ChannelManager could be parameterized by the scorer, which it would call as needed instead of using events. Though that could introduce the possibility of reentrancy. Maybe not the best idea and would be inconsistent with network updates.

Yea, I don't think the complexity of the reentrancy is worth it there.

With that in mind, maybe having a failed or successful payment at the event level isn't the right abstraction? Instead, there's PreimageReceived, SuccessfulPaymentPath, and FailedPaymentPath. Maybe the last two could be named in terms of HTLCs instead? Sorta thinking out loud on this, so let me know if I'm off on something.

I think that kinda makes sense. I just suggested renaming PaymentFailed to HTLC/Path/PartFailed. For preimage receipt it does mean you need to treat the payment as sent, though, so I'm not sure as really need to swap that one.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, A few nits and minor things here and there but I'm basically happy.

lightning/src/util/ser.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/util/events.rs Outdated Show resolved Hide resolved
read_tlv_fields!(reader, {
(0, payment_hash, required),
(1, network_update, ignorable),
(2, rejected_by_dest, required),
(3, all_paths_failed, required),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required implies that we'll fail to read old versions that are missing the field. I think we should just initialize it to Some(true) and unwrap it below, keeping the broken behavior for events that were pending.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought if all_paths_failed was an option fieldty in read_tlv_fields, it had to be an option in write_tlv_fields also. Updating mental model since they def can be asymmetric.

Sorry, I finally drew out the 2x2 tlv ser matrix (type odd/even is for old clients reading new, fieldty option/required is for new clients reading old)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, everything is written the same whether its an option or not, option just declares whether we require it to be present or not, and odd/even declares whether old versions that don't understand it will fail or ignore.

/// For both single-path and multi-path payments, this is set if all paths of the payment have
/// failed. This will be set to false if (1) this is an MPP payment and (2) other parts of the
/// larger MPP payment were still in flight when this event was generated.
all_paths_failed: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to rename PaymentFailed after this IMO - maybe PaymentPartFailed or HTLCFailed, because it definitely no longer means the whole payment failed. I'm ok with pushing this to a followup if you just wanna get this merged without blowing up the diff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with renaming PaymentFailed. I do like using HTLC in the name as mentioned in another comment. Presumably these won't be deduplicated?

However, I'm still uncertain about adding the all_paths_failed field. I feel like I might be missing something, though. How would the retrier make use of it? If false, it needs to retry for the corresponding part unless it has received a PaymentSent. If true, doesn't it need to do the same thing? If not, is there any overlap with rejected_by_dest?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm still uncertain about adding the all_paths_failed field. I feel like I might be missing something, though. How would the retrier make use of it? If false, it needs to retry for the corresponding part unless it has received a PaymentSent. If true, doesn't it need to do the same thing?

I believe we'll need a different call in ChannelManager to retry a failed MPP path, as we won't be able to just blindly call send_payment again to keep state consistent. Thus, we'll need to decide which function we call based on that bool.

More generally, the bool isn't just for the retrier - its also to tell the user if a payment has failed, or if a path has failed. Until you either see all_paths_failed or PaymentSent, you cannot consider a payment resolved.

If not, is there any overlap with rejected_by_dest?

They're two very different things - whether a final node rejected the payment or not, we can't consider a payment failed unless we get all the paths failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we'll need a different call in ChannelManager to retry a failed MPP path, as we won't be able to just blindly call send_payment again to keep state consistent. Thus, we'll need to decide which function we call based on that bool.

More generally, the bool isn't just for the retrier - its also to tell the user if a payment has failed, or if a path has failed. Until you either see all_paths_failed or PaymentSent, you cannot consider a payment resolved.

Ah, that's right. For the situation where we retry some paths of an MPP but they all ultimately fail, would the retrier then retry the full payment? Seems it may complicate the retrier as either there are two levels of retries (paths and full payment) or payments that are MPP need to be treated differently from non-MPP. But if we want to avoid the two-levels of retries, we'd also need to know if the payment was MPP. But I don't think the bool will tell us that?

I had hoped this could be generalized in some way where we are always just retrying paths regardless of whether it's MPP or not. But I guess that would require complicating the state tracking in ChannelManager, possibly such that the user would need to say when they have given up (and thus considered the payment failed). 🙁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, come to think of it the retrier will know how many paths it gets back from the router, so this should be sufficient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's right. For the situation where we retry some paths of an MPP but they all ultimately fail, would the retrier then retry the full payment?

Yes, I believe you generally would, as long as you haven't run out of time/retries.

Seems it may complicate the retrier as either there are two levels of retries (paths and full payment) or payments that are MPP need to be treated differently from non-MPP. But if we want to avoid the two-levels of retries, we'd also need to know if the payment was MPP. But I don't think the bool will tell us that?

Yes, the complexity is somewhat unavoidable, I think. Ultimately the channelmanager needs to know, when the retry HTLC(s) are sent, whether its an MPP retry or not. If the full MPP has failed, the channelmanager will (have to) forget about the full payment, so the retrier needs to inform the channelmanager of this. The way I was thinking about it was that the retrier would:

  • if all_paths_failed is not set, call channelmanager.retry_mpp_part(new_route) and the channelmanager would look up the pending mpp data, generate the appropriate retry for only the given path, and send the new part(s),
  • if the above call fails with an Err(NoPendingMPPFound) or all_paths_failed is set, the payment retrier will generate a route for the full payment value and call send_payment.

I'm not really sure how we could handle this better on the channelmanager end, sadly.

Actually, come to think of it the retrier will know how many paths it gets back from the router, so this should be sufficient.

Does the retrier need to store the pending route? I figured it would not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure how we could handle this better on the channelmanager end, sadly.

The only alternative, really, would be to change the API to allow retry_mpp_part to reconstruct the state if you provide the full payment value, but I'm not sure that's ideal - if we get a PaymentFailed that raced the first PaymentFailed and is pending in the queue, with the last node having failed the payment, we shouldn't retry, and this API change would result in us retrying, but only part of the MPP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer a quick follow-up for renaming PaymentFailed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the retrier need to store the pending route? I figured it would not.

Correct. I was thinking it would see that get_route returned multiple paths and note that (as a bool) for later. But I guess that isn't necessary if we allow retrying the full payment as well.

For counting retries, were you thinking the number of attempts across all paths and full payments would be counted together? Or would each path get its own attempts? Was thinking the former for simplicity.

As another related concern, what if get_route returns multiple paths when attempting to retry a part? Seems like this might be desirable, but retry_mpp_part would need to be able to handle this. Maybe that is what you meant by "part(s)".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For counting retries, were you thinking the number of attempts across all paths and full payments would be counted together? Or would each path get its own attempts? Was thinking the former for simplicity.

Dunno, should count something, I'll leave it up to the author :p.

As another related concern, what if get_route returns multiple paths when attempting to retry a part? Seems like this might be desirable, but retry_mpp_part would need to be able to handle this. Maybe that is what you meant by "part(s)".

Yes, ChannelManager should be able to handle this seamlessly.

@valentinewallace
Copy link
Contributor Author

Think I've addressed outstanding feedback

lightning/src/ln/functional_test_utils.rs Outdated Show resolved Hide resolved
lightning/src/ln/functional_test_utils.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarification, are we still sending PaymentFailed events for each path? If so, could you update the PR title and description?

lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
@valentinewallace valentinewallace changed the title Deduplicate PaymentSent/PaymentFailed events for MPP payments Deduplicate PaymentSent events for MPP payments Sep 17, 2021
We'll use this to correlate MPP shards in upcoming commits
by removing all pending outbound payments associated with the same
MPP payment after the preimage is received
Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c828ff4. LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants