Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTLC error cleanup #3472

Merged
merged 16 commits into from
Feb 25, 2020
Merged

Conversation

rustyrussell
Copy link
Contributor

Our HTLC error paths are confusing: channeld can detect an incoming HTLC error because it tries to decrypt the onion. This work is done again anyway by lightningd. In many places we have both an onionreply (if we're forwarding an error from a peer) and an error code, which we try to convert in a giant demuxer which is incomplete and growing in complexity.

By the end of this series, we have HTLC errors as follows:

  • incoming htlcs: badonion if non-decodable, otherwise failonion ready to send to prev peer.
  • outgoing htlcs: raw failmsg if we or channeld generated it, failonion if we got it from peer.

If we forward an outgoing htlc error back to an incoming, we convert the failmsg to an onion (this requires the incoming HTLC secret), and the onion becomes the incomijng htlc failonion.

This is, frankly, much clearer that what we had before, though it's a painful transition in several ways.

@rustyrussell rustyrussell added this to the 0.8.1 milestone Jan 30, 2020
@rustyrussell rustyrussell force-pushed the htlc-error-cleanup branch 3 times, most recently from a6551a0 to e89f6d5 Compare February 6, 2020 04:14
@rustyrussell rustyrussell modified the milestones: 0.8.1, 0.8.2 Feb 11, 2020
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanup @rustyrussell 👍 sorry it took so long to review.

  • cleanup: It is very unfortunate that we call out to gossipd every time we
    forward a payment. I don't really have any numbers as to how much time we
    are spending on these calls, but I assume it is quite expensive. We should
    instead cache these details in lightningd and push updates if anything
    changes.
  • I'm really loving how 51ff523 simplifies channeld 👍

The only real issue is that the migration SQL statement is not portable (x'2002' is not valid syntax for postgres) so we need to find an alternative formulation or create a rewrite rule for hex-encoded fields. I'll look into which one works better and report back.


/* This stays around even if we fail it immediately: it *is*
* part of the current commitment. */
hin = new_htlc_in(channel, channel, added->id, added->amount,
added->cltv_expiry, &added->payment_hash,
shared_secret, added->onion_routing_packet);
failcode ? NULL: &shared_secret,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failcode ? NULL: &shared_secret,
failcode ? NULL : &shared_secret,

wallet/db.c Outdated
/* For outgoing HTLCs, we now keep a localmsg instead of a failcode.
* Turn anything in transition into a WIRE_TEMPORARY_NODE_FAILURE. */
{SQL("ALTER TABLE channel_htlcs ADD localfailmsg BLOB;"), NULL},
{SQL("UPDATE channel_htlcs SET localfailmsg=x'2002' WHERE malformed_onion != 0 AND direction = 1;"), NULL},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be portable, checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was deeply unclear to me how to set a BLOB.

Comment on lines 216 to 230
/* FIXME: Allow hook to return its own failmsg directly? */
if (val == WIRE_TEMPORARY_NODE_FAILURE)
return towire_temporary_node_failure(ctx);
if (val != WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS)
log_broken(hin->key.channel->log,
"invoice_payment hook returned failcode %u,"
" changing to incorrect_or_unknown_payment_details",
val);

return failmsg_incorrect_or_unknown(ctx, hin);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look reasonable and cover all the plugins I'm aware of 👍

Seems a bit strange though to structure it like this if (positive) {} if (negative) {} positive

@rustyrussell
Copy link
Contributor Author

* We should
  instead cache these details in `lightningd` and push updates if anything
  changes.

I have a fix for that, which works even better. But the series was getting long enough!

This is clearer, especially when we also deal with raw not-yet-onion-wrapped
failure messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. forward_htlc sets hout to NULL.
2. forward_htlc passes &hout to send_htlc_out.
3. forward_htlc checks the failcode and frees(NULL) and sets hout to NULL
   (again).  This in fact covers every failcode which send_htlc_out returns.

We should ensure send_htlc_out sets *houtp to NULL on failure; in fact,
both callers pass houtp, so we can make it unconditional.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For incoming htlcs, we need failure details in case we need to
re-xmit them.  But for outgoing htlcs, lightningd is telling us it
already knows they've failed, so we just need to flag them failed
and don't need the details.

Internally, we set the ->fail to a dummy non-NULL value; this is
cleaned up next.

This matters for the next patch, which moves onion handling into
lightningd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turn it into temporary node failure: this only happens if we restart
with a failed htlc in, but it's clearer and more robust to handle it
generically.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I hadn't realized that lightningd asks gossipd every time we forward
a payment.  But I'm going to abuse it here to get the latest channel_update,
otherwise (as lightningd takes over error message generation) lightningd
needs to do an async request at various painful points.

So have gossipd tell us the lastest update (stripped so compatible with
the strange in-onion-error format).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of making it ourselves, lightningd does it.  Now we only have
two cases of failed htlcs: completely malformed (BADONION), and with
an already-wrapped onion reply to send.

This makes channeld's job much simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's only called from the db code, and failing_channel is always NULL.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're going to change our internal structure next, so this is preparation.
We populate existing errors with temporary node failures, for simplicity.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We'll use this in the next patch for when we need to create errors to
send back to lightningd; most commonly when the channel doesn't have
capacity for the HTLC.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…g HTLCs

At the moment, we store e.g. WIRE_TEMPORARY_CHANNEL_FAILURE, and then
lightningd has a large demux function which turns that into the correct
error message.

Such an enum demuxer is an anti-pattern.

Instead, store the message directly for output HTLCs; channeld now
sends us an error message rather than an error code.

For input HTLCs we will still need the failure code if the onion was
bad (since we need to prompt channeld to send a completely different
message than normal), though we can (and will!) eliminate its use in
non-BADONION failure cases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We tell channeld that an htlc is bad by sending it a 'struct
failed_htlc'.  This usually contains an onionreply to forward, but for
the case where the onion itself was bad, it contains a failure code
instead.

This makes the "send a failed_htlc for a bad onion" a completely
separate code path, then we can work on removing failcodes from the
other path.

In several places 'failcode' is now changed to 'badonion' to reflect
that it can only be a BADONION failcode.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
… failcode.

Unfortunately the invoice_payment_hook can give us a failcode, so I simply
restrict it to the two sensible ones.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-deprecated: plugins: invoice_payment_hook "failure_code" only handles simple cases now, use "failure_message".
…TLC errors (unless BADONION).

This cleans up the "local failure" callers for incoming HTLCs to hand
an onionreply instead of making us generate it from the code inside
make_failmsg.

(The db path still needs make_failmsg, so that's next).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-deprecated: Plugins: htlc_accepted_hook "failure_code" only handles simple cases now, use "failure_message".
…all onionreplyies.

This completes the conversion; any in-flight HTLC failures get turned into temporary_node_failures.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That's all it's used for now.

And remove unreferenced failoutchannel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As promised in the Changelog when we converted from failcodes to messages
internally.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Copy link
Contributor Author

Trivial rebase to fold fixup

@cdecker
Copy link
Member

cdecker commented Feb 24, 2020

Swiftly kicking Travis so it knows it reported a wrong result :-)

ACK 4dbeac8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants