Improve Stuck Message Management #1469

Closed
dirkmc opened this issue May 26, 2023 · 6 comments

@dirkmc
Contributor

dirkmc commented May 26, 2023

Background

  • Storage providers (SPs) send messages on chain for Publish Storage Deals, Prove Commit, etc.
  • Conceptually, the messages go into a local Message Pool while waiting to be picked up by a miner and added to a block.
  • When a message is sent, lotus attempts to estimate pricing parameters for the message to ensure it will be attractive to a miner.
  • One of the inputs for the estimate is the network base fee.
  • The base fee fluctuates over time so sometimes the estimate is incorrect.
  • If the estimate is too low, the message can get stuck in the Message Pool (i.e. it is not attractive enough to any miner, so it is never added to a block).
  • When one message gets stuck, it blocks all the messages behind it in the queue.
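
To make the last two points concrete, here is a minimal sketch of why a low fee cap stalls a whole queue. It uses a simplified model in which the miner keeps min(gas premium, gas fee cap - base fee) per unit of gas, and messages from one sender must land in nonce order. The PendingMessage type and both function names are hypothetical and are not part of lotus or Boost.

```python
from dataclasses import dataclass


@dataclass
class PendingMessage:
    nonce: int
    gas_fee_cap: int  # most the sender will pay per gas unit (attoFIL)
    gas_premium: int  # tip offered to the miner per gas unit (attoFIL)


def miner_reward_per_gas(msg, base_fee):
    """Per-gas tip the miner would keep under the simplified model."""
    return min(msg.gas_premium, msg.gas_fee_cap - base_fee)


def blocked_nonces(pending, base_fee):
    """Return the nonces that cannot land at the given base fee.

    The first unattractive message (reward <= 0) blocks every message
    with a higher nonce from the same sender.
    """
    ordered = sorted(pending, key=lambda m: m.nonce)
    for i, msg in enumerate(ordered):
        if miner_reward_per_gas(msg, base_fee) <= 0:
            return [m.nonce for m in ordered[i:]]
    return []
```

In this model, a message whose fee cap was estimated at 90 while the base fee has since risen to 100 earns the miner nothing, so it and everything queued behind it stay in the pool until the base fee drops or the message is replaced.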

Message Pool UX

Lotus CLI

Lotus has a CLI tool that can be used to see messages in the message pool:

lotus mpool pending --local

Boost UI

Boost has a Message Pool page in the Web UI that shows a list of messages from the SP that are pending (they haven't been added to a block).

Replacing a message

Lotus has a CLI command to replace messages that are stuck in the message pool - the message contents remain the same but the user can change the pricing parameters to make the message more attractive to miners, and therefore more likely to get added to a block.

lotus mpool replace --gas-feecap <feecap> --gas-premium <premium> <from> <nonce>
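
As a rough sketch of how replacement values might be chosen, the helper below bumps the old premium and sets the fee cap with some headroom over the current base fee, then formats the command shown above. The 125% bump and the 200% headroom are illustrative assumptions, not values taken from lotus, and the function names and example address are hypothetical. The sketch only builds a string; it does not send anything.

```python
def replacement_fees(old_premium, base_fee,
                     premium_bump_percent=125, headroom_percent=200):
    """Suggest (fee_cap, premium) for a replacement message.

    Both percentages are assumptions chosen for illustration only.
    Integer arithmetic is used because fees are whole attoFIL amounts.
    """
    new_premium = old_premium * premium_bump_percent // 100 + 1
    new_fee_cap = base_fee * headroom_percent // 100 + new_premium
    return new_fee_cap, new_premium


def replace_command(sender, nonce, fee_cap, premium):
    """Format the CLI invocation; the operator still runs it by hand."""
    return (
        f"lotus mpool replace --gas-feecap {fee_cap} "
        f"--gas-premium {premium} {sender} {nonce}"
    )


fee_cap, premium = replacement_fees(old_premium=100, base_fee=1000)
print(replace_command("f01234", 7, fee_cap, premium))
```

Keeping the step manual means the operator sees the exact fee cap before anything is paid.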

Proposals

1. Surface Stuck Messages

Currently the SP only finds out that messages are stuck in the Message Pool if they

  • notice messages are taking a long time to go out
  • go and manually check the message pool in lotus CLI or the boost web UI

Solutions

1. Monitor the Message Pool

Boost should

  • monitor the Message Pool
  • if any messages are in the pool for a long time display an alert at the top of the UI
  • the time should be configurable (e.g. 20 epochs)
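
A minimal sketch of the check described above, assuming the monitor already records the epoch at which each local message was first seen. The function name and the nonce-to-epoch mapping are hypothetical; how Boost would actually read the pool is not shown here.

```python
def overdue_messages(sent_epochs, current_epoch, max_pending_epochs=20):
    """Return nonces that have been pending longer than the threshold.

    sent_epochs maps nonce -> epoch at which the message was first seen.
    max_pending_epochs is the configurable limit (20 in the example above).
    """
    return sorted(
        nonce
        for nonce, sent in sent_epochs.items()
        if current_epoch - sent > max_pending_epochs
    )


# Nonce 5 has waited 21 epochs and nonce 7 has waited 31, so both are flagged.
print(overdue_messages({5: 100, 6: 115, 7: 90}, current_epoch=121))
```

A non-empty result would be the trigger for the alert at the top of the UI.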

2. Storage Deal List warning

✅ Completed in #1480

Boost should display a warning message in the Storage Deal List if the Publish Storage Deals message gets stuck for more than the expected confirmation time.

Currently boost just shows "Awaiting Publish Confirmation"

Boost should also show

  • The number of epochs that have elapsed
  • The number of epochs that it is expecting to wait (10 by default)
  • A warning if the number of elapsed epochs is greater than expected
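
The three items above could be combined into a single status line. The sketch below is only an illustration of that logic, not the code from #1480; the function name and the exact wording of the output are assumptions.

```python
def publish_confirmation_status(sent_epoch, current_epoch, expected_epochs=10):
    """Build the deal-list status text with elapsed and expected epochs."""
    elapsed = current_epoch - sent_epoch
    text = f"Awaiting Publish Confirmation ({elapsed} of {expected_epochs} epochs)"
    if elapsed > expected_epochs:
        # Only warn once the wait exceeds the expected confirmation time.
        text += " - warning: longer than expected"
    return text


print(publish_confirmation_status(sent_epoch=100, current_epoch=104))
print(publish_confirmation_status(sent_epoch=100, current_epoch=115))
```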

2. Improve Message Pool UX

Solutions

1. Additional Message Information

On the Message Pool page, the information displayed for each message should also include

  • the epoch and wall clock time at which the message was sent
  • the number of epochs and time that has elapsed since the message was sent

Fixes:

  • Show Gas Limit (currently blank)
  • Show Params (currently not working)
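
For the epoch and wall-clock fields proposed above, the conversion is simple arithmetic. The sketch below assumes the mainnet values as I understand them (30-second epochs, genesis at 2020-08-24 22:00:00 UTC); treat both constants as assumptions and adjust them for other networks. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed mainnet parameters; other networks use different values.
EPOCH_SECONDS = 30
GENESIS = datetime(2020, 8, 24, 22, 0, 0, tzinfo=timezone.utc)


def epoch_to_time(epoch):
    """Wall-clock time (UTC) at which the given epoch starts."""
    return GENESIS + timedelta(seconds=epoch * EPOCH_SECONDS)


def elapsed_since(sent_epoch, current_epoch):
    """Return (epochs elapsed, wall-clock duration) since the message was sent."""
    epochs = current_epoch - sent_epoch
    return epochs, timedelta(seconds=epochs * EPOCH_SECONDS)


epochs, duration = elapsed_since(sent_epoch=100, current_epoch=120)
print(f"sent {epochs} epochs ago ({duration})")
```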

2. Replace Message Widget

Next to each message there should be a button called "Replace Message" that shows a widget to allow the SP to replace a stuck message.

3. Base Fee history

At the top of the page there should be a small graph showing base fee history over the last 24 hours.
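
One way to prepare the data for such a graph is to average the per-epoch base fee into hourly buckets, giving at most 24 points to plot. This is a sketch under the assumption of 30-second epochs (120 per hour); the function name and the epoch-to-fee mapping are hypothetical, and collecting the samples is out of scope here.

```python
EPOCHS_PER_HOUR = 120  # assumes 30-second epochs


def hourly_base_fee(samples, current_epoch, hours=24):
    """Average base fee per hour over the last `hours` hours.

    samples maps epoch -> base fee. Returns (hours_ago, average) pairs,
    oldest first. Hours with no samples are skipped.
    """
    buckets = {}
    for epoch, fee in samples.items():
        age = current_epoch - epoch
        if 0 <= age < hours * EPOCHS_PER_HOUR:
            buckets.setdefault(age // EPOCHS_PER_HOUR, []).append(fee)
    return [
        (hours_ago, sum(fees) // len(fees))
        for hours_ago, fees in sorted(buckets.items(), reverse=True)
    ]


print(hourly_base_fee({1000: 100, 990: 300, 870: 500}, current_epoch=1000))
```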

@TippyFlitsUK

This would be extremely welcome @dirkmc!!

@dirkmc
Contributor Author

dirkmc commented May 26, 2023

@TippyFlitsUK is there anything above that could be changed / added to make it more useful for you?

@f8-ptrk
Contributor

f8-ptrk commented May 27, 2023

we do not need the feature and wish to be able to turn this off if implemented. thanks

the constant mpool watch most likely is too expensive for what we gain in the end and the replace command button is almost impossible to be implemented "right". besides the button to replace this most likely brings no benefit to users.

educating how to set proper message fee limits in lotus/boost/... is way more important than UX improvements on crafting replacement messages. replacing messages should be painful to do, it's nothing that should be casually done anyways - it's a hint at bad config params regarding fees.

@MetaWaveInfo

we do not need the feature and wish to be able to turn this off if implemented. thanks

the constant mpool watch most likely is too expensive for what we gain in the end and the replace command button is almost impossible to be implemented "right". besides the button to replace this most likely brings no benefit to users.

educating how to set proper message fee limits in lotus/boost/... is way more important than UX improvements on crafting replacement messages. replacing messages should be painful to do, it's nothing that should be casually done anyways - it's a hint at bad config params regarding fees.

Thanks f8, but I think a better UX can give people a clear roadmap for setting gas fees. Not all operators in the network are as clever as you are.

@f8-ptrk
Contributor

f8-ptrk commented May 28, 2023

Hey Meta,

i honestly doubt it makes sense to build a UX that allows the user to just pay to cover up configuration shortcomings. i think it's the wrong way to go, besides the implementation (if any kind of loop is involved) possibly being "dangerous" (we have seen miners pay tens of thousands of $$$ for single messages due to msg replace UX going off the rails under unexpected network conditions).

as soon as the user will be in a position to use a button like that as a tool he doesn't need the button. compared to the command line replace commands a GUI version to actually make this a tool rather than a "pay for config fails" solution will be so clunky that it is hardly useful.

but i might be wrong. as long as this is always a user-triggered, one-time event - why not, sure. but any solution beyond that, every inch of automation poisons the code base, i (and most likely a lot of other people) will not want that code base in an essential piece of software.

we do not need the feature and wish to be able to turn this off if implemented. thanks

is all i am asking for here.

@LexLuthr added this to the Boost v2 stability milestone on Oct 24, 2023
@nonsense removed this from the Boost v2 stability milestone on Oct 26, 2023
@LexLuthr
Collaborator

LexLuthr commented Dec 6, 2023

I am closing this as we have completed all the improvements except automatic/manual replacement of stuck messages. The remaining message replacement work is no longer worth the engineering effort required to implement it, as we are no longer seeing stuck messages frequently.

@LexLuthr closed this as completed on Dec 6, 2023